- Serverless LLM Inference - Access top open-source language models such as Llama-3.1-8B through the API and pay only for the tokens you use. See the list of available models for current options, and the usage sketch after this list.
- LoRA Inference (Early access) - Upload LoRA adapters and access them via streaming or batch endpoints.
- Image Generation (Early access) - Generate images with models like FLUX[DEV] and Stable Diffusion. Pay per generation.
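A minimal sketch of calling a serverless model with the OpenAI Python SDK. The base URL, model ID, and environment variable name are assumptions for illustration, not values taken from this page; check the Quickstart and model list for the exact ones.

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible endpoint and a hypothetical env var for your API key.
client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ["INFERENCE_API_KEY"],
)

# Illustrative model ID; see the list of available models for the exact identifier.
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize what serverless inference means."}],
)

print(response.choices[0].message.content)
```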
Getting Started
Get up and running with the Inference.net APIs.
- Quickstart - Get up and running with Inference.net using the OpenAI SDK.
- Batch Processing - Process multiple asynchronous requests in a single API call and retrieve the results (see the sketch after this list).
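A hedged sketch of batch processing, assuming an OpenAI-compatible batch workflow (a JSONL file of requests uploaded and referenced by a batch job). The endpoint URL, model ID, and environment variable are assumptions; consult the Batch Processing guide for the exact steps.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.net/v1",  # assumed endpoint
    api_key=os.environ["INFERENCE_API_KEY"],  # hypothetical env var
)

# Each line of the JSONL file is one request with its own custom_id.
with open("requests.jsonl", "w") as f:
    f.write(
        '{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", '
        '"body": {"model": "meta-llama/llama-3.1-8b-instruct", '
        '"messages": [{"role": "user", "content": "Hello"}]}}\n'
    )

# Upload the request file, then create a batch job that references it.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Poll the job status; results are available as a file once it completes.
status = client.batches.retrieve(batch.id)
print(status.status)  # e.g. "in_progress", then "completed"
```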