Learn how to use our Group API to submit multiple inference requests together, perfect for processing related tasks that need to be tracked as a unit. The Group API supports both chat completions and text completions with up to 50 requests per group.
The Group API is available at both the `/v1/slow/group/chat/completions` and `/v1/slow/group/completions` endpoints.
Do not mix text-completion and chat-completion requests in the same group.
Overview
The Group API provides a streamlined way to submit multiple asynchronous inference requests as a single unit. Unlike the Batch API which requires JSONL file uploads, the Group API accepts requests directly in the request body, making it ideal for:
- Small to medium batches: Process up to 50 requests at once
- Related tasks: Group related inference requests together
- Webhook notifications: Get notified when all requests in a group complete
- Simpler integration: No file uploads or JSONL formatting required
- Faster implementation: Direct JSON API calls without file management
Group API vs Batch API
| Feature | Group API | Batch API |
|---|---|---|
| Maximum requests | 50 | 1,000,000 |
| Input format | JSON array in request body | JSONL file upload |
| File management | Not required | Required |
| Use case | Small batches, quick implementation | Large-scale processing |
| Webhook support | Yes | Yes |
| Completion time | 1-72 hours | 1-72 hours |
Getting Started
1. Submit a Group Request
Submit multiple requests together by sending them as an array in the request body:
const response = await fetch('https://api.inference.net/v1/slow/group/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
requests: [
{
model: "meta-llama/llama-3.2-1b-instruct/fp-8",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "What is the capital of France?" }
],
max_tokens: 100
},
{
model: "meta-llama/llama-3.2-1b-instruct/fp-8",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "What is the capital of Germany?" }
],
max_tokens: 100
}
],
webhook_id: "my-webhook-123" // Optional: attach a webhook for notifications
})
});
const result = await response.json();
console.log(result); // { groupId: "group_abc123", groupSize: 2 }
The response will include a group ID and the number of requests:
{
"groupId": "group_xY3kL9mN2pQ",
"groupSize": 2
}
2. Retrieve Group Results
Once your group is processed, retrieve all generation results using the group ID returned by the submit call:
const response = await fetch(`https://api.inference.net/v1/slow/group/${groupId}/generations`, {
headers: {
'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`
}
});
const result = await response.json();
console.log(result.generations); // Array of all completed generations
The response includes all generations in the group:
{
"generations": [
{
"_id": "gen_abc123",
"state": "Success",
"request": {
"model": "meta-llama/llama-3.2-1b-instruct/fp-8",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
],
"max_tokens": 100
},
"response": {
"id": "gen_abc123",
"object": "chat.completion",
"choices": [
{
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 8,
"total_tokens": 33
}
}
},
{
"_id": "gen_def456",
"state": "Success",
"request": {
"model": "meta-llama/llama-3.2-1b-instruct/fp-8",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of Germany?"}
],
"max_tokens": 100
},
"response": {
"id": "gen_def456",
"object": "chat.completion",
"choices": [
{
"message": {
"role": "assistant",
"content": "The capital of Germany is Berlin."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 8,
"total_tokens": 33
}
}
}
]
}
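If you poll the generations endpoint instead of using webhooks, you can check whether every generation has reached a terminal state before processing results. A minimal sketch, assuming `Success` and `Failed` are the terminal values of the `state` field shown above:

```javascript
// States in which a generation is finished and will not change again.
// Assumption: "Success" and "Failed" are the only terminal states.
const TERMINAL_STATES = new Set(["Success", "Failed"]);

// Returns true once every generation in the group has finished.
function isGroupComplete(generations) {
  return generations.length > 0 &&
    generations.every((gen) => TERMINAL_STATES.has(gen.state));
}
```

You would call this on `result.generations` after each poll, backing off between requests until it returns `true`.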
Using Webhooks
Attach a webhook to receive notifications when your group completes processing:
const response = await fetch('https://api.inference.net/v1/slow/group/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
requests: [...], // Your requests array
webhook_id: "my-webhook-123" // Your configured webhook ID
})
});
When all requests in the group complete, your webhook will receive a notification with:
- Group ID
- Completion status
- Summary of successful and failed requests
- Custom IDs for each request (if provided)
See our Webhook Documentation for setup instructions.
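When the notification arrives, you typically want the group ID and a quick success/failure summary before fetching the full results. A sketch of a payload handler; the field names (`group_id`, `succeeded`, `failed`) are illustrative assumptions, not the documented schema, so check the Webhook Documentation for the real shape:

```javascript
// Summarize a group-completion webhook payload.
// NOTE: the payload field names here are assumptions for illustration.
function summarizeGroupWebhook(payload) {
  const succeeded = payload.succeeded ?? 0;
  const failed = payload.failed ?? 0;
  return {
    groupId: payload.group_id,
    total: succeeded + failed,
    allSucceeded: failed === 0 && succeeded > 0,
  };
}
```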
Text Completions Support
The Group API also supports text completions:
const response = await fetch('https://api.inference.net/v1/slow/group/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
requests: [
{
model: "meta-llama/llama-3.2-1b-instruct/fp-8",
prompt: "The capital of France is",
max_tokens: 10
},
{
model: "meta-llama/llama-3.2-1b-instruct/fp-8",
prompt: "The capital of Germany is",
max_tokens: 10
}
]
})
});
Limits and Constraints
- Maximum requests per group: 50
- Request format: Direct JSON (no JSONL files required)
- Supported endpoints: `/v1/slow/group/chat/completions` and `/v1/slow/group/completions`
- Completion time: 1-72 hours
- Request expiration: Groups expire after 72 hours if not completed
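If you have more than 50 requests but still want the Group API's simpler integration, one option is to split the workload into several groups and submit each separately. A sketch:

```javascript
// Split an array of requests into chunks of at most `size`
// (the Group API maximum is 50 requests per group).
function chunkRequests(requests, size = 50) {
  const chunks = [];
  for (let i = 0; i < requests.length; i += size) {
    chunks.push(requests.slice(i, i + size));
  }
  return chunks;
}
```

Each chunk is then the `requests` array of its own submit call. For genuinely large workloads, the Batch API is still the better fit.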
Best Practices
- Group related requests: Use groups for requests that logically belong together (e.g., analyzing multiple documents from the same source).
- Use webhooks for notifications: Instead of polling, configure webhooks to be notified when your group completes.
- Handle individual failures: Some requests in a group may fail while others succeed. Check each generation's status.
- Stay under limits: Keep groups to 50 requests or fewer. For larger batches, use the Batch API.
- Include metadata: Add custom IDs or metadata to your requests for easier tracking:
{
"model": "meta-llama/llama-3.2-1b-instruct/fp-8",
"messages": [...],
"metadata": {
"custom_id": "doc_123",
"type": "summary"
}
}
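To handle individual failures, you can partition the retrieved generations by state and use each request's metadata (such as a `custom_id`) to match failures back to their inputs. A sketch, assuming the `state` values shown in the earlier response example:

```javascript
// Split generations into successes and failures so failed requests
// can be retried or logged individually.
function partitionByState(generations) {
  const succeeded = [];
  const failed = [];
  for (const gen of generations) {
    (gen.state === "Success" ? succeeded : failed).push(gen);
  }
  return { succeeded, failed };
}
```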
Error Handling
The API validates your request structure immediately. Common errors include:
{
"error": {
"message": "Invalid request body.",
"type": "BadRequestError",
"fields": {
"_errors": ["Unrecognized key(s) in object: 'webhook_url'"]
}
}
}
Ensure you use the correct field names:
- ✅ `webhook_id` (correct)
- ❌ `webhook_url` (incorrect)
- ❌ `webhook_idd` (typo)
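You can catch these errors before making the request. A client-side validation sketch; the 50-request limit, the `webhook_id` field name, and the no-mixing rule come from this page, while the exact error wording is illustrative:

```javascript
// Validate a group request body before sending it.
// Returns an array of error strings (empty means the body looks valid).
function validateGroupBody(body) {
  const errors = [];
  if (!Array.isArray(body.requests) || body.requests.length === 0) {
    errors.push("requests must be a non-empty array");
  } else {
    if (body.requests.length > 50) {
      errors.push("a group may contain at most 50 requests");
    }
    // Don't mix chat and text completion requests in one group.
    const hasChat = body.requests.some((r) => "messages" in r);
    const hasText = body.requests.some((r) => "prompt" in r);
    if (hasChat && hasText) {
      errors.push("do not mix chat and text completion requests");
    }
  }
  if ("webhook_url" in body) {
    errors.push("use webhook_id, not webhook_url");
  }
  return errors;
}
```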
When to Use Group API
Choose the Group API when you need:
- Quick implementation without file management
- To process 50 or fewer related requests
- Webhook notifications for a set of requests
- Simple JSON-based integration
For larger workloads (more than 50 requests), use the Batch API instead.