Learn how to use our Group API to submit multiple inference requests together, perfect for processing related tasks that need to be tracked as a unit. The Group API supports both chat completions and text completions with up to 50 requests per group.

The Group API is available at both the /v1/slow/group/chat/completions and /v1/slow/group/completions endpoints.

Do not mix completion and chat completion requests in the same group.

Overview

The Group API provides a streamlined way to submit multiple asynchronous inference requests as a single unit. Unlike the Batch API, which requires JSONL file uploads, the Group API accepts requests directly in the request body, making it ideal for:

  • Small to medium batches: Process up to 50 requests at once
  • Related tasks: Group related inference requests together
  • Webhook notifications: Get notified when all requests in a group complete
  • Simpler integration: No file uploads or JSONL formatting required
  • Faster implementation: Direct JSON API calls without file management

Group API vs Batch API

Feature             Group API                             Batch API
Maximum requests    50                                    1,000,000
Input format        JSON array in request body            JSONL file upload
File management     Not required                          Required
Use case            Small batches, quick implementation   Large-scale processing
Webhook support     Yes                                   Yes
Completion time     1-72 hours                            1-72 hours

Getting Started

1. Submit a Group Request

Submit multiple requests together by sending them as an array in the request body:

const response = await fetch('https://api.inference.net/v1/slow/group/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    requests: [
      {
        model: "meta-llama/llama-3.2-1b-instruct/fp-8",
        messages: [
          { role: "system", content: "You are a helpful assistant." },
          { role: "user", content: "What is the capital of France?" }
        ],
        max_tokens: 100
      },
      {
        model: "meta-llama/llama-3.2-1b-instruct/fp-8",
        messages: [
          { role: "system", content: "You are a helpful assistant." },
          { role: "user", content: "What is the capital of Germany?" }
        ],
        max_tokens: 100
      }
    ],
    webhook_id: "my-webhook-123" // Optional: attach a webhook for notifications
  })
});

const result = await response.json();
console.log(result); // { groupId: "group_abc123", groupSize: 2 }

The response will include a group ID and the number of requests:

{
  "groupId": "group_xY3kL9mN2pQ",
  "groupSize": 2
}

2. Retrieve Group Results

Once your group is processed, retrieve all generation results using the group ID:

const response = await fetch(`https://api.inference.net/v1/slow/group/${groupId}/generations`, {
  headers: {
    'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`
  }
});

const result = await response.json();
console.log(result.generations); // Array of all completed generations

The response includes all generations in the group:

{
  "generations": [
    {
      "_id": "gen_abc123",
      "state": "Success",
      "request": {
        "model": "meta-llama/llama-3.2-1b-instruct/fp-8",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 100
      },
      "response": {
        "id": "gen_abc123",
        "object": "chat.completion",
        "choices": [
          {
            "message": {
              "role": "assistant",
              "content": "The capital of France is Paris."
            },
            "finish_reason": "stop"
          }
        ],
        "usage": {
          "prompt_tokens": 25,
          "completion_tokens": 8,
          "total_tokens": 33
        }
      }
    },
    {
      "_id": "gen_def456",
      "state": "Success",
      "request": {
        "model": "meta-llama/llama-3.2-1b-instruct/fp-8",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What is the capital of Germany?"}
        ],
        "max_tokens": 100
      },
      "response": {
        "id": "gen_def456",
        "object": "chat.completion",
        "choices": [
          {
            "message": {
              "role": "assistant",
              "content": "The capital of Germany is Berlin."
            },
            "finish_reason": "stop"
          }
        ],
        "usage": {
          "prompt_tokens": 25,
          "completion_tokens": 8,
          "total_tokens": 33
        }
      }
    }
  ]
}
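Because groups are processed asynchronously, some generations may still be in flight when you first call this endpoint. Below is a minimal polling sketch in the style of the examples above. Only the "Success" state appears in the documented response; the "Failed" state name, the polling interval, and the attempt cap are assumptions for illustration.

async function waitForGroup(groupId, maxAttempts = 60) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(`https://api.inference.net/v1/slow/group/${groupId}/generations`, {
      headers: {
        'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`
      }
    });
    const { generations } = await res.json();

    // "Success" is shown in the response above; "Failed" is an assumed
    // terminal state name.
    const terminal = generations.filter(
      (g) => g.state === 'Success' || g.state === 'Failed'
    );
    if (terminal.length === generations.length) return generations;

    // Groups can take hours to complete, so poll sparingly.
    await new Promise((resolve) => setTimeout(resolve, 5 * 60 * 1000));
  }
  throw new Error(`Group ${groupId} did not complete in time`);
}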

Using Webhooks

Attach a webhook to receive notifications when your group completes processing:

const response = await fetch('https://api.inference.net/v1/slow/group/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    requests: [...], // Your requests array
    webhook_id: "my-webhook-123" // Your configured webhook ID
  })
});

When all requests in the group complete, your webhook will receive a notification with:

  • Group ID
  • Completion status
  • Summary of successful and failed requests
  • Custom IDs for each request (if provided)
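As a sketch of consuming that notification with Express, the handler below logs the summary and acknowledges receipt. The payload field names (groupId, status, succeeded, failed) are assumptions for illustration, not a documented schema:

import express from 'express';

const app = express();
app.use(express.json());

app.post('/webhooks/inference-group', (req, res) => {
  // Hypothetical payload shape; the real field names are defined in the
  // Webhook Documentation.
  const { groupId, status, succeeded, failed } = req.body;
  console.log(`Group ${groupId} finished: ${status} (${succeeded} ok, ${failed} failed)`);
  res.sendStatus(200); // Acknowledge quickly; do heavy processing elsewhere
});

app.listen(3000);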

See our Webhook Documentation for setup instructions.

Text Completions Support

The Group API also supports text completions:

const response = await fetch('https://api.inference.net/v1/slow/group/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    requests: [
      {
        model: "meta-llama/llama-3.2-1b-instruct/fp-8",
        prompt: "The capital of France is",
        max_tokens: 10
      },
      {
        model: "meta-llama/llama-3.2-1b-instruct/fp-8",
        prompt: "The capital of Germany is",
        max_tokens: 10
      }
    ]
  })
});

Limits and Constraints

  • Maximum requests per group: 50
  • Request format: Direct JSON (no JSONL files required)
  • Supported endpoints:
    • /v1/slow/group/chat/completions
    • /v1/slow/group/completions
  • Completion time: 1-72 hours
  • Request expiration: Groups expire after 72 hours if not completed
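If a workload exceeds the 50-request cap and the Batch API is not a fit, one option is to split it into multiple groups client-side. A minimal sketch, assuming each chunk is submitted independently and tracked by its own group ID:

const GROUP_LIMIT = 50; // Per-group maximum

async function submitInGroups(requests) {
  const groupIds = [];
  for (let i = 0; i < requests.length; i += GROUP_LIMIT) {
    const response = await fetch('https://api.inference.net/v1/slow/group/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ requests: requests.slice(i, i + GROUP_LIMIT) })
    });
    const { groupId } = await response.json();
    groupIds.push(groupId); // Track each group separately
  }
  return groupIds;
}

For genuinely large workloads, the Batch API remains the better fit.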

Best Practices

  1. Group related requests: Use groups for requests that logically belong together (e.g., analyzing multiple documents from the same source).

  2. Use webhooks for notifications: Instead of polling, configure webhooks to be notified when your group completes.

  3. Handle individual failures: Some requests in a group may fail while others succeed. Check each generation’s status.

  4. Stay under limits: Keep groups to 50 requests or fewer. For larger batches, use the Batch API.

  5. Include metadata: Add custom IDs or metadata to your requests for easier tracking:

    {
      "model": "meta-llama/llama-3.2-1b-instruct/fp-8",
      "messages": [...],
      "metadata": {
        "custom_id": "doc_123",
        "type": "summary"
      }
    }
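Assuming the metadata you attach is echoed back on each generation's request object (the example responses above echo the full submitted request), you can then map results back to your own records:

// Sketch: index generations by the custom_id attached at submission.
// Assumes metadata round-trips inside each generation's `request` object.
const byCustomId = new Map();
for (const gen of result.generations) {
  const customId = gen.request?.metadata?.custom_id;
  if (customId) byCustomId.set(customId, gen);
}

const docSummary = byCustomId.get('doc_123');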

Error Handling

The API validates your request structure immediately. Common errors include:

{
  "error": {
    "message": "Invalid request body.",
    "type": "BadRequestError",
    "fields": {
      "_errors": ["Unrecognized key(s) in object: 'webhook_url'"]
    }
  }
}

Ensure you use the correct field names:

  • webhook_id (correct)
  • webhook_url (incorrect)
  • webhook_idd (typo)
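Since validation errors are returned immediately at submission time rather than during processing, a minimal sketch for surfacing them looks like this:

const response = await fetch('https://api.inference.net/v1/slow/group/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ requests, webhook_id: 'my-webhook-123' }) // `requests` is your requests array
});

const body = await response.json();
if (!response.ok) {
  // e.g. "BadRequestError: Invalid request body." plus field-level details
  throw new Error(`${body.error.type}: ${body.error.message}`);
}
console.log(body.groupId);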

When to Use Group API

Choose the Group API when you need:

  • Quick implementation without file management
  • To process 50 or fewer related requests
  • Webhook notifications for a set of requests
  • Simple JSON-based integration

For larger workloads (more than 50 requests), consider using the Batch API instead.