Webhook support is currently available only for /chat/completions calls; support for /completions will come later.

Overview

Webhooks provide an efficient push-based notification system for tracking generation completions in real time. Rather than repeatedly polling the API to check generation status, your application is notified automatically when a generation completes, which simplifies workflows and reduces wasted requests.

Key Benefits

  • Resource Efficiency: Eliminate unnecessary API calls for status checks
  • Real-time Updates: Receive notifications within milliseconds of generation completion
  • Scalability: Handle thousands of concurrent generations efficiently
  • Improved User Experience: Update your UI instantly when results are ready

Getting Started

Step 1: Create a Webhook Endpoint

Your application needs an HTTPS endpoint capable of receiving POST requests. The endpoint should:

  1. Accept JSON payloads
  2. Respond with HTTP 200 status immediately
  3. Process the webhook data asynchronously
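
The three requirements above can be sketched with Node's built-in http module. This is a minimal illustration, not a reference implementation: the /webhook path, the processWebhook function, and the choice to answer 400 for malformed JSON are assumptions, and the server is assumed to sit behind a tunnel or reverse proxy that terminates HTTPS.

```javascript
const http = require('node:http');

// Hypothetical application-specific work; replace with your own logic.
async function processWebhook(payload) {
  console.log('processing', payload.event, payload.generation_id);
}

// Parse the body and pick a status code without doing any heavy work.
function acceptWebhook(rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return 400; // Assumption: reject bodies that are not valid JSON
  }
  // Defer processing so the response is written first.
  setImmediate(() => processWebhook(payload).catch(console.error));
  return 200;
}

function createWebhookServer() {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/webhook') {
      res.statusCode = 404;
      res.end();
      return;
    }
    let raw = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { raw += chunk; });
    req.on('end', () => {
      res.statusCode = acceptWebhook(raw);
      res.end();
    });
  });
}

// To run locally: createWebhookServer().listen(3000);
```

Keeping acceptWebhook separate from the server wiring makes the accept-then-defer behavior easy to exercise without opening a socket.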

Step 2: Deploy Your Endpoint

Your webhook endpoint must be publicly accessible via HTTPS. For development environments, consider using:

  • ngrok: ngrok http 3000
  • Cloudflare Tunnel: Provides a stable URL
  • localtunnel: lt --port 3000

Step 3: Register Your Webhook

  1. Navigate to the inference.net dashboard
  2. Go to API Keys → Webhooks in the sidebar
  3. Click Create Webhook
  4. Enter a descriptive name and your HTTPS endpoint URL
  5. Save your webhook

You’ll receive a webhook identifier (e.g., AhALzdz8S) that you’ll use when creating generations.

Include the webhook identifier in the metadata when creating a generation:

curl -X POST https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "metadata": {
      "webhook_id": "AhALzdz8S"
    }
  }'

(The OpenAI SDK also supports this metadata field.)
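
For applications that issue the request from Node.js rather than curl, the same call might look like the sketch below. It assumes Node.js 18 or later for the global fetch; the function names buildGenerationRequest and createGeneration are illustrative, while the URL, model, and metadata field are taken from the curl example.

```javascript
// Build fetch options for a chat completion that notifies a webhook.
// The model name and metadata field mirror the curl example.
function buildGenerationRequest(apiKey, webhookId, prompt) {
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'meta-llama/llama-3.1-8b-instruct',
      messages: [{ role: 'user', content: prompt }],
      metadata: { webhook_id: webhookId },
    }),
  };
}

// Send the request with the global fetch (Node.js 18+).
async function createGeneration(apiKey, webhookId, prompt) {
  const res = await fetch(
    'https://api.inference.net/v1/chat/completions',
    buildGenerationRequest(apiKey, webhookId, prompt),
  );
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}
```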

When the generation completes, your webhook endpoint will receive a notification.

Webhook Events

generation.completed

Sent when a generation finishes processing (successfully or with failure):

{
  "event": "generation.completed",
  "timestamp": "2025-01-03T06:46:22.838Z",
  "webhook_id": "AhALzdz8S",
  "generation_id": "XBKcs7F1s2oJ_AHiLMbF4",
  "data": {
    "state": "Success",
    "stateMessage": "Generation successful",
    "request": {
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "Explain quantum computing"
        }
      ],
      "model": "meta-llama/llama-3.1-8b-instruct",
      "stream": false,
      "max_tokens": 100,
      "metadata": {
        "webhook_id": "AhALzdz8S"
      }
    },
    "response": {
      "id": "XBKcs7F1s2oJ_AHiLMbF4",
      "object": "chat.completion",
      "created": 1748933182,
      "model": "meta-llama/llama-3.1-8b-instruct",
      "choices": [{
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "Quantum computing is a revolutionary approach..."
        },
        "finish_reason": "stop"
      }],
      "usage": {
        "prompt_tokens": 28,
        "completion_tokens": 42,
        "total_tokens": 70
      }
    },
    "finishedAt": "2025-01-03T06:46:22.307Z"
  }
}
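
A handler will usually need only a few fields from this payload. The sketch below shows one way to extract them; summarizeGeneration is an illustrative name, and because "Success" is the only state value shown in the example, treating every other state as a failure is an assumption.

```javascript
// Reduce a generation.completed payload to the fields a handler typically needs.
function summarizeGeneration(payload) {
  if (payload.event !== 'generation.completed') {
    throw new Error(`Unexpected event type: ${payload.event}`);
  }
  const { state, stateMessage, response } = payload.data;
  // Assumption: any state other than "Success" is treated as a failure.
  const succeeded = state === 'Success';
  // Assumption: a failed generation may lack a response or choices,
  // so every access is guarded with optional chaining.
  const text = succeeded
    ? (response?.choices?.[0]?.message?.content ?? null)
    : null;
  return {
    generationId: payload.generation_id,
    succeeded,
    stateMessage,
    text,
    totalTokens: response?.usage?.total_tokens ?? null,
  };
}
```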

The response object is compatible with the types exported from the official OpenAI SDKs.

Python

You can use the following type to parse the response:

from openai.types.chat.chat_completion import ChatCompletion

# webhook_payload is the parsed JSON body of the webhook request (a dict)
response = ChatCompletion(**webhook_payload["data"]["response"])

TypeScript

import type { OpenAI } from "openai";

// webhookPayload is the parsed JSON body of the webhook request
const response = webhookPayload.data.response as OpenAI.Chat.Completions.ChatCompletion;

Headers

All webhook requests include the following headers:

Header                      Description                             Example
X-Inference-Event           Event type                              generation.completed
X-Inference-Webhook-ID      Webhook identifier                      AhALzdz8S
X-Inference-Generation-ID   Generation ID (for completion events)   XBKcs7F1s2oJ_AHiLMbF4
User-Agent                  inference.net webhook agent             inference.net-Webhook/1.0
Content-Type                Always application/json                 application/json

(Security) Verifying the request source

The X-Inference-Webhook-ID header is a good way to verify that the payload you’re receiving actually comes from our API.

This ID is unique to your webhook, and is completely private to you and your team.

If the ID does not match what you see in the dashboard, your endpoint has most likely been discovered by a malicious actor.
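
One way to perform this check is sketched below. The function name isExpectedWebhook is illustrative, and the constant-time comparison via crypto.timingSafeEqual is an added precaution rather than a documented requirement. Node.js lowercases incoming header names, so the lookup uses the lowercase form.

```javascript
const { timingSafeEqual } = require('node:crypto');

// Return true only when the header carries the webhook ID from your dashboard.
function isExpectedWebhook(headers, expectedWebhookId) {
  const received = headers['x-inference-webhook-id'];
  if (typeof received !== 'string') {
    return false; // Header missing or repeated
  }
  const a = Buffer.from(received);
  const b = Buffer.from(expectedWebhookId);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Requests that fail the check can be rejected before any processing takes place.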

Testing Webhooks

You can test your webhook endpoint from the dashboard:

  1. Navigate to Webhooks in the dashboard
  2. Find your webhook in the list
  3. Click the menu (⋮) and select Test
  4. Check your endpoint logs for the test payload

A successful test will show a green success indicator in the dashboard.
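
During local development it can also help to send a sample delivery to your endpoint yourself. The sketch below imitates the generation.completed event and the documented headers; the shape of the dashboard's own test payload is not documented here, so this is an approximation, and buildSampleDelivery and sendSampleDelivery are illustrative names. It assumes Node.js 18 or later for the global fetch.

```javascript
// Build a request that imitates a generation.completed delivery.
function buildSampleDelivery(webhookId, generationId) {
  const payload = {
    event: 'generation.completed',
    timestamp: new Date().toISOString(),
    webhook_id: webhookId,
    generation_id: generationId,
    data: { state: 'Success', stateMessage: 'Generation successful' },
  };
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Inference-Event': payload.event,
      'X-Inference-Webhook-ID': webhookId,
      'X-Inference-Generation-ID': generationId,
    },
    body: JSON.stringify(payload),
  };
}

// POST the sample to your endpoint and return the HTTP status it answered with.
async function sendSampleDelivery(endpointUrl, webhookId, generationId) {
  const res = await fetch(endpointUrl, buildSampleDelivery(webhookId, generationId));
  return res.status;
}
```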

Best Practices

1. Respond Immediately

Your endpoint must respond within 30 seconds. Always return a 200 status immediately and process the webhook asynchronously:

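const express = require('express');
const app = express();
app.use(express.json()); // Parse JSON bodies into req.body

// Correct approach: acknowledge first, then process in the background
app.post('/webhook', (req, res) => {
  res.status(200).send('OK');
  // processWebhookAsync is your own async function; catch errors so a
  // rejected promise does not crash the process
  processWebhookAsync(req.body).catch(console.error);
});

// Incorrect approach: the response waits on processing and may time out
app.post('/webhook', async (req, res) => {
  await heavyProcessing(req.body); // Risk of timeout
  res.status(200).send('OK');
});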

2. Implement Idempotency

Failed webhooks may be retried. Use the generation_id to ensure you don’t process the same event twice:

const processedGenerations = new Set();

async function processWebhook(payload) {
  if (processedGenerations.has(payload.generation_id)) {
    return; // Already processed
  }

  processedGenerations.add(payload.generation_id);
  // Process the generation
}
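
An in-memory Set grows without bound and is lost on restart. The sketch below is one possible refinement that forgets IDs after a time window; the SeenGenerations class and its TTL are hypothetical, and a deployment with several processes would need a shared store instead.

```javascript
// Remember generation IDs for a limited time so memory use stays bounded.
class SeenGenerations {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.seenAt = new Map(); // generation_id -> time first seen (ms)
  }

  // Returns true the first time an ID is seen within the TTL window.
  markIfNew(generationId, now = Date.now()) {
    this.evictExpired(now);
    if (this.seenAt.has(generationId)) {
      return false;
    }
    this.seenAt.set(generationId, now);
    return true;
  }

  evictExpired(now) {
    // A Map iterates in insertion order, so the oldest entries come first.
    for (const [id, firstSeen] of this.seenAt) {
      if (now - firstSeen < this.ttlMs) break;
      this.seenAt.delete(id);
    }
  }
}
```

A handler would call markIfNew(payload.generation_id) and skip processing when it returns false.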

3. Validate Webhook Source

Always verify that webhooks originate from inference.net by checking that the expected headers are present and that X-Inference-Webhook-ID matches the identifier shown in your dashboard:

function validateWebhookSource(headers, expectedWebhookId) {
  // Node.js lowercases incoming header names
  return (
    Boolean(headers['x-inference-event']) &&
    headers['x-inference-webhook-id'] === expectedWebhookId
  );
}

4. Handle Errors Gracefully

Implement proper error handling to prevent individual failures from affecting your entire system:

async function handleWebhook(payload) {
  try {
    await processWebhook(payload);
  } catch (error) {
    console.error('Webhook processing failed:', error);
    // Log to monitoring service
    // Return 200 to prevent unnecessary retries
  }
}

5. Monitor Webhook Processing

Track key metrics to ensure reliable webhook processing:

  • Webhook receipt rate
  • Processing success/failure rates
  • Average processing time
  • Queue depth (if using queues)
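
As a starting point, the counters behind the first three metrics can be kept in process, as in the sketch below. WebhookMetrics and trackProcessing are illustrative names; a production setup would more likely export these values to a monitoring service, and queue depth is left out because it depends on the queue in use.

```javascript
// Minimal in-process counters for webhook processing.
class WebhookMetrics {
  constructor() {
    this.received = 0;
    this.succeeded = 0;
    this.failed = 0;
    this.totalMs = 0;
  }

  // Record one processed webhook and how long it took.
  record(succeeded, durationMs) {
    this.received += 1;
    if (succeeded) {
      this.succeeded += 1;
    } else {
      this.failed += 1;
    }
    this.totalMs += durationMs;
  }

  snapshot() {
    return {
      received: this.received,
      succeeded: this.succeeded,
      failed: this.failed,
      averageMs: this.received ? this.totalMs / this.received : 0,
    };
  }
}

// Time a processing function and record its outcome.
async function trackProcessing(metrics, processFn, payload) {
  const start = Date.now();
  try {
    await processFn(payload);
    metrics.record(true, Date.now() - start);
  } catch (err) {
    metrics.record(false, Date.now() - start);
    throw err; // Let the caller decide how to handle the failure
  }
}
```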

Troubleshooting

Not Receiving Webhooks

  1. Check webhook status: Ensure your webhook is not disabled in the dashboard
  2. Test connectivity: Use the test feature in the dashboard
  3. Verify URL: Confirm your endpoint is publicly accessible via HTTPS
  4. Check logs: Review both your server logs and any reverse proxy logs
  5. Validate metadata: Ensure you’re including the correct webhook_id in generation requests

Webhooks Arriving Late

  • Verify your endpoint responds quickly (< 1 second ideally)
  • Check that you’re not performing heavy processing before responding
  • Monitor your server load and resource usage

Duplicate Webhook Deliveries

  • Implement idempotency using the generation_id
  • Ensure your endpoint always returns 200 OK for successful receipt
  • Check for any errors in your webhook processing that might trigger retries

Frequently Asked Questions

Q: What happens if my endpoint is down?
A: Failed webhook deliveries are retried up to 3 times with exponential backoff. After all retries are exhausted, the delivery is marked as failed.

Q: What’s the webhook timeout?
A: Webhook endpoints must respond within 30 seconds. Timeouts are treated as failures and will trigger retries.

Q: Can I filter which events I receive?
A: Currently, webhooks receive all event types. Event filtering is planned for a future update.

Q: How secure are webhooks?
A: All webhooks are sent over HTTPS. You should validate the webhook source using the provided headers. HMAC signature verification is planned for additional security.

Q: What’s the maximum payload size?
A: Webhook payloads can be up to 10 MB, though typical payloads range from 5 to 50 KB.

Q: Can I replay missed webhooks?
A: Webhook replay functionality is not currently available. As a fallback, you can poll the generation status endpoint.

Support

For assistance with webhooks: