Getting Started With Webhooks

Webhook support is currently available for /chat/completions and /embeddings calls. Support for /completions will come later.

Overview

Webhooks are a push-based notification system for tracking generation completions in real time. Instead of repeatedly polling the API to check generation status, your application is notified automatically when a generation completes, which simplifies workflows and avoids wasted requests.

Key Benefits

Resource Efficiency: Eliminate unnecessary API calls for status checks
Real-time Updates: Receive notifications within milliseconds of generation completion
Scalability: Handle thousands of concurrent generations efficiently
Improved User Experience: Update your UI instantly when results are ready

Getting Started

Step 1: Create a Webhook Endpoint

Your application needs an HTTPS endpoint capable of receiving POST requests. The endpoint should:

Accept JSON payloads
Respond with HTTP 200 status immediately
Process the webhook data asynchronously
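The three requirements above can be sketched with Node's built-in http module. This is a minimal sketch under stated assumptions, not an official implementation: the /webhook path, the names handleDelivery, startServer, and processLater, and the MAX_BODY_BYTES cap are all hypothetical (the cap mirrors the 10MB figure given in the FAQ on this page).

```javascript
const http = require("node:http");

// Hypothetical cap; the FAQ on this page mentions payloads of up to 10MB.
const MAX_BODY_BYTES = 10 * 1024 * 1024;

// Decide the HTTP status for one delivery and hand valid payloads to
// `processLater` without waiting for it to finish.
function handleDelivery(rawBody, processLater) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return 400; // Body was not valid JSON
  }
  // Defer the real work so the 200 goes out before processing starts.
  setImmediate(() => {
    Promise.resolve()
      .then(() => processLater(payload))
      .catch((err) => console.error("Webhook processing failed:", err));
  });
  return 200;
}

// Wire the handler into a plain HTTP server. In production this would sit
// behind HTTPS termination (a reverse proxy or a tunnel).
function startServer(port, processLater) {
  const server = http.createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/webhook") {
      res.writeHead(404).end();
      return;
    }
    const chunks = [];
    let size = 0;
    req.on("data", (chunk) => {
      size += chunk.length;
      if (size <= MAX_BODY_BYTES) chunks.push(chunk); // Stop buffering oversized bodies
    });
    req.on("end", () => {
      if (size > MAX_BODY_BYTES) {
        res.writeHead(413).end();
        return;
      }
      const body = Buffer.concat(chunks).toString("utf8");
      res.writeHead(handleDelivery(body, processLater)).end();
    });
  });
  return server.listen(port);
}
```

Calling startServer(3000, myHandler) would start listening on port 3000. Keeping the status decision in handleDelivery, separate from the server wiring, makes it easy to unit test without opening a socket.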

Step 2: Deploy Your Endpoint

Your webhook endpoint must be publicly accessible via HTTPS. For development environments, consider using:

ngrok: ngrok http 3000
Cloudflare Tunnel: Provides a stable URL
localtunnel: lt --port 3000

Step 3: Register Your Webhook

1. Navigate to the inference.net dashboard
2. Go to API Keys → Webhooks in the sidebar
3. Click Create Webhook
4. Enter a descriptive name and your HTTPS endpoint URL
5. Save your webhook

You’ll receive a webhook identifier (e.g., AhALzdz8S) that you’ll use when creating generations.

Step 4: Link Webhook to Generations

Include the webhook identifier in the metadata when creating a generation:

curl -X POST https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "metadata": {
      "webhook_id": "AhALzdz8S"
    }
  }'

(The OpenAI SDK also supports this metadata field.)

When the generation completes, your webhook endpoint will receive a notification.
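The same request can be made from JavaScript. The sketch below assumes Node 18 or later (for the global fetch) and reuses the URL, model, and metadata field from the curl example; buildChatRequest and createGeneration are hypothetical helper names. The shape of the HTTP response to this call is not described on this page, so the function simply returns the parsed JSON.

```javascript
const API_URL = "https://api.inference.net/v1/chat/completions";

// Build the JSON body, attaching the webhook identifier under `metadata`.
function buildChatRequest(webhookId, prompt) {
  return {
    model: "meta-llama/llama-3.1-8b-instruct",
    messages: [{ role: "user", content: prompt }],
    metadata: { webhook_id: webhookId },
  };
}

// Send the request using the global fetch available in Node 18+.
async function createGeneration(apiKey, webhookId, prompt) {
  const res = await fetch(API_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildChatRequest(webhookId, prompt)),
  });
  if (!res.ok) {
    throw new Error(`Request failed with HTTP ${res.status}`);
  }
  return res.json();
}
```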

Webhook Events

generation.completed

Sent when a generation finishes processing (successfully or with failure):

{
  "event": "generation.completed",
  "timestamp": "2025-01-03T06:46:22.838Z",
  "webhook_id": "AhALzdz8S",
  "generation_id": "XBKcs7F1s2oJ_AHiLMbF4",
  "data": {
    "state": "Success",
    "stateMessage": "Generation successful",
    "request": {
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "Explain quantum computing"
        }
      ],
      "model": "meta-llama/llama-3.1-8b-instruct",
      "stream": false,
      "max_tokens": 100,
      "metadata": {
        "webhook_id": "AhALzdz8S"
      }
    },
    "response": {
      "id": "XBKcs7F1s2oJ_AHiLMbF4",
      "object": "chat.completion",
      "created": 1748933182,
      "model": "meta-llama/llama-3.1-8b-instruct",
      "choices": [{
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "Quantum computing is a revolutionary approach..."
        },
        "finish_reason": "stop"
      }],
      "usage": {
        "prompt_tokens": 28,
        "completion_tokens": 42,
        "total_tokens": 70
      }
    },
    "finishedAt": "2025-01-03T06:46:22.307Z"
  }
}

The response object is compatible with the types exported from the official OpenAI SDKs.
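As an illustration of reading this payload, the sketch below pulls out the fields a handler typically needs. summarizeGeneration is a hypothetical helper. Only the "Success" state appears in the sample above, so treating every other state value as a failure is an assumption.

```javascript
// Pull the fields most handlers need out of a generation.completed payload.
// Assumption: any `state` other than "Success" means the generation failed.
function summarizeGeneration(payload) {
  const { state, stateMessage, response } = payload.data;
  if (state !== "Success") {
    return { ok: false, generationId: payload.generation_id, reason: stateMessage };
  }
  const choice = response.choices[0];
  return {
    ok: true,
    generationId: payload.generation_id,
    text: choice.message.content,
    finishReason: choice.finish_reason,
    totalTokens: response.usage.total_tokens,
  };
}
```

Checking state before touching response avoids errors on failed generations, where the response object may be absent.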

Python

You can use the following type to parse the response:

from openai.types.chat.chat_completion import ChatCompletion

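# webhook_payload is the parsed JSON body (a dict); the completion sits under data.response.
# model_validate requires Pydantic v2.
response = ChatCompletion.model_validate(webhook_payload["data"]["response"])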

TypeScript

import type { OpenAI } from "openai";

// webhookPayload is the parsed JSON body; the completion sits under data.response
const response = webhookPayload.data.response as OpenAI.Chat.Completions.ChatCompletion;

async-embedding.completed

Sent when an async embedding request finishes processing:

{
  "event": "async-embedding.completed",
  "timestamp": "2025-01-15T10:30:00Z",
  "webhook_id": "AhALzdz8S",
  "generation_id": "EMB_abc123",
  "data": {
    "state": "Success",
    "stateMessage": "Embeddings generated successfully",
    "request": {
      "model": "qwen/qwen3-embedding-4b",
      "input": ["text1", "text2", ...],
      "metadata": { "webhook_id": "AhALzdz8S" }
    },
    "response": {
      "object": "list",
      "data": [
        {
          "object": "embedding",
          "index": 0,
          "embedding": [0.0023064255, -0.009327292, ...]
        }
      ],
      "model": "qwen/qwen3-embedding-4b",
      "usage": {
        "prompt_tokens": 100,
        "total_tokens": 100
      }
    },
    "finishedAt": "2025-01-15T10:30:00Z"
  }
}

The response object follows the standard OpenAI embeddings format.
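A short sketch of consuming this payload follows. extractEmbeddings is a hypothetical helper, and returning an empty list for any state other than "Success" is an assumption, since only the success case is shown above.

```javascript
// Return embedding vectors ordered by their `index` field, so that
// vectors[i] corresponds to the i-th input string of the request.
function extractEmbeddings(payload) {
  if (payload.event !== "async-embedding.completed") {
    throw new Error(`Unexpected event: ${payload.event}`);
  }
  if (payload.data.state !== "Success") {
    return []; // Assumption: no usable vectors unless state is "Success"
  }
  return [...payload.data.response.data]
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```

Sorting by index rather than trusting array order keeps each vector aligned with its input even if the items arrive out of order.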

Headers

All webhook requests include the following headers:

| Header | Description | Example |
| --- | --- | --- |
| `X-Inference-Event` | Event type | `generation.completed` |
| `X-Inference-Webhook-ID` | Webhook identifier | `AhALzdz8S` |
| `X-Inference-Generation-ID` | Generation ID (for completion events) | `XBKcs7F1s2oJ_AHiLMbF4` |
| `User-Agent` | inference.net webhook agent | `inference.net-Webhook/1.0` |
| `Content-Type` | Always `application/json` | `application/json` |
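Because webhooks currently receive all event types, one way to use these headers is to dispatch on X-Inference-Event. The sketch below is illustrative only: routeEvent and the shape of the handlers object are hypothetical. Node's http module lowercases incoming header names, which is why the lookups use lowercase keys.

```javascript
// Dispatch a delivery to the handler registered for its event type.
// `headers` is expected in the form Node provides (lowercased names).
function routeEvent(headers, payload, handlers) {
  const eventType = headers["x-inference-event"];
  // Object.hasOwn avoids matching inherited keys such as "constructor".
  if (!Object.hasOwn(handlers, eventType)) {
    return { handled: false, eventType }; // Unknown events are ignored
  }
  const meta = {
    webhookId: headers["x-inference-webhook-id"],
    generationId: headers["x-inference-generation-id"],
  };
  return { handled: true, eventType, result: handlers[eventType](payload, meta) };
}
```

A handlers object might look like `{ "generation.completed": onGeneration, "async-embedding.completed": onEmbedding }`, where both function names are placeholders for your own code.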

(Security) Verifying the request source

Compare the X-Inference-Webhook-ID header against the identifier shown in your dashboard to check that a payload you’re receiving comes from our API.

This ID is unique to your webhook and is private to you and your team.

If the ID does not match what you see in the dashboard, reject the request: your endpoint has most likely been discovered by a malicious actor.
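Since the ID acts as a shared secret, it is reasonable to compare it in constant time. The following is a sketch using Node's built-in crypto module; isTrustedWebhookId is a hypothetical helper, and expectedId is assumed to be a string you load from your own configuration.

```javascript
const crypto = require("node:crypto");

// Compare the received header with the expected ID in constant time.
// Hashing first gives both buffers the same length, which
// crypto.timingSafeEqual requires.
function isTrustedWebhookId(receivedId, expectedId) {
  if (typeof receivedId !== "string" || receivedId.length === 0) {
    return false; // Header missing or empty
  }
  const a = crypto.createHash("sha256").update(receivedId).digest();
  const b = crypto.createHash("sha256").update(expectedId).digest();
  return crypto.timingSafeEqual(a, b);
}
```

In a request handler this could be called as `isTrustedWebhookId(req.headers["x-inference-webhook-id"], expectedId)`, responding with an error status when it returns false.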

Testing Webhooks

You can test your webhook endpoint from the dashboard:

1. Navigate to Webhooks in the dashboard
2. Find your webhook in the list
3. Click the menu (⋮) and select Test
4. Check your endpoint logs for the test payload

A successful test will show a green success indicator in the dashboard.
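Before exposing an endpoint publicly, it can also help to simulate a delivery against a local server. The sketch below is not the dashboard's test payload, whose exact contents are not described here; it only mimics the headers and top-level fields documented on this page. buildTestDelivery and sendTestDelivery are hypothetical names, and Node 18 or later is assumed for the global fetch.

```javascript
// Build a request that mimics a generation.completed delivery, using the
// headers and payload fields documented on this page.
function buildTestDelivery(webhookId, generationId) {
  const payload = {
    event: "generation.completed",
    timestamp: new Date().toISOString(),
    webhook_id: webhookId,
    generation_id: generationId,
    data: { state: "Success", stateMessage: "Generation successful" },
  };
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Inference-Event": payload.event,
      "X-Inference-Webhook-ID": webhookId,
      "X-Inference-Generation-ID": generationId,
    },
    body: JSON.stringify(payload),
  };
}

// POST the simulated delivery to a local endpoint and return the status code.
async function sendTestDelivery(url, webhookId, generationId) {
  const res = await fetch(url, buildTestDelivery(webhookId, generationId));
  return res.status;
}
```

For example, `sendTestDelivery("http://localhost:3000/webhook", "AhALzdz8S", "local-test-1")` should resolve to 200 if the endpoint acknowledges the delivery; the URL and generation ID here are placeholders.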

Best Practices

1. Respond Immediately

Your endpoint must respond within 30 seconds. Always return a 200 status immediately and process the webhook asynchronously:

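// Correct approach: acknowledge first, then process in the background
app.post('/webhook', (req, res) => {
  res.status(200).send('OK');
  // Catch rejections so a processing failure cannot crash the process
  processWebhookAsync(req.body).catch(console.error);
});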

// Incorrect approach - may timeout
app.post('/webhook', async (req, res) => {
  await heavyProcessing(req.body); // Risk of timeout
  res.status(200).send('OK');
});

2. Implement Idempotency

Failed webhooks may be retried. Use the generation_id to ensure you don’t process the same event twice:

const processedGenerations = new Set();

async function processWebhook(payload) {
  if (processedGenerations.has(payload.generation_id)) {
    return; // Already processed
  }

  processedGenerations.add(payload.generation_id);
  // Process the generation
}
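The Set above grows without bound for as long as the process runs. One possible refinement, sketched below, is to forget IDs after a time window. createDeduper is a hypothetical helper, and the 24-hour default is an arbitrary assumption because the retry window is not specified on this page. State is still in memory, so deployments with several instances would need a shared store instead.

```javascript
// Remember generation IDs for `ttlMs` milliseconds, then forget them so the
// map cannot grow forever. `now` is injectable to make the logic testable.
function createDeduper(ttlMs = 24 * 60 * 60 * 1000, now = Date.now) {
  const seenAt = new Map(); // key -> timestamp of first sighting

  return function isDuplicate(payload) {
    const current = now();
    // Drop expired entries; a Map iterates in insertion order, oldest first.
    for (const [key, ts] of seenAt) {
      if (current - ts < ttlMs) break;
      seenAt.delete(key);
    }
    // Key on event type plus generation ID, in case one generation ever
    // produces more than one event type.
    const key = `${payload.event}:${payload.generation_id}`;
    if (seenAt.has(key)) return true;
    seenAt.set(key, current);
    return false;
  };
}
```

A handler would call `isDuplicate(payload)` first and return early when it is true.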

3. Validate Webhook Source

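Always verify that webhooks originate from inference.net by checking that the expected headers are present and that X-Inference-Webhook-ID matches the identifier shown in your dashboard:

function validateWebhookSource(headers, expectedWebhookId) {
  const requiredHeaders = [
    'x-inference-webhook-id',
    'x-inference-event'
  ];

  // Header presence alone proves little; the ID must also match your webhook
  return requiredHeaders.every(header => headers[header]) &&
    headers['x-inference-webhook-id'] === expectedWebhookId;
}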

4. Handle Errors Gracefully

Implement proper error handling to prevent individual failures from affecting your entire system:

async function handleWebhook(payload) {
  try {
    await processWebhook(payload);
  } catch (error) {
    console.error('Webhook processing failed:', error);
    // Log to monitoring service
    // Return 200 to prevent unnecessary retries
  }
}

5. Monitor Webhook Processing

Track key metrics to ensure reliable webhook processing:

Webhook receipt rate
Processing success/failure rates
Average processing time
Queue depth (if using queues)
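The metrics above can be tracked with a few counters. The sketch below is a minimal in-memory version with hypothetical names (createWebhookMetrics, onReceived, onProcessed, snapshot); a real deployment would more likely export these values to a monitoring system.

```javascript
// Track receipt count, success/failure counts, average processing time,
// and the number of deliveries received but not yet processed.
function createWebhookMetrics() {
  let received = 0;
  let succeeded = 0;
  let failed = 0;
  let totalMs = 0;
  let queueDepth = 0;

  return {
    // Call when a delivery arrives and is queued for processing.
    onReceived() {
      received += 1;
      queueDepth += 1;
    },
    // Call when processing finishes, with the outcome and elapsed time.
    onProcessed(ok, durationMs) {
      queueDepth -= 1;
      if (ok) succeeded += 1;
      else failed += 1;
      totalMs += durationMs;
    },
    snapshot() {
      const done = succeeded + failed;
      return {
        received,
        succeeded,
        failed,
        queueDepth,
        successRate: done === 0 ? null : succeeded / done,
        avgProcessingMs: done === 0 ? null : totalMs / done,
      };
    },
  };
}
```

Sampling snapshot() at a fixed interval and comparing the received counts gives the receipt rate.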

Troubleshooting

Not Receiving Webhooks

Check webhook status: Ensure your webhook is not disabled in the dashboard
Test connectivity: Use the test feature in the dashboard
Verify URL: Confirm your endpoint is publicly accessible via HTTPS
Check logs: Review both your server logs and any reverse proxy logs
Validate metadata: Ensure you’re including the correct webhook_id in generation requests

Webhooks Arriving Late

Verify your endpoint responds quickly (< 1 second ideally)
Check that you’re not performing heavy processing before responding
Monitor your server load and resource usage

Duplicate Webhook Deliveries

Implement idempotency using the generation_id
Ensure your endpoint always returns 200 OK for successful receipt
Check for any errors in your webhook processing that might trigger retries

Frequently Asked Questions

Q: What happens if my endpoint is down?
A: Failed webhook deliveries are retried up to 3 times with exponential backoff. After all retries are exhausted, the delivery is marked as failed.

Q: What’s the webhook timeout?
A: Webhook endpoints must respond within 30 seconds. Timeouts are treated as failures and will trigger retries.

Q: Can I filter which events I receive?
A: Currently, webhooks receive all event types. Event filtering is planned for a future update.

Q: How secure are webhooks?
A: All webhooks are sent over HTTPS. You should validate the webhook source using the provided headers. HMAC signature verification is planned for additional security.

Q: What’s the maximum payload size?
A: Webhook payloads can be up to 10MB, though typical payloads range from 5-50KB.

Q: Can I replay missed webhooks?
A: Webhook replay functionality is not currently available. As a fallback, you can poll the generation status endpoint.

Support

For assistance with webhooks:

📧 Email: [email protected]
💬 Discord: Join our developer community
📚 Documentation: https://docs.inference.net
🐛 Issues: Report bugs via our support portal
