# Getting Started With Webhooks

> Everything you need to know to get started with webhooks.

<Warning>
  Webhook support is currently available for `/chat/completions` and `/embeddings` calls. Support for `/completions` will come later.
</Warning>

## Overview

Webhooks are a push-based notification system for tracking generation completions in real time. Instead of repeatedly polling the API to check generation status, your application is notified automatically when a generation completes, which streamlines workflows and avoids wasted requests.

## Key Benefits

* **Resource Efficiency**: Eliminate unnecessary API calls for status checks
* **Real-time Updates**: Receive notifications within milliseconds of generation completion
* **Scalability**: Handle thousands of concurrent generations efficiently
* **Improved User Experience**: Update your UI instantly when results are ready

## Getting Started

### Step 1: Create a Webhook Endpoint

Your application needs an HTTPS endpoint capable of receiving POST requests. The endpoint should:

1. Accept JSON payloads
2. Respond with HTTP 200 status immediately
3. Process the webhook data asynchronously

<CodeGroup>
  ```typescript TypeScript theme={"system"}
  import express from "express";

  const app = express();

  app.post("/webhooks/inference", express.json(), (req, res) => {
    const { event, webhook_id, generation_id, data } = req.body;

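    // Verify the webhook source: the header must match the ID from your dashboard.
    // INFERENCE_WEBHOOK_ID is an assumed environment variable name holding that ID.
    const webhookId = req.headers["x-inference-webhook-id"];
    if (!webhookId || webhookId !== process.env.INFERENCE_WEBHOOK_ID) {
      res.status(401).json({ error: "Unknown webhook ID" });
      return;
    }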

    if (event === "generation.completed") {
      console.log(`Generation ${generation_id} completed with status: ${data.state}`);

      // Process asynchronously
      setImmediate(() => {
        processGenerationResult(data);
      });
    } else if (event === "async-embedding.completed") {
      console.log(`Embedding ${generation_id} completed with status: ${data.state}`);

      setImmediate(() => {
        processEmbeddingResult(data);
      });
    }

    // Always respond immediately
    res.status(200).json({ received: true });
  });

  app.listen(3000, () => {
    console.log("Webhook receiver listening on port 3000");
  });
  ```

  ```python Python theme={"system"}
  import os
  from typing import Any, Dict, Optional

  from fastapi import BackgroundTasks, FastAPI, HTTPException, Request
  from pydantic import BaseModel

  app = FastAPI()

  # Assumed environment variable holding the webhook ID shown in your dashboard
  EXPECTED_WEBHOOK_ID = os.environ["INFERENCE_WEBHOOK_ID"]

  class WebhookPayload(BaseModel):
      event: str
      timestamp: str
      webhook_id: str
      generation_id: Optional[str] = None
      data: Dict[str, Any]

  def process_generation(payload: WebhookPayload):
      """Process the result in the background, after the response is sent"""
      if payload.event == "generation.completed":
          print(f"Processing generation {payload.generation_id}")
          # Your processing logic here
      elif payload.event == "async-embedding.completed":
          print(f"Processing embedding {payload.generation_id}")
          # Your embedding processing logic here

  @app.post("/webhooks/inference")
  async def handle_webhook(
      payload: WebhookPayload,
      request: Request,
      background_tasks: BackgroundTasks,
  ):
      # Verify the webhook source: the header must match the ID from your dashboard
      webhook_id = request.headers.get("x-inference-webhook-id")
      if webhook_id != EXPECTED_WEBHOOK_ID:
          raise HTTPException(status_code=401, detail="Unknown webhook ID")

      # Queue for background processing
      background_tasks.add_task(process_generation, payload)

      return {"received": True}
  ```
</CodeGroup>

<details>
  <summary><b>Go Example</b></summary>

  ```go theme={"system"}
  package main

  import (
      "encoding/json"
      "fmt"
      "io"
      "net/http"
      "os"
  )

  type WebhookPayload struct {
      Event        string                 `json:"event"`
      Timestamp    string                 `json:"timestamp"`
      WebhookID    string                 `json:"webhook_id"`
      GenerationID string                 `json:"generation_id,omitempty"`
      Data         map[string]interface{} `json:"data"`
  }

  func handleWebhook(w http.ResponseWriter, r *http.Request) {
      body, err := io.ReadAll(r.Body)
      if err != nil {
          http.Error(w, "Failed to read body", http.StatusBadRequest)
          return
      }

      var payload WebhookPayload
      if err := json.Unmarshal(body, &payload); err != nil {
          http.Error(w, "Invalid JSON", http.StatusBadRequest)
          return
      }

      // Verify the webhook source: the header must match the ID from your dashboard.
      // INFERENCE_WEBHOOK_ID is an assumed environment variable name holding that ID.
      expectedID := os.Getenv("INFERENCE_WEBHOOK_ID")
      if expectedID == "" || r.Header.Get("X-Inference-Webhook-ID") != expectedID {
          http.Error(w, "Unknown webhook ID", http.StatusUnauthorized)
          return
      }

      // Process asynchronously
      go func() {
          if payload.Event == "generation.completed" {
              fmt.Printf("Processing generation %s\n", payload.GenerationID)
              // Your processing logic here
          } else if payload.Event == "async-embedding.completed" {
              fmt.Printf("Processing embedding %s\n", payload.GenerationID)
              // Your embedding processing logic here
          }
      }()

      // Respond immediately
      w.WriteHeader(http.StatusOK)
      json.NewEncoder(w).Encode(map[string]bool{"received": true})
  }

  func main() {
      http.HandleFunc("/webhooks/inference", handleWebhook)
      fmt.Println("Webhook server listening on :3000")
      http.ListenAndServe(":3000", nil)
  }
  ```
</details>

### Step 2: Deploy Your Endpoint

Your webhook endpoint must be publicly accessible via HTTPS. For development environments, consider using:

* **ngrok**: `ngrok http 3000`
* **Cloudflare Tunnel**: Provides a stable URL
* **localtunnel**: `lt --port 3000`

### Step 3: Register Your Webhook

1. Navigate to the inference.net dashboard
2. Go to **API Keys** → **Webhooks** in the sidebar
3. Click **Create Webhook**
4. Enter a descriptive name and your HTTPS endpoint URL
5. Save your webhook

You'll receive a webhook identifier (e.g., `AhALzdz8S`) that you'll use when creating generations.

### Step 4: Link Webhook to Generations

Include the webhook identifier in the metadata when creating a generation:

<CodeGroup>
  ```typescript TypeScript theme={"system"}
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://api.inference.net/v1/slow",
    apiKey: process.env.INFERENCE_API_KEY,
  });

  const response = await client.chat.completions.create({
    model: "google/gemma-3-27b-instruct/bf-16",
    messages: [{ role: "user", content: "Explain quantum computing" }],
    // @ts-expect-error metadata is not in the OpenAI SDK types
    metadata: {
      webhook_id: "AhALzdz8S",
    },
  });
  ```

  ```python Python theme={"system"}
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.inference.net/v1/slow",
      api_key=os.environ["INFERENCE_API_KEY"],
  )

  response = client.chat.completions.create(
      model="google/gemma-3-27b-instruct/bf-16",
      messages=[{"role": "user", "content": "Explain quantum computing"}],
      extra_body={
          "metadata": {"webhook_id": "AhALzdz8S"},
      },
  )
  ```

  ```bash cURL theme={"system"}
  curl https://api.inference.net/v1/slow/chat/completions \
    -H "Authorization: Bearer $INFERENCE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "google/gemma-3-27b-instruct/bf-16",
      "messages": [
        {"role": "user", "content": "Explain quantum computing"}
      ],
      "metadata": {
        "webhook_id": "AhALzdz8S"
      }
    }'
  ```
</CodeGroup>

When the generation completes, your webhook endpoint will receive a notification.
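
The examples above cover `/chat/completions`. For `/embeddings` calls, the `request` object shown in the `async-embedding.completed` event later in this guide suggests that the same `metadata.webhook_id` field applies. The following is a minimal sketch of such a request body under that assumption. `build_embedding_body` is a hypothetical helper, and the endpoint URL is left out because this guide does not state it.

```python
import json
from typing import Any


def build_embedding_body(texts: list[str], webhook_id: str) -> dict[str, Any]:
    """Build an embeddings request body that links the call to a webhook."""
    return {
        # Model name taken from the async-embedding.completed example payload
        "model": "qwen/qwen3-embedding-4b",
        "input": texts,
        # Same metadata field used for chat completions
        "metadata": {"webhook_id": webhook_id},
    }


print(json.dumps(build_embedding_body(["text1", "text2"], "AhALzdz8S"), indent=2))
```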

## Webhook Events

### generation.completed

Sent when a generation finishes processing, whether it succeeded or failed:

```json JSON theme={"system"}
{
  "event": "generation.completed",
  "timestamp": "2025-01-03T06:46:22.838Z",
  "webhook_id": "AhALzdz8S",
  "generation_id": "XBKcs7F1s2oJ_AHiLMbF4",
  "data": {
    "state": "Success",
    "stateMessage": "Generation successful",
    "request": {
      "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing"}
      ],
      "model": "google/gemma-3-27b-instruct/bf-16",
      "stream": false,
      "max_tokens": 100,
      "metadata": {
        "webhook_id": "AhALzdz8S"
      }
    },
    "response": {
      "id": "XBKcs7F1s2oJ_AHiLMbF4",
      "object": "chat.completion",
      "created": 1748933182,
      "model": "google/gemma-3-27b-instruct/bf-16",
      "choices": [{
        "index": 0,
        "message": {
          "role": "assistant",
          "content": "Quantum computing is a revolutionary approach..."
        },
        "finish_reason": "stop"
      }],
      "usage": {
        "prompt_tokens": 28,
        "completion_tokens": 42,
        "total_tokens": 70
      }
    },
    "finishedAt": "2025-01-03T06:46:22.307Z"
  }
}
```

The `response` object is compatible with the types exported from the official OpenAI SDKs.

<CodeGroup>
  ```typescript TypeScript theme={"system"}
  import type { OpenAI } from "openai";

  const response = responseJsonObject as OpenAI.Chat.Completions.ChatCompletion;
  ```

  ```python Python theme={"system"}
  from openai.types.chat.chat_completion import ChatCompletion

  # Validate the raw dict into the SDK's Pydantic model (requires Pydantic v2)
  response = ChatCompletion.model_validate(webhook_payload.data["response"])
  ```
</CodeGroup>

### async-embedding.completed

Sent when an async embedding request finishes processing:

```json JSON theme={"system"}
{
  "event": "async-embedding.completed",
  "timestamp": "2025-01-15T10:30:00Z",
  "webhook_id": "AhALzdz8S",
  "generation_id": "EMB_abc123",
  "data": {
    "state": "Success",
    "stateMessage": "Embeddings generated successfully",
    "request": {
      "model": "qwen/qwen3-embedding-4b",
      "input": ["text1", "text2"],
      "metadata": { "webhook_id": "AhALzdz8S" }
    },
    "response": {
      "object": "list",
      "data": [
        {
          "object": "embedding",
          "index": 0,
          "embedding": [0.0023064255, -0.009327292]
        }
      ],
      "model": "qwen/qwen3-embedding-4b",
      "usage": {
        "prompt_tokens": 100,
        "total_tokens": 100
      }
    },
    "finishedAt": "2025-01-15T10:30:00Z"
  }
}
```

The `response` object follows the standard OpenAI embeddings format.
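
Both event types share the same envelope, so a single function can branch on `event` and read the result from `data.response`. The sketch below is one possible way to do that, not a prescribed handler. `summarize_event` is a hypothetical name, and because `"Success"` is the only `state` value shown in this guide, the code assumes that any other value indicates a failure.

```python
from typing import Any


def summarize_event(payload: dict[str, Any]) -> str:
    """Return a one-line summary of a webhook payload."""
    event = payload.get("event")
    data = payload.get("data", {})
    generation_id = payload.get("generation_id")

    # Assumption: any state other than "Success" means the request failed.
    if data.get("state") != "Success":
        return f"{generation_id}: failed ({data.get('stateMessage', 'no message')})"

    response = data.get("response", {})
    if event == "generation.completed":
        # OpenAI-style chat completion: text lives in choices[0].message.content
        text = response["choices"][0]["message"]["content"]
        return f"{generation_id}: chat completion, {len(text)} characters"
    if event == "async-embedding.completed":
        # OpenAI-style embeddings list: one vector per item in response.data
        vectors = [item["embedding"] for item in response["data"]]
        return f"{generation_id}: {len(vectors)} embedding(s)"
    return f"{generation_id}: unhandled event {event!r}"
```

Returning a summary for unknown events, rather than raising, keeps the receiver working if new event types are added later.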

## Headers

All webhook requests include the following headers:

| Header                      | Description                           | Example                     |
| --------------------------- | ------------------------------------- | --------------------------- |
| `X-Inference-Event`         | Event type                            | `generation.completed`      |
| `X-Inference-Webhook-ID`    | Webhook identifier                    | `AhALzdz8S`                 |
| `X-Inference-Generation-ID` | Generation ID (for completion events) | `XBKcs7F1s2oJ_AHiLMbF4`     |
| `User-Agent`                | inference.net webhook agent           | `inference.net-Webhook/1.0` |
| `Content-Type`              | Always `application/json`             | `application/json`          |

### Verifying the Request Source

<Warning>
  The `X-Inference-Webhook-ID` header lets you verify that the payload you're receiving comes from our API.

  This ID is unique to your webhook and is private to you and your team, so treat it like a secret.

  If the ID does not match the one shown in the dashboard, reject the request: your endpoint has most likely been discovered by a malicious actor.
</Warning>
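
Because the webhook ID acts as a shared secret, it can be compared the way other secrets are. The sketch below is one way to do the check with the Python standard library. `is_trusted_webhook` is a hypothetical helper, and the constant-time comparison is a general hardening habit rather than something this API requires.

```python
import hmac
from typing import Mapping


def is_trusted_webhook(headers: Mapping[str, str], expected_webhook_id: str) -> bool:
    """Check that X-Inference-Webhook-ID matches the ID from the dashboard."""
    if not expected_webhook_id:
        # Never accept requests when no expected ID is configured.
        return False
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    received = normalized.get("x-inference-webhook-id", "")
    # compare_digest is designed to resist timing analysis when comparing secrets.
    return hmac.compare_digest(received.encode(), expected_webhook_id.encode())
```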

## Testing Webhooks

You can test your webhook endpoint from the dashboard:

1. Navigate to **Webhooks** in the dashboard
2. Find your webhook in the list
3. Click the menu and select **Test**
4. Check your endpoint logs for the test payload

A successful test will show a green success indicator in the dashboard.
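
During development it can also help to send a request to your receiver yourself, before exposing it publicly. The sketch below builds a request that mimics the `generation.completed` shape and the headers documented in this guide. `build_test_delivery` is a hypothetical helper, the payload is a trimmed imitation, and the dashboard's own test payload may look different.

```python
import json
import urllib.request


def build_test_delivery(url: str, webhook_id: str, generation_id: str) -> urllib.request.Request:
    """Build a POST request that imitates a generation.completed delivery."""
    payload = {
        "event": "generation.completed",
        "timestamp": "2025-01-03T06:46:22.838Z",
        "webhook_id": webhook_id,
        "generation_id": generation_id,
        "data": {"state": "Success", "stateMessage": "Generation successful"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Inference-Event": payload["event"],
            "X-Inference-Webhook-ID": webhook_id,
            "X-Inference-Generation-ID": generation_id,
        },
    )


# Usage against a receiver running locally on port 3000:
# request = build_test_delivery(
#     "http://localhost:3000/webhooks/inference", "AhALzdz8S", "test-123"
# )
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.status)
```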

## Best Practices

### 1. Respond Immediately

Your endpoint must respond within 30 seconds. Always return a 200 status immediately and process the webhook asynchronously:

<CodeGroup>
  ```typescript TypeScript theme={"system"}
  // Correct approach
  app.post("/webhook", (req, res) => {
    res.status(200).send("OK");
    processWebhookAsync(req.body);
  });

  // Incorrect approach: may time out
  app.post("/webhook", async (req, res) => {
    await heavyProcessing(req.body); // Risk of timeout
    res.status(200).send("OK");
  });
  ```

  ```python Python theme={"system"}
  # Correct approach — process in background
  @app.post("/webhook")
  async def handle_webhook(
      payload: WebhookPayload,
      background_tasks: BackgroundTasks,
  ):
      background_tasks.add_task(process_webhook, payload)
      return {"received": True}

  # Incorrect approach: may time out
  @app.post("/webhook")
  async def handle_webhook(payload: WebhookPayload):
      await heavy_processing(payload)  # Risk of timeout
      return {"received": True}
  ```
</CodeGroup>
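
If your framework has no background-task helper, an in-process queue gives the same effect: the request handler only enqueues the payload and returns 200, while a worker thread does the slow part. The following is a minimal standard-library sketch, assuming a single worker is enough for your volume. `start_worker` is a hypothetical helper, not part of any inference.net SDK.

```python
import queue
import threading
from typing import Any, Callable


def start_worker(handler: Callable[[dict[str, Any]], None]) -> queue.Queue:
    """Start a daemon thread that processes queued payloads one at a time."""
    jobs: queue.Queue = queue.Queue()

    def run() -> None:
        while True:
            payload = jobs.get()
            try:
                handler(payload)
            except Exception as error:
                # Keep the worker alive when a single payload fails.
                print(f"Webhook processing failed: {error}")
            finally:
                jobs.task_done()

    threading.Thread(target=run, daemon=True).start()
    return jobs
```

In the request handler, call `jobs.put(payload)` and respond right away. A durable queue (for example a message broker) is the safer choice when payloads must survive a process restart.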

### 2. Implement Idempotency

Failed webhooks may be retried. Use the `generation_id` to ensure you don't process the same event twice:

<CodeGroup>
  ```typescript TypeScript theme={"system"}
  const processedGenerations = new Set<string>();

  async function processWebhook(payload: any) {
    if (processedGenerations.has(payload.generation_id)) {
      return; // Already processed
    }

    processedGenerations.add(payload.generation_id);
    // Process the generation
  }
  ```

  ```python Python theme={"system"}
  processed_generations: set[str] = set()

  def process_webhook(payload: WebhookPayload):
      if payload.generation_id in processed_generations:
          return  # Already processed

      processed_generations.add(payload.generation_id)
      # Process the generation
  ```
</CodeGroup>
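
The in-memory sets above are cleared on restart and are not shared between processes. One way to make the check durable is to store processed IDs in a database with a uniqueness constraint. The sketch below uses SQLite from the Python standard library. The table name, file name, and the `open_store` and `claim` helpers are illustrative assumptions.

```python
import sqlite3


def open_store(path: str = "webhooks.db") -> sqlite3.Connection:
    """Open the database and create the table of processed IDs if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (generation_id TEXT PRIMARY KEY)"
    )
    return conn


def claim(conn: sqlite3.Connection, generation_id: str) -> bool:
    """Return True the first time a generation_id is seen, False on repeats."""
    # The primary key turns the insert into a no-op for an existing ID,
    # so the check and the write happen in a single statement.
    with conn:
        cursor = conn.execute(
            "INSERT OR IGNORE INTO processed (generation_id) VALUES (?)",
            (generation_id,),
        )
    return cursor.rowcount == 1
```

Call `claim(conn, payload.generation_id)` before processing and skip the event when it returns `False`.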

### 3. Validate Webhook Source

Always verify that webhooks originate from inference.net by checking that the expected headers are present and that `X-Inference-Webhook-ID` matches the ID shown in your dashboard:

<CodeGroup>
  ```typescript TypeScript theme={"system"}
  function validateWebhookSource(
    headers: Record<string, string | undefined>,
    expectedWebhookId: string,
  ): boolean {
    // The event header must be present and the webhook ID must match the dashboard value
    return (
      Boolean(headers["x-inference-event"]) &&
      headers["x-inference-webhook-id"] === expectedWebhookId
    );
  }
  ```

  ```python Python theme={"system"}
  def validate_webhook_source(headers: dict, expected_webhook_id: str) -> bool:
      # The event header must be present and the webhook ID must match the dashboard value
      return (
          bool(headers.get("x-inference-event"))
          and headers.get("x-inference-webhook-id") == expected_webhook_id
      )
  ```
</CodeGroup>

### 4. Handle Errors Gracefully

Implement proper error handling to prevent individual failures from affecting your entire system:

<CodeGroup>
  ```typescript TypeScript theme={"system"}
  async function handleWebhook(payload: any) {
    try {
      await processWebhook(payload);
    } catch (error) {
      console.error("Webhook processing failed:", error);
      // Log to monitoring service
      // Return 200 to prevent unnecessary retries
    }
  }
  ```

  ```python Python theme={"system"}
  async def handle_webhook(payload: WebhookPayload):
      try:
          await process_webhook(payload)
      except Exception as error:
          print(f"Webhook processing failed: {error}")
          # Log to monitoring service
          # Return 200 to prevent unnecessary retries
  ```
</CodeGroup>

### 5. Monitor Webhook Processing

Track key metrics to ensure reliable webhook processing:

* Webhook receipt rate
* Processing success/failure rates
* Average processing time
* Queue depth (if using queues)
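
As a starting point, the first three metrics can be collected with a small in-process counter. This is only a sketch: `WebhookMetrics` is a hypothetical class, and a production setup would more likely export these values to a monitoring system.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class WebhookMetrics:
    received: int = 0
    succeeded: int = 0
    failed: int = 0
    total_seconds: float = 0.0

    def record(self, handler: Callable[[Any], None], payload: Any) -> None:
        """Run handler(payload) and record its outcome and duration."""
        self.received += 1
        started = time.perf_counter()
        try:
            handler(payload)
            self.succeeded += 1
        except Exception:
            self.failed += 1
        finally:
            self.total_seconds += time.perf_counter() - started

    @property
    def average_seconds(self) -> float:
        """Average processing time across all handled webhooks."""
        processed = self.succeeded + self.failed
        return self.total_seconds / processed if processed else 0.0
```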

## Troubleshooting

### Not Receiving Webhooks

1. **Check webhook status**: Ensure your webhook is not disabled in the dashboard
2. **Test connectivity**: Use the test feature in the dashboard
3. **Verify URL**: Confirm your endpoint is publicly accessible via HTTPS
4. **Check logs**: Review both your server logs and any reverse proxy logs
5. **Validate metadata**: Ensure you're including the correct `webhook_id` in generation requests

### Webhooks Arriving Late

* Verify your endpoint responds quickly (ideally in under 1 second)
* Check that you're not performing heavy processing before responding
* Monitor your server load and resource usage

### Duplicate Webhook Deliveries

* Implement idempotency using the `generation_id`
* Ensure your endpoint always returns 200 OK for successful receipt
* Check for any errors in your webhook processing that might trigger retries

## Frequently Asked Questions

**Q: What happens if my endpoint is down?**
A: Failed webhook deliveries are retried up to 3 times with exponential backoff. After all retries are exhausted, the delivery is marked as failed.

**Q: What's the webhook timeout?**
A: Webhook endpoints must respond within 30 seconds. Timeouts are treated as failures and will trigger retries.

**Q: Can I filter which events I receive?**
A: Currently, webhooks receive all event types. Event filtering is planned for a future update.

**Q: How secure are webhooks?**
A: All webhooks are sent over HTTPS. You should validate the webhook source using the provided headers. HMAC signature verification is planned for additional security.

**Q: What's the maximum payload size?**
A: Webhook payloads can be up to 10 MB, though typical payloads range from 5 to 50 KB.

**Q: Can I replay missed webhooks?**
A: Webhook replay functionality is not currently available. As a fallback, you can poll the generation status endpoint.

## Support

For assistance with webhooks:

* Email: [support@inference.net](mailto:support@inference.net)
* Discord: Join our developer community
* Documentation: [https://docs.inference.net](https://docs.inference.net)
* Issues: Report bugs via our support portal
