Getting Started With Webhooks
Everything you need to know to get started with webhooks.
Webhook support is only available for /chat/completions
calls, support for /completions
will come later.
Overview
Webhooks provide an efficient push-based notification system for tracking generation completions in real-time. Rather than repeatedly polling the API to check generation status, webhooks automatically notify your application when generations complete, enabling streamlined workflows and better resource utilization.
Key Benefits
- Resource Efficiency: Eliminate unnecessary API calls for status checks
- Real-time Updates: Receive notifications within milliseconds of generation completion
- Scalability: Handle thousands of concurrent generations efficiently
- Improved User Experience: Update your UI instantly when results are ready
Getting Started
Step 1: Create a Webhook Endpoint
Your application needs an HTTPS endpoint capable of receiving POST requests. The endpoint should:
- Accept JSON payloads
- Respond with HTTP 200 status immediately
- Process the webhook data asynchronously
Step 2: Deploy Your Endpoint
Your webhook endpoint must be publicly accessible via HTTPS. For development environments, consider using:
- ngrok:
ngrok http 3000
- Cloudflare Tunnel: Provides a stable URL
- localtunnel:
lt --port 3000
Step 3: Register Your Webhook
- Navigate to the inference.net dashboard
- Go to API Keys → Webhooks in the sidebar
- Click Create Webhook
- Enter a descriptive name and your HTTPS endpoint URL
- Save your webhook
You’ll receive a webhook identifier (e.g., AhALzdz8S
) that you’ll use when creating generations.
Step 4: Link Webhook to Generations
Include the webhook identifier in the metadata when creating a generation:
(The OpenAI SDK also supports this metadata
field).
When the generation completes, your webhook endpoint will receive a notification.
Webhook Events
generation.completed
Sent when a generation finishes processing (successfully or with failure):
The response
object is compatible with the types exported from the official OpenAI SDKs.
Python
You can use the following type to parse the response:
Typescript
Headers
All webhook requests include the following headers:
Header | Description | Example |
---|---|---|
X-Inference-Event | Event type | generation.completed |
X-Inference-Webhook-ID | Webhook identifier | AhALzdz8S |
X-Inference-Generation-ID | Generation ID (for completion events) | XBKcs7F1s2oJ_AHiLMbF4 |
User-Agent | inference.net webhook agent | inference.net-Webhook/1.0 |
Content-Type | Always application/json | application/json |
(Security) Verifying the request source
The X-Inference-Webhook-ID
is a good way to verify that the payload you’re recieving is officially coming from our API.
This ID is unique to your webhook, and is completely private to you and your team.
If the ID does not match what you see in the dashboard, your endpoint has most likely been discovered by a malicious actor.
Testing Webhooks
You can test your webhook endpoint from the dashboard:
- Navigate to Webhooks in the dashboard
- Find your webhook in the list
- Click the menu (⋮) and select Test
- Check your endpoint logs for the test payload
A successful test will show a green success indicator in the dashboard.
Best Practices
1. Respond Immediately
Your endpoint must respond within 30 seconds. Always return a 200 status immediately and process the webhook asynchronously:
2. Implement Idempotency
Failed webhooks may be retried. Use the generation_id
to ensure you don’t process the same event twice:
3. Validate Webhook Source
Always verify that webhooks originate from inference.net by checking the presence of expected headers:
4. Handle Errors Gracefully
Implement proper error handling to prevent individual failures from affecting your entire system:
5. Monitor Webhook Processing
Track key metrics to ensure reliable webhook processing:
- Webhook receipt rate
- Processing success/failure rates
- Average processing time
- Queue depth (if using queues)
Troubleshooting
Not Receiving Webhooks
- Check webhook status: Ensure your webhook is not disabled in the dashboard
- Test connectivity: Use the test feature in the dashboard
- Verify URL: Confirm your endpoint is publicly accessible via HTTPS
- Check logs: Review both your server logs and any reverse proxy logs
- Validate metadata: Ensure you’re including the correct
webhook_id
in generation requests
Webhooks Arriving Late
- Verify your endpoint responds quickly (< 1 second ideally)
- Check that you’re not performing heavy processing before responding
- Monitor your server load and resource usage
Duplicate Webhook Deliveries
- Implement idempotency using the
generation_id
- Ensure your endpoint always returns 200 OK for successful receipt
- Check for any errors in your webhook processing that might trigger retries
Frequently Asked Questions
Q: What happens if my endpoint is down?
A: Failed webhook deliveries are retried up to 3 times with exponential backoff. After all retries are exhausted, the delivery is marked as failed.
Q: What’s the webhook timeout?
A: Webhook endpoints must respond within 30 seconds. Timeouts are treated as failures and will trigger retries.
Q: Can I filter which events I receive?
A: Currently, webhooks receive all event types. Event filtering is planned for a future update.
Q: How secure are webhooks?
A: All webhooks are sent over HTTPS. You should validate the webhook source using the provided headers. HMAC signature verification is planned for additional security.
Q: What’s the maximum payload size?
A: Webhook payloads can be up to 10MB, though typical payloads range from 5-50KB.
Q: Can I replay missed webhooks?
A: Webhook replay functionality is not currently available. As a fallback, you can poll the generation status endpoint.
Support
For assistance with webhooks:
- 📧 Email: [email protected]
- 💬 Discord: Join our developer community
- 📚 Documentation: https://docs.inference.net
- 🐛 Issues: Report bugs via our support portal