Overview
Webhooks provide an efficient push-based notification system for tracking generation completions in real-time. Rather than repeatedly polling the API to check generation status, webhooks automatically notify your application when generations complete, enabling streamlined workflows and better resource utilization.Key Benefits
- Resource Efficiency: Eliminate unnecessary API calls for status checks
- Real-time Updates: Receive notifications within milliseconds of generation completion
- Scalability: Handle thousands of concurrent generations efficiently
- Improved User Experience: Update your UI instantly when results are ready
Getting Started
Step 1: Create a Webhook Endpoint
Your application needs an HTTPS endpoint capable of receiving POST requests. The endpoint should:- Accept JSON payloads
- Respond with HTTP 200 status immediately
- Process the webhook data asynchronously
Step 2: Deploy Your Endpoint
Your webhook endpoint must be publicly accessible via HTTPS. For development environments, consider using:- ngrok:
ngrok http 3000 - Cloudflare Tunnel: Provides a stable URL
- localtunnel:
lt --port 3000
Step 3: Register Your Webhook
- Navigate to the inference.net dashboard
- Go to API Keys → Webhooks in the sidebar
- Click Create Webhook
- Enter a descriptive name and your HTTPS endpoint URL
- Save your webhook
AhALzdz8S) that you’ll use when creating generations.
Step 4: Link Webhook to Generations
Include the webhook identifier in the metadata when creating a generation:metadata field).
When the generation completes, your webhook endpoint will receive a notification.
Webhook Events
generation.completed
Sent when a generation finishes processing (successfully or with failure):response object is compatible with the types exported from the official OpenAI SDKs.
Python
You can use the following type to parse the response:Typescript
async-embedding.completed
Sent when an async embedding request finishes processing:response object follows the standard OpenAI embeddings format.
Headers
All webhook requests include the following headers:| Header | Description | Example |
|---|---|---|
X-Inference-Event | Event type | generation.completed |
X-Inference-Webhook-ID | Webhook identifier | AhALzdz8S |
X-Inference-Generation-ID | Generation ID (for completion events) | XBKcs7F1s2oJ_AHiLMbF4 |
User-Agent | inference.net webhook agent | inference.net-Webhook/1.0 |
Content-Type | Always application/json | application/json |
(Security) Verifying the request source
Testing Webhooks
You can test your webhook endpoint from the dashboard:- Navigate to Webhooks in the dashboard
- Find your webhook in the list
- Click the menu (⋮) and select Test
- Check your endpoint logs for the test payload
Best Practices
1. Respond Immediately
Your endpoint must respond within 30 seconds. Always return a 200 status immediately and process the webhook asynchronously:2. Implement Idempotency
Failed webhooks may be retried. Use thegeneration_id to ensure you don’t process the same event twice:
3. Validate Webhook Source
Always verify that webhooks originate from inference.net by checking the presence of expected headers:4. Handle Errors Gracefully
Implement proper error handling to prevent individual failures from affecting your entire system:5. Monitor Webhook Processing
Track key metrics to ensure reliable webhook processing:- Webhook receipt rate
- Processing success/failure rates
- Average processing time
- Queue depth (if using queues)
Troubleshooting
Not Receiving Webhooks
- Check webhook status: Ensure your webhook is not disabled in the dashboard
- Test connectivity: Use the test feature in the dashboard
- Verify URL: Confirm your endpoint is publicly accessible via HTTPS
- Check logs: Review both your server logs and any reverse proxy logs
- Validate metadata: Ensure you’re including the correct
webhook_idin generation requests
Webhooks Arriving Late
- Verify your endpoint responds quickly (< 1 second ideally)
- Check that you’re not performing heavy processing before responding
- Monitor your server load and resource usage
Duplicate Webhook Deliveries
- Implement idempotency using the
generation_id - Ensure your endpoint always returns 200 OK for successful receipt
- Check for any errors in your webhook processing that might trigger retries
Frequently Asked Questions
Q: What happens if my endpoint is down?A: Failed webhook deliveries are retried up to 3 times with exponential backoff. After all retries are exhausted, the delivery is marked as failed. Q: What’s the webhook timeout?
A: Webhook endpoints must respond within 30 seconds. Timeouts are treated as failures and will trigger retries. Q: Can I filter which events I receive?
A: Currently, webhooks receive all event types. Event filtering is planned for a future update. Q: How secure are webhooks?
A: All webhooks are sent over HTTPS. You should validate the webhook source using the provided headers. HMAC signature verification is planned for additional security. Q: What’s the maximum payload size?
A: Webhook payloads can be up to 10MB, though typical payloads range from 5-50KB. Q: Can I replay missed webhooks?
A: Webhook replay functionality is not currently available. As a fallback, you can poll the generation status endpoint.
Support
For assistance with webhooks:- 📧 Email: [email protected]
- 💬 Discord: Join our developer community
- 📚 Documentation: https://docs.inference.net
- 🐛 Issues: Report bugs via our support portal