Embeddings API
Convert text into numerical vectors for semantic search, similarity, and more
Introduction
The Inference.net Embeddings API provides a fully OpenAI-compatible interface for generating high-quality text embeddings. Embeddings are numerical representations of text that capture semantic meaning, enabling powerful applications like semantic search, clustering, recommendations, and anomaly detection.
Our embeddings API is designed to be a drop-in replacement for OpenAI’s embeddings endpoint, making it easy to switch between providers without changing your code.
Key Features
- OpenAI Compatible: Use the same code and libraries you already have
- Multiple Models: Choose from a variety of embedding models optimized for different use cases
- High Performance: Fast inference with low latency
- Scalable: Handle high-volume workloads with our robust infrastructure
- Asynchronous Support: Process large batches with webhooks for completion notifications
Quick Start
API Endpoint
Basic Request
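A minimal sketch of a basic request using only the Python standard library. The base URL and model name below are illustrative assumptions; substitute your own endpoint and a model from the models page.

```python
import json
import urllib.request

API_BASE = "https://api.inference.net/v1"  # assumption: replace with your base URL
API_KEY = "your-api-key"

def build_embedding_request(model: str, text: str) -> dict:
    # The payload mirrors the OpenAI embeddings schema: `model` and `input`.
    return {"model": model, "input": text}

payload = build_embedding_request("example-embedding-model", "Hello, world!")

# To send the request (requires a valid API key):
# req = urllib.request.Request(
#     f"{API_BASE}/embeddings",
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": f"Bearer {API_KEY}",
#              "Content-Type": "application/json"},
# )
# response = json.loads(urllib.request.urlopen(req).read())
```

Because the API is OpenAI-compatible, the official OpenAI SDK also works by pointing its `base_url` at your endpoint.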
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The embedding model to use. See available models. |
| `input` | string or array | Yes | Text(s) to embed. Can be a single string or an array of strings. |
| `encoding_format` | string | No | Format for the returned embeddings: `float` (default) or `base64`. |
| `metadata` | object | No | Custom metadata for tracking and webhooks. |
Batch Processing
You can embed multiple texts in a single request by passing an array:
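A sketch of a batched payload, with an illustrative model name. The response's `data` array contains one embedding per input, in order:

```python
# Batched request: pass a list of strings as `input` instead of a single
# string. Each item in the response carries an `index` field matching its
# position in this list.
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Machine learning models convert text to vectors.",
    "Embeddings enable semantic search.",
]
payload = {"model": "example-embedding-model", "input": texts}
```

Batching amortizes request overhead, which matters when embedding thousands of documents.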
Response Format
The API returns embeddings in the standard OpenAI format:
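A sketch of parsing a response in the standard OpenAI format. The vector values below are truncated placeholders, not real model output:

```python
# A response in the standard OpenAI embeddings format. Each item in `data`
# holds one embedding vector plus the `index` of the input it corresponds to.
response = {
    "object": "list",
    "data": [
        {
            "object": "embedding",
            "index": 0,
            "embedding": [0.0023, -0.0091, 0.0154],  # truncated for display
        }
    ],
    "model": "example-embedding-model",
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}

# Sort by `index` to recover the original input order explicitly.
vectors = [
    item["embedding"]
    for item in sorted(response["data"], key=lambda d: d["index"])
]
```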
Asynchronous Embeddings
For large-scale embedding tasks, use our asynchronous API with webhooks to receive notifications when processing is complete.
Async Request
To make asynchronous embedding requests, use the /v1/async base URL and include a webhook ID in your request metadata:
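A sketch of an async request payload. The base URL reflects the `/v1/async` path above; the `webhook_id` metadata key and its value are assumptions — confirm the exact key in the webhook documentation.

```python
# Async embeddings request: same payload shape as the synchronous API,
# plus a webhook ID in `metadata` so you can be notified on completion.
ASYNC_API_BASE = "https://api.inference.net/v1/async"  # assumption: your async base URL

payload = {
    "model": "example-embedding-model",
    "input": ["Document one.", "Document two."],
    "metadata": {"webhook_id": "wh_example_123"},  # hypothetical webhook ID
}
# POST this to f"{ASYNC_API_BASE}/embeddings" with your API key. The
# immediate response acknowledges the job; results arrive via the webhook.
```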
Webhook Payload
When the embedding process completes, you’ll receive a webhook with the following structure:
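The authoritative schema is in the webhook documentation; every field name in the sketch below is an illustrative assumption, shown only to convey the general shape of a completion notification:

```python
# Hypothetical webhook payload -- field names are assumptions, not the
# documented schema. Your handler would extract the embeddings from the
# embedded result object.
webhook_payload = {
    "event": "embedding.completed",                   # hypothetical event type
    "metadata": {"webhook_id": "wh_example_123"},     # echoes your request metadata
    "data": {
        "object": "list",
        "data": [
            {"object": "embedding", "index": 0, "embedding": [0.0023, -0.0091]},
        ],
        "model": "example-embedding-model",
    },
}

embeddings = [d["embedding"] for d in webhook_payload["data"]["data"]]
```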
See our webhook documentation for more details on setting up and handling webhooks.
Use Cases
Semantic Search
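Semantic search embeds a query and ranks documents by similarity to it. A toy sketch with mock vectors (real vectors would come from the embeddings API):

```python
import math

# Rank documents by cosine similarity between a query embedding and
# precomputed document embeddings.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Mock document embeddings; in practice, embed each document once and store
# the vectors.
doc_embeddings = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.7, 0.3, 0.1],
}
query_embedding = [0.8, 0.2, 0.0]  # mock query vector

ranked = sorted(
    doc_embeddings.items(),
    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
    reverse=True,
)
# ranked[0][0] is the most semantically similar document
```

For large corpora, the same idea scales by storing vectors in a dedicated vector index rather than a Python dict.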
Text Classification
Embeddings can be used as features for classification tasks:
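A toy nearest-centroid classifier illustrates the idea: average the embeddings of labeled examples per class, then assign new texts to the closest centroid. Vectors here are mock values standing in for real API output:

```python
import math

def centroid(vectors):
    # Element-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Mock labeled embeddings; in practice, embed a few examples per class.
labeled = {
    "sports":  [[0.9, 0.1], [0.8, 0.2]],
    "finance": [[0.1, 0.9], [0.2, 0.8]],
}
centroids = {label: centroid(vs) for label, vs in labeled.items()}

def classify(embedding):
    # Assign the label whose centroid is nearest to the embedding.
    return min(centroids, key=lambda label: euclidean(embedding, centroids[label]))
```

For higher accuracy, the same embeddings can instead feed a standard classifier such as logistic regression.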
Best Practices
- Batch Requests: Send multiple texts in a single request for better efficiency
- Model Selection: Choose models based on your specific use case. Visit our models page to compare options
- Normalization: Some applications benefit from normalizing embeddings to unit length
- Caching: Store generated embeddings to avoid redundant API calls
- Async for Large Batches: Use webhooks for processing large datasets asynchronously
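The normalization point above can be sketched in a few lines: scaling a vector to unit length makes dot products equivalent to cosine similarity, which many vector stores assume.

```python
import math

def normalize(v):
    # Scale a vector to unit length; return it unchanged if it is all zeros.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

unit = normalize([3.0, 4.0])  # -> [0.6, 0.8]
```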
Error Handling
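A minimal sketch of client-side error handling, assuming standard HTTP status semantics (retry transient 429/5xx errors, fail fast on client errors). The exact error body format is not specified here, so this sketch keys off status codes only:

```python
import json
import urllib.error
import urllib.request

def should_retry(status: int) -> bool:
    # 429 = rate limited; 5xx = transient server error. Everything else
    # (e.g. 400 bad request, 401 bad key) should fail immediately.
    return status == 429 or 500 <= status < 600

def post_with_retry(url, payload, api_key, max_attempts=3):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())
        except urllib.error.HTTPError as err:
            if not should_retry(err.code) or attempt == max_attempts - 1:
                raise
```

In production, add exponential backoff between attempts and honor any `Retry-After` header the server returns.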
Rate Limits
Embeddings API requests are subject to our standard rate limits. See our rate limits documentation for details.
Pricing
Embeddings are priced per token. Visit our models page for current pricing information.