Best fit
Use embeddings for:
- semantic search
- retrieval pipelines
- reranking and similarity matching
- clustering and taxonomy work
- dataset deduplication and nearest-neighbor lookup
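Most of these use cases reduce to comparing vectors. As a minimal sketch, assuming cosine similarity as the distance measure (a common choice, but not one this page specifies), a nearest-neighbor lookup over a small in-memory index might look like the following. The names `cosine_similarity`, `nearest`, and `toy_index` are illustrative, and the two-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest(query, index, k=1):
    """Return the keys of the k vectors in `index` most similar to `query`."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Toy 2-d vectors; real embeddings have hundreds or thousands of dimensions.
toy_index = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}
```

A linear scan like this is only practical for small collections; larger indexes usually rely on an approximate nearest-neighbor library or a vector database.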
Request shape
The direct API uses the standard OpenAI-compatible embeddings endpoint:
- endpoint: POST /v1/embeddings
- auth: Authorization: Bearer $INFERENCE_API_KEY
- input: a string or an array of strings
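The request shape above can be sketched with the Python standard library. This is an illustration under stated assumptions, not a reference client: `BASE_URL` is a placeholder host, `MODEL` is a hypothetical model name, and the `model` request field and the `data[i].embedding` response layout are assumed to follow the OpenAI embeddings format that the endpoint is described as compatible with.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: placeholder, use your provider's host
MODEL = "example-embedding-model"     # assumption: hypothetical model name


def build_embeddings_request(texts, api_key, base_url=BASE_URL, model=MODEL):
    """Build a POST /v1/embeddings request; `texts` is a string or a list of strings."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(texts, api_key=None):
    """Send the request and return one vector per input, in input order."""
    api_key = api_key or os.environ["INFERENCE_API_KEY"]
    request = build_embeddings_request(texts, api_key)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed OpenAI-style response: {"data": [{"index": 0, "embedding": [...]}, ...]}
    ordered = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]
```

Calling `embed("hello")` would send a single string, while `embed(["hello", "world"])` would send an array and return two vectors.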
Operational guidance
- use realtime embeddings for synchronous application flows
- use background or batch paths when you need to process very large corpora
- keep the same model across index creation and query time unless you plan a full reindex
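Vectors produced by different models are generally not comparable, which is why the last point matters. One way to enforce it, sketched here with a hypothetical `VectorIndex` class that is not part of any API described on this page, is to store the model name alongside the index and reject writes or queries that use a different one.

```python
from dataclasses import dataclass, field


@dataclass
class VectorIndex:
    """Toy in-memory index that remembers which model produced its vectors."""

    model: str
    vectors: dict = field(default_factory=dict)

    def _require_same_model(self, model):
        # Refuse to mix vectors from different embedding models.
        if model != self.model:
            raise ValueError(
                f"index was built with {self.model!r}, got {model!r}; "
                "reindex before switching models"
            )

    def add(self, key, vector, model):
        """Store a vector, checking it came from the index's model."""
        self._require_same_model(model)
        self.vectors[key] = vector

    def check_query(self, model):
        """Call before searching to confirm the query vector's model matches."""
        self._require_same_model(model)
```

With this guard, switching the query-time model without a full reindex fails loudly instead of silently returning poor matches.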
Common choices
- Single text input for interactive similarity lookups
- Array input when you want one request to generate multiple vectors
- Batch API when you need to process a large offline corpus
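When using array input, a corpus usually has to be split into groups before sending. The helper below is a minimal sketch of that step; `chunked` is an illustrative name, and the group size is an assumption you would set from your provider's per-request limits, which this page does not state.

```python
def chunked(texts, size):
    """Yield consecutive slices of `texts`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


# Example: five documents split into array-input requests of up to two texts.
documents = ["a", "b", "c", "d", "e"]
groups = list(chunked(documents, 2))
```

Each group can then be sent as the `input` array of one embeddings request. For a very large offline corpus, the batch path is likely the better fit than looping over many realtime requests.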