Embeddings

Best fit
Request shape
Operational guidance
Common choices
Related pages

The Embeddings API converts text into numerical vectors that capture semantic meaning, so texts with similar meaning map to nearby vectors. Use it for retrieval, ranking, deduplication, clustering, or similarity search.

Best fit

Use embeddings for:

semantic search
retrieval pipelines
reranking and similarity matching
clustering and taxonomy work
dataset deduplication and nearest-neighbor lookup
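
Most of these use cases reduce to comparing vectors. The sketch below ranks documents against a query by cosine similarity. It is an illustration only: the three-dimensional vectors are toy values standing in for real embeddings, and the helper names (cosine_similarity, rank_by_similarity) are hypothetical, not part of the API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors (illustrative values only, not real model output).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-keys": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(query, docs)
print([doc_id for doc_id, _ in ranking])
```

In a real pipeline a vector database or nearest-neighbor library would likely replace the linear scan, but the comparison it performs is the same.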

Request shape

The direct API uses the standard OpenAI-compatible embeddings endpoint:

endpoint: POST /v1/embeddings
auth: Authorization: Bearer $INFERENCE_API_KEY
input: a string or an array of strings
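
As a minimal sketch of that request shape, the following builds the POST request with Python's standard library and reads vectors out of a response. Several details are assumptions rather than facts from this page: the base URL and model name are placeholders, the "model" body field and the data[i].embedding / data[i].index response fields follow the usual OpenAI-style convention, and the helper names are hypothetical.

```python
import json
import os
import urllib.request

def build_embeddings_request(base_url, api_key, model, inputs):
    """Build (but do not send) a POST /v1/embeddings request.

    `inputs` may be a single string or a list of strings.
    """
    body = json.dumps({"model": model, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def extract_vectors(response_json):
    """Return vectors in input order, assuming the OpenAI-style response shape."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

# Placeholder base URL and model name: substitute your provider's values.
request = build_embeddings_request(
    base_url="https://api.example.com",
    api_key=os.environ.get("INFERENCE_API_KEY", ""),
    model="example-embedding-model",
    inputs=["first text", "second text"],
)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as response:
#     vectors = extract_vectors(json.load(response))
```

Sorting by index before extracting the vectors keeps the output aligned with the input array even if a server returns items out of order.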

Operational guidance

Use realtime embeddings for synchronous application flows.
Use background or batch paths when you need to process very large corpora.
Keep the same model across index creation and query time unless you plan a full reindex; vectors produced by different models are not comparable.
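
One way to enforce the same-model rule is to store the model name alongside the index and reject vectors that come from a different model. This is a sketch under stated assumptions: TaggedIndex and ModelMismatchError are hypothetical names, the model names are placeholders, and a production index would persist the model name with its metadata rather than hold it in memory.

```python
class ModelMismatchError(ValueError):
    """Raised when a vector comes from a different model than the index."""

class TaggedIndex:
    """Tiny in-memory index that remembers which model produced its vectors."""

    def __init__(self, model):
        self.model = model
        self.vectors = {}

    def _require_same_model(self, model):
        if model != self.model:
            raise ModelMismatchError(
                f"index was built with {self.model!r}, got {model!r}; "
                "reindex before switching models"
            )

    def add(self, doc_id, vector, model):
        """Store a document vector, refusing vectors from another model."""
        self._require_same_model(model)
        self.vectors[doc_id] = vector

    def prepare_query(self, query_vector, model):
        """Validate a query vector's model before it is compared to the index."""
        self._require_same_model(model)
        return query_vector

# Placeholder model names, for illustration only.
index = TaggedIndex(model="example-embedding-model")
index.add("doc-1", [0.1, 0.2, 0.3], model="example-embedding-model")

try:
    index.prepare_query([0.3, 0.2, 0.1], model="another-embedding-model")
except ModelMismatchError as error:
    print(error)
```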

Common choices

Single text input for interactive similarity lookups
Array input when you want one request to generate multiple vectors
Batch API when you need to process a large offline corpus
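
To illustrate the difference between single and array input, the sketch below splits a list of texts into request-sized groups and picks the matching input form for each. The batch size of 2 is arbitrary; this page does not state a per-request limit, so check your provider's limits before choosing one. The helper names are hypothetical.

```python
def batched(texts, batch_size):
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

def to_input_field(batch):
    """Send one text as a plain string and several texts as an array."""
    return batch[0] if len(batch) == 1 else list(batch)

texts = ["alpha", "beta", "gamma", "delta", "epsilon"]

# Each payload is the value of the request's input field.
payloads = [to_input_field(batch) for batch in batched(texts, batch_size=2)]
print(payloads)
```

With five texts and a batch size of 2, the first two payloads are arrays and the last is a single string, which matches the two input forms the endpoint accepts.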
