The Embeddings API converts text into numerical vectors that preserve semantic meaning. Use it when you need retrieval, ranking, deduplication, clustering, or similarity search.

Best fit

Use embeddings for:
  • semantic search
  • retrieval pipelines
  • reranking and similarity matching
  • clustering and taxonomy work
  • dataset deduplication and nearest-neighbor lookup

Request shape

The direct API uses the standard OpenAI-compatible embeddings endpoint:
  • endpoint: POST /v1/embeddings
  • auth: Authorization: Bearer $INFERENCE_API_KEY
  • input: a string or an array of strings
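A request of this shape might be built as in the sketch below, which uses only the Python standard library. The base URL and model name are placeholders that do not come from this page, and the `model` field and the `data[i].embedding` response layout are assumed from the OpenAI-compatible convention; check your deployment for the actual values.

```python
import json
import os
import urllib.request

# Assumptions: both values are placeholders, not taken from this documentation.
BASE_URL = "https://api.example.com"
MODEL = "example-embedding-model"


def build_embeddings_request(texts, api_key, model=MODEL, base_url=BASE_URL):
    """Build a POST /v1/embeddings request for a string or a list of strings."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/v1/embeddings",
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
    )


def embed(texts, api_key=None):
    """Send the request and return one vector per input, in input order."""
    api_key = api_key or os.environ["INFERENCE_API_KEY"]
    request = build_embeddings_request(texts, api_key)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumed OpenAI-compatible layout: vectors live under data[i].embedding,
    # and data[i].index gives the position of the matching input.
    rows = sorted(payload["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]
```

Calling `embed("hello world")` would send a single string, and `embed(["first", "second"])` would send an array; both return a list of vectors.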

Operational guidance

  • use realtime embeddings for synchronous application flows, where the caller waits on the result
  • use background or batch paths to process very large corpora
  • keep the same model for index creation and query time; vectors from different models are not comparable, so changing models requires a full reindex
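One way to enforce the same-model rule is to record the model name alongside the index and reject queries embedded with a different one. The sketch below is illustrative only: `VectorIndex` and its methods are hypothetical names, not part of the API, and it uses a brute-force cosine similarity scan rather than a real vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Hypothetical in-memory index that remembers which model produced its vectors."""

    def __init__(self, model):
        self.model = model
        self.items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def search(self, query_vector, query_model, top_k=3):
        # Vectors from different models live in different spaces,
        # so a mismatch is treated as an error rather than scored.
        if query_model != self.model:
            raise ValueError(
                f"index built with {self.model!r}, query embedded with {query_model!r}"
            )
        scored = [
            (cosine_similarity(query_vector, vector), doc_id)
            for doc_id, vector in self.items
        ]
        scored.sort(reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:top_k]]
```

Storing the model name with the index makes a planned reindex explicit: a new model means building a new `VectorIndex` rather than mixing vectors.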

Common choices

  • Single text input for interactive similarity lookups
  • Array input when you want one request to generate multiple vectors
  • Batch API when you need to process a large offline corpus
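For corpora between these extremes, array input can be combined with client-side chunking. The sketch below assumes a caller-supplied `embed_fn` (a hypothetical function that takes a list of strings and returns one vector per string) and a batch size of 64, which is an arbitrary illustrative value; this page does not state a maximum array length. It is not the Batch API, only a loop over array-input requests.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_corpus(texts, embed_fn, batch_size=64):
    """Embed `texts` in batches, returning vectors in the same order as `texts`.

    `embed_fn` is a hypothetical callable: list of strings in, list of vectors out.
    """
    vectors = []
    for batch in chunked(texts, batch_size):
        batch_vectors = embed_fn(batch)
        # Guard against a response that dropped or duplicated inputs.
        if len(batch_vectors) != len(batch):
            raise RuntimeError("embedding count does not match batch size")
        vectors.extend(batch_vectors)
    return vectors
```

Passing the embedding call in as a parameter keeps the batching logic testable without network access and lets the same loop wrap whichever client the application already uses.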