## Why LLMs for Translation?
| Benefit | What It Means |
| --- | --- |
| Small models, big quality | 1–8B models handle most language pairs with human-usable accuracy. |
| Pennies per million tokens | 1–3B models can be more than 100× cheaper than their 70B+ counterparts. |
| Zero setup | Same OpenAI SDK; just point it at `https://api.inference.net/v1`. |
| Formatting control | Preserve Markdown, HTML, or CSV structure with a one-line system prompt. |

If you need a language-specific model that isn't supported, let us know!
## Quick Example
```bash
curl https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.2-1b-instruct/fp-16",
    "messages": [
      {
        "role": "system",
        "content": "Translate to Spanish. Preserve markdown, code, and product names. Return only the translation."
      },
      {
        "role": "user",
        "content": "Smart home thermostat with energy-saving features"
      }
    ],
    "temperature": 0.3
  }'
```
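The same request works from Python. This sketch uses only the standard library rather than the OpenAI SDK; the `build_payload` and `translate` helper names are illustrative, and it assumes `INFERENCE_API_KEY` is set in the environment.

```python
import json
import os
import urllib.request

API_URL = "https://api.inference.net/v1/chat/completions"
MODEL = "meta-llama/llama-3.2-1b-instruct/fp-16"

def build_payload(text: str, target: str = "Spanish") -> dict:
    """Assemble the same request body the curl example sends."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "system",
                "content": (
                    f"Translate to {target}. Preserve markdown, code, and "
                    "product names. Return only the translation."
                ),
            },
            {"role": "user", "content": text},
        ],
        "temperature": 0.3,
    }

def translate(text: str, target: str = "Spanish") -> str:
    """POST the payload and return the translated string."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text, target)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['INFERENCE_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Swapping the target language or model is a one-argument change, so the same helper covers every pair the model supports.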
## Best Practices
- Pick the smallest model that passes a human spot-check; it's cheaper and faster.
- Glossaries: add required term mappings to the system prompt for brand consistency.
- Chunk long documents at headings or paragraphs; translate the chunks, then re-assemble.
- Use temperature 0.2–0.4 for faithful, non-creative output.
- Double-check high-stakes strings with a second model or back-translation.
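For the glossary tip above, the term map can be written straight into the system prompt. A minimal sketch, assuming a simple `source -> target` line format (the helper name and format are illustrative, not a library API):

```python
def glossary_prompt(target: str, glossary: dict[str, str]) -> str:
    """Build a system prompt that pins required term translations."""
    lines = [f"Translate to {target}. Return only the translation."]
    if glossary:
        lines.append("Always translate these terms exactly as shown:")
        lines.extend(f"- {src} -> {dst}" for src, dst in glossary.items())
    return "\n".join(lines)

# Example: keep the product name untranslated for brand consistency.
prompt = glossary_prompt("Spanish", {"Smart Thermostat": "Smart Thermostat"})
```

Small models follow explicit term lists well, so this is usually cheaper than fine-tuning for brand vocabulary.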
Avoid feeding huge documents (over 4K tokens) in one call; split them instead to keep quality high and costs low.
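The chunk-and-reassemble advice can be sketched as a greedy paragraph packer. The 2,000-character default is an illustrative stand-in for a real token budget:

```python
def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars.

    A lone paragraph longer than max_chars stays whole rather than
    being split mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Translate each chunk in a separate API call, then
# re-assemble the results with "\n\n".join(...).
```

Splitting at paragraph boundaries keeps sentences intact, so each chunk translates cleanly and the joined output reads as one document.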
## Next Steps