Why LLMs for Translation?

| Benefit | What It Means |
| --- | --- |
| Small models, big quality | 1-8B models handle most language pairs at human-usable accuracy. |
| Pennies per million tokens | 1-3B models can be more than 100x cheaper than their 70B+ counterparts. If you need a language-specific model that isn't supported, let us know! |
| Zero setup | Same OpenAI SDK, just point to https://api.inference.net/v1. |
| Formatting control | Preserve markdown, HTML, or CSV structure with a one-line system prompt. |

Quick Example

curl https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.2-1b-instruct/fp-8",
    "messages": [
      {
        "role": "system",
        "content": "Translate to Spanish. Preserve markdown, code, and product names. Return only the translation."
      },
      {
        "role": "user",
        "content": "Smart home thermostat with energy-saving features"
      }
    ],
    "temperature": 0.3
  }'
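The same request can be assembled programmatically. Here is a minimal sketch in pure standard-library Python; the helper name `build_translation_request` is illustrative, and the actual POST (via `urllib.request` or the OpenAI SDK pointed at the same base URL) is left as a comment rather than a live call:

```python
import json

API_URL = "https://api.inference.net/v1/chat/completions"

def build_translation_request(text, target_lang="Spanish",
                              model="meta-llama/llama-3.2-1b-instruct/fp-8"):
    """Build the JSON body for a translation request (mirrors the curl example)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": (
                    f"Translate to {target_lang}. Preserve markdown, code, "
                    "and product names. Return only the translation."
                ),
            },
            {"role": "user", "content": text},
        ],
        "temperature": 0.3,  # low temperature for faithful, non-creative output
    }

body = build_translation_request("Smart home thermostat with energy-saving features")
payload = json.dumps(body)
# POST `payload` to API_URL with an "Authorization: Bearer $INFERENCE_API_KEY"
# header, e.g. via urllib.request or the OpenAI SDK with base_url set to
# https://api.inference.net/v1.
```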

Best Practices

  1. Pick the smallest model that passes a human spot-check; smaller models are cheaper and faster.
  2. Glossaries: add required term mappings in the system prompt for brand consistency.
  3. Chunk long docs at headings/paragraphs; translate chunks, then re‑assemble.
  4. Use temperature 0.2-0.4 for faithful, non-creative output.
  5. Double‑check high‑stakes strings with a second model or back‑translation.
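For practice 2, term mappings can simply be appended to the system prompt. A hypothetical sketch, assuming a glossary passed as a plain dict; the helper name and example mappings are illustrative:

```python
def system_prompt_with_glossary(target_lang, glossary):
    """Build a translation system prompt that pins required term mappings."""
    lines = [
        f"Translate to {target_lang}. Preserve markdown, code, and product names.",
        "Return only the translation.",
        "Always use these exact term mappings:",
    ]
    for source, target in glossary.items():
        lines.append(f"- {source} -> {target}")
    return "\n".join(lines)

# Example: keep the brand name untranslated, pin one marketing phrase.
prompt = system_prompt_with_glossary(
    "Spanish",
    {"SmartTherm": "SmartTherm", "energy-saving": "de bajo consumo"},
)
```

The resulting string goes in the `system` message of the request, exactly as in the quick example above.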

Avoid feeding huge (>4K-token) documents in one call; split them instead to keep quality high and costs low.
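One way to enforce that limit is to split at blank lines (paragraph boundaries) and re-assemble the translated chunks in order. A rough sketch; the 4-characters-per-token estimate is a common heuristic, not an exact count, and a single oversized paragraph still becomes its own chunk:

```python
def chunk_document(text, max_tokens=4000):
    """Split text into chunks under max_tokens, breaking at paragraph boundaries."""
    est_tokens = lambda s: len(s) // 4  # rough heuristic: ~4 chars per token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = (current + "\n\n" + para) if current else para
        if current and est_tokens(candidate) > max_tokens:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Translate each chunk separately, then re-assemble in order:
# translated = "\n\n".join(translate(chunk) for chunk in chunks)
```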

Next Steps