## Why LLMs for Translation?
| Benefit | What It Means |
| --- | --- |
| Small models, big quality | 1–8B models handle most language pairs with human-usable accuracy. |
| Pennies per million tokens | 1–3B models can be more than 100× cheaper than their 70B+ counterparts. |
| Zero setup | Same OpenAI SDK; just point it at `https://api.inference.net/v1`. |
| Formatting control | Preserve Markdown, HTML, or CSV structure with a one-line system prompt. |

If you need a language-specific model that isn't supported, let us know!
## Quick Example
```bash
curl https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.2-1b-instruct/fp-16",
    "messages": [
      {
        "role": "system",
        "content": "Translate to Spanish. Preserve markdown, code, and product names. Return only the translation."
      },
      {
        "role": "user",
        "content": "Smart home thermostat with energy-saving features"
      }
    ],
    "temperature": 0.3
  }'
```
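The same request works from Python. This sketch uses only the standard library rather than the OpenAI SDK; the `build_payload` and `translate` helper names are illustrative, and it assumes `INFERENCE_API_KEY` is set in the environment.

```python
import json
import os
import urllib.request

API_URL = "https://api.inference.net/v1/chat/completions"
MODEL = "meta-llama/llama-3.2-1b-instruct/fp-16"

def build_payload(text: str, target: str = "Spanish") -> dict:
    """Assemble the same request body the curl example sends."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "system",
                "content": (
                    f"Translate to {target}. Preserve markdown, code, and "
                    "product names. Return only the translation."
                ),
            },
            {"role": "user", "content": text},
        ],
        "temperature": 0.3,
    }

def translate(text: str, target: str = "Spanish") -> str:
    """POST the payload and return the translated string."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text, target)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['INFERENCE_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Swapping the target language or model is a one-argument change, so the same helper covers every pair the model supports.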
## Best Practices
- Pick the smallest model that passes a human spot-check; it's cheaper and faster.
- Glossaries: add required term mappings to the system prompt for brand consistency.
- Chunk long documents at headings or paragraphs; translate the chunks, then re-assemble.
- Use temperature 0.2–0.4 for faithful, non-creative output.
- Double-check high-stakes strings with a second model or back-translation.
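For the glossary tip above, the term map can be written straight into the system prompt. A minimal sketch, assuming a simple `source -> target` line format (the helper name and format are illustrative, not a library API):

```python
def glossary_prompt(target: str, glossary: dict[str, str]) -> str:
    """Build a system prompt that pins required term translations."""
    lines = [f"Translate to {target}. Return only the translation."]
    if glossary:
        lines.append("Always translate these terms exactly as shown:")
        lines.extend(f"- {src} -> {dst}" for src, dst in glossary.items())
    return "\n".join(lines)

# Example: keep the product name untranslated for brand consistency.
prompt = glossary_prompt("Spanish", {"Smart Thermostat": "Smart Thermostat"})
```

Small models follow explicit term lists well, so this is usually cheaper than fine-tuning for brand vocabulary.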
Avoid feeding huge documents (over 4K tokens) in one call; split them instead to keep quality high and costs low.
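The chunk-and-reassemble advice can be sketched as a greedy paragraph packer. The 2,000-character default is an illustrative stand-in for a real token budget:

```python
def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars.

    A lone paragraph longer than max_chars stays whole rather than
    being split mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Translate each chunk in a separate API call, then
# re-assemble the results with "\n\n".join(...).
```

Splitting at paragraph boundaries keeps sentences intact, so each chunk translates cleanly and the joined output reads as one document.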
## Next Steps