Prompting large models as instant labelers
Zero-shot means you provide no labeled examples—the model relies solely on its pre-training plus the instructions (and optional label names or schema) you include.
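A zero-shot setup can be as simple as a system prompt that states the task and enumerates the allowed labels. A minimal sketch (the label set and example text are illustrative):

```python
# Build a zero-shot classification request: no labeled examples,
# just instructions plus the label schema.
LABELS = ["positive", "negative", "neutral"]

def zero_shot_messages(text: str) -> list[dict]:
    system = (
        "You are a sentiment labeler. "
        f"Reply with exactly one of: {', '.join(LABELS)}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

msgs = zero_shot_messages("The checkout flow kept crashing on my phone.")
```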
Few-shot adds a small set (typically 2-5) of labeled examples to the prompt—these can be hard-coded or retrieved automatically with vector search (dynamic few-shot)—and often boosts accuracy by ~5-15%.
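Dynamic few-shot retrieval can be sketched as nearest-neighbor search over embeddings of your labeled pool. The tiny 2-d vectors below are stand-ins for whatever embedding model you actually use:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_examples(query_vec: list[float], pool: list[dict], k: int = 3) -> list[dict]:
    """Pick the k labeled examples most similar to the input text."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex["vec"]), reverse=True)
    return ranked[:k]

# Toy labeled pool; real vectors would come from an embedding model.
pool = [
    {"text": "Love it!", "label": "positive", "vec": [0.9, 0.1]},
    {"text": "Terrible.", "label": "negative", "vec": [0.1, 0.9]},
    {"text": "It's okay.", "label": "neutral", "vec": [0.5, 0.5]},
]
shots = retrieve_examples([0.8, 0.2], pool, k=2)
```

The retrieved examples then get formatted into the prompt ahead of the input to label.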
Examples help most on domain-specific tasks. Model size is the other key lever:
| Model Size | Use Case | Accuracy | Speed | Cost |
|---|---|---|---|---|
| Small (1-3B) | Simple binary classification, high volume | Good | Fast | $ |
| Medium (7-8B) | Multi-class, nuanced sentiment | Better | Medium | $$ |
| Large (70B+) | Complex reasoning, few-shot learning | Best | Slower | $$$ |
There are two good ways to get a confidence score from an LLM: inspect the log probabilities (logprobs) of the predicted label tokens, or have the model output a verbalized confidence alongside its label.
To learn more about using logprobs to assess confidence for classification tasks, check out the OpenAI cookbook: https://cookbook.openai.com/examples/using_logprobs.
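With the Chat Completions API you can pass `logprobs=True` and read each content token's log probability from the response; turning those into a confidence score is just exponentiation. A minimal sketch of the conversion step (it assumes the label spans the returned content tokens):

```python
import math

def label_confidence(token_logprobs: list[float]) -> float:
    """Confidence for a (possibly multi-token) label: the joint
    probability of its tokens, i.e. exp of the summed logprobs."""
    return math.exp(sum(token_logprobs))

# A single-token label with logprob -0.05 is roughly 95% confidence.
conf = label_confidence([-0.05])
```

In a real response you would collect the values from `choice.logprobs.content[i].logprob` and feed them in.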
Once we have a confidence score, we can use it to flag low-confidence results for human review or escalate them to a more powerful model.
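That routing logic is only a few lines. A sketch with illustrative thresholds:

```python
def route(label: str, confidence: float,
          review_threshold: float = 0.6,
          accept_threshold: float = 0.85) -> tuple[str, str]:
    """Accept high-confidence labels, escalate mid-confidence ones to a
    larger model, and queue the rest for human review."""
    if confidence >= accept_threshold:
        return ("accept", label)
    if confidence >= review_threshold:
        return ("escalate_to_large_model", label)
    return ("human_review", label)

decision = route("negative", 0.72)
```

Tune the thresholds against a held-out labeled sample rather than picking them by feel.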
For high-volume jobs, use the Batch API to process requests asynchronously instead of calling the model one input at a time.
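Batch jobs take a JSONL file where each line is one request tagged with a `custom_id` for matching results back. A sketch of building that file (the request shape follows the OpenAI Batch API; the model name and prompt are placeholders):

```python
import json

def build_batch_lines(texts: list[str], model: str = "gpt-4o-mini") -> str:
    """One JSONL line per input; custom_id lets you join results
    back to inputs after the batch completes."""
    lines = []
    for i, text in enumerate(texts):
        lines.append(json.dumps({
            "custom_id": f"item-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [
                    {"role": "system",
                     "content": "Classify sentiment as positive, negative, or neutral."},
                    {"role": "user", "content": text},
                ],
            },
        }))
    return "\n".join(lines)

jsonl = build_batch_lines(["Great app", "It crashes constantly"])
```

You would write this string to a `.jsonl` file, upload it, and create a batch job pointing at the uploaded file.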
| Problem | Cause | Solution |
|---|---|---|
| Inconsistent labels | Vague categories | Use specific, non-overlapping labels or add label descriptions to the prompt |
| Low confidence | Ambiguous input text | Add few-shot examples for edge cases |
| Wrong language | Model defaults to English | Specify the output language in the system prompt |
| Slow responses | Large model for a simple task | Use a smaller model or batch processing |