Zero-shot means you provide no labeled examples—the model relies solely on its pre-training plus the instructions (and optional label names or schema) you include.
Few-shot adds a small set of labeled examples (typically 2-5) to the prompt. These can be hard-coded or retrieved automatically with vector search (dynamic few-shot), and they often boost accuracy by roughly 5-15%.

Zero-Shot Example: Sentiment Analysis

curl https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.2-3b-instruct/fp-16",
    "messages": [
      { "role": "system",
        "content": "You are a sentiment classifier. Reply with JSON only." },
      { "role": "user",
        "content": "The battery lasts forever—love this phone!" }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "sentiment",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "label": {
              "type": "string",
              "enum": ["positive","neutral","negative"]
            },
            "confidence": {
              "type": "number",
              "minimum": 0,
              "maximum": 1
            }
          },
          "required": ["label","confidence"],
          "additionalProperties": false
        }
      }
    }
  }'
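
If you call the endpoint from code, parsing the result is a one-liner once the schema is enforced. Below is a minimal Python sketch of the same request; it assumes the OpenAI-compatible response shape, where choices[0].message.content holds the JSON string the model produced.

import json, os, requests

# Same schema as the curl example above, expressed as a Python dict.
SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["label", "confidence"],
    "additionalProperties": False,
}

resp = requests.post(
    "https://api.inference.net/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['INFERENCE_API_KEY']}"},
    json={
        "model": "meta-llama/llama-3.2-3b-instruct/fp-16",
        "messages": [
            {"role": "system", "content": "You are a sentiment classifier. Reply with JSON only."},
            {"role": "user", "content": "The battery lasts forever—love this phone!"},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "sentiment", "strict": True, "schema": SCHEMA},
        },
    },
    timeout=30,
)

# With strict schema enforcement the content should always be valid JSON.
result = json.loads(resp.json()["choices"][0]["message"]["content"])
print(result["label"], result["confidence"])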

Few-Shot Example: Customer Support Intent

Adding examples dramatically improves accuracy for domain-specific tasks:

curl https://api.inference.net/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "model": "meta-llama/llama-3.2-3b-instruct/fp-16",
  "messages": [
    {
      "role": "system",
      "content": "You are a support ticket classifier."
    },
    {
      "role": "user",
      "content": "Classify customer support requests into intents. Examples:\n\n\"My order hasn't arrived yet\" → billing_issue\n\"How do I reset my password?\" → account_help\n\"The app keeps crashing on iOS\" → technical_support\n\"I want to cancel my subscription\" → billing_issue\n\"Can you explain how the free trial works?\" → product_info\n\nNow classify: \"I was charged twice for the same order\"\nReply with JSON only."
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "intent_classification",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "intent": {
            "type": "string",
            "enum": ["billing_issue", "account_help", "technical_support", "product_info", "other"]
          },
          "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
          "reasoning":  { "type": "string", "maxLength": 100 }
        },
        "required": ["intent", "confidence", "reasoning"],
        "additionalProperties": false
      }
    }
  }
}
JSON
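
The intro mentioned dynamic few-shot, where the examples are retrieved with vector search instead of hard-coded. Here is a minimal sketch of that retrieval step. It assumes you have already computed embedding vectors for your labeled examples and for the incoming ticket (any embedding model works); the helper names and the tiny example pool are illustrative, not part of the Inference.net API.

import numpy as np

# Illustrative labeled pool; in practice this comes from your ticket history.
EXAMPLES = [
    {"text": "How do I reset my password?", "label": "account_help"},
    {"text": "The app keeps crashing on iOS", "label": "technical_support"},
    {"text": "I was charged twice for the same order", "label": "billing_issue"},
]

def top_k(query_vec, example_vecs, k=3):
    """Return indices of the k examples whose embeddings are closest to the query."""
    sims = example_vecs @ query_vec / (
        np.linalg.norm(example_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return np.argsort(-sims)[:k]

def build_prompt(query_text, retrieved):
    """Splice the retrieved examples into the same prompt format used above."""
    shots = "\n".join(f'"{ex["text"]}" → {ex["label"]}' for ex in retrieved)
    return (
        "Classify customer support requests into intents. Examples:\n\n"
        f"{shots}\n\nNow classify: \"{query_text}\"\nReply with JSON only."
    )

# Usage (example_vecs and query_vec are embeddings you computed elsewhere):
# idx = top_k(query_vec, example_vecs)
# prompt = build_prompt("Why was my card declined?", [EXAMPLES[i] for i in idx])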

Model Selection Guide

Model Size | Use Case | Accuracy | Speed | Cost
Small (1-3B) | Simple binary classification, high volume | Good | Fast | $
Medium (7-8B) | Multi-class, nuanced sentiment | Better | Medium | $$
Large (70B+) | Complex reasoning, few-shot learning | Best | Slower | $$$

Advanced Techniques

1. Chain-of-Thought for Complex Cases

prompt = """Classify this email as urgent/normal/low priority.

Think step by step:
1. What is the sender asking for?
2. Are there time-sensitive keywords?
3. What's the business impact?

Email: "Hi, our production API is returning 500 errors for all users since 2pm. Customers can't complete purchases. Please help ASAP!"

Classification:"""

2. Confidence Thresholding

There are two good ways to get a confidence score from an LLM classifier:

  1. Ask the model to return a confidence score and parse it out of the structured-output response. This works well enough for some use cases, but note that LLMs are not well calibrated when asked to self-report confidence, and they are even worse at producing meaningful decimal values.
  2. Use logprobs. Some Inference.net models can return logprobs, the log-probabilities of the tokens the model generated. Exponentiating those values gives a probability you can treat as a confidence score.

To learn more about using logprobs to assess confidence for classification tasks, check out the OpenAI cookbook: https://cookbook.openai.com/examples/using_logprobs.

Once we have a confidence score, we can use it to flag low-confidence results for human review or route them to a more powerful model, as in the sketch below.
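
Here is a minimal sketch of that routing. It assumes the model you call supports logprobs and that the response follows the OpenAI-compatible shape, where choices[0].logprobs.content lists a log-probability per generated token; the 0.8 threshold is a placeholder to tune on a held-out set.

import math, os, requests

def classify_with_confidence(text):
    resp = requests.post(
        "https://api.inference.net/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['INFERENCE_API_KEY']}"},
        json={
            "model": "meta-llama/llama-3.2-3b-instruct/fp-16",
            "messages": [
                {"role": "system",
                 "content": "You are a sentiment classifier. Reply with one word: positive, neutral, or negative."},
                {"role": "user", "content": text},
            ],
            "logprobs": True,
            "max_tokens": 3,
        },
        timeout=30,
    )
    choice = resp.json()["choices"][0]
    label = choice["message"]["content"].strip()
    # Mean per-token probability of the generated label tokens.
    logprobs = [t["logprob"] for t in choice["logprobs"]["content"]]
    confidence = math.exp(sum(logprobs) / len(logprobs))
    return label, confidence

label, confidence = classify_with_confidence("The battery died after a week.")
if confidence < 0.8:   # placeholder threshold, tune on a held-out set
    pass               # flag for human review or escalate to a larger model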

3. Batch Classification

For high-volume workloads, use the Batch API to process large datasets in bulk rather than one request at a time.
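
Assuming the Batch API follows the common OpenAI-style pattern of uploading a JSONL file with one request per line, each line wraps the same chat-completions body shown earlier. The custom_id/method/url layout below is that assumption, so confirm it against the Batch API docs before relying on it.

import json

tickets = [
    "I was charged twice for the same order",
    "How do I reset my password?",
]

# One JSONL line per ticket; the "body" field is the familiar chat-completions payload.
with open("batch_requests.jsonl", "w") as f:
    for i, text in enumerate(tickets):
        line = {
            "custom_id": f"ticket-{i}",          # assumed OpenAI-style batch layout
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "meta-llama/llama-3.2-3b-instruct/fp-16",
                "messages": [
                    {"role": "system", "content": "You are a support ticket classifier."},
                    {"role": "user", "content": f'Classify: "{text}"\nReply with JSON only.'},
                ],
            },
        }
        f.write(json.dumps(line) + "\n")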

Why Zero-Shot Works So Well

  • Pre-trained Knowledge – Models already understand concepts like sentiment, topics, and intent
  • Natural Language Labels – No need for numeric codes; use descriptive names like “frustrated_customer”
  • Context Awareness – Considers surrounding text, not just keywords
  • Robustness – Handles typos, slang, and informal language naturally

Common Pitfalls & Solutions

Problem | Cause | Solution
Inconsistent labels | Vague categories | Use specific, non-overlapping labels or add label descriptions to the prompt
Low confidence | Ambiguous input text | Add few-shot examples for edge cases
Wrong language | Model defaults to English | Specify the target language in the system prompt
Slow responses | Large model for a simple task | Use a smaller model or batch processing