When using Structured Outputs, always include instructions in the system prompt to respond in JSON format. For example: “You are a helpful assistant. Respond in JSON format.”

Introduction

JSON is one of the most widely used formats in the world for applications to exchange data.

Structured Outputs is a feature that ensures the model will always generate responses that adhere to your supplied JSON Schema, so you don’t need to worry about the model omitting a required key, or hallucinating an invalid enum value.

Some benefits of Structured Outputs include:

  1. Reliable type-safety: No need to validate or retry incorrectly formatted responses
  2. Simpler prompting: No need for strongly worded prompts to achieve consistent formatting

Getting Started

You’ll need an Inference.net account and API key to use Structured Outputs. See our Quick Start Guide for instructions on how to create an account and get an API key.

Install the OpenAI SDK for your language of choice. To connect to Inference.net using the OpenAI SDK, you will need to set the base URL to https://api.inference.net/v1. In this example, we are reading the API key from the environment variable INFERENCE_API_KEY.

import os
from openai import OpenAI

openai = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),
)

When to use Structured Outputs

Structured Outputs is a good fit when you want the model’s response to the user to follow a schema that you define.

For example, if you are building a math tutoring application, you might want the assistant to respond to your user using a specific JSON Schema so that you can generate a UI that displays different parts of the model’s output in distinct ways.

Put simply:

  • If you are connecting the model to tools, functions, data, etc. in your system, then you should use function calling
  • If you want to structure the model’s output when it responds to the user, then you should use a structured response_format (both request shapes are sketched below)
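
For illustration, here is a minimal sketch contrasting the two request shapes. It reuses the client setup from Getting Started; the get_weather tool and weather_report schema are hypothetical names invented for this example.

import os
from openai import OpenAI

openai = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),
)

# Function calling: the model invokes a tool that you wire into your system.
tool_completion = openai.chat.completions.create(
    model="mistralai/mistral-nemo-12b-instruct/fp-8",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Fetches the weather in the given location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
                "additionalProperties": False,
            },
        },
    }],
)

# Structured response_format: the model's reply to the user follows your schema.
structured_completion = openai.chat.completions.create(
    model="mistralai/mistral-nemo-12b-instruct/fp-8",
    messages=[{"role": "user", "content": "Describe the weather in Paris. Respond in JSON format."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "weather_report",  # hypothetical schema name
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"summary": {"type": "string"}},
                "required": ["summary"],
                "additionalProperties": False,
            },
        },
    },
)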

Structured Outputs vs JSON mode

Structured Outputs is the evolution of JSON mode. While both ensure valid JSON is produced, only Structured Outputs ensures schema adherence. Both Structured Outputs and JSON mode are supported in the Chat Completions API and Batch API.

We recommend always using Structured Outputs instead of JSON mode when possible.

  • Outputs valid JSON: Yes for both Structured Outputs and JSON mode
  • Adheres to schema: Structured Outputs: Yes (see supported schemas); JSON mode: No
  • Enabling: Structured Outputs with response_format: { type: "json_schema", json_schema: {"strict": true, "schema": ... } }; JSON mode with response_format: { type: "json_object" }

Example

Chain of thought

You can ask the model to output an answer in a structured, step-by-step way, to guide the user through the solution.

import os
import json

from openai import OpenAI

openai = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),
)

completion = openai.chat.completions.create(
    model="mistralai/mistral-nemo-12b-instruct/fp-8",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step. Respond in JSON format."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "math_reasoning",
            "schema": {
                "type": "object",
                "properties": {
                    "steps": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "explanation": {"type": "string"},
                                "output": {"type": "string"}
                            },
                            "required": ["explanation", "output"],
                            "additionalProperties": False
                        }
                    },
                    "final_answer": {"type": "string"}
                },
                "required": ["steps", "final_answer"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
)

math_reasoning = json.loads(completion.choices[0].message.content)

print(json.dumps(math_reasoning, indent=2))

Example response

{
  "steps": [
    {
      "explanation": "Start with the equation 8x + 7 = -23.",
      "output": "8x + 7 = -23"
    },
    {
      "explanation": "Subtract 7 from both sides to isolate the term with the variable.",
      "output": "8x = -23 - 7"
    },
    {
      "explanation": "Simplify the right side of the equation.",
      "output": "8x = -30"
    },
    {
      "explanation": "Divide both sides by 8 to solve for x.",
      "output": "x = -30 / 8"
    },
    {
      "explanation": "Simplify the fraction.",
      "output": "x = -15 / 4"
    }
  ],
  "final_answer": "x = -15 / 4"
}

Defining Structured Outputs Schemas with the OpenAI SDK

In addition to supporting JSON Schema in the REST API, the OpenAI SDK for JavaScript makes it easy to define object schemas using Zod. Below, you can see how to extract information from unstructured text that conforms to a schema defined in code. There is also a Python SDK helper, but it is currently in beta and does not support complex schemas.

import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const openai = new OpenAI({
    baseURL: "https://api.inference.net/v1",
    apiKey: process.env.INFERENCE_API_KEY,
});

const CalendarEvent = z.object({
  name: z.string(),
  date: z.string(),
  participants: z.array(z.string()),
});

const completion = await openai.beta.chat.completions.parse({
  model: "mistralai/mistral-nemo-12b-instruct/fp-8",
  messages: [
    { role: "system", content: "Extract the event information. Respond in JSON format." },
    { role: "user", content: "Alice and Bob are going to a science fair on Friday." },
  ],
  response_format: zodResponseFormat(CalendarEvent, "event"),
});

const event = completion.choices[0].message.parsed;

console.log(event);

Step-by-Step Example: Parsing the Model’s Output

You can also use the OpenAI SDK helper to parse the model’s output into an object of your desired format.

The following examples use OpenAI’s built-in zod helper for more complex schema specification.
The Python SDK helper is currently in beta and does not support complex schemas, so there are no Python snippets for the following examples.

Step 1: Define your object

First you must define an object or data structure to represent the JSON Schema that the model should be constrained to follow.

For example, you can define an object like this:

import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const Step = z.object({
  explanation: z.string(),
  output: z.string(),
});

const MathResponse = z.object({
  steps: z.array(Step),
  final_answer: z.string(),
});

Step 2: Supply your object in the API call

You can use the parse method to automatically parse the JSON response into the object you defined.

Under the hood, the SDK takes care of supplying the JSON schema corresponding to your data structure, and then parsing the response as an object.

import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
// MathResponse and Step are the Zod schemas defined in Step 1

const openai = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

const completion = await openai.beta.chat.completions.parse({
  model: "mistralai/mistral-nemo-12b-instruct/fp-8",
  messages: [
    { role: "system", content: "You are a helpful math tutor. Guide the user through the solution step by step. Respond in JSON format." },
    { role: "user", content: "how can I solve 8x + 7 = -23" },
  ],
  response_format: zodResponseFormat(MathResponse, "math_response"),
});

console.log(completion.choices[0].message.parsed);

Handling edge cases

In some cases, the model might not generate a valid response that matches the provided JSON schema.

This can happen, for example, if you hit the max_tokens limit and the response is truncated.

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
});

try {
  const completion = await openai.chat.completions.create({
    model: "mistralai/mistral-nemo-12b-instruct/fp-8",
    messages: [{
        role: "system",
        content: "You are a helpful math tutor. Guide the user through the solution step by step. Respond in JSON format.",
      },
      {
        role: "user",
        content: "how can I solve 8x + 7 = -23"
      },
    ],
    response_format: {
      type: "json_schema",
      json_schema: {
        name: "math_response",
        schema: {
          type: "object",
          properties: {
            steps: {
              type: "array",
              items: {
                type: "object",
                properties: {
                  explanation: {
                    type: "string"
                  },
                  output: {
                    type: "string"
                  },
                },
                required: ["explanation", "output"],
                additionalProperties: false,
              },
            },
            final_answer: {
              type: "string"
            },
          },
          required: ["steps", "final_answer"],
          additionalProperties: false,
        },
        strict: true,
      },
    },
    max_tokens: 50,
  });

  if (completion.choices[0].finish_reason === "length") {
    // Handle the case where the model did not return a complete response
    throw new Error("Incomplete response");
  }

  const math_response = JSON.parse(completion.choices[0].message.content);
  console.log(math_response);

} catch (e) {
  // Handle edge cases
  console.error(e);
}

Streaming

You can use streaming to process model responses as they are being generated, and parse them as structured data.

That way, you don’t have to wait for the entire response to complete before handling it. This is particularly useful if you would like to display JSON fields one by one, or handle function call arguments as soon as they are available.

Here is how you can stream a model response with the stream helper:

import os
from typing import List
from pydantic import BaseModel
from openai import OpenAI

# The Python SDK helper is currently in beta and does not support complex schemas, but this schema is simple enough.
class EntitiesModel(BaseModel):
    attributes: List[str]
    colors: List[str]
    animals: List[str]

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),  # Use your actual API key
)

with client.beta.chat.completions.stream(
    model="mistralai/mistral-nemo-12b-instruct/fp-8",
    messages=[
        {
            "role": "system",
            "content": "Extract entities from the input text. Respond in JSON format."
        },
        {
            "role": "user",
            "content": "The quick brown fox jumps over the lazy dog with piercing blue eyes",
        },
    ],
    response_format=EntitiesModel,
) as stream:
    for event in stream:
        if event.type == "content.delta":
            if event.parsed is not None:
                # Print the parsed data as JSON
                print("content.delta parsed:", event.parsed)
        elif event.type == "content.done":
            print("content.done")
        elif event.type == "error":
            print("Error in stream:", event.error)

final_completion = stream.get_final_completion()
print("Final completion:", final_completion)

Supported schemas

Structured Outputs supports a subset of the JSON Schema language.

Supported types

The following types are supported for Structured Outputs:

  • String
  • Number
  • Boolean
  • Integer
  • Object
  • Array
  • Enum
  • anyOf
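
For illustration, here is a small, hypothetical schema fragment that combines several of these types: an enum for a fixed set of string values, and anyOf to allow array items of two alternative shapes.

{
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["pending", "active", "closed"]
        },
        "items": {
            "type": "array",
            "items": {
                "anyOf": [
                    {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "quantity": {"type": "integer"}
                        },
                        "required": ["name", "quantity"],
                        "additionalProperties": false
                    },
                    {
                        "type": "object",
                        "properties": {
                            "note": {"type": "string"}
                        },
                        "required": ["note"],
                        "additionalProperties": false
                    }
                ]
            }
        }
    },
    "required": ["status", "items"],
    "additionalProperties": false
}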

Required Fields And Additional Properties

To use Structured Outputs, every property of every object must be listed in required, and additionalProperties must be set to false. In the following example, note how both location and unit are listed as required properties.

{
    "name": "get_weather",
    "description": "Fetches the weather in the given location",
    "strict": true,
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location to get the weather for"
            },
            "unit": {
                "type": "string",
                "description": "The unit to return the temperature in",
                "enum": ["F", "C"]
            }
        },
        "additionalProperties": false,
        "required": ["location", "unit"]
    }
}

Schema Limitations Depend on the Model

Limitations on the number of properties, enum values, and total string size may vary depending on the model you are using.

Key ordering

When using Structured Outputs, the model produces keys in the same order they are defined in your schema.
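
For example, given a hypothetical schema that declares name before date, the generated JSON will emit the keys in that same order:

# Hypothetical schema: "name" is declared before "date"
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
    },
    "required": ["name", "date"],
    "additionalProperties": False,
}

# The model's output preserves the declaration order, e.g.:
# {"name": "science fair", "date": "Friday"}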

JSON mode

When using JSON mode, always instruct the model to produce JSON in the system prompt. For example: “You are a helpful assistant. Respond in JSON format.”

JSON mode is a more basic version of the Structured Outputs feature. While JSON mode ensures that model output is valid JSON, Structured Outputs additionally constrains the output to the schema you specify. We recommend using Structured Outputs whenever it is supported for your use case.

When JSON mode is turned on, the model’s output is guaranteed to be valid JSON, except in some edge cases that you should detect and handle appropriately.

To turn on JSON mode with the Chat Completions or Assistants API you can set the response_format to { "type": "json_object" }.
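
Here is a minimal sketch of a JSON mode request, reusing the client setup from Getting Started:

import os
from openai import OpenAI

openai = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),
)

completion = openai.chat.completions.create(
    model="mistralai/mistral-nemo-12b-instruct/fp-8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. Respond in JSON format."},
        {"role": "user", "content": "List three primary colors."},
    ],
    response_format={"type": "json_object"},
)

print(completion.choices[0].message.content)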

Important notes:

  • When using JSON mode, you must always instruct the model to produce JSON via some message in the conversation, for example via your system message. If you don’t include an explicit instruction to generate JSON, the model may generate non-JSON output or an unending stream of whitespace.
  • JSON mode does not guarantee the output matches any specific schema, only that it is valid JSON and parses without errors. Use Structured Outputs to ensure schema adherence; if that is not possible, use a validation library, and potentially retries, to ensure the output matches your desired schema (see the sketch after this list).
  • Your application must detect and handle the edge cases that can result in the model output not being a complete JSON object (see below).
  • Some models may wrap the JSON response in a triple-backtick code block; this should be detected and stripped before parsing (see the sketch after this list).
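
As a minimal sketch of the last two points, the helper below strips an optional code fence, parses the JSON, and verifies that the expected top-level keys are present. parse_or_none and expected_keys are hypothetical names invented for this example, not part of any SDK; if the helper returns None, your code should retry the request or surface an error.

import json

def parse_or_none(content, expected_keys):
    # Hypothetical helper: strip an optional ```-fenced block, parse, and validate top-level keys
    text = content.strip()
    if text.startswith("```"):
        # Drop the opening fence line (``` or ```json) and the closing fence
        lines = text.splitlines()
        if lines and lines[0].startswith("```"):
            lines = lines[1:]
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # invalid JSON: caller should retry or surface an error
    # JSON mode does not enforce a schema, so check for the keys you expect
    if not isinstance(data, dict) or not all(key in data for key in expected_keys):
        return None
    return data

# Example: a fenced response that still parses after stripping
print(parse_or_none('```json\n{"winner": "Los Angeles Dodgers"}\n```', ["winner"]))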

Handling JSON Mode edge cases

import os
from openai import OpenAI

openai = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),
)

we_did_not_specify_stop_tokens = True

try:
    response = openai.chat.completions.create(
        model="mistralai/mistral-nemo-12b-instruct/fp-8",
        messages=[
            {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
            {"role": "user", "content": "Who won the world series in 2020? Please respond in the format {winner: ...}"}
        ],
        response_format={"type": "json_object"}
    )

    choice = response.choices[0]
    message = choice.message

    # Check if the conversation was too long for the context window, resulting in incomplete JSON
    if choice.finish_reason == "length":
        # your code should handle this error case
        pass

    # Check if the safety system refused the request and generated a refusal instead
    if getattr(message, "refusal", None):
        # your code should handle this error case
        # message.refusal contains the explanation (if any) the model generated for refusing
        print(message.refusal)

    # Check if the model's output included restricted content, so the generation of JSON was halted and may be partial
    if choice.finish_reason == "content_filter":
        # your code should handle this error case
        pass

    if choice.finish_reason == "stop":
        # The model either finished generating the JSON object or emitted one of the stop tokens you provided
        if we_did_not_specify_stop_tokens:
            # With no stop tokens, generation is complete and message.content contains the serialized JSON object,
            # e.g. '{"winner": "Los Angeles Dodgers"}'
            print(message.content)
        else:
            # Check whether message.content ends with one of your stop tokens and handle appropriately
            pass
except Exception as e:
    # Your code should handle errors here, for example a network error calling the API
    print(e)