Support for Function Calling is currently in beta.

Introduction

Function calling provides a powerful and flexible way for OpenAI models to interface with your code or external services, and has two primary use cases:

Fetching Data: Retrieve up-to-date information to incorporate into the model’s response (RAG). Useful for searching knowledge bases and retrieving specific data from APIs (e.g. current weather data).
Taking Action: Perform actions like submitting a form, calling APIs, modifying application state (UI/frontend or backend), or taking agentic workflow actions (like handing off the conversation).

If you only want the model to produce JSON, see our docs on structured outputs.

Getting Started

You’ll need an Inference.net account and API key to use Function Calling. See our Quick Start Guide for instructions on how to create an account and get an API key.

Install the OpenAI SDK for your language of choice. To connect to Inference.net using the OpenAI SDK, you will need to set the base URL to https://api.inference.net/v1. In this example, we are reading the API key from the environment variable INFERENCE_API_KEY.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),
)

Overview

You can extend the capabilities of OpenAI models by giving them access to functions that you define, called tools. This is also known as “function calling”.

While OpenAI’s models have access to built-in tools, our models only support function calling for functions that you define.

With function calling, you’ll tell the model what tools are available to it, and it will decide which one to use.

You’ll then execute the function code, send back the results, and the model will incorporate them into its final response.

Sample function

Let’s look at the steps to allow a model to use a real get_weather function defined below:

import requests

def get_weather(latitude, longitude):
    response = requests.get(f"https://api.open-meteo.com/v1/forecast?latitude={latitude}&longitude={longitude}&current=temperature_2m,wind_speed_10m&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m")
    data = response.json()
    return data['current']['temperature_2m']

The result you send back to the model must be a string. You can format that string as JSON or any other format you like, but the tool message content itself must be a string, which is why the example below wraps the raw value with str().
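
If your function computes a structured result, one convenient pattern is to serialize it to JSON before sending it back. A small sketch; the wrapper name here is ours, not part of any SDK:

import json

def get_weather_result(latitude, longitude):
    # Hypothetical wrapper around get_weather() that returns a JSON string,
    # so the tool message content is always a string.
    temperature = get_weather(latitude, longitude)
    return json.dumps({"temperature_2m": temperature, "unit": "celsius"})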

Note that this function expects precise latitude and longitude coordinates rather than a general location parameter.

Step By Step Example

Step 1: Call model with get_weather tool defined

from openai import OpenAI
import json
import os

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for provided coordinates in celsius.",
        "parameters": {
            "type": "object",
            "properties": {
                "lat": {"type": "number"},
                "lon": {"type": "number"}
            },
            "required": ["lat", "lon"],
            "additionalProperties": False
        },
        "strict": True
    }
}]

messages = [
    {"role": "system", "content": "You are a helpful assistant that can answer questions and uses tools to get information."},
    {"role": "user", "content": "What's the weather like in Paris today?"}
]

completion = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct/fp-16",
    messages=messages,
    tools=tools,
)

Less powerful models may not respond with tool calls reliably, and may omit required parameters. Experiment with system prompts and other models to find the best results.
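
Because a tool call is never guaranteed, it is worth checking the response before indexing into tool_calls. A minimal guard, assuming the completion object from the request above:

message = completion.choices[0].message

if message.tool_calls:
    # The model chose to call a tool; continue to Step 2.
    tool_call = message.tool_calls[0]
else:
    # The model answered directly; use its text response instead.
    print(message.content)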

Step 2: Pull the selected function call from the model’s response

// Located at: completion.choices[0].message.tool_calls
[{
    "id": "call_12345xyz",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"lat\":48.8566,\"lon\":2.3522}"
    }
}]

Step 3: Execute the get_weather function

tool_call = completion.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

result = get_weather(args["lat"], args["lon"])
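
A response can also contain several tool calls at once. One way to handle that is to loop over tool_calls and dispatch by function name; the call_function dispatcher below is our own sketch, not part of the SDK:

def call_function(name, args):
    # Hypothetical dispatcher: route each tool call to the matching Python function.
    if name == "get_weather":
        return get_weather(args["lat"], args["lon"])
    raise ValueError(f"Unknown tool: {name}")

for tool_call in completion.choices[0].message.tool_calls:
    args = json.loads(tool_call.function.arguments)
    result = call_function(tool_call.function.name, args)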

Step 4: Supply result and call model again

messages.append(completion.choices[0].message)  # append model's function call message
messages.append({                               # append result message
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": str(result)
})

completion_2 = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct/fp-16",
    messages=messages,
    tools=tools,
)

Output

// Located at: completion_2.choices[0].message.content
"The current temperature in Paris is 14°C (57.2°F)."

Defining functions

Functions can be set in the tools parameter of each API request.

A function is defined by its schema, which informs the model what it does and what input arguments it expects. It comprises the following fields:

Field        Description
name         The function’s name (e.g. get_weather)
description  Details on when and how to use the function
parameters   JSON schema defining the function’s input arguments

Take a look at this example:

{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Retrieves current weather for the given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                },
                "units": {
                    "type": "string",
                    "enum": [
                        "celsius",
                        "fahrenheit"
                    ],
                    "description": "Units the temperature will be returned in."
                }
            },
            "required": [
                "location",
                "units"
            ],
            "additionalProperties": false
        },
        "strict": true
    }
}

Because the parameters are defined by a JSON schema, you can leverage many of its rich features like property types, enums, and descriptions.
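
For instance, a schema can constrain values with an enum and document numeric ranges in parameter descriptions. A sketch using a hypothetical get_forecast function, written as a Python tool definition like the earlier examples:

tools = [{
    "type": "function",
    "function": {
        "name": "get_forecast",  # hypothetical function for illustration
        "description": "Get a multi-day forecast for the given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                },
                "days": {
                    "type": "integer",
                    "description": "Number of days to forecast, between 1 and 7."
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Units the temperatures will be returned in."
                }
            },
            "required": ["location", "days", "units"],
            "additionalProperties": False
        },
        "strict": True
    }
}]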

SDK Helpers

While you can define function schemas directly, OpenAI’s SDKs include helpers that convert Pydantic models (Python) and Zod schemas (JavaScript) into function schemas.

Not all Pydantic and Zod features are currently supported by Function Calling, but simple, flat schemas are.
Here is an example of using the SDK’s pydantic_function_tool helper to define a schema.

import os
from openai import OpenAI, pydantic_function_tool
from pydantic import BaseModel, Field

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),
)

class GetWeather(BaseModel):
    location: str = Field(
        ...,
        description="City and country e.g. Bogotá, Colombia"
    )

tools = [pydantic_function_tool(GetWeather)]

completion = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct/fp-16",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that can answer questions and help with tasks."},
        {"role": "user", "content": "What's the weather like in Paris today?"}
    ],
    tools=tools
)

print(completion.choices[0].message.tool_calls)
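
To verify what the helper generated, you can print the tool definition before sending it. A quick sanity check, assuming the tools list above and that pydantic_function_tool returns a plain, JSON-serializable dict:

import json

# Inspect the generated function schema before sending it to the API.
print(json.dumps(tools[0], indent=2))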

Best practices for defining functions

  1. Write clear and detailed function names, parameter descriptions, and instructions.

    • Explicitly describe the purpose of the function and each parameter (and its format), and what the output represents.
    • Use the system prompt to describe when (and when not) to use each function. Generally, tell the model exactly what to do.
    • Include examples and edge cases, especially to rectify any recurring failures.
  2. Apply software engineering best practices.

    • Make the functions obvious and intuitive. (principle of least surprise)
    • Use enums and object structure to make invalid states unrepresentable. For example, toggle_light(on: bool, off: bool) allows contradictory calls; see the sketch after this list.
    • Pass the intern test. Can an intern/human correctly use the function given nothing but what you gave the model? (If not, what questions do they ask you? Add the answers to the prompt.)
  3. Offload the burden from the model and use code where possible.

    • Don’t make the model fill in arguments you already know. For example, if you already have an order_id from a previous step, don’t add an order_id parameter; define submit_refund() with no parameters and supply the order_id in code.
    • Combine functions that are always called in sequence. For example, if you always call mark_location() after query_location(), move the marking logic into the query_location() implementation.
  4. Keep the number of functions small for higher accuracy.

    • Evaluate your performance with different numbers of functions.
    • Aim for fewer than 20 functions at any one time, though this is just a soft suggestion.
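
To illustrate point 2, a single enum parameter makes the contradictory on/off combination unrepresentable. A sketch with a hypothetical set_light_state function:

tools = [{
    "type": "function",
    "function": {
        "name": "set_light_state",  # hypothetical function for illustration
        "description": "Turn the light on or off.",
        "parameters": {
            "type": "object",
            "properties": {
                "state": {
                    "type": "string",
                    "enum": ["on", "off"],
                    "description": "The desired light state."
                }
            },
            "required": ["state"],
            "additionalProperties": False
        },
        "strict": True
    }
}]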

Streaming

Streaming lets you surface progress: you can show which function the model is calling as it fills in its arguments, and even display the arguments in real time.

Streaming function calls is very similar to streaming regular responses: you set stream=True and receive chunks containing delta objects.

from openai import OpenAI
import os

client = OpenAI(
    base_url = "https://api.inference.net/v1",
    api_key = os.getenv("INFERENCE_API_KEY"), # Use your actual API key
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                }
            },
            "required": ["location"],
            "additionalProperties": False
        },
        "strict": True
    }
}]

stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct/fp-16",
    messages=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta
    print(delta.tool_calls)

Output of delta.tool_calls:

[{"index": 0, "id": "call_DdmO9pD3xa9XTPNJ32zg2hcA", "function": {"arguments": "", "name": "get_weather"}, "type": "function"}]
[{"index": 0, "id": null, "function": {"arguments": "{\"", "name": null}, "type": null}]
[{"index": 0, "id": null, "function": {"arguments": "location", "name": null}, "type": null}]
[{"index": 0, "id": null, "function": {"arguments": "\":\"", "name": null}, "type": null}]
[{"index": 0, "id": null, "function": {"arguments": "Paris", "name": null}, "type": null}]
[{"index": 0, "id": null, "function": {"arguments": ",", "name": null}, "type": null}]
[{"index": 0, "id": null, "function": {"arguments": " France", "name": null}, "type": null}]
[{"index": 0, "id": null, "function": {"arguments": "\"}", "name": null}, "type": null}]
null

Instead of aggregating chunks into a single content string, however, you aggregate the chunks into a JSON-encoded arguments string.

When the model calls one or more functions, the tool_calls field of each delta will be populated. Each tool_call contains the following fields:

Field     Description
index     Identifies which function call the delta is for
id        Tool call id
function  Function call delta (name and arguments)
type      Type of tool_call (always function for function calls)

Many of these fields are only set for the first delta of each tool call, like id, function.name, and type.

Below is a code snippet demonstrating how to aggregate the delta objects into a final tool_calls object.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.getenv("INFERENCE_API_KEY"),
)

# Accumulate tool call deltas, keyed by index.
final_tool_calls = {}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                }
            },
            "required": ["location"],
            "additionalProperties": False
        },
        "strict": True
    }
}]

stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct/fp-16",
    messages=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True
)

for chunk in stream:
    for tool_call in chunk.choices[0].delta.tool_calls or []:
        index = tool_call.index

        if index not in final_tool_calls:
            # The first delta for a tool call carries id, name, and type.
            final_tool_calls[index] = tool_call
        else:
            # Later deltas only carry argument fragments; append them.
            final_tool_calls[index].function.arguments += tool_call.function.arguments

Accumulated final_tool_calls[0]

{
    "index": 0,
    "id": "call_RzfkBpJgzeR0S242qfvjadNe",
    "function": {
        "name": "get_weather",
        "arguments": "{\"location\":\"Paris, France\"}"
    }
}
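
From here, streaming rejoins the non-streaming flow: parse each accumulated arguments string and execute the function, exactly as in Step 3 above. A short continuation, assuming the final_tool_calls dict built by the loop:

import json

for tool_call in final_tool_calls.values():
    args = json.loads(tool_call.function.arguments)
    # Execute your own implementation here, then send the result
    # back to the model as in Step 4 of the earlier example.
    print(tool_call.function.name, args)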