Route direct Gemini API calls through the Catalyst gateway to get request observability, latency tracking, and persisted request and response payloads. This guide covers Google’s Gemini API at generativelanguage.googleapis.com, not Vertex AI. Use the native Gemini paths when you want Google’s generateContent and streamGenerateContent request format:
| Gemini API operation | Catalyst path |
| --- | --- |
| Non-streaming generation | `/v1beta/models/{model}:generateContent` |
| Streaming generation | `/v1beta/models/{model}:streamGenerateContent` |
The gateway routes these paths to the `gemini` provider by default. You can still set `x-inference-provider: gemini` explicitly to make routing obvious.
Looking for Gemini through Google Cloud Vertex AI instead? Use the Vertex AI guide.

Setup

1. Get your API keys

You need two keys: a Catalyst project API key, sent in the `Authorization` header, and a Gemini API key, which Catalyst forwards to Google as `x-goog-api-key`.

2. Set environment variables

```bash
export INFERENCE_API_KEY=<your-project-api-key>
export GEMINI_API_KEY=<your-gemini-api-key>
export GEMINI_MODEL=gemini-3-flash-preview
```

3. Use the Google Gen AI SDK

The Google Gen AI SDK can point at the Catalyst gateway with `httpOptions.baseUrl`. The SDK sends your Gemini key as `x-goog-api-key`; Catalyst forwards that header downstream and uses `Authorization` for your Catalyst project key.
```ts
import { GoogleGenAI } from "@google/genai";

// The SDK sends GEMINI_API_KEY as x-goog-api-key; the extra headers
// authenticate to Catalyst and tag the request for telemetry.
const ai = new GoogleGenAI({
  apiKey: process.env.GEMINI_API_KEY,
  httpOptions: {
    baseUrl: "https://api.inference.net",
    apiVersion: "v1beta",
    headers: {
      Authorization: `Bearer ${process.env.INFERENCE_API_KEY}`,
      "x-inference-provider": "gemini",
      "x-inference-environment": "production",
      "x-inference-task-id": "gemini-direct",
    },
  },
});

// Non-streaming: /v1beta/models/{model}:generateContent
const response = await ai.models.generateContent({
  model: process.env.GEMINI_MODEL ?? "gemini-3-flash-preview",
  contents: "Reply with exactly OK.",
  config: { maxOutputTokens: 128, temperature: 0 },
});

console.log(response.text);

// Streaming: /v1beta/models/{model}:streamGenerateContent
const stream = await ai.models.generateContentStream({
  model: process.env.GEMINI_MODEL ?? "gemini-3-flash-preview",
  contents: "Reply with exactly OK.",
  config: { maxOutputTokens: 128, temperature: 0 },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? "");
}
```

4. Use cURL for raw Gemini paths

Raw HTTP callers can pass the Gemini key as `x-inference-provider-api-key`. Catalyst converts that to `x-goog-api-key` when forwarding to Gemini.
curl "https://api.inference.net/v1beta/models/${GEMINI_MODEL}:generateContent" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: gemini" \
  -H "x-inference-provider-api-key: ${GEMINI_API_KEY}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: gemini-direct" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 128, "temperature": 0 }
  }'
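
Streaming works the same way over raw HTTP. Against Google, `:streamGenerateContent` is typically called with `?alt=sse` to receive server-sent events; the sketch below assumes Catalyst forwards the query string to Gemini unchanged:

```bash
# Assumes the ?alt=sse query parameter is passed through to Gemini.
# -N disables curl's output buffering so events print as they arrive.
curl -N "https://api.inference.net/v1beta/models/${GEMINI_MODEL}:streamGenerateContent?alt=sse" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: gemini" \
  -H "x-inference-provider-api-key: ${GEMINI_API_KEY}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: gemini-direct" \
  -d '{
    "contents": [
      { "role": "user", "parts": [{ "text": "Reply with exactly OK." }] }
    ],
    "generationConfig": { "maxOutputTokens": 128, "temperature": 0 }
  }'
```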

Headers

| Header | Required | Description |
| --- | --- | --- |
| `Authorization` | Yes | `Bearer <your-project-api-key>` authenticates the request to Catalyst and links telemetry to your project. |
| `x-inference-provider` | No | Set to `gemini` to make routing explicit. Native Gemini paths default to Gemini when omitted. |
| `x-inference-provider-api-key` | Yes for cURL | Your Gemini API key. Catalyst forwards it to Gemini as `x-goog-api-key`. |
| `x-inference-environment` | No | Tags requests with an environment, such as `production` or `staging`. |
| `x-inference-task-id` | No | Groups requests under a logical task for filtering and analytics. |

Supported paths

Catalyst currently supports the direct Gemini generation paths:
- `/v1beta/models/{model}:generateContent`
- `/v1beta/models/{model}:streamGenerateContent`
- `/v1/models/{model}:generateContent`
- `/v1/models/{model}:streamGenerateContent`

Other Gemini API paths should be called against Google directly until the gateway explicitly supports them.
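
If you build these URLs by hand, the four supported paths share one shape; a small helper (hypothetical, not part of any SDK) makes that explicit:

```ts
// Hypothetical helper: assembles a Catalyst URL for the supported Gemini paths.
type GeminiPathOptions = { stream?: boolean; apiVersion?: "v1" | "v1beta" };

function geminiPath(model: string, opts: GeminiPathOptions = {}): string {
  const version = opts.apiVersion ?? "v1beta";
  const method = opts.stream ? "streamGenerateContent" : "generateContent";
  return `https://api.inference.net/${version}/models/${model}:${method}`;
}

// https://api.inference.net/v1beta/models/gemini-3-flash-preview:generateContent
console.log(geminiPath("gemini-3-flash-preview"));
// https://api.inference.net/v1/models/gemini-3-flash-preview:streamGenerateContent
console.log(geminiPath("gemini-3-flash-preview", { stream: true, apiVersion: "v1" }));
```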

OpenAI-compatible endpoint

If you would rather use the OpenAI request format (for example, to reuse an existing OpenAI SDK setup), Gemini exposes an OpenAI-compatible surface at https://generativelanguage.googleapis.com/v1beta/openai. Catalyst can route to it by combining the OpenAI-format path with a provider URL override:
| Header | Value |
| --- | --- |
| `x-inference-provider` | `gemini` |
| `x-inference-provider-url` | `https://generativelanguage.googleapis.com/v1beta/openai` |
| `x-inference-provider-api-key` | Your Gemini API key. Catalyst forwards it as `Authorization: Bearer <key>` because the OpenAI-compat endpoint requires bearer auth. |
```bash
curl "https://api.inference.net/v1/chat/completions" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: gemini" \
  -H "x-inference-provider-url: https://generativelanguage.googleapis.com/v1beta/openai" \
  -H "x-inference-provider-api-key: ${GEMINI_API_KEY}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: gemini-openai-compat" \
  -d '{
    "model": "'"${GEMINI_MODEL}"'",
    "messages": [{ "role": "user", "content": "Reply with exactly OK." }],
    "max_completion_tokens": 32,
    "temperature": 0
  }'
```
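
To reuse an existing OpenAI SDK setup instead of raw cURL, the same override headers can be set once on the client. A minimal sketch, assuming the openai npm package; the SDK builds the `Authorization` bearer from `apiKey`, so pass your Catalyst project key there:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.INFERENCE_API_KEY, // sent as Authorization: Bearer <key> to Catalyst
  baseURL: "https://api.inference.net/v1",
  defaultHeaders: {
    "x-inference-provider": "gemini",
    "x-inference-provider-url": "https://generativelanguage.googleapis.com/v1beta/openai",
    "x-inference-provider-api-key": process.env.GEMINI_API_KEY!, // forwarded to Gemini as bearer auth
    "x-inference-environment": "production",
    "x-inference-task-id": "gemini-openai-compat",
  },
});

const completion = await client.chat.completions.create({
  model: process.env.GEMINI_MODEL ?? "gemini-3-flash-preview",
  messages: [{ role: "user", content: "Reply with exactly OK." }],
  max_completion_tokens: 32,
  temperature: 0,
});

console.log(completion.choices[0]?.message.content);
```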
The native :generateContent paths above remain the recommended surface — they expose Gemini features (system instructions, thinking traces, response schemas, image inputs) that the OpenAI-compat shim does not pass through.
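
For instance, a system instruction paired with a JSON response schema only travels over the native path. A minimal sketch reusing the `ai` client from the Setup section; the schema shape follows the `Type` enum from @google/genai:

```ts
import { Type } from "@google/genai";

// Native-only features: systemInstruction and a structured responseSchema.
const structured = await ai.models.generateContent({
  model: process.env.GEMINI_MODEL ?? "gemini-3-flash-preview",
  contents: "Summarize the current launch status.",
  config: {
    systemInstruction: "You are a terse status reporter.",
    responseMimeType: "application/json",
    responseSchema: {
      type: Type.OBJECT,
      properties: {
        status: { type: Type.STRING },
        ok: { type: Type.BOOLEAN },
      },
      required: ["status", "ok"],
    },
  },
});

console.log(structured.text); // JSON string matching the schema
```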