Route direct Gemini API calls through the Catalyst gateway to get request observability, latency tracking, and persisted request and response payloads. This guide covers Google’s Gemini API at generativelanguage.googleapis.com, not Vertex AI. Use the native Gemini paths when you want Google’s generateContent and streamGenerateContent request format:
| Gemini API operation | Catalyst path |
| --- | --- |
| Non-streaming generation | `/v1beta/models/{model}:generateContent` |
| Streaming generation | `/v1beta/models/{model}:streamGenerateContent` |
The gateway routes these paths to the `gemini` provider by default. You can still set `x-inference-provider: gemini` explicitly to make routing obvious.
Looking for Gemini through Google Cloud Vertex AI instead? Use the Vertex AI guide.

Setup

1. Get your API keys

You need two keys: a Catalyst project API key, sent in the `Authorization` header, and a Gemini API key, which Catalyst forwards to Google as `x-goog-api-key`.

2. Set environment variables

```bash
export INFERENCE_API_KEY=<your-project-api-key>
export GEMINI_API_KEY=<your-gemini-api-key>
export GEMINI_MODEL=gemini-3-flash-preview
```

3. Use the Google Gen AI SDK

The Google Gen AI SDK can point at the Catalyst gateway with `httpOptions.baseUrl`. The SDK sends your Gemini key as `x-goog-api-key`; Catalyst forwards that header downstream and uses `Authorization` for your Catalyst project key.
```ts
import { GoogleGenAI } from "@google/genai";

// The SDK sends GEMINI_API_KEY as x-goog-api-key; the extra headers
// authenticate to Catalyst and tag the request for telemetry.
const ai = new GoogleGenAI({
  apiKey: process.env.GEMINI_API_KEY,
  httpOptions: {
    baseUrl: "https://api.inference.net",
    apiVersion: "v1beta",
    headers: {
      Authorization: `Bearer ${process.env.INFERENCE_API_KEY}`,
      "x-inference-provider": "gemini",
      "x-inference-environment": "production",
      "x-inference-task-id": "gemini-direct",
    },
  },
});

// Non-streaming: /v1beta/models/{model}:generateContent
const response = await ai.models.generateContent({
  model: process.env.GEMINI_MODEL ?? "gemini-3-flash-preview",
  contents: "Reply with exactly OK.",
  config: { maxOutputTokens: 128, temperature: 0 },
});

console.log(response.text);

// Streaming: /v1beta/models/{model}:streamGenerateContent
const stream = await ai.models.generateContentStream({
  model: process.env.GEMINI_MODEL ?? "gemini-3-flash-preview",
  contents: "Reply with exactly OK.",
  config: { maxOutputTokens: 128, temperature: 0 },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? "");
}
```

4. Use cURL for raw Gemini paths

Raw HTTP callers can pass the Gemini key as `x-inference-provider-api-key`. Catalyst converts that to `x-goog-api-key` when forwarding to Gemini.
curl "https://api.inference.net/v1beta/models/${GEMINI_MODEL}:generateContent" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: gemini" \
  -H "x-inference-provider-api-key: ${GEMINI_API_KEY}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: gemini-direct" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 128, "temperature": 0 }
  }'
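
Streaming works the same way over raw HTTP. Against Google, `:streamGenerateContent` is typically called with `?alt=sse` to receive server-sent events; the sketch below assumes Catalyst forwards the query string to Gemini unchanged:

```bash
# Assumes the ?alt=sse query parameter is passed through to Gemini.
# -N disables curl's output buffering so events print as they arrive.
curl -N "https://api.inference.net/v1beta/models/${GEMINI_MODEL}:streamGenerateContent?alt=sse" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: gemini" \
  -H "x-inference-provider-api-key: ${GEMINI_API_KEY}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: gemini-direct" \
  -d '{
    "contents": [
      { "role": "user", "parts": [{ "text": "Reply with exactly OK." }] }
    ],
    "generationConfig": { "maxOutputTokens": 128, "temperature": 0 }
  }'
```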

Headers

| Header | Required | Description |
| --- | --- | --- |
| `Authorization` | Yes | `Bearer <your-project-api-key>` authenticates the request to Catalyst and links telemetry to your project. |
| `x-inference-provider` | No | Set to `gemini` to make routing explicit. Native Gemini paths default to Gemini when omitted. |
| `x-inference-provider-api-key` | Yes for cURL | Your Gemini API key. Catalyst forwards it to Gemini as `x-goog-api-key`. |
| `x-inference-environment` | No | Tags requests with an environment, such as `production` or `staging`. |
| `x-inference-task-id` | No | Groups requests under a logical task for filtering and analytics. |

Supported paths

Catalyst currently supports the direct Gemini generation paths:
- `/v1beta/models/{model}:generateContent`
- `/v1beta/models/{model}:streamGenerateContent`
- `/v1/models/{model}:generateContent`
- `/v1/models/{model}:streamGenerateContent`

Other Gemini API paths should be called against Google directly until the gateway explicitly supports them.
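
If you build these URLs by hand, the four supported paths share one shape; a small helper (hypothetical, not part of any SDK) makes that explicit:

```ts
// Hypothetical helper: assembles a Catalyst URL for the supported Gemini paths.
type GeminiPathOptions = { stream?: boolean; apiVersion?: "v1" | "v1beta" };

function geminiPath(model: string, opts: GeminiPathOptions = {}): string {
  const version = opts.apiVersion ?? "v1beta";
  const method = opts.stream ? "streamGenerateContent" : "generateContent";
  return `https://api.inference.net/${version}/models/${model}:${method}`;
}

// https://api.inference.net/v1beta/models/gemini-3-flash-preview:generateContent
console.log(geminiPath("gemini-3-flash-preview"));
// https://api.inference.net/v1/models/gemini-3-flash-preview:streamGenerateContent
console.log(geminiPath("gemini-3-flash-preview", { stream: true, apiVersion: "v1" }));
```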

OpenAI-compatible endpoint

If you would rather use the OpenAI request format (for example, to reuse an existing OpenAI SDK setup), Gemini exposes an OpenAI-compatible surface at https://generativelanguage.googleapis.com/v1beta/openai. Catalyst can route to it by combining the OpenAI-format path with a provider URL override:
| Header | Value |
| --- | --- |
| `x-inference-provider` | `gemini` |
| `x-inference-provider-url` | `https://generativelanguage.googleapis.com/v1beta/openai` |
| `x-inference-provider-api-key` | Your Gemini API key. Catalyst forwards it as `Authorization: Bearer <key>` because the OpenAI-compat endpoint requires bearer auth. |
```bash
curl "https://api.inference.net/v1/chat/completions" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: gemini" \
  -H "x-inference-provider-url: https://generativelanguage.googleapis.com/v1beta/openai" \
  -H "x-inference-provider-api-key: ${GEMINI_API_KEY}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: gemini-openai-compat" \
  -d '{
    "model": "'"${GEMINI_MODEL}"'",
    "messages": [{ "role": "user", "content": "Reply with exactly OK." }],
    "max_completion_tokens": 32,
    "temperature": 0
  }'
```
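
To reuse an existing OpenAI SDK setup instead of raw cURL, the same override headers can be set once on the client. A minimal sketch, assuming the openai npm package; the SDK builds the `Authorization` bearer from `apiKey`, so pass your Catalyst project key there:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.INFERENCE_API_KEY, // sent as Authorization: Bearer <key> to Catalyst
  baseURL: "https://api.inference.net/v1",
  defaultHeaders: {
    "x-inference-provider": "gemini",
    "x-inference-provider-url": "https://generativelanguage.googleapis.com/v1beta/openai",
    "x-inference-provider-api-key": process.env.GEMINI_API_KEY!, // forwarded to Gemini as bearer auth
    "x-inference-environment": "production",
    "x-inference-task-id": "gemini-openai-compat",
  },
});

const completion = await client.chat.completions.create({
  model: process.env.GEMINI_MODEL ?? "gemini-3-flash-preview",
  messages: [{ role: "user", content: "Reply with exactly OK." }],
  max_completion_tokens: 32,
  temperature: 0,
});

console.log(completion.choices[0]?.message.content);
```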
The native :generateContent paths above remain the recommended surface — they expose Gemini features (system instructions, thinking traces, response schemas, image inputs) that the OpenAI-compat shim does not pass through.
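
For instance, a system instruction paired with a JSON response schema only travels over the native path. A minimal sketch reusing the `ai` client from the Setup section; the schema shape follows the `Type` enum from @google/genai:

```ts
import { Type } from "@google/genai";

// Native-only features: systemInstruction and a structured responseSchema.
const structured = await ai.models.generateContent({
  model: process.env.GEMINI_MODEL ?? "gemini-3-flash-preview",
  contents: "Summarize the current launch status.",
  config: {
    systemInstruction: "You are a terse status reporter.",
    responseMimeType: "application/json",
    responseSchema: {
      type: Type.OBJECT,
      properties: {
        status: { type: Type.STRING },
        ok: { type: Type.BOOLEAN },
      },
      required: ["status", "ok"],
    },
  },
});

console.log(structured.text); // JSON string matching the schema
```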