
Catalyst can proxy Vertex AI in two modes:
  • OpenAI-compatible Vertex endpoint: use the OpenAI SDK and set x-inference-provider-url to your Vertex /endpoints/openapi URL.
  • Native Vertex APIs: call the Vertex model operation path through the Catalyst gateway and set x-inference-provider-url to the global or regional aiplatform.googleapis.com base URL.
Use x-inference-provider: vertex-ai for both modes so Catalyst applies Vertex-specific URL and authentication handling.
This guide is for Gemini and Anthropic through Google Cloud Vertex AI. For direct Gemini API calls using generativelanguage.googleapis.com, see the Google Gemini guide.

Supported Vertex endpoints

API shape | Catalyst path example | Streaming
OpenAI-compatible Gemini | /v1/chat/completions with x-inference-provider-url: .../endpoints/openapi | Yes
Native Gemini | /v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent | No
Native Gemini stream | /v1/projects/{project}/locations/{location}/publishers/google/models/{model}:streamGenerateContent | Yes
Anthropic on Vertex | /v1/projects/{project}/locations/{location}/publishers/anthropic/models/{model}:rawPredict | No
Anthropic on Vertex stream | /v1/projects/{project}/locations/{location}/publishers/anthropic/models/{model}:streamRawPredict | Yes
Native Gemini with API key | /v1/publishers/google/models/{model}:generateContent with x-inference-provider-api-key: $GEMINI_VERTEX_API_KEY | No

Environment

export INFERENCE_API_KEY="<your-catalyst-project-api-key>"
export GOOGLE_CLOUD_PROJECT="<your-gcp-project-id>"
export VERTEX_LOCATION="global"

# OAuth access token for OpenAI-compatible Vertex, native Gemini with OAuth, and Anthropic-on-Vertex.
export VERTEX_AI_ACCESS_TOKEN="$(gcloud auth print-access-token)"

# Optional for native Gemini on Vertex. Google API keys are forwarded as ?key=...
export GEMINI_VERTEX_API_KEY="<your-google-api-key>"

For regional Vertex endpoints, use the regional host:

export VERTEX_BASE_URL="https://${VERTEX_LOCATION}-aiplatform.googleapis.com"

For the global location, use:

export VERTEX_BASE_URL="https://aiplatform.googleapis.com"

OpenAI-compatible Vertex

Use this path when you want to keep the OpenAI SDK shape for Vertex Gemini models.

TypeScript

import OpenAI from "openai";

const projectId = process.env.GOOGLE_CLOUD_PROJECT!;
const location = process.env.VERTEX_LOCATION ?? "global";
const vertexHost =
  location === "global"
    ? "https://aiplatform.googleapis.com"
    : `https://${location}-aiplatform.googleapis.com`;

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
  defaultHeaders: {
    "x-inference-provider": "vertex-ai",
    "x-inference-provider-api-key": process.env.VERTEX_AI_ACCESS_TOKEN!,
    "x-inference-provider-url": `${vertexHost}/v1/projects/${projectId}/locations/${location}/endpoints/openapi`,
    "x-inference-environment": "production",
  },
});

const response = await client.chat.completions.create(
  {
    model: "google/gemini-2.0-flash-001",
    messages: [{ role: "user", content: "Reply with exactly OK." }],
    max_tokens: 32,
  },
  {
    headers: { "x-inference-task-id": "vertex-openai-compatible" },
  },
);

Python

import os
from openai import OpenAI

project_id = os.environ["GOOGLE_CLOUD_PROJECT"]
location = os.getenv("VERTEX_LOCATION", "global")
vertex_host = (
    "https://aiplatform.googleapis.com"
    if location == "global"
    else f"https://{location}-aiplatform.googleapis.com"
)

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ["INFERENCE_API_KEY"],
    default_headers={
        "x-inference-provider": "vertex-ai",
        "x-inference-provider-api-key": os.environ["VERTEX_AI_ACCESS_TOKEN"],
        "x-inference-provider-url": f"{vertex_host}/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
        "x-inference-environment": "production",
    },
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Reply with exactly OK."}],
    max_tokens=32,
    extra_headers={"x-inference-task-id": "vertex-openai-compatible"},
)

Native Gemini on Vertex

Use native Gemini paths when you need Vertex’s generateContent or streamGenerateContent request format.
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/google/models/gemini-2.0-flash-001:generateContent" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-native-gemini" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 64, "temperature": 0 }
  }'
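
If you are calling native generateContent from application code rather than curl, here is a minimal TypeScript fetch sketch mirroring the request above; the environment variable names follow the Environment section, and the response parsing assumes Vertex's standard generateContent response shape:

const projectId = process.env.GOOGLE_CLOUD_PROJECT!;
const location = process.env.VERTEX_LOCATION ?? "global";

const res = await fetch(
  `https://api.inference.net/v1/projects/${projectId}/locations/${location}/publishers/google/models/gemini-2.0-flash-001:generateContent`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.INFERENCE_API_KEY}`,
      "Content-Type": "application/json",
      "x-inference-provider": "vertex-ai",
      "x-inference-provider-api-key": process.env.VERTEX_AI_ACCESS_TOKEN!,
      "x-inference-provider-url": process.env.VERTEX_BASE_URL!,
      "x-inference-environment": "production",
      "x-inference-task-id": "vertex-native-gemini",
    },
    body: JSON.stringify({
      contents: [{ role: "user", parts: [{ text: "Reply with exactly OK." }] }],
      generationConfig: { maxOutputTokens: 64, temperature: 0 },
    }),
  },
);

// Vertex returns candidates with content parts; print the first text part.
const data = await res.json();
console.log(data.candidates?.[0]?.content?.parts?.[0]?.text);
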
If you are using a Google API key instead of an OAuth access token for Gemini, pass GEMINI_VERTEX_API_KEY as x-inference-provider-api-key. Catalyst forwards Google API keys to Vertex as the key query parameter.
curl "https://api.inference.net/v1/publishers/google/models/gemini-2.0-flash-001:generateContent" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${GEMINI_VERTEX_API_KEY}" \
  -H "x-inference-provider-url: https://aiplatform.googleapis.com" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-native-gemini-api-key" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 64, "temperature": 0 }
  }'
For Gemini streaming, use :streamGenerateContent. Add ?alt=sse if you want Vertex to return server-sent events.
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/google/models/gemini-2.0-flash-001:streamGenerateContent?alt=sse" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-native-gemini-stream" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 64, "temperature": 0 }
  }'
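
To consume the ?alt=sse stream from application code, a sketch of the parsing loop in TypeScript; it assumes res is a fetch Response obtained like the generateContent sketch above but against the :streamGenerateContent?alt=sse URL, and that each data: line carries one JSON chunk, which is how Vertex frames ?alt=sse output:

// `res` is assumed to be a fetch Response for the streaming URL above,
// sent with the same headers and body as the generateContent sketch.
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Keep any partial line in the buffer and parse completed "data:" lines.
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? "";
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const chunk = JSON.parse(line.slice("data: ".length));
    process.stdout.write(chunk.candidates?.[0]?.content?.parts?.[0]?.text ?? "");
  }
}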

Anthropic on Vertex

Anthropic models on Vertex use Vertex operation paths and Anthropic’s Vertex payload shape. Use a Google Cloud OAuth access token (user or service-account-minted) as x-inference-provider-api-key; a sketch for minting one in application code appears at the end of this section.
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/anthropic/models/claude-sonnet-4-5:rawPredict" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-anthropic" \
  -d '{
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Reply with exactly OK."
      }
    ]
  }'
For streaming Anthropic responses on Vertex, use :streamRawPredict and include "stream": true in the body.
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/anthropic/models/claude-sonnet-4-5:streamRawPredict" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-anthropic-stream" \
  -d '{
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 512,
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": "Reply with exactly OK."
      }
    ]
  }'
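
If you prefer to mint the access token in application code instead of shelling out to gcloud, a minimal TypeScript sketch using google-auth-library with Application Default Credentials; the package and scope are standard Google Cloud choices rather than anything Catalyst mandates:

import { GoogleAuth } from "google-auth-library";

// Mint a short-lived OAuth access token from Application Default Credentials
// (a user login or a service account) with the Cloud Platform scope.
const auth = new GoogleAuth({
  scopes: ["https://www.googleapis.com/auth/cloud-platform"],
});
const accessToken = await auth.getAccessToken();

// Use `accessToken` as the x-inference-provider-api-key header in the
// requests above, in place of $VERTEX_AI_ACCESS_TOKEN.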

Header summary

Header | Required | Notes
Authorization | Yes | Your Catalyst project API key.
x-inference-provider | Yes | Set to vertex-ai.
x-inference-provider-api-key | Yes | Google API key for native Gemini, or OAuth2 access token for Vertex.
x-inference-provider-url | Yes | Vertex host or /endpoints/openapi URL, depending on the API shape.
x-inference-environment | No | Dashboard environment tag.
x-inference-task-id | No | Dashboard task grouping.