
Catalyst can proxy Vertex AI in two modes:
  • OpenAI-compatible Vertex endpoint: use the OpenAI SDK and set x-inference-provider-url to your Vertex /endpoints/openapi URL.
  • Native Vertex APIs: call the Vertex model operation path through the Catalyst gateway and set x-inference-provider-url to the global or regional aiplatform.googleapis.com base URL.
Use x-inference-provider: vertex-ai for both modes so Catalyst applies Vertex-specific URL and authentication handling.
This guide is for Gemini and Anthropic through Google Cloud Vertex AI. For direct Gemini API calls using generativelanguage.googleapis.com, see the Google Gemini guide.

Supported Vertex endpoints

API shape | Catalyst path example | Streaming
OpenAI-compatible Gemini | /v1/chat/completions with x-inference-provider-url: .../endpoints/openapi | Yes
Native Gemini | /v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent | No
Native Gemini stream | /v1/projects/{project}/locations/{location}/publishers/google/models/{model}:streamGenerateContent | Yes
Anthropic on Vertex | /v1/projects/{project}/locations/{location}/publishers/anthropic/models/{model}:rawPredict | No
Anthropic on Vertex stream | /v1/projects/{project}/locations/{location}/publishers/anthropic/models/{model}:streamRawPredict | Yes
Native Gemini with API key | /v1/publishers/google/models/{model}:generateContent with x-inference-provider-api-key: $GEMINI_VERTEX_API_KEY | No

Environment

export INFERENCE_API_KEY="<your-catalyst-project-api-key>"
export GOOGLE_CLOUD_PROJECT="<your-gcp-project-id>"
export VERTEX_LOCATION="global"

# OAuth access token for OpenAI-compatible Vertex, native Gemini with OAuth, and Anthropic-on-Vertex.
export VERTEX_AI_ACCESS_TOKEN="$(gcloud auth print-access-token)"

# Optional for native Gemini on Vertex. Google API keys are forwarded as ?key=...
export GEMINI_VERTEX_API_KEY="<your-google-api-key>"

For regional Vertex endpoints, use the regional host:

export VERTEX_BASE_URL="https://${VERTEX_LOCATION}-aiplatform.googleapis.com"

For the global location, use:

export VERTEX_BASE_URL="https://aiplatform.googleapis.com"

OpenAI-compatible Vertex

Use this path when you want to keep the OpenAI SDK shape for Vertex Gemini models.

TypeScript

import OpenAI from "openai";

const projectId = process.env.GOOGLE_CLOUD_PROJECT!;
const location = process.env.VERTEX_LOCATION ?? "global";
const vertexHost =
  location === "global"
    ? "https://aiplatform.googleapis.com"
    : `https://${location}-aiplatform.googleapis.com`;

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
  defaultHeaders: {
    "x-inference-provider": "vertex-ai",
    "x-inference-provider-api-key": process.env.VERTEX_AI_ACCESS_TOKEN!,
    "x-inference-provider-url": `${vertexHost}/v1/projects/${projectId}/locations/${location}/endpoints/openapi`,
    "x-inference-environment": "production",
  },
});

const response = await client.chat.completions.create(
  {
    model: "google/gemini-2.0-flash-001",
    messages: [{ role: "user", content: "Reply with exactly OK." }],
    max_tokens: 32,
  },
  {
    headers: { "x-inference-task-id": "vertex-openai-compatible" },
  },
);

Python

import os
from openai import OpenAI

project_id = os.environ["GOOGLE_CLOUD_PROJECT"]
location = os.getenv("VERTEX_LOCATION", "global")
vertex_host = (
    "https://aiplatform.googleapis.com"
    if location == "global"
    else f"https://{location}-aiplatform.googleapis.com"
)

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ["INFERENCE_API_KEY"],
    default_headers={
        "x-inference-provider": "vertex-ai",
        "x-inference-provider-api-key": os.environ["VERTEX_AI_ACCESS_TOKEN"],
        "x-inference-provider-url": f"{vertex_host}/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
        "x-inference-environment": "production",
    },
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Reply with exactly OK."}],
    max_tokens=32,
    extra_headers={"x-inference-task-id": "vertex-openai-compatible"},
)

Native Gemini on Vertex

Use native Gemini paths when you need Vertex’s generateContent or streamGenerateContent request format.
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/google/models/gemini-2.0-flash-001:generateContent" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-native-gemini" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 64, "temperature": 0 }
  }'
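
If you are calling native generateContent from application code rather than curl, here is a minimal TypeScript fetch sketch mirroring the request above; the environment variable names follow the Environment section, and the response parsing assumes Vertex's standard generateContent response shape:

const projectId = process.env.GOOGLE_CLOUD_PROJECT!;
const location = process.env.VERTEX_LOCATION ?? "global";

const res = await fetch(
  `https://api.inference.net/v1/projects/${projectId}/locations/${location}/publishers/google/models/gemini-2.0-flash-001:generateContent`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.INFERENCE_API_KEY}`,
      "Content-Type": "application/json",
      "x-inference-provider": "vertex-ai",
      "x-inference-provider-api-key": process.env.VERTEX_AI_ACCESS_TOKEN!,
      "x-inference-provider-url": process.env.VERTEX_BASE_URL!,
      "x-inference-environment": "production",
      "x-inference-task-id": "vertex-native-gemini",
    },
    body: JSON.stringify({
      contents: [{ role: "user", parts: [{ text: "Reply with exactly OK." }] }],
      generationConfig: { maxOutputTokens: 64, temperature: 0 },
    }),
  },
);

// Vertex returns candidates with content parts; print the first text part.
const data = await res.json();
console.log(data.candidates?.[0]?.content?.parts?.[0]?.text);
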
If you are using a Google API key instead of an OAuth access token for Gemini, pass GEMINI_VERTEX_API_KEY as x-inference-provider-api-key. Catalyst forwards Google API keys to Vertex as the key query parameter.
curl "https://api.inference.net/v1/publishers/google/models/gemini-2.0-flash-001:generateContent" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${GEMINI_VERTEX_API_KEY}" \
  -H "x-inference-provider-url: https://aiplatform.googleapis.com" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-native-gemini-api-key" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 64, "temperature": 0 }
  }'
For Gemini streaming, use :streamGenerateContent. Add ?alt=sse if you want Vertex to return server-sent events.
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/google/models/gemini-2.0-flash-001:streamGenerateContent?alt=sse" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-native-gemini-stream" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 64, "temperature": 0 }
  }'
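
To consume the ?alt=sse stream from application code, a sketch of the parsing loop in TypeScript; it assumes res is a fetch Response obtained like the generateContent sketch above but against the :streamGenerateContent?alt=sse URL, and that each data: line carries one JSON chunk, which is how Vertex frames ?alt=sse output:

// `res` is assumed to be a fetch Response for the streaming URL above,
// sent with the same headers and body as the generateContent sketch.
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Keep any partial line in the buffer and parse completed "data:" lines.
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? "";
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const chunk = JSON.parse(line.slice("data: ".length));
    process.stdout.write(chunk.candidates?.[0]?.content?.parts?.[0]?.text ?? "");
  }
}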

Anthropic on Vertex

Anthropic models on Vertex use Vertex operation paths and Anthropic’s Vertex payload shape. Use a Google Cloud OAuth access token (user or service-account-minted) as x-inference-provider-api-key; a sketch for minting one in application code appears at the end of this section.
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/anthropic/models/claude-sonnet-4-5:rawPredict" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-anthropic" \
  -d '{
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Reply with exactly OK."
      }
    ]
  }'
For streaming Anthropic responses on Vertex, use :streamRawPredict and include "stream": true in the body.
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/anthropic/models/claude-sonnet-4-5:streamRawPredict" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-anthropic-stream" \
  -d '{
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 512,
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": "Reply with exactly OK."
      }
    ]
  }'
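
If you prefer to mint the access token in application code instead of shelling out to gcloud, a minimal TypeScript sketch using google-auth-library with Application Default Credentials; the package and scope are standard Google Cloud choices rather than anything Catalyst mandates:

import { GoogleAuth } from "google-auth-library";

// Mint a short-lived OAuth access token from Application Default Credentials
// (a user login or a service account) with the Cloud Platform scope.
const auth = new GoogleAuth({
  scopes: ["https://www.googleapis.com/auth/cloud-platform"],
});
const accessToken = await auth.getAccessToken();

// Use `accessToken` as the x-inference-provider-api-key header in the
// requests above, in place of $VERTEX_AI_ACCESS_TOKEN.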

Header summary

Header | Required | Notes
Authorization | Yes | Your Catalyst project API key.
x-inference-provider | Yes | Set to vertex-ai.
x-inference-provider-api-key | Yes | Google API key for native Gemini, or OAuth2 access token for Vertex.
x-inference-provider-url | Yes | Vertex host or /endpoints/openapi URL, depending on the API shape.
x-inference-environment | No | Dashboard environment tag.
x-inference-task-id | No | Dashboard task grouping.