Catalyst can proxy Vertex AI in two modes:
- OpenAI-compatible Vertex endpoint: use the OpenAI SDK and set
x-inference-provider-url to your Vertex /endpoints/openapi URL.
- Native Vertex APIs: call the Vertex model operation path through the Catalyst gateway and set
x-inference-provider-url to the global or regional aiplatform.googleapis.com base URL.
Use x-inference-provider: vertex-ai for both modes so Catalyst applies Vertex-specific URL and authentication handling.
This guide covers Gemini and Anthropic models served through Google Cloud Vertex AI. For direct Gemini API calls against generativelanguage.googleapis.com, see the Google Gemini guide.
Supported Vertex endpoints
| API shape | Catalyst path example | Streaming |
| --- | --- | --- |
| OpenAI-compatible Gemini | /v1/chat/completions with x-inference-provider-url: .../endpoints/openapi | Yes |
| Native Gemini | /v1/projects/{project}/locations/{location}/publishers/google/models/{model}:generateContent | No |
| Native Gemini stream | /v1/projects/{project}/locations/{location}/publishers/google/models/{model}:streamGenerateContent | Yes |
| Anthropic on Vertex | /v1/projects/{project}/locations/{location}/publishers/anthropic/models/{model}:rawPredict | No |
| Anthropic on Vertex stream | /v1/projects/{project}/locations/{location}/publishers/anthropic/models/{model}:streamRawPredict | Yes |
| Native Gemini with API key | /v1/publishers/google/models/{model}:generateContent with x-inference-provider-api-key: $GEMINI_VERTEX_API_KEY | No |
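The project-scoped rows above share one path template. A hypothetical Python helper (the function name is ours, not part of Catalyst) makes the pieces explicit:

```python
def vertex_operation_path(project: str, location: str, publisher: str,
                          model: str, operation: str) -> str:
    """Build the Catalyst path for a native Vertex model operation."""
    return (
        f"/v1/projects/{project}/locations/{location}"
        f"/publishers/{publisher}/models/{model}:{operation}"
    )


# Matches the "Native Gemini stream" row above.
print(vertex_operation_path("my-project", "global", "google",
                            "gemini-2.0-flash-001", "streamGenerateContent"))
```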
Environment
```bash
export INFERENCE_API_KEY="<your-catalyst-project-api-key>"
export GOOGLE_CLOUD_PROJECT="<your-gcp-project-id>"
export VERTEX_LOCATION="global"

# For OpenAI-compatible Vertex and Anthropic-on-Vertex.
export VERTEX_AI_ACCESS_TOKEN="$(gcloud auth print-access-token)"

# Optional for native Gemini on Vertex. Google API keys are forwarded as ?key=...
export GEMINI_VERTEX_API_KEY="<your-google-api-key>"
```
For regional Vertex endpoints, use the regional host:

```bash
export VERTEX_BASE_URL="https://${VERTEX_LOCATION}-aiplatform.googleapis.com"
```

For the global location, use:

```bash
export VERTEX_BASE_URL="https://aiplatform.googleapis.com"
```
OpenAI-compatible Vertex
Use this path when you want to keep the OpenAI SDK shape for Vertex Gemini models.
```typescript
import OpenAI from "openai";

const projectId = process.env.GOOGLE_CLOUD_PROJECT!;
const location = process.env.VERTEX_LOCATION ?? "global";
const vertexHost =
  location === "global"
    ? "https://aiplatform.googleapis.com"
    : `https://${location}-aiplatform.googleapis.com`;

const client = new OpenAI({
  baseURL: "https://api.inference.net/v1",
  apiKey: process.env.INFERENCE_API_KEY,
  defaultHeaders: {
    "x-inference-provider": "vertex-ai",
    "x-inference-provider-api-key": process.env.VERTEX_AI_ACCESS_TOKEN!,
    "x-inference-provider-url": `${vertexHost}/v1/projects/${projectId}/locations/${location}/endpoints/openapi`,
    "x-inference-environment": "production",
  },
});

const response = await client.chat.completions.create(
  {
    model: "google/gemini-2.0-flash-001",
    messages: [{ role: "user", content: "Reply with exactly OK." }],
    max_tokens: 32,
  },
  {
    headers: { "x-inference-task-id": "vertex-openai-compatible" },
  },
);
```
```python
import os

from openai import OpenAI

project_id = os.environ["GOOGLE_CLOUD_PROJECT"]
location = os.getenv("VERTEX_LOCATION", "global")
vertex_host = (
    "https://aiplatform.googleapis.com"
    if location == "global"
    else f"https://{location}-aiplatform.googleapis.com"
)

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key=os.environ["INFERENCE_API_KEY"],
    default_headers={
        "x-inference-provider": "vertex-ai",
        "x-inference-provider-api-key": os.environ["VERTEX_AI_ACCESS_TOKEN"],
        "x-inference-provider-url": f"{vertex_host}/v1/projects/{project_id}/locations/{location}/endpoints/openapi",
        "x-inference-environment": "production",
    },
)

response = client.chat.completions.create(
    model="google/gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Reply with exactly OK."}],
    max_tokens=32,
    extra_headers={"x-inference-task-id": "vertex-openai-compatible"},
)
```
Native Gemini on Vertex
Use native Gemini paths when you need Vertex’s generateContent or streamGenerateContent request format.
```bash
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/google/models/gemini-2.0-flash-001:generateContent" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-native-gemini" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 64, "temperature": 0 }
  }'
```
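The same request can be assembled from Python. This is a sketch using only the standard library; the helper names are ours, not part of Catalyst, and the network call is kept in a separate function so the builder can be inspected without credentials:

```python
import json
import os
import urllib.request


def build_generate_content_request(project, location, base_url,
                                   inference_key, vertex_token, prompt):
    """Assemble (url, headers, body) mirroring the curl example above."""
    url = (
        "https://api.inference.net/v1"
        f"/projects/{project}/locations/{location}"
        "/publishers/google/models/gemini-2.0-flash-001:generateContent"
    )
    headers = {
        "Authorization": f"Bearer {inference_key}",
        "Content-Type": "application/json",
        "x-inference-provider": "vertex-ai",
        "x-inference-provider-api-key": vertex_token,
        "x-inference-provider-url": base_url,
    }
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": 64, "temperature": 0},
    }
    return url, headers, body


def generate_content(prompt):
    """Send the request through Catalyst; needs the env vars from Environment."""
    url, headers, body = build_generate_content_request(
        os.environ["GOOGLE_CLOUD_PROJECT"],
        os.environ.get("VERTEX_LOCATION", "global"),
        os.environ["VERTEX_BASE_URL"],
        os.environ["INFERENCE_API_KEY"],
        os.environ["VERTEX_AI_ACCESS_TOKEN"],
        prompt,
    )
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (requires credentials): generate_content("Reply with exactly OK.")
```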
If you are using a Google API key instead of an OAuth access token for Gemini, pass GEMINI_VERTEX_API_KEY as x-inference-provider-api-key. Catalyst forwards Google API keys to Vertex as the key query parameter.
```bash
curl "https://api.inference.net/v1/publishers/google/models/gemini-2.0-flash-001:generateContent" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${GEMINI_VERTEX_API_KEY}" \
  -H "x-inference-provider-url: https://aiplatform.googleapis.com" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-native-gemini-api-key" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 64, "temperature": 0 }
  }'
```
For Gemini streaming, use :streamGenerateContent. Add ?alt=sse if you want Vertex to return server-sent events.
```bash
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/google/models/gemini-2.0-flash-001:streamGenerateContent?alt=sse" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-native-gemini-stream" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{ "text": "Reply with exactly OK." }]
      }
    ],
    "generationConfig": { "maxOutputTokens": 64, "temperature": 0 }
  }'
```
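With ?alt=sse set, each chunk arrives as a standard server-sent-events "data:" line. A minimal Python sketch of extracting the JSON payloads, assuming standard SSE framing (the helper name and sample payload are illustrative):

```python
import json


def parse_sse_data_lines(lines):
    """Yield parsed JSON objects from SSE 'data:' lines.

    Skips comments, blank keep-alive lines, and terminal [DONE] markers.
    Expects decoded text lines (decode bytes from the HTTP stream first).
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload and payload != "[DONE]":
            yield json.loads(payload)


# Illustrative chunk shape only; real responses carry more fields.
sample = [
    'data: {"candidates":[{"content":{"parts":[{"text":"OK"}]}}]}',
    "",  # blank line separates SSE events
]
chunks = list(parse_sse_data_lines(sample))
print(chunks[0]["candidates"][0]["content"]["parts"][0]["text"])  # OK
```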
Anthropic on Vertex
Anthropic models on Vertex use Vertex operation paths with Anthropic's Vertex payload shape (the model goes in the URL, and the body carries anthropic_version instead of a model field). Pass a Google Cloud OAuth 2.0 access token, from user credentials or minted for a service account, as x-inference-provider-api-key.
```bash
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/anthropic/models/claude-sonnet-4-5:rawPredict" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-anthropic" \
  -d '{
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Reply with exactly OK."
      }
    ]
  }'
```
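The rawPredict response follows the Anthropic Messages shape, where content is a list of typed blocks rather than a single string. A small helper for pulling out the text, assuming that standard shape (the function name is ours):

```python
def anthropic_text(response: dict) -> str:
    """Concatenate the text blocks from an Anthropic Messages response."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Illustrative response fragment only.
sample = {"content": [{"type": "text", "text": "OK"}], "stop_reason": "end_turn"}
print(anthropic_text(sample))  # OK
```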
For streaming Anthropic responses on Vertex, use :streamRawPredict and include "stream": true in the body.
```bash
curl "https://api.inference.net/v1/projects/${GOOGLE_CLOUD_PROJECT}/locations/${VERTEX_LOCATION}/publishers/anthropic/models/claude-sonnet-4-5:streamRawPredict" \
  -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
  -H "Content-Type: application/json" \
  -H "x-inference-provider: vertex-ai" \
  -H "x-inference-provider-api-key: ${VERTEX_AI_ACCESS_TOKEN}" \
  -H "x-inference-provider-url: ${VERTEX_BASE_URL}" \
  -H "x-inference-environment: production" \
  -H "x-inference-task-id: vertex-anthropic-stream" \
  -d '{
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 512,
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": "Reply with exactly OK."
      }
    ]
  }'
```
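Streamed Anthropic output arrives as typed events; the text itself comes in content_block_delta events carrying text_delta fragments. A sketch of reassembling the reply, assuming the standard Anthropic Messages streaming event shapes (the helper name and sample events are illustrative):

```python
def accumulate_deltas(events):
    """Join the text fragments from parsed Anthropic streaming events."""
    parts = []
    for event in events:
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)


# Illustrative event sequence only; real streams include more event types.
sample = [
    {"type": "message_start"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "O"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "K"}},
    {"type": "message_stop"},
]
print(accumulate_deltas(sample))  # OK
```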
| Header | Required | Notes |
| --- | --- | --- |
| Authorization | Yes | Bearer token carrying your Catalyst project API key. |
| x-inference-provider | Yes | Set to vertex-ai for both modes. |
| x-inference-provider-api-key | Yes | OAuth 2.0 access token for OpenAI-compatible Vertex and Anthropic on Vertex; Google API key (or access token) for native Gemini. |
| x-inference-provider-url | Yes | Vertex host or /endpoints/openapi URL, depending on the API shape. |
| x-inference-environment | No | Dashboard environment tag. |
| x-inference-task-id | No | Dashboard task grouping. |
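The header set is the same across every example above, so it can be centralized. A hypothetical convenience helper (not part of Catalyst) that assembles the headers from this table:

```python
def catalyst_vertex_headers(inference_key, provider_key, provider_url,
                            environment=None, task_id=None):
    """Build the Catalyst header dict for a Vertex-backed request.

    provider_key is an OAuth access token or, for native Gemini, a Google
    API key; provider_url is the Vertex host or /endpoints/openapi URL.
    """
    headers = {
        "Authorization": f"Bearer {inference_key}",
        "x-inference-provider": "vertex-ai",
        "x-inference-provider-api-key": provider_key,
        "x-inference-provider-url": provider_url,
    }
    # The dashboard headers are optional and only added when supplied.
    if environment:
        headers["x-inference-environment"] = environment
    if task_id:
        headers["x-inference-task-id"] = task_id
    return headers
```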