Deploy any supported Hugging Face model on Inference.net without managing your own infrastructure. This guide walks through selecting a model, creating a deployment, and verifying the endpoint.

What you’ll have when you finish

  • one dedicated deployment serving a Hugging Face model
  • a public model identifier you can use with the standard API shape
  • a verified smoke test confirming the model is responding

Before you start

  • an Inference.net account with deployment access
  • the Hugging Face model ID you want to deploy (e.g., meta-llama/Llama-3.2-1B-Instruct)

Step 1: Choose the model

Browse the model catalog or use a Hugging Face model ID directly when creating a deployment.

Step 2: Create the deployment

Follow the same deployment creation flow described in the Deployment guide. The key difference is that you specify a Hugging Face model ID instead of a reference to a model you trained.
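To make the distinction concrete, here is a purely illustrative sketch; the field name below is hypothetical and not the real deployment API schema (see the Deployment guide for the actual request shape):

```python
# Hypothetical deployment specs -- "model_ref" is an illustrative field
# name, not the actual Inference.net deployment API schema.
trained_model_deployment = {"model_ref": "trained:my-finetune-v2"}

# A Hugging Face deployment fills the same slot with an org/name model ID:
hf_model_deployment = {"model_ref": "meta-llama/Llama-3.2-1B-Instruct"}

print(hf_model_deployment["model_ref"])
```

The point is only that the Hugging Face model ID takes the place of the trained-model reference; everything else in the creation flow stays the same.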

Step 3: Verify the endpoint

Send a smoke test request using the deployment’s public model identifier (the -N flag disables curl’s output buffering so streamed chunks print as they arrive):
curl -N https://api.inference.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $INFERENCE_API_KEY" \
  -d '{
    "messages": [
      { "role": "user", "content": "Hello, world!" }
    ],
    "model": "your-team/your-deployment-id",
    "stream": true
  }'
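A healthy endpoint answers with a stream of `data: {...}` lines. A minimal Python sketch for pulling the text out of those lines, assuming the stream follows the OpenAI-compatible chunk format implied by the endpoint above (the helper name and sample chunk are illustrative):

```python
import json

def extract_delta(sse_line: str) -> str:
    """Parse one 'data: {...}' line from a streamed chat completion
    and return the text delta it carries (empty string if none)."""
    payload = sse_line.removeprefix("data: ").strip()
    if not payload or payload == "[DONE]":
        return ""
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    return delta.get("content") or ""

# Illustrative chunk in the OpenAI-compatible streaming shape:
sample = 'data: {"choices":[{"index":0,"delta":{"content":"Hello"}}]}'
print(extract_delta(sample))  # Hello
```

If the request fails instead, check that `$INFERENCE_API_KEY` is set and that the model identifier matches the one shown on your deployment.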

Next steps

API Quickstart

Learn the full API surface for calling deployed models.

Deployment

Understand deployment configuration, scaling, and operations.