All systems operational

Unified AI Model Gateway

A unified AI model API gateway. Better pricing, better reliability. Just swap your base URL to get started.

Get API Key
base_url: /v1 ·  OpenAI · Anthropic · Gemini
99.9% availability
< 80 ms median TTFT (time to first token)
10K+ developers

Supports many AI model providers.

Smart routing, automatically selects the best channel.


5 minutes to integrate, zero migration cost.

Compatible with OpenAI, Anthropic, Gemini formats

Multi-protocol Compatible
Works with any OpenAI client. Just replace base_url; no other code changes are needed.
Protocol Conversion
Automatically converts between OpenAI, Anthropic, and Gemini formats. Transparent to clients.
Streaming SSE Support
All models support full streaming output. No adjustments needed when switching providers.
Function Calling & JSON Mode
Models supporting tool calls and structured output are fully exposed via standard API specs.
One Key, All Models
A single API key routes to all supported models. No need to register with each provider.
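
Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="/v1",  # prefix with the gateway host
)

# Switch to any model by name
resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

# Print text as it arrives; some chunks carry no choices or no text
for chunk in resp:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)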

Node.js

import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-xxx',
  baseURL: '/v1', // prefix with the gateway host
})

// Switch to any model by name
const stream = await client.chat.completions.create({
  model: 'qwen3-235b-a22b',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
})

// Write text as it arrives; skip chunks that carry no content
for await (const chunk of stream) {
  process.stdout.write(
    chunk.choices[0]?.delta?.content ?? ''
  )
}

cURL

# -N disables output buffering so streamed chunks print as they arrive
curl -N /v1/chat/completions \
  -H "Authorization: Bearer sk-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4-plus",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
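
To make the protocol conversion feature more concrete, here is a minimal sketch of what translating an OpenAI-style chat request into an Anthropic-style Messages request could look like. The function name `openai_to_anthropic` and the `default_max_tokens` value are hypothetical, and the gateway's actual conversion logic is not described on this page. The sketch only handles plain string content, and relies on two known format differences: Anthropic takes the system prompt as a top-level `system` field and requires `max_tokens`.

```python
from typing import Any


def openai_to_anthropic(
    request: dict[str, Any],
    default_max_tokens: int = 1024,  # hypothetical default, not from the gateway
) -> dict[str, Any]:
    """Convert an OpenAI-style chat request body to an Anthropic-style one.

    Illustrative only: handles plain string message content and ignores
    tools, images, and sampling parameters.
    """
    system_parts: list[str] = []
    messages: list[dict[str, str]] = []

    for msg in request["messages"]:
        if msg["role"] == "system":
            # Anthropic expects the system prompt outside the messages list
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    converted: dict[str, Any] = {
        "model": request["model"],
        "messages": messages,
        # max_tokens is optional for OpenAI but required by Anthropic
        "max_tokens": request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        converted["system"] = "\n\n".join(system_parts)
    if request.get("stream"):
        converted["stream"] = True
    return converted


if __name__ == "__main__":
    example = {
        "model": "example-model",  # placeholder model name
        "stream": True,
        "messages": [
            {"role": "system", "content": "Be brief."},
            {"role": "user", "content": "Hello!"},
        ],
    }
    print(openai_to_anthropic(example))
```

With a gateway doing this translation server-side, the client samples above stay unchanged no matter which provider serves the model.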

Everything you need, at your fingertips.

Smart Load Balancing
Auto-routes to the fastest available upstream. Instant failover, no manual config.
📊
Usage Dashboard
Token usage by model, latency distribution, and cost breakdown. Export or query via API.
🔑
Team Key Management
Create scoped API keys per team or project. Set rate limits, spend caps, and expiry.
💾
Prompt Caching
Automatic semantic caching reduces the cost and latency of repeated requests. Includes a real-time dashboard.
🔔
Spend Alerts
Threshold alerts via email, webhook, Feishu, or DingTalk. Avoid overspending.
📋
Audit Logs
Full request-level logs with latency, model, token counts, and status codes. Search and export.
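
As a rough illustration of the "fastest available upstream, instant failover" idea from Smart Load Balancing, the sketch below tries upstreams in order of observed latency and falls through to the next one on error. `Upstream`, `call_with_failover`, and `AllUpstreamsFailed` are hypothetical names invented for this example; the gateway's real routing algorithm is not documented here and may work differently.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Upstream:
    name: str
    median_latency_ms: float      # assumed to be measured elsewhere
    send: Callable[[], str]       # performs the request, raises on failure


class AllUpstreamsFailed(Exception):
    """Raised when every upstream in the pool returned an error."""


def call_with_failover(upstreams: Sequence[Upstream]) -> tuple[str, str]:
    """Try upstreams from fastest to slowest; return (name, response)."""
    errors: list[str] = []
    for upstream in sorted(upstreams, key=lambda u: u.median_latency_ms):
        try:
            return upstream.name, upstream.send()
        except Exception as exc:  # any failure triggers failover to the next one
            errors.append(f"{upstream.name}: {exc}")
    raise AllUpstreamsFailed("; ".join(errors))


if __name__ == "__main__":
    def broken() -> str:
        raise ConnectionError("timeout")

    pool = [
        Upstream("channel-a", 40.0, broken),
        Upstream("channel-b", 95.0, lambda: "ok"),
    ]
    # channel-a is tried first because it is faster; it fails, so channel-b serves
    print(call_with_failover(pool))
```

A production router would also need health checks and timeouts; this only shows the ordering and fall-through behavior.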

Get started in 5 minutes.

Sign up, get your key, and make your first API call.