Unified AI Model Gateway

Manage OpenAI, Claude, Gemini, and more behind one gateway. Keep your client and replace only the base URL.


20+ Models

99.9% Availability

< 80 ms Median TTFT

10K+ Developers

Integrate in 5 minutes without rewriting product code.

Keep your client and point it at the gateway.

Multi-protocol Compatible

Keep your OpenAI client and replace only base_url.

Protocol Conversion

Convert between OpenAI, Anthropic, and Gemini formats.
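
As a rough illustration of what one direction of this conversion involves, the sketch below maps an OpenAI-style chat request onto an Anthropic Messages-style body. It is not the gateway's actual implementation. The function name `openai_to_anthropic`, the `default_max_tokens` value, and the model name in the test data are assumptions, and only plain-text user and assistant messages are handled.

```python
from typing import Any


def openai_to_anthropic(request: dict[str, Any], default_max_tokens: int = 1024) -> dict[str, Any]:
    """Translate an OpenAI-style chat request into an Anthropic Messages-style body.

    Hypothetical helper for illustration; tool calls and multimodal content are not handled.
    """
    system_parts: list[str] = []
    messages: list[dict[str, Any]] = []
    for msg in request["messages"]:
        if msg["role"] == "system":
            # Anthropic takes the system prompt as a top-level field, not as a message.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    body: dict[str, Any] = {
        "model": request["model"],
        "messages": messages,
        # max_tokens is required by the Anthropic Messages API; the default here is an assumption.
        "max_tokens": request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    if request.get("stream"):
        body["stream"] = True
    return body
```

The reverse direction and the Gemini format need their own mappings, which this sketch does not cover.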

Streaming SSE Support

Unified streaming passthrough across providers.
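
For readers who consume the stream without an SDK, here is a minimal sketch of parsing OpenAI-style SSE lines. It assumes the gateway emits `data: {...}` events and ends with the `data: [DONE]` sentinel, as OpenAI-compatible streams usually do. The helper name `iter_sse_text` and the sample lines are illustrative only.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines (hypothetical helper)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Hand-written sample events, not captured from a real response.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))  # prints "Hello!"
```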

Function Calling & JSON Mode

Expose tools and structured output through standard APIs.
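
A hedged sketch of what those requests can look like, using the OpenAI Chat Completions `tools` and `response_format` fields. The `get_weather` tool, the helper names, and the model name passed in are made up for illustration. Whether a given upstream model honors these fields depends on the model.

```python
import json
from typing import Any

# Hypothetical tool definition in the OpenAI Chat Completions "tools" format.
GET_WEATHER_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(model: str, prompt: str) -> dict[str, Any]:
    """Request body that lets the model call the tool above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [GET_WEATHER_TOOL],
    }


def build_json_request(model: str, prompt: str) -> dict[str, Any]:
    """Request body that asks for a JSON object (JSON mode)."""
    return {
        "model": model,
        # OpenAI's JSON mode expects the word "JSON" to appear in the messages.
        "messages": [{"role": "user", "content": prompt + " Reply in JSON."}],
        "response_format": {"type": "json_object"},
    }


def extract_tool_calls(response: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Return (name, arguments) pairs from a non-streaming response dict."""
    message = response["choices"][0]["message"]
    return [
        # Arguments arrive as a JSON-encoded string and must be decoded.
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in message.get("tool_calls") or []
    ]
```

Either body can be sent as the JSON payload to the `/v1/chat/completions` endpoint.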

One Key, All Models

One key reaches every model through gateway routing.

from openai import OpenAI

client = OpenAI(
    api_key="sk-xxx",
    base_url="https://tokencode.dev/v1",
)

# Switch to any model by name
resp = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

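for chunk in resp:
    # Some chunks carry no choices or no text (for example, the role-only first chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)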

import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'sk-xxx',
  baseURL: 'https://tokencode.dev/v1',
})

// Switch to any model by name
const stream = await client.chat.completions.create({
  model: 'gpt-5.5',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
})

for await (const chunk of stream) {
  process.stdout.write(
    chunk.choices[0]?.delta?.content ?? ''
  )
}

curl https://tokencode.dev/v1/chat/completions \
  -H "Authorization: Bearer sk-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Integrate in three steps

Create one key

Replace base URL

Keep your OpenAI-compatible client and point it at the gateway endpoint.

Monitor every call

Track token usage, latency, provider status, and cost from the portal.

Everything you need, at your fingertips.

Smart Load Balancing

Auto-routes to the fastest available upstream. Instant failover, no manual config.
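
One way to picture routing with failover is the sketch below. It is not a description of the gateway's internals: the `Upstream` type, the `latency_ms` metric, and the `route` function are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Upstream:
    name: str
    latency_ms: float  # recent observed latency (assumed metric)
    healthy: bool = True


def route(upstreams: list[Upstream], send: Callable[[Upstream], str]) -> str:
    """Try healthy upstreams from fastest to slowest; fail over on connection errors."""
    candidates = sorted((u for u in upstreams if u.healthy), key=lambda u: u.latency_ms)
    last_error: Optional[Exception] = None
    for upstream in candidates:
        try:
            return send(upstream)
        except ConnectionError as exc:
            # Mark the upstream unhealthy so later requests skip it, then try the next one.
            upstream.healthy = False
            last_error = exc
    raise RuntimeError("no healthy upstream available") from last_error
```

A production router would also need health probes that bring recovered upstreams back into rotation, which this sketch leaves out.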

Usage Dashboard

Token usage by model, latency distribution, and cost breakdown. Export or query via API.
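
If you export usage records, a per-model breakdown can be computed along these lines. The record fields and the `PRICE_PER_1K` table are hypothetical. This page does not specify the export schema or any prices.

```python
from collections import defaultdict
from typing import Any

# Hypothetical prices in USD per 1K tokens; real prices are not stated on this page.
PRICE_PER_1K = {"model-a": 0.5, "model-b": 2.0}


def summarize(records: list[dict[str, Any]]) -> dict[str, dict[str, float]]:
    """Aggregate request count, token count, and cost per model."""
    summary: dict[str, dict[str, float]] = defaultdict(
        lambda: {"requests": 0, "tokens": 0, "cost_usd": 0.0}
    )
    for record in records:
        row = summary[record["model"]]
        tokens = record["prompt_tokens"] + record["completion_tokens"]
        row["requests"] += 1
        row["tokens"] += tokens
        row["cost_usd"] += tokens / 1000 * PRICE_PER_1K[record["model"]]
    return dict(summary)
```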

Team Key Management

Create scoped API keys per team or project. Set rate limits, spend caps, and expiry.
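
Conceptually, a scoped key bundles a few limits that are checked before a request is forwarded. The sketch below shows one possible shape. The `ScopedKey` fields, the `check_request` function, and the order of checks are assumptions, not the portal's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ScopedKey:
    team: str
    allowed_models: set[str]
    spend_cap_usd: float
    spent_usd: float
    requests_per_minute: int
    expires_at: datetime


def check_request(key: ScopedKey, model: str, requests_this_minute: int,
                  now: datetime) -> Optional[str]:
    """Return a rejection reason, or None if the request may proceed."""
    if now >= key.expires_at:
        return "key expired"
    if model not in key.allowed_models:
        return "model not allowed for this key"
    if key.spent_usd >= key.spend_cap_usd:
        return "spend cap reached"
    if requests_this_minute >= key.requests_per_minute:
        return "rate limit exceeded"
    return None
```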

Prompt Caching

Automatic semantic caching reduces repeat request costs and latency. Real-time dashboard.
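
To illustrate the idea of semantic caching in general, here is a toy sketch that reuses a stored response when a new prompt is similar enough to an earlier one. It uses a bag-of-words vector and cosine similarity as a stand-in for a real embedding model. The `SemanticCache` class and the `0.8` threshold are assumptions and say nothing about how the gateway's cache is built.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would use an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed similarity cutoff
        self.entries: list[tuple[Counter, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        """Return the cached response for the most similar prompt, if similar enough."""
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))
```

The linear scan over entries is fine for a demonstration. Larger caches typically use an approximate nearest-neighbor index instead.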

Spend Alerts

Threshold alerts via email, webhook, Feishu, or DingTalk. Avoid overspending.
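
A threshold alert boils down to detecting when spend crosses a configured level between two measurements. The sketch below shows that check and a possible webhook body. The function names and the payload fields are hypothetical, since the actual webhook schema is not documented here.

```python
import json


def crossed_thresholds(previous_spend: float, current_spend: float,
                       thresholds: list[float]) -> list[float]:
    """Return thresholds crossed since the last check, in ascending order."""
    return [t for t in sorted(thresholds) if previous_spend < t <= current_spend]


def webhook_payload(team: str, spend_usd: float, threshold_usd: float) -> str:
    """Build a JSON body for an alert webhook (field names are assumptions)."""
    return json.dumps({
        "event": "spend_threshold_crossed",
        "team": team,
        "spend_usd": spend_usd,
        "threshold_usd": threshold_usd,
    })
```

Comparing against the previous reading means each threshold fires once, not on every later check.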

Audit Logs

Full request-level logs with latency, model, token counts, and status codes. Search and export.

Get started in 5 minutes.
