API Reference

Cailos is an OpenAI-compatible AI gateway. Point any OpenAI SDK at it — everything just works.

Authentication

Every request requires an API key in the Authorization header:

Authorization: Bearer cai_your_key_here

Create keys in the dashboard (per-team). Keys are Argon2-hashed at rest — the full key is shown only once at creation.

Base URL

    <your-cailos-base-url>

Set this as base_url in any OpenAI SDK. All paths below are relative to it. Snippets in this reference use <your-cailos-base-url> as a placeholder for your actual base URL.

Quickstart

Python

from openai import OpenAI

client = OpenAI(
    base_url="<your-cailos-base-url>",
    api_key="cai_your_key_here",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
cURL

curl <your-cailos-base-url>/chat/completions \
  -H "Authorization: Bearer cai_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Any OpenAI-compatible SDK works: Python, Node, Go, Rust. Just change base_url and api_key.

List Models

GET /v1/models

Returns all available model codenames. Each codename may be backed by multiple providers — Cailos picks the best one based on your routing hints.

Request

curl <your-cailos-base-url>/models \
  -H "Authorization: Bearer cai_your_key_here"

Response

{
  "object": "list",
  "data": [
    {
      "id": "claude-sonnet",
      "object": "model",
      "created": 0,
      "owned_by": "cailos"
    },
    {
      "id": "gpt-4o",
      "object": "model",
      "created": 0,
      "owned_by": "cailos"
    }
  ]
}

Use these id values as the model parameter in chat completions.
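As a sketch of working with this endpoint, the snippet below parses a response body in the shape shown above and pulls out the usable codenames (no network call is made; the sample data mirrors the example response):

```python
import json

# Sample /v1/models response body, in the shape shown above.
models_response = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "claude-sonnet", "object": "model", "created": 0, "owned_by": "cailos"},
    {"id": "gpt-4o", "object": "model", "created": 0, "owned_by": "cailos"}
  ]
}
""")

# Collect the codenames usable as the `model` parameter.
model_ids = [m["id"] for m in models_response["data"]]
print(model_ids)  # ['claude-sonnet', 'gpt-4o']
```

With the OpenAI Python SDK, the same list is available via client.models.list().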

Chat Completions

POST /v1/chat/completions

Send a conversation, get a completion. Fully compatible with the OpenAI Chat Completions API.

Request body

| Parameter | Type | Description |
| --- | --- | --- |
| model | string, required | The model codename, e.g. gpt-4o, claude-sonnet. Get the full list from List Models. |
| messages | array, required | The conversation. See Messages. |
| temperature | float, optional | 0–2. Higher = more random. |
| top_p | float, optional | 0–1. Nucleus sampling. |
| max_tokens | integer, optional | Max tokens to generate. Capped at the model's limit. |
| max_completion_tokens | integer, optional | Alias for max_tokens. Takes precedence if both are set. |
| stop | string \| array, optional | Up to 4 stop sequences. |
| n | integer, optional | Number of completions, 1–128. Default: 1. |
| presence_penalty | float, optional | -2 to 2. |
| frequency_penalty | float, optional | -2 to 2. |
| tools | array, optional | Tool/function definitions for the model to call. |
| tool_choice | string \| object, optional | "auto", "none", "required", or {"type": "function", "function": {"name": "..."}}. |
| response_format | object, optional | {"type": "json_object"} or {"type": "json_schema", ...}. |
| seed | integer, optional | Deterministic sampling (best-effort). |
| logit_bias | object, optional | Map of token ID to bias value. |
| user | string, optional | End-user identifier for abuse detection. |
| stream | boolean, optional | Not yet supported. Returns 501. |
| cailos | object, optional | Routing hints. See Routing. |
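To make the table concrete, here is a request body combining several of the optional parameters (a plain dict with illustrative values; with the OpenAI SDK these map one-to-one onto keyword arguments of client.chat.completions.create):

```python
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are terse. Reply in JSON."},
        {"role": "user", "content": "Name one prime number."},
    ],
    "temperature": 0.2,                          # low randomness
    "max_tokens": 64,                            # cap the completion length
    "stop": ["\n\n"],                            # up to 4 stop sequences
    "response_format": {"type": "json_object"},  # force valid JSON output
    "seed": 42,                                  # best-effort determinism
}
```

Everything above is standard OpenAI; nothing Cailos-specific is required for a basic request.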

Messages

Standard OpenAI message format:

| Field | Type | Description |
| --- | --- | --- |
| role | string, required | "system", "user", "assistant", or "tool". |
| content | string \| array, required* | Text or multipart content. Required for system and user. |
| tool_calls | array, optional | Tool calls from the assistant (when replaying a conversation). |
| tool_call_id | string, required* | Required when role is "tool". References a tool call. |
| name | string, optional | Participant name for multi-turn disambiguation. |
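A hypothetical conversation exercising all four roles, including a replayed assistant tool call (the get_weather tool and its result are illustrative, not part of the API):

```python
messages = [
    {"role": "system", "content": "You are a weather assistant."},
    {"role": "user", "content": "What's the weather in Amsterdam?"},
    # Assistant turn that called a tool: content may be None,
    # but tool_calls must be replayed verbatim.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",
            "type": "function",
            "function": {"name": "get_weather",
                         "arguments": "{\"city\": \"Amsterdam\"}"},
        }],
    },
    # Tool result: tool_call_id must reference the call above.
    {"role": "tool", "tool_call_id": "call_abc123", "content": "14°C, light rain"},
]
```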

Routing & Codenames

The model field takes a codename — a standardised name like gpt-4o or claude-sonnet. The same codename can be served by multiple providers (e.g. llama-3.3-70b might be available from Groq, Together, and DeepInfra).

Cailos picks the best provider automatically. By default it optimises for quality (highest intelligence rating). You can change this with the cailos extension:

Example

{
  "model": "llama-3.3-70b",
  "messages": [{ "role": "user", "content": "Hello" }],
  "cailos": {
    "optimise": "speed"
  }
}

SDK-safe: OpenAI SDKs accept extra body fields (the Python SDK via extra_body), so cailos works without any patching.
| Field | Type | Description |
| --- | --- | --- |
| optimise | string | How to pick between providers for the same codename: "quality" (highest intelligence rating, default), "speed" (fastest model: highest speed tier, then tokens/s), "cost" (cheapest: lowest input token cost). |
| trust_level | integer | 0 (low) to 3 (maximum). Filter by trust tier. Accepted, not yet enforced. |
| require_tools | boolean | Only route to models with tool/function calling support. Accepted, not yet enforced. |
| require_vision | boolean | Only route to models with vision support. Accepted, not yet enforced. |
| require_streaming | boolean | Only route to models with streaming support. Accepted, not yet enforced. |
| speed | string | "fast", "moderate", "slow". Accepted, not yet enforced. |
| model_hints | array | Preferred model IDs, tried in order. Accepted, not yet enforced. |
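Routing hints are a plain JSON object under the top-level "cailos" key, so from Python they are just a dict; with the official Python SDK they ride along via the extra_body parameter (shown in a comment, since no request is actually sent here). Hint values are illustrative:

```python
# Routing hints live under the top-level "cailos" key.
cailos_hints = {
    "optimise": "cost",     # pick the cheapest provider for this codename
    "require_tools": True,  # accepted today, enforced in a future release
}

# Raw request body, e.g. for curl or an HTTP client:
request_body = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}],
    "cailos": cailos_hints,
}

# With the OpenAI Python SDK, the same hints are attached via extra_body:
#   client.chat.completions.create(
#       model="llama-3.3-70b",
#       messages=[{"role": "user", "content": "Hello"}],
#       extra_body={"cailos": cailos_hints},
#   )
```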

Response Format

Standard OpenAI chat completion response:

200 OK

{
  "id": "chatcmpl-8a3b2c1d4e5f...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique completion ID. |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp. |
| model | string | The provider's model ID that actually generated the response. |
| choices[].message | object | The assistant's message (role + content). |
| choices[].finish_reason | string | "stop", "length", or "tool_calls". |
| usage | object | prompt_tokens, completion_tokens, total_tokens. |

Tool call response

When the model invokes tools, finish_reason is "tool_calls" and the message includes a tool_calls array:

Tool calls

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"city\": \"Amsterdam\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

To continue the conversation, append the assistant message (with tool_calls) and a tool message (with the result) back into messages.
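A minimal sketch of that round trip, assuming a hypothetical local run_tool dispatcher (pure Python; the actual call back to /chat/completions is left as a comment):

```python
import json

def run_tool(name, args):
    # Hypothetical local dispatcher; replace with your real tools.
    if name == "get_weather":
        return f"Sunny in {args['city']}"
    raise ValueError(f"unknown tool: {name}")

def continue_after_tools(messages, assistant_message):
    """Append the assistant's tool_calls turn plus one tool result per call."""
    messages.append(assistant_message)
    for call in assistant_message["tool_calls"]:
        result = run_tool(call["function"]["name"],
                          json.loads(call["function"]["arguments"]))
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
    # Now POST `messages` back to /chat/completions for the final answer.
    return messages

messages = [{"role": "user", "content": "Weather in Amsterdam?"}]
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": "{\"city\": \"Amsterdam\"}"},
    }],
}
messages = continue_after_tools(messages, assistant_message)
print(messages[-1])  # the tool message, ready to send back
```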

Errors

All errors follow the OpenAI error envelope:

{
  "error": {
    "message": "Model 'nonexistent' not found or not active.",
    "type": "not_found_error",
    "param": null,
    "code": null
  }
}
| Status | Meaning |
| --- | --- |
| 200 | Success. |
| 400 | Bad request: malformed JSON or invalid parameters. |
| 401 | Missing or invalid API key. |
| 403 | Key valid but team is inactive. |
| 404 | Model codename not found, or no active providers for it. |
| 429 | Rate limit exceeded. Retry after 60 seconds. |
| 501 | Streaming requested but not yet supported. |
| 502 | Upstream provider returned an error. |
| 503 | Model unavailable: circuit breaker is open or provider is down. |
| 504 | Upstream provider timed out. |
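The status code alone tells you whether retrying makes sense; a small sketch that pairs the envelope's message with that decision (the retryable set follows the table above; the sample body is the 404 example from this section):

```python
import json

RETRYABLE = {429, 502, 503, 504}  # transient failures per the table above

def parse_error(status, body):
    """Return (message, retryable) from an OpenAI-style error envelope."""
    err = json.loads(body).get("error", {})
    return err.get("message", "unknown error"), status in RETRYABLE

body = ('{"error": {"message": "Model \'nonexistent\' not found or not active.",'
        ' "type": "not_found_error", "param": null, "code": null}}')
msg, retryable = parse_error(404, body)
print(msg, retryable)  # a 404 is not retryable: fix the model codename instead
```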

Rate Limits

Each API key has a configurable per-minute rate limit (set at creation). When exceeded, requests return 429.
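A backoff sketch for 429s, honouring the 60-second guidance from the errors table. It is demonstrated with a stand-in exception and a fake sender so it runs without a network; with the real SDK you would catch openai.RateLimitError instead:

```python
import time

class RateLimited(Exception):
    """Stand-in for the SDK's rate-limit error (e.g. openai.RateLimitError)."""

def with_retry(send, attempts=3, delay=60.0):
    """Call send(); on a rate-limit error, wait `delay` seconds and retry."""
    for attempt in range(attempts):
        try:
            return send()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(delay)

# Fake sender: fails once with a 429-style error, then succeeds.
calls = {"n": 0}
def fake_send():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimited()
    return "ok"

result = with_retry(fake_send, delay=0)  # delay=0 so the demo runs instantly
print(result)  # ok, after one retry
```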

Supported Providers

You never interact with providers directly — just send a codename and Cailos handles format translation. For transparency, here's what's under the hood:

| Provider | Format | Notes |
| --- | --- | --- |
| OpenAI | native | Forwarded as-is. |
| Anthropic | converted | System prompt, tools, and vision are fully translated. |
| Google | converted | Gemini generateContent API. Thinking parts filtered. |
| Cohere | converted | v2 chat API. Tool choice uppercased. |
| Groq | native | OpenAI-compatible. |
| Together AI | native | OpenAI-compatible. |
| DeepInfra | native | OpenAI-compatible. |
| Cerebras | native | OpenAI-compatible. |
| Scaleway | native | OpenAI-compatible. |
| SambaNova | native | OpenAI-compatible. |
| xAI | native | OpenAI-compatible. |
| OpenRouter | native | OpenAI-compatible. |
| Novita AI | native | OpenAI-compatible. |
| NCompass | native | OpenAI-compatible. |
| NScale | native | OpenAI-compatible. |
| Inception | native | OpenAI-compatible. |
| Tinfoil | native | OpenAI-compatible. |