API Reference
Cailos is an OpenAI-compatible AI gateway. Point any OpenAI SDK at it — everything just works.
Authentication
Every request requires an API key in the Authorization header:
Authorization: Bearer cai_your_key_here
Create keys in the dashboard (per-team). Keys are Argon2-hashed at rest — the full key is shown only once at creation.
Base URL
Set this as base_url in any OpenAI SDK. All paths below are relative to this.
Quickstart
Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="...",  # your Cailos base URL (see Base URL above)
    api_key="cai_your_key_here",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
cURL

```bash
curl /chat/completions \
  -H "Authorization: Bearer cai_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
The only things that change from a stock OpenAI setup are base_url and api_key.

List Models
Returns all available model codenames. Each codename may be backed by multiple providers — Cailos picks the best one based on your routing hints.
Request

```bash
curl /models \
  -H "Authorization: Bearer cai_your_key_here"
```
Response{ "object": "list", "data": [ { "id": "claude-sonnet", "object": "model", "created": 0, "owned_by": "cailos" }, { "id": "gpt-4o", "object": "model", "created": 0, "owned_by": "cailos" } ] }
Use these id values as the model parameter in chat completions.
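For illustration, a small helper (hypothetical, not part of any SDK) that pulls the usable codenames out of a /models response body:

```python
def codenames(models_response: dict) -> list[str]:
    """Extract sorted model codenames from a /models response body."""
    return sorted(item["id"] for item in models_response["data"])

# Sample response body, matching the example above.
sample = {
    "object": "list",
    "data": [
        {"id": "claude-sonnet", "object": "model", "created": 0, "owned_by": "cailos"},
        {"id": "gpt-4o", "object": "model", "created": 0, "owned_by": "cailos"},
    ],
}
print(codenames(sample))  # ['claude-sonnet', 'gpt-4o']
```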
Chat Completions
Send a conversation, get a completion. Fully compatible with the OpenAI Chat Completions API.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | required | The model codename, e.g. gpt-4o, claude-sonnet. Get the full list from List Models. |
| messages | array | required | The conversation. See Messages. |
| temperature | float | optional | 0–2. Higher = more random. |
| top_p | float | optional | 0–1. Nucleus sampling. |
| max_tokens | integer | optional | Max tokens to generate. Capped at the model's limit. |
| max_completion_tokens | integer | optional | Alias for max_tokens. Takes precedence if both set. |
| stop | string \| array | optional | Up to 4 stop sequences. |
| n | integer | optional | Number of completions. 1–128. Default: 1. |
| presence_penalty | float | optional | -2–2. |
| frequency_penalty | float | optional | -2–2. |
| tools | array | optional | Tool/function definitions for the model to call. |
| tool_choice | string \| object | optional | "auto", "none", "required", or {"type": "function", "function": {"name": "..."}}. |
| response_format | object | optional | {"type": "json_object"} or {"type": "json_schema", ...}. |
| seed | integer | optional | Deterministic sampling (best-effort). |
| logit_bias | object | optional | Token ID to bias value mapping. |
| user | string | optional | End-user identifier for abuse detection. |
| stream | boolean | optional | Not yet supported. Returns 501. |
| cailos | object | optional | Routing hints. See Routing. |
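As a sketch, a request body combining several of the optional parameters from the table above (the values themselves are arbitrary):

```python
import json

# Illustrative payload; note response_format forcing JSON output.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "Name three primary colours."},
    ],
    "temperature": 0.2,
    "max_tokens": 256,
    "response_format": {"type": "json_object"},
}
print(json.dumps(payload, indent=2))
```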
Messages
Standard OpenAI message format:
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | required | "system", "user", "assistant", or "tool" |
| content | string \| array | required* | Text or multipart content. Required for system and user. |
| tool_calls | array | optional | Tool calls from the assistant (when replaying conversation). |
| tool_call_id | string | required* | Required when role is "tool". References a tool call. |
| name | string | optional | Participant name for multi-turn disambiguation. |
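The required* rules in the table can be expressed as a small check. This is an illustrative sketch, not the gateway's actual validator:

```python
def validate_message(msg: dict) -> list[str]:
    """Check one message against the field rules above; return a list of problems."""
    errors = []
    role = msg.get("role")
    if role not in ("system", "user", "assistant", "tool"):
        errors.append(f"unknown role: {role!r}")
    # content is required for system and user messages
    if role in ("system", "user") and msg.get("content") is None:
        errors.append(f"content is required for role {role!r}")
    # tool messages must reference the tool call they answer
    if role == "tool" and not msg.get("tool_call_id"):
        errors.append("tool_call_id is required when role is 'tool'")
    return errors

print(validate_message({"role": "tool", "content": "42"}))
```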
Routing & Codenames
The model field takes a codename — a standardised name like gpt-4o or claude-sonnet. The same codename can be served by multiple providers (e.g. llama-3.3-70b might be available from Groq, Together, and DeepInfra).
Cailos picks the best provider automatically. By default it optimises for quality (highest intelligence rating). You can change this with the cailos extension:
Example{ "model": "llama-3.3-70b", "messages": [{ "role": "user", "content": "Hello" }], "cailos": { "optimise": "speed" } }
Since cailos is sent as an ordinary JSON field, it works without any patching.

| Field | Type | Description |
|---|---|---|
| optimise | string | How to pick between providers for the same codename: "quality" — highest intelligence rating (default); "speed" — fastest model (highest speed tier, then tokens/s); "cost" — cheapest (lowest input token cost). |
| trust_level | integer | 0 (low) to 3 (maximum). Filter by trust tier. Accepted, not yet enforced. |
| require_tools | boolean | Only route to models with tool/function calling support. Accepted, not yet enforced. |
| require_vision | boolean | Only route to models with vision support. Accepted, not yet enforced. |
| require_streaming | boolean | Only route to models with streaming support. Accepted, not yet enforced. |
| speed | string | "fast", "moderate", "slow". Accepted, not yet enforced. |
| model_hints | array | Preferred model IDs, tried in order. Accepted, not yet enforced. |
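The optimise semantics above can be sketched as a selection function. The candidate fields (intelligence, speed_tier, tokens_per_s, input_cost) and the sample providers are assumptions for illustration, not the gateway's real schema:

```python
def pick_provider(candidates: list[dict], optimise: str = "quality") -> dict:
    """Choose one provider for a codename per the optimise hint."""
    if optimise == "quality":
        # highest intelligence rating (default)
        return max(candidates, key=lambda c: c["intelligence"])
    if optimise == "speed":
        # highest speed tier, then tokens/s
        return max(candidates, key=lambda c: (c["speed_tier"], c["tokens_per_s"]))
    if optimise == "cost":
        # lowest input token cost
        return min(candidates, key=lambda c: c["input_cost"])
    raise ValueError(f"unknown optimise value: {optimise!r}")

candidates = [
    {"name": "groq", "intelligence": 70, "speed_tier": 3, "tokens_per_s": 800, "input_cost": 0.59},
    {"name": "together", "intelligence": 72, "speed_tier": 2, "tokens_per_s": 120, "input_cost": 0.88},
]
print(pick_provider(candidates, "speed")["name"])  # groq
```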
Response Format
Standard OpenAI chat completion response:
200 OK{ "id": "chatcmpl-8a3b2c1d4e5f...", "object": "chat.completion", "created": 1710000000, "model": "gpt-4o-2024-08-06", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Hello! How can I help you today?" }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21 } }
| Field | Type | Description |
|---|---|---|
| id | string | Unique completion ID. |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp. |
| model | string | The provider's model ID that actually generated the response. |
| choices[].message | object | The assistant's message (role + content). |
| choices[].finish_reason | string | "stop", "length", or "tool_calls". |
| usage | object | prompt_tokens, completion_tokens, total_tokens. |
Tool call response
When the model invokes tools, finish_reason is "tool_calls" and the message includes a tool_calls array:
Tool calls{ "choices": [{ "message": { "role": "assistant", "content": null, "tool_calls": [{ "id": "call_abc123", "type": "function", "function": { "name": "get_weather", "arguments": "{\"city\": \"Amsterdam\"}" } }] }, "finish_reason": "tool_calls" }] }
To continue the conversation, append the assistant message (with tool_calls) and a tool message (with the result) back into messages.
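The continuation step can be sketched as a helper that does the two appends. The get_weather result shape is illustrative:

```python
import json

def append_tool_result(messages, assistant_msg, tool_call_id, result):
    """Replay the assistant's tool_calls message, then attach the tool result."""
    messages.append(assistant_msg)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    })
    return messages

messages = [{"role": "user", "content": "Weather in Amsterdam?"}]
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Amsterdam\"}"},
    }],
}
append_tool_result(messages, assistant_msg, "call_abc123", {"temp_c": 14})
print([m["role"] for m in messages])  # ['user', 'assistant', 'tool']
```

The extended messages list is then sent back to /chat/completions for the model's final answer.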
Errors
All errors follow the OpenAI error envelope:
```json
{
  "error": {
    "message": "Model 'nonexistent' not found or not active.",
    "type": "not_found_error",
    "param": null,
    "code": null
  }
}
```
| Status | Meaning |
|---|---|
| 200 | Success. |
| 400 | Bad request — malformed JSON or invalid parameters. |
| 401 | Missing or invalid API key. |
| 403 | Key valid but team is inactive. |
| 404 | Model codename not found or no active providers for it. |
| 429 | Rate limit exceeded. Retry after 60 seconds. |
| 501 | Streaming requested but not yet supported. |
| 502 | Upstream provider returned an error. |
| 503 | Model unavailable — circuit breaker is open or provider is down. |
| 504 | Upstream provider timed out. |
Rate Limits
Each API key has a configurable per-minute rate limit (set at creation). When exceeded, requests return 429.
- The window resets every 60 seconds.
- A rate limit of 0 means unlimited.
- Rate limiting is per-key, not per-team or per-model.
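A minimal retry sketch for 429 responses, assuming a call that returns a (status, body) pair; real code would use your HTTP client's status handling:

```python
import time

def with_rate_limit_retry(call, max_attempts: int = 3, wait_s: float = 60.0):
    """Retry a request when the gateway answers 429 (the window resets every 60 s)."""
    for attempt in range(max_attempts):
        status, body = call()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(wait_s)
    return status, body

# Fake transport: first response is rate-limited, second succeeds.
responses = iter([(429, "slow down"), (200, "ok")])
print(with_rate_limit_retry(lambda: next(responses), wait_s=0))  # (200, 'ok')
```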
Supported Providers
You never interact with providers directly — just send a codename and Cailos handles format translation. For transparency, here's what's under the hood:
| Provider | Format | Notes |
|---|---|---|
| OpenAI | native | Forwarded as-is. |
| Anthropic | converted | System prompt, tools, vision — fully translated. |
| Google | converted | Gemini generateContent API. Thinking parts filtered. |
| Cohere | converted | v2 chat API. Tool choice uppercased. |
| Groq | native | OpenAI-compatible. |
| Together AI | native | OpenAI-compatible. |
| DeepInfra | native | OpenAI-compatible. |
| Cerebras | native | OpenAI-compatible. |
| Scaleway | native | OpenAI-compatible. |
| SambaNova | native | OpenAI-compatible. |
| xAI | native | OpenAI-compatible. |
| OpenRouter | native | OpenAI-compatible. |
| Novita AI | native | OpenAI-compatible. |
| NCompass | native | OpenAI-compatible. |
| NScale | native | OpenAI-compatible. |
| Inception | native | OpenAI-compatible. |
| Tinfoil | native | OpenAI-compatible. |