API Reference

Cailos is an OpenAI-compatible AI gateway. Point any OpenAI SDK at it — everything just works.

Authentication

Every request requires an API key in the Authorization header:

Authorization: Bearer cai_your_key_here

Create keys in the dashboard (per-team). Keys are Argon2-hashed at rest — the full key is shown only once at creation.

Base URL

    <your-cailos-base-url>

Set this as base_url in any OpenAI SDK. All paths below are relative to it. Snippets in this reference use <your-cailos-base-url> as a placeholder for your actual base URL.

Quickstart

Python

from openai import OpenAI

client = OpenAI(
    base_url="<your-cailos-base-url>",
    api_key="cai_your_key_here",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
cURL

curl <your-cailos-base-url>/chat/completions \
  -H "Authorization: Bearer cai_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Any OpenAI-compatible SDK works: Python, Node, Go, Rust. Just change base_url and api_key.

List Models

GET /v1/models

Returns all available model codenames. Each codename may be backed by multiple providers — Cailos picks the best one based on your routing hints.

Request

curl <your-cailos-base-url>/models \
  -H "Authorization: Bearer cai_your_key_here"

Response

{
  "object": "list",
  "data": [
    {
      "id": "claude-sonnet",
      "object": "model",
      "created": 0,
      "owned_by": "cailos"
    },
    {
      "id": "gpt-4o",
      "object": "model",
      "created": 0,
      "owned_by": "cailos"
    }
  ]
}

Use these id values as the model parameter in chat completions.
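As a sketch of working with this endpoint, the snippet below parses a response body in the shape shown above and pulls out the usable codenames (no network call is made; the sample data mirrors the example response):

```python
import json

# Sample /v1/models response body, in the shape shown above.
models_response = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "claude-sonnet", "object": "model", "created": 0, "owned_by": "cailos"},
    {"id": "gpt-4o", "object": "model", "created": 0, "owned_by": "cailos"}
  ]
}
""")

# Collect the codenames usable as the `model` parameter.
model_ids = [m["id"] for m in models_response["data"]]
print(model_ids)  # ['claude-sonnet', 'gpt-4o']
```

With the OpenAI Python SDK, the same list is available via client.models.list().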

Chat Completions

POST /v1/chat/completions

Send a conversation, get a completion. Fully compatible with the OpenAI Chat Completions API.

Request body

| Parameter | Type | Description |
| --- | --- | --- |
| model | string, required | The model codename, e.g. gpt-4o, claude-sonnet. Get the full list from List Models. |
| messages | array, required | The conversation. See Messages. |
| temperature | float, optional | 0–2. Higher = more random. |
| top_p | float, optional | 0–1. Nucleus sampling. |
| max_tokens | integer, optional | Max tokens to generate. Capped at the model's limit. |
| max_completion_tokens | integer, optional | Alias for max_tokens. Takes precedence if both are set. |
| stop | string \| array, optional | Up to 4 stop sequences. |
| n | integer, optional | Number of completions, 1–128. Default: 1. |
| presence_penalty | float, optional | -2 to 2. |
| frequency_penalty | float, optional | -2 to 2. |
| tools | array, optional | Tool/function definitions for the model to call. |
| tool_choice | string \| object, optional | "auto", "none", "required", or {"type": "function", "function": {"name": "..."}}. |
| response_format | object, optional | {"type": "json_object"} or {"type": "json_schema", ...}. |
| seed | integer, optional | Deterministic sampling (best-effort). |
| logit_bias | object, optional | Map of token ID to bias value. |
| user | string, optional | End-user identifier for abuse detection. |
| stream | boolean, optional | Not yet supported. Returns 501. |
| cailos | object, optional | Routing hints. See Routing. |
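To make the table concrete, here is a request body combining several of the optional parameters (a plain dict with illustrative values; with the OpenAI SDK these map one-to-one onto keyword arguments of client.chat.completions.create):

```python
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are terse. Reply in JSON."},
        {"role": "user", "content": "Name one prime number."},
    ],
    "temperature": 0.2,                          # low randomness
    "max_tokens": 64,                            # cap the completion length
    "stop": ["\n\n"],                            # up to 4 stop sequences
    "response_format": {"type": "json_object"},  # force valid JSON output
    "seed": 42,                                  # best-effort determinism
}
```

Everything above is standard OpenAI; nothing Cailos-specific is required for a basic request.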

Messages

Standard OpenAI message format:

| Field | Type | Description |
| --- | --- | --- |
| role | string, required | "system", "user", "assistant", or "tool". |
| content | string \| array, required* | Text or multipart content. Required for system and user. |
| tool_calls | array, optional | Tool calls from the assistant (when replaying a conversation). |
| tool_call_id | string, required* | Required when role is "tool". References a tool call. |
| name | string, optional | Participant name for multi-turn disambiguation. |
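A hypothetical conversation exercising all four roles, including a replayed assistant tool call (the get_weather tool and its result are illustrative, not part of the API):

```python
messages = [
    {"role": "system", "content": "You are a weather assistant."},
    {"role": "user", "content": "What's the weather in Amsterdam?"},
    # Assistant turn that called a tool: content may be None,
    # but tool_calls must be replayed verbatim.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",
            "type": "function",
            "function": {"name": "get_weather",
                         "arguments": "{\"city\": \"Amsterdam\"}"},
        }],
    },
    # Tool result: tool_call_id must reference the call above.
    {"role": "tool", "tool_call_id": "call_abc123", "content": "14°C, light rain"},
]
```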

Routing & Codenames

The model field takes a codename — a standardised name like gpt-4o or claude-sonnet. The same codename can be served by multiple providers (e.g. llama-3.3-70b might be available from Groq, Together, and DeepInfra).

Cailos picks the best provider automatically. By default it optimises for quality (highest intelligence rating). You can change this with the cailos extension:

Example

{
  "model": "llama-3.3-70b",
  "messages": [{ "role": "user", "content": "Hello" }],
  "cailos": {
    "optimise": "speed"
  }
}

SDK-safe: OpenAI SDKs accept extra body fields (the Python SDK via extra_body), so cailos works without any patching.
| Field | Type | Description |
| --- | --- | --- |
| optimise | string | How to pick between providers for the same codename: "quality" (highest intelligence rating, default), "speed" (fastest model: highest speed tier, then tokens/s), "cost" (cheapest: lowest input token cost). |
| trust_level | integer | 0 (low) to 3 (maximum). Filter by trust tier. Accepted, not yet enforced. |
| require_tools | boolean | Only route to models with tool/function calling support. Accepted, not yet enforced. |
| require_vision | boolean | Only route to models with vision support. Accepted, not yet enforced. |
| require_streaming | boolean | Only route to models with streaming support. Accepted, not yet enforced. |
| speed | string | "fast", "moderate", "slow". Accepted, not yet enforced. |
| model_hints | array | Preferred model IDs, tried in order. Accepted, not yet enforced. |
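Routing hints are a plain JSON object under the top-level "cailos" key, so from Python they are just a dict; with the official Python SDK they ride along via the extra_body parameter (shown in a comment, since no request is actually sent here). Hint values are illustrative:

```python
# Routing hints live under the top-level "cailos" key.
cailos_hints = {
    "optimise": "cost",     # pick the cheapest provider for this codename
    "require_tools": True,  # accepted today, enforced in a future release
}

# Raw request body, e.g. for curl or an HTTP client:
request_body = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}],
    "cailos": cailos_hints,
}

# With the OpenAI Python SDK, the same hints are attached via extra_body:
#   client.chat.completions.create(
#       model="llama-3.3-70b",
#       messages=[{"role": "user", "content": "Hello"}],
#       extra_body={"cailos": cailos_hints},
#   )
```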

Response Format

Standard OpenAI chat completion response:

200 OK

{
  "id": "chatcmpl-8a3b2c1d4e5f...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique completion ID. |
| object | string | Always "chat.completion". |
| created | integer | Unix timestamp. |
| model | string | The provider's model ID that actually generated the response. |
| choices[].message | object | The assistant's message (role + content). |
| choices[].finish_reason | string | "stop", "length", or "tool_calls". |
| usage | object | prompt_tokens, completion_tokens, total_tokens. |

Tool call response

When the model invokes tools, finish_reason is "tool_calls" and the message includes a tool_calls array:

Tool calls

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"city\": \"Amsterdam\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}

To continue the conversation, append the assistant message (with tool_calls) and a tool message (with the result) back into messages.
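A minimal sketch of that round trip, assuming a hypothetical local run_tool dispatcher (pure Python; the actual call back to /chat/completions is left as a comment):

```python
import json

def run_tool(name, args):
    # Hypothetical local dispatcher; replace with your real tools.
    if name == "get_weather":
        return f"Sunny in {args['city']}"
    raise ValueError(f"unknown tool: {name}")

def continue_after_tools(messages, assistant_message):
    """Append the assistant's tool_calls turn plus one tool result per call."""
    messages.append(assistant_message)
    for call in assistant_message["tool_calls"]:
        result = run_tool(call["function"]["name"],
                          json.loads(call["function"]["arguments"]))
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
    # Now POST `messages` back to /chat/completions for the final answer.
    return messages

messages = [{"role": "user", "content": "Weather in Amsterdam?"}]
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": "{\"city\": \"Amsterdam\"}"},
    }],
}
messages = continue_after_tools(messages, assistant_message)
print(messages[-1])  # the tool message, ready to send back
```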

Errors

All errors follow the OpenAI error envelope:

{
  "error": {
    "message": "Model 'nonexistent' not found or not active.",
    "type": "not_found_error",
    "param": null,
    "code": null
  }
}
| Status | Meaning |
| --- | --- |
| 200 | Success. |
| 400 | Bad request: malformed JSON or invalid parameters. |
| 401 | Missing or invalid API key. |
| 403 | Key valid but team is inactive. |
| 404 | Model codename not found, or no active providers for it. |
| 429 | Rate limit exceeded. Retry after 60 seconds. |
| 501 | Streaming requested but not yet supported. |
| 502 | Upstream provider returned an error. |
| 503 | Model unavailable: circuit breaker is open or provider is down. |
| 504 | Upstream provider timed out. |
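The status code alone tells you whether retrying makes sense; a small sketch that pairs the envelope's message with that decision (the retryable set follows the table above; the sample body is the 404 example from this section):

```python
import json

RETRYABLE = {429, 502, 503, 504}  # transient failures per the table above

def parse_error(status, body):
    """Return (message, retryable) from an OpenAI-style error envelope."""
    err = json.loads(body).get("error", {})
    return err.get("message", "unknown error"), status in RETRYABLE

body = ('{"error": {"message": "Model \'nonexistent\' not found or not active.",'
        ' "type": "not_found_error", "param": null, "code": null}}')
msg, retryable = parse_error(404, body)
print(msg, retryable)  # a 404 is not retryable: fix the model codename instead
```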

Rate Limits

Each API key has a configurable per-minute rate limit (set at creation). When exceeded, requests return 429.
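A backoff sketch for 429s, honouring the 60-second guidance from the errors table. It is demonstrated with a stand-in exception and a fake sender so it runs without a network; with the real SDK you would catch openai.RateLimitError instead:

```python
import time

class RateLimited(Exception):
    """Stand-in for the SDK's rate-limit error (e.g. openai.RateLimitError)."""

def with_retry(send, attempts=3, delay=60.0):
    """Call send(); on a rate-limit error, wait `delay` seconds and retry."""
    for attempt in range(attempts):
        try:
            return send()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(delay)

# Fake sender: fails once with a 429-style error, then succeeds.
calls = {"n": 0}
def fake_send():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimited()
    return "ok"

result = with_retry(fake_send, delay=0)  # delay=0 so the demo runs instantly
print(result)  # ok, after one retry
```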

Supported Providers

You never interact with providers directly — just send a codename and Cailos handles format translation. For transparency, here's what's under the hood:

| Provider | Format | Notes |
| --- | --- | --- |
| OpenAI | native | Forwarded as-is. |
| Anthropic | converted | System prompt, tools, and vision are fully translated. |
| Google | converted | Gemini generateContent API. Thinking parts filtered. |
| Cohere | converted | v2 chat API. Tool choice uppercased. |
| Groq | native | OpenAI-compatible. |
| Together AI | native | OpenAI-compatible. |
| DeepInfra | native | OpenAI-compatible. |
| Cerebras | native | OpenAI-compatible. |
| Scaleway | native | OpenAI-compatible. |
| SambaNova | native | OpenAI-compatible. |
| xAI | native | OpenAI-compatible. |
| OpenRouter | native | OpenAI-compatible. |
| Novita AI | native | OpenAI-compatible. |
| NCompass | native | OpenAI-compatible. |
| NScale | native | OpenAI-compatible. |
| Inception | native | OpenAI-compatible. |
| Tinfoil | native | OpenAI-compatible. |