The decision engine for agentic AI.

One API. 17 providers. 31 endpoints. Automatic model selection.

[Live routing dashboard for the sample account acme-corp: a request log with columns Time, Key, Request, Strategy, Pipeline, Routed to, TPS, ms, and Cost, plus panels for optimization space, request rate, average latency, uptime (99.9%), provider availability (17/17), and strategy distribution.]

Simulated routing decisions. Request content is illustrative only.

What you don't have to build.

No SDK sprawl

One format for every provider. Format translation, tool schemas, and vision payloads handled automatically.

No maintenance

Provider APIs change. Models deprecate. Cailos absorbs every breaking change so your integration doesn't.

No lock-in

Switch providers in seconds. Circuit breakers fail over automatically when a provider goes down. Your on-call never wakes up.

Live evals

Every endpoint is evaluated on intelligence, tool calling, and vision. Routing always reflects the current model landscape.
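
To make the circuit-breaker failover under "No lock-in" concrete, here is a minimal sketch of the general pattern. It is not Cailos's implementation: the class, the thresholds, and the send callback are all hypothetical names chosen for illustration.

```python
import time


class CircuitBreaker:
    """Per-provider breaker: opens after repeated failures, retries after a cooldown."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # assumed threshold, not a Cailos setting
        self.cooldown_s = cooldown_s      # assumed cooldown, not a Cailos setting
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def available(self):
        if self.opened_at is None:
            return True
        # After the cooldown, let a trial request through (half-open state).
        return self.clock() - self.opened_at >= self.cooldown_s

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_failover(providers, breakers, send):
    """Try providers in order, skipping any whose breaker is open."""
    for name in providers:
        breaker = breakers[name]
        if not breaker.available():
            continue
        try:
            result = send(name)
        except Exception:
            breaker.record(ok=False)
            continue
        breaker.record(ok=True)
        return name, result
    raise RuntimeError("no provider available")
```

A caller would keep one breaker per provider and pass a send function that performs the real request. A provider that keeps failing is skipped until its cooldown elapses, then gets a single trial request.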

Build smarter agents.

Drop Cailos into any OpenAI-compatible agent framework. The SDK stays the same; only the endpoint changes. Model selection becomes automatic.

Before: hardcoded models

from agents import Agent

triage = Agent(
    name="triage",
    model="gpt-4o-mini",
    instructions="Classify: billing, technical, or escalate.",
)

resolver = Agent(
    name="resolver",
    model="gpt-4o",
    instructions="Draft resolution from KB.",
    tools=[search_kb, lookup_customer],
)

Triage is cheap but may misclassify. Resolver is expensive for simple tickets. If GPT-4o goes down, the pipeline breaks.

After: Cailos

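from agents import Agent, set_default_openai_api, set_default_openai_client
from openai import AsyncOpenAI

cailos = AsyncOpenAI(base_url="https://cailos.com/v1", api_key="cai_...")

# Assumes the OpenAI Agents SDK: register the client so every agent uses it.
set_default_openai_client(cailos, use_for_tracing=False)
set_default_openai_api("chat_completions")  # the SDK defaults to the Responses API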

triage = Agent(
    name="triage",
    model="auto",              # fastest cheap model
    instructions="Classify: billing, technical, or escalate.",
)

resolver = Agent(
    name="resolver",
    model="auto",              # best tool-calling model
    instructions="Draft resolution from KB.",
    tools=[search_kb, lookup_customer],
)

With model="auto", each agent gets a model chosen for its request, with automatic fallback across 17 providers.

How it works

Stack capability constraints. Cailos narrows to one endpoint. Integrate with two lines.

Capability filtering

[Diagram: the endpoint pool at three stages of the pipeline: available, post hard_filters, post soft_filters.]

Every request flows through the filtering pipeline. Cailos narrows the available endpoints down to the best candidates for your constraints.
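
The page names the stages but not their rules, so the sketch below is one plausible reading rather than Cailos's actual logic. It assumes hard filters drop endpoints that cannot serve the request at all, and soft filters express a preference that is relaxed if nothing meets it. The Endpoint fields, scores, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    vision: bool
    tools: bool
    quality: float        # hypothetical eval score, higher is better
    tokens_per_s: float   # hypothetical throughput


def hard_filter(endpoints, require_vision=False, require_tools=False):
    """Drop endpoints that lack a required capability."""
    return [
        e for e in endpoints
        if (e.vision or not require_vision) and (e.tools or not require_tools)
    ]


def soft_filter(endpoints, min_tokens_per_s=0.0):
    """Prefer endpoints meeting a soft target; keep all of them if none do."""
    preferred = [e for e in endpoints if e.tokens_per_s >= min_tokens_per_s]
    return preferred or endpoints


def route(endpoints, min_tokens_per_s=0.0, **hard_constraints):
    """Apply hard then soft filters, then pick the highest-quality survivor."""
    candidates = soft_filter(
        hard_filter(endpoints, **hard_constraints), min_tokens_per_s
    )
    if not candidates:
        raise LookupError("no endpoint satisfies the hard constraints")
    return max(candidates, key=lambda e: e.quality)
```

In this reading, a vision requirement can empty the pool and raise an error, while a speed target only reorders preferences and never does.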

Integration (main.py)

from openai import OpenAI

client = OpenAI(
    base_url="https://cailos.com/v1",
    api_key="cai_...",
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "..."}],
    extra_body={
        "cailos": {
            "optimise": "quality",
            "require_vision": True,
            "speed": "fast",
        }
    },
)

Standard OpenAI SDK. Change base_url and api_key, then pass a cailos object in extra_body for routing hints.
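
If several call sites send routing hints, a small helper can build the extra_body payload in one place. This is a sketch: cailos_hints is a hypothetical name, only the three hint keys from the example above are used, and it assumes unset hints can simply be left out of the request.

```python
def cailos_hints(optimise=None, require_vision=None, speed=None):
    """Build the extra_body payload, omitting hints that were not set."""
    hints = {
        "optimise": optimise,
        "require_vision": require_vision,
        "speed": speed,
    }
    # Assumption: the API treats a missing hint as "no preference".
    return {"cailos": {k: v for k, v in hints.items() if v is not None}}
```

The result can be passed directly, for example extra_body=cailos_hints(optimise="quality", require_vision=True).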