We've raised $80M in funding, with our seed led by Sequoia and Series A led by Kleiner Perkins!
Infrastructure for abundant intelligence

The most efficient inference for long-horizon agents

Sail is the most cost-efficient platform to run the leading open-source models on demand.

Get started with $5 in monthly free credits.

Kimi-K2.6, GLM-5.2, gpt-oss-120b, DeepSeek V4 Pro, Nemotron 3 Super 120B, Gemma 4 31B IT

Get started free
See all models & pricing

Pay less for tokens when you can wait ~1 minute
(Same model vs. traditional inference provider)
Faster to cost-efficient: asap, priority, standard, flex
GLM-5.2 FP8 (USD / 1M tokens)
Input: $0.70
Cached: $0.18
Output: $3.00

quickstart.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sailresearch.com/v1",
    api_key="YOUR_SAIL_API_KEY",
)

response = client.responses.create(
    model="zai-org/GLM-5.2-FP8",
    input="Plan and run a deep research report.",
    metadata={
        "completion_window": "priority",
    },
)
Introducing

The Completion Window

A longer window means a lower token price for the same intelligence.

Completion windows are specified per request, giving you maximum flexibility to set the speed-vs-cost tradeoff that fits your workload.

Window     Avg. turn time   Price vs. asap
asap       Immediate        baseline
priority   ~1 min           ~30–50% lower
standard   ~5 min           ~45–65% lower
flex       best-effort      ~60–80% lower
See pricing
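
To make the table concrete, here is a minimal sketch that turns the discount ranges above into a price range per window. The discount fractions are copied from the table; the function name `price_range` and the asap price of $10.00 per 1M tokens are hypothetical, chosen only for illustration. Actual prices vary by model, so treat this as arithmetic on the published ranges rather than a quote.

```python
# Discount ranges vs. asap, copied from the table above (as fractions).
DISCOUNT_VS_ASAP = {
    "asap": (0.0, 0.0),
    "priority": (0.30, 0.50),
    "standard": (0.45, 0.65),
    "flex": (0.60, 0.80),
}


def price_range(asap_price: float, window: str) -> tuple[float, float]:
    """Return (lowest, highest) expected price for a window, given the asap price."""
    low_discount, high_discount = DISCOUNT_VS_ASAP[window]
    # The largest discount gives the lowest price, and vice versa.
    return (asap_price * (1 - high_discount), asap_price * (1 - low_discount))


if __name__ == "__main__":
    hypothetical_asap_price = 10.0  # USD per 1M tokens, made up for illustration
    for window in DISCOUNT_VS_ASAP:
        low, high = price_range(hypothetical_asap_price, window)
        print(f"{window:>8}: ${low:.2f} to ${high:.2f} per 1M tokens")
```

Because the window is set per request, the same calculation can be repeated for each class of traffic (interactive turns vs. background jobs) rather than once for the whole workload.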

Drop-in APIs

OpenAI- and Anthropic-compatible (Responses, Chat Completions, and Messages APIs). Migrate to Sail in minutes.

Fine-tunes and RL Rollouts

Serve your own LoRA fine-tunes and run RL rollouts on Sail’s sampling path. Drop-in Tinker integration.

Usage-based

Send requests immediately and pay as you go, with $5 in free credits every month.

Battle-tested

Trillions of tokens served per week, and ready to serve yours.

Everyone else races to be the fastest.

But speed comes at a cost that doesn’t suit today’s AI agents, which consume more tokens and work autonomously in the background for longer than ever.

Sail is the most efficient.

Our why and how

Scale your workloads without limits

Send us millions, billions, or trillions of tokens. No sales call required.

Sail is designed to absorb large bursts of traffic, so you can scale on your schedule.

Trusted by teams at the frontier
Parallel: Web search & research APIs for AI agents
Detail: AI-powered code review
Jack & Jill: Your AI career agent
Quadrillion: AI agent workforce for research

“We and Sail share a belief that background agents are about to do far more useful work. Getting there takes efficient, scalable inference paired with the highest-quality context, including from the web. Sail is building the inference side of that, and we’re glad to be aligned on where this is going.”

Travers Nisbet, Co-founder, Parallel

“Building on Sail lets us ship long-horizon agents with great economics. Trillions of tokens and counting — we’re happy customers.”

Dan Robinson, CEO, Detail.dev

“We’re working with Sail Research to deliver the best experience any researcher can ask for, with Sail’s infrastructure offering the most flexible compute.”

Tiffany Zhao, Head of Strategy, Quadrillion

Pricing plans

Two ways to Sail.

Self-Serve
Free to start, pay-as-you-go

Get your API key, and let the tokens flow.

Get started with $5 free credit
$5 in free credits every month when you attach a payment method
Usage-based pricing with prepaid credits

Enterprise
Custom

Workloads at extreme scale.

Talk to us
Volume pricing, billed monthly in arrears
HIPAA-compliant, region-locked datacenters
Uptime and latency SLAs
Early access to new features
Dedicated support

Every agent needs a computer

Sailboxes are persistent, hosted sandboxes that give long-horizon agents compute they can run indefinitely.

Pair them with Sail inference so your agents can read, write, build, and test for hours, days, or even longer without losing state.

Read more about Sailboxes

FAQ

How do I get an API key?

Sign up here and generate an API key in seconds. Point any OpenAI- or Anthropic-compatible client at our base URL and start sending requests.

How much will my agent cost to run on Sail?

It depends on your token volume, model, and chosen completion window. Use the agent cost calculator to estimate spend for your workload, then tune the window to hit your target.
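
For a rough back-of-the-envelope figure before opening the calculator, the sketch below multiplies token counts by per-token prices. The prices are the GLM-5.2 FP8 example shown earlier on this page (USD per 1M tokens); which completion window those prices apply to is not stated here, so check the pricing page. The function name `estimate_cost` is hypothetical, and the code assumes cached tokens are billed at the cached rate and counted separately from uncached input tokens.

```python
# USD per 1M tokens, from the GLM-5.2 FP8 example on this page.
# Assumption: these apply to whichever window you select on the pricing page.
PRICES_PER_MILLION = {"input": 0.70, "cached": 0.18, "output": 3.00}


def estimate_cost(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    prices: dict = PRICES_PER_MILLION,
) -> float:
    """Estimate spend in USD for one workload.

    Assumes input_tokens counts only uncached input, and cached_tokens
    is billed at the cached rate.
    """
    usage = {"input": input_tokens, "cached": cached_tokens, "output": output_tokens}
    return sum(usage[kind] * prices[kind] for kind in usage) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent run: 2M uncached input, 1M cached input, 0.5M output tokens.
    print(f"${estimate_cost(2_000_000, 1_000_000, 500_000):.2f}")
```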

Open the agent cost calculator

Are the APIs really OpenAI- and Anthropic-compatible?

Yes. Use the official SDKs or any compatible client — just swap the base URL and key. Responses, Chat Completions, and Messages all work as expected.

What are completion windows?

Tiers of service that trade latency for cost-efficiency, specified per request. A longer window means a lower token price for the same intelligence — from asap (immediate) to flex (best-effort, lowest price).
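
One way to apply this per request is to pick the cheapest window whose typical turn time fits how long the caller can wait. The sketch below is an illustration, not part of Sail's SDK: the helper name `pick_window` is hypothetical, and the turn times are the averages listed on this page (not guarantees), with flex modeled as unbounded because it is best-effort.

```python
import math

# Typical turn time per window in seconds, ordered cheapest first.
# Values are the averages from this page; flex is best-effort, so it has
# no stated bound and is modeled here as infinity.
TYPICAL_TURN_SECONDS = [
    ("flex", math.inf),
    ("standard", 5 * 60),
    ("priority", 60),
    ("asap", 0),
]


def pick_window(max_wait_seconds: float) -> str:
    """Return the cheapest window whose typical turn time fits the wait budget."""
    for window, typical_seconds in TYPICAL_TURN_SECONDS:
        if typical_seconds <= max_wait_seconds:
            return window
    # Nothing fits (for example, a negative budget): fall back to the fastest window.
    return "asap"


if __name__ == "__main__":
    for budget in (0, 90, 600, math.inf):
        print(budget, "->", pick_window(budget))
```

The returned name can then be passed as the `completion_window` value in the request's `metadata`, as in the quickstart.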

Does Sail enforce rate limits?

No strict rate limits and no limit-increase process. Sail is designed to absorb large bursts of traffic, and the flex completion window is best for maximum throughput on very large workloads.

Is Sail SOC 2 compliant?

Yes. Sail is SOC 2 compliant. You can review our security controls, compliance reports, and the subprocessors we use in our Trust Center.

Visit the Trust Center

How do you make inference so efficient?

We work at every level of the stack:

  • Writing CUDA to push toward speed-of-light performance on GPUs
  • Digging into the guts of inference engines like SGLang to maximize efficiency
  • Distributing work across providers to maximize robustness and fleet utilization
  • Using spot compute when it's available, and safely failing over to more reliable compute when it's not
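
The last point, preferring spot compute and failing over when it is unavailable, follows a common pattern that can be sketched in a few lines. This is a generic illustration and not Sail's implementation: `SpotUnavailable`, `run_with_failover`, and the two backend callables are all hypothetical names.

```python
from typing import Callable


class SpotUnavailable(Exception):
    """Raised by the hypothetical spot backend when capacity is missing or preempted."""


def run_with_failover(
    job: str,
    spot: Callable[[str], str],
    reliable: Callable[[str], str],
) -> str:
    """Try the cheaper spot backend first; fall back to the reliable one on failure."""
    try:
        return spot(job)
    except SpotUnavailable:
        # Spot capacity was not available, so rerun on the reliable backend.
        return reliable(job)


if __name__ == "__main__":
    def flaky_spot(job: str) -> str:
        raise SpotUnavailable("preempted")

    def steady(job: str) -> str:
        return f"reliable:{job}"

    print(run_with_failover("batch-1", flaky_spot, steady))
```

A production system would also need to handle partially completed work and retries, which this sketch leaves out.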
Which models are supported?

The leading open models, listed here, with support for LoRA fine-tunes.

Do you support fine-tunes? RL rollouts?

Yes. Bring your own LoRA adapters and run them on supported models. If you train LoRAs with Tinker, you can also sample directly from Tinker checkpoints without uploading, and use Sail as a drop-in TokenCompleter in Tinker.

How is pricing structured?

Usage-based and extremely competitive, with $5 in free credits refreshed every month. Enterprise contracting is also available (contact us).