Question 1

How do I get an API key?

Accepted Answer

Sign up here and generate an API key in seconds. Point any OpenAI- or Anthropic-compatible client at our base URL and start sending requests.

Question 2

How much will my agent cost to run on Sail?

Accepted Answer

It depends on your token volume, model, and chosen completion window. Use the agent cost calculator to estimate spend for your workload, then tune the window to hit your target.

Question 3

Are the APIs really OpenAI- and Anthropic-compatible?

Accepted Answer

Yes. Use the official SDKs or any compatible client — just swap the base URL and key. Responses, Chat Completions, and Messages all work as expected.

Question 4

What are completion windows?

Accepted Answer

Tiers of service that trade latency for cost-efficiency, specified per request. A longer window means a lower token price for the same intelligence — from asap (immediate) to flex (best-effort, lowest price).

Question 5

Does Sail enforce rate limits?

Accepted Answer

No strict rate limits and no limit-increase process. Sail is designed to absorb large bursts of traffic, and the flex completion window is best for maximum throughput on very large workloads.

Question 6

Is Sail SOC 2 compliant?

Accepted Answer

Yes. Sail is SOC 2 compliant. You can review our security controls, compliance reports, and the subprocessors we use in our Trust Center.

Question 7

How do you make inference so efficient?

Accepted Answer

We work at every level of the stack: Writing CUDA to push toward speed-of-light performance on GPUs Digging into the guts of inference engines like SGLang to maximize efficiency Distributing work across providers to maximize robustness and fleet utilization Using spot compute when it's available, and safely failing over to more reliable compute when it's not

Question 8

Which models are supported?

Accepted Answer

The leading open models, listed here, with support for LoRA fine-tunes.

Question 9

Do you support fine-tunes? RL rollouts?

Accepted Answer

Yes. Bring your own LoRA adapters and run them on supported models. If you train LoRAs with Tinker, you can also sample directly from Tinker checkpoints without uploading, and use Sail as a drop-in TokenCompleter in Tinker.

Question 10

How is pricing structured?

Accepted Answer

Usage-based and extremely competitive, with $5 in free credits refreshed every month. Enterprise contracting is also available (contact us).

The most efficient inference for
long-horizon agents

Sail is the most cost-efficient platform to run the leading open-source models on demand.

The Completion Window

Drop-in APIs

Fine-tunes and RL Rollouts

Usage-based

Battle-tested

Everyone else races to be the fastest.

Sail is the most efficient.

Scale your workloads without limits

Two ways to Sail.

Every agent needs a computer

FAQ