Sail is the foundation of useful, agentic AI. We are here to take a big swing at the most ambitious engineering challenge of our careers. Everyone working at Sail will become an expert; nothing less will do in our immensely competitive market.
Open positions
Build the systems that make AI inference fast, reliable, and cost-efficient at global scale. You’ll design the control plane that schedules a huge queue of tokens over a diverse fleet of machines, spread all over the world.
What you’ll do
Design and implement high-performance schedulers (admission control, queuing, priority, fairness, preemption, bin packing).
Build global routing and traffic management (latency-aware dispatch, predictive autoscaling, failover strategies).
Develop LLM-specific routing optimizations, e.g. KV caching that lets us trade memory for compute across the pyramid of GPU RAM, CPU RAM, and NVMe flash.
Build deep observability: we want to trace every millisecond of our systems, and catch failures early enough that we can make things right before the customer even notices.
What we’re looking for
Strong distributed systems fundamentals (concurrency, networking, databases, performance engineering).
Eagerness to work with agents. Distributed systems are not easy to one-shot; you’ll always have to think carefully about testing correctness and edge cases. Writing extremely clear plans and tests is a must.
Bonus: experience with ML inference stacks (vLLM/SGLang), GPUs/accelerators.
Optimize token processing down to the lowest layers of the stack. You'll optimize kernel performance, develop new scheduling and parallelism strategies, and help us squeeze every FLOP out of our hardware.
What you’ll do
Modify and extend state-of-the-art inference engines like vLLM and SGLang.
Understand every microsecond of GPU time spent during a forward pass. You'll be able to explain every kernel launch on an NSys profile.
Design and implement exotic parallelism schemes to work with "interesting" hardware topologies.
Write custom GPU kernels to excel in specific regimes, such as cascade attention.
What we’re looking for
Strong understanding of LLM mechanics, like KV cache, mixture-of-experts, prefill vs. decode phases.
Interest in MLSys research: great ideas like speculative decoding and sparse attention come from research that we need to follow closely.
Familiarity with modern, tile-based GPU programming (e.g. Triton, CUTLASS, ThunderKittens), or an interest in learning these!
Inference is just one piece of an effective background agent. Let's design and build the rest of the system that turns billions of tokens into the best possible answers.
What you’ll do
Design custom evals for multi-turn, massively parallel agents.
Build agent harnesses to improve open model (DeepSeek, Qwen, Llama) performance. Claude Code is all about agent/harness co-design; let's do the same for open source!
Automate prompt optimization with frameworks like DSPy.
We're hiring a Strategic Finance Lead as our first dedicated finance hire. This is a builder's role — you're not stepping into existing systems, you're creating them.
What you’ll do
Build and own Sail's operating P&L: the single source of truth connecting GPU cluster unit economics (cost per GPU-hour, utilization, depreciation/useful life, power costs) to product pricing, product-level unit economics, and actual financial performance.
Partner cross-functionally on board materials: financial modeling, data room preparation, and diligence support.
Stand up Sail's first real FP&A process: budgeting and forecasting, sized appropriately for an early-stage company rather than over-built for one.
Develop the headcount and opex planning model as the team grows, and own burn rate and runway tracking.
What we’re looking for
5-8+ years in finance roles with real modeling depth — investment banking, growth equity/private equity, corporate development, or strategic finance/FP&A at a high-growth company. Infrastructure or capital-intensive business experience (data centers, energy, equipment leasing, or similar) is a strong plus.
Genuine fluency in building financial models from scratch — not just maintaining someone else's template. You should be comfortable building an operating P&L and pro forma model without a lot of hand-holding.
Comfort with ambiguity and a preference for building the right-sized process rather than the most sophisticated one. We're early — judgment about what to build now vs. later matters as much as technical skill.
Bonus: prior exposure to revenue recognition for usage-based/consumption billing, or experience inside a GPU cloud, data center, or other infrastructure-as-a-service business.
Interview process
Meet the Head of BD. You'll be working in close collaboration on compute procurement, GTM decisions, and the financial engineering behind them.
Meet the CEO, who will ask about your experience and share as much detail about Sail as you want to hear.
Complete a take-home case study based on a financial analysis we work through daily.
Come in to Sail's SF office for an interview day. Meet the whole team, then you'll have the opportunity to present your case study describing your process, learnings, and results.
Meet the CEO. This is the first step because we respect your time. Ask any question and get a definitive answer immediately.
Meet the CTO, who will ask about your experience, and share as much technical detail about Sail as you want to hear.
Come in to Sail's SF office for an interview day. Meet the whole team, then you'll have 3-4 hours to work on a problem that closely simulates the work we do daily. It's an objectively scored task, so you'll get immediate feedback on how well your code is working, just like we do in production! AI assistance is highly encouraged, and we'll provide a laptop with all the best tools set up. Finish with a short presentation describing your process, learnings, and results.
Offer. Once the team decides we want to work with you, we make a strong offer quickly and will be quite persistent over email/text/calls :)
Life at Sail
We work out of a beautiful, sunny office in downtown San Francisco. All meals are on us (and actually great; SF is a food paradise and it would be a shame to eat only bowl slop). Everyone gets a Studio Display at their desk. We are serious about investing in anything that saves us time or energy. There are six different ways to make coffee or tea in the office. A friendly (hypoallergenic) black cat named Coco visits occasionally.