Professional services

For the workloads that belong on infrastructure you control. We bring the inference engineering team you don’t have to hire.

Some inference workloads can’t live on a shared API: regulated data, classified environments, IP that can’t leave the tenancy, weight-level customization, latency-critical paths where the round-trip is the product. Enterprises have ML teams for those workloads. They don’t have inference engineering teams. These ten engagements stand the platform up around those workloads and keep it running, alongside whatever else makes sense for the rest of the portfolio, frontier APIs included.

Ten services · pick → tune → deploy → operate → scale

Where to start

Three named entry points. Most accounts come in through one of these.

Six-week assessment to make the case. Optimization engagement to prove the SLA. Or the full migration program with a quality and cost guarantee. The other seven services in the catalog descend from these.

The full ladder

Seven more services. They descend from the entry points.

Most accounts pick one or two of these on top of the entry-point engagement.

S10 guarantee

Three terms. In writing. Before kickoff.

The migration program is the only engagement we sell with an outcome guarantee. The terms are short and they are signed before we start.

Sovereign workload migration · signed eval + TCO contract

01
Quality parity
Defined upfront on a signed eval harness. The migrated workload must match or exceed the scores of the production baseline it replaces, on the evals you wrote.
02
TCO reduction
Defined upfront as a percentage of prior token spend at matched traffic volume. The percentage and the measurement window are in the contract.
03
Miss the guarantee
Service credits against your S9 managed-ops subscription, up to the engagement fee. We eat the overrun, not you.
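As a rough illustration of how the three terms could be checked, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than contract language: the function names, the rule that a higher eval score is better, and the idea that post-migration cost is measured over the same window and traffic volume as prior token spend. The actual percentage, window, and credit calculation are whatever the signed contract says.

```python
def quality_parity(baseline: dict, migrated: dict) -> bool:
    """True if every baseline eval is matched or exceeded (assumes higher is better)."""
    # An eval missing from the migrated results counts as a miss.
    return all(
        migrated.get(name, float("-inf")) >= score
        for name, score in baseline.items()
    )


def tco_reduction_pct(prior_spend: float, new_spend: float) -> float:
    """Percentage reduction relative to prior token spend at matched traffic volume."""
    if prior_spend <= 0:
        raise ValueError("prior_spend must be positive")
    return 100.0 * (prior_spend - new_spend) / prior_spend


def guarantee_met(baseline: dict, migrated: dict,
                  prior_spend: float, new_spend: float,
                  contracted_reduction_pct: float) -> bool:
    """Both signed terms must hold: quality parity and the contracted TCO reduction."""
    return (
        quality_parity(baseline, migrated)
        and tco_reduction_pct(prior_spend, new_spend) >= contracted_reduction_pct
    )


def capped_credit(proposed_credit: float, engagement_fee: float) -> float:
    """Service credits never exceed the engagement fee (hypothetical helper)."""
    return min(max(proposed_credit, 0.0), engagement_fee)
```

With hypothetical numbers, a workload that drops from 100,000 to 70,000 in spend over the measurement window shows a 30% reduction; it meets the guarantee only if every eval score also holds at or above baseline.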

How we work

Three rules. No exceptions.

Three roles cover everything we sell

Inference Engineer (CUDA, vLLM, SGLang, Dynamo, quantization, parallelism), ML Engineer (training, fine-tuning, distillation, eval design), Platform Engineer / SRE (K8s, compute-agent, observability, on-call). Plus a Solutions Architect on the front end and a Technical Account Manager on the back end of S9.

Services exist to deliver the platform, not paper over it

Anything we hand-build twice in services becomes a product feature. Conversely, we will not sell a service whose only purpose is making the product usable; that is a product bug, not a service. Product and services sync monthly to keep that boundary honest.

Pricing philosophy

Lead with fixed-fee for predictability. Subscription is the north star — it is how we build durable revenue. Outcome-based where we can measure, like S10. Hourly or T&M only as a last resort. If we are selling hours, we lost.

Ready to talk? Pick the path.

For services engagements, schedule a discovery call. For platform access, join the waitlist. Both routes go to the same team.
