Build Your Own AI Model or Wrap the API? A 2026 Decision Brief

Every team shipping an AI feature in 2026 hits this fork: keep wrapping a frontier API (Claude, GPT, Gemini), or train and host something of your own. It is usually argued as an engineering question. It is a strategy question wearing an engineering costume — and getting it wrong is expensive in the direction people rarely expect.

This is a worked decision brief: the same seven-section structure YourBrief generates for a paying customer, applied here to a real, common decision so you can see the shape before you run it on your own numbers.

The decision

First, classify the door. Wrapping an API is a two-way door — you can switch providers, add a fallback, or move to your own model later with contained cost. Training and self-hosting a model is closer to a one-way door: the data pipeline, eval harness, GPU commitments, and MLOps headcount are hard to unwind once you have hired and architected around them.

Bezos's rule applies cleanly: two-way-door decisions should be made fast and by the people closest to the work; one-way doors deserve slow, senior deliberation. The asymmetry alone tells you the default. When one option is reversible and roughly as good, you take the reversible one unless you have a specific, tested reason not to.

Key questions to answer before deciding

Is the model your moat, or your plumbing? If a customer would pay you because your model is better, building may be core. If the model is a means to a product they value for other reasons (workflow, data, distribution), it is plumbing — and you rent plumbing.
What is your actual monthly inference spend, today and at 10x? Below roughly $50–100K/month, self-hosting rarely pays back the fixed cost of the team and GPUs to run it. Run the real number, not the sticker fear.
Do you have proprietary data a general model has never seen — and have you confirmed a fine-tune on it beats prompt-engineering + retrieval on the same task? Most teams have not actually run this test.
What latency and privacy constraints are non-negotiable? On-device, air-gapped, or sub-100ms requirements are among the few reasons that legitimately force a build.
Who maintains this in 18 months? A self-hosted model is not a launch; it is a standing team and an on-call rotation.

Recommended frameworks

Core-vs-context (Geoffrey Moore). Build what is core (the thing customers pay you for and competitors cannot copy); buy/rent everything context. For the vast majority of application-layer companies in 2026, the model is context and the product around it is core. Be honest about which one you are.

The inference-cost crossover. Plot rented API cost (scales linearly with usage) against self-hosted cost (high fixed floor: GPUs + MLOps + eval infra, then cheap marginal). Below the crossover, renting wins on total cost and keeps you on the frontier for free every time the provider ships a better model. Only sustained volume well past the crossover flips it.

Wrap-to-learn. Treat the API as a paid experiment that tells you exactly which narrow slice, if any, is worth owning. You cannot know what to build until production traffic shows you where the general model actually fails you.

Decision criteria

A decision to build should clear all three of these bars, not just one:

Moat: the model itself is a reason customers choose you, provable in a sentence.
Economics: inference spend is high and sustained enough to clear the crossover after fully loading team cost.
Capability gap: you have run the fine-tune-vs-retrieval test and your data measurably wins on your task.

Fail any bar and the honest answer is wrap — and revisit in two quarters.

Sources to consult

Pull your own last-90-days inference bill and project it at 3x and 10x usage. Read a16z's writing on the AI application-vs-infrastructure margin question, and any recent teardown of fine-tuning vs RAG for your task class. Then talk to one team that self-hosts at your scale and ask what it actually costs them in headcount — not GPUs.

Next steps

This week: (1) compute your real crossover number; (2) design one honest eval where a fine-tune could beat retrieval, and run it; (3) add a provider-abstraction layer now so "switch or self-host later" stays a two-way door. If all three bars are not cleanly cleared, ship on the API and put a calendar reminder to re-decide in 90 days.

When to escalate

Stop and get senior/board input if a build would require a net-new ML team, a multi-year GPU commitment, or moving regulated data — those convert this from a reversible call into a company-shaping bet. And escalate immediately if the honest answer to "is the model our moat?" is "we are not sure," because that uncertainty, not the engineering, is the real decision.

The verdict for most teams in 2026 is wrap first, and earn the right to build by proving a specific slice where the general model fails you. But your numbers are not most teams' numbers. Generate this exact brief against your real inference spend, your data, and your moat — and get a recommendation you can act on.

Related worked briefs: Raise now or wait · Build vs buy, in general — the template · Usage-based pricing