Why scoping is where projects die
We've shipped 40-plus AI builds across the last two years. The ones that fail almost always fail in week one — not because the model didn't work, but because nobody answered the obvious questions about what the system was supposed to do, what it was allowed to do, and what would happen when it didn't.
The checklist below is what we walk through with every client before we agree to a build. It's intentionally boring. The interesting decisions get made later; this is the load-bearing wall.
1. The job
- What does this system do for the user? Not the architecture — the user-visible action. If you can't say it in one sentence, the scope isn't ready.
- What does success look like quantitatively? "Faster" doesn't count. "P95 latency under 800ms with 92% accuracy on the eval set" counts.
- What's the kill criterion? The metric that, if missed, means we shut it off. If you don't have one, you don't have a project — you have a hobby.
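None of this needs heavy tooling. As a minimal sketch of what a kill criterion looks like once it's a number rather than a sentiment, here's a small Python check; the thresholds mirror the example above, and the `eval_results` shape and `check_kill_criterion` helper are hypothetical.

```python
# Hypothetical kill-criterion check; thresholds and the eval_results shape are illustrative.
eval_results = [
    {"latency_ms": 640, "correct": True},
    {"latency_ms": 910, "correct": True},
    {"latency_ms": 720, "correct": False},
    # ... one record per example in the eval set
]

P95_LATENCY_CEILING_MS = 800   # "P95 latency under 800ms"
ACCURACY_FLOOR = 0.92          # "92% accuracy on the eval set"

def check_kill_criterion(results):
    latencies = sorted(r["latency_ms"] for r in results)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    accuracy = sum(r["correct"] for r in results) / len(results)
    return p95 <= P95_LATENCY_CEILING_MS and accuracy >= ACCURACY_FLOOR, p95, accuracy

passed, p95, accuracy = check_kill_criterion(eval_results)
print(f"p95={p95}ms, accuracy={accuracy:.0%} -> {'keep running' if passed else 'shut it off'}")
```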
2. The data
- Where does the input data come from? Production system, manual upload, third-party API, user-typed?
- Who owns the data, legally? Different answer for customer data, employee data, public data, scraped data.
- What's the data residency requirement? US-only, EU-only, on-prem, no constraint?
- What's the PII story? Is the model allowed to see it? Logged? Stored? Sent to third parties?
- What's the eval dataset? If you don't have 50 labelled examples representative of production traffic, build that first.
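"Have an eval dataset" means something this small to start with: a file of labelled examples pulled from real traffic and a loop that scores the system against them. A minimal sketch, assuming a JSONL file; `run_model` is a hypothetical stand-in for whatever the system actually calls, and the exact-match scoring should be swapped for a task-appropriate grader.

```python
# Hypothetical eval harness; the eval.jsonl format and run_model() are illustrative.
import json

def run_model(prompt: str) -> str:
    """Stand-in for the real system call (provider SDK, internal API, etc.)."""
    raise NotImplementedError

def load_eval_set(path="eval.jsonl"):
    # Each line: {"input": "...", "expected": "..."} drawn from real production traffic.
    with open(path) as f:
        return [json.loads(line) for line in f]

def score(examples):
    correct = 0
    for ex in examples:
        output = run_model(ex["input"])
        correct += int(output.strip() == ex["expected"].strip())  # swap in a task-specific grader
    return correct / len(examples)

if __name__ == "__main__":
    examples = load_eval_set()
    print(f"{len(examples)} examples, accuracy {score(examples):.0%}")
```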
3. The model choice
- Hosted or self-hosted? Drives the entire infrastructure decision.
- Which provider, which tier, why? Use our LLM cost calculator to model what each provider would cost at your projected volume.
- What's the fallback when the primary provider is down? Secondary provider, cached response, user-facing error? See the sketch after this list.
- Is the model behaviour reproducible? Same input → same output, or is some non-determinism acceptable?
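The fallback question is easier to answer once you see how little code it takes to have one. A minimal sketch, assuming two hypothetical stand-ins `call_primary` and `call_secondary` for the real provider SDK calls:

```python
# Hypothetical fallback chain; call_primary/call_secondary stand in for real provider SDKs.
import time

class ProviderError(Exception):
    pass

def call_primary(prompt: str) -> str:
    raise NotImplementedError  # the provider you chose above

def call_secondary(prompt: str) -> str:
    raise NotImplementedError  # a second provider, a smaller self-hosted model, etc.

def answer(prompt: str, retries: int = 2) -> str:
    for attempt in range(retries):
        try:
            return call_primary(prompt)
        except ProviderError:
            time.sleep(2 ** attempt)  # brief backoff before retrying the primary
    try:
        return call_secondary(prompt)  # degrade to the fallback provider
    except ProviderError:
        # Last resort: the user-facing error the product has actually designed for.
        return "Sorry, this feature is temporarily unavailable."
```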
4. The cost ceiling
- What's the budget per request? Work it out from monthly volume × cost per call (the worked example after this list shows the arithmetic). Most teams discover the ceiling by blowing through it; better to set it now.
- What happens at 10x volume? Linear cost, sub-linear, or does something break?
- What's the prompt-caching story? If your prompt has a stable prefix, caching can drop costs 60-80%. See our Claude review for how to think about it.
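To make the per-request maths concrete, here's a back-of-envelope sketch. Every number in it is a placeholder, not real pricing; plug in your own volume, token counts, and your provider's current price list.

```python
# Illustrative cost-ceiling maths; all numbers are placeholders, not real pricing.
monthly_requests = 500_000
input_tokens_per_request = 2_000
output_tokens_per_request = 400

price_per_1m_input_tokens = 3.00    # USD, check your provider's price list
price_per_1m_output_tokens = 15.00

cost_per_request = (
    input_tokens_per_request / 1_000_000 * price_per_1m_input_tokens
    + output_tokens_per_request / 1_000_000 * price_per_1m_output_tokens
)
monthly_cost = cost_per_request * monthly_requests

# If most of the prompt is a stable, cacheable prefix, assume ~70% saving on input tokens.
cost_per_request_cached = (
    input_tokens_per_request / 1_000_000 * price_per_1m_input_tokens * 0.3
    + output_tokens_per_request / 1_000_000 * price_per_1m_output_tokens
)

print(f"per request: ${cost_per_request:.4f}")
print(f"per month:   ${monthly_cost:,.0f}  (at 10x volume: ${monthly_cost * 10:,.0f})")
print(f"per month with prompt caching: ${cost_per_request_cached * monthly_requests:,.0f}")
```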
5. The observability
- What gets logged? Inputs, outputs, latencies, errors, costs.
- How do you know it's getting worse? Continuous eval, sampling, manual review cadence?
- How do you know users hate it? Thumbs-up/down, abandonment, support tickets?
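Concretely, "what gets logged" usually reduces to one structured record per request. A minimal sketch; the field names and the logging sink are placeholders, and what you store from the prompt depends on your PII answers above.

```python
# Hypothetical per-request log record; field names and sink are illustrative.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm_requests")

def log_request(prompt: str, response: str, started_at: float, cost_usd: float, error: str | None = None):
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "latency_ms": round((time.time() - started_at) * 1000),
        "prompt_chars": len(prompt),      # or a redacted/hashed prompt if PII rules require it
        "response_chars": len(response),
        "cost_usd": cost_usd,
        "error": error,
    }
    log.info(json.dumps(record))  # ship to whatever sink ops already watches
```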
6. The safety net
- What's the worst thing the model could output? If it's "leak a customer's data," you need different controls than if it's "give a slightly weird answer."
- What's the human-in-the-loop story? Always-on, sampling, escalation triggers, none?
- What's the kill switch? Single config flag, feature flag, deploy-rollback?
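The simplest kill switch is a single flag checked before every model call, sketched below. The environment-variable name and the degraded-mode message are placeholders; a feature-flag service or a config row works the same way, as long as there is exactly one switch and everything checks it.

```python
# Hypothetical kill switch: one flag checked before every model call.
import os

def feature_enabled() -> bool:
    # Could equally be a feature-flag service or a config row; the point is one switch, checked everywhere.
    return os.environ.get("AI_FEATURE_ENABLED", "true").lower() == "true"

def call_model(prompt: str) -> str:
    raise NotImplementedError  # the real model call

def handle_request(prompt: str) -> str:
    if not feature_enabled():
        return "This feature is temporarily disabled."  # the pre-agreed degraded behaviour
    return call_model(prompt)
```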
7. The handoff
- Who runs this in production after we leave? If "nobody" — that's the project, not the build.
- What do they need to know to do that? Runbook, eval harness, cost dashboard, rollback procedure.
What to do with the answers
If you can answer all 23 questions clearly, the build phase is mostly execution risk. If you can't answer 5+ of them, scope first — don't build. We've never seen a project that skipped this work and saved time on net.
If you'd like the printable PDF version of this checklist (one page, formatted for client meetings), subscribe below — it's the welcome email.