The developer-forward LLM. Strongest for long-context work, coding, and
structured analysis — and the rare foundation model whose rough edges
are about over-caution rather than under-competence.
Estimate one request. Multiply by your monthly request volume to
get a monthly number. Cache-hit rate captures how much of your
input is a stable prefix (system prompt, long docs) — the higher
it is, the more prompt caching saves.
COST PER REQUEST
$0.05 USD / REQUEST
Standard API rates — Batch API takes 50% off. 1M requests at this
shape ≈ the cost shown × 1,000,000.
BEST FOR
Long-context reasoning, code generation, agentic workflows, structured analysis of documents.
NOT FOR
Free-tier consumer use cases, apps that need the absolute cheapest per-token price, creative fiction pushing edgy content.
PRICING
Opus 4.7 $5 / $25 · Sonnet 4.6 $3 / $15 · Haiku 4.5 $1 / $5 per million input/output tokens. Prompt caching reads at 10% of input, Batch API at 50% off.
ALTERNATIVES
GPT-4-class models (OpenAI), Gemini (Google), open-weights Llama/Qwen for self-hosted.
What it is
Claude is Anthropic's family of large language models, sold through a
consumer chat product at claude.ai and through a developer API. The
lineup follows a three-tier shape that's stayed consistent across
major versions: Haiku at the small/fast end,
Sonnet as the workhorse tier, and Opus
as the frontier tier for the hardest reasoning tasks. As of April 2026,
that means Haiku 4.5, Sonnet 4.6, and Opus 4.7, with older variants
still available on the API for teams that haven't migrated yet.
Anthropic also ships Claude Code, a command-line
coding agent that runs locally on your machine and uses Claude as its
backing model. Claude Code meaningfully changes the calculus for
developer teams — we'll get to that in a minute — because it turns the
model from "smart chat box" into "autonomous collaborator that can
actually edit your repo."
Positioning-wise, Claude competes head-on with OpenAI's GPT-class
models and Google's Gemini. The three are close enough on raw
intelligence that the practical choice usually comes down to taste:
Claude is more willing to admit uncertainty, follows long instructions
carefully, and produces structured output without going off-script.
GPT has the biggest ecosystem and the broadest feature set. Gemini has
enormous context windows and deep Google integration.
What makes Claude unusual inside that competitive set is how
reliable it is at doing what you asked. In production
workloads — where you care less about max IQ and more about the model
doing the same thing twice in a row — Claude wins more often than
either competitor. That quietly matters more than the marketing
suggests.
The API itself is clean, well-documented, and stable. Anthropic
publishes deprecation schedules, a clear versioning scheme, and
predictable pricing that doesn't shift every two months. For a
developer trying to build something real on top of a model, that
stability is a feature.
What we tested
In our testing across client work, we've leaned on Claude for three
primary use cases: code generation and refactoring
inside agentic coding tools, structured document extraction
against messy real-world inputs (contracts, PDFs, inconsistent CSVs,
forms, screenshots), and long-context summarization
on inputs in the 100k–200k token range. We've used Sonnet for the
overwhelming majority of that work, with Opus reserved for the
occasional gnarly reasoning task and Haiku for anything that smells
like classification or routing.
On the infrastructure side, we've exercised the API through direct
Anthropic endpoints, through AWS Bedrock, and through Google Vertex AI.
We've built prompt-cached pipelines against long system prompts,
batch-processed workloads via the Batch API to verify the 50% discount
lands where promised, and stress-tested tool-use schemas through
hundreds of thousands of function calls.
On the consumer side, we've used claude.ai daily for two years —
through Projects, Artifacts, Computer Use, and the file-upload flow —
which gives us a feel for how Claude behaves when the end user is a
non-technical operator rather than a developer holding the API.
None of what follows is a formal benchmark; plenty of benchmark-focused
reviews already exist. What we can offer is the texture of running
Claude in client production for sustained periods and living with
the results: where it earns its keep, where it surprises, where the
edges still need working around.
Pricing, in detail
VERIFIED FROM ANTHROPIC DOCS · PER MILLION TOKENS
HAIKU 4.5
$1 IN / $5 OUT
Smallest, fastest current model. The right tier for classification, routing, and simple extraction.
Fastest latency in the family
Cache read: $0.10 / MTok
Batch: $0.50 / $2.50 per MTok
SONNET 4.6 · WORKHORSE
$3 IN / $15 OUT
The tier we default to for most production workloads. Near-Opus quality at a fraction of the spend.
Production-grade coding & reasoning
Cache read: $0.30 / MTok
Batch: $1.50 / $7.50 per MTok
OPUS 4.7 · FRONTIER
$5 IN / $25 OUT
Frontier reasoning tier. New tokenizer; noticeably cheaper than the old Opus 4 line ($15 / $75).
Best reasoning + agentic depth
Cache read: $0.50 / MTok
Batch: $2.50 / $12.50 per MTok
DISCOUNTS STACK
−90% CACHE HITS
Prompt caching charges 10% of input rate on reads. Batch API takes a straight 50% off both directions.
Cache writes: 1.25x (5min) / 2x (1hr)
Cache reads: 0.1x base input
Batch + cache multiply together
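To make the stacking concrete, here's the arithmetic for one request on Sonnet 4.6 at the rates above. The request shape (6,000 cached prefix tokens, 2,000 fresh input tokens, 1,000 output tokens) is an assumption for illustration, not a measurement:

```python
# Effective cost per request on Sonnet 4.6, using the rates quoted above.
INPUT_RATE = 3.00 / 1_000_000        # $ per input token
OUTPUT_RATE = 15.00 / 1_000_000      # $ per output token
CACHE_READ_RATE = 0.1 * INPUT_RATE   # cache reads bill at 10% of the input rate
BATCH_DISCOUNT = 0.5                 # Batch API takes 50% off both directions

cached_in, fresh_in, out = 6_000, 2_000, 1_000  # assumed request shape

realtime = cached_in * CACHE_READ_RATE + fresh_in * INPUT_RATE + out * OUTPUT_RATE
batched = realtime * BATCH_DISCOUNT  # batch and cache multiply together

print(f"real-time: ${realtime:.4f}/request, batched: ${batched:.4f}/request")
# real-time: $0.0228/request, batched: $0.0114/request
```

Without caching, the same request would run about $0.039, so at this shape the two discounts together cut the bill by roughly 70%. Multiply by monthly request volume to get the number that matters.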
What's good
The most consistent thing we can say about Claude is that it
follows instructions. If you tell it to return JSON
with these five keys and no prose, it does. If you tell it to keep its
answer under two sentences, it does. If you tell it to wait for user
confirmation before calling a destructive tool, it waits. For anyone
who has spent time wrangling a lesser model into shape with retries
and schema validation, this single property is worth the price premium
alone — the cost of "did the model do what I asked" failures
compounds fast in production, and Claude has fewer of them than any
competitor we've measured.
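For a sense of what that looks like in API terms, here's a minimal sketch with the Anthropic Python SDK. The model id, key names, and input file are placeholders we've invented for illustration; the point is that the shape you ask for is the shape you get back:

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

contract_text = open("contract.txt").read()  # any messy real-world document

resp = client.messages.create(
    model="claude-sonnet-4-6",  # placeholder id: use whichever Sonnet you're on
    max_tokens=512,
    system=(
        "Return only a JSON object with exactly these keys: "
        "title, parties, effective_date, term_months, auto_renews. "
        "No prose, no code fences."
    ),
    messages=[{"role": "user", "content": contract_text}],
)

# In our experience this parses on the first try far more often than with peers.
record = json.loads(resp.content[0].text)
print(record)
```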
Coding performance is the second clear strength. Across the industry
benchmarks we trust — SWE-bench Verified, HumanEval-plus, and
real-world "fix this repo" harnesses — Sonnet and Opus both land at or
near the top of the pack. More importantly, the generated code
compiles more often and the diffs read more like something a
senior engineer would ship. Claude doesn't just produce plausible
code; it produces code that holds up to review.
Claude Code, the CLI agent, amplifies this advantage. Users report
that pairing with the CLI on well-scoped tasks feels less like
prompting a model and more like driving a capable junior developer.
It reads files, runs tests, asks before destructive operations,
iterates on feedback, and produces PRs that real engineers can
land without heavy rewriting. No competitor has a first-party agent
at this level of integration, and the benefit compounds the more
your team gets comfortable with it.
Long-context behavior is genuinely good, not just "large." Models
with large advertised context windows often degrade dramatically once
you push past a certain depth — forgetting earlier content, confusing
sections, losing the plot on multi-document reasoning. Claude holds
up. In our testing, Sonnet 4.6 is usable at 150k+ tokens for real
document analysis, not just as a demo.
Where Claude shines in production
Long-context work: 200k+ token windows without the usual "lost in the middle" collapse.
Structured output and tool use: consistent JSON, reliable function calling, minimal schema drift.
Agentic loops: the model is willing to pause, ask, or stop when uncertain, which is exactly what you want in an agent.
Prose quality: tone is measured and editorial, not the florid default of other models.
Sonnet's price/performance is the best production default in the market right now.
Prompt caching pays off after a single re-hit — no complex cache strategy required.
If OpenAI optimizes for breadth and Google optimizes for integration,
Anthropic optimizes for being the model you can trust to actually do
what you said. In production code, that matters more than the
marketing suggests.
Tool use is the quiet star of the kit. Claude's function-calling is
unusually reliable — the model sticks to declared schemas, chooses
the right tool more consistently than peers, and doesn't hallucinate
parameters. For anyone building agents that call real APIs in
production, this translates directly to fewer retries, fewer
validation layers, and fewer late-night pages for schema drift.
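A minimal tool-use sketch, again with the Python SDK. The tool name and schema are invented for illustration, but the call structure is the standard Messages API tool declaration:

```python
import anthropic

client = anthropic.Anthropic()

# A single declared tool; the name and fields are placeholders.
tools = [{
    "name": "lookup_invoice",
    "description": "Fetch an invoice record by its ID from the billing system.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string", "description": "e.g. INV-2041"},
        },
        "required": ["invoice_id"],
    },
}]

resp = client.messages.create(
    model="claude-sonnet-4-6",  # placeholder id
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Why was invoice INV-2041 flagged?"}],
)

# When Claude calls the tool, the block's input matches the declared schema;
# in our experience, without hallucinated or missing parameters.
for block in resp.content:
    if block.type == "tool_use":
        print(block.name, block.input)  # lookup_invoice {'invoice_id': 'INV-2041'}
```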
Pros & cons
OUR HONEST TAKE
WHAT WORKS
Follows instructions more literally than any competitor. Less post-processing of outputs.
Generated code compiles more often and reads like a senior engineer wrote it.
Long-context behavior is genuinely good — not just large, but usable.
Tool use and structured output are reliable enough to remove validation layers.
Sonnet's price/performance is the best production default in the market.
Claude Code CLI changes how teams pair with the model.
Agents built on Claude stop and ask rather than confidently hallucinating.
WHAT DOESN'T
Occasional over-caution on benign asks (security, legal, some fiction).
Rate limits on the frontier tiers tighten during peak hours.
Not the cheapest option if output quality doesn't matter.
No web browsing on the API out of the box.
Image generation is not part of the product — pair with Flux, Ideogram, etc.
Consumer Claude has a different system prompt than the API — behaviors can diverge.
Common pitfalls
A few recurring failure modes show up in the Claude projects we've
seen go wrong — none of them fatal, all of them worth naming.
Treating consumer Claude and API Claude as the same product.
They're not. claude.ai has a specific system prompt, a safety filter
stack, and default behaviors tuned for a broad consumer audience. The
API has none of that — it applies what you put in the system
prompt. Teams that build a prototype on claude.ai and then port it to
the API sometimes hit unexpected behavior changes, especially around
refusal patterns. Build on the API from the start if you're heading
toward production.
Under-using prompt caching. Any app with a stable
system prompt or a long reference document should be caching. We
routinely see production Claude bills drop 60–80% when caching is
added to an existing workload. The common mistake is assuming "my
prompt is too short to bother" — but on repeat requests, even a
few-hundred-token prefix pays off almost immediately.
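Wiring caching in is a one-line change per stable block: mark the prefix with cache_control and leave the rest of the request alone. A minimal sketch, with a placeholder model id and input file:

```python
import anthropic

client = anthropic.Anthropic()

long_reference_doc = open("policy_manual.txt").read()  # the stable prefix worth caching

resp = client.messages.create(
    model="claude-sonnet-4-6",  # placeholder id
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You answer questions about the policy manual below."},
        {
            "type": "text",
            "text": long_reference_doc,
            # First call writes the cache (1.25x input for a 5-minute TTL);
            # every call after that reads it back at 10% of the input rate.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "What is the refund window?"}],
)
print(resp.content[0].text)
```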
Defaulting to Opus. Teams new to Claude often pick
Opus because it's the flagship. For 80% of production tasks, Sonnet
is indistinguishable from Opus in quality at 60% of the price.
Start on Sonnet; move to Opus only when you can point at a specific
task where Sonnet demonstrably hits a ceiling.
Ignoring the Batch API on async workloads. If your
workload doesn't need a real-time response, batching gets you 50% off,
stacked on top of prompt caching. The engineering investment is
modest — a few hours to rewrite the call pattern — and the savings
are immediate.
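The rewrite really is modest. A minimal sketch of the Batch API call pattern with the Python SDK; the ids and prompts are placeholders:

```python
import anthropic

client = anthropic.Anthropic()

# Placeholder inputs: each document becomes one request in the batch.
docs = {"doc-001": "first document text", "doc-002": "second document text"}

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": doc_id,
            "params": {
                "model": "claude-haiku-4-5",  # placeholder id
                "max_tokens": 256,
                "messages": [{"role": "user", "content": f"Summarize:\n\n{text}"}],
            },
        }
        for doc_id, text in docs.items()
    ]
)

# Results land asynchronously (within 24 hours) at half the per-token price;
# poll processing_status until it reads "ended", then fetch the results.
print(batch.id, batch.processing_status)
```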
Using one model for everything. The trick to
controlling a Claude-powered system is routing: Haiku for
classification and triage, Sonnet for the reasoning, Opus only for
the hardest final step. Most apps that land on "we're spending too
much on Claude" aren't using the right model for each step.
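A routing layer can be as small as one cheap Haiku call in front of everything else. A sketch of the idea, with placeholder model ids and a deliberately crude triage prompt:

```python
import anthropic

client = anthropic.Anthropic()

def route(task_text: str) -> str:
    """Cheap Haiku triage decides which tier handles the real work."""
    triage = client.messages.create(
        model="claude-haiku-4-5",  # placeholder id
        max_tokens=5,
        system="Reply with exactly one word: simple, standard, or hard.",
        messages=[{"role": "user", "content": task_text}],
    )
    label = triage.content[0].text.strip().lower()
    return {
        "simple": "claude-haiku-4-5",
        "standard": "claude-sonnet-4-6",
        "hard": "claude-opus-4-7",
    }.get(label, "claude-sonnet-4-6")  # default to the workhorse tier

def answer(task_text: str) -> str:
    resp = client.messages.create(
        model=route(task_text),
        max_tokens=1024,
        messages=[{"role": "user", "content": task_text}],
    )
    return resp.content[0].text
```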
What's actually offered
CAPABILITIES AT A GLANCE
LONG CONTEXT
200k+ token windows on all current models, with usable recall deep into the context.
TOOL USE
First-class function calling with strong schema adherence and minimal drift.
VISION
Image understanding across all tiers — charts, diagrams, screenshots, PDFs.
PROMPT CACHING
Pin hot prefixes (system prompts, long docs) to pay only for new tokens.
BATCH API
Submit async jobs at 50% off for any workload that doesn't need real-time responses.
CLAUDE CODE CLI
A first-party agentic coding CLI that runs locally and edits your repo directly.
CONSUMER APP
claude.ai with Projects, Artifacts, and Computer Use for end-user workflows.
AWS + GCP AVAILABILITY
Runs on Anthropic's API, Amazon Bedrock, and Google Vertex AI for teams locked to either cloud.
SEEN ENOUGH?
A fresh account takes under a minute, and Sonnet is free to try on claude.ai.
What's not so good
The most common complaint is over-caution. Claude will occasionally
refuse or hedge on requests that a reasonable reader would consider
entirely benign — security research, legal analysis, some kinds of
creative fiction. The frequency has improved significantly over
successive model versions, but it hasn't disappeared. Users report
that framing and system prompts mitigate most of it, and that the API
(with an explicit developer-side system prompt) is noticeably less
cautious than the default consumer product.
Rate limits are the second honest gripe. During peak hours,
Anthropic's capacity for the top tiers has historically tightened —
this is a reality of any foundation model at the frontier, but it's
worth budgeting for in production. Prompt caching and batch endpoints
help smooth costs, and AWS Bedrock / Google Vertex AI give you
independent capacity pools, but if your app is spiky you'll want a
secondary provider to fall back to.
Price is not exactly a weakness, but worth naming: Claude is not the
cheapest LLM per token by a significant margin. If your workload can
tolerate lower quality, open-weights models on RunPod or a mid-tier
OpenAI model will be cheaper. The case for Claude is always that the
quality gap justifies the price gap — and once you factor in the cost
of retries, validation, and failed outputs on a lesser model, Claude
usually still wins. But if quality really doesn't matter for your
workload, use something else.
No native image generation. Claude understands images on input but
doesn't generate them. For multimodal apps, pair Claude with an image
model like Flux, Ideogram, or DALL-E via the appropriate provider.
This is a product decision Anthropic seems firm on — don't expect
image-gen to land in Claude anytime soon.
No built-in web browsing on the API by default. Claude has a
web-search tool available, but it's a separate paid capability
($10 per 1,000 searches) rather than automatic browsing. For apps
that need live web context, you have to wire it in explicitly. For
research-heavy workflows, Perplexity's
Sonar API is sometimes a better integration point.
The consumer product evolves less predictably than the API. Features
in claude.ai ship and get reworked on a cadence that's harder to plan
around than the API versioning. This isn't a problem if you're using
Claude for work — but if you're building an internal tool that
depends on a specific claude.ai feature behaving a specific way, be
ready for it to shift.
Who should use it
If you're building anything that involves code, long documents, or an
agent doing multi-step work, Claude should be your default. The same
goes for any application where reliability of output shape
matters more than the absolute cheapest token price — which, once you
factor in the cost of retries and validation, is most production
applications we see.
Claude Code is a distinct reason to pick this ecosystem even before
you consider the base model. No competitor has a first-party coding
agent at the same level, and the benefit compounds as your team gets
comfortable with it. For a small dev team looking for force
multiplication, the combination of Claude Code CLI and Sonnet's
pricing is genuinely hard to beat.
Enterprise teams should use Claude through AWS Bedrock or Google
Vertex AI. It gives you the cloud provider's compliance posture on
top of Anthropic's model, simplifies procurement, and usually fits
into an existing cloud contract. The latency is slightly higher than
direct Anthropic endpoints, but the procurement story is much cleaner.
If you're shipping a consumer product where volume is massive and the
per-query quality bar is low, a cheaper model will serve you better.
ChatGPT's or Gemini's
smaller tiers are closer to the right price-performance curve for
those shapes. And if you're building on open weights for sovereignty,
data-residency, or self-hosting reasons, Claude isn't on the table —
pick a Llama- or Qwen-class model and run it yourself on
RunPod or similar.
For research, Claude is one of the better tools for reading through
long academic papers or pulling structured findings from a messy
corpus of documents. Paired with Perplexity for citation-grounded
web search and a scratchpad workflow, a solo researcher can cover
ground that, a decade ago, would have taken a whole team.
Verdict
Claude is the LLM we reach for first. It's the one we'd recommend
without hesitation to a team shipping code, working with long
documents, or running agentic workflows in production. The
over-caution and pricing are real but manageable, and the trajectory
has been consistently forward: every major version in the last two
years has improved on the previous in the ways that matter for real
work — better instruction-following, better code, better long-context
behavior, lower prices (Opus 4.7 at $5/$25 is a meaningful step down
from the Opus 4 line at $15/$75).
We rate it 9.1 / 10. It loses points for
occasional refusals and for the price floor. Almost
everything else is better than the competition, and the things that
aren't (ecosystem, multimodal features, free-tier reach) are features
of the competition, not failures of Claude.
If you're on the fence, spend a week using Claude Code on your
current work. You'll know within three days whether the combination
of model + agent fits how you build.
Frequently asked
Which model should we start with?
Start on Sonnet. It's the right answer for roughly 80% of production workloads we build. Move up to Opus only after you've tested a task and confirmed it hits a quality ceiling Sonnet can't clear. Drop down to Haiku for classification, routing, or any step where you care about latency and per-call cost over raw reasoning depth.
How does Claude compare to GPT?
Claude leads on code generation, long-context reliability, and instruction-following. GPT-4-class models lead on ecosystem breadth, third-party integrations, and raw consumer reach. For developer workloads — agents, code, structured output, document analysis — Claude is the default. For broad consumer chat with plugin ecosystems, GPT is the default.
What is Claude Code?
Claude Code is Anthropic's official command-line coding agent. It runs on your machine, has full access to your repo, and can read, edit, run commands, and iterate autonomously. It's one of the reasons we'd pick Claude even ignoring the base-model quality — no competitor has a first-party agent at the same level, and the benefit compounds as your team gets comfortable with it.
How do we keep the bill down?
Three levers, in order of impact: prompt caching (huge wins on any workload with repeated system prompts or long documents), the Batch API (50% off for anything that doesn't need real-time), and model right-sizing (move any step that doesn't need Opus down to Sonnet or Haiku). We've seen production bills drop 60–80% from caching alone.
Can we use Claude with sensitive or regulated data?
Anthropic offers a zero-data-retention option for API customers and has SOC 2 Type II available. For HIPAA or more regulated workloads, running Claude through Amazon Bedrock or Google Vertex AI gives you the cloud provider's compliance posture on top. Always double-check with your security team — the answer depends on your specific regime.
Why does Claude refuse benign requests, and what can we do about it?
Consumer Claude leans cautious by default because it serves a very broad audience. Most developer refusals disappear once you're on the API with an explicit system prompt that frames the task and user context. If you still see refusals, tighten the system prompt and provide context for why the task is legitimate — this resolves the vast majority of false positives in our experience.
How does prompt caching work?
You mark a prefix of your prompt as cacheable. On the first request, that prefix is written to cache at 1.25× the normal input rate (or 2× for a 1-hour TTL). On subsequent requests within the TTL, reading the cached prefix costs 10% of the normal input rate. For apps that reuse a big system prompt or a long document across many requests, this pays off on the second call.
DONE READING?
Pick a task you'd normally give to another LLM this week and try it on Claude first.