The developer-forward LLM. Strongest for long-context work, coding, and
structured analysis — and the rare foundation model whose rough edges
are about over-caution rather than under-competence.
Estimate one request. Multiply by your monthly request volume to
get a monthly number. Cache-hit rate captures how much of your
input is a stable prefix (system prompt, long docs) — the higher
it is, the more prompt caching saves.
COST PER REQUEST
$0.05 USD / REQUEST
Standard API rates — Batch API takes 50% off. 1M requests at this
shape ≈ the cost shown × 1,000,000.
BEST FOR
Long-context reasoning, code generation, agentic workflows, structured analysis of documents.
NOT FOR
Free-tier consumer use cases, apps that need the absolute cheapest per-token price, creative fiction pushing edgy content.
PRICING
Opus 4.7 $5 / $25 · Sonnet 4.6 $3 / $15 · Haiku 4.5 $1 / $5 per million input/output tokens. Prompt caching reads at 10% of input, Batch API at 50% off.
ALTERNATIVES
GPT-4-class models (OpenAI), Gemini (Google), open-weights Llama/Qwen for self-hosted.
What it is
Claude is Anthropic's family of large language models, sold through a
consumer chat product at claude.ai and through a developer API. The
lineup follows a three-tier shape that's stayed consistent across
major versions: Haiku at the small/fast end,
Sonnet as the workhorse tier, and Opus
as the frontier tier for the hardest reasoning tasks. As of April 2026,
that means Haiku 4.5, Sonnet 4.6, and Opus 4.7, with older variants
still available on the API for teams that haven't migrated yet.
Anthropic also ships Claude Code, a command-line
coding agent that runs locally on your machine and uses Claude as its
backing model. Claude Code meaningfully changes the calculus for
developer teams — we'll get to that in a minute — because it turns the
model from "smart chat box" into "autonomous collaborator that can
actually edit your repo."
Positioning-wise, Claude competes head-on with OpenAI's GPT-class
models and Google's Gemini. The three are close enough on raw
intelligence that the practical choice usually comes down to taste:
Claude is more willing to admit uncertainty, follows long instructions
carefully, and produces structured output without going off-script.
GPT has the biggest ecosystem and the broadest feature set. Gemini has
enormous context windows and deep Google integration.
What makes Claude unusual inside that competitive set is how
reliable it is at doing what you asked. In production
workloads — where you care less about max IQ and more about the model
doing the same thing twice in a row — Claude wins more often than
either competitor. That quietly matters more than the marketing
suggests.
The API itself is clean, well-documented, and stable. Anthropic
publishes deprecation schedules, a clear versioning scheme, and
predictable pricing that doesn't shift every two months. For a
developer trying to build something real on top of a model, that
stability is a feature.
What we tested
In our testing across client work, we've leaned on Claude for three
primary use cases: code generation and refactoring
inside agentic coding tools, structured document extraction
against messy real-world inputs (contracts, PDFs, inconsistent CSVs,
forms, screenshots), and long-context summarization
on inputs in the 100k–200k token range. We've used Sonnet for the
overwhelming majority of that work, with Opus reserved for the
occasional gnarly reasoning task and Haiku for anything that smells
like classification or routing.
On the infrastructure side, we've exercised the API through direct
Anthropic endpoints, through AWS Bedrock, and through Google Vertex AI.
We've built prompt-cached pipelines against long system prompts,
batch-processed workloads via the Batch API to verify the 50% discount
lands where promised, and stress-tested tool-use schemas through
hundreds of thousands of function calls.
On the consumer side, we've used claude.ai daily for two years —
through Projects, Artifacts, Computer Use, and the file-upload flow —
which gives us a feel for how Claude behaves when the end user is a
non-technical operator rather than a developer holding the API.
None of what follows is a formal benchmark; plenty of benchmark-focused
reviews already exist. What we can offer is the texture of running
Claude in client production for sustained periods and living with
the results: where it earns its keep, where it surprises, where the
edges still need working around.
Pricing, in detail
VERIFIED FROM ANTHROPIC DOCS · PER MILLION TOKENS
HAIKU 4.5
$1 IN / $5 OUT
Smallest, fastest current model. The right tier for classification, routing, and simple extraction.
Fastest latency in the family
Cache read: $0.10 / MTok
Batch: $0.50 / $2.50 per MTok
SONNET 4.6 · WORKHORSE
$3 IN / $15 OUT
The tier we default to for most production workloads. Near-Opus quality at a fraction of the spend.
Production-grade coding & reasoning
Cache read: $0.30 / MTok
Batch: $1.50 / $7.50 per MTok
OPUS 4.7 · FRONTIER
$5 IN / $25 OUT
Frontier reasoning tier. New tokenizer; noticeably cheaper than the old Opus 4 line ($15 / $75).
Best reasoning + agentic depth
Cache read: $0.50 / MTok
Batch: $2.50 / $12.50 per MTok
DISCOUNTS STACK
−90% CACHE HITS
Prompt caching charges 10% of input rate on reads. Batch API takes a straight 50% off both directions.
Cache writes: 1.25x (5min) / 2x (1hr)
Cache reads: 0.1x base input
Batch + cache multiply together
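To make the stacking concrete, here's the arithmetic for one request on Sonnet 4.6 at the rates above. The request shape (6,000 cached prefix tokens, 2,000 fresh input tokens, 1,000 output tokens) is an assumption for illustration, not a measurement:

```python
# Effective cost per request on Sonnet 4.6, using the rates quoted above.
INPUT_RATE = 3.00 / 1_000_000        # $ per input token
OUTPUT_RATE = 15.00 / 1_000_000      # $ per output token
CACHE_READ_RATE = 0.1 * INPUT_RATE   # cache reads bill at 10% of the input rate
BATCH_DISCOUNT = 0.5                 # Batch API takes 50% off both directions

cached_in, fresh_in, out = 6_000, 2_000, 1_000  # assumed request shape

realtime = cached_in * CACHE_READ_RATE + fresh_in * INPUT_RATE + out * OUTPUT_RATE
batched = realtime * BATCH_DISCOUNT  # batch and cache multiply together

print(f"real-time: ${realtime:.4f}/request, batched: ${batched:.4f}/request")
# real-time: $0.0228/request, batched: $0.0114/request
```

Without caching, the same request would run about $0.039, so at this shape the two discounts together cut the bill by roughly 70%. Multiply by monthly request volume to get the number that matters.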
What's good
The most consistent thing we can say about Claude is that it
follows instructions. If you tell it to return JSON
with these five keys and no prose, it does. If you tell it to keep its
answer under two sentences, it does. If you tell it to wait for user
confirmation before calling a destructive tool, it waits. For anyone
who has spent time wrangling a lesser model into shape with retries
and schema validation, this single property is worth the price premium
alone — the cost of "did the model do what I asked" failures
compounds fast in production, and Claude has fewer of them than any
competitor we've measured.
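For a sense of what that looks like in API terms, here's a minimal sketch with the Anthropic Python SDK. The model id, key names, and input file are placeholders we've invented for illustration; the point is that the shape you ask for is the shape you get back:

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

contract_text = open("contract.txt").read()  # any messy real-world document

resp = client.messages.create(
    model="claude-sonnet-4-6",  # placeholder id: use whichever Sonnet you're on
    max_tokens=512,
    system=(
        "Return only a JSON object with exactly these keys: "
        "title, parties, effective_date, term_months, auto_renews. "
        "No prose, no code fences."
    ),
    messages=[{"role": "user", "content": contract_text}],
)

# In our experience this parses on the first try far more often than with peers.
record = json.loads(resp.content[0].text)
print(record)
```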
Coding performance is the second clear strength. Across the industry
benchmarks we trust — SWE-bench Verified, HumanEval-plus, and
real-world "fix this repo" harnesses — Sonnet and Opus both land at or
near the top of the pack. More importantly, the generated code
compiles more often and the diffs read more like something a
senior engineer would ship. Claude doesn't just produce plausible
code; it produces code that holds up to review.
Claude Code, the CLI agent, amplifies this advantage. Users report
that pairing with the CLI on well-scoped tasks feels less like
prompting a model and more like driving a capable junior developer.
It reads files, runs tests, asks before destructive operations,
iterates on feedback, and produces PRs that real engineers can
land without heavy rewriting. No competitor has a first-party agent
at this level of integration, and the benefit compounds the more
your team gets comfortable with it.
Long-context behavior is genuinely good, not just "large." Models
with large advertised context windows often degrade dramatically once
you push past a certain depth — forgetting earlier content, confusing
sections, losing the plot on multi-document reasoning. Claude holds
up. In our testing, Sonnet 4.6 is usable at 150k+ tokens for real
document analysis, not just as a demo.
Where Claude shines in production
Long-context work: 200k+ token windows without the usual "lost in the middle" collapse.
Structured output and tool use: consistent JSON, reliable function calling, minimal schema drift.
Agentic loops: the model is willing to pause, ask, or stop when uncertain, which is exactly what you want in an agent.
Prose quality: tone is measured and editorial, not the florid default of other models.
Sonnet's price/performance is the best production default in the market right now.
Prompt caching pays off after a single re-hit — no complex cache strategy required.
If OpenAI optimizes for breadth and Google optimizes for integration,
Anthropic optimizes for being the model you can trust to actually do
what you said. In production code, that matters more than the
marketing suggests.
Tool use is the quiet star of the kit. Claude's function-calling is
unusually reliable — the model sticks to declared schemas, chooses
the right tool more consistently than peers, and doesn't hallucinate
parameters. For anyone building agents that call real APIs in
production, this translates directly to fewer retries, fewer
validation layers, and fewer late-night pages for schema drift.
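A minimal tool-use sketch, again with the Python SDK. The tool name and schema are invented for illustration, but the call structure is the standard Messages API tool declaration:

```python
import anthropic

client = anthropic.Anthropic()

# A single declared tool; the name and fields are placeholders.
tools = [{
    "name": "lookup_invoice",
    "description": "Fetch an invoice record by its ID from the billing system.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string", "description": "e.g. INV-2041"},
        },
        "required": ["invoice_id"],
    },
}]

resp = client.messages.create(
    model="claude-sonnet-4-6",  # placeholder id
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Why was invoice INV-2041 flagged?"}],
)

# When Claude calls the tool, the block's input matches the declared schema;
# in our experience, without hallucinated or missing parameters.
for block in resp.content:
    if block.type == "tool_use":
        print(block.name, block.input)  # lookup_invoice {'invoice_id': 'INV-2041'}
```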
Pros & cons
OUR HONEST TAKE
WHAT WORKS
Follows instructions more literally than any competitor. Less post-processing of outputs.
Generated code compiles more often and reads like a senior engineer wrote it.
Long-context behavior is genuinely good — not just large, but usable.
Tool use and structured output are reliable enough to remove validation layers.
Sonnet's price/performance is the best production default in the market.
Claude Code CLI changes how teams pair with the model.
Agents built on Claude stop and ask rather than confidently hallucinating.
WHAT DOESN'T
Occasional over-caution on benign asks (security, legal, some fiction).
Rate limits on the frontier tiers tighten during peak hours.
Not the cheapest option if output quality doesn't matter.
No web browsing on the API out of the box.
Image generation is not part of the product — pair with Flux, Ideogram, etc.
Consumer Claude has a different system prompt than the API — behaviors can diverge.
Common pitfalls
A few recurring failure modes show up in the Claude projects we've
seen go wrong — none of them fatal, all of them worth naming.
Treating consumer Claude and API Claude as the same product.
They're not. claude.ai has a specific system prompt, a safety filter
stack, and default behaviors tuned for a broad consumer audience. The
API has none of that — it applies what you put in the system
prompt. Teams that build a prototype on claude.ai and then port it to
the API sometimes hit unexpected behavior changes, especially around
refusal patterns. Build on the API from the start if you're heading
toward production.
Under-using prompt caching. Any app with a stable
system prompt or a long reference document should be caching. We
routinely see production Claude bills drop 60–80% when caching is
added to an existing workload. The common mistake is assuming "my
prompt is too short to bother" — but on repeat requests, even a
few-hundred-token prefix pays off almost immediately.
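Wiring caching in is a one-line change per stable block: mark the prefix with cache_control and leave the rest of the request alone. A minimal sketch, with a placeholder model id and input file:

```python
import anthropic

client = anthropic.Anthropic()

long_reference_doc = open("policy_manual.txt").read()  # the stable prefix worth caching

resp = client.messages.create(
    model="claude-sonnet-4-6",  # placeholder id
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You answer questions about the policy manual below."},
        {
            "type": "text",
            "text": long_reference_doc,
            # First call writes the cache (1.25x input for a 5-minute TTL);
            # every call after that reads it back at 10% of the input rate.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "What is the refund window?"}],
)
print(resp.content[0].text)
```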
Defaulting to Opus. Teams new to Claude often pick
Opus because it's the flagship. For 80% of production tasks, Sonnet
is indistinguishable from Opus in quality at 60% of the price.
Start on Sonnet; move to Opus only when you can point at a specific
task where Sonnet demonstrably hits a ceiling.
Ignoring the Batch API on async workloads. If your
workload doesn't need a real-time response, batching gets you 50% off,
stacked on top of prompt caching. The engineering investment is
modest — a few hours to rewrite the call pattern — and the savings
are immediate.
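The rewrite really is modest. A minimal sketch of the Batch API call pattern with the Python SDK; the ids and prompts are placeholders:

```python
import anthropic

client = anthropic.Anthropic()

# Placeholder inputs: each document becomes one request in the batch.
docs = {"doc-001": "first document text", "doc-002": "second document text"}

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": doc_id,
            "params": {
                "model": "claude-haiku-4-5",  # placeholder id
                "max_tokens": 256,
                "messages": [{"role": "user", "content": f"Summarize:\n\n{text}"}],
            },
        }
        for doc_id, text in docs.items()
    ]
)

# Results land asynchronously (within 24 hours) at half the per-token price;
# poll processing_status until it reads "ended", then fetch the results.
print(batch.id, batch.processing_status)
```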
Using one model for everything. The trick to
controlling a Claude-powered system is routing: Haiku for
classification and triage, Sonnet for the reasoning, Opus only for
the hardest final step. Most apps that land on "we're spending too
much on Claude" aren't using the right model for each step.
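A routing layer can be as small as one cheap Haiku call in front of everything else. A sketch of the idea, with placeholder model ids and a deliberately crude triage prompt:

```python
import anthropic

client = anthropic.Anthropic()

def route(task_text: str) -> str:
    """Cheap Haiku triage decides which tier handles the real work."""
    triage = client.messages.create(
        model="claude-haiku-4-5",  # placeholder id
        max_tokens=5,
        system="Reply with exactly one word: simple, standard, or hard.",
        messages=[{"role": "user", "content": task_text}],
    )
    label = triage.content[0].text.strip().lower()
    return {
        "simple": "claude-haiku-4-5",
        "standard": "claude-sonnet-4-6",
        "hard": "claude-opus-4-7",
    }.get(label, "claude-sonnet-4-6")  # default to the workhorse tier

def answer(task_text: str) -> str:
    resp = client.messages.create(
        model=route(task_text),
        max_tokens=1024,
        messages=[{"role": "user", "content": task_text}],
    )
    return resp.content[0].text
```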
What's actually offered
CAPABILITIES AT A GLANCE
LONG CONTEXT
200k+ token windows on all current models, with usable recall deep into the context.
TOOL USE
First-class function calling with strong schema adherence and minimal drift.
VISION
Image understanding across all tiers — charts, diagrams, screenshots, PDFs.
PROMPT CACHING
Pin hot prefixes (system prompts, long docs) to pay only for new tokens.
BATCH API
Submit async jobs at 50% off for any workload that doesn't need real-time responses.
CLAUDE CODE CLI
A first-party agentic coding CLI that runs locally and edits your repo directly.
CONSUMER APP
claude.ai with Projects, Artifacts, and Computer Use for end-user workflows.
AWS + GCP AVAILABILITY
Runs on Anthropic's API, Amazon Bedrock, and Google Vertex AI for teams locked to either cloud.
SEEN ENOUGH?
A fresh account takes under a minute, and Sonnet is free to try on claude.ai.
What's not so good
The most common complaint is over-caution. Claude will occasionally
refuse or hedge on requests that a reasonable reader would consider
entirely benign — security research, legal analysis, some kinds of
creative fiction. The frequency has improved significantly over
successive model versions, but it hasn't disappeared. Users report
that framing and system prompts mitigate most of it, and that the API
(with an explicit developer-side system prompt) is noticeably less
cautious than the default consumer product.
Rate limits are the second honest gripe. During peak hours,
Anthropic's capacity for the top tiers has historically tightened —
this is a reality of any foundation model at the frontier, but it's
worth budgeting for in production. Prompt caching and batch endpoints
help smooth costs, and AWS Bedrock / Google Vertex AI give you
independent capacity pools, but if your app is spiky you'll want a
secondary provider to fall back to.
Price is not exactly a weakness, but worth naming: Claude is not the
cheapest LLM per token by a significant margin. If your workload can
tolerate lower quality, open-weights models on RunPod or a mid-tier
OpenAI model will be cheaper. The case for Claude is always that the
quality gap justifies the price gap — and once you factor in the cost
of retries, validation, and failed outputs on a lesser model, Claude
usually still wins. But if quality really doesn't matter for your
workload, use something else.
No native image generation. Claude understands images on input but
doesn't generate them. For multimodal apps, pair Claude with an image
model like Flux, Ideogram, or DALL-E via the appropriate provider.
This is a product decision Anthropic seems firm on — don't expect
image-gen to land in Claude anytime soon.
No built-in web browsing on the API by default. Claude has a
web-search tool available, but it's a separate paid capability
($10 per 1,000 searches) rather than automatic browsing. For apps
that need live web context, you have to wire it in explicitly. For
research-heavy workflows, Perplexity's
Sonar API is sometimes a better integration point.
The consumer product evolves less predictably than the API. Features
in claude.ai ship and get reworked on a cadence that's harder to plan
around than the API versioning. This isn't a problem if you're using
Claude for work — but if you're building an internal tool that
depends on a specific claude.ai feature behaving a specific way, be
ready for it to shift.
Who should use it
If you're building anything that involves code, long documents, or an
agent doing multi-step work, Claude should be your default. The same
goes for any application where reliability of output shape
matters more than the absolute cheapest token price — which, once you
factor in the cost of retries and validation, is most production
applications we see.
Claude Code is a distinct reason to pick this ecosystem even before
you consider the base model. No competitor has a first-party coding
agent at the same level, and the benefit compounds as your team gets
comfortable with it. For a small dev team looking for force
multiplication, the combination of Claude Code CLI and Sonnet's
pricing is genuinely hard to beat.
Enterprise teams should use Claude through AWS Bedrock or Google
Vertex AI. It gives you the cloud provider's compliance posture on
top of Anthropic's model, simplifies procurement, and usually fits
into an existing cloud contract. The latency is slightly higher than
direct Anthropic endpoints, but the procurement story is much cleaner.
If you're shipping a consumer product where volume is massive and the
per-query quality bar is low, a cheaper model will serve you better.
ChatGPT's or Gemini's
smaller tiers are closer to the right price-performance curve for
those shapes. And if you're building on open weights for sovereignty,
data-residency, or self-hosting reasons, Claude isn't on the table —
pick a Llama- or Qwen-class model and run it yourself on
RunPod or similar.
For research, Claude is one of the better tools for reading through
long academic papers or pulling structured findings from a messy
corpus of documents. Paired with Perplexity for citation-grounded
web search and a scratchpad workflow, a solo researcher can cover
ground that, a decade ago, would have taken a whole team.
Verdict
Claude is the LLM we reach for first. It's the one we'd recommend
without hesitation to a team shipping code, working with long
documents, or running agentic workflows in production. The
over-caution and pricing are real but manageable, and the trajectory
has been consistently forward: every major version in the last two
years has improved on the previous in the ways that matter for real
work — better instruction-following, better code, better long-context
behavior, lower prices (Opus 4.7 at $5/$25 is a meaningful step down
from the Opus 4 line at $15/$75).
We rate it 9.1 / 10. It loses points for
occasional refusals and for the price floor. Almost
everything else is better than the competition, and the things that
aren't (ecosystem, multimodal features, free-tier reach) are features
of the competition, not failures of Claude.
If you're on the fence, spend a week using Claude Code on your
current work. You'll know within three days whether the combination
of model + agent fits how you build.
Frequently asked
Which model should we start with?
Start on Sonnet. It's the right answer for roughly 80% of production workloads we build. Move up to Opus only after you've tested a task and confirmed it hits a quality ceiling Sonnet can't clear. Drop down to Haiku for classification, routing, or any step where you care about latency and per-call cost over raw reasoning depth.
How does Claude compare to GPT?
Claude leads on code generation, long-context reliability, and instruction-following. GPT-4-class models lead on ecosystem breadth, third-party integrations, and raw consumer reach. For developer workloads — agents, code, structured output, document analysis — Claude is the default. For broad consumer chat with plugin ecosystems, GPT is the default.
What is Claude Code?
Claude Code is Anthropic's official command-line coding agent. It runs on your machine, has full access to your repo, and can read, edit, run commands, and iterate autonomously. It's one of the reasons we'd pick Claude even ignoring the base-model quality — no competitor has a first-party agent at the same level, and the benefit compounds as your team gets comfortable with it.
How do we keep the bill down?
Three levers, in order of impact: prompt caching (huge wins on any workload with repeated system prompts or long documents), the Batch API (50% off for anything that doesn't need real-time), and model right-sizing (move any step that doesn't need Opus down to Sonnet or Haiku). We've seen production bills drop 60–80% from caching alone.
Can we use Claude with sensitive or regulated data?
Anthropic offers a zero-data-retention option for API customers and has SOC 2 Type II available. For HIPAA or more regulated workloads, running Claude through Amazon Bedrock or Google Vertex AI gives you the cloud provider's compliance posture on top. Always double-check with your security team — the answer depends on your specific regime.
Why does Claude refuse benign requests, and what can we do about it?
Consumer Claude leans cautious by default because it serves a very broad audience. Most developer refusals disappear once you're on the API with an explicit system prompt that frames the task and user context. If you still see refusals, tighten the system prompt and provide context for why the task is legitimate — this resolves the vast majority of false positives in our experience.
How does prompt caching work?
You mark a prefix of your prompt as cacheable. On the first request, that prefix is written to cache at 1.25× the normal input rate (or 2× for a 1-hour TTL). On subsequent requests within the TTL, reading the cached prefix costs 10% of the normal input rate. For apps that reuse a big system prompt or a long document across many requests, this pays off on the second call.
DONE READING?
Pick a task you'd normally give to another LLM this week and try it on Claude first.