Back to Blog
AI Tools

Long Context Is Here: What Claude 4 Changed for Enterprise Work

18 March 20268 min read

For three years the standard enterprise AI pattern was retrieval-augmented generation: chunk your documents, embed them, retrieve the most relevant pieces, stuff them into a 100,000-token context window, generate the answer. We built whole stacks around it.

Anthropic's Claude 4 family, with the one-million-token context window now widely available, has made parts of that stack optional. Not all of it, but more of it than I expected when I first started testing.

What actually changed

A one-million-token context is roughly 750,000 words. A typical enterprise contract is around 10,000 words. A full year of board minutes might be 100,000. A medium codebase fits comfortably. A complete employee handbook with policies, procedures, and FAQs fits with room to spare.

The shift isn't just size. It's that the model can now hold all of that in mind simultaneously and reason across it without losing detail. The recall accuracy on million-token contexts in Claude 4 is genuinely good — the "lost in the middle" problem that plagued earlier long-context models is largely gone.

Combined with prompt caching at the provider level, sending the same large context across many queries has gone from cost-prohibitive to routine.

Where long context wins outright

Document analysis

If your task is "answer questions about this specific contract" or "summarise this 200-page report", long context is now strictly better than RAG. There's no chunking step to lose detail across. No retrieval step that might fetch the wrong section. The model sees the whole document and reasons over it directly.

I rebuilt a contract analysis tool earlier this year that previously chunked agreements into sections and retrieved the relevant clauses. Switching to a long-context-first design with the full contract loaded made the answers noticeably more accurate, particularly on questions that required understanding the relationship between different clauses.

Codebase understanding

For coding assistants and code review tools, fitting an entire repository (or at least the relevant subsystem) into context produces qualitatively different output than retrieval-based approaches. The model can trace dependencies, understand naming conventions, and follow patterns across files in a way that chunked retrieval struggles to match.

Multi-document reasoning over a known set

"Compare these three vendor proposals and identify the differences in their service-level commitments" is a long-context task. The retrieval-based equivalent has to guess which sections to pull from each, and usually misses something.

Where RAG still wins

Long context isn't a replacement for retrieval. It's a different tool with a different shape.

Truly large corpora

If your knowledge base is genuinely large — millions of pages, the entire history of a Slack workspace, the full contents of SharePoint — you still need retrieval. Long context handles a single complete document well. It doesn't handle "search across everything we have."

Cost-sensitive high-volume use cases

Sending a large context on every query, even with caching, costs more than retrieving a small relevant slice. For customer-facing systems serving millions of queries, the economics still favour RAG.

Frequently changing data

Prompt caching breaks if the context changes. If your knowledge updates throughout the day, the caching benefit disappears and long context becomes expensive again. Retrieval over a fresh index handles this naturally.

The hybrid pattern that's emerging

The teams getting the most out of Claude 4's long context aren't choosing between long context and RAG. They're combining them.

A typical pattern: a lightweight retrieval step narrows from millions of documents down to the few hundred pages that might be relevant. Those few hundred pages then go into long context as a complete bundle. The model reads the whole bundle, reasons across it, and answers.

This gets you the precision of long-context reasoning without paying to load the entire corpus on every query. It's also more robust than pure RAG, because you're not relying on retrieval to find exactly the right paragraph — just to find the right set of documents to look at.

What this means for enterprise AI architecture

If you're building or rebuilding an enterprise AI system in 2026, the design questions have shifted:

  • What's the natural unit of context for this task — a document, a project, a customer record? Can it fit in a million tokens?
  • How often does that context change? Can it be cached for the duration of a session, a day, longer?
  • Where do you genuinely need search-style retrieval versus where you've been doing it out of habit?

For greenfield builds I now default to long-context-first. RAG gets added where the corpus genuinely demands it or where cost analysis makes it necessary. That's the opposite of the default I would have used a year ago.

What I'd revisit

Long context didn't kill RAG. It removed RAG from a lot of places where it was being used by default. If you built your AI stack in 2023 or 2024, some of the architecture decisions you made then were the right call at the time and are no longer the right call now. The chunking strategy is usually the first thing worth looking at again.

If you'd like a second opinion on whether long context simplifies your current setup, get in touch.

ClaudeAnthropicLong ContextRAG