Anthropic Opus 4.5: Breaks 80% on SWE-Bench Tests

Anthropic just dropped Opus 4.5, and it’s breaking records on coding tests. But the real story isn’t the benchmarks. It’s what this model can actually do in your browser and spreadsheets.

Released Monday, Opus 4.5 completes Anthropic’s 4.5 series rollout. It follows Sonnet 4.5 from September and Haiku 4.5 from October. This version targets developers and power users who need serious computational muscle.

First Model to Break 80% on SWE-Bench

Opus 4.5 hit a milestone nobody else reached yet. It scored over 80% on SWE-Bench verified, a respected coding benchmark that tests real-world programming skills.

That’s not just a number. It means the model can handle actual software engineering tasks at a level previous models couldn’t touch. Plus, it dominated other benchmarks including Terminal-bench for command-line operations and tau2-bench for tool use.

For general problem solving, Opus 4.5 scored high on ARC-AGI 2 and GPQA Diamond. These tests measure reasoning ability beyond simple pattern matching. So the model doesn’t just memorize answers—it thinks through problems.

Chrome Extension and Excel Integration Go Live

Anthropic paired the release with two practical tools. Claude for Chrome and Claude for Excel both moved from pilot to broader availability.

The Chrome extension lands for all Max users. That means you can invoke Claude directly while browsing without switching tabs. For research, coding, or analyzing web content, that’s a game changer.

Meanwhile, Claude for Excel reaches Max, Team, and Enterprise users. Now you can run complex data analysis inside spreadsheets without exporting to Python or R. The model understands Excel formulas, can suggest optimizations, and helps debug broken calculations.

Opus 4.5 scored over 80 percent on SWE-Bench verified benchmark

Both integrations showcase computer use capabilities. Opus 4.5 can control applications, navigate interfaces, and perform multi-step tasks. That’s the direction AI assistants need to move—actually working inside the tools people already use.

Memory Management Gets Smarter

Opus 4.5 includes significant memory improvements for long-context operations. But it’s not just about bigger context windows.

“Knowing the right details to remember is really important in complement to just having a longer context window,” Dianne Penn, Anthropic’s head of product management for research, told TechCrunch. The model now compresses context intelligently rather than just expanding memory limits.

This enables “endless chat” for paid Claude users. When the model hits its context window, it compresses previous conversation without alerting you. So conversations flow naturally without interruption or loss of coherence.

For developers working on large codebases, this matters enormously. The model can explore files, track dependencies, and maintain context across thousands of lines of code. Plus, it knows when to backtrack and recheck assumptions.

Built for Agentic Workflows

Many upgrades target agentic use cases. Picture Opus as a lead agent commanding multiple Haiku-powered sub-agents. That requires sophisticated working memory and task coordination.

“Claude needs to be able to explore code bases and large documents, and also know when to backtrack and recheck something,” Penn explained. The memory improvements make this possible.

For example, Opus could manage a team of Haiku agents analyzing different parts of a codebase simultaneously. It tracks their findings, identifies conflicts, and synthesizes insights. That’s far more practical than single-agent approaches for complex projects.

Stiff Competition From OpenAI and Google

Opus 4.5 faces serious rivals. OpenAI released GPT 5.1 on November 12. Google launched Gemini 3 on November 18. Both models claim state-of-the-art performance.

The timing matters. All three companies raced to release flagship models before year-end. Each wants to establish benchmarks and capture enterprise customers heading into 2026 planning cycles.

But Opus 4.5 differentiates itself through practical integrations. Chrome and Excel support give it immediate utility beyond API access. While competitors focus on raw capability, Anthropic built tools people can use today.

The benchmark wars continue. Yet what matters more is which model actually helps people work better. Opus 4.5’s focus on real applications suggests Anthropic understands this.

What This Means for Developers

If you’re building AI-powered applications, Opus 4.5 raises the bar. Its coding abilities mean you can delegate more complex tasks to the model. Its memory management enables longer, more sophisticated interactions.

The Chrome and Excel integrations also signal where AI assistants are heading. Embedding models directly into daily tools makes more sense than forcing users to switch contexts. Expect more of this from all major AI companies.

For enterprise users, the Team and Enterprise Excel access matters most. Finance teams, data analysts, and operations managers can now run complex analyses without technical staff. That democratizes capabilities previously locked behind programming knowledge.

Still, competition remains fierce. OpenAI and Google won’t cede ground easily. Plus, smaller models from companies like Mistral and Cohere keep improving. The next few months will determine which approaches gain real traction.

Choose your AI tools based on actual workflows, not just benchmark scores. Opus 4.5 excels at coding and integration. But your specific needs might align better with different models. Test thoroughly before committing.

Anthropic’s Opus 4.5 Crushes Coding Benchmarks With Chrome and Excel Integration

First Model to Break 80% on SWE-Bench

Chrome Extension and Excel Integration Go Live

Memory Management Gets Smarter

Built for Agentic Workflows

Stiff Competition From OpenAI and Google

What This Means for Developers

Microsoft Just Locked Down 116,000 Nvidia GPUs Through Nscale Deal

Vibe Coding Is Changing Everything. Here’s the Honest Truth About It

Europe Built One App to Verify Every Kid’s Age Online. Here’s How It Works

Disney Just Sold Mickey Mouse to AI. Here’s What That Actually Means

Another Spyware Maker Exposed: Fake Android Apps Hide Italian Surveillance Tool

Claude Stays Ad-Free While ChatGPT Shows Sponsored Content

Leave a Reply Cancel reply

First Model to Break 80% on SWE-Bench

Chrome Extension and Excel Integration Go Live

Memory Management Gets Smarter

Built for Agentic Workflows

Stiff Competition From OpenAI and Google

What This Means for Developers

Similar Posts

Leave a Reply Cancel reply