Claude Opus 4.8: Anthropic makes a more honest AI

Anthropic isn't ready to let regular users look at its supposedly super-powerful Claude Mythos AI model just yet. But the AI company has just released an upgrade to its flagship product, Claude Opus — now in its 4.8 version.

"It builds on Opus 4.7 with improvements across benchmarks, and is a more effective collaborator," Anthropic promised in a press release Thursday. The benchmark numbers below, however, show only minor improvements across the board.

One major improvement, allegedly, is in the area of hallucinations. Claude Opus 4.8 won't lie to users as much. "Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims," Anthropic said, touting the model's "honesty."

Claude Opus 4.8 has 'better judgment'

"Claude Opus 4.8 has noticeably better judgment," an engineer at Shopify, Tom Pritchard, told Anthropic. The coding version of the model "asks the right questions, catches its own mistakes, and pushes back when a plan isn't sound."

Given the increasing number of horror stories about AI agents deleting entire corporate databases, that promise may be music to the ears of vibe coders everywhere.

To please power users, Anthropic is offering a significant discount on "fast mode," where Claude will work at 2.5 times regular speed. Fast mode "is now three times cheaper than it was for previous models," the company said.

Users on Reddit weren't buying it, however. Many feared a loss of access to a more popular model, Claude Opus 4.6. "Nobody trusts the benchmark charts," wrote one redditor in summary, noting that Opus 4.7 also seemed to have some pretty good numbers when it was released.

Whether or not we can trust the benchmarks — and to be clear, Mashable hasn't independently verified these numbers — here's what Anthropic is claiming.

[Image: A list of benchmark numbers for Claude Opus 4.8. Credit: Anthropic]

How to try Claude Opus 4.8

Claude Opus 4.8 is available now via Anthropic's website, Claude.AI, as well as via the Claude API, plus Anthropic partners like Microsoft Foundry.

The new model is priced exactly the same as its predecessors, which is to say models going all the way back to Claude Opus 4.5. All of them will cost you $5 per million input tokens and $25 per million output tokens.
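
To put those rates in perspective, here is a rough back-of-the-envelope sketch of what a single job might cost. The two prices come from the figures above; the function name and the example token counts are hypothetical, chosen only for illustration, and the sketch ignores anything not mentioned here, such as fast mode pricing.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MILLION = 5.00
OUTPUT_PRICE_PER_MILLION = 25.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one job at the quoted rates.

    Hypothetical helper for illustration; not part of any Anthropic SDK.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Assumed workload: 200,000 tokens in, 50,000 tokens out.
print(f"${estimate_cost(200_000, 50_000):.2f}")
```

Because output tokens cost five times as much as input tokens, a job that generates a lot of text will be dominated by the output side of the bill.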

Given that Anthropic is promising Claude Mythos within a matter of weeks, however, you may want to hang back and wait to see whether that model can be even more "honest" about its hallucinations.



from Mashable https://ift.tt/n3GzcrR