Anthropic Shipped Two New Models. They’re the Same Model.

TL;DR: Anthropic launched Claude Fable 5 (general availability) and Claude Mythos 5 (restricted, cyber defenders only) on June 9, 2026. Same underlying model. Different safety posture. $10/M input tokens and $50/M output, less than half the price of Mythos Preview. Part 1 of 2: specs and use cases. Part 2 will be the review after I have run something real on it.

I opened Anthropic’s newsroom on Tuesday and saw two new flagship models. Then I read the second paragraph and realized they were the same model.

That is the structural fact nobody is writing about. The release notes, the benchmark table, the customer quotes — all of that is downstream of one decision Anthropic made: a single frontier model, packaged twice. Once for everyone. Once for the people with a clear reason to be dangerous.

The packaging is the product. Let me walk you through what’s actually shipping, what the safeguards mean in practice, and what I want to test once the access thaws.

The two-model packaging, in one paragraph

Claude Fable 5 is the general-availability model. It is what most developers will use. It costs $10 per million input tokens and $50 per million output tokens. Prompt caching keeps the existing 90% discount on cached input tokens, and US-only inference is available at 1.1x pricing if you need it. It is available on the Claude API, AWS, Google Cloud, and Microsoft Foundry, plus the consumption-based Enterprise plan.
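To see what those prices mean for a long agent session, here is a small estimator. It is a sketch under stated assumptions, not Anthropic's billing logic: the function name is hypothetical, cached input tokens are billed at 10% of the base input rate (the 90% discount above), cache-write surcharges are ignored, and the 1.1x US-only multiplier is applied to the whole bill.

```python
# Hypothetical cost estimator for the Fable 5 prices quoted above.
# Assumptions: cached input = 10% of the base input rate, no cache-write
# surcharge modeled, and the US-only multiplier applies to the entire bill.

INPUT_USD_PER_M = 10.0
OUTPUT_USD_PER_M = 50.0
CACHED_INPUT_PRICE_FACTOR = 0.10  # i.e. a 90% discount on cached input tokens
US_ONLY_MULTIPLIER = 1.1


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    us_only: bool = False,
) -> float:
    """Estimate the USD cost of one workload (illustrative, not official pricing code)."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_input_tokens
    cost = (
        uncached * INPUT_USD_PER_M
        + cached_input_tokens * INPUT_USD_PER_M * CACHED_INPUT_PRICE_FACTOR
        + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    return cost * US_ONLY_MULTIPLIER if us_only else cost


if __name__ == "__main__":
    # A long agent session: 2M input tokens (1.5M of them cache hits), 200K output.
    print(round(estimate_cost_usd(2_000_000, 200_000, cached_input_tokens=1_500_000), 2))
```

For that session the script prints 16.5, of which $10 is output tokens: once caching is doing its job, output dominates the bill.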

Claude Mythos 5 is the same model with the cybersecurity and biology safeguards lifted. It is currently restricted to a small group of vetted partners — about 150 organizations in more than 15 countries through Project Glasswing, with a planned expansion to biology researchers through a new trusted access program. Same price. Same benchmarks, more or less. Different rules about who can use it.

If you have a Claude Pro, Max, Team, or seat-based Enterprise plan, Fable 5 is included at no extra cost from now through June 22. After that, it draws on usage credits. Subscription-tier access is being rolled out conservatively because demand is expected to be high and unpredictable.

What the spec actually says

The Fable/Mythos 5 model card describes it as “a Mythos-class model made safe for general use.” Three things stand out from the announcement and the model page:

1. It is the strongest coding model Anthropic has ever shipped. Cursor calls it state-of-the-art on CursorBench and says it “opened up a class of long-horizon problems that were out of reach for earlier models.” Cognition says it is the highest-scoring model on FrontierBench. GitHub tested it on complex multi-day coding tasks and reported autonomy and reliability beyond their previous benchmarks. Replit (ViBench) says it nearly saturates their end-to-end vibe-coding benchmark. The pattern is consistent: long-horizon coding, agentic workflows, multi-file refactors.

2. It is built for asynchronous, multi-day work. The model page says explicitly: “Tackle days-long, complex, and asynchronous tasks previous models couldn’t sustain.” The “agents” use case describes running it in Claude Code or Claude Managed Agents where it plans across stages, delegates to sub-agents, and checks its own work. The “coding” use case says “multi-day autonomous sessions.” This is not a chatbot model. It is a model for the kind of work I would previously have staffed an engineer to do.

3. It writes and runs its own tests. The page calls this out specifically: “It can write its own tests to check its work, implement designs with high fidelity, and use vision to check outputs against goals.” That is the part that made me stop and reread. A model that validates its own output against the original goal visually, not just in text, is a different category of autonomy from what we had with Opus 4.8.

The benchmark section itself is mostly a series of customer quotes. Anthropic is not publishing a clean score grid this round. The framing has shifted from “here are the numbers” to “here are the people using it and what they are saying.” I will dig into the system card for the raw scores once I have used the model in anger.

The safeguards, and why they matter

This is the part that deserves the most attention. Fable 5 ships with three new classifier categories that route flagged queries to Opus 4.8 instead of the new model:

1. Cybersecurity. Fable 5’s classifiers are tuned to block exploitation, offensive cyber tasks, and the broader category of agentic hacking (reconnaissance, lateral movement, post-exploitation). External red-teamers and an internal bug bounty did not find universal jailbreaks in 1,000+ hours of testing. The UK AISI made progress toward one in a brief window. The point is: jailbreaks are not eliminated. They are slowed down enough to detect.

2. Biology and chemistry. Mythos-class models can complete real gene-therapy research steps. The system card gives the example of predicting how a genetic modification impacts the assembly of an AAV (adeno-associated virus) outer shell, evaluated against unpublished Dyno Therapeutics candidates. The model beat dedicated protein language models on this task without being explicitly trained for it. The dual-use risk is obvious. Fable falls back to Opus on most bio/chem queries for now, with a planned trusted-access program for verified researchers.

3. Distillation. Queries that look like systematic attempts to extract capabilities to train competing models fall back to Opus. This is an industrial policy more than a safety policy — Anthropic is saying “you cannot use Fable to clone Fable.”

The fallback is a UX feature, not just a safety feature. A refused query is a dead end. A query that falls back to Opus 4.8 is a working answer from a still-very-capable model. More than 95% of Fable sessions have no fallback at all. For those sessions, Fable 5 performs identically to Mythos 5.
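If you want to check the 95% figure against your own traffic, the fallback behavior reduces to a routing rule plus a ratio over session logs. The sketch below is hypothetical throughout: the model IDs, the category labels, and the assumption that your logs record which model answered each turn are mine, and the real classifiers run on Anthropic's side.

```python
# Hypothetical sketch of the fallback idea: flagged queries are rerouted, not refused.
# Model IDs, category names, and the session-log shape are illustrative assumptions.
from dataclasses import dataclass

PRIMARY_MODEL = "claude-fable-5"     # hypothetical ID
FALLBACK_MODEL = "claude-opus-4-8"   # hypothetical ID
FALLBACK_CATEGORIES = {"cyber", "bio_chem", "distillation"}


def route(flagged_categories: set[str]) -> str:
    """Return the model that would answer a query with the given classifier flags."""
    if flagged_categories & FALLBACK_CATEGORIES:
        return FALLBACK_MODEL
    return PRIMARY_MODEL


@dataclass
class Session:
    models_used: list[str]  # one entry per turn, as recorded in your own logs


def no_fallback_share(sessions: list[Session]) -> float:
    """Fraction of sessions in which no turn was answered by the fallback model."""
    if not sessions:
        return 0.0
    clean = sum(1 for s in sessions if FALLBACK_MODEL not in s.models_used)
    return clean / len(sessions)
```

A result well below 0.95 on your own workload is the early warning that the fallback is not invisible for you.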

There is one more thing. Using Fable 5 requires 30-day data retention for safety monitoring. The data is not used for training, will not be used for any non-safety purpose, and gets deleted in almost all cases after 30 days. Human access to the data is logged. This is opt-in at the application level, not a default. If you are building a consumer product on Fable, plan for the data retention posture upfront.

The 6 use cases I want to test

I have not run anything on Fable 5 yet. Access is rolling out in stages and the most ambitious harness work will take me a few weeks to set up properly. But I have a short list. These are the things I want to learn first:

1. Multi-day ERP migration as an agent task. I have a real Postgres schema from a client’s staging environment that I have been hand-migrating for three months. The boring parts (column renames, index conversions, backfill scripts) eat weeks. I want to see if Fable 5 + Claude Code can plan the migration in stages, run the backfills in parallel against a snapshot, and surface the actual risk points for me to review. The “multi-day autonomous sessions” claim is built for exactly this.

2. Long-horizon coding task on a private repo. My homelab monorepo has 4 services that I have wanted to refactor into a shared library for 6 months. Each service is small. The interface surface is large. The work is boring. This is the kind of “ambitious but not exciting” project that eats evenings. I want to hand it to Fable 5 in Claude Code and walk away for 48 hours.

3. Code review of my own AI agent’s recent commits. I have a habit of shipping AI-generated code and reviewing it myself a week later. Half the time the code is fine. Half the time it has a bug I missed because I was reading the diff, not the system. I want to run Fable 5 as a second-pass reviewer with vision, on a real PR with real test output. If the customer quote from the Anon legal team (“its redlines matched or beat our current model every time”) holds, this saves me a few hours a week.

4. Document-heavy work in finance/legal. My clients send me 80-page PDF contracts and expect me to find the three clauses that matter. I have been doing this manually with a 200K-context Opus session. I want to see if Fable 5 with vision handles the same task in less time and with fewer “I missed that” moments. The vision spec says it understands diagrams, charts, and tables nested in files and PDFs. That is the load-bearing claim.

5. Self-validating research output. I have a workflow where I ask an LLM to research a topic, write a brief, and check its own work against a rubric. With Opus I have to do the second pass myself. The Fable 5 spec specifically says it “reflects on and validates its own work” at the highest effort. The customer quote from Avee (Aman.ai) backs this up. I want to test whether the self-validation is real or marketing.

6. Vision-as-feedback for AI agent output. The “use vision to check outputs against goals” claim is the one I am most skeptical of and most excited about. Most AI agents cannot see their own output. Fable 5 supposedly can. If it actually works, this changes how I build UIs for agents. If it does not, I want to know how it fails.
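For use case 1, the useful part of “plan the migration in stages” is knowing which steps can run in parallel, and a dependency graph makes that explicit. This sketch uses Python's standard-library `graphlib` (3.9+). The stage names and dependencies are hypothetical placeholders, not the client's actual schema, and the code says nothing about how Fable 5 itself plans.

```python
# Sketch: group hypothetical migration stages into waves that can run in parallel.
from graphlib import TopologicalSorter

# Each stage maps to the stages that must finish before it can start.
STAGES: dict[str, set[str]] = {
    "snapshot": set(),
    "rename_columns": {"snapshot"},
    "convert_indexes": {"snapshot"},
    "backfill_orders": {"rename_columns"},
    "backfill_invoices": {"rename_columns", "convert_indexes"},
    "risk_review": {"backfill_orders", "backfill_invoices"},
}


def parallel_batches(stages: dict[str, set[str]]) -> list[list[str]]:
    """Return stages grouped into waves; each stage's prerequisites finish in earlier waves."""
    sorter = TopologicalSorter(stages)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # this sketch waits for the whole wave before moving on
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(parallel_batches(STAGES), start=1):
        print(i, batch)
```

This yields four waves: the snapshot, then the column renames and index conversions together, then both backfills together, then the risk review. That is the shape of plan I would want the agent to produce, and something concrete to check its plan against.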
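For use case 5, “is the self-validation real or marketing” becomes measurable if the model grades its own brief against the rubric and I grade the same items independently. A minimal sketch, assuming both verdicts are recorded as pass/fail per rubric item; the function name and rubric item names are hypothetical.

```python
# Hypothetical harness: compare the model's self-assessment with an independent
# review of the same rubric items. Verdicts are True (pass) / False (fail).


def agreement_report(
    model_verdicts: dict[str, bool], my_verdicts: dict[str, bool]
) -> dict[str, object]:
    """Return the agreement rate and the items the model passed but I failed."""
    items = sorted(my_verdicts)
    missing = [i for i in items if i not in model_verdicts]
    if missing:
        raise KeyError(f"model did not grade: {missing}")
    agree = [i for i in items if model_verdicts[i] == my_verdicts[i]]
    # The dangerous case: the model says "pass" where the independent check says "fail".
    false_passes = [i for i in items if model_verdicts[i] and not my_verdicts[i]]
    return {
        "agreement": len(agree) / len(items) if items else 0.0,
        "false_passes": false_passes,
    }
```

The number that matters most is `false_passes`: items the model marked as passing that my own review failed. A self-check that rarely produces those is doing real work.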

The structural thing I keep coming back to

Two models, same weights, different safety posture. That is the whole product. It is not a clever trick — it is a forcing function. Anthropic is saying: this model is capable enough that we cannot give everyone the unfiltered version, but we can give everyone the filtered version, and the filter has to be good enough that 95% of users never notice it.

That is the bet. The bet is that the difference between “frontier capabilities” and “frontier capabilities minus cyber/biology” is small enough that most users will not care. The 95% no-fallback number is the proof point. If the bet holds, Mythos-class becomes the default and we stop talking about model generations and start talking about safety tiers.

If the bet fails — if too many people hit the fallback in real workflows — Mythos becomes a niche product and Fable becomes the only one that matters. Anthropic is not going to tell us which way the bet is going. We will find out by trying to do real work.

I have a follow-up post coming once I have a few real things to report. The test list above is the order I am going in. If you are also planning to test Fable 5, hit me on Telegram with what you are trying first.

Two models. Same weights. Different rules. The interesting question is not how good Fable 5 is at coding. The interesting question is whether the safety tier becomes the default unit of model comparison by the end of 2026.


Discover more from Susiloharjo
