Secure IT – Claude Mythos: AI Vulnerability Hype vs. Evidence, E23
Jason Kikta (00:00)
Hello and welcome to the Secure IT Podcast from Automox. I'm your host, Jason Kikta. It's been a lively couple of weeks in AI news, especially around Claude Mythos. Last week, there was a lot of discussion around the Cloud Security Alliance's strategy briefing, "The AI Vulnerability Storm: Building a Mythos-Ready Security Program." Then by this week, sharp public skepticism around the Mythos claims bubbled up. And yesterday, Bloomberg reported that unauthorized users on Discord have been accessing Mythos since the day of the limited launch.
The story is moving fast and lots of details are still unclear. But that doesn't mean that there aren't useful things to consider. So let me tell you what I think is actually durable.
Okay, here's my thesis, and I want to state it plainly before I get into any of the specifics. It does not matter whether Claude Mythos Preview is the game changer Anthropic claims. New models are going to keep pushing the boundaries of what AI can do in offensive security, and they're going to have unanticipated advances. So thinking about how we deal with the rapid deluge of vulnerabilities and exploitation is prudent, regardless of which specific model announcement you believe.
Look at the trajectory and look at what has already been independently verified. In June of 2025, XBOW became the first autonomous system to top HackerOne's US leaderboard, outperforming every human hacker on the platform. Two months later, DARPA's AI Cyber Challenge found 54 vulnerabilities in four hours across 54 million lines of code.
Google's Big Sleep project found 20 real zero-days in open source projects, each discovered and reproduced without human guidance. In November, Anthropic disclosed that a Chinese state-sponsored group had used Claude Code to run full attack chains across roughly 30 targets.
By February of this year, Claude Opus 4.6 had surfaced over 500 high-severity vulnerabilities in open source software. Aisle found 12 OpenSSL zero-days, including one that had been sitting there since 1998. That was like, I think, my sophomore or maybe junior year of college. Long time ago.
Every one of those is independently confirmed by the organization that did the work. Named researchers, reproducible results, public competition records, and every one of those predates Mythos. So when someone argues about whether Mythos is specifically a step change or not, my honest reaction is that the argument misses the point. The trend is real and is going to keep moving whether or not any single vendor announcement lives up to its press release.
Now, let me be direct about Mythos itself, because the picture has changed just since my blog post went up last week. Davi Ottenheimer published a careful teardown of the Mythos system card, and the numbers deserve to be seen. Anthropic's headline 72.4% exploit success rate on Firefox drops to just 4.4% when the top two pre-discovered bugs are removed from the evaluation corpus. The security firm Aisle reproduced the showcase FreeBSD vulnerability on all eight open-weight models they tested, including a 3.6-billion-parameter model at 11 cents per million tokens.
And Tom's Hardware reported that the "thousands of zero-days" claim decomposes to roughly 198 manually reviewed findings behind a pile of automated triage. I'm not endorsing every conclusion in Ottenheimer's piece, but the specific numbers he cites come directly from Anthropic's own system card.
Then yesterday, Bloomberg broke the Discord leak story. A private group reportedly gained unauthorized access to Mythos on the day of the limited launch and has been using it regularly since. Anthropic confirmed it's investigating. So the framing of "too dangerous to release" met operational reality in about two weeks.
Here's what I take away from all that. Hedge your bets on any single model's capability claims. Anchor your planning to the direction of the trend, which is established, rather than to the magnitude of the latest headline, which is contested. That's a more defensible posture, and it does not require you to pick a side in the Mythos debate.
Now, here's what I find encouraging about the CSA briefing. Even if you set aside the Mythos-specific numbers, the recommendations hold because they are the same things we should have been doing already. The top-priority recommendation from a group of 50-plus CISOs is to focus on the basics: patching known vulnerabilities, segmentation, egress filtering, multifactor authentication, defense in depth. These controls increase the cost of exploitation.
So regardless of whether the vulnerability is found by a human researcher, an AI model, a teenager with a fuzzer, a state actor, the NSA, like, it doesn't matter, right? Good security starts with good IT. That's been the thesis of the show since day one. And the math doesn't work at human speed anymore. Half of organizations take five or more days to patch critical vulnerabilities. 94% have not fully automated their endpoint management. 94 percent haven't fully automated it, right? Those were calculated risks when time to exploit was measured in weeks. Today, the Zero Day Clock puts the mean time to exploit at under a day, right? Five days to patch is an open door.
I'd frame this urgency a little differently than the paper does. The paper calls for compressing timelines. I call this long overdue, right? These are things, again, that should have been done years ago. This should already be a part of your repertoire. And if it's not, you need to change that very quickly.
One other piece of the Anthropic accompanying guidance that I want to repeat, because it got under-covered. Their recommendation, patching the KEV list first and then everything above a chosen EPSS threshold, will help you turn thousands of open CVEs into a manageable queue. EPSS isn't new. It's developed by Empirical Security, published through FIRST, and already integrated into 120 security products. What's new is a frontier AI lab pointing defenders at probabilistic, machine-driven triage instead of CVSS, which NIST itself has publicly acknowledged it cannot keep up with. So that guidance stands on its own merits, independent of any Mythos claim.
So what do you actually do with all this? Well, three things. First, read both the CSA paper and the sharpest critique side by side.
Sort out the differences between the trend, which is stable, and the specific vendor claims, which are contested. I want to keep repeating that. The trend is stable, the claims are contested, but the trend is what you can focus on. Calibrate accordingly. Secondly, treat the basics as the highest-leverage investment you can make. Asset inventory, automated patching, identity hygiene, segmentation.
If you had funded automation for routine patching before Mythos was announced, you'd already be in a better position than most of the industry is now. Third, raise the governance question with your leadership.
The real limit on response speed is not tooling or technical capability; it is the organization's risk appetite for patching disruption. How fast is the business willing to absorb change in exchange for reducing exposure? That conversation is overdue at most companies, and it does not require a new AI model to make it worth having this week. All right. So thanks for listening. And if this one resonated, please share it with someone else on your team. We'll be back soon. Until then, stay secure and stay ahead.
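Note on the triage guidance discussed in the episode: patching the KEV list first, then everything above a chosen EPSS threshold, can be sketched roughly as below. This is a minimal illustration and not a tool from the episode or from the guidance itself. The `Finding` record, the `build_patch_queue` function, the 0.10 threshold, and the placeholder CVE identifiers are all assumptions made up for the example. In practice the KEV flag and EPSS score would come from your scanner or from the published feeds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One open CVE on an asset (hypothetical record shape)."""
    cve_id: str
    epss: float    # EPSS probability of exploitation, between 0 and 1
    in_kev: bool   # True if the CVE appears on the KEV list


def build_patch_queue(findings, epss_threshold=0.10):
    """Return the findings to patch now, in priority order.

    KEV entries come first regardless of EPSS score, followed by
    non-KEV findings at or above the chosen EPSS threshold.
    Everything else is left out of the urgent queue.
    The 0.10 default is an illustrative assumption, not a recommendation.
    """
    kev = sorted(
        (f for f in findings if f.in_kev),
        key=lambda f: f.epss,
        reverse=True,
    )
    above_threshold = sorted(
        (f for f in findings if not f.in_kev and f.epss >= epss_threshold),
        key=lambda f: f.epss,
        reverse=True,
    )
    return kev + above_threshold


if __name__ == "__main__":
    # Placeholder identifiers and scores, invented for illustration only.
    sample = [
        Finding("CVE-0000-0001", epss=0.02, in_kev=True),
        Finding("CVE-0000-0002", epss=0.91, in_kev=False),
        Finding("CVE-0000-0003", epss=0.05, in_kev=False),
        Finding("CVE-0000-0004", epss=0.40, in_kev=True),
    ]
    for finding in build_patch_queue(sample):
        print(finding.cve_id, finding.epss, "KEV" if finding.in_kev else "EPSS")
```

The threshold is the tunable part: lowering it lengthens the queue, raising it shortens it, and where to set it is the same risk-appetite conversation raised at the end of the episode.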