Hacking the AI
Why Your LLM Is Your Newest Attack Surface
Last week I spent a few hours on a Hack The Box machine called “Artificial.” It’s rated “Easy” by the site’s standards. That should concern you.
The machine hosts a web application that lets users upload TensorFlow models and run predictions against them. This is a standard model serving infrastructure. It’s the kind of thing a lot of organizations are building right now. I uploaded a model. The server loaded it. And because of how TensorFlow deserializes Keras Lambda layers, it executed the Python payload I’d embedded inside.
I had a reverse shell in seconds. I had full root access within the hour.
Most conversations about AI security start and end with prompt injection: tricking a chatbot into saying something it shouldn’t. That’s real, but it’s a narrow part of the problem. The deeper risk is in the AI supply chain itself, including the models, the pipelines that move them, and the infrastructure that serves them. That’s where I spent my week, and that’s what this post is about.
The numbers tell the same story, and by now it’s really no secret. CrowdStrike’s 2026 Global Threat Report found an 89% increase in AI-enabled adversary attacks in a single year, based on observed intrusions across their threat intelligence operations.[1] The World Economic Forum and Accenture’s Global Cybersecurity Outlook 2026 reported that 87% of security leaders now rank AI-related vulnerabilities as their fastest-growing cyber risk.[2] IBM’s X-Force Threat Intelligence Index 2026 found that major supply chain breaches have quadrupled over the past five years.[3]
We’re deploying AI faster than we’re securing it, and to be sure, adversaries have noticed.
The Model Is the Malware
Here’s what some security teams haven’t internalized yet: AI models are executable code.
When TensorFlow loads a saved .h5 model, it isn’t just reading data. It deserializes Python objects, including any functions embedded in the model’s architecture. Keras Lambda layers allow arbitrary Python functions to be baked into a model file. When the server calls load_model(), it runs whatever code is inside. All of it. No limits.
A quick word on what .h5 files actually are, because this matters. HDF5 is a data format that’s been around for over twenty years. NASA uses it to store satellite telemetry. Financial firms use it for time-series data and risk models. Healthcare and biotech organizations use it for genomic and clinical datasets. In the ML world, it became the standard way to save and share Keras models. There are years of .h5 model files sitting in repositories, shared drives, and production pipelines across every industry that touches machine learning. It is not a niche format.
The specific vulnerability I exploited is tracked as CVE-2024-3660.[4] If you’ve been in security for a while, you’ll recognize the pattern. It’s conceptually identical to the Python pickle deserialization attacks that have been burning web applications for years. The difference is that most security teams know to be suspicious of pickle files. Almost nobody is looking at .h5 files the same way.
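The pickle analogy is worth making concrete, since it’s the same trick with a different file extension. A toy sketch of the classic `__reduce__` gadget; the payload here is a harmless `eval`, where a real attacker would use something like `os.system`:

```python
import pickle

class Payload:
    def __reduce__(self):
        # (callable, args) -- pickle.loads will call eval("6*7") at load
        # time. An attacker would put os.system("...") here instead.
        return (eval, ("6*7",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # deserializing "data" just executed code
print(result)  # -> 42
```

No method of `Payload` is ever called explicitly; loading the bytes is enough. That is exactly the property a Keras Lambda layer gives an `.h5` file.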
I built a TensorFlow model with a Lambda layer containing a reverse shell. A few lines of Python that tell the server to connect back to my machine and hand over command-line access. Upload the file. Click “View Predictions.” Done.
One thing I learned the hard way during this: TensorFlow executes Lambda code during model.save() too, not just during loading. If your listener is running while you’re building the payload locally, you’ll catch a shell from your own Docker container instead of the target. It’s the kind of detail you won’t find in the CVE advisory, and it cost me about twenty minutes of confused troubleshooting before I figured out what happened.
The Fix That Wasn’t
Now, if this were just a story about a patched CVE on a training platform, it wouldn’t be worth writing about. But the history of this vulnerability is where it gets interesting, and where the real lesson lives.
The underlying behavior, Keras deserializing arbitrary Python code from Lambda layers, was known for years before it got a CVE. Google’s own TensorFlow security documentation explicitly warns that models are programs and should not be loaded from untrusted sources. For a long time, the project treated this as intended functionality, not a vulnerability. It’s documented. It’s a feature.[4][5]
CERT/CC and its partners eventually pushed for a formal advisory, and CVE-2024-3660 was assigned in April 2024 with a CVSS score of 9.8, Critical. The Keras team responded by adding a safe_mode parameter in version 2.13, set to True by default, intended to block unsafe Lambda layer deserialization.[4]
Problem solved? Not quite.
Researchers at Oligo Security discovered that the safe_mode fix only applied to the newer .keras file format. The legacy .h5 format, which is what the vast majority of existing models use, was left completely unprotected. When you load an .h5 file, Keras silently ignores the safe_mode parameter. No warning. No error. It just runs the code anyway. Oligo called this a “downgrade attack” and a “shadow vulnerability,” because an attacker can simply package their malicious model in the older format and bypass the fix entirely.[5]
JFrog’s security team then found additional bypasses even in the newer .keras format, using Keras’s own internal functions as gadgets for code execution.[6]
This led to a second CVE, CVE-2025-9905, specifically for the .h5 bypass.[7] Google’s security team responded by explicitly stating that they don’t consider safe_mode a security boundary.[7] That fix didn’t land until Keras 3.11.3.
And here’s the practical reality. Over in the Keras GitHub discussions, real users were reporting that upgrading from older TensorFlow versions was, in their words, “a huge amount of change” due to Python version dependencies and downstream package conflicts.[8] These aren’t lazy developers. They’re teams with production systems that can’t easily rip and replace their ML stack. So the legacy format, with its silent bypass, persists in the wild.
This is the pattern I want you to pay attention to. Not the specific CVE, but the cycle: a known dangerous behavior goes unaddressed because it’s considered a feature. A partial fix ships that protects the new format but leaves the old one exposed. The old format is what most of the ecosystem actually uses. The gap between “patched” and “safe” can be measured in years.
From Model to Full Compromise
The specific attack chain on the Artificial machine followed a path that will feel familiar to anyone who’s investigated a real breach.
The initial shell landed with limited access, a service account running the web application. The application’s SQLite database was sitting in the web directory with user credentials stored as unsalted MD5 hashes. On modern hardware, those crack in seconds. One password later, I had SSH access as a real user on the system.
That user turned out to be in a system administration group with read access to backup archives. I pulled a backup, found the admin credentials for a Backrest instance running as root on localhost, and cracked the bcrypt hash. The hash itself was stored behind a layer of Base64 encoding, which adds nothing.
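To be clear about how little that encoding buys: Base64 is a reversible transport encoding, not encryption. The stored value below is a made-up stand-in, not the real hash:

```python
import base64

bcrypt_hash = "$2a$10$exampleNotARealHash"      # made-up stand-in
stored = base64.b64encode(bcrypt_hash.encode())  # what the config "protects"

# One function call undoes it -- no key, no secret, no work factor.
print(base64.b64decode(stored).decode() == bcrypt_hash)  # -> True
```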
The last step: Backrest’s underlying utility, restic, accepts a --password-command flag that runs arbitrary commands with root privileges. I pointed it at a reverse shell script. Root.
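The escalation itself is one flag. A hypothetical invocation (paths are illustrative): restic executes whatever `--password-command` names in order to obtain the repository password, with the privileges of the calling process, which here meant root via Backrest.

```shell
# /tmp/evil.sh contains the reverse shell; restic runs it as root
# while trying to read the repository password from its output.
restic -r /tmp/repo init --password-command "/tmp/evil.sh"
```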
Every step after the initial model upload is a well-known attack pattern. Credential theft, lateral movement through group permissions, privilege escalation through a misconfigured backup tool. Nothing new there. The only thing novel is the entry point. And the entry point is the one most organizations aren’t testing for.
What You Should Be Asking
If you’re deploying AI tooling in any capacity, here’s what this should put on your agenda.
Where are your models stored, and who has write access? The machine I tested accepted uploads from any authenticated user. In production, this translates to shared model registries, S3 buckets with model artifacts, CI/CD pipelines pulling from external sources. If you can’t answer the question “who can modify our models?” with specifics, start there.
Are you verifying model integrity before deployment? Checksums. Cryptographic signing. Provenance tracking. Most ML pipelines have none of it. A model file that moves through multiple teams and staging environments with no integrity checks could have been tampered with at any point, and you’d never know.
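A basic integrity gate is a few lines of code. A minimal sketch, assuming you publish a SHA-256 digest alongside each artifact at release time; the filename and bytes below are stand-ins:

```python
import hashlib
import pathlib

def sha256_of(path: pathlib.Path) -> str:
    """Stream the file through SHA-256 so large models don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Stand-in for a real model artifact; in practice the digest is pinned
# when the model is published (or, better, the file is signed).
artifact = pathlib.Path("model.h5")
artifact.write_bytes(b"\x89HDF\r\nfake-model-bytes")
expected = sha256_of(artifact)

# Deploy-time gate: refuse to load anything whose digest doesn't match.
assert sha256_of(artifact) == expected, "model artifact was modified"
```

A checksum only proves the file didn’t change after the digest was recorded; signing and provenance tracking are what tie it back to a trusted publisher.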
What’s the blast radius if your model server gets popped? In the system I tested, the AI application could reach the user database, the backup infrastructure, and ultimately the entire host. Does your model serving infrastructure run in a hardened container? Can it reach production databases? Internal services? API keys? Map it out.
Is your AI infrastructure actually in your pentest scope? Most organizations exclude it entirely. Model serving endpoints, training pipelines, model registries: these are production systems with their own attack surfaces, and they need to be tested that way. The OWASP Top 10 for LLM Applications[9] and MITRE ATLAS[10] provide frameworks for this kind of assessment, but adoption is still early. If AI is a “black box” exception in your security program, you’re ignoring the part of your attack surface that’s growing fastest.
The Bigger Picture
This isn’t really a story about one vulnerable machine on a training platform. And it’s not a story about a single CVE. It’s about how the AI/ML ecosystem handles security as an afterthought. Dangerous behaviors get documented instead of fixed. Partial patches ship for new formats while legacy formats are left exposed. And the gap between “vulnerability disclosed” and “ecosystem actually safe” stretches on for years while production systems sit in the middle.
AI systems are production infrastructure. They need the same rigor you’d apply to anything else in your environment. In a lot of cases they need more, because the attack surface is unfamiliar and the defensive tooling is still being built.
If you’re deploying AI and haven’t mapped this exposure, that’s a conversation worth having.
— — —
Adam Jarvis is the founder of JARVIS Threat Intelligence (JTI), a cybersecurity advisory firm specializing in threat assessments, red team operations, and security strategy for organizations navigating emerging attack surfaces. Subscribe to After Action for the next post in this series, where I’ll look at why 82% of modern attacks are now malware-free, and what that means for your defenses.