LMDeploy SSRF Flaw Exploited in 13 Hours, Hits AI Stacks
Picture a hotel concierge who'll fetch any package from any address you scribble on a napkin, no questions asked, no postcode check. That's roughly what LMDeploy's load_image() function has been doing inside production AI stacks. And thirteen hours after the door was named publicly, somebody walked in and started rifling through the back office.
Thirteen hours. That's the window between public disclosure of CVE-2026-33626 and the first exploitation attempts caught in the wild. The concierge metaphor is going to keep coming back, because the guts of this story are about who you trust to fetch things on your behalf, and what you let them touch on the way back.
The Numbers
The vulnerability sits in LMDeploy, an open-source toolkit for managing large language models, and as SC Media reported, it's a server-side request forgery flaw inside the vision-language module. The vulnerable code path is load_image(), which fetches URLs without validating whether those URLs point at internal or private IP addresses. If you've ever debugged an SSRF in a webhook handler, you already know the shape of the problem before reading another sentence.
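To make the shape concrete, here's a minimal Python sketch of the pattern, not LMDeploy's actual code: a naive fetcher that trusts any URL, next to a guarded variant that resolves the hostname and refuses private, loopback, and link-local addresses (the range that includes 169.254.169.254).

```python
import ipaddress
import socket
from urllib.parse import urlparse

import requests

def load_image_naive(url: str) -> bytes:
    """The vulnerable shape: fetch whatever URL the caller supplies."""
    return requests.get(url, timeout=10).content

def load_image_guarded(url: str) -> bytes:
    """Resolve the hostname first and refuse anything that lands in
    private, loopback, or link-local space, which covers the
    169.254.169.254 metadata endpoint."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("URL has no hostname")
    for *_, sockaddr in socket.getaddrinfo(host, None):
        addr = ipaddress.ip_address(sockaddr[0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            raise ValueError(f"refusing to fetch internal address {addr}")
    return requests.get(url, timeout=10).content
```

Even the guarded version has a known weakness: the address is checked at resolution time but fetched in a separate call, so a DNS-rebinding attacker can pass the check and still reach an internal address. A production fix pins the connection to the vetted IP rather than resolving twice.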
The thirteen-hour figure is the one that should make platform leads sit up. Researchers at Sysdig clocked exploitation attempts inside that window after public disclosure. For context, the rough industry baseline for opportunistic scanning of newly disclosed CVEs has historically been measured in days, sometimes weeks for niche software. Thirteen hours says the attackers were either watching the disclosure feed in real time or already running scanners against AI infra they knew about.
What attackers did with it is the boring bit, in the sense that it's textbook SSRF tradecraft: port scanning of internal networks, hits against AWS Instance Metadata Service to grab cloud credentials, probing of Redis instances, and out-of-band DNS exfiltration to pull data through name resolution when direct connections are blocked. None of this is novel. The novelty is the entry point being a vision-language model endpoint.
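For illustration, the attacker's request probably looked something like the sketch below. The endpoint path, port, and payload schema here are assumptions based on the OpenAI-compatible API shape most inference servers expose, not a verbatim capture; the detail that matters is that the "image" URL points at the metadata service rather than a picture.

```python
import requests

# Hypothetical reproduction of the request shape, not a real exploit.
# The "image" URL targets the AWS instance metadata service, so an
# unvalidated fetcher retrieves IAM credential material instead of an
# image and may echo it back in an error or model response.
payload = {
    "model": "some-vision-language-model",  # placeholder model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {
                "url": "http://169.254.169.254/latest/meta-data/"
                       "iam/security-credentials/"
            }},
        ],
    }],
}

resp = requests.post(
    "http://inference.internal:23333/v1/chat/completions",  # assumed endpoint
    json=payload,
    timeout=30,
)
print(resp.status_code, resp.text[:200])
```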
One detail worth pausing on: exploitation involved multiple requests spread across different vision-language models to evade detection. That tells you the attackers understood the deployment topology of LMDeploy users well enough to pivot between models inside the same toolkit, which suggests either prior reconnaissance or a playbook already shared in the relevant corners of the internet.
What's Actually New
SSRF is older than most of the engineers reading this. The technique features in OWASP material going back over a decade, and any backend dev who's shipped a "fetch this URL and preview it" feature has, at some point, stared at a private IP filter and wondered if they got the CIDR ranges right. So what's genuinely different here?
The novelty is the attack surface itself. AI inference toolkits like LMDeploy started life as research code: get the model loaded, get the tokens flowing, ship something that works. Vision-language modules in particular accept URLs as inputs because that's how you hand a model a picture without base64-encoding the entire payload into a JSON request. The convenience is the vulnerability. The toolkit is doing exactly what it advertises.
The second new thing is the operational tempo. Anyone who has watched a CVE drop on a Friday afternoon knows the painful gap between public disclosure and patched production. Thirteen hours is short enough that the traditional patch-management cadence (review, stage, schedule, deploy) doesn't even start its first meeting before exploitation lands. For teams running LMDeploy behind an internal API gateway with cloud metadata reachable from the inference pod, that's a lost weekend.
The third thing, and the one I'd argue matters most, is the target profile. AWS IMDS and Redis aren't random. They're the bits of infrastructure that hold credentials and session state. An attacker who can pivot from a model-serving endpoint to IMDS has effectively turned your inference cluster into a credential vending machine. Hugging Face's deployment guides have spent years pushing engineers toward containerised inference, but containers share metadata services with everything else in the VPC unless you go out of your way to lock the metadata service down.
The concierge isn't just fetching packages anymore. He's been told the master key cabinet is right behind his desk.
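Locking that cabinet is, on AWS at least, a one-call change. Here's a hedged boto3 sketch that enforces IMDSv2 and drops the hop limit to one; the instance ID is a placeholder.

```python
import boto3

ec2 = boto3.client("ec2")

# Enforce IMDSv2 (session tokens required) and cap the PUT response
# hop limit at 1, so a request that crosses an extra network hop
# (a bridge-networked container, for instance) can't complete the
# token handshake. The instance ID is a placeholder.
ec2.modify_instance_metadata_options(
    InstanceId="i-0123456789abcdef0",  # placeholder
    HttpTokens="required",             # IMDSv2 only; v1 requests refused
    HttpPutResponseHopLimit=1,
    HttpEndpoint="enabled",
)
```

One caveat: the hop limit only helps against containers sitting behind an extra network hop. Pods running with host networking still reach IMDS directly, which is why egress controls at the network layer matter too.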
What's Priced In for AI Development
Most senior engineers reading this already assumed that AI tooling, especially the open-source layer, had a security debt problem. That's priced in. The model-hosting ecosystem grew up fast, prioritised researcher ergonomics, and inherited the security posture of academic codebases. Nobody's surprised that an SSRF showed up in a vision-language URL fetcher. If anything, the surprise is that we didn't see this exact CVE six months earlier.
What's not priced in: the speed of weaponisation against AI-specific infra. The thirteen-hour window suggests attackers now treat AI toolkits as first-class targets, not afterthoughts. That's a shift. A year ago, the assumption was that opportunistic scanners would catch up to AI-stack CVEs eventually. "Eventually" is now "before lunch the next day".
Also not priced in: the fact that defensive tooling around AI inference is still immature. Standard WAF rules don't know what a legitimate vision-language request looks like. Network policies in most Kubernetes deployments allow pod-to-IMDS traffic by default. The agentic patterns documented in Anthropic's docs and similar references assume you've already solved the network egress question, but at the model-serving layer, that question is often unanswered.
Contrarian View
Here's the unpopular take: the thirteen-hour figure, while alarming, might also mean the detection ecosystem is finally working. Sysdig caught the exploitation attempts. Five years ago, the same flaw might have been silently exploited for months before anyone noticed, because nobody was watching AI inference traffic with the same instruments aimed at traditional web stacks.
The other contrarian beat: SSRF in load_image() is a fixable problem with a known shape. Validate IPs, block private ranges, force IMDSv2 with hop-limit one, restrict egress at the network layer. None of this requires inventing new defensive approaches. The real risk isn't this CVE. It's the next one, in the next AI toolkit, in a code path nobody's audited because the toolkit is six months old and ships a new feature every Tuesday.
If you treat CVE-2026-33626 as a one-off, you'll patch it and move on. If you treat it as a representative sample of what AI infra security looks like in 2026, you'll start asking harder questions about every URL fetcher, every tool-use hook, and every model that takes a string and turns it into a network call.
Key Takeaways
- Patch LMDeploy immediately if you're running the vision-language module. The thirteen-hour exploitation window means assume-breach is the right posture for unpatched deployments.
- Lock down AWS IMDS to v2 with hop-limit one across all inference pods, not just the ones you think are exposed. Cloud metadata is the prize attackers actually want.
- Treat AI toolkit URL fetchers as untrusted egress points. Network policies should block pod-to-private-range traffic unless explicitly required.
- Build detection for multi-model exploitation patterns; a minimal sketch follows this list. The attackers in this case spread requests across different vision-language models to evade single-endpoint monitoring.
- The concierge model of trust, where AI tooling fetches whatever it's asked to fetch, is the design flaw. Until validation lives at the toolkit layer by default, every new AI infra component should be assumed to ship with a similar door.
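On the detection point, here's a minimal sketch of the multi-model correlation idea, assuming your inference gateway can emit per-request tuples of timestamp, client IP, model name, and fetched URL; the field names and threshold are illustrative, not taken from the Sysdig writeup.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sketch of the multi-model correlation idea. The window size and
# threshold are illustrative, not from the Sysdig writeup.
WINDOW = timedelta(minutes=10)
MODEL_THRESHOLD = 3  # distinct VL models hit by one client = suspicious

seen: dict[str, list[tuple[datetime, str]]] = defaultdict(list)

def observe(ts: datetime, client_ip: str, model: str, url: str) -> bool:
    """Return True when one client has sent URL-bearing requests to
    several different vision-language models inside the window."""
    if not url:
        return False  # only correlate requests that carry a fetchable URL
    # Keep only events still inside the sliding window, then record this one.
    seen[client_ip] = [(t, m) for t, m in seen[client_ip] if ts - t <= WINDOW]
    seen[client_ip].append((ts, model))
    return len({m for _, m in seen[client_ip]}) >= MODEL_THRESHOLD
```

Wire something like observe() into whatever consumes your gateway's access logs and alert on the True returns. The point is correlation across models; no single endpoint's traffic looks anomalous on its own.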
Frequently Asked Questions
Q: What is CVE-2026-33626 and why does it matter?
It's a server-side request forgery vulnerability in LMDeploy's vision-language module, specifically in the load_image() function. It matters because the function doesn't validate whether URLs point at internal or private IPs, letting attackers pivot from a model endpoint into cloud metadata services and internal networks.
Q: How quickly was the LMDeploy vulnerability exploited after disclosure?
Researchers at Sysdig detected exploitation attempts within thirteen hours of public disclosure. That's fast enough to outrun standard patch-management cycles, which is the practical reason this CVE is worth treating as urgent rather than routine.
Q: What should engineering teams running AI inference infrastructure do now?
Patch LMDeploy if you use the vision-language module, force AWS IMDSv2 with a hop limit of one, restrict egress from inference pods to private IP ranges, and audit any toolkit code path that fetches URLs on behalf of a model. Treat AI infra URL fetchers as untrusted by default.