Every untrusted byte is an attacker

If your system reads bytes off the internet — a search result, a robots.txt, a synced file, a URL a model picked — every one of those bytes is an adversary. AI systems ingest hostile input by design, which means the boring old appsec bugs are back, just pointed somewhere new.

There's a comfortable lie going around that AI security is a new discipline: prompt injection, jailbreaks, model extraction, a whole novel threat surface that needs novel defenses. Some of that is real. But most of what actually bites you is the oldest stuff in the book: SSRF, ReDoS, path traversal, an unauthenticated debug port. The same bugs we've been writing since CGI scripts. The only thing that changed is where the hostile bytes come from.

And that change matters, because it inverts an assumption every web app quietly makes. A normal server treats its own outbound requests as trusted — the URLs it fetches are ones the developer wrote. An AI system does the opposite by construction. It fetches the URL a model picked from a page the model just read. It parses the robots.txt of whatever site showed up in a search result. It writes a file whose name came from a transcript synced off another machine. The agent's entire job is to act on bytes a stranger controls.

So the threat model isn't “what if someone sends a malicious request to my server.” It's “my server is going to go fetch malicious input on its own, enthusiastically, because I told it to be helpful.”

Here are five of these I found across my open-source fleet. Every one is a classic appsec bug. Every one is public, with a real PR. And the throughline I care about most: I found these by reading the live deployment against the source — not by running a linter that would have shrugged at all of them.

ReDoS from a robots.txt

Start with the one that sounds most absurd until you measure it. deepdive is a research agent: it searches the web, then politely reads each result site's robots.txt before crawling. Being polite is what got it.

The path matcher compiled robots.txt Allow/Disallow rules to a regex, expanding every * into .*. A rule like Disallow: / followed by a few dozen stars and a character compiles to ^.*.*.*.*… — a regex that backtracks catastrophically. And robots.txt is fetched from the untrusted site you're researching. Any site that appears in your search results can serve that file.

I measured it on a 17-character path. Each extra pair of stars cost roughly 10x:

baseline   0.04 ms
10 stars    218 ms          (~5000x slower)
~30 stars   the run hangs   (self-DoS, no data exposure)

No data leaks. The attacker just freezes your process — a denial of service triggered by a text file you fetched because you were being a good web citizen. The fix wasn't a longer timeout or a star limit; it was deleting the regex entirely and replacing it with a linear two-pointer wildcard matcher, O(path × pattern) worst case, no backtracking, same * / prefix / trailing-$ semantics. A regression test fires a 50-star pattern and asserts it finishes in under a second. The quietly funny part, noted in the PR: the sibling URL utility was already deliberately regex-free for exactly this reason. One module learned the lesson; its neighbor hadn't yet.
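
For readers who want the shape of that replacement, here is a minimal sketch of a two-pointer wildcard matcher. It is not deepdive's actual code: the name robotsPatternMatches and the exact handling of edge cases are assumptions made for illustration. It supports *, prefix matching, and a trailing $, and it never does more than path × pattern character comparisons.

```typescript
// Hypothetical sketch of a regex-free robots.txt rule matcher.
// Semantics assumed here: "*" matches any run of characters, a rule is a
// prefix match unless it ends in "$", and "$" anywhere else is a literal.
function robotsPatternMatches(pattern: string, path: string): boolean {
  const anchored = pattern.endsWith("$");
  let pat = anchored ? pattern.slice(0, -1) : pattern;
  // A prefix match is the same as an implicit trailing "*".
  if (!anchored) pat += "*";

  let p = 0;        // index into pat
  let s = 0;        // index into path
  let starAt = -1;  // index of the most recent "*" seen in pat
  let resumeAt = 0; // path index the most recent "*" currently extends to

  while (s < path.length) {
    if (p < pat.length && pat.charAt(p) === "*") {
      // Remember the star and first try matching it against nothing.
      starAt = p;
      resumeAt = s;
      p++;
    } else if (p < pat.length && pat.charAt(p) === path.charAt(s)) {
      p++;
      s++;
    } else if (starAt !== -1) {
      // Mismatch: let the most recent star absorb one more character.
      // Only the latest star is ever revisited, so work stays bounded.
      p = starAt + 1;
      resumeAt++;
      s = resumeAt;
    } else {
      return false;
    }
  }
  // Path consumed: any leftover pattern must be stars only.
  while (p < pat.length && pat.charAt(p) === "*") p++;
  return p === pat.length;
}
```

The design choice that matters is that a mismatch only ever rewinds to the most recent star, so fifty stars cost fifty pointer bumps instead of an exponential search. A call such as robotsPatternMatches("/" + "*".repeat(50) + "x", "/aaaaaaaaaaaaaaaa") returns false immediately.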

The cloud-metadata SSRF, and the gap I left open

This is the one that can actually take your credentials. hands is a coding agent with a read_page tool: hand it a URL, it fetches and cleans the HTML. The original version fetched any http(s) URL and followed redirects silently. So a prompt-injected page only had to convince the model to read one more link — http://169.254.169.254/, the cloud metadata endpoint — and the agent would happily fetch your instance's IAM credentials and hand them back into the conversation. Or any host on your private network. The model picks the URL; you fetch it; that's the whole exploit.

The naive defense is to blocklist 169.254.169.254 as a string. That's theater. http://[::ffff:169.254.169.254] is the same address. So is a hostname that resolves to it. So is an external URL that 302-redirects there after you've already checked it. The correct shape is resolve, then classify:

  • Refuse internal-by-name hosts before any DNS at all — localhost, *.localhost, metadata.google.internal.
  • Resolve, then reject by IP class: loopback, private (RFC 1918), link-local, CGNAT, and special-use ranges — across IPv4 and IPv6, including v4-mapped ::ffff: addresses parsed back through the v4 checker. That last one is where most homegrown guards leak.
  • Re-check every redirect hop (max 5), so an external URL can't 302 you into the internal network after passing the first gate.
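
The classify half of that list can be sketched as a pure function. This is an illustration under stated assumptions, not the guard that shipped in hands: the function names are hypothetical, the range list is not an exhaustive catalogue of special-use blocks, and the input is assumed to be an address string as a resolver or URL parser would hand it back. DNS resolution and the per-hop redirect loop are left to the caller.

```typescript
// Hypothetical sketch: names and range list are illustrative only.

// Hosts refused by name, before any DNS lookup.
function isInternalName(host: string): boolean {
  const h = host.toLowerCase().replace(/\.$/, "");
  return h === "localhost" || h.endsWith(".localhost") || h === "metadata.google.internal";
}

function parseIPv4(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const n = Number(part);
    if (n > 255) return null;
    octets.push(n);
  }
  return octets;
}

function isBlockedIPv4(ip: string): boolean {
  const octets = parseIPv4(ip);
  if (octets === null) return true; // unparseable input fails closed
  const a = octets[0] ?? 0;
  const b = octets[1] ?? 0;
  const c = octets[2] ?? 0;
  return (
    a === 0 ||                           // 0.0.0.0/8 "this network"
    a === 10 ||                          // 10.0.0.0/8 private
    (a === 100 && b >= 64 && b <= 127) ||// 100.64.0.0/10 CGNAT
    a === 127 ||                         // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||          // 169.254.0.0/16 link-local, cloud metadata
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168) ||          // 192.168.0.0/16 private
    (a === 192 && b === 0 && c === 0) || // 192.0.0.0/24 special-use
    (a === 198 && (b === 18 || b === 19)) || // 198.18.0.0/15 benchmarking
    a >= 224                             // multicast and reserved
  );
}

function isBlockedIPv6(ip: string): boolean {
  const addr = ip.toLowerCase();
  // v4-mapped, dotted form: hand the embedded address to the v4 checker.
  const dotted = /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/.exec(addr);
  if (dotted) return isBlockedIPv4(dotted[1] ?? "");
  // v4-mapped, hex form (e.g. ::ffff:a9fe:a9fe): rebuild the dotted quad.
  const hex = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(addr);
  if (hex) {
    const hi = parseInt(hex[1] ?? "0", 16);
    const lo = parseInt(hex[2] ?? "0", 16);
    return isBlockedIPv4(`${hi >> 8}.${hi & 255}.${lo >> 8}.${lo & 255}`);
  }
  const first = parseInt(addr.split(":")[0] || "0", 16);
  if (first === 0) return true;                 // ::, ::1, and anything else in 0::/16 fails closed
  if ((first & 0xfe00) === 0xfc00) return true; // fc00::/7 unique local
  if ((first & 0xffc0) === 0xfe80) return true; // fe80::/10 link-local
  if ((first & 0xff00) === 0xff00) return true; // ff00::/8 multicast
  return false;
}

// Entry point: call on every resolved address, on every redirect hop.
function isBlockedAddress(ip: string): boolean {
  return ip.includes(":") ? isBlockedIPv6(ip) : isBlockedIPv4(ip);
}
```

With this sketch, isBlockedAddress("::ffff:169.254.169.254") and isBlockedAddress("::ffff:a9fe:a9fe") both come back true, while isBlockedAddress("8.8.8.8") comes back false. Anything the parser does not recognize is treated as blocked rather than allowed.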

Now the part I want to be loudest about, because it's the whole point of writing this honestly. That guard does not fully close SSRF, and I documented exactly why. Between the moment I resolve the hostname and classify it as safe, and the moment the HTTP client resolves it again to actually connect, an attacker who controls the DNS can change the answer — safe public IP on the first lookup, 169.254.169.254 on the second. That's DNS rebinding, and the only airtight fix is to pin the connection to the exact IP you validated, bypassing the second resolution.

I chose not to do that here. For a read-only fetcher, pinning the socket to a validated IP is a disproportionate amount of machinery, and I'd rather ship a guard that closes the easy 99% with a comment naming the residual 1% than ship one that claims closure it doesn't have. A security control that lies about its own coverage is worse than no control, because someone downstream trusts it. HANDS_ALLOW_PRIVATE_URLS=1 exists for operators who actually need internal fetches. The rebinding gap is written down, in the source, where the next person will read it. False assurance is the bug I refuse to ship.

(The same resolve-then-classify guard, spanning IPv4/IPv6/v4-mapped/CGNAT, also went into one of our Git provider clients, where a user-supplied Grafana URL was the untrusted byte and http://169.254.169.254 sailed straight through a protocol-only check.)

An unauthenticated browser, wide open by default

Most “remote Chromium in a container” images you can pull right now ship a remote-takeover bug, and almost nobody notices because it works fine in the happy path. browser-bridge exposes the Chrome DevTools Protocol so agents can drive a headless browser. CDP is unauthenticated by design — and the obvious launch flag, the one in every tutorial, is --remote-allow-origins=*.

That asterisk means any web origin can open a CDP WebSocket and drive your browser. The attack is DNS rebinding again: a malicious page the user visits rebinds its own DNS to 127.0.0.1, connects to the debug port, and silently takes over the headless browser — navigate it anywhere, read its cookies and sessions, pivot into whatever it can reach. No prompt injection required; the user just has to visit a bad page in a normal tab.

The fix replaced * with an explicit loopback-origin allowlist (http://127.0.0.1 and localhost on the bridge ports, overridable for real deployments). The insight that makes this non-breaking is a fact about the clients: Playwright, Puppeteer, and MCP CDP clients send no Origin header at all. The allowlist only ever blocks browser-based DevTools frontends loaded from a foreign origin — which is precisely the attack surface and nothing else. The legitimate automation clients don't even notice.
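
The decision that allowlist encodes is small enough to model in a few lines. This is a sketch of the logic only, not Chrome's implementation and not browser-bridge's code; the function name and port 9222 are assumptions made for illustration.

```typescript
// Hypothetical model of the origin check. Port 9222 is an assumed example,
// not necessarily a port browser-bridge uses.
const ALLOWED_CDP_ORIGINS: readonly string[] = [
  "http://127.0.0.1:9222",
  "http://localhost:9222",
];

function allowCdpConnection(originHeader: string | undefined): boolean {
  // Automation clients (Playwright, Puppeteer, MCP CDP clients) send no
  // Origin header, so they pass untouched.
  if (originHeader === undefined) return true;
  // A browser page always sends its Origin; only exact allowlisted
  // loopback origins may open the WebSocket.
  return ALLOWED_CDP_ORIGINS.indexOf(originHeader) !== -1;
}
```

A rebinding page still carries its own origin in the header, for example http://evil.example:9222, so allowCdpConnection rejects it even though the socket lands on 127.0.0.1.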

Path traversal in a tool whose entire job is ingesting files

claude-sync moves Claude Code session transcripts between machines over whatever transport you've got — Dropbox, iCloud, Syncthing, a USB stick. Its threat model is, by definition, “a file arrived from another machine I don't fully control.” That is the entire premise of the tool. Which is what makes this one sting.

The write path built its destination like this:

targetPath = join(projDir, `${sessionId}.jsonl`)

— with no sanitization, and sessionId flowing in from the parsed, attacker-controllable synced file. A session id of ../../x escapes projDir and writes wherever it likes. A classic path-traversal write primitive, in the one tool on the fleet whose whole reason to exist is reading bytes someone else wrote. The fix is the boring correct one: assertSafeSessionId() with an allowlist (^[A-Za-z0-9._-]+$, explicit rejection of /, \, .., ., and empty), called before any path is built — while still accepting real UUIDs and the -copy suffixes the tool legitimately generates. Shipped with atomic temp-then-rename writes so a mid-sync snapshot never sees a half-written transcript. Allowlist the input shape; don't try to enumerate the bad ones.
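
A minimal sketch of that check, assuming the allowlist pattern quoted above. The function body is an illustration, not claude-sync's source; in particular, rejecting any id that contains .. (rather than only an id equal to ..) is a stricter reading chosen here, and sessionFilePath is a hypothetical helper that uses plain string concatenation to stay dependency-free.

```typescript
// Hypothetical sketch of the allowlist check described above.
const SESSION_ID_SHAPE = /^[A-Za-z0-9._-]+$/;

function assertSafeSessionId(sessionId: string): string {
  if (sessionId === "" || sessionId === ".") {
    throw new Error("unsafe session id: empty or '.'");
  }
  // Assumption: reject ".." anywhere, not only as the whole id.
  if (sessionId.includes("..")) {
    throw new Error("unsafe session id: contains '..'");
  }
  if (sessionId.includes("/") || sessionId.includes("\\")) {
    throw new Error("unsafe session id: contains a path separator");
  }
  // The allowlist is the real gate; the checks above only give clearer errors.
  if (!SESSION_ID_SHAPE.test(sessionId)) {
    throw new Error("unsafe session id: characters outside the allowlist");
  }
  return sessionId;
}

// Validate first, build the path second. Real code would use path.join here.
function sessionFilePath(projDir: string, sessionId: string): string {
  return `${projDir}/${assertSafeSessionId(sessionId)}.jsonl`;
}
```

With this in place, sessionFilePath("/proj", "../../x") throws before any path exists, while a UUID or a UUID with a -copy suffix passes through unchanged.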

Poisoned URLs walking straight past CSP

Last one, and it's a reminder that a defense you trust can be irrelevant to the attack you're facing. amnesia is a metasearch front-end: it renders results from an upstream engine. The renderer piped each result's url and image src straight into href/src through an escaper that only escaped quotes.

So a poisoned upstream result carrying a javascript: or data: URL became a clickable script link. And here's the trap: the page has a Content Security Policy. It feels protected. But a policy that allows 'unsafe-inline' does not block a javascript: URL a user clicks, so the scheme injection walks right past it. The CSP was real and the attack didn't care. The fix is a safeUrl() http(s) allowlist that drops any non-http(s) result before it ever reaches the DOM. The same PR closed an empty-key HMAC forgery (the session secret defaulted to '', forgeable from public source) and a Turnstile token that wasn't bound to its origin. None of those are exotic. All three were found by reading the live site against the repo.
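
A scheme allowlist of that kind can be very small. The sketch below is an assumption about the shape, not amnesia's actual safeUrl(): it accepts only strings that begin with http:// or https:// and returns null for everything else, so the caller can drop the result.

```typescript
// Hypothetical sketch of an http(s)-only URL gate for rendered results.
// Returns the trimmed URL when the scheme is http or https, else null.
function safeUrl(raw: string): string | null {
  const trimmed = raw.trim();
  // Allowlist, not blocklist: javascript:, data:, vbscript:, relative
  // paths, and anything unrecognized all fail this single test.
  return /^https?:\/\//i.test(trimmed) ? trimmed : null;
}
```

The returned value still needs normal attribute escaping before it goes into href or src; this gate only decides which schemes are allowed to get that far. safeUrl("javascript:alert(1)") returns null, and safeUrl("https://example.com/a") returns the URL unchanged.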

The throughline

Notice what's not in this list: a single bug that needed AI to exist. ReDoS, SSRF, path traversal, an open debug port, scheme injection — this is a 2008 appsec syllabus. What's new is only the delivery mechanism. The hostile bytes used to arrive in a request to your server. Now your server goes and gets them, because an agent's purpose is to act on input from the world, and the world includes people who want your credentials.

That reframes how you have to look at every input boundary in an AI system. A URL a model chose is attacker-influenced. A page it fetched is attacker-controlled. A robots.txt, a synced session, an upstream search result — each is a stranger handing you bytes and trusting you'll process them carefully. Treat them as adversarial, because they are.

And none of these came from a scanner. A linter doesn't know that robots.txt is attacker-controlled, or that read_page's URL is model-chosen, or that a CSP is irrelevant to a clicked javascript: link. It doesn't have the threat model; it only has patterns. You find these by sitting with the running system, asking “where do the bytes come from,” and following the hostile ones all the way to where they land. That's the work. It doesn't make a launch video. It's the difference between a system that looks secure and one that is.

This is what Own Your Stack means at the byte level: own the input boundaries of your AI system instead of assuming the framework drew them for you. It didn't. The hostile bytes are already inside the request you were about to make — and the residual gap you can't close cheaply, you write down, you don't paper over.

We build and run software on AI infrastructure that ingests hostile input by design — agents, fetchers, integrations, and the guardrails that keep a helpful system from being a credential-theft vector. If you're putting agents near anything that matters, that's the kind of thing we're good at holding steady.
