
Sample deliverable

We audited a mature codebase, found a real ReDoS — and threw out two findings that didn't survive the code.

An actual Sprayberry Labs audit of deepdive, our open-source research agent. It fetches and parses untrusted web content — search results, pages, and each site's robots.txt — so it has a genuine attack surface. This is exactly the deliverable you receive for the $1,500 Audit.

The headline finding was real, and confirmed. The robots.txt matcher compiled an attacker-controlled pattern straight into a regex, expanding every * to .*. A site you research can serve a robots.txt with a few dozen * wildcards, and the resulting regex backtracks catastrophically. We didn't just assert it; we measured it: roughly 10× slower per two extra wildcards (0.04 ms at two wildcards, 218 ms at ten), extrapolating to a hung process. We fixed it with a linear matcher plus a regression test, CI-green the same day (PR #50).
And we throw out findings that don't hold up. An automated first pass on this same repo flagged a "session path traversal" and an "API-key leak." Both were checked against the real code and the real threat model — a single-user local CLI, and a key that's sent exactly as the vendor's own API documents — and dropped, not published. Every finding that did ship was re-extracted from the source and confirmed by a deterministic gate: 5 of 5 passed, 0 fabricated. That's the difference between a report you can act on and one you have to second-guess.

Subject: github.com/askalf/deepdive @ d478a23 · TypeScript / Node · ~40 source modules · ~30 test files · Method: static source analysis (the ReDoS was additionally reproduced with a standalone timing harness)

1. Executive Summary

deepdive is in strong shape — a security-conscious, well-tested codebase with the threat model written into its comments, bounded inputs throughout, atomic file writes, and a deliberate, documented effort to avoid ReDoS in its URL handling. The audit found one P2 worth fixing now and four P3 hardening items — no P0/P1.

The P2 is a slightly ironic one: the codebase carefully hand-writes its URL helpers regex-free specifically to dodge ReDoS, but the one place it compiles untrusted input to a regex — the robots.txt matcher — is exactly where a ReDoS slipped in. It's a denial-of-service against your own process (no data exposure), it's externally triggerable on the default path, and the fix is small. We shipped it.

2. Findings

P2 · ✓ Fixed same day · ReDoS via attacker-controlled robots.txt

Before fetching a host, deepdive fetches and respects that host's robots.txt (the polite default). Each Allow/Disallow pattern was compiled to a regex by expanding every * to .* with no bound on the wildcard count. Since the robots.txt is served by the site being researched, that input is attacker-controlled, and a chain of .*.*.*… against a non-matching path forces the regex engine to try every way of dividing the path among the wildcards before it can give up. Reproduced on a 17-character path: 2 wildcards took 0.04 ms, 6 took 2.2 ms, and 10 took 218 ms. That is roughly 10× per two extra wildcards, so a robots line with a few dozen legal * takes the per-URL check from microseconds to minutes and hangs the run.
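To make the failure mode concrete, here is a minimal sketch of the kind of translation described above, together with a tiny timing helper. It is a reconstruction for illustration only: the names naivePatternToRegex and timeMatchMs are hypothetical and are not taken from deepdive's source, and the probe pattern is our own assumption, not the exact reproduction used in the audit.

```typescript
// Hypothetical reconstruction of the vulnerable translation (not deepdive's
// actual code): escape regex metacharacters in the literal parts, then expand
// every "*" to ".*" with no cap on how many wildcards a pattern may contain.
function naivePatternToRegex(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// Times a single match of an n-wildcard pattern against a path. The trailing
// "!" is assumed not to occur in the path, so the regex engine must exhaust
// every way of splitting the path among the wildcards before it fails.
function timeMatchMs(wildcards: number, path: string): number {
  const re = naivePatternToRegex("/" + "*".repeat(wildcards) + "!");
  const start = performance.now();
  re.test(path);
  return performance.now() - start;
}
```

Calling timeMatchMs with 2, 6, and then 10 wildcards on a short path should show the same steep growth as the figures reported above. Absolute timings depend on the machine and the JavaScript engine, and much higher wildcard counts should be avoided because the call may not return in a reasonable time.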

Remediation (shipped, PR #50, CI-green): replaced the regex with a linear two-pointer wildcard matcher — O(path × pattern) worst case, no catastrophic backtracking — preserving the exact */$ semantics, plus a regression test that feeds a 50-wildcard pattern and asserts matching stays linear. Full suite: 433/433.
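A two-pointer wildcard matcher of the kind described above can be sketched as follows. This is not the code from PR #50; the function name matchesRobotsPattern is hypothetical, and the sketch assumes the common robots.txt conventions that * matches any run of characters, that $ is only meaningful as the last character of a pattern, and that a pattern without $ is a prefix match.

```typescript
// Sketch of a regex-free robots.txt pattern matcher (illustrative only).
// Worst case is O(path.length * pattern.length); there is no regex engine
// involved, so there is no catastrophic backtracking.
function matchesRobotsPattern(pattern: string, path: string): boolean {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  // An unanchored pattern is a prefix match, which equals a trailing "*".
  const p = anchored ? body : body + "*";

  let i = 0; // current index in path
  let j = 0; // current index in p
  let star = -1; // index in p of the most recent "*", or -1 if none seen
  let mark = 0; // path index where that "*" currently stops consuming

  while (i < path.length) {
    if (j < p.length && p[j] === "*") {
      // Record the wildcard and first try letting it match nothing.
      star = j;
      mark = i;
      j++;
    } else if (j < p.length && p[j] === path[i]) {
      i++;
      j++;
    } else if (star !== -1) {
      // Mismatch: let the most recent "*" absorb one more character.
      mark++;
      i = mark;
      j = star + 1;
    } else {
      return false;
    }
  }
  // Any wildcards left over at the end of the pattern can match nothing.
  while (j < p.length && p[j] === "*") j++;
  return j === p.length;
}
```

Because only the most recent wildcard is ever revisited, a pattern such as "/" followed by 50 asterisks and a character that never appears in the path is rejected after a single pass over the path, which is the shape of input the regression test above guards against.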

P3 · hardening · Four hardening items (no live exploit)
  • robots crawl-delay parsed but never enforced — deepdive honors allow/deny but ignores the host's requested pacing (the field is read and surfaced, then nothing reads it). A stated-politeness gap.
  • Default search adapter reads the response unbounded — the DuckDuckGo path buffers the full HTML with no size cap, where every other fetch path is byte-bounded. Low risk (trusted endpoint), but inconsistent.
  • Cache temp-file suffix isn't collision-proof — the atomic-write temp name is derived from the PID alone, so two concurrent writes of the same URL from the same process would share one temp file and could interleave. Latent, not active.
  • No SSRF guard on fetched URLs — fine for the primary local-CLI use; matters the moment deepdive is run server-side, where a malicious result could reach localhost or a cloud metadata endpoint. Flagged as deployment-conditional.
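Two of the hardening items above lend themselves to short sketches. The code below is illustrative only and is not deepdive's implementation: tempPathFor and isBlockedIPv4 are hypothetical names, the ".tmp" suffix is an assumption, and the address check covers IPv4 literals only, refusing anything else for brevity.

```typescript
import { randomUUID } from "node:crypto";

// Temp-file name for an atomic write (write to temp, then rename).
// A random UUID per write means two concurrent writers of the same final
// path never share a temp file, even inside one process.
function tempPathFor(finalPath: string): string {
  return `${finalPath}.${randomUUID()}.tmp`;
}

// Returns true when an address should NOT be fetched in a server-side
// deployment: loopback, RFC 1918 private ranges, link-local (which includes
// the 169.254.169.254 cloud metadata address), and the 0.0.0.0/8 block.
// Anything that is not a valid dotted-quad IPv4 literal is refused.
function isBlockedIPv4(ip: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (m === null) return true;
  const a = Number(m[1]);
  const b = Number(m[2]);
  const c = Number(m[3]);
  const d = Number(m[4]);
  if (a > 255 || b > 255 || c > 255 || d > 255) return true;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}
```

For the guard to be meaningful, a check like isBlockedIPv4 would need to run on the address a hostname actually resolves to, and again on every redirect hop, since a public hostname can point at an internal address. A production version would also need proper IPv6 handling, which this sketch deliberately leaves out.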

3. What we did not publish

Two findings an automated pass had flagged were checked against the real code and discarded — included here because not publishing an unproven finding is the discipline you're paying for:

✗ Dropped · "Session path traversal": not a vulnerability here

The save path's id is system-generated, and the load path runs against the user's own ~/.deepdive on a single-user local CLI — the "attacker" would be the user reading their own files. No privilege boundary is crossed, so there is nothing to exploit in this threat model.

✗ Dropped · "API key sent in the request body": correct usage

The Tavily adapter puts api_key in the POST body — which is Tavily's own documented API contract, sent over HTTPS to Tavily's endpoint. No leak; flagging it would have been noise.

4. What's Working — Keep It


Verification: every published finding was re-extracted from the cited source and confirmed present by a deterministic mechanical gate (5 of 5 passed, 0 fabricated); the ReDoS was additionally reproduced with a standalone timing harness; two findings that didn't hold against the real code were discarded, not published. Exploit specifics are generalized here because this is a public sample; a client engagement includes exact reproductions. Apart from that timing harness, the method was static analysis only. Every engagement includes a 60-minute readout call to walk through findings and the roadmap.

Want this for the code you're already running?

$1,500 · ~1 week · a written, verified, impact-ranked audit — findings you can act on, not second-guess.

See what the Audit includes →