dario: a self-healing autonomous release pipeline

A published package that has to track a dependency Anthropic moves every few days — with no manual bandwidth to do it. Here's the closed-loop pipeline that watches, validates, ships to npm, and heals itself. Six verifiable releases in 72 hours, zero humans in the loop.

The engineering problem

Fast-moving upstream dependency. No manual bandwidth to track it. A published npm package that must stay in lockstep or break for every user.

The naive solution is a cron job that opens a PR when a version number changes. That works until GitHub suppresses the event that should trigger your release workflow, the release job silently skips a version, or the package publishes without confirming that the build was actually green.

The real solution is a closed-loop autonomous pipeline that watches, validates, publishes, and monitors itself — and self-heals when any of those steps break.

That is what dario is.

What dario does

@askalf/dario is an open-source OAuth router that lets you use your own Claude Pro/Max subscription in any Anthropic-compatible tool.

This case study is about the release-automation engineering, not the product premise. The product itself updates every time Anthropic ships a new version of Claude Code. Tracking that manually, at this cadence, is not viable, so the pipeline does it automatically, end to end, with no human in the loop.

  • Repository: github.com/askalf/dario (public)
  • npm: @askalf/dario (published, latest v4.8.97)
  • Stars: 289 (as of 2026-06-25)
  • CI: all green across the full workflow suite (verified 2026-06-25)

The full autonomous loop (verified in source)

All 17 GitHub Actions workflows are readable at .github/workflows. The pipeline-critical ones:

1. Hourly upstream watch — cost-aware design

cc-drift-watch.yml runs on a 60-minute schedule. Its design is explicitly cost-aware: a cheap version-check gate runs first and exits immediately if the upstream @anthropic-ai/claude-code@latest version hasn't changed. The heavyweight validation only fires when drift is actually detected. The logic and the rationale are both documented inline.
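The cheap-gate-first pattern can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the function name drift_check and the injected fetch_latest and validate callables are hypothetical stand-ins for the registry lookup and the heavyweight validation.

```python
from typing import Callable


def drift_check(
    last_seen: str,
    fetch_latest: Callable[[], str],
    validate: Callable[[str], bool],
) -> str:
    """Return 'no-drift', 'validated', or 'failed'.

    fetch_latest stands in for a cheap registry lookup and validate for
    the expensive validation run; both are hypothetical.
    """
    latest = fetch_latest()  # cheap: a single version lookup
    if latest == last_seen:
        return "no-drift"  # exit before any expensive work is started
    return "validated" if validate(latest) else "failed"


if __name__ == "__main__":
    validated = []

    def record_and_pass(version: str) -> bool:
        validated.append(version)
        return True

    print(drift_check("2.1.191", lambda: "2.1.191", record_and_pass))  # no-drift
    print(drift_check("2.1.191", lambda: "2.1.193", record_and_pass))  # validated
    print(validated)  # ['2.1.193'] (validation ran once, only on drift)
```

On an hourly schedule where most runs see no change, nearly every run pays only for the version lookup.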

2. Auto-rebake on template drift

cc-drift-template-watch.yml detects when Anthropic ships template changes alongside version bumps — not just a version string change — and opens a dedicated auto-rebake PR. This distinction matters: a version bump that doesn't include a template change takes a different code path than one that does.

Example from the live PR trail:

  • PR #568 2026-06-24 — auto-rebake: template drift detected 2026-06-24
  • PR #560 2026-06-23 — auto-rebake: template drift detected 2026-06-23
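The split between a plain version bump and a template change can be sketched as a small classifier. How dario actually detects a template change is not described here, so the template_hash field and the Snapshot type are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: str
    template_hash: str  # hypothetical fingerprint of the upstream templates


def classify_drift(known: Snapshot, upstream: Snapshot) -> str:
    """Pick a code path: 'rebake', 'version-bump', or 'none'."""
    if upstream.template_hash != known.template_hash:
        return "rebake"  # templates changed: open a dedicated rebake PR
    if upstream.version != known.version:
        return "version-bump"  # only the version string moved
    return "none"


if __name__ == "__main__":
    known = Snapshot("2.1.191", "aaa")
    print(classify_drift(known, Snapshot("2.1.193", "aaa")))  # version-bump
    print(classify_drift(known, Snapshot("2.1.193", "bbb")))  # rebake
```

Checking the template fingerprint first means a release that changes both version and templates always takes the rebake path.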

3. Auto-merge

auto-merge-bot-prs.yml picks up the drift PRs after they pass CI and merges them without operator involvement. No queue. No dashboard. No notification.
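An eligibility rule for this kind of auto-merge might look like the sketch below. The title prefixes are taken from the PR titles quoted in this article; the bot author name, the PullRequest type, and the exact conditions are assumptions, not the workflow's real logic.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: the drift PRs are opened by the GitHub Actions bot account.
BOT_AUTHORS = frozenset({"github-actions[bot]"})
# Prefixes seen in the PR titles quoted in this article.
BOT_TITLE_PREFIXES = ("auto-rebake:", "chore(cc-drift):")


@dataclass(frozen=True)
class PullRequest:
    author: str
    title: str
    checks: Tuple[str, ...]  # conclusion of each CI check, e.g. "success"


def can_auto_merge(pr: PullRequest) -> bool:
    """Merge only bot-authored drift PRs whose checks all succeeded."""
    return (
        pr.author in BOT_AUTHORS
        and pr.title.startswith(BOT_TITLE_PREFIXES)
        and len(pr.checks) > 0  # no checks reported yet is not "green"
        and all(c == "success" for c in pr.checks)
    )


if __name__ == "__main__":
    pr = PullRequest(
        "github-actions[bot]",
        "auto-rebake: template drift detected 2026-06-24",
        ("success", "success"),
    )
    print(can_auto_merge(pr))  # True
```

Treating an empty check list as not mergeable guards against merging before CI has reported anything.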

4. Self-validating release chain — idempotent, three trigger paths

cc-drift-auto-release.yml is the most engineered piece. Its sequence: build → smoke → tag → GitHub Release → npm-publish. Two design decisions worth noting:

Idempotency gate. The first real step checks whether a v<version> tag already exists on the repo. If it does, the job short-circuits and exits clean. This means re-runs, retries, and schedule-triggered duplicates are safe by construction — they won't double-publish.
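The idempotency gate can be sketched in a few lines. The names release_once and publish are hypothetical, and the in-memory set stands in for the repository's real tag list.

```python
from typing import Callable, List, Set


def release_once(
    version: str,
    existing_tags: Set[str],
    publish: Callable[[str], None],
) -> str:
    """Publish a version at most once, keyed on its v<version> tag."""
    tag = f"v{version}"
    if tag in existing_tags:
        return "skipped"  # already released: exit clean, do nothing
    publish(version)
    existing_tags.add(tag)
    return "released"


if __name__ == "__main__":
    tags: Set[str] = {"v4.8.96"}
    published: List[str] = []
    print(release_once("4.8.97", tags, published.append))  # released
    print(release_once("4.8.97", tags, published.append))  # skipped
    print(published)  # ['4.8.97']
```

Because the gate keys on the tag rather than on which trigger fired, a retry or a duplicate scheduled run falls through the same check and becomes a no-op.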

Three trigger paths, engineered around GitHub's own constraints. The workflow is triggered by (a) a PR-merge fast path, (b) an hourly schedule fallback, and (c) manual dispatch. The schedule fallback exists because GitHub does not start new workflow runs from events created with the repository's GITHUB_TOKEN (workflow_dispatch and repository_dispatch are the exceptions), so a merge or push performed by a bot workflow can fail to trigger the release. That platform limitation silently swallowed versions v3.32.1 and v3.32.2 before the fallback was added. The workflow header documents this, along with the specific historical releases that motivated each trigger path, so the engineering decisions are traceable to the failure history.
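The interaction between the three triggers, the ordered chain, and the tag check can be simulated as below. This is a sketch under stated assumptions: the step bodies are stubs, handle_trigger and run_chain are hypothetical names, and the real workflow is GitHub Actions YAML rather than Python.

```python
from typing import Callable, List, Sequence, Set, Tuple

Step = Tuple[str, Callable[[], bool]]
TRIGGERS = ("pr-merge", "schedule", "manual")


def run_chain(steps: Sequence[Step]) -> List[str]:
    """Run steps in order and stop at the first failure."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break  # a failed build or smoke test never reaches tag or publish
        completed.append(name)
    return completed


def handle_trigger(
    trigger: str, version: str, tags: Set[str], smoke_ok: bool = True
) -> List[str]:
    """Every trigger path enters here; returns the steps that completed."""
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger}")
    tag = f"v{version}"
    if tag in tags:
        return []  # idempotency gate: this version is already released

    def do_tag() -> bool:
        tags.add(tag)
        return True

    steps: List[Step] = [
        ("build", lambda: True),        # stub
        ("smoke", lambda: smoke_ok),    # stub, can be forced to fail
        ("tag", do_tag),
        ("github-release", lambda: True),  # stub
        ("npm-publish", lambda: True),     # stub
    ]
    return run_chain(steps)


if __name__ == "__main__":
    tags: Set[str] = set()
    # The pr-merge event never arrives, so the hourly schedule does the release.
    print(handle_trigger("schedule", "4.8.97", tags))
    # A later manual dispatch for the same version is a no-op.
    print(handle_trigger("manual", "4.8.97", tags))  # []
```

Since all three paths converge on one entry point, adding a fallback trigger costs nothing in safety: whichever trigger arrives first does the work and the rest exit at the gate.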

5. Self-healing and liveness guards around the core loop

The pipeline is monitored by its own monitors:

  • self-healing-supervisor.yml — watches the pipeline itself and fires recovery paths on failure
  • cc-drift-watcher-liveness.yml — liveness check for the drift-watch job
  • cc-oauth-health.yml — credential health monitoring
  • npm-token-health.yml — npm publish credential health
  • compat-test-self-hosted.yml — compatibility validation on self-hosted runners
  • sdk-drift-watch.yml — watches the Anthropic SDK independently
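A liveness check of the kind listed above reduces to comparing the watcher's last successful run against a staleness budget. The sketch below assumes a three-hour budget for an hourly job; that threshold and the function name is_stale are illustrative, not values taken from the workflows.

```python
from datetime import datetime, timedelta, timezone

# Assumption: an hourly watcher is considered dead after three missed hours.
MAX_AGE = timedelta(hours=3)


def is_stale(
    last_success: datetime, now: datetime, max_age: timedelta = MAX_AGE
) -> bool:
    """True when the watched job has not succeeded within max_age."""
    return now - last_success > max_age


if __name__ == "__main__":
    now = datetime(2026, 6, 25, 12, 0, tzinfo=timezone.utc)
    print(is_stale(now - timedelta(minutes=50), now))  # False
    print(is_stale(now - timedelta(hours=5), now))     # True
```

A budget of several intervals rather than one avoids alerting on a single delayed or skipped scheduled run.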

The self-healing principle also extends to functional fixes. PR #564 (2026-06-23) — feat(oauth): independent self-hosted OAuth-health alert — was a real self-healing fix that landed as part of a normal autonomous cycle.

Verifiable output: 6 auto-releases in 3 days

The cleanest evidence for a pipeline is what it ships. From the GitHub Releases page — 6 releases, 3 days, no operator intervention:

v4.8.92   2026-06-23
v4.8.93   2026-06-23
v4.8.94   2026-06-24
v4.8.95   2026-06-24
v4.8.96   2026-06-25
v4.8.97   2026-06-25

The corresponding autonomous PRs (all merged, CI green):

  • PR #580 2026-06-25 — chore(cc-drift): v4.8.97 — maxTested → v2.1.193
  • PR #576 2026-06-24 — chore(cc-drift): v4.8.95 — maxTested → v2.1.191
  • PR #568 2026-06-24 — auto-rebake: template drift detected 2026-06-24
  • PR #560 2026-06-23 — auto-rebake: template drift detected 2026-06-23
  • PR #564 2026-06-23 — feat(oauth): independent self-hosted OAuth-health alert (self-healing fix)

The full PR trail and Actions history are readable at the public repo. Every release in the list above has a corresponding green Actions run.

The project has 608 commits and was first published on 2026-04-08. This is a mature, sustained build, not a weekend project.

What this demonstrates

Sustained autonomous execution. The pipeline ran six full drift-detect → PR → CI → merge → build → smoke → tag → release → publish cycles in 72 hours without operator involvement. The output is independently verifiable by anyone with a browser.

Defensive engineering for platform constraints. The three-trigger architecture and the idempotency gate are direct responses to specific GitHub platform limitations. The design decisions are documented in the workflow source, traceable to the failures that motivated them.

Self-monitoring, not just automation. Liveness checks, credential health monitors, a self-healing supervisor, and compat tests run on the same schedule as the core loop. The pipeline knows when it's broken.

Cost-aware design. The cheap gate before the heavyweight check is not an optimization afterthought — it's in the workflow header as a first-class design comment. The same thinking applies to the rest of the AskAlf platform: every agent run is budgeted, every autonomous cycle has a hard cap.

The broader portfolio (reference only — dario is the spine)

dario's self-healing release engineering is the lead demonstration. For those who want to see more of the stack, the broader portfolio includes:

Agent-security trilogy (warden · canon · keeper). A composable open-source defense-in-depth stack for AI agent tool calls: a runtime firewall (warden), a supply-chain gate (canon), and a credential vault (keeper). All three repos are public, MIT-licensed, CI-green, and CodeQL-clean, with red-team-confirmed bypass fixes documented in the PR trail. A review of the major agent frameworks (LangGraph, CrewAI, AutoGen, Letta, Agno, Dify, and others) found none with an equivalent built-in control plane at the tool-call level. Repositories: warden, canon, keeper, agent-security-stack.

deepdive (source-trust engine). A local research agent (plan → search → headless-browser fetch → extract → synthesize → cited answer) with a measured source-authority scoring axis. The work was decomposed into a full P1→P4 epic with benchmarks (Epic #111) and shipped as @askalf/deepdive v0.26.0. Public, published, CI-green.

All claims are verifiable at the linked public GitHub repositories and the npm registry (as of 2026-06-25).

We build and run software on AI infrastructure that shifts under it — agents, release pipelines, the guardrails and automation that keep them shipping without a human in the loop. If you're putting agents near anything that matters, that's the kind of thing we're good at holding steady.
