Notes

Working observations on building agentic workflows in production.

June 2026

Controls for an autonomous coding pipeline

June 24, 2026

Notes on building Daedalus, an end-to-end coding pipeline on Claude Code, and the controls that make it safe to run unattended: reviewability, scope enforcement, where to parallelize, cost caps, and bounded stopping.
- agents
- claude-code
- methods

May 2026

Evaluating Design in Agentic Development

May 4, 2026

A document-loader A/B where the synthetic corpus said reject and a real-world rerun said accept. Notes on corpus shape, joint metrics, and the evaluation criteria I had wrong.
- evaluation
- methods

April 2026

Evaluating agent skill effectiveness

April 20, 2026

A 60-trial A/B of the Advisors plugin against a no-skill baseline: 96% vs 45%, zero parse failures. Open harness, clean-room isolation, reproducible from a public repo.
- evaluation
- claude-code
- methods