When the Toolmaker Reveals the Blueprint
For months, developers using Claude Code pieced together best practices through trial and error, blog posts, and community forums. Then the creator of the tool itself broke his silence. Boris Cherny, who leads Claude Code at Anthropic, shared his personal development workflow in a thread on X that promptly went viral across the engineering community.
"If you're not reading the Claude Code best practices straight from its creator, you're behind as a programmer," wrote Jeff Tang, a prominent developer voice. Kyle McNease called it Anthropic's "ChatGPT moment" — the inflection point where a technology stops being impressive and starts being indispensable.
Five Agents at Once: The Fleet Commander Model
The centerpiece of Cherny's approach is parallelism. Rather than working through a development task linearly, he runs five Claude instances simultaneously in separate terminal tabs, managing them the way a commander manages units in the field.
"I run 5 Claudes in parallel in my terminal," Cherny wrote. "I number my tabs 1-5, and use system notifications to know when a Claude needs input." He also runs additional sessions in the browser, using a custom teleport command to hand off work between web and local machine contexts.
One agent runs a test suite while another refactors legacy code, a third drafts documentation, and two more tackle independent feature work. The human's role shifts from writing code to directing agents and resolving blockers — a fundamentally different cognitive mode that Cherny compares to playing a real-time strategy game rather than typing syntax.
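The dispatch pattern behind this workflow can be sketched in a few lines: a fixed pool of five workers runs independent tasks concurrently, and the operator is interrupted only when one of them needs input. The `run_agent` function below is a hypothetical stand-in for a Claude session, not the Claude Code CLI; a real setup would shell out to the numbered terminal tabs Cherny describes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for one Claude session working a task.
# Here each "agent" just reports a status so the dispatch pattern
# is visible; tab 3 pretends to hit a blocker needing human input.
def run_agent(tab: int, task: str) -> tuple[int, str]:
    status = "needs-input" if tab == 3 else "done"
    return tab, f"{task}: {status}"

tasks = {
    1: "run test suite",
    2: "refactor legacy module",
    3: "draft documentation",
    4: "feature A",
    5: "feature B",
}

def dispatch(tasks: dict[int, str]) -> dict[int, str]:
    results = {}
    # Five numbered workers, mirroring the numbered terminal tabs.
    with ThreadPoolExecutor(max_workers=5) as pool:
        futures = [pool.submit(run_agent, tab, t) for tab, t in tasks.items()]
        for fut in as_completed(futures):
            tab, outcome = fut.result()
            results[tab] = outcome
            if "needs-input" in outcome:
                # In the real workflow this is a system notification.
                print(f"tab {tab} needs input")
    return results

results = dispatch(tasks)
```

The point of the structure is that the human sits outside the pool: work proceeds on all five fronts at once, and attention is pulled in only by the "needs input" events.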
The Case for the Slowest Model
One of Cherny's most counterintuitive revelations was his model choice. In an industry obsessed with inference speed, he exclusively uses Opus 4.5 — Anthropic's largest, slowest model — for everything.
"I use Opus 4.5 with thinking for everything," he wrote. "It's the best coding model I've ever used, and even though it's bigger and slower than Sonnet, since you have to steer it less and it's better at tool use, it is almost always faster than using a smaller model in the end."
The reasoning is sound. The latency bottleneck in AI-assisted development is not token generation speed — it is human correction time. A faster but less capable model completes tasks quickly but requires frequent intervention to fix errors. A slower but more accurate model front-loads the compute cost while eliminating the correction cost, and when five instances run in parallel, the wall-clock advantage compounds significantly.
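The trade-off is easy to make concrete with back-of-envelope numbers (illustrative only, not measurements from the article): a fast model that needs several correction rounds can lose on wall-clock time to a slow model that is accepted on the first pass.

```python
# Illustrative numbers, not benchmarks. Times are in minutes.
fast_gen, fast_rounds = 2.0, 3   # quick generation, but three rounds needed
slow_gen, slow_rounds = 6.0, 1   # slow generation, accepted first time
human_fix = 10.0                 # human correction time between rounds

# Total wall-clock = generation time per round, plus human correction
# time after every round except the last (which is accepted).
fast_total = fast_rounds * fast_gen + (fast_rounds - 1) * human_fix  # 26.0
slow_total = slow_rounds * slow_gen + (slow_rounds - 1) * human_fix  #  6.0
```

Under these assumed numbers the "slow" model finishes more than four times sooner, and the gap widens as correction time dominates, which is exactly the regime five parallel agents put you in.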
CLAUDE.md: Making Every Mistake Into a Rule
Standard LLMs have no persistent memory across sessions. Cherny's team addresses this with a file named CLAUDE.md checked into the project's git repository. "Anytime we see Claude do something incorrectly we add it to the CLAUDE.md, so Claude knows not to do it next time," he explained.
The file serves as a growing rulebook prepended to every session's context. A human reviewer spots a mistake in a pull request, tags Claude to document the error as a rule, and every future session benefits. Over time, CLAUDE.md becomes a precision instrument tuned to the specific conventions and pitfalls of each codebase.
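A CLAUDE.md grown this way might look like the following. This is an invented example; the article does not show the contents of the team's actual file.

```markdown
# CLAUDE.md

## Conventions
- Use the repo's existing logger; never add print statements.
- Every new API handler needs an integration test in tests/api/.

## Known pitfalls (added after review mistakes)
- Do not edit generated files under src/gen/; change the schema instead.
- The date utilities assume UTC; never call datetime.now() without a timezone.
```

Because the file lives in git, a rule added after one reviewer catches one mistake propagates to every teammate's sessions on the next pull.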
Verification as the True Multiplier
Cherny's workflow gives the agent the ability to verify its own work through browser automation, bash command execution, and test suite runs. "Claude tests every single change I land to claude.ai/code using the Claude Chrome extension," he wrote. "It opens a browser, tests the UI, and iterates until the code works and the UX feels good." He estimates this verification loop improves output quality by 2 to 3x compared to generation without verification — a multiplicative effect that applies regardless of the underlying model's capability.
This article is based on reporting by VentureBeat.


