Building a Leaner AI Development Workflow for Claude Pro

How I built bmad-lite-skills — a token-efficient port of the BMAD methodology — to fit structured AI-assisted development inside Claude Pro's context budget.

Earlier this year, I started using BMAD as a structured way to build software with Windsurf at work. The basic idea is sound: write planning docs first (PRD, architecture, epic breakdown), then give the AI one well-scoped story at a time to implement. Instead of one giant "build me this app" prompt that wanders off course, you get repeatable cycles with clear checkpoints.

It worked well enough that I wanted to use it on everything, including some hobby projects at home. Then I hit the context budget like a brick wall.

The Problem with Token-Heavy Workflows on Claude Pro

Sonnet (on Claude Pro) gives you roughly 1.5 million input tokens per month. That sounds like a lot until you're running a structured planning workflow that front-loads all your documentation into every session. BMAD's original design has an "activation ceremony" that runs on every skill invocation, persona overhead baked into each agent, and a habit of re-reading the full PRD and architecture docs every time you create a new story. That repetition is part of what makes BMAD so good at keeping AI coding on the rails.

However, on a 12-story hobby project, those reading costs add up fast. I was burning through each 5-hour usage window in about 30 minutes, and my weekly budget within a few days.

The frustrating part was that most of the tokens weren't doing useful work. The PRD I already read to write story 1.1 doesn't need to be re-read for story 1.2. The persona overhead that tells Claude it's "BMAD Agent PM" before every planning call doesn't change the output in any meaningful way. And a 1,500-line retrospective template is a lot of tokens for 7 questions I actually care about.

So I did what I usually do when a tool doesn't fit my constraints: I stripped it down and rebuilt it.

What I Built

bmad-lite-skills is a standalone Claude Code skills library that ports the BMAD planning flywheel into a leaner form. The full workflow is still there: PRD, UX design specs, architecture, epics, story creation, implementation, code review, retrospective, security review. What's gone is the overhead that doesn't earn its token cost, especially for hobby projects.

Specifically, I removed:

  • The activation ceremony that cost ~700 tokens per skill invocation
  • Agent persona overhead (~400 tokens per call, zero benefit for solo use)
  • The three-tier TOML customization infrastructure (replaced by plain-English rules in CLAUDE.md)
  • sprint-status.yaml (replaced by GitHub issue labels - same visibility, no extra file to maintain)
  • The 8-file JIT step architecture for the architecture skill (collapsed to a single inline workflow)

What that gets you, concretely: about 55% fewer input tokens over a 12-story project. The numbers break down roughly like this:

Phase                           Original BMAD     bmad-lite        Reduction
Planning (PRD + arch + epics)   ~18,000 tokens    ~8,000 tokens    ~55%
create-story x 12               ~58,000 tokens    ~14,000 tokens   ~76%
dev-story + review x 12         ~62,000 tokens    ~42,000 tokens   ~32%
Total                           ~162,600 tokens   ~72,600 tokens   ~55%
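For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the reduction column from the token estimates in the table above. The function name is illustrative. One thing it makes visible: the three rows shown sum to 138,000 and 64,000 tokens, so the stated totals presumably also cover phases that are not itemized here (that reading is an assumption; the table does not say).

```python
# Per-phase token estimates copied from the table above: (original BMAD, bmad-lite).
PHASES = {
    "planning": (18_000, 8_000),
    "create-story x 12": (58_000, 14_000),
    "dev-story + review x 12": (62_000, 42_000),
}


def reduction_pct(original: int, lite: int) -> float:
    """Percentage of input tokens saved relative to the original."""
    return 100.0 * (original - lite) / original


for name, (original, lite) in PHASES.items():
    print(f"{name}: {reduction_pct(original, lite):.1f}% fewer tokens")

# The stated totals, also taken from the table.
print(f"total: {reduction_pct(162_600, 72_600):.1f}% fewer tokens")
```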

The big win on create-story comes from caching.

The Key Insight: Epic Context Caching

When you run /create-story for the first story in an epic, you pay the full reading cost. The skill reads the PRD, the architecture doc, and the epics breakdown to understand the full picture, then distills the relevant subset into an epic-N-context.md cache file. That's the expensive call.

Every story after that reads only the cache. Story 1.2, 1.3, 1.4 - they all pull from the distilled epic context rather than re-reading the full PRD. That's where the 76% reduction on create-story comes from: only the first story in each epic pays the full reading cost.
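The caching pattern itself is ordinary read-through caching. The sketch below shows the shape of it in Python, not the skill's actual implementation (the skills are prompt files, not Python). The source document names (prd.md, architecture.md, epics.md) and the `distill` callback are hypothetical stand-ins; only the epic-N-context.md naming comes from the workflow described above.

```python
from pathlib import Path
from typing import Callable


def load_epic_context(
    epic: int,
    docs_dir: Path,
    distill: Callable[[str], str],
) -> str:
    """Return distilled context for an epic, building the cache on first use."""
    cache = docs_dir / f"epic-{epic}-context.md"
    if cache.exists():
        # Cheap path: stories after the first read only the cache file.
        return cache.read_text(encoding="utf-8")

    # Expensive path: read every planning doc once, then distill.
    sources = ["prd.md", "architecture.md", "epics.md"]  # hypothetical file names
    full_text = "\n\n".join(
        (docs_dir / name).read_text(encoding="utf-8") for name in sources
    )
    context = distill(full_text)  # stand-in for the model's summarization step
    cache.write_text(context, encoding="utf-8")
    return context
```

The first call for an epic touches all three documents; every later call touches one small file. Deleting the cache file forces a rebuild if the planning docs change.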

The /dev-story skill takes this further: it reads only the story file itself. The story file has everything embedded (acceptance criteria, architecture constraints, implementation notes, test cases). The implementation agent never touches the PRD or architecture docs, because the story file already has what it needs.

This also means code review can run inline at the end of a dev session. The reviewer already has the story context. No separate session startup cost, no re-reading the design docs.

Session Hygiene as Architecture

One thing I added that isn't really a "feature" in the traditional sense: explicit session hygiene guidance baked into the workflow.

Context accumulates silently. If you run /prd, then /architecture, then /epics, all in one session, the PRD stays in context for every subsequent message, even after you're done with it. By the time you're in a dev session three planning phases later, you're carrying thousands of tokens of documentation that the dev work doesn't need.

The rule I settled on: start a new Claude Code session for each major phase. Finish the PRD, end the session. Open a fresh one for architecture. This sounds like friction, but it pays for itself almost immediately. Each session starts clean. The implementation agent can't accidentally over-index on a PRD constraint that doesn't apply to its story.

Session hygiene is essentially a form of dependency management for context windows. You're deciding, deliberately, what each agent gets to know.
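To see why this matters, here is a deliberately crude Python model of the accumulation. It assumes every message resends whatever documentation is already in the session, ignores the messages themselves, ignores any prompt caching, and ignores that a later phase may need to re-read part of an earlier doc. The token counts and message counts are made-up round numbers, not measurements.

```python
def total_input_tokens(
    phase_doc_tokens: list[int],
    messages_per_phase: int,
    fresh_sessions: bool,
) -> int:
    """Sum input tokens when each message resends everything in context."""
    total = 0
    carried = 0  # documentation tokens already sitting in the session
    for doc_tokens in phase_doc_tokens:
        if fresh_sessions:
            carried = 0  # a new session starts with an empty context
        carried += doc_tokens
        total += carried * messages_per_phase
    return total


# Hypothetical sizes for three planning phases, 10 messages each.
phases = [3_000, 4_000, 2_000]
print(total_input_tokens(phases, 10, fresh_sessions=False))  # 190000
print(total_input_tokens(phases, 10, fresh_sessions=True))   # 90000
```

Even in this toy version, the single long session pays more than twice as much, and the gap widens with every phase added.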

What I Added That BMAD Didn't Have

Stripping the overhead was the starting point, but over a few weeks of using this on actual projects, I added things that weren't in the original at all.

Guardrail hooks. Three shell scripts wired into .claude/hooks/: one that blocks hardcoded secrets at write time, one that warns when you're about to use a color that's not in your design token set, and one that streams tool-call telemetry to a JSONL file. These run at zero model tokens - they're deterministic shell scripts, not AI calls. The harness enforces them, so the model doesn't have to remember them.
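The actual hooks are shell scripts, but the core of a secrets check is just deterministic pattern matching. As an illustration only, here is that idea sketched in Python. The three patterns and the `find_secrets` name are assumptions chosen for the example, not the rules the repo ships, and how a non-empty result gets turned into a blocked write depends on the hook wiring, which isn't shown.

```python
import re

# Illustrative patterns only; a real hook would use a broader, tuned set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded_assignment": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(text: str) -> list[str]:
    """Return the names of every pattern that matches the given text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


# A caller would refuse the write whenever the list is non-empty.
sample = 'DB_PASSWORD = "correct-horse-battery"'
print(find_secrets(sample))
```

No model call is involved at any point, which is the whole appeal: the check costs the same whether the session is at 5% or 95% of its budget.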

Eval regression net. /create-story seeds executable test cases from each story's acceptance criteria. /dev-story runs them after implementation. If a later story breaks an earlier eval case, that's a regression and it blocks the story close - same as a red build would. The cases accumulate across the project lifetime, so you get a growing safety net that costs nothing to run.
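The gating logic can be sketched in a few lines of Python. This is a simplified model under stated assumptions: eval cases are represented as zero-argument callables returning True on pass, and the names `failing_cases` and `can_close_story` are invented for the example. The real cases are whatever executable tests the skill seeds for your stack.

```python
from typing import Callable

EvalCase = Callable[[], bool]  # hypothetical shape: returns True when the case passes


def failing_cases(cases: dict[str, list[EvalCase]]) -> dict[str, int]:
    """Run every accumulated case; return failure counts keyed by story ID."""
    failures: dict[str, int] = {}
    for story_id, story_cases in cases.items():
        failed = sum(1 for case in story_cases if not case())
        if failed:
            failures[story_id] = failed
    return failures


def can_close_story(current: str, cases: dict[str, list[EvalCase]]) -> bool:
    """A story closes only if its own cases and all earlier ones pass."""
    failures = failing_cases(cases)
    for story_id, count in failures.items():
        kind = "failure" if story_id == current else "regression"
        print(f"{kind}: story {story_id} has {count} failing case(s)")
    return not failures
```

Because the dictionary only ever grows, each new story is checked against everything that came before it.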

Subagent delegation via /story-flywheel. Each phase of the create/dev/review loop now runs in an isolated subagent context. The story creator reads the PRD, architecture, and epics to write the story spec, then exits. Those docs never enter the main thread. The developer reads only the story file. The reviewer reads only the diff. This shaves another 10-15% off the main thread context compared to running the phases manually in a long session.

Observability ledger. Every dev and review pass appends a structured line to docs/metrics/flywheel-ledger.jsonl - story, model used, build result, eval pass rate, finding counts. You can jq this file to see per-story quality trends over time. Useful when you're trying to figure out whether a specific story's complexity or a specific model choice is causing review findings to spike.
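If jq isn't your thing, the same ledger is easy to work with from Python, since JSONL is just one JSON object per line. A minimal sketch follows; the field names ("story", "model", "findings") are assumptions for the example and may not match the keys the skills actually write.

```python
import json
from collections import defaultdict
from pathlib import Path


def append_entry(ledger: Path, entry: dict) -> None:
    """Append one JSON object as a single line (JSONL)."""
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def findings_by_model(ledger: Path) -> dict[str, float]:
    """Average review findings per pass, grouped by model."""
    counts: dict[str, list[int]] = defaultdict(list)
    with ledger.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # tolerate blank lines
                row = json.loads(line)
                counts[row["model"]].append(row["findings"])
    return {model: sum(v) / len(v) for model, v in counts.items()}
```

Append-only writes keep the file safe to tail while a session is running, and grouping by a different key (story instead of model) is a one-line change.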

Platform guidance systems. The skills scaffold platform-specific best-practice reference docs into docs/setup/swift/ or docs/setup/web/ depending on your project type. The dev and review skills read the relevant sections before acting. There are also /refresh-swift and /refresh-web skills that research current patterns from primary sources and update those docs, so the guidance doesn't go stale.

How to Use It

The setup is designed to work across all your projects without copying files into each one:

git clone https://github.com/rterakedis/bmad-lite-skills ~/repos/bmad-lite-skills

Then at the start of any Claude Code session:

/add-dir ~/repos/bmad-lite-skills

Or wire it as a startup hook in your project's .claude/settings.json so it loads automatically. The /setup skill writes that hook for you when initializing a new project.

From there the workflow runs: /prd to write requirements, /architecture for the technical design, /epics to break it into stories and create GitHub milestones, then the /create-story → /dev-story → code review loop for each story. The /story-flywheel skill automates that loop entirely if you want to run it hands-off.

If you're already using full BMAD, there's a /setup migrate flow that reads _bmad/config.toml, moves all your planning docs and story files into the bmad-lite layout, stamps story statuses from sprint-status.yaml into the GitHub issue labels, and cleans up the old infrastructure.

The Broader Takeaway

What I kept learning while building this is that most AI workflow overhead isn't load-bearing. A lot of it is there because the original authors were being thorough, or because they were optimizing for a different use case (teams, not solo), or because the cost only becomes visible when you're counting tokens instead of just running things and hoping for the best.

The discipline of actually measuring what each part of a workflow costs in tokens forces you to think about which parts earn their keep. The activation ceremony that ran on every invocation? Cut it and nothing broke. The PRD re-read on every story creation? Cache it and the output got better, not worse, because the cache is more focused than the raw doc.

Most workflows have slack in them. You only find it when you have to.


The repo is at github.com/rterakedis/bmad-lite-skills. MIT licensed. Comments welcome!