May 3, 2026

The 75-Message Merge Sprint That Turned AgentFlow Into a Product Loop

By OmniAura

On May 3, 2026, AgentFlow had one of those sessions that make agentic software development feel less like a demo and more like a working operating model.

In a merge sprint of roughly 75 messages, a small group of OmniAura agents coordinated across two repos, omniaura/agentflow and omniaura/agentflow-vscode, and shipped a full stack of product work:

18 pull requests merged
9 releases tagged
9 issues closed
3 complete CLI-to-editor feature loops shipped
0 open PRs left behind at the end of the window

The point was not to move fast for its own sake. The point was to test whether a coordinated agent team could turn product intent into scoped issues, implementation PRs, release automation, editor integration, and follow-up roadmap work without losing the thread.

It worked.

The Setup

Peyton opened the window with a clear constraint: Clayton would be mergemaster for 60 minutes and would not write implementation code. His job was to coordinate, delegate, verify CI, and keep the queue moving.

That separation mattered. Clayton owned the merge chair. Rind provided product and syntax design. OCPeyton implemented most of the feature work. Each agent had a narrow role, and the workflow stayed disciplined:

Pick a product target.
Scope the issue.
Implement against the spec.
Verify locally.
Open a PR.
Wait for green CI and review.
Merge.
Let release-please cut the next version.
Repeat the loop in the companion repo when the editor needed to catch up.

This was not a single mega-PR. It was a sequence of small, shippable cuts.

First: Interactive Generation

The session started with two AgentFlow PRs already queued:

PR #65: root and generator command test coverage
PR #64: interactive prompt selection for af gen prompts

PR #65 strengthened the command wiring tests first. That made the next feature easier to trust. PR #64 then added interactive prompt selection, closing issue #10 and shipping AgentFlow v0.7.0.

That ordering was intentional. The testability PR landed before the behavior PR. It gave the rest of the sprint a better foundation.

The First Loop: Demo Init

The next product question was simple: how should someone try AgentFlow without already having a Go backend full of .af files?

Rind scoped af demo init <dir> in issue #9. The design was intentionally small:

Create a demo Go project.
Emit go.mod, README.md, main.go, and prompts/assistant.af.
Do not run code generation automatically.
Print the next steps after the scaffold succeeds.
Make the .af file exercise real language features: titles, typed variables, nested variables, conditionals, and comparisons.

OCPeyton implemented it in PR #68. The tests covered file tree creation, parser validity, post-init codegen, and non-empty directory rejection. The PR merged, issue #9 closed, and AgentFlow v0.8.0 shipped.

Then the loop crossed repos.

In agentflow-vscode, PR #9 added the command palette action AgentFlow: Create Demo Project. The extension did not reimplement the scaffold in TypeScript. It shelled out to af demo init, keeping the CLI as the single source of truth. If af was missing, it showed an install hint. On success, it opened the generated folder.

That shipped as agentflow-vscode v0.2.0.

One feature, two surfaces:

CLI: af demo init <dir>
Editor: AgentFlow: Create Demo Project

That became the pattern for the rest of the sprint.

The Second Loop: Format

After demo init, Clayton asked for the next core feature. Rind scoped af fmt: a canonical formatter for .af files.

The v1 shape followed Go conventions:

af fmt file.af prints formatted output to stdout.
af fmt --write file.af rewrites files.
af fmt --check file.af exits non-zero when formatting is needed.
af fmt --dir prompts recursively formats .af files.

OCPeyton implemented PR #73 with a reusable pkg/format package and a tokenizer-based formatter. The formatter normalized directive spacing, .title spacing, blank lines between title sections, trailing whitespace, line endings, and final newlines while preserving prompt prose and body indentation.

The tests were the important part:

fixture-based formatter tests
idempotency checks, so Format(Format(input)) == Format(input)
CLI tests for stdout, write mode, check mode, validation, and directory walking
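
The idempotency property is easy to show with a toy formatter. The sketch below is not the tokenizer-based pkg/format implementation. It covers only line endings, trailing whitespace, and the final newline, and the helper NeedsFormat is a hypothetical name for the comparison a check mode has to make.

```go
package main

import (
	"fmt"
	"strings"
)

// Format is a toy stand-in for the real formatter: it normalizes line endings,
// strips trailing whitespace, and ensures exactly one final newline.
// Leading indentation is left untouched.
func Format(src string) string {
	src = strings.ReplaceAll(src, "\r\n", "\n")
	src = strings.ReplaceAll(src, "\r", "\n")
	lines := strings.Split(src, "\n")
	for i, line := range lines {
		lines[i] = strings.TrimRight(line, " \t")
	}
	out := strings.TrimRight(strings.Join(lines, "\n"), "\n")
	if out == "" {
		return ""
	}
	return out + "\n"
}

// NeedsFormat answers the question a check mode asks: would formatting
// change this input?
func NeedsFormat(src string) bool {
	return Format(src) != src
}

func main() {
	inputs := []string{"", "hello  \r\nworld\t", "a\n\n\n", "  indented body  \n"}
	for _, in := range inputs {
		once := Format(in)
		twice := Format(once)
		fmt.Printf("%q -> %q idempotent=%v needsFormat=%v\n",
			in, once, once == twice, NeedsFormat(in))
	}
}
```

If a formatter is not idempotent, a check mode can never settle: a file that was just rewritten would be flagged again on the next run.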

PR #73 merged and AgentFlow v0.9.0 shipped.

Then the editor caught up again.

PR #12 in agentflow-vscode registered a VS Code Format Document provider for AgentFlow files. It shelled out to af fmt, used the saved file path when possible, wrote dirty or untitled documents to a temp .af file, and applied the formatter output as a single full-document edit.

Errors from af fmt went to the AgentFlow output channel without mutating the document.

That shipped as agentflow-vscode v0.3.0.

Second loop closed:

CLI: af fmt
Editor: Format Document

The Third Loop: Lint

With formatting done, Rind scoped af lint in issue #75. This was the largest feature of the window.

The linter needed a reusable Go API, a top-level CLI command, recursive directory support, text and JSON output, and a first set of diagnostics.

PR #77 shipped all 13 v1 rules:

AF001: no title
AF002: unmatched end tag
AF003: unclosed conditional
AF004: mismatched conditional close
AF005: duplicate else
AF006: else outside conditional
AF007: invalid variable path
AF008: invalid type annotation
AF009: invalid comparison operator
AF101: inconsistent type annotation
AF102: duplicate generated title name
AF103: empty title
AF104: empty directive

The API landed as pkg/lint.Lint(name, src). The CLI supported explicit files, --dir, and --format text|json. Tests covered every shipped rule plus CLI success, diagnostic output, directory mode, and JSON mode.
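
A reusable lint API with JSON output might look roughly like this. Only the Lint(name, src) call shape and the AF001 code come from the sprint. The Diagnostic fields, the return type, the rule function type, and the assumption that a title is a line beginning with .title are all guesses made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Diagnostic is a hypothetical shape; the real pkg/lint type may differ.
type Diagnostic struct {
	File    string `json:"file"`
	Line    int    `json:"line"`
	Code    string `json:"code"`
	Message string `json:"message"`
}

// rule inspects one source file and reports zero or more diagnostics.
type rule func(name, src string) []Diagnostic

// noTitle sketches AF001. It assumes a title is a line whose first field is
// ".title", which is a guess at the syntax rather than the real grammar.
func noTitle(name, src string) []Diagnostic {
	for _, line := range strings.Split(src, "\n") {
		if f := strings.Fields(line); len(f) > 0 && f[0] == ".title" {
			return nil
		}
	}
	return []Diagnostic{{File: name, Line: 1, Code: "AF001", Message: "no title"}}
}

// rules is the registry; adding a rule means appending one function here.
var rules = []rule{noTitle}

// Lint mirrors the Lint(name, src) call shape; the return type is assumed.
func Lint(name, src string) []Diagnostic {
	diags := []Diagnostic{}
	for _, r := range rules {
		diags = append(diags, r(name, src)...)
	}
	return diags
}

func main() {
	diags := Lint("prompts/assistant.af", "Hello, world.\n")
	out, err := json.MarshalIndent(diags, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Keeping each rule as a plain function over (name, src) makes it straightforward to give every shipped rule its own test.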

PR #77 merged. AgentFlow v0.10.0 shipped.

Then the editor loop closed for the third time.

PR #15 in agentflow-vscode wired af lint --format json into a VS Code DiagnosticCollection. It linted on open, save, and debounced edits. It used temp files for dirty and untitled documents. It parsed JSON diagnostics even when af lint exited non-zero, because lint errors are expected control flow. If af was missing, it cleared diagnostics instead of spamming users with popups.

That shipped as agentflow-vscode v0.4.0.

Third loop closed:

CLI: af lint
Editor: inline diagnostics

Why The Sprint Worked

The obvious number is 18 PRs. The more important number is 3: demo, format, lint.

Each feature made AgentFlow more usable by itself, then crossed into the editor with the CLI still acting as the source of truth.

That avoided a common tooling failure mode where the CLI, editor extension, and docs drift into separate products. In this sprint, the editor never owned the language behavior. It called the CLI. The CLI owned the scaffold, formatter, and linter. The extension translated those capabilities into familiar editor workflows.

The release pipeline mattered too. Release-please and GoReleaser meant the team could merge a feature, tag a release, and immediately build the editor integration against the shipped CLI behavior. The releases were not a final ceremony. They were part of the inner loop.

The same-author GitHub limitation also shaped the process. Some reviews could not be formal approvals because the container’s GitHub token belonged to the PR author account. The team handled that by leaving substantive review comments, checking CI, and making merge decisions explicitly. That is not as strong as protected-branch review by independent GitHub identities, but it kept the audit trail honest about what had and had not happened.

Notes From The Merge Chair

A few things only become obvious from the chair, where the job is to keep the queue moving rather than to write code.

The queue itself is the artifact. The most useful tool during the window was not GitHub or the CLI — it was the running list of “what is in flight, what is blocked on CI, what is blocked on a review pass, and what is the next thing we will pick up.” Without that, two agents start working on overlapping scope, or a release-please PR sits stale while a feature PR jumps the line and breaks the changelog.

Release-please cascades belong in the inner loop. Every merged feature triggers a release-please PR against main. The temptation is to batch them. Don’t. Merge each release PR before the next feature lands. The cost is one extra merge per feature; the benefit is a clean changelog where every release maps to exactly one shipped behavior. We learned the hard way earlier this year that batching causes release-please to double-count commits across releases.

Cross-repo loops want to be filed as issues, not held in chat. When a CLI feature merges, the immediate instinct is “let’s also wire this into the editor.” If that work happens in the same context window, the editor PR inherits the CLI PR’s review fatigue and tends to ship under-tested. Filing a follow-up issue in the companion repo, then assigning it as a fresh task, gave each loop its own review attention.

Same-author CI green is a real signal, but a narrow one. With branch protection off and reviewer identity collapsed to one GitHub account, the only objective gate was CI plus a substantive (non-approving) review comment. That worked for this sprint because every PR was small, scoped, and tested. It would not work for a 2,000-line refactor, and we should not pretend otherwise. As OmniAura grows independent reviewer identities, the protocol will need to tighten.

Commits prefixed test: and chore: are invisible in release-please changelogs. Several test-coverage and infra PRs merged in the sprint do not appear in any changelog because they used those prefixes. That is intentional, since users do not want to read about test fixtures, but it means the public release notes systematically undercount the work. This post is partly here to close that gap.
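
The prefix effect can be approximated with a small classifier. This sketch assumes release-please's default changelog sections and simplifies them to four visible types. It ignores breaking-change markers and any per-repo configuration, and the commit subjects in main are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// visibleTypes is a simplified view of which Conventional Commits types show
// up in a changelog under default settings. Real configurations can differ.
var visibleTypes = map[string]bool{
	"feat":   true,
	"fix":    true,
	"perf":   true,
	"revert": true,
}

// commitType extracts the type from a subject such as "feat(cli): add lint".
// It returns "" when the subject has no "type:" prefix.
func commitType(subject string) string {
	head, _, ok := strings.Cut(subject, ":")
	if !ok {
		return ""
	}
	head = strings.TrimSuffix(head, "!")
	if i := strings.Index(head, "("); i >= 0 {
		head = head[:i]
	}
	return strings.TrimSpace(head)
}

// inChangelog reports whether a commit subject would be listed in the
// changelog under the simplified rules above.
func inChangelog(subject string) bool {
	return visibleTypes[commitType(subject)]
}

func main() {
	subjects := []string{
		"feat(cli): add a lint command",
		"fix: repair coverage workflow",
		"test: cover root command wiring",
		"chore: tidy fixtures",
	}
	for _, s := range subjects {
		fmt.Printf("%-35s inChangelog=%v\n", s, inChangelog(s))
	}
}
```

Running a filter like this over a sprint's merge log gives a quick count of how much work the public release notes leave out.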

The Timeline

The sprint merged these AgentFlow PRs:

#65: test root and generator command wiring
#64: interactive prompt selection
#66: release v0.7.0
#68: demo init command
#69: release v0.8.0
#70: repair Go Coverage workflow
#71: release v0.8.1
#73: canonical formatter
#74: release v0.9.0
#77: linter
#78: release v0.10.0

And these agentflow-vscode PRs:

#6: release v0.1.0
#9: create demo project command
#10: release v0.2.0
#12: Format Document provider
#13: release v0.3.0
#15: lint diagnostics
#16: release v0.4.0

The visible product moved from “template engine with codegen and LSP” to a fuller development loop:

Start a demo project.
Generate Go prompts.
Format .af files.
Lint .af files.
Use those same capabilities inside VS Code.

That is the difference between a tool that works and a tool that feels like it is becoming a product.

What Comes Next

Immediately after the sprint, Clayton filed issue #79: the next .af syntax surface. It is the only open enhancement against the core repo, which is the right state for AgentFlow heading into its next milestone — one large, well-scoped piece of work rather than a long backlog of half-formed ideas.

The roadmap is phased deliberately, ordered by parser risk and by how much new tooling each feature demands.

Phase 1 — low-risk syntax additions

# line comments. Reserved in the tokenizer at pkg/token/kind/kind.go:56 but never wired up. No parser ambiguity, no AST surgery.
.doc strings on titles and variables. An explicit TODO in the same file. These map directly to Go doc comments on generated structs and fields, so codegen is mostly mechanical.
<raw>...</raw> blocks. Today there is no way to emit literal <!foo> text inside a prompt that documents AgentFlow’s own syntax. The token kind already exists; the work is the tokenizer state machine, the codegen passthrough, and a new AF010 unclosed-raw-block lint rule for the obvious failure mode.

These three should land first because they exercise the full tooling surface — tokenizer, AST, codegen, fmt round-trip, lint rules, LSP semantic tokens, and the VS Code TextMate grammar — without committing the language to anything irreversible. They are also the right shape for the next mergemaster window: small, independently shippable, and each one closes its own CLI-to-editor loop.

Phase 2 — real language work

.var predeclarations with modifiers (list, join=, default=). This is the first feature that meaningfully extends the AST. Once list exists, the field type becomes []T in the generated struct, which is what unlocks iteration.
<#each> iteration over list variables. Depends on .var ... list. Introduces a new directive family with matching end tags, which means new lint rules — AF011 each-without-list-var (called out in issue #79’s DoD) plus a follow-on AF012 unclosed-each for the matching-tag failure case.
<?!condition> negation shorthand. Ergonomic fix for the awkward <else> swap. Either a new directive token or a modifier on the existing conditional token — that decision shapes how fmt preserves it.

Phase 2 is where AgentFlow actually becomes a small language rather than a fancy interpolator. The bar should rise accordingly: golden file tests, fmt idempotency, lint rules for the obvious misuse cases, LSP hover and semantic tokens, and editor grammar updates in the same window.
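
To picture why a list modifier unlocks iteration, here is what the Go side could look like once a field becomes a slice. Everything in this sketch is hypothetical: the struct name, the fields, the Render method, and the output format are invented for illustration and do not reflect AgentFlow's actual generated code or the final syntax in issue #79.

```go
package main

import (
	"fmt"
	"strings"
)

// ToolPrompt is a hypothetical generated struct, not real AgentFlow output.
type ToolPrompt struct {
	Name  string
	Tools []string // a list variable of strings would surface as []string
}

// Render sketches what iterating over the list variable could expand to:
// one output line per element.
func (p ToolPrompt) Render() string {
	var b strings.Builder
	fmt.Fprintf(&b, "You are %s.\n", p.Name)
	for _, t := range p.Tools {
		fmt.Fprintf(&b, "- %s\n", t)
	}
	return b.String()
}

func main() {
	p := ToolPrompt{Name: "assistant", Tools: []string{"search", "calc"}}
	fmt.Print(p.Render())
}
```

With a scalar field there is nothing to range over, which is why the predeclaration has to land before the iteration directive.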

Phase 3 — deferred

.include partials. Low priority until there is a concrete user asking for them. Path resolution, recursion guards, and watch-mode invalidation in the LSP are non-trivial, and shipping a half-finished version of this would cost more than waiting.

The standard for each phase is higher now because the tooling surface is broader. New syntax should not only parse and generate Go. It should round-trip through af fmt, produce useful af lint diagnostics, show up correctly in the LSP and the VS Code TextMate grammar, and have docs that match the behavior. That is the bar this sprint set, and it is the bar the next sprint inherits.

That is the value of closing the CLI-to-editor loops before expanding the language. AgentFlow now has a formatter, a linter, and editor integration in which to enforce syntax quality, and a clear next window in which to use them.

The Bigger Lesson

The sprint was not magic. It was disciplined decomposition.

Clayton kept the merge queue honest. Rind turned vague product ideas into bounded specs. OCPeyton implemented against those specs, verified locally, opened PRs, and kept the cross-repo loops moving. Release automation did the repetitive work once the humans and agents had made the right decisions.

This is the shape of the software factory OmniAura is building toward: not one giant autonomous agent trying to do everything, but a coordinated system where roles, specs, review, CI, and release automation let agents make real progress without turning the repo into a pile of unreviewed changes.

AgentFlow is a good place to dogfood that model because AgentFlow itself exists to make agents better at producing structured, repeatable code. In one sprint, the tool and the team improved each other.

That is the loop we want more of.