Agentic Engineering: ADD — AI-Driven Development
How I work with 10 AI agents in parallel to build Meshes Studio — a workflow built on specs, reviewed plans, and automated checks that turns chaos into real production code.
There’s a moment when you realize you’re no longer a programmer in the traditional sense. You’re not writing every line of code anymore — you’re orchestrating agents that do it for you. Meshes Studio is built almost entirely this way. Here’s how it works, what I’ve learned, and where I still need to show up.
What ADD means
ADD — AI-Driven Development — doesn’t mean asking a chatbot to write code and copying it into your editor. It means a structured workflow where the AI agent goes through the entire development cycle: reads the spec, makes a plan, implements, runs checks, does the review, and opens the PR. You review the plan, review the code diff, and move on.
The difference from “vibe coding” is discipline. Without structure, agents produce code that looks fine but breaks at the first corner case the spec didn’t cover. With structure, they produce code that passes CI, passes review, and ships to production.
The foundations
Specs — not the code — are the source of truth
The first thing I changed: the spec is the source of truth, not the code.
AI-generated code changes fast and can get “polluted” — different agents leave different traces, inconsistent conventions, duplicated logic. If the spec is vague or missing, the agent fills in the gaps with assumptions. Some correct, some not. And you won’t know which is which until something breaks.
A clearly and completely written spec means any agent, at any point, can read what the system should do and verify that the implementation respects that. It’s also the only place where human input truly matters — any ambiguity in the spec becomes a hallucination in the code.
I write specs using AI too: I start with what I want to build in natural language, ask the agent to structure it as a spec, then read every line and correct it. Every line, no skimming. If you skip this, you pay later.
Spec-based tests
Tests are not optional in ADD — they are the mechanism by which the agent self-corrects.
When an agent writes a wrong test or forgets to write one, its internal loop has no feedback. It implements, nothing fails, it assumes everything is fine. Tests written from the spec create a closed loop: if the implementation deviates from the spec, the test fails, the agent sees the error, and it corrects itself without you having to step in.
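That closed loop can be sketched in a few lines of shell. The `run_tests` stub below is hypothetical — in the real workflow it would be a make target — and it simulates an agent whose first attempt fails a spec-derived test and whose second attempt passes:

```shell
# Sketch of the self-correction loop. run_tests is a stub standing in
# for the real spec-derived test suite; here it fails once, then passes,
# simulating an agent that fixes its implementation after seeing the error.
attempts=0
run_tests() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 2 ]
}

status="red"
for try in 1 2 3; do
  if run_tests; then
    status="green"   # implementation now matches the spec
    break
  fi
  echo "attempt $try failed; feeding the failure output back to the agent"
done
echo "checks: $status"
```

The point is that the loop terminates on test results, not on the agent's own claim that it is done.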
Strict checks — more constraints mean better results
Counter-intuitive: the more automated constraints you add, the better code agents produce.
We have checks for TypeScript types, code formatting, file size limits, banned fields in serializers, missing migrations, and a stale API schema. All of them also run as git hooks. If the agent forgets to run make generate schema after an API change, the hook stops it and tells it exactly what to do.
Agents respond well to immediate, explicit feedback. A clear error message — “schema is stale, run make generate schema” — is far more effective than a vague description of the problem in the prompt.
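A staleness check like that can be a few lines of shell in a pre-commit hook. This is a sketch, not the project's actual hook — the file paths are invented, and the only grounded detail is the shape of the error message: it names the problem and the exact command that fixes it.

```shell
# Sketch of a pre-commit staleness check. The paths are hypothetical;
# the idea is: if the API source is newer than the generated schema,
# block the commit and say exactly what command to run.
check_schema() {
  src="$1"; schema="$2"
  if [ "$src" -nt "$schema" ]; then
    echo "ERROR: schema is stale, run make generate schema" >&2
    return 1
  fi
  return 0
}

# Example invocation (files are placeholders):
# check_schema api/urls.py schema/openapi.json
```

An agent that hits this hook gets its next action handed to it, instead of having to diagnose a vague CI failure.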
Clear and reproducible tooling
An agent that doesn’t know how to run tests or how to deploy wastes time in useless loops. All tooling in Meshes Studio goes through make — a single entry point with clear commands: make check be, make check fe, make check build, make dev deploy.
The agent never guesses how something is run. And if a command fails, it fails with a clear message, not ambiguous output the agent has to interpret.
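The "fail with a clear message" principle can be wrapped around any command. This is a sketch, not the project's real make plumbing — `run_check` and its message format are assumptions — but it shows the shape: on failure, print what broke and what to do next, on one channel the agent can read.

```shell
# Sketch of a check wrapper: run a command, and on failure emit an
# explicit, actionable message instead of raw ambiguous output.
# run_check is a hypothetical helper, not part of the real tooling.
run_check() {
  name="$1"; shift
  if "$@"; then
    echo "OK: $name"
  else
    echo "FAIL: $name" >&2
    echo "fix: run '$*' locally and read its output before retrying" >&2
    return 1
  fi
}

# Usage: run_check types tsc --noEmit
```

Every make target in this style either prints "OK" or tells the agent its next step.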
Parallelization — each agent with its own worktree
I normally work with 8–10 agents in parallel. Each agent gets its own git worktree — a separate copy of the repository, with its own isolated development environment. Ports are dynamically generated per worktree, containers are separate, the database is separate.
This means ten agents can run make dev deploy simultaneously without interfering with each other. The tooling has to support this from the start — if a make check writes output to the same file regardless of worktree, you have a concurrency problem before you’ve even begun.
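One way to get collision-free ports is to derive them deterministically from the worktree path. This is a sketch of the idea, not Meshes Studio's actual scheme — the 3000–3999 range and the use of cksum are my assumptions:

```shell
# Sketch: derive a stable per-worktree port from the worktree path,
# so each isolated environment gets its own port without coordination.
# The base port and hash function are assumptions for illustration.
worktree_port() {
  base=3000
  hash=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $((base + hash % 1000))
}

# Usage: worktree_port "$(git rev-parse --show-toplevel)"
```

The same path always maps to the same port, so an agent can tear down and redeploy its environment without the tooling having to track state; two worktrees only collide if their hashes happen to land on the same value.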
The workflow in practice
1. The spec — the one place I can’t delegate
I create or edit the spec starting from what I want to build. I ask the agent to structure it, identify edge cases, and propose acceptance criteria. Then I read everything. I skip nothing.
Anything unclear in the spec leads directly to hallucination and sloppiness in the implementation. Writing a good spec is harder than writing good code — you have to know what you want before you’ve seen it.
2. The plan — reviewing every step
Starting from the spec, I ask the agent to enter plan mode: scan the existing code and build a step-by-step plan. I read the plan as carefully as the spec.
Unwanted artifacts in the plan — an unsolicited refactor, a premature abstraction, an approach that doesn’t fit the existing architecture — become unwanted code. It’s easier to cut a line from a plan than to rip out code that has already been written and tested.
Once the plan is ready, I run it through a second agent specialized in code review (Codex) that analyzes it critically and can identify architectural issues before a single line of code has been written.
3. Implementation with checks
The agent implements the plan and runs checks after each logical unit of work. Not at the end — along the way. If a test fails at step 3, the agent stops and resolves it before continuing.
I never accept “I’ve implemented it, checks seem to be passing” as an answer. I need to see the actual command output with all tests green.
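The stop-on-first-failure behavior looks like this in miniature. The per-step check here is a stub that simulates a failure at step 3; the real check would be a make target:

```shell
# Sketch of checking after each logical unit instead of once at the end.
# run_step_check is a stub for the real per-step check; it simulates
# a failure at step 3 so the loop demonstrates the early stop.
run_step_check() {
  [ "$1" -lt 3 ]
}

completed=0
for step in 1 2 3 4; do
  if run_step_check "$step"; then
    completed=$step
  else
    echo "check failed at step $step: stop and fix before moving on" >&2
    break
  fi
done
echo "steps with green checks: $completed"
```

Steps 4 onward never run on top of a broken step 3, which is exactly the property that makes the final diff trustworthy.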
4. Completeness verification
When the agent reports it’s done, I explicitly ask it to verify it has implemented everything in the plan. Agents tend to forget the last steps or mark things as “done” when they’ve only partially completed them. A simple prompt — “verify each step in the plan and confirm it’s implemented” — usually catches 1–2 missed items.
5. AI review
I send the diff to an agent specialized in review. It returns categorized issues: BUG, SECURITY, PLAN (something that doesn’t respect the architecture), COMMENT (non-blocking observations).
I decide what gets implemented and what doesn’t. Not everything the review finds needs to be fixed — sometimes it’s over-engineering, sometimes it’s out of scope. But bugs and security findings always get implemented.
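Because the categories are machine-readable, the always-fix subset can be filtered mechanically. The one-finding-per-line "CATEGORY: message" format below is my assumption about the reviewer's output, and the findings themselves are invented examples:

```shell
# Sketch: separate must-fix findings (BUG, SECURITY) from the rest.
# The "CATEGORY: message" line format and the sample findings are
# assumptions for illustration.
review="BUG: off-by-one in pagination
SECURITY: token logged in plaintext
PLAN: handler bypasses the service layer
COMMENT: consider renaming this variable"

must_fix=$(printf '%s\n' "$review" | grep -E '^(BUG|SECURITY):')
echo "$must_fix"
```

Everything the filter keeps gets implemented; everything else is a judgment call.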
6. Green checks → merge
If the checks are green and the review is approved, the agent opens the PR. I merge manually. I don’t delegate this — the last eye on the diff before merge is mine.
The biggest challenge
It’s not the tooling. It’s not AI itself.
It’s having specs ready before agents start. And keeping up with reviewing plans.
I can run 10 agents in parallel. Each one finishes a prompt in 5–10 minutes. That means every few minutes I have 2–3 plans or diffs waiting for my review. If I fall behind on reviews, I freeze the entire pipeline.
The bottleneck is not agent speed. It’s the speed at which I can produce clear specs and do quality human review. That’s the real work in ADD.
If you want to dig into any of this — tooling, workflow, how I structure specs — find me on LinkedIn or Instagram.