From prompt to pull request — the full lifecycle of an agentic task, in practice

Step by step, decision by decision, from a Slack message to a finished deliverable in GitHub.

I am going to walk you through a real task on the agentic orchestration system I built. Not a cherry-picked demo. A representative task that shows what the system actually does when it works, where the interesting decisions happen, and what the output looks like. The task I will use is this: researching the patentability of a novel technical approach to fast-charging electric vehicle batteries.

This is a good representative task because it is genuinely complex: it requires prior art research, technical analysis, legal reasoning, and synthesised writing. It has multiple modules with real dependencies. It costs real money (in this case, $0.74). And it produces output that is actually useful, not just structurally correct.

Step one: the Slack command

The interaction begins with a Slack slash command. I type:

/build research patentability of fast-charging EV battery approach using dynamic impedance matching

The Master Agent receives this via Socket Mode — a persistent WebSocket connection that means there is no polling, no webhook setup, and sub-second latency from message to receipt. The first thing it does is not generate questions. It researches.

The Master Agent makes several LLM calls at the reasoning tier to build a background understanding of the domain: what fast-charging means technically, what the patent landscape looks like generally, and what dynamic impedance matching involves. This takes about 45 seconds. The purpose is to make the subsequent clarifying questions genuinely informed rather than generic.

Step two: clarification

The Master Agent posts back to the thread with two mandatory questions and three optional ones with suggested defaults. The mandatory questions are specific to the domain — what is the novel aspect of this approach (the dynamic impedance matching itself, or the control algorithm that implements it, or both?), and is there an existing prototype or is this a conceptual invention? The optional questions have sensible defaults: geographic coverage (assuming US and PCT), claim scope (assuming both system and method claims), and prior art depth (assuming a standard search of the last fifteen years).

I reply with answers to the mandatory questions and “use defaults” for the rest. The Master Agent processes my reply, updates its internal intake record, and posts a short clarified brief back in the thread, summarising what it understood and what assumptions it is carrying forward. This is not a courtesy step. It is a forcing function that catches misunderstandings before they propagate into the plan.
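A minimal sketch of how answers and defaults could be merged into an intake record. The key names and the `build_intake` helper are illustrative assumptions, not the system's actual schema; only the default values themselves come from the questions described above.

```python
# Hypothetical keys for the three optional questions and their stated defaults.
OPTIONAL_DEFAULTS = {
    "geographic_coverage": "US and PCT",
    "claim_scope": "system and method claims",
    "prior_art_depth_years": 15,
}
# Hypothetical keys for the two mandatory questions.
MANDATORY_KEYS = ("novel_aspect", "prototype_status")


def build_intake(answers: dict) -> dict:
    """Merge user answers over defaults and record which values were assumed."""
    missing = [key for key in MANDATORY_KEYS if not answers.get(key)]
    if missing:
        raise ValueError(f"mandatory questions unanswered: {missing}")
    resolved = {**OPTIONAL_DEFAULTS, **answers}
    # Anything the user did not answer is carried forward as an assumption.
    assumed = sorted(key for key in OPTIONAL_DEFAULTS if key not in answers)
    return {"answers": resolved, "assumptions": assumed}


intake = build_intake(
    {"novel_aspect": "control algorithm", "prototype_status": "conceptual"}
)
print(intake["assumptions"])
```

Keeping the assumed keys in a separate list is what makes the confirmation message possible: the brief can state exactly which values came from the user and which were defaulted.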

Step three: persona and decomposition

The confirmed intake triggers the spec stage. The Master Agent classifies the task against seven persona templates and selects patent_research — correctly, based on a combination of keyword matching (patent, patentability, prior art) and the clarified summary. The persona template provides the starting module structure: prior art search, novelty analysis, claims drafting, and report synthesis.
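The keyword half of that classification can be sketched as a simple scoring pass. This is an illustration under assumptions: only the `patent_research` persona and its three keywords come from the text, while the `blog_post` entry and the `classify_persona` function are hypothetical, and the real classifier also weighs the clarified summary.

```python
PERSONA_KEYWORDS = {
    "patent_research": ("patent", "patentability", "prior art"),
    # Hypothetical second persona, included only to give the scorer a choice.
    "blog_post": ("blog", "newsletter", "article"),
}


def classify_persona(text: str) -> str:
    """Return the persona whose keywords appear most often in the text."""
    lowered = text.lower()
    scores = {
        persona: sum(keyword in lowered for keyword in keywords)
        for persona, keywords in PERSONA_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        raise ValueError("no persona keywords matched")
    return best


command = (
    "research patentability of fast-charging EV battery approach "
    "using dynamic impedance matching"
)
print(classify_persona(command))
```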

The decomposer adapts this structure to the specifics of the task. It produces five modules:

a prior art search scoped to impedance matching in fast-charging contexts
a technical novelty analysis comparing the specific approach to the prior art found
a claims drafting module for both system claims and method claims
an analysis of potential obviousness arguments that an examiner might raise
a synthesis module that stitches everything into a final report

The dependency graph is straightforward: prior art search has no dependencies and runs first. Novelty analysis and obviousness analysis both depend on prior art search and can run in parallel. Claims drafting depends on novelty analysis. Report synthesis depends on everything.
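The dependency rules in the paragraph above can be turned into execution waves with a topological sort. The sketch below uses Python's standard `graphlib`; the module identifiers and the `plan_waves` function are assumed names for illustration, not the decomposer's actual output.

```python
from graphlib import TopologicalSorter

# Each module maps to the set of modules it depends on.
DEPENDENCIES = {
    "prior_art_search": set(),
    "novelty_analysis": {"prior_art_search"},
    "obviousness_analysis": {"prior_art_search"},
    "claims_drafting": {"novelty_analysis"},
    "report_synthesis": {
        "prior_art_search",
        "novelty_analysis",
        "obviousness_analysis",
        "claims_drafting",
    },
}


def plan_waves(dependencies: dict) -> list:
    """Group modules into waves; modules in the same wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on a circular dependency
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(plan_waves(DEPENDENCIES), start=1):
    print(number, wave)
```

Because claims drafting waits on novelty analysis and synthesis waits on everything, this graph resolves into four waves, with the two analysis modules sharing the second.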

For the three complex modules — novelty analysis, obviousness, claims drafting — the system spawns dedicated deep-dive sub-agents that refine the module spec: clarifying inputs and outputs, estimating token counts, identifying risks (in this case, the primary risk is that the prior art search misses relevant non-English language patents, which is flagged explicitly in the module’s risk field).

The system posts a spec preview to the Slack thread: five modules, four waves of execution, persona patent_research, output format doc.

Step four: approval

The approval prompt contains the full cost estimate: $0.55 to $1.20, medium confidence, recommended budget $0.90. The risk scanner has flagged one medium-severity risk (the external call to patent databases in the prior art search module) and one low-severity risk (the novelty analysis module has a high token estimate). No high-severity risks — no destructive operations, no irreversible publishes.

I click Approve. The system locks the budget at $0.90 and moves the task to the execution queue. Elapsed time from my initial Slack command to clicking Approve: seven minutes, most of which was me reading the clarifying questions and the approval prompt.
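One way a locked budget could be enforced during execution is a small guard that refuses any charge that would exceed the limit. This is a hedged sketch: `BudgetGuard` and `BudgetExceeded` are hypothetical names, and the text does not describe how the real system tracks spend internally.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the locked limit."""


@dataclass
class BudgetGuard:
    limit: float        # USD, fixed at approval time
    spent: float = 0.0  # USD, running total across all agents

    def charge(self, amount: float) -> None:
        """Record a cost, refusing it if the locked budget would be exceeded."""
        if self.spent + amount > self.limit:
            raise BudgetExceeded(
                f"charge of ${amount:.2f} would exceed ${self.limit:.2f}"
            )
        self.spent += amount


guard = BudgetGuard(limit=0.90)
guard.charge(0.74)  # the final cost of this task fits inside the budget
print(f"remaining: ${guard.limit - guard.spent:.2f}")
```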

Step five: repo provisioning and launch

Within ten seconds of approval, the system creates a private GitHub repository named agentic-task-a3f8c921 (the first eight characters of the task UUID). It bundles the relevant source code, the full build spec, and a generated Jupyter notebook tailored specifically to this task — with the task ID pre-filled, the list of required Colab secrets pre-populated, and the launch cell ready to run.
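The naming rule is simple enough to show directly. In this sketch the `repo_name_for` helper is a hypothetical name, and everything after the first eight characters of the example UUID is made up for illustration.

```python
import uuid


def repo_name_for(task_id: uuid.UUID) -> str:
    """Build the repo name from the first eight characters of the task UUID."""
    return f"agentic-task-{str(task_id)[:8]}"


# Only the a3f8c921 prefix comes from the text; the rest is a placeholder.
example_id = uuid.UUID("a3f8c921-0000-4000-8000-000000000000")
print(repo_name_for(example_id))
```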

A Slack message arrives with two links: the GitHub repo and a Colab launch URL. I click the Colab link. The notebook opens. I click Runtime → Run all.

Step six: remote execution

The notebook clones the repo, installs dependencies, loads secrets from Colab’s Secrets manager, and starts the Build Master. From this point, I can close the Colab tab if I want — although in practice I keep it open for the first few minutes to confirm things are running.

The Build Master reads spec.json from the cloned repo, reconstructs the dependency graph, and begins execution. Wave one is the prior art search — a single sub-agent that calls the research_agent role. This agent does several things in sequence: queries patent databases conceptually, synthesises what it finds, and produces a structured artifact called prior_art_search.md containing twelve identified relevant patents and publications with summaries and relevance assessments.

Throughout this, Slack receives a stream of progress messages: “▶ research_agent started”, “⏳ Gathering prior art references (25%)”, “📎 Artifact produced: prior_art_search.md”. The message bus is carrying real-time events from the Colab process back to the local Master Agent, which forwards them to Slack. The latency between an event happening on Colab and the corresponding Slack message is under two seconds.
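The forwarding step amounts to mapping a structured bus event onto a line of Slack text. The sketch below assumes a `BusEvent` shape and three event kinds; those names are illustrative, and only the rendered message strings are taken from the examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BusEvent:
    kind: str                      # "started", "progress", or "artifact"
    agent: str
    detail: str = ""
    percent: Optional[int] = None  # only used by progress events


def to_slack_text(event: BusEvent) -> str:
    """Render one bus event as the progress line posted to the thread."""
    if event.kind == "started":
        return f"▶ {event.agent} started"
    if event.kind == "progress":
        return f"⏳ {event.detail} ({event.percent}%)"
    if event.kind == "artifact":
        return f"📎 Artifact produced: {event.detail}"
    raise ValueError(f"unknown event kind: {event.kind}")


events = [
    BusEvent("started", "research_agent"),
    BusEvent("progress", "research_agent", "Gathering prior art references", 25),
    BusEvent("artifact", "research_agent", "prior_art_search.md"),
]
for event in events:
    print(to_slack_text(event))
```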

Wave two begins once the prior art search completes. Two sub-agents spin up simultaneously: the novelty analysis agent and the obviousness analysis agent. Both receive the prior_art_search.md artifact as an upstream input. They run in parallel. The novelty analysis agent signals confusion at one point — it is uncertain how to handle a Japanese-language patent that appears potentially relevant. The system automatically spawns a research agent, which returns notes on how to handle non-English prior art in a US patent context. The novelty analysis agent retries with those notes and succeeds.
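The pattern in wave two, parallel agents with a research-assisted retry, can be sketched with `asyncio`. Everything here is a stand-in under assumptions: the agent functions are stubs rather than LLM calls, and `NeedsResearch` and `run_with_research_retry` are hypothetical names for the confusion signal and the retry wrapper.

```python
import asyncio


class NeedsResearch(Exception):
    """Signal from an agent that it needs background notes before continuing."""

    def __init__(self, question: str):
        super().__init__(question)
        self.question = question


async def research_agent(question: str) -> str:
    return f"notes on: {question}"


async def novelty_agent(upstream: str, notes=None) -> str:
    if notes is None:  # first attempt: the agent is unsure and asks for help
        raise NeedsResearch("How to treat non-English prior art in a US context?")
    return f"novelty analysis of {upstream} (with research notes)"


async def obviousness_agent(upstream: str, notes=None) -> str:
    return f"obviousness analysis of {upstream}"


async def run_with_research_retry(agent, upstream: str) -> str:
    """Run an agent once; on confusion, fetch notes and retry a single time."""
    try:
        return await agent(upstream)
    except NeedsResearch as confusion:
        notes = await research_agent(confusion.question)
        return await agent(upstream, notes=notes)


async def run_wave_two(upstream: str) -> list:
    # Both analyses start together and share the same upstream artifact.
    return await asyncio.gather(
        run_with_research_retry(novelty_agent, upstream),
        run_with_research_retry(obviousness_agent, upstream),
    )


results = asyncio.run(run_wave_two("prior_art_search.md"))
print(results)
```

Limiting the wrapper to one retry keeps a confused agent from looping and spending budget indefinitely; a second failure simply propagates to the caller.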

Wave three is claims drafting, which receives both the prior art search and the novelty analysis as inputs. This is the most LLM-intensive module — claims language requires careful balancing of breadth (for protection) against prior art (for validity) — and it produces the largest artifact: a claims document with three independent claims and seven dependent claims, each with reasoning.

Wave four is report synthesis. The stitcher takes all four module artifacts, calls the LLM at the reasoning tier, and produces a coherent final report — not just concatenation, but a genuine editorial pass that resolves redundancies, adds transitions, and produces an executive summary at the top.

Step seven: output and content

The Build Master pushes all artifacts to the GitHub repo’s outputs/ directory and commits. The final deliverable — FINAL_DELIVERABLE.md — is a 2,200-word patent research report with prior art landscape, novelty assessment, draft claims, and examiner risk analysis.

Slack receives the completion message: “Build complete!” with a direct link to the file in GitHub. Total elapsed time from clicking Run all in Colab to this message: 18 minutes. Total cost: $0.74 — eighteen percent under the recommended budget.
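The budget comparison is a one-line calculation, shown here with the two figures from this task.

```python
recommended_budget = 0.90  # USD, locked at approval
actual_cost = 0.74         # USD, reported at completion

saving = recommended_budget - actual_cost
percent_under = round(saving / recommended_budget * 100)
print(f"${saving:.2f} under budget ({percent_under}%)")
```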

Because this was a patent research task rather than a content creation task, the Content Agent does not trigger automatically. If I had run a blog post task, the system would now generate platform-specific variants and request approval before committing them.

Step eight: the observability view

While all of this was happening, the dashboard at localhost was showing the live state of the task. The agent hierarchy tree grew in real time — Build Master at the root, four sub-agents nested under it, one research agent nested under the novelty analysis sub-agent. The cost breakdown showed spend by provider and by agent role. The live event feed showed every bus message as it arrived. The logs panel showed structured logs from every agent, filterable by severity.
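The hierarchy described above can be modelled as a small tree. This sketch assumes the four sub-agents are the prior art, novelty, obviousness, and claims agents; the node names and the `render` helper are illustrative, not the dashboard's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentNode:
    name: str
    children: List["AgentNode"] = field(default_factory=list)


def render(node: AgentNode, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one agent per line."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


tree = AgentNode(
    "build_master",
    [
        AgentNode("prior_art_search"),
        # The research agent was spawned to resolve the novelty agent's confusion.
        AgentNode("novelty_analysis", [AgentNode("research_agent")]),
        AgentNode("obviousness_analysis"),
        AgentNode("claims_drafting"),
    ],
)
print("\n".join(render(tree)))
```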

The dashboard did not produce the output. But it changed my relationship to the process. I was never wondering whether things were running. I had a window into the execution at whatever level of detail I wanted — from the high-level status badge to the raw logs of a specific sub-agent’s LLM calls.

What this actually demonstrates

The task I just described was not magic. It was a well-engineered pipeline that took a natural language input, disambiguated it through structured dialogue, planned it against domain-specific templates, executed it through parallel specialised agents with real-time monitoring and budget enforcement, and produced a structured, auditable output — all while giving me clear visibility and control at every significant decision point.

The value is not that any single step is impossible to do manually. A competent researcher could find prior art. A patent attorney could draft claims. A technical writer could synthesise a report. The value is in the coordination — the ability to take a complex, multi-disciplinary task, decompose it into concurrent workstreams, execute them with appropriate models at appropriate costs, and produce a coherent integrated output in under 30 minutes, with a complete audit trail.