Loop Engineering: The Final Evolution of AI Agent Design

From Prompt Engineering to Loop Engineering: The Evolutionary Chain

The first three years of the large language model era followed a methodical progression. Prompt engineering dominated the conversation first — engineers spent hours crafting instructions to extract better responses from GPT-3 and Claude. Then came context engineering, shifting the focus from the instruction to the input material: RAG, embeddings, expanded context windows. By late 2025, harness engineering emerged as the methodology for agent orchestration — Claude Code, OpenClaw, Aider, and other tools enabled semi-automated workflows where the human remained in command, albeit at a higher level of abstraction.

By mid-2026, the evolution took a qualitative leap. The concept now reshaping how technology teams think about AI-driven automation is loop engineering: the discipline of designing autonomous cycles where AI agents execute, evaluate, correct, and iterate until an objective is completed — without direct human supervision.

The Origins: From the Ralph Loop to Anthropic

The seed was planted in January 2026 by Geoffrey Huntley, an Australian engineer who documented the Ralph loop: an agent that received an objective and operated for hours, finding and resolving tasks autonomously. The technical key lay in persistence: when the model’s context window ran out, a bash script launched a new instance that resumed work from Git history and notes saved to disk. No internal model memory was required; the infrastructure absorbed that role entirely.
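The restart pattern can be sketched in a few lines. This is a Python illustration under stated assumptions, not the bash script described above: the `run_agent` callable, the notes file, and the `DONE` marker line are all hypothetical names chosen for the example.

```python
from pathlib import Path
from typing import Callable


def run_until_done(
    run_agent: Callable[[str], None],
    notes_path: Path,
    done_marker: str = "DONE",      # hypothetical completion marker
    max_sessions: int = 50,         # hard stop so the loop cannot run forever
) -> int:
    """Start fresh agent sessions until the notes file reports completion.

    Each session receives only what is on disk; nothing is carried over
    in model memory. Returns the number of sessions that were run.
    """
    sessions = 0
    while True:
        notes = notes_path.read_text() if notes_path.exists() else ""
        if done_marker in notes.splitlines():
            return sessions
        if sessions >= max_sessions:
            raise RuntimeError("session budget exhausted before completion")
        run_agent(notes)            # the agent is expected to update the notes
        sessions += 1
```

The session cap is a design choice rather than part of the original description: an unattended loop needs some bound on how long it may keep restarting.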

Anthropic had published an equivalent description months earlier, drawing from academic research: agents working in shifts, each unaware of the previous one, sustained by on-disk logs. Peter Steinberger, creator of OpenClaw, gave the concept public visibility. Boris, from the Claude Code team, crystallised the shift: “My real job is engineering the loops.”

Prompt versus Loop: The Structural Difference

The distinction is sharp. A prompt generates a response and stops — it requires human input to advance. A loop is a recursive objective that, once defined, drives the system through the entire circuit independently. Prompt engineering places the human at the centre of every decision; loop engineering places the human at the objective definition and the guardrails, but outside the execution path.

In practical terms: prompt engineering is driving a car manually. Loop engineering is programming the autonomous driving system — with guardrails, sensors, and fail-safe mechanisms — and then stepping out of the driver’s seat.

Architecture: Six Fundamental Building Blocks

Addy Osmani, a Chrome engineer at Google, systematised the architecture into six components that together form a functional loop:

Automations

Scheduled jobs that identify and triage work before human intervention. They function as the starter motor of the cycle.

Worktrees

Isolated parallel checkouts that allow multiple agents to work on the same repository without overwriting each other’s files. Each agent gets its own working directory and its own dedicated branch, so conflicts surface only at merge time. The concept maps loosely to Kubernetes namespaces, where workloads operate in isolation within a shared cluster.
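As an illustration, a per-agent worktree can be created with the standard `git worktree add` command. The sketch below only assembles and runs that command; the `agent/<id>` branch naming and the sibling-directory layout are assumptions made for the example.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, agent_id: str, base: str = "main") -> list[str]:
    """Build the git command that gives one agent its own checkout and branch."""
    branch = f"agent/{agent_id}"                      # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-{agent_id}"    # sibling directory per agent
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktree(repo: Path, agent_id: str, base: str = "main") -> None:
    """Run the command; raises CalledProcessError if git reports a failure."""
    subprocess.run(worktree_command(repo, agent_id, base), check=True)
```

Keeping command construction separate from execution makes the naming scheme easy to inspect without touching a real repository.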

Skills

Documented project knowledge — conventions, build steps, known pitfalls. Written once, consumed by all agents across all executions. This eliminates the re-derivation of context and operates much like infrastructure manifests in GitOps repositories: a single source of truth that any orchestrator can read and act upon.

Plugins and Connectors

Implementations over MCP (Model Context Protocol) that expand the loop’s reach beyond the file system — issue trackers, databases, Slack, staging environments. In cloud-native terms, these are the service meshes and API gateways of the agent world, controlling how cognitive workloads communicate with external systems.

Sub-agents

The separation between the writer and the reviewer. The model that generated the code is not the one that reviews it, which reduces the self-evaluation bias that generative models tend to show when judging their own output. This mirrors the separation of concerns in microservice architectures: the service that processes a request should not be the same one that validates it.
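A minimal sketch of how that separation might be enforced in code. The `Agent` type, its fields, and the rule of comparing model identifiers are assumptions for illustration; they do not describe any particular tool’s API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Agent:
    name: str
    model: str                    # identifier of the underlying model
    act: Callable[[str], str]     # hypothetical: takes input text, returns output text


def write_then_review(writer: Agent, reviewer: Agent, task: str) -> tuple[str, str]:
    """Produce a diff with one agent and have a different model judge it."""
    if writer.model == reviewer.model:
        # Refuse to let a model grade its own work.
        raise ValueError("writer and reviewer must use different models")
    diff = writer.act(task)
    verdict = reviewer.act(diff)
    return diff, verdict
```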

State

Persistent on-disk memory — markdown files or Linear boards that record what was done, what failed, and what remains pending. This compensates for the model reset between executions. In cloud orchestration terms, this is etcd: the consistent, distributed key-value store that holds the cluster’s state across restarts and failures.
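One possible shape for such a state file is a markdown document with three sections. The section names and the `LoopState` type below are assumptions for the sake of the example; the source specifies only that done, failed, and pending work is recorded.

```python
from dataclasses import dataclass, field

SECTIONS = ("Done", "Failed", "Pending")  # hypothetical section titles


@dataclass
class LoopState:
    done: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)


def to_markdown(state: LoopState) -> str:
    """Serialise the state as one heading per section with a bullet per item."""
    lines: list[str] = []
    for title in SECTIONS:
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in getattr(state, title.lower()))
        lines.append("")
    return "\n".join(lines)


def from_markdown(text: str) -> LoopState:
    """Rebuild the state from the file a previous iteration left behind."""
    state = LoopState()
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            title = line[3:].strip()
            current = title.lower() if title in SECTIONS else None
        elif line.startswith("- ") and current is not None:
            getattr(state, current).append(line[2:].strip())
    return state
```

A fresh agent instance would call `from_markdown` on start-up and `to_markdown` before exiting, so the file is the only thing that has to survive between executions.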

In Practice: From Feedback to Deploy

Hendrik Krack, from CodeRabbit, documented a real loop built in February 2026. The flow started with user feedback, passed through automated triage, planning, implementation by Claude, iterative review by CodeRabbit (the diff returned for correction until clean), test execution, CI validation, automatic merge, and post-deploy verification.

The infrastructure was minimalist: a scheduled job and a markdown state file. What conferred reliability was the quality gate — nothing was integrated without passing tests and a clean review. Krack limited the loop to small, verifiable changes, reserving higher-judgment decisions for himself. The result: the loop consistently delivered the kind of work that previously demanded his direct involvement at every stage.
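The review-until-clean step can be sketched as a bounded loop. This is an illustrative reconstruction, not Krack’s code: `implement`, `review`, and `tests_pass` are hypothetical callables standing in for the coding agent, the reviewer, and the test run.

```python
from typing import Callable, Optional


def iterate_until_clean(
    implement: Callable[[list[str]], str],   # feedback in, new diff out
    review: Callable[[str], list[str]],      # diff in, open comments out
    tests_pass: Callable[[str], bool],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return a diff that has no review comments and passes tests, or None."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        diff = implement(feedback)
        feedback = review(diff)
        if not feedback and not tests_pass(diff):
            feedback = ["tests failed"]      # feed the failure into the next round
        if not feedback:
            return diff                      # clean review and green tests
    return None                              # budget exhausted: hand off to a human
```

Returning `None` rather than the last diff keeps the quality gate strict: nothing is merged unless both conditions hold.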

The Cloud-Native Lens: Where Loop Engineering Meets Orchestration

For organisations operating in the cloud, loop engineering introduces a new automation layer that aligns naturally with established infrastructure principles. The analogies are not superficial — they reveal a deeper structural convergence.

Worktrees map to Kubernetes namespaces. Just as a K8s namespace scopes a group of resources within a shared cluster, worktrees give each agent an isolated checkout of a shared codebase. Both limit interference between tenants: one agent’s experimental change cannot overwrite another’s uncommitted work.

Persistent state mirrors etcd. The markdown files and issue boards that track loop progress serve the same function as etcd in a Kubernetes cluster — a durable, queryable record of system state that survives process restarts. When a new loop iteration starts, it reads state the way a Kubernetes controller reads from the API server: what exists, what changed, what needs reconciliation.

Quality gates function as admission controllers. In Kubernetes, an admission controller intercepts requests to the API server and can reject or mutate them based on policy. A loop’s quality gate — “tests must pass, review must be clean” — operates on the same principle: a policy enforcement layer between intent and execution. Nothing reaches production without satisfying the policy.
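To make the analogy concrete, a gate can be written as a list of policies evaluated against a proposed change. The `Change` fields and policy functions are hypothetical, and the sketch covers only the rejecting side of admission control, not mutation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Change:
    tests_passed: bool
    open_review_comments: int


# A policy returns None to allow the change, or a reason string to reject it.
Policy = Callable[[Change], Optional[str]]


def require_tests(change: Change) -> Optional[str]:
    return None if change.tests_passed else "tests must pass"


def require_clean_review(change: Change) -> Optional[str]:
    return None if change.open_review_comments == 0 else "review must be clean"


def admit(change: Change, policies: list[Policy]) -> tuple[bool, list[str]]:
    """Evaluate every policy and collect all rejection reasons."""
    reasons = [r for policy in policies if (r := policy(change)) is not None]
    return (not reasons, reasons)
```

Collecting every reason, rather than stopping at the first failure, gives the next loop iteration a complete list of what to fix.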

Sub-agent separation loosely mirrors service mesh patterns. The architectural decision that the code-writing agent must differ from the code-reviewing agent parallels the zero-trust stance of a service mesh, in which no service is trusted merely on its own say-so. Istio, Linkerd, and their peers enforce mutual TLS so that each side of a connection verifies the other’s identity; loop engineering enforces cognitive separation between agent roles so that each output is checked by a party that did not produce it.

Companies that have already invested in infrastructure automation hold a natural advantage in adopting loop engineering. The same orchestration, observability, and logging tools that govern deployment pipelines can be adapted to monitor and control AI agent loops. Prometheus can scrape loop metrics. Grafana can visualise iteration counts, failure rates, and cycle times. PagerDuty can alert when a loop stalls.
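As a rough illustration of what “scrapeable” loop metrics look like, the sketch below renders counters in the Prometheus text exposition format using only the standard library. The metric and label names are invented for the example, label values are assumed to need no escaping, and a real deployment would more likely use an official Prometheus client library.

```python
def render_metric(name: str, help_text: str, metric_type: str,
                  samples: dict[str, float]) -> str:
    """Render one metric family, with one sample per loop name."""
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for loop_name, value in sorted(samples.items()):
        # Assumes loop names contain no quotes, backslashes, or newlines.
        lines.append(f'{name}{{loop="{loop_name}"}} {value}')
    return "\n".join(lines) + "\n"


def render_loop_metrics(iterations: dict[str, int], failures: dict[str, int]) -> str:
    """Combine the hypothetical iteration and failure counters into one payload."""
    return (
        render_metric("loop_iterations_total", "Completed loop iterations.",
                      "counter", iterations)
        + render_metric("loop_failures_total", "Loop iterations that failed the gate.",
                        "counter", failures)
    )
```

Serving this text from an HTTP endpoint is what would let a Prometheus server collect it on its usual scrape interval.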

Multi-Loop Orchestration: The Next Frontier

The natural evolution of loop engineering points toward multi-loop orchestration: multiple loops operating in parallel, each with a distinct domain, coordinated by a higher-order orchestrator. The open questions are systemic. How should conflicts between loops be managed? How can the system scale without degrading quality? Who audits a loop when no one is inside it?

For the cloud ecosystem, this represents a new layer of orchestration complexity, comparable to the transition from individual containers to Kubernetes, but in the domain of cognitive agent coordination. The orchestrator would need to handle resource allocation (how much compute and token budget each loop receives), scheduling (which loops run at which times), conflict resolution (when two loops modify overlapping code), and observability (a unified dashboard showing the health of all running loops).
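Of these responsibilities, conflict detection is the easiest to sketch. The example below assumes each loop declares up front the set of files it intends to modify, which is a simplification; the loop names and paths are hypothetical.

```python
from itertools import combinations


def find_conflicts(claims: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of loops to the files both of them intend to modify.

    Pairs with no overlap are omitted from the result.
    """
    conflicts: dict[tuple[str, str], set[str]] = {}
    for a, b in combinations(sorted(claims), 2):
        overlap = claims[a] & claims[b]
        if overlap:
            conflicts[(a, b)] = overlap
    return conflicts
```

An orchestrator could use the result to serialise the conflicting loops while letting the rest run in parallel.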

The comparison to Kubernetes is instructive. Before K8s, teams ran containers manually — SSHing into hosts, pulling images, writing custom systemd units. The orchestrator layer abstracted away that operational burden and introduced declarative infrastructure. Multi-loop orchestration aims to do the same for agent coordination: declare the objective, define the constraints, and let the orchestrator handle the rest.

The governance challenges, however, are qualitatively different. A misconfigured Kubernetes pod usually fails visibly, through CPU spikes, OOM kills, or crash loops. A misconfigured agent loop can produce subtly wrong code that passes tests but introduces architectural debt, security vulnerabilities, or business logic errors. The blast radius is harder to detect, and the delay between deployment and discovery is longer.

When to Apply — And When to Hold Back

The practical rule: loops work well when the objective is stable. A refactoring has clear criteria — it compiles, tests pass, behaviour is maintained. The verifier is written once and reused. When criteria shift between executions, the effort of rewriting the verifier outweighs the loop’s savings. Stable target: build the loop. Moving target: keep it manual.

This is not unlike infrastructure-as-code governance. Terraform modules are powerful when the infrastructure they describe is well-understood and changes incrementally. When the architecture is in flux — new services being added, network topologies being redesigned — hand-crafted provisioning often outperforms automated module-driven deployment, because the cost of maintaining the modules exceeds the cost of manual execution.

Krack’s pragmatic approach — limiting loops to small, verifiable changes — reflects this reality. The loop handles the mechanical; the human handles the strategic. In cloud terms, this is the distinction between day-one operations (provisioning a known stack) and day-two operations (diagnosing a novel production incident). One lends itself to automation; the other demands human judgment.

The Convergence of Infrastructure and Intelligence

Loop engineering represents, for now, the most practical and immediately applicable method of removing humans from execution without sacrificing control. It is automation with verification built in — and in the software engineering landscape of 2026, that may be the most consequential evolution of the year.

What makes it particularly significant for the cloud-native world is the convergence it represents. The same engineering discipline that produced Kubernetes, Terraform, and GitOps — treating infrastructure as declarative, observable, and self-healing — is now being applied to the coordination of AI agents. The vocabulary changes, but the philosophy remains: define the desired state, codify the verification, and automate the path from current to desired.

The organisations that will lead in this space are not those with the most sophisticated models, but those with the most mature operational practices. A team that already thinks in terms of state reconciliation, policy-as-code, and observability-driven development is better positioned to adopt loop engineering than a team that treats AI as a magic box. The infrastructure mindset is the prerequisite; the models are merely the runtime.

For a deeper exploration of the security and DevSecOps dimensions of loop engineering, see the companion analysis published at CloudAIsec.com.