Loop engineering: from the inner loop to the outer loop

2026.06.14 · AI engineering · agentic AI · research

Abstract. The concept of loop engineering has emerged as a defining shift in how AI coding agents are deployed. Rather than responding to individual prompts, agents now operate inside automated outer loops that schedule, direct, and verify work without human intervention at every step. This piece explores the origins of loop engineering, its foundational building blocks, and its application in engineering design. Drawing on a Carnegie Mellon University study comparing three agentic loop architectures (Xu, Martelaro and McComb, 2026), it examines how metacognitive co-regulation between agents produces significantly better design outcomes, and closes with practical implications for AI engineering practitioners.

Introduction

For most of the brief history of AI coding agents, the human was the loop. A practitioner typed a prompt. The model responded. The practitioner read the output and typed the next prompt. This cycle repeated until the task was done.

That arrangement has changed. Boris Cherny, who leads Claude Code at Anthropic, described the shift plainly: he no longer prompts the model by hand. He builds loops that do the prompting for him. The term now attached to this practice is loop engineering.

Loop engineering is the discipline of designing automated systems that run an AI agent on a schedule, against a defined goal, and verify the result without requiring human input at each turn. It is not a replacement for prompt engineering or context engineering. It is the next layer built on top of them.

The evolutionary staircase

Loop engineering sits at the top of a progression. Each step expanded the practitioner's leverage without discarding the layer below it.

Stage 1 — Prompt engineering

Crafting the single instruction that produces a useful response. Still essential: a bad prompt inside a loop does not improve with repetition. It just fails faster, on every pass.

Stage 2 — Context engineering

Recognising that the prompt alone was not the lever. The surrounding material — documentation, history, tool definitions — determines quality as much as the instruction itself.

Stage 3 — Harness engineering

Building the full system that runs one agent well: authentication, tool availability, output handling, error recovery.

Stage 4 — Loop engineering

Designing the outer system that runs the harness on a schedule until a stated goal is met. Leverage moves further from the raw model. The practitioner defines the outcome; the loop does the iteration.

At every stage, lower layers remain necessary. The loop is built out of prompts and context. Prompt engineering is not dead — it is table stakes.

Inner loop and outer loop

Agents already loop internally. Each agent run follows a cycle of reasoning about what to do, taking an action such as running a test, observing the result, and reasoning again. This is the ReAct pattern (Yao et al., 2022): reason, act, observe.
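A minimal sketch of that inner loop, assuming a hypothetical `reason` callable that stands in for the model and a dictionary of tool functions. Real agents parse the model's free-text output into an action; here the "model" is scripted so that the reason, act, observe control flow is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    thought: str
    action: str          # tool name, or "finish" to stop
    argument: str = ""


def react_loop(
    reason: Callable[[list[str]], Step],      # hypothetical stand-in for the model
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> list[str]:
    """Reason, act, observe, repeat until the model finishes or the step budget runs out."""
    transcript: list[str] = []
    for _ in range(max_steps):
        step = reason(transcript)                          # reason over everything seen so far
        transcript.append(f"Thought: {step.thought}")
        if step.action == "finish":
            transcript.append(f"Answer: {step.argument}")
            break
        observation = tools[step.action](step.argument)    # act
        transcript.append(f"Observation: {observation}")   # observe, then reason again
    return transcript


def scripted_reason(transcript: list[str]) -> Step:
    # Scripted "model": run the tests once, then finish.
    if not any(line.startswith("Observation") for line in transcript):
        return Step("I should run the tests first.", "run_tests", "tests/")
    return Step("The tests pass, so I am done.", "finish", "all tests pass")


log = react_loop(scripted_reason, {"run_tests": lambda path: f"3 passed in {path}"})
```

The transcript grows with every turn, which is exactly the context that a long session accumulates and that fresh-start loops later throw away.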

Loop engineering operates one level above this inner loop. The outer loop wakes on a schedule, launches the agent, checks its work, saves the state, and starts the next cycle. The inner loop is where the agent thinks. The outer loop is where the engineer designs the machine that runs it on repeat.
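The outer loop can be sketched as a plain function under a few assumptions: `run_agent` is a hypothetical callable representing one complete inner-loop run, `check` is a hypothetical independent verifier, state is a JSON file, and `wait` is a placeholder for sleeping until the next scheduled slot. It follows the cycle described above: wake, load state, launch, check, save, repeat.

```python
import json
from pathlib import Path
from typing import Callable


def read_state(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text())
    return {"done": False, "history": []}


def outer_loop(
    state_file: Path,
    run_agent: Callable[[dict], dict],         # hypothetical: one full inner-loop run
    check: Callable[[dict], bool],             # hypothetical: independent verification
    max_cycles: int = 5,
    wait: Callable[[], None] = lambda: None,   # placeholder for "sleep until the next slot"
) -> dict:
    for cycle in range(max_cycles):
        state = read_state(state_file)            # wake and load state from disk
        if state["done"]:
            break
        result = run_agent(state)                 # launch the agent
        verified = check(result)                  # check its work
        state["history"].append({"cycle": cycle, "result": result, "verified": verified})
        state["done"] = verified
        state_file.write_text(json.dumps(state))  # save state
        wait()                                    # then start the next cycle
    return read_state(state_file)
```

For example, with an agent whose passing-test count grows by one per cycle and a check that requires three passing tests, the loop records two failed verifications, one success, and then stops on its own.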

The inner loop: the agent thinks. The outer loop: you design the machine that runs it.

The Ralph Wiggum loop: where it began

Before loop engineering had a name, Geoffrey Huntley described the pattern in early 2026: run a coding agent inside a plain while loop, feed it the same prompt against a written specification, and let it pick one task, complete it, test it, and commit it. Then discard that agent entirely, start a fresh agent with the identical prompt, and repeat until all tasks are complete (Huntley, 2026).

Huntley named this the Ralph Wiggum loop, after the Simpsons character, because it looks too simple to work. It works well for a structural reason. A long AI session degrades: the context window fills with stale reasoning and abandoned approaches, and the agent becomes less reliable. The Ralph Wiggum loop avoids this by starting fresh each pass. Every agent reads the current state from an external file rather than carrying forward a polluted context.
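A rough Python rendering of the pattern, under stated assumptions: tasks live in a hypothetical `plan.json` that maps task names to done flags, and `make_agent` is a hypothetical factory returning a brand-new agent with an empty context. In practice the agent would usually update the plan file itself; here the loop records the finished task to keep the sketch short.

```python
import json
from pathlib import Path
from typing import Callable

PROMPT = "Read plan.json, pick ONE unfinished task, implement it, test it, commit it."

# A hypothetical agent takes (prompt, current plan) and returns the task it completed.
Agent = Callable[[str, dict], str]


def ralph_loop(plan_file: Path, make_agent: Callable[[], Agent], max_passes: int = 50) -> int:
    """Run fresh agents until every task is done; return the number of passes used."""
    for passes in range(max_passes):
        plan = json.loads(plan_file.read_text())   # state comes from disk, not from memory
        if all(plan.values()):
            return passes
        agent = make_agent()                       # brand-new agent, empty context
        finished = agent(PROMPT, plan)             # identical prompt on every pass
        plan[finished] = True
        plan_file.write_text(json.dumps(plan))     # persist progress for the next agent
        # `agent` is discarded here; nothing from its context carries over
    return max_passes
```

The important design choice is that `make_agent()` is called inside the loop: no reasoning from one pass can leak into the next, so stale context never accumulates.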

This pattern has since been formalised and studied experimentally. Xu, Martelaro and McComb (2026) use the Ralph Wiggum loop as the baseline architecture against which more sophisticated agentic systems are evaluated in engineering design tasks.

The building blocks of a real loop

A production loop contains five components, plus one that holds everything together.

Automations

The scheduled trigger that wakes the loop and finds available work without human initiation. A simple example is a weekday-morning trigger that scans an issue tracker for open tasks.
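In production this trigger is usually a scheduler entry such as the cron expression `0 9 * * 1-5` (09:00, Monday to Friday). The sketch below computes the same schedule in code and filters tracker issues; the issue shape (`state` and `labels` keys) and the `agent-ready` label are hypothetical, since every tracker has its own API.

```python
from datetime import datetime, time, timedelta


def next_weekday_morning(now: datetime, at: time = time(9, 0)) -> datetime:
    """Next Monday-to-Friday occurrence of `at` strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    while candidate.weekday() >= 5:          # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def find_open_tasks(issues: list[dict]) -> list[dict]:
    """Keep open issues that carry the (hypothetical) 'agent-ready' label."""
    return [i for i in issues if i.get("state") == "open" and "agent-ready" in i.get("labels", [])]


# Friday 12 June 2026, 10:00 -> the next run is Monday 15 June 2026, 09:00.
print(next_weekday_morning(datetime(2026, 6, 12, 10, 0)))
```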

Work trees

When several agents run concurrently, work trees keep each in an isolated copy of the codebase. They cannot overwrite each other's changes.
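With git, this isolation comes from `git worktree add -b <branch> <path> <commit-ish>`, which checks out a new branch into a separate directory that shares the same repository. The sketch only builds the commands; the sibling-directory layout and the `agent/<task>` branch names are hypothetical conventions. Each command could be run with `subprocess.run(cmd, check=True)`, and `git worktree remove <path>` cleans up afterwards.

```python
import shlex
from pathlib import Path


def worktree_commands(repo: Path, task_ids: list[str], base: str = "main") -> list[list[str]]:
    """Build one `git worktree add` command per task: a separate directory and branch per agent."""
    commands = []
    for task in task_ids:
        path = repo.parent / f"{repo.name}-wt-{task}"   # sibling directory (hypothetical naming)
        branch = f"agent/{task}"                        # hypothetical branch naming scheme
        commands.append(["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base])
    return commands


for cmd in worktree_commands(Path("/src/app"), ["42", "43"]):
    print(shlex.join(cmd))
```

Because each agent commits on its own branch in its own directory, conflicts surface at merge time instead of as silently overwritten files.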

Skills

Project knowledge written down once. Conventions, build steps, and domain rules that the agent would otherwise have to rediscover each session.

Connectors

Built on the Model Context Protocol (MCP), these plug the agent into external tools: issue trackers, databases, Slack, staging APIs.
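Under the hood, MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below only serialises such a request; the transport (stdio or HTTP), the initialisation handshake, and the `search_issues` tool with its arguments are assumptions left out or invented for illustration.

```python
import itertools
import json

_request_ids = itertools.count(1)


def mcp_tool_call(tool: str, arguments: dict) -> str:
    """Serialise a JSON-RPC 2.0 request asking an MCP server to run one of its tools."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical tool exposed by an issue-tracker MCP server.
request = mcp_tool_call("search_issues", {"query": "checkout", "state": "open"})
```

A client normally discovers the available tool names first with a `tools/list` request rather than hard-coding them.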

Sub-agents

The writer–checker split. A separate agent verifies the work of the agent that produced it, because the model that wrote the code is too generous when grading its own output.
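A minimal sketch of the split, assuming `writer` and `checker` are hypothetical callables backed by different agents. The checker sees only the task and the artefact, never the writer's reasoning, and its critique is fed back to the writer on the next round.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    approved: bool
    feedback: str = ""


def write_then_check(
    task: str,
    writer: Callable[[str, str], str],         # (task, feedback) -> artefact
    checker: Callable[[str, str], Verdict],    # (task, artefact) -> verdict, from a separate agent
    max_rounds: int = 3,
) -> tuple[str, Verdict]:
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    for _ in range(max_rounds):
        artefact = writer(task, feedback)
        verdict = checker(task, artefact)      # judges the output, not the writer's reasoning
        if verdict.approved:
            break
        feedback = verdict.feedback            # pass the critique back to the writer
    return artefact, verdict


# Scripted stand-ins: the writer makes a mistake, the checker catches it.
def writer(task: str, feedback: str) -> str:
    return "return a + b" if feedback else "return a - b"


def checker(task: str, artefact: str) -> Verdict:
    return Verdict(True) if "a + b" in artefact else Verdict(False, "add() subtracts")


artefact, verdict = write_then_check("implement add(a, b)", writer, checker)
```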

Plus one — external memory

A file on disk, or any persistent store outside the conversation. The model forgets everything between runs. The repository remembers. State lives on disk, not in the agent's head.
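One detail worth getting right is that an unattended loop can crash mid-write. A common technique, sketched here under the assumption of a single JSON state file, is to write to a temporary file in the same directory and then rename it over the old one with `os.replace`. On POSIX file systems that rename is atomic, so the next run sees either the old state or the new state, never a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, then atomically rename it over the old state."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
        os.replace(tmp_name, path)   # swap in the complete new file in one step
    except BaseException:
        os.unlink(tmp_name)          # do not leave partial temp files behind
        raise


def load_state(path: Path, default: dict) -> dict:
    """The next run starts from whatever the previous run saved, or from `default`."""
    return json.loads(path.read_text()) if path.exists() else dict(default)
```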

The stop condition

The most underestimated element of loop design is the termination condition. An instruction to "make the checkout flow better" gives the agent nothing to grade itself against. It stops whenever it decides to stop.

A well-designed stop condition specifies four things:

The end state — what done looks like.
The evidence — for example, all tests pass and coverage reaches 90 per cent.
The constraints — boundaries the agent must not violate.
A hard budget — a maximum number of steps or a compute ceiling to prevent runaway loops.
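The four parts can be written down as a small contract object. This is a sketch, not a standard interface: the metric names, the 90 per cent coverage threshold from the example above, and the hypothetical "do not touch the payment API" constraint are illustrative. Constraints are checked first so that a violation halts the loop even if the tests pass.

```python
from dataclasses import dataclass
from typing import Callable

Check = Callable[[dict], bool]


@dataclass
class StopCondition:
    end_state: str            # what "done" looks like, stated in words
    evidence: list[Check]     # all must pass before the loop may stop
    constraints: list[Check]  # must hold on every step; a violation halts the loop
    max_steps: int            # hard budget

    def evaluate(self, metrics: dict, step: int) -> str:
        if not all(ok(metrics) for ok in self.constraints):
            return "halt: constraint violated"
        if all(ok(metrics) for ok in self.evidence):
            return "done"
        if step >= self.max_steps:
            return "halt: budget exhausted"
        return "continue"


checkout = StopCondition(
    end_state="Checkout flow passes its full test suite",
    evidence=[lambda m: m["tests_failed"] == 0, lambda m: m["coverage"] >= 0.90],
    constraints=[lambda m: not m["touched_payment_api"]],   # hypothetical boundary
    max_steps=40,
)
```

Compare this with "make the checkout flow better": every branch of `evaluate` has an objective test, so the loop never stops merely because the agent decided it was finished.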

The strongest loops use the writer–checker split at this final stage. A separate model evaluates whether the goal has truly been met. The agent that did the work does not get to declare its own victory.

Research evidence: metacognitive co-regulation in agentic loops

Xu, Martelaro and McComb (2026) examined whether adding metacognitive support to agentic loops improves engineering design performance. They compared three architectures on a constrained battery-pack design problem: design a 400 V pack with a minimum capacity of 25 Ah, capable of supplying at least 48 A continuously while staying at or below 60 °C, within a 750 × 750 × 250 mm envelope.
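The requirements translate directly into a validity check, and the minimum energy follows from them: 400 V × 25 Ah = 10,000 Wh, or 10 kWh. The sketch assumes the design's current, temperature, and dimensions have already been produced by some evaluator; the paper's simulation is not reproduced, and the ±5 % voltage tolerance is an assumption, since the requirement only states 400 V. Among valid designs, capacity is then the quantity to maximise.

```python
from dataclasses import dataclass

ENVELOPE_MM = (750.0, 750.0, 250.0)   # length, width, height limits


@dataclass
class PackDesign:
    voltage_v: float
    capacity_ah: float
    continuous_current_a: float
    max_temp_c: float
    dims_mm: tuple[float, float, float]   # length, width, height


def violations(d: PackDesign, voltage_tol: float = 0.05) -> list[str]:
    """Names of the requirements a design fails; an empty list means the design is valid."""
    problems = []
    if abs(d.voltage_v - 400.0) > 400.0 * voltage_tol:   # tolerance is an assumption
        problems.append("voltage")
    if d.capacity_ah < 25.0:
        problems.append("capacity")
    if d.continuous_current_a < 48.0:
        problems.append("current")
    if d.max_temp_c > 60.0:
        problems.append("temperature")
    if any(size > limit for size, limit in zip(d.dims_mm, ENVELOPE_MM)):
        problems.append("envelope")
    return problems


def energy_kwh(d: PackDesign) -> float:
    return d.voltage_v * d.capacity_ah / 1000.0
```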

Ralph Wiggum loop (RWL)

The baseline. The design agent iterates until it produces a valid design, receiving evaluation feedback after each attempt. It self-evaluates but receives no structured metacognitive support.

Self-Regulation loop (SRL)

Built on the RWL. A Progress Analyser provides the agent with an explicit trajectory summary after each iteration — which metrics are improving, stalling, or regressing. The agent is instructed to set goals, monitor progress, and plan strategy.
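The paper does not prescribe an implementation for the Progress Analyser, so the following is only a rule-based sketch of the idea: compare each metric at the start and end of a recent window and label it improving, stalling, or regressing. The metric names, the "max"/"min" goal directions, and the window size are assumptions; comparing only the window's endpoints also ignores oscillation inside it.

```python
def analyse_progress(
    history: list[dict[str, float]],
    goals: dict[str, str],          # metric -> "max" or "min"
    window: int = 3,
    tol: float = 1e-6,
) -> dict[str, str]:
    """Label each metric improving, stalling or regressing over the last `window` iterations."""
    if len(history) < 2:
        return {metric: "insufficient data" for metric in goals}
    recent = history[-window:]
    summary = {}
    for metric, direction in goals.items():
        change = recent[-1][metric] - recent[0][metric]
        if direction == "min":          # for minimised metrics, a decrease is progress
            change = -change
        if change > tol:
            summary[metric] = "improving"
        elif change < -tol:
            summary[metric] = "regressing"
        else:
            summary[metric] = "stalling"
    return summary


history = [
    {"capacity_ah": 26.0, "max_temp_c": 58.0, "current_a": 50.0},
    {"capacity_ah": 27.0, "max_temp_c": 59.5, "current_a": 52.0},
    {"capacity_ah": 28.5, "max_temp_c": 61.0, "current_a": 50.0},
]
summary = analyse_progress(history, {"capacity_ah": "max", "max_temp_c": "min", "current_a": "max"})
summary_text = "; ".join(f"{m}: {s}" for m, s in summary.items())   # one line for the agent's prompt
```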

Co-Regulation Design Agentic Loop (CRDAL)

Built on the SRL. A separate Metacognitive Co-Regulation Agent receives the progress trajectory and provides strategic feedback to the design agent. The second agent acts as supervisor or peer reviewer: it identifies bottlenecks and suggests strategy for the next iteration.
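In the study the co-regulation agent is itself an LLM. The deterministic function below is only a hypothetical stand-in that shows the shape of its job: take the trajectory, name the bottleneck, and suggest a strategy for the next iteration. The `margins` input (remaining headroom per constraint) and the advice wording are assumptions, not the paper's prompts.

```python
def co_regulate(margins: dict[str, float], trend: dict[str, str]) -> str:
    """Hypothetical stand-in for the Metacognitive Co-Regulation Agent.

    margins: constraint -> remaining headroom as a fraction of its limit (negative means violated).
    trend:   metric -> "improving" / "stalling" / "regressing", e.g. from a progress analyser.
    """
    bottleneck = min(margins, key=margins.get)       # tightest constraint
    advice = [f"Bottleneck: {bottleneck} (margin {margins[bottleneck]:.0%})."]
    not_improving = sorted(m for m, status in trend.items() if status != "improving")
    if not_improving:
        advice.append(
            "Not improving: " + ", ".join(not_improving)
            + ". Try a structurally different design rather than small parameter tweaks."
        )
    else:
        advice.append("All tracked metrics are improving; keep the current strategy.")
    return " ".join(advice)


feedback = co_regulate(
    margins={"temperature": 0.05, "height": 0.20, "current": 0.35},
    trend={"capacity_ah": "stalling", "max_temp_c": "improving"},
)
```

The point of separating this role from the design agent is that the supervisor reads the trajectory fresh on each step instead of reasoning from inside a long, cluttered design context.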

Results

Each system ran the design problem 30 times. All three successfully generated valid designs in nearly every run. The primary measure was battery-pack capacity, which the agents were explicitly tasked to maximise (Xu, Martelaro and McComb, 2026).

Statistical testing confirmed that CRDAL produced significantly higher-capacity designs than both RWL (t(49.6) = 5.320, p < 0.001, Cohen's d = 1.375) and SRL (t(56.5) = 3.676, p = 0.001, Cohen's d = 0.955). The difference between RWL and SRL was not statistically significant (p = 0.206). Crucially, CRDAL achieved this without taking significantly more design steps — it worked smarter rather than harder (Xu, Martelaro and McComb, 2026).
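Fractional degrees of freedom such as 49.6 and 56.5 are what Welch's unequal-variance t-test produces, which suggests that test was used. The sketch computes Welch's t, the Welch–Satterthwaite degrees of freedom, and Cohen's d from summary statistics; the means and standard deviations in the usage lines are invented for illustration and are not the paper's data. For a p-value, SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` runs the same test.

```python
import math


def welch_t(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> tuple[float, float]:
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom from summary statistics."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> float:
    """Standardised mean difference using the pooled standard deviation."""
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled


# Invented summary statistics (capacity in Ah, 30 runs each), for illustration only.
t, df = welch_t(40.0, 4.0, 30, 35.0, 3.0, 30)
d = cohens_d(40.0, 4.0, 30, 35.0, 3.0, 30)
print(f"t({df:.1f}) = {t:.3f}, d = {d:.3f}")   # t(53.8) = 5.477, d = 1.414
```

With equal group sizes n, these definitions give d = t × √(2/n) exactly. If all 30 runs per system entered the comparison, 5.320 × √(2/30) ≈ 1.37, roughly consistent with the reported d = 1.375.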

Why co-regulation outperformed self-regulation

The authors offer a plausible explanation grounded in context-window dynamics. Research has shown that LLMs perform best when relevant information appears at the very beginning or end of the context, regardless of context length (Liu et al., 2024). The Metacognitive Co-Regulation Agent may have served as a fresh, high-salience reminder at each design step, keeping the design agent's attention on the most important constraints and opportunities (Xu, Martelaro and McComb, 2026).
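A loop designer can apply the same finding directly when assembling each step's prompt. This is a sketch under assumptions (the section labels and the `keep_last` truncation are hypothetical): fixed requirements go at the start, older iteration history sits in the middle and is trimmed, and the fresh supervisor feedback goes at the very end, where Liu et al. found models attend most reliably.

```python
def assemble_prompt(constraints: list[str], history: list[str], feedback: str, keep_last: int = 5) -> str:
    """Place high-priority material at the edges of the context and trim the middle."""
    parts = ["REQUIREMENTS:"] + [f"- {c}" for c in constraints]       # beginning: hard requirements
    parts += ["", "RECENT ITERATIONS:"] + history[-keep_last:]        # middle: only recent history
    parts += ["", "SUPERVISOR FEEDBACK FOR THIS STEP:", feedback]     # end: fresh, high-salience guidance
    return "\n".join(parts)


prompt = assemble_prompt(
    ["capacity >= 25 Ah", "max temperature <= 60 °C"],
    [f"iter {i}: capacity 26.{i} Ah" for i in range(10)],
    "Temperature is the bottleneck; try a different cell arrangement.",
)
```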

This resonates with Marvin Minsky's thesis that intelligence emerges from many simple minds interacting rather than from a single sophisticated one (Minsky, 1986). More recent multi-agent systems research in engineering design supports the same principle (Campbell, Cagan and Kotovsky, 1999; McComb, Cagan and Kotovsky, 2016).

Design-space exploration

The three systems also explored different regions of the design space. CRDAL consistently produced designs with higher cell counts: more than half of its final designs exceeded 3,024 cells, with the most common design using 3,888 cells. By contrast, 27 of 29 successful RWL designs and 23 of 29 SRL designs used fewer than 3,024 cells. The authors read this as evidence that CRDAL was better at avoiding local optima, because it explored a fundamentally different region of the design space (Xu, Martelaro and McComb, 2026).

Risks and responsibilities

Loop engineering amplifies output. It does not eliminate the need for engineering judgment. Three risks sharpen as the loop improves.

Verification drift

A loop running unattended is also a loop making mistakes unattended. "Done" is a claim, not a proof. The engineer remains responsible for confirming that outcomes meet actual requirements.

Comprehension debt

The faster the loop ships code the engineer did not write, the larger the gap between what is in the repository and what the engineer actually understands. This debt compounds silently until a production incident makes it visible.

Cognitive surrender

The temptation to stop having opinions and simply accept whatever the loop produces. Designing the loop requires judgment. Pressing go to avoid thinking does not.

The study also highlights the risk of design fixation in agentic systems. Just as human designers can fixate prematurely on a limited set of solutions (Jansson and Smith, 1991), so can AI agents. Structured metacognitive intervention, delivered by a separate agent, reduced this tendency in the CRDAL system.

Practical implications for AI engineering

The Ralph Wiggum loop is a starting point, not a ceiling. It is simple, effective, and worth deploying for repetitive task automation. The CMU research suggests that separating the producing agent from a supervising agent, as CRDAL does, delivers substantially better results on complex, constrained design problems without requiring significantly more design steps.

Separate the producer from the evaluator. The agent that produces output is biased toward its own work.
Write stop conditions as contracts, not wishes. Specify end state, evidence, constraints, and a hard budget.
Use external memory. State must persist on disk across runs; the agent's context does not.
Add a progress analyser when the task involves multiple constraints and a single optimisation objective. The trajectory summary keeps the agent oriented.
Consider a metacognitive co-regulation agent for complex, multi-disciplinary problems where local optima are likely.

Future research should examine these architectures across different engineering domains, with smaller models suited to local deployment, and with agents that have access to numerical optimisation tools and spatial design representations (Xu, Martelaro and McComb, 2026).

Conclusion

Loop engineering represents a genuine shift in how AI agents are used. The practitioner moves from typing prompts to designing the system that generates prompts autonomously, on a schedule, against a verifiable goal.

The evidence from Carnegie Mellon supports the view that the architecture of the loop matters. On the battery-pack design task, a second agent supervising the first with structured metacognitive feedback produced significantly better outcomes than either a plain iterating loop or a single agent instructed to self-regulate.

The craft is not in pressing go. It is in designing the loop well enough that it can be trusted to run without constant supervision, while remaining alert to the point at which human judgment must intervene.

Build the loop like someone who intends to stay the engineer, not just the person who presses go.

References

Campbell, M.I., Cagan, J. and Kotovsky, K. (1999) 'A-Design: an agent-based approach to conceptual design in a dynamic environment', Research in Engineering Design, 11(3), pp. 172–192.
Huntley, G. (2026) 'Everything is a Ralph loop'. Available at: ghuntley.com/loop (Accessed: June 2026).
Jansson, D.G. and Smith, S.M. (1991) 'Design fixation', Design Studies, 12(1), pp. 3–11.
Liu, N.F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F. and Liang, P. (2024) 'Lost in the middle: how language models use long contexts', Transactions of the Association for Computational Linguistics, 12, pp. 157–173.
McComb, C., Cagan, J. and Kotovsky, K. (2016) 'Drawing inspiration from human design teams for better search and optimisation', Journal of Mechanical Design, 138(4), 044501.
Minsky, M. (1986) The Society of Mind. New York: Simon and Schuster.
Xu, Z., Martelaro, N. and McComb, C. (2026) 'Supervising Ralph Wiggum: exploring a metacognitive co-regulation agentic AI loop for engineering design', arXiv preprint arXiv:2603.24768.
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K. and Cao, Y. (2022) 'ReAct: synergizing reasoning and acting in language models', Proceedings of ICLR 2023.

Shawn Greyling is an AI Engineer based in Johannesburg, South Africa. He builds agentic content pipelines, automation workflows, and AI engineering frameworks. He holds a Higher Certificate in Software Development and is completing a Bachelor of Business Information Systems.