TL;DR
- ReAct (Yao et al., 2022, arXiv:2210.03629) interleaves reasoning and acting: the model emits a thought, then an action, observes the result, and repeats. It is the original tool-use loop.
- Plan-and-Execute separates planning from execution: a planner LLM produces a step-by-step plan, an executor runs each step with tools, and a replanner adapts if execution diverges from the plan.
- Reflexion (Shinn et al., 2023, arXiv:2303.11366) adds verbal self-reflection: after a failed attempt, the agent writes a critique to memory and tries again, conditioning the next attempt on its own reflection.
- These patterns are not exclusive, and production agents often combine them: ReAct inside steps, Plan-and-Execute across steps, Reflexion across retries.
ReAct
ReAct (Reason + Act), introduced by Yao et al. at Princeton and Google in 2022, was the first widely cited prompting pattern for tool-using agents. The pattern is a loop: the model emits a Thought (free-text reasoning) and an Action (a tool invocation), the environment returns an Observation (the tool result), and the loop continues until the model emits a Finish action.
ReAct works because reasoning before acting raises action quality, and observing after acting grounds the next reasoning step in real evidence. It is the conceptual ancestor of most modern tool-use agents, including the structured tool-call APIs that replaced its original text format. In the original paper, ReAct outperformed act-only baselines on HotpotQA, FEVER, ALFWorld, and WebShop. Against chain-of-thought alone it led on FEVER but trailed slightly on HotpotQA, where the best results came from combining ReAct with chain-of-thought self-consistency.
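The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `react_loop`, `TOOLS`, `ACTION_RE`, and `scripted_llm` are hypothetical names, the `get_weather` tool returns canned data, and the "model" is a scripted stand-in so the example runs offline.

```python
import re
from typing import Callable, Dict

# Hypothetical tool for illustration; it returns canned data.
def get_weather(city: str) -> str:
    return "18°C, partly cloudy, wind 12 km/h NW."

TOOLS: Dict[str, Callable[[str], str]] = {"get_weather": get_weather}

# Matches lines such as: Action: get_weather(city="London")
# Assumption: every action takes exactly one quoted string argument.
ACTION_RE = re.compile(r'Action:\s*(\w+)\((?:\w+=)?"(.*)"\)')

def react_loop(llm: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    """Alternate model output (Thought + Action) with tool Observations."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)              # model writes one Thought and one Action
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: could not parse an action.\n"
            continue
        name, arg = match.groups()
        if name == "Finish":                # Finish ends the loop with the answer
            return arg
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached."

# Scripted stand-in for a real model (assumption, for offline use only).
def scripted_llm(transcript: str) -> str:
    if "Observation:" not in transcript:
        return ('Thought: The user wants the weather in London.\n'
                'Action: get_weather(city="London")')
    return ('Thought: I have the data. I can answer now.\n'
            'Action: Finish("It is 18°C and partly cloudy in London.")')

answer = react_loop(scripted_llm, "What is the weather in London?")
print(answer)
```

The step limit is a design choice worth keeping in any real loop: without it, a model that never emits Finish would run indefinitely.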
Thought: The user wants the weather in London. I should call the weather tool.
Action: get_weather(city="London")
Observation: 18°C, partly cloudy, wind 12 km/h NW.
Thought: I have the data. I can answer now.
Action: Finish("It is 18°C and partly cloudy in London.")
Plan-and-Execute
Plan-and-Execute (the structure used in early BabyAGI and refined in LangChain's plan-and-execute agents) separates planning from doing. A planner LLM is given the goal and produces an ordered list of steps. An executor — often a ReAct loop over a smaller model — runs each step with tools. A replanner inspects the state after each step and updates the remaining plan if needed.
The pattern's value is in long-horizon tasks where a single ReAct loop drifts. Producing a plan up front anchors the agent to the original goal and makes progress legible. The cost is extra LLM calls for planning and replanning; for short tasks the overhead is not worth it.
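The planner, executor, and replanner roles can be sketched as three callables driven by one loop. This is an illustrative sketch, not LangChain's or BabyAGI's code: every name here (`plan_and_execute`, the `stub_*` functions, the step strings) is hypothetical, and the stubs replace what would be LLM calls in practice.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (step, result) pairs completed so far

def plan_and_execute(
    goal: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str], str],
    replanner: Callable[[str, History, List[str]], List[str]],
    max_steps: int = 10,
) -> History:
    """Plan once up front, then execute step by step, replanning after each step."""
    remaining = planner(goal)
    done: History = []
    while remaining and len(done) < max_steps:
        step = remaining.pop(0)
        result = executor(step)        # in practice, often a ReAct loop with tools
        done.append((step, result))
        remaining = replanner(goal, done, remaining)
    return done

# Stubs standing in for LLM calls (assumptions for illustration).
def stub_planner(goal: str) -> List[str]:
    return ["look up London weather", "look up Paris weather", "compare the two"]

def stub_executor(step: str) -> str:
    canned = {
        "look up London weather": "18 C",
        "look up Paris weather": "source unavailable",
        "retry Paris weather via backup source": "21 C",
    }
    return canned.get(step, "done")

def stub_replanner(goal: str, done: History, remaining: List[str]) -> List[str]:
    _, last_result = done[-1]
    if "unavailable" in last_result:   # execution diverged: insert a recovery step
        return ["retry Paris weather via backup source"] + remaining
    return remaining

trace = plan_and_execute("compare London and Paris weather",
                         stub_planner, stub_executor, stub_replanner)
for step, result in trace:
    print(f"{step} -> {result}")
```

In this run the replanner inserts one recovery step after the Paris lookup fails, so four steps execute instead of the three originally planned.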
Reflexion
Reflexion (Shinn et al., 2023) layers self-reflection on top of ReAct or Plan-and-Execute. After an attempt fails — either because the task is judged unsuccessful or a verifier flags a problem — the agent generates a verbal reflection: a critique of what went wrong and what to try differently. The reflection is appended to memory, and the next attempt is conditioned on it.
Reflexion improves performance on tasks with verifiable outcomes because the agent compounds learning across attempts within a single session. The original paper reports gains on ALFWorld, HotpotQA, and HumanEval (91% pass@1 on HumanEval, against 80% for the GPT-4 baseline it cites), while noting little improvement on WebShop. It does not require any model fine-tuning; all the adaptation happens in the prompt.
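The attempt, evaluate, reflect, retry cycle can be sketched as follows. This is a simplified illustration rather than the paper's code: `reflexion`, `stub_actor`, `unit_test_evaluator`, and `stub_reflect` are hypothetical names, and the actor is a stub that only checks whether memory is empty, where a real actor would place the reflections in the model's prompt.

```python
from typing import Callable, List, Optional, Tuple

def reflexion(
    task: str,
    actor: Callable[[str, List[str]], str],
    evaluator: Callable[[str], Tuple[bool, str]],
    reflect: Callable[[str, str, str], str],
    max_trials: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Retry a task, conditioning each new attempt on stored verbal reflections."""
    memory: List[str] = []
    for _ in range(max_trials):
        attempt = actor(task, memory)          # attempt sees all prior reflections
        ok, feedback = evaluator(attempt)      # verifier: tests, judge, or env reward
        if ok:
            return attempt, memory
        memory.append(reflect(task, attempt, feedback))
    return None, memory

# Stubs standing in for LLM calls (assumptions for illustration).
def stub_actor(task: str, memory: List[str]) -> str:
    if not memory:
        return "def add(a, b):\n    return a - b"   # first attempt has a bug
    return "def add(a, b):\n    return a + b"

def unit_test_evaluator(code: str) -> Tuple[bool, str]:
    namespace: dict = {}
    exec(code, namespace)                      # run the candidate, then test it
    got = namespace["add"](2, 3)
    return got == 5, f"add(2, 3) returned {got}, expected 5"

def stub_reflect(task: str, attempt: str, feedback: str) -> str:
    return f"Previous attempt failed: {feedback}. Check the operator."

solution, reflections = reflexion("write add(a, b)", stub_actor,
                                  unit_test_evaluator, stub_reflect)
print(solution)
print(reflections)
```

No weights change between trials; the only thing that differs on the second attempt is the content of `memory`, which is what "adaptation happens in the prompt" means in practice.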
Choosing Between Them
| Pattern | Best for | Cost |
|---|---|---|
| ReAct | Short, tool-heavy tasks | Low — one model call per step |
| Plan-and-Execute | Long-horizon, multi-step tasks | Medium — extra planning calls |
| Reflexion | Verifiable tasks where retries help | High — entire task is re-run |
These three patterns compose. A production agent can use Plan-and-Execute at the top level, ReAct inside each step, and Reflexion to retry failed steps. Start simple and add layers when measurement justifies it.
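One way the layers might fit together is sketched below, with a plan at the top level, a per-step executor, and reflection-driven retries on failed steps. All names (`run_agent`, `plan`, `run_step`, `check`, `reflect`) are hypothetical, the stubs replace LLM calls, and replanning is left out to keep the sketch short.

```python
from typing import Callable, List, Tuple

def run_agent(
    goal: str,
    plan: Callable[[str], List[str]],
    run_step: Callable[[str, List[str]], str],
    check: Callable[[str, str], bool],
    reflect: Callable[[str, str], str],
    max_retries: int = 2,
) -> List[Tuple[str, str, int]]:
    """Return (step, output, retries_used) for each planned step."""
    results = []
    for step in plan(goal):                        # Plan-and-Execute: outer loop
        notes: List[str] = []                      # Reflexion memory for this step
        output = ""
        for _ in range(max_retries + 1):
            output = run_step(step, notes)         # stands in for a ReAct loop
            if check(step, output):
                break
            notes.append(reflect(step, output))    # Reflexion: critique, then retry
        results.append((step, output, len(notes)))
    return results

# Stubs for illustration: "summarize" fails until it has a reflection to use.
def stub_plan(goal: str) -> List[str]:
    return ["fetch data", "summarize"]

def stub_run_step(step: str, notes: List[str]) -> str:
    if step == "summarize" and not notes:
        return ""
    return "data" if step == "fetch data" else "summary"

results = run_agent(
    "report on the data",
    stub_plan,
    stub_run_step,
    check=lambda step, output: bool(output),
    reflect=lambda step, output: f"'{step}' produced empty output; include content.",
)
print(results)
```

The retry count recorded per step gives a cheap signal for the measurement the text recommends: if retries are almost always zero, the Reflexion layer may not be earning its cost.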
Modern Frontier Models and Built-in Reasoning
Frontier models in 2026 — Claude with extended thinking, OpenAI's o-series, Gemini's deep think modes — bake structured reasoning into the model itself. Much of what ReAct, Plan-and-Execute, and Reflexion did with prompting is now done internally by these models when given enough thinking budget.
This shifts the agent designer's job. Less prompt engineering for reasoning patterns, more attention to tool design, memory architecture, and verification. The classical patterns remain useful for cheaper models, for tasks where the trace of reasoning needs to be auditable, and for compositions that the built-in reasoning does not cover.
References
- ReAct: Synergizing Reasoning and Acting in Language Models · arXiv (Yao et al., 2022)
- Reflexion: Language Agents with Verbal Reinforcement Learning · arXiv (Shinn et al., 2023)
- Self-Refine: Iterative Refinement with Self-Feedback · arXiv (Madaan et al., 2023)