/* global React */

function Chapter07() {
  return (
    <section className="chapter" id="ch-07" data-screen-label="07 AgentExecutor">
      <div className="chapter-header">
        <div className="eyebrow">Chapter 07 · Internals</div>
        <h1 className="chapter-title">AgentExecutor — line by line.</h1>
        <p className="chapter-lede">
          Time to demystify the runtime. AgentExecutor is ~200 lines of orchestration code. Once you can describe what it
          does on each iteration, you can debug any agent in any codebase.
        </p>
      </div>

      <SectionTitle num="7.1">What it is</SectionTitle>
      <p>
        <code>AgentExecutor</code> is a <code>Runnable</code> that wraps the agent loop. Its <code>.invoke({"{...}"})</code> method
        runs the loop until the agent returns a final answer or hits a stop condition. That's it.
      </p>
      <CodeBlock file="executor_signature.py">{`class AgentExecutor(Chain):
    agent: Runnable
    tools: list[BaseTool]
    max_iterations: int = 15
    max_execution_time: float | None = None    # wall-clock seconds
    early_stopping_method: str = "force"        # or "generate"
    handle_parsing_errors: bool | str = False
    return_intermediate_steps: bool = False
    verbose: bool = False`}</CodeBlock>
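
      <p>
        Two of those fields interact: when <code>max_iterations</code> or <code>max_execution_time</code> trips, the
        value of <code>early_stopping_method</code> decides what you get back. A quick sketch (treat the exact canned
        string as illustrative; it varies by version):
      </p>
      <CodeBlock file="stop_conditions.py">{`executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=3,                 # stop after 3 trips around the loop
    max_execution_time=10.0,          # ...or after 10 wall-clock seconds
    early_stopping_method="force",    # return a canned "stopped" answer
    # "generate" instead asks the model for one last final answer
    # (support varies by agent type)
)

result = executor.invoke({"input": "..."})
print(result["output"])
# with "force", something like:
# "Agent stopped due to iteration limit or time limit."`}</CodeBlock>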

      <SectionTitle num="7.2">The loop, in pseudocode</SectionTitle>
      <CodeBlock file="executor_loop.py">{`def _call(self, inputs):
    intermediate_steps = []          # list of (AgentAction, observation)
    iterations = 0
    start = time.time()

    while iterations < self.max_iterations:
        if self.max_execution_time and time.time() - start > self.max_execution_time:
            return self._return_stopped(...)

        # 1. Ask the agent for the next action(s).
        #    plan() returns AgentFinish or a list of AgentActions.
        next_step = self.agent.plan(
            intermediate_steps,
            **inputs,
        )

        # 2. If the agent decides to finish, return.
        if isinstance(next_step, AgentFinish):
            return self._return(next_step, intermediate_steps)

        # 3. Otherwise execute every tool call the model requested, one at a time.
        for action in next_step:           # list[AgentAction]
            tool = self.tool_map[action.tool]
            try:
                obs = tool.run(action.tool_input)
            except Exception as e:
                obs = self._handle_error(e)
            intermediate_steps.append((action, obs))

        iterations += 1

    return self._return_stopped(intermediate_steps)`}</CodeBlock>
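
      <p>
        The two types the loop branches on are plain data classes from <code>langchain_core.agents</code>. A minimal
        sketch of their shape (field values here are illustrative):
      </p>
      <CodeBlock file="agent_types.py">{`from langchain_core.agents import AgentAction, AgentFinish

# One tool call the agent wants executed.
action = AgentAction(
    tool="get_weather",
    tool_input={"city": "Tokyo"},
    log="Calling get_weather...",     # raw LLM text that produced this action
)

# The terminal state; return_values["output"] becomes the final answer.
finish = AgentFinish(
    return_values={"output": "Bring an umbrella."},
    log="Final answer reasoning...",
)`}</CodeBlock>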

      <SectionTitle num="7.3">Watch the message list grow</SectionTitle>
      <p>
        The diagram below shows what's actually in the prompt at each tick of the loop. Notice nothing ever moves to a
        side channel — the entire state lives in the message list, which gets shipped back to the API every iteration.
      </p>

      <AgentExecutorViz />
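
      <p>
        Concretely, before every model call the executor renders <code>intermediate_steps</code> into messages for the
        scratchpad placeholder. For tool-calling agents the helper is <code>format_to_tool_messages</code> (import path
        per current <code>langchain</code>; it may move between versions):
      </p>
      <CodeBlock file="scratchpad_growth.py">{`from langchain.agents.format_scratchpad.tools import format_to_tool_messages

# intermediate_steps: the [(AgentAction, observation), ...] list the loop builds
# (e.g. result["intermediate_steps"] from a run, as shown in section 7.5).
messages = format_to_tool_messages(intermediate_steps)

# Each (action, observation) pair expands into two messages:
#   an AIMessage carrying the tool call, then
#   a ToolMessage carrying the observation fed back to the model.
# So the prompt grows by two messages per tool call, every iteration.
for m in messages:
    print(type(m).__name__, getattr(m, "content", ""))`}</CodeBlock>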

      <SectionTitle num="7.4">Where things go wrong</SectionTitle>

      <h3 className="sub-title">Parsing errors (ReAct only)</h3>
      <CodeBlock file="parsing_errors.py">{`executor = AgentExecutor(
    agent=agent,
    tools=tools,
    handle_parsing_errors=True,  # just retry with a hint
    # or: handle_parsing_errors="Re-read the format. Output Action: <tool>",
)`}</CodeBlock>
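
      <p>
        <code>handle_parsing_errors</code> also accepts a callable, so you can log the bad output and build the retry
        hint yourself. A sketch (the callable receives the <code>OutputParserException</code> and returns the string
        fed back to the model as the observation):
      </p>
      <CodeBlock file="parsing_errors_callable.py">{`from langchain_core.exceptions import OutputParserException

def on_parse_error(e: OutputParserException) -> str:
    print(f"parse failure: {e}")    # or send to your logger / tracer
    return "Invalid format. Output exactly one line: Action: <tool name>"

executor = AgentExecutor(
    agent=agent,
    tools=tools,
    handle_parsing_errors=on_parse_error,
)`}</CodeBlock>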

      <h3 className="sub-title">Tool errors</h3>
      <p>
        By default, exceptions bubble out and kill the executor. You almost never want that in production. Either:
      </p>
      <ul>
        <li>Wrap the tool body in try/except and return an error dict (recommended — sketched below).</li>
        <li>Pass <code>handle_tool_error=True</code> to the tool — LangChain catches <code>ToolException</code> and feeds its message back to the model as the observation instead of raising.</li>
      </ul>
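
      <p>
        The first option looks like this. The error dict flows back to the model as an ordinary observation, so the
        model can decide to retry or change tack. (<code>weather_api</code> is a stand-in for whatever client you
        actually call.)
      </p>
      <CodeBlock file="tool_error_wrap.py">{`from langchain_core.tools import tool

@tool
def get_weather(city: str) -> dict:
    """Get current weather for a city."""
    try:
        return weather_api.fetch(city)    # hypothetical upstream client
    except TimeoutError:
        return {"error": "weather service timed out", "retryable": True}
    except KeyError:
        return {"error": f"unknown city: {city}", "retryable": False}`}</CodeBlock>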

      <h3 className="sub-title">Infinite loops</h3>
      <Callout kind="gotcha" title="The model insists on calling the same broken tool">
        If a tool returns the same error 5 times, the agent will probably keep retrying. <code>max_iterations</code> is
        your circuit breaker — set it to 8–15 and tune from traces. For real defense, build state into the prompt
        ("you've already tried get_weather('Tokyo') 3 times and it failed; try a different approach.").
      </Callout>
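
      <p>
        A sketch of that prompt-state idea: a counter outside the tool turns repeated failures into an explicit
        instruction the model sees as an observation. The pattern is illustrative, not a LangChain API, and the
        upstream <code>fetch_weather</code> call is hypothetical.
      </p>
      <CodeBlock file="retry_circuit.py">{`from collections import Counter
from langchain_core.tools import tool

failures = Counter()   # naive per-process failure tracker (illustrative)

@tool
def get_weather(city: str) -> dict:
    """Get current weather for a city."""
    if failures[city] >= 3:
        return {"error": f"get_weather has already failed {failures[city]} times "
                         f"for {city}. Stop retrying and try a different approach."}
    try:
        return fetch_weather(city)        # hypothetical upstream call
    except Exception as e:
        failures[city] += 1
        return {"error": str(e), "attempt": failures[city]}`}</CodeBlock>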

      <SectionTitle num="7.5">Inspecting what happened</SectionTitle>
      <CodeBlock file="intermediate_steps.py">{`executor = AgentExecutor(
    agent=agent,
    tools=tools,
    return_intermediate_steps=True,
)

result = executor.invoke({"input": "..."})

print(result["output"])              # final answer
for action, obs in result["intermediate_steps"]:
    print(f"  → {action.tool}({action.tool_input}) = {obs}")`}</CodeBlock>

      <Callout kind="tip" title="The modern alternative: LangGraph">
        AgentExecutor is now considered legacy by the LangChain team. New code uses <code>create_react_agent</code> from{" "}
        <code>langgraph.prebuilt</code>, which is a drop-in replacement built on a graph runtime. Same mental model,
        better streaming, native human-in-the-loop, persistence. The intuition you built here transfers 1:1.
      </Callout>

      <SectionTitle num="7.6">AgentExecutor (legacy) vs LangGraph (modern)</SectionTitle>
      <p>
        AgentExecutor is officially legacy — still supported, but the LangChain team actively recommends migrating to
        LangGraph. The fastest way to internalize the difference is to build the same agent both ways.
      </p>

      <h4 className="mini-title">The task</h4>
      <p>
        An agent that checks weather and searches the web, answering: <em>"What's the weather in Tokyo, and should I
        carry an umbrella?"</em>
      </p>

      <h3 className="sub-title">Old way — AgentExecutor</h3>
      <CodeBlock file="old_agent.py">{`from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

# 1. Tools
@tool
def get_weather(city: str) -> dict:
    """Get current weather for a city."""
    return {"city": city, "temp_c": 18.2, "condition": "Rainy"}

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return f"Search results for: {query}"

tools = [get_weather, search_web]

# 2. Model
model = ChatOpenAI(model="gpt-4o")

# 3. Prompt — must have agent_scratchpad placeholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder("chat_history", optional=True),   # memory
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),               # tool call history
])

# 4. Agent (just the brain — no loop yet)
agent = create_tool_calling_agent(model, tools, prompt)

# 5. Executor (adds the while loop)
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=10,
    handle_parsing_errors=True,
    return_intermediate_steps=True,
)

# 6. Run
result = executor.invoke({
    "input": "What's the weather in Tokyo? Should I carry an umbrella?"
})

print(result["output"])
for action, obs in result["intermediate_steps"]:
    print(f"-> {action.tool}({action.tool_input}) = {obs}")`}</CodeBlock>

      <Callout kind="gotcha" title="What this approach can't do">
        <ul style={{ margin: "4px 0 0", paddingLeft: 20 }}>
          <li>No streaming mid-execution</li>
          <li>No human-in-the-loop (can't pause and ask the user)</li>
          <li>No branching logic (if tool fails, do X, else do Y)</li>
          <li>Hard to add memory / persistence</li>
          <li>Black box — hard to extend without subclassing</li>
        </ul>
      </Callout>

      <h3 className="sub-title">New way — LangGraph</h3>
      <CodeBlock file="new_agent.py">{`from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

# 1. Tools (identical — nothing changes here)
@tool
def get_weather(city: str) -> dict:
    """Get current weather for a city."""
    return {"city": city, "temp_c": 18.2, "condition": "Rainy"}

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return f"Search results for: {query}"

tools = [get_weather, search_web]

# 2. Model (identical)
model = ChatOpenAI(model="gpt-4o")

# 3. Memory (built-in, was painful before)
memory = MemorySaver()

# 4. Create agent — no separate executor needed
agent = create_react_agent(
    model=model,
    tools=tools,
    checkpointer=memory,                              # persistence built in
    state_modifier="You are a helpful assistant.",    # replaces the prompt template
                                                      # (newer langgraph renames this to prompt=)
)

# 5. Config — thread_id enables per-user memory
config = {"configurable": {"thread_id": "user_123"}}

# 6. Run — same interface
result = agent.invoke(
    {"messages": [{"role": "user", "content": "What's the weather in Tokyo? Should I carry an umbrella?"}]},
    config=config,
)

print(result["messages"][-1].content)

# 6b. Stream step by step (impossible cleanly in AgentExecutor).
#     For token-level output, pass stream_mode="messages" (recent langgraph).
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What's the weather in Tokyo?"}]},
    config=config,
):
    print(chunk)   # each node's output as it happens`}</CodeBlock>

      <h4 className="mini-title">Side-by-side diff</h4>
      <CodeBlock file="diff.txt" lang="text">{`OLD (AgentExecutor)              NEW (LangGraph)
─────────────────────────────────────────────────────────
create_tool_calling_agent()  →   create_react_agent()
AgentExecutor(agent, tools)  →   (built into create_react_agent)
MessagesPlaceholder(         →   state_modifier= (just a string)
  "agent_scratchpad")
No built-in memory           →   checkpointer=MemorySaver()
executor.invoke()            →   agent.invoke()
No real streaming            →   agent.stream() ✅
No pause/resume              →   interrupt_before=["tools"] ✅
max_iterations=N             →   recursion_limit=N (same idea)
return_intermediate_steps    →   result["messages"] has everything`}</CodeBlock>
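
      <p>
        To make the pause/resume row concrete, here is a sketch of human-in-the-loop with the LangGraph agent from
        above. <code>interrupt_before=["tools"]</code> pauses the run before any tool executes, and invoking again
        with <code>None</code> as the input resumes from the checkpoint.
      </p>
      <CodeBlock file="pause_resume.py">{`agent = create_react_agent(
    model=model,
    tools=tools,
    checkpointer=memory,
    interrupt_before=["tools"],    # pause before any tool actually runs
)

config = {"configurable": {"thread_id": "user_123"}}

# Runs until the agent proposes a tool call, then stops at the interrupt.
agent.invoke(
    {"messages": [{"role": "user", "content": "Weather in Tokyo?"}]},
    config=config,
)

# Inspect the pending tool call; ask a human for approval here.
state = agent.get_state(config)
print(state.next)                              # e.g. ('tools',)
print(state.values["messages"][-1].tool_calls)

# Approved: resume from the checkpoint by invoking with None as input.
agent.invoke(None, config=config)`}</CodeBlock>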

      <Callout kind="intuition" title="The migration is mostly mechanical">
        Your tools don't change. Your model doesn't change. Your prompt collapses into a single string. The executor
        disappears entirely — the graph runtime absorbs it. Streaming, persistence, and human-in-the-loop go from
        "subclass and pray" to one keyword argument each. Invest the hour to migrate; you'll get it back the first time
        you need to pause an agent for human approval.
      </Callout>
    </section>
  );
}

window.Chapter07 = Chapter07;
