/* global React */

function Chapter09() {
  return (
    <section className="chapter" id="ch-09" data-screen-label="09 Streaming & tracing">
      <div className="chapter-header">
        <div className="eyebrow">Chapter 09 · Observability</div>
        <h1 className="chapter-title">Streaming, callbacks, and LangSmith.</h1>
        <p className="chapter-lede">
          You can't tune what you can't see. This chapter covers the three things you need to debug a real agent:
          token-level streaming, runtime callbacks, and a hosted trace viewer.
        </p>
      </div>

      <SectionTitle num="9.1">Streaming events from an agent</SectionTitle>
      <p>
        Agents produce many kinds of events: model tokens, tool start/end, intermediate AIMessages, final answer.
        <code>astream_events</code> gives you a unified async stream of all of them:
      </p>
      <CodeBlock file="stream_events.py">{`async for event in executor.astream_events({"input": "..."}, version="v2"):
    kind = event["event"]
    if kind == "on_chat_model_stream":
        chunk = event["data"]["chunk"]
        print(chunk.content, end="", flush=True)
    elif kind == "on_tool_start":
        print(f"\\n[calling {event['name']} with {event['data']['input']}]")
    elif kind == "on_tool_end":
        print(f"[result: {event['data']['output']}]")`}</CodeBlock>

      <p>The event types you'll touch most:</p>
      <ul>
        <li><code>on_chat_model_start / _stream / _end</code> — every LLM call.</li>
        <li><code>on_tool_start / _end / _error</code> — every tool invocation.</li>
        <li><code>on_chain_start / _end</code> — every Runnable boundary (useful for nested chains).</li>
        <li><code>on_retriever_start / _end</code> — RAG retrievals.</li>
      </ul>
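      <p>
        If you only care about a subset of these, you don't have to branch on every event type yourself: recent
        <code>langchain-core</code> versions let <code>astream_events</code> filter at the source. A sketch, assuming
        your version supports the <code>include_types</code> / <code>include_names</code> keyword arguments:
      </p>
      <CodeBlock file="stream_filtered.py">{`# Only tool and chat-model events reach the loop; everything else is
# dropped before it gets to you.
async for event in executor.astream_events(
    {"input": "..."},
    version="v2",
    include_types=["tool", "chat_model"],   # filter by runnable type
):
    if event["event"] == "on_chat_model_stream":
        print(event["data"]["chunk"].content, end="", flush=True)
    elif event["event"] == "on_tool_end":
        print(f"\\n[{event['name']} -> {event['data']['output']}]")`}</CodeBlock>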

      <Callout kind="tip" title="Always include version='v2'">
        The <code>v2</code> event schema is the modern one with consistent field names. Omit it and you may fall back
        to the legacy <code>v1</code> stream, which has different fields, different ordering, and is slated for removal.
      </Callout>

      <SectionTitle num="9.2">Callbacks — hooks for everything</SectionTitle>
      <p>
        Callbacks are the lower-level mechanism behind streaming. You implement a handler class and pass it via
        <code>{'config={"callbacks": [handler]}'}</code>. Every Runnable in the chain fires events through it:
      </p>
      <CodeBlock file="custom_callback.py">{`from langchain_core.callbacks import BaseCallbackHandler

class CostTracker(BaseCallbackHandler):
    def __init__(self):
        self.tokens_in = 0
        self.tokens_out = 0

    def on_llm_end(self, response, **kw):
        # llm_output can be None for some providers, so guard before .get()
        usage = (response.llm_output or {}).get("token_usage", {})
        self.tokens_in  += usage.get("prompt_tokens", 0)
        self.tokens_out += usage.get("completion_tokens", 0)

    def on_tool_start(self, serialized, input_str, **kw):
        print(f"→ {serialized['name']}({input_str})")

tracker = CostTracker()
executor.invoke({"input": "..."}, config={"callbacks": [tracker]})
print(f"Cost: in={tracker.tokens_in} out={tracker.tokens_out}")`}</CodeBlock>

      <SectionTitle num="9.3">LangSmith — hosted tracing</SectionTitle>
      <p>
        LangSmith is the hosted dashboard from the LangChain team. Set two env vars and every run shows up as an
        interactive trace tree — every LLM call, every tool, every chain, with token counts and latency:
      </p>
      <CodeBlock file=".env" lang="bash">{`LANGSMITH_TRACING=true
LANGSMITH_API_KEY=lsv2_pt_...
LANGSMITH_PROJECT=my-agent-prod`}</CodeBlock>
      <p>
        That's literally all the wiring. Any LangChain Runnable picks the env vars up and ships traces. You get:
      </p>
      <ul>
        <li>Full prompt at every LLM call (not "the abstract prompt template" — the actual rendered messages).</li>
        <li>Tool inputs and outputs, including raw errors with stack traces.</li>
        <li>Per-step latency and token cost. Easy to spot a tool that runs 14 times with the same args.</li>
        <li>Side-by-side trace comparison across prompt versions.</li>
      </ul>
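      <p>
        Tracing isn't limited to Runnables. The <code>langsmith</code> SDK ships a <code>@traceable</code> decorator
        that pulls plain Python functions into the same project, which is handy for the glue code around your agent.
        A sketch (the helper below is hypothetical):
      </p>
      <CodeBlock file="traced_glue.py">{`from langsmith import traceable

@traceable(name="load_user_profile")   # recorded as its own run in LangSmith
def load_user_profile(user_id: str) -> dict:
    # Hypothetical helper: inputs, outputs, and any exception it raises are
    # captured in the trace automatically.
    return {"user_id": user_id, "tier": "pro"}

profile = load_user_profile("u_123")
result = executor.invoke({"input": f"Summarize usage for {profile['user_id']}"})`}</CodeBlock>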

      <Callout kind="intuition" title="If you debug agents with print(), you'll lose your mind">
        Agents have non-trivial control flow. The single highest-leverage thing you can do as an LLM developer is wire
        up a trace viewer on day one. LangSmith is the easy path; OpenTelemetry exporters exist if you'd rather self-host.
      </Callout>

      <SectionTitle num="9.4">Verbose mode (free, low-fi)</SectionTitle>
      <CodeBlock file="verbose.py">{`import langchain
langchain.debug = True       # extreme: every event, every dict
# or
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)`}</CodeBlock>
      <p>
        Verbose mode dumps colored ASCII traces to stdout. It's good enough for quick iteration, but it pales next to
        LangSmith for anything beyond a few turns.
      </p>
    </section>
  );
}

window.Chapter09 = Chapter09;
