TL;DR
- MIT-licensed open-source orchestration framework started by Harrison Chase in October 2022 and now backed by LangChain Inc, developed in the open at github.com/langchain-ai with contributions from Anthropic, OpenAI, Cohere, Mistral, IBM, AWS, Microsoft and Yobitel. The de-facto starting point for LLM application work.
- Composed of four layered packages: langchain-core (Runnable, ChatModel, Tool primitives), langchain (community-curated chains and agents), langgraph (typed StateGraph runtime for cyclical and human-in-the-loop agents), and langchain-<provider> integrations. LangSmith is the paid observability and evaluation plane.
- Surface is consistent across Python and TypeScript. The LangChain Expression Language (LCEL) standardises composition through a `|` operator; every Runnable supports invoke, batch, stream, astream and async variants without extra wiring.
- Hundreds of integrations: 200+ chat / completion models, 100+ vector stores, every major retriever, every major tool surface. The OpenAI-compatible ChatModel adapter is the single feature that makes LangChain the natural client for any inference endpoint that speaks the OpenAI schema — including Yobibyte workspaces, Yobitel NeoCloud vLLM tenants and InferenceBench-graded providers.
- Default client framework recommended by Yobitel: Yobibyte exposes OpenAI-compatible chat, completion and embeddings endpoints, so any LangChain ChatModel built against `langchain-openai` switches to Yobibyte by changing only `base_url` and `api_key`. No Yobitel-specific SDK; the integration is the open standard.
Overview
LangChain emerged in late 2022 to answer the question GPT-3 made urgent: how do you connect an LLM to your own data, your own tools and your own control flow without re-implementing prompt templating, retry policy, output parsing and tool dispatch in every project. Harrison Chase published the first cut in October 2022; the project crossed 50,000 GitHub stars within a year, spawned a venture-backed company and is now the most widely used orchestration framework for LLM applications by a wide margin.
By mid-2026 LangChain is no longer one package. The core has been refactored into langchain-core (the small set of primitives), langchain (community-curated higher-level abstractions), langgraph (the typed graph-based agent runtime), and per-provider integration packages (langchain-openai, langchain-anthropic, langchain-aws, langchain-google-vertexai, langchain-mistralai, langchain-cohere and dozens more). This separation is the response to legitimate criticism of the 2023 monolith — install only the integrations you need, and the surface area you actually depend on is a few hundred classes, not a few thousand.
The defining abstraction is the Runnable: a typed unit of computation that exposes invoke, batch, stream, astream, and ainvoke uniformly. Prompt templates, chat models, output parsers, retrievers, tools and arbitrary user-defined functions all implement Runnable, which means a working pipeline is usually three or four objects piped together with the `|` operator. Everything else in LangChain is composition of Runnables.
This entry documents the LangChain surface a production team actually uses: the four-package layering, the LCEL composition model, the ChatModel + Tool + AgentExecutor + StateGraph runtime, the OpenAI-compatible adapter that points LangChain at any compliant endpoint, the deployment shapes, the sizing and quota considerations, the LangSmith-driven observability story, the cost model, the security posture, and how it slots in next to LlamaIndex, the raw provider SDK, and Yobitel's Yobibyte managed workspace. This entry helps you stand up a production LangChain agent against any OpenAI-compatible inference endpoint — including Yobibyte and Yobitel NeoCloud vLLM tenancies — with the right composition pattern, retry and tracing discipline, and an honest view of when LangChain pays rent vs when raw SDK calls are cleaner.
Quick start
The example below runs in four steps. Step 1 installs the modern split packages. Step 2 builds a ChatModel + Tool + AgentExecutor that does tool calling and runs against a Yobibyte (or any OpenAI-compatible) endpoint by changing only the base URL and API key. Step 3 migrates the same agent to LangGraph's prebuilt StateGraph agent for checkpointing and human-in-the-loop. Step 4 adds LangSmith tracing through environment variables alone, with no code changes.
# 1. Install the modern split packages
pip install "langchain-core>=0.3" "langchain>=0.3" \
    "langgraph>=0.2" "langchain-openai>=0.2" \
    "langchain-anthropic>=0.2" "langsmith>=0.1" \
    "langgraph-checkpoint-postgres" "psycopg[binary]"

# 2. Point a ChatModel at any OpenAI-compatible endpoint
cat > agent.py <<'PY'
import os
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

# Yobibyte exposes OpenAI-compatible /v1: only base_url + api_key change.
llm = ChatOpenAI(
    model="llama-3.1-70b-instruct",
    base_url=os.environ["LLM_BASE_URL"],  # e.g. https://api.yobibyte.example/v1
    api_key=os.environ["LLM_API_KEY"],
    temperature=0,
    max_retries=3,
    timeout=60,
)

@tool
def search_orders(order_id: str) -> str:
    """Look up an order by its ORD-NNNNNNNN id. Use only when given a real id."""
    return f"Order {order_id}: shipped 2026-06-10, tracking 1Z999..."

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a careful support assistant. Cite the tool output."),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, [search_orders], prompt)
runner = AgentExecutor(agent=agent, tools=[search_orders], max_iterations=8)

# Guarded so that graph_agent.py can import llm and search_orders without running the agent.
if __name__ == "__main__":
    print(runner.invoke({"input": "Where is ORD-12345678?"})["output"])
PY
LLM_BASE_URL=https://api.yobibyte.example/v1 \
LLM_API_KEY=sk-yb-... python agent.py

# 3. Migrate the same agent to LangGraph for checkpointing + HITL
cat > graph_agent.py <<'PY'
import os
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.postgres import PostgresSaver
from agent import llm, search_orders  # reuse the model and tool from step 2

# Same tools, same model: drop into a StateGraph with persistent memory.
# from_conn_string is a context manager, so the graph is built and used inside it.
with PostgresSaver.from_conn_string(os.environ["PG_URL"]) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    agent = create_react_agent(
        model=llm, tools=[search_orders],
        checkpointer=checkpointer,
    )
    state = agent.invoke(
        {"messages": [("user", "Where is ORD-12345678?")]},
        config={"configurable": {"thread_id": "user-42"}},
    )
    print(state["messages"][-1].content)
PY

# 4. Enable LangSmith tracing: environment variables only, every Runnable is traced
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY=lsv2_...
export LANGSMITH_PROJECT=support-agent
python agent.py

The OpenAI-compatible ChatModel adapter is the single most useful integration in LangChain for Yobitel customers. Anything that ships a `/v1/chat/completions` surface (Yobibyte, raw vLLM, TGI, TensorRT-LLM, SGLang, OpenAI, Mistral La Plateforme, Together, Anyscale) drops in with a base_url change. No code path for Yobitel-specific behaviour is needed.
How it works
LangChain is structured around three concentric layers, with the per-provider langchain-<provider> integration packages plugging into the innermost one. langchain-core defines the primitive interfaces (Runnable, ChatModel, Tool, BaseRetriever, BaseLanguageModel) and the LCEL composition algebra. langchain depends on core and adds community-curated higher-level abstractions (AgentExecutor, history-aware chains, retrieval chains, evaluation helpers). langgraph depends on core and adds the StateGraph runtime that supersedes the original AgentExecutor for any agent with cycles, branching or human-in-the-loop.
The Runnable interface is the spine. Every Runnable exposes the same five methods — invoke, batch, stream, astream, ainvoke — so composing them through `|` automatically yields a new Runnable with the same surface. Streaming, async, structured outputs, retries, and fallbacks are all properties of the composed graph, not features each component re-implements.
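To make the composition idea concrete, here is a minimal standard-library sketch of how a `|` operator can yield a new object with the same surface as its parts. `ToyRunnable` and the three stages are hypothetical names for illustration only; this is not LangChain's implementation, which also covers streaming, async and configuration.

```python
from typing import Any, Callable, Iterable, List


class ToyRunnable:
    """Illustrative stand-in for a Runnable; not LangChain's implementation."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def batch(self, values: Iterable[Any]) -> List[Any]:
        # Apply invoke to each input, preserving order.
        return [self.invoke(v) for v in values]

    def __or__(self, other: "ToyRunnable") -> "ToyRunnable":
        # `a | b` returns a new ToyRunnable that feeds a's output into b.
        return ToyRunnable(lambda value: other.invoke(self.invoke(value)))


# Hypothetical three-stage pipeline: prompt -> fake model -> parser.
prompt = ToyRunnable(lambda topic: f"Tell me about {topic}")
fake_model = ToyRunnable(lambda text: text.upper())
parser = ToyRunnable(lambda text: text.split())

chain = prompt | fake_model | parser
print(chain.invoke("llamas"))
print(chain.batch(["a", "b"]))
```

Because the composed object is itself a `ToyRunnable`, `batch` works on the whole chain without any stage having to implement it again, which is the property the paragraph above describes.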
The original AgentExecutor (from 2023) was a hidden while-loop calling tools until the model stopped. It hid too much: cycles were hard to inspect, state was implicit, human approval steps were awkward. LangGraph replaces the loop with an explicit directed graph of typed nodes and edges. Each node is a function that takes and returns a typed state object; edges declare the control flow including conditional branches and cycles. Checkpointers persist state to memory, SQLite, Postgres or Redis so a long-running agent can pause for review and resume hours later. New agent projects in 2026 should default to LangGraph; the legacy AgentExecutor remains supported but is not where the framework's investment is going.
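The difference between a hidden loop and an explicit graph can be sketched in a few lines of plain Python. The runner below is a toy under stated assumptions: `run_graph`, the node names and the router table are hypothetical, and it omits typing, checkpointing and interrupts. It only shows nodes returning partial state updates, routing functions choosing the next node, and a step budget that stops runaway cycles.

```python
from typing import Callable, Dict

State = Dict[str, int]
END = "__end__"


def run_graph(
    nodes: Dict[str, Callable[[State], State]],
    router: Dict[str, Callable[[State], str]],
    entry: str,
    state: State,
    recursion_limit: int = 25,
) -> State:
    """Run nodes until a router returns END or the step budget is spent."""
    current = entry
    for _ in range(recursion_limit):
        # Merge the node's partial update into the shared state.
        state = {**state, **nodes[current](state)}
        current = router[current](state)
        if current == END:
            return state
    raise RecursionError(f"no END reached within {recursion_limit} steps")


# Hypothetical two-node loop: "work" increments a counter, "check" decides.
nodes = {
    "work": lambda s: {"count": s["count"] + 1},
    "check": lambda s: {},
}
router = {
    "work": lambda s: "check",
    "check": lambda s: END if s["count"] >= 3 else "work",
}

final = run_graph(nodes, router, entry="work", state={"count": 0})
print(final)
```

Every step is visible as a (node, state) pair, so persisting the state after each iteration is enough to pause and resume a run.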
Tool calling is the protocol below the framework. LangChain ChatModels normalise tool definitions into the provider's native shape — OpenAI function-calling JSON, Anthropic tool_use blocks, Google FunctionDeclaration, Cohere tools — so the application code is identical regardless of the model behind the endpoint. Yobibyte's OpenAI-compatible surface inherits this behaviour for free: a LangChain agent built against Anthropic translates to Yobibyte by swapping the ChatModel class and adjusting the model name.
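As an illustration of what "normalising a tool definition" involves, the sketch below derives an OpenAI-style `tools` entry from a plain Python function using only `inspect`. `to_openai_tool` and the type mapping are assumptions made for this example; LangChain's own conversion handles far more cases (nested types, Pydantic models, per-argument descriptions).

```python
import inspect
from typing import Any, Callable, Dict

# Minimal Python-type to JSON-schema-type mapping for this sketch.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_openai_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Describe a plain function in the OpenAI `tools` request shape."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value are mandatory for the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def search_orders(order_id: str, include_history: bool = False) -> str:
    """Look up an order by its ORD-NNNNNNNN id."""
    return f"Order {order_id}"


spec = to_openai_tool(search_orders)
print(spec["function"]["parameters"])
```

The same function signature could be rendered into another provider's shape by swapping only the outer dictionary, which is why application code can stay provider-agnostic.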
- Runnable — typed unit of computation with invoke / batch / stream / ainvoke / astream. The primitive of LCEL.
- ChatModel — the abstraction for chat-style LLMs; subclasses cover OpenAI, Anthropic, Google, AWS Bedrock, Vertex, Cohere, Mistral, Together, Anyscale, Ollama, vLLM, TGI and every Yobibyte-shaped endpoint.
- Tool — Python callable decorated with `@tool` (or a `BaseTool` subclass) exposed to the model with a JSON-schema description.
- AgentExecutor — legacy single-loop runtime that dispatches tool calls until the model stops. Kept for simple agents; superseded for anything cyclical.
- StateGraph (LangGraph) — explicit typed graph of nodes and edges with checkpointing, branching, parallel fan-out, and human-in-the-loop interrupts.
- Retriever — a Runnable whose `invoke(query)` returns a list of documents (the older `get_relevant_documents` method is deprecated), implemented by every supported vector store, BM25, ensemble, parent-document and self-query retriever.
- Output parsers — Pydantic, JSON, structured, tool-call, regex parsers that coerce model output into typed objects.
- Memory — superseded by LangGraph checkpointers; the old `ConversationBufferMemory` family is deprecated in 0.3+.
LangChain's 0.3 (Sep 2024) and subsequent releases are the stable surface to build on. Anything written against 0.0.x or 0.1.x APIs from 2023 has likely been deprecated or moved. The framework is no longer the moving target it was — the 0.3+ surface has held shape through 2026.
Reference and specifications
The table below is the canonical reference for the primitive interfaces and the runtime classes most production teams touch. Methods marked `→Runnable` return a composable graph; everything else returns a concrete value or stream. Provider integration packages add their own constructor arguments — the most frequently consulted are listed under the ChatModel row.
| Symbol | Package | Surface | Typical use |
|---|---|---|---|
| Runnable | langchain-core | invoke / batch / stream / ainvoke / astream / `\|` | Base interface for every LCEL component. |
| ChatModel | langchain-openai / -anthropic / … | invoke(messages) → AIMessage; bind_tools(); with_structured_output() | LLM client. base_url + api_key + model + temperature + max_retries + timeout. |
| PromptTemplate / ChatPromptTemplate | langchain-core | from_template / from_messages → Runnable | f-string or Jinja templating, with role placeholders. |
| Tool (`@tool` decorator) | langchain-core | name, description, args_schema, _run, _arun | Wraps a Python function as a typed tool exposed to the model. |
| AgentExecutor | langchain | invoke({input}) → {output}; adds intermediate_steps when return_intermediate_steps=True | Legacy tool-calling loop. Use create_tool_calling_agent + AgentExecutor. |
| StateGraph | langgraph | add_node / add_edge / add_conditional_edges / compile() | Typed graph runtime. Default for new agent work. |
| create_react_agent | langgraph.prebuilt | (model, tools, checkpointer) → CompiledGraph | Pre-baked ReAct-style agent over StateGraph. |
| BaseRetriever | langchain-core | invoke / ainvoke (get_relevant_documents / aget_relevant_documents are deprecated) | Uniform retrieval interface. Implemented by every vector store binding. |
| RunnableWithMessageHistory | langchain-core | Wraps a Runnable with per-session message persistence | Replaces legacy ConversationChain for stateful chat. |
| LangSmith @traceable | langsmith | Decorator emits trace spans for arbitrary Python | Trace non-LangChain code into the same project as the agent. |
| Checkpointer (Memory / SQLite / Postgres / Redis) | langgraph.checkpoint.* | Persist StateGraph state by thread_id | Required for HITL, replay, multi-turn durable agents. |
Pin the langchain-core minor version. Cross-package compatibility is per-minor (0.3.x of langchain expects 0.3.x of langchain-core). A bare `pip install langchain` without an upper-bound constraint is the most common cause of import-time errors in production environments.
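A start-up guard can catch cross-minor drift before the first import fails. The sketch below is an assumption-laden example: `minor_of` and `mismatched` are hypothetical helpers, and the version strings are a made-up snapshot. In a real check they would come from `importlib.metadata.version(...)`.

```python
from typing import Dict, Tuple


def minor_of(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '0.3.21'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def mismatched(installed: Dict[str, str], expected: Tuple[int, int]) -> Dict[str, str]:
    """Return the packages whose major.minor differs from the expected pair."""
    return {pkg: ver for pkg, ver in installed.items() if minor_of(ver) != expected}


# Hypothetical snapshot of an environment where langchain lags behind core.
installed = {"langchain-core": "0.3.21", "langchain": "0.2.17"}
print(mismatched(installed, expected=(0, 3)))
```

Only langchain and langchain-core are compared here; provider packages follow their own version numbers and would need their own expected values.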
Workload patterns
Three application shapes cover the bulk of production LangChain usage: (A) RAG agent that retrieves then answers with citations, (B) multi-step research / planner agent built on LangGraph with HITL approval, (C) customer-support routing agent with tool calling and a fallback hand-off. Each maps to a recognisable LangChain composition that the Yobibyte console can drive against a managed OpenAI-compatible endpoint without bespoke code paths.
Pattern A — Retrieval-augmented agent. Retriever Runnable composed with a prompt that injects context, a ChatModel with bound tools, and an output parser that yields a final answer plus per-claim citations. Pattern B — Plan / execute / replan StateGraph with a `human_review` node that interrupts before destructive actions. Pattern C — Tool-calling triage agent that hands off via a Swarm-style explicit tool call to a specialised downstream agent. All three patterns benefit from prompt caching when the upstream provider supports it (Anthropic and Yobibyte do; raw vLLM honours prefix caching transparently).
# Pattern A: RAG agent with citations against a Yobibyte endpoint
import os
from langchain_openai import ChatOpenAI
from langchain.tools.retriever import create_retriever_tool
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="llama-3.1-70b-instruct",
                 base_url="https://api.yobibyte.example/v1",
                 api_key=os.environ["LLM_API_KEY"], temperature=0)

# vector_store is assumed to be an already-populated LangChain VectorStore
# (FAISS, pgvector, ...) built elsewhere in the application.
retriever_tool = create_retriever_tool(
    retriever=vector_store.as_retriever(search_kwargs={"k": 6}),
    name="kb_search",
    description="Search the internal knowledge base. Always cite passages.",
)
agent = create_react_agent(llm, [retriever_tool])
state = agent.invoke({"messages": [("user", "Summarise our NCSC posture.")]})
print(state["messages"][-1].content)

# Pattern B: Plan / execute / replan with human approval
import os
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.postgres import PostgresSaver

class S(TypedDict):
    goal: str
    plan: list[str]
    step_idx: int
    results: list[str]
    approval: str

# Node names must not collide with state keys, so the planner node is "make_plan".
def make_plan(s): return {}  # llm: produce s["plan"]
def execute(s): return {}    # llm + tools: run s["plan"][s["step_idx"]], advance step_idx
def replan(s): return {}     # llm: revise s["plan"] given results
def hitl(s): return {}       # runs once a human has set s["approval"]

g = StateGraph(S)
for n in (make_plan, hitl, execute, replan):
    g.add_node(n.__name__, n)
g.set_entry_point("make_plan")
g.add_edge("make_plan", "hitl")
g.add_conditional_edges("hitl",
    lambda s: "execute" if s.get("approval") == "ok" else END)
g.add_edge("execute", "replan")
g.add_conditional_edges("replan",
    lambda s: "execute" if s["step_idx"] < len(s["plan"]) else END)

# Pause before "hitl": the reviewer sets approval via app.update_state(config, {...})
# and the run resumes with app.invoke(None, config). Use app inside the with block.
with PostgresSaver.from_conn_string(os.environ["PG_URL"]) as checkpointer:
    checkpointer.setup()
    app = g.compile(checkpointer=checkpointer, interrupt_before=["hitl"])

When a Yobitel customer asks where their LangChain agent should run, the answer is usually the same: keep the agent stateless behind a Yobibyte-issued OpenAI-compatible model and persist conversational state in a LangGraph Postgres checkpointer in the same regulatory region. No Yobitel-specific SDK; only `base_url`, `api_key` and a region pin change.
Sizing and capacity planning
LangChain itself adds milliseconds, not seconds. The cost dominators are the LLM round-trips, the retriever calls and (where applicable) the checkpointer write per StateGraph step. The table below is realistic per-call overhead measured against a Yobibyte H100 vLLM tenant from a Python 3.11 client on the same continent; treat it as a planning anchor and re-measure for your own topology.
The single biggest concurrency lever is async. LangChain's invoke is synchronous; ainvoke + astream let one process drive hundreds of concurrent agent sessions over the same OpenAI-compatible endpoint. For high-throughput batch workloads (eval runs, dataset enrichment, large-corpus re-indexing), `Runnable.batch` fans a list of inputs out in one call; pass `config={"max_concurrency": N}` to keep the fan-out inside the provider's allowed concurrency.
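The async lever can be illustrated without any framework. The sketch below uses only `asyncio`; `fake_llm_call` is a hypothetical stand-in for an awaitable model call and `run_bounded` is an assumed helper name. It shows one process driving many calls while a semaphore keeps the number in flight under a chosen ceiling.

```python
import asyncio
from typing import List


async def fake_llm_call(prompt: str) -> str:
    """Stand-in for an awaitable model call such as `ainvoke`."""
    await asyncio.sleep(0.01)  # simulated network round-trip
    return f"answer:{prompt}"


async def run_bounded(prompts: List[str], max_concurrency: int) -> List[str]:
    """Run all prompts concurrently, never more than max_concurrency at once."""
    gate = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> str:
        async with gate:
            return await fake_llm_call(prompt)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(one(p) for p in prompts))


results = asyncio.run(run_bounded([f"q{i}" for i in range(20)], max_concurrency=5))
print(len(results), results[0])
```

The ceiling plays the same role as a workspace-level QPS quota: raising it increases throughput until the endpoint starts returning 429s.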
| Dimension | Typical value | Notes |
|---|---|---|
| LangChain framework overhead per call | 2-10 ms | Excludes LLM round-trip. Same order of magnitude as direct SDK use. |
| LangGraph step overhead | 5-20 ms | Includes checkpointer write when configured. |
| Postgres checkpointer write | 3-15 ms | Single insert per step; size by step rate, not by GPU. |
| Retriever fan-out (k=6, FAISS local) | 10-40 ms | Network retriever (Pinecone, Weaviate) adds 30-150 ms. |
| LLM round-trip on Yobibyte (Llama-70B TTFT) | 300-600 ms | Dominates the budget; size for it. |
| Concurrent agent sessions per Python process | 200-2000 | Async-bound; CPU is rarely the limit. |
| Tool-calling round-trip overhead | 50-200 ms | Per tool invocation, plus the tool's own work. |
| Streaming first-token to client SSE | +20-60 ms | vs non-streaming, before the underlying TTFT. |
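As a worked example of reading the table, the sketch below adds up the ranges for one simple turn: one LangGraph step, one local retriever call and one LLM call. The component selection is an assumption made for illustration; a tool-calling turn would add a tool round-trip and a second LLM call.

```python
from typing import Dict, Tuple

# (low_ms, high_ms) ranges copied from the sizing table above.
COMPONENTS: Dict[str, Tuple[int, int]] = {
    "framework_overhead": (2, 10),
    "langgraph_step": (5, 20),
    "retriever_local": (10, 40),
    "llm_ttft": (300, 600),
}


def budget(components: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the low and high ends of every component range."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


low, high = budget(COMPONENTS)
llm_low, llm_high = COMPONENTS["llm_ttft"]
print(f"time to first token: {low}-{high} ms")
print(f"LLM share: {llm_low / low:.0%} (best case), {llm_high / high:.0%} (worst case)")
```

Even at the slow end of every framework range, the LLM round-trip accounts for roughly nine tenths of the budget, which is why sizing effort belongs there.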
Limits and quotas
LangChain places no hard limits of its own. Every limit you hit will be one imposed by the upstream provider, the retriever backend, the checkpointer database or the host process. The list below covers the limits that bite in practice when running LangChain against Yobibyte, Yobitel NeoCloud vLLM tenants, or third-party OpenAI-compatible endpoints.
- Tool count per agent — model accuracy degrades materially beyond 30-50 tools regardless of the model behind the endpoint. Split into supervisor + workers or use a router pattern.
- Context window — bound by the upstream model. A 70B Llama tenant on Yobibyte typically advertises 32K or 128K with RoPE scaling; LangChain will not protect you from over-stuffing the prompt — wrap retrieval in a `ContextualCompressionRetriever` or a token-aware filter.
- Concurrency — Yobibyte workspaces and Yobitel NeoCloud tenants advertise a workspace-level QPS quota. Configure ChatModel `max_retries` and `timeout` to respect 429s; otherwise the retry policy compounds.
- Streaming back-pressure — LangChain streams via async generators; an HTTP gateway that does not flush per-chunk will appear as a stalled stream. Verify the proxy in between honours `text/event-stream` flushes.
- StateGraph recursion depth — defaults to 25. Long-horizon agents that exceed it should either increase the limit explicitly or refactor to a sub-graph.
- Checkpointer row size — Postgres `bytea` columns store the serialised state per step; very large message histories trigger `parameter exceeds maximum` errors. Trim with `RemoveMessage` reducers or summary checkpoints.
- LangSmith ingestion rate — paid plan tiers cap traced runs per minute; high-RPS agents should sample (set `LANGSMITH_TRACING_SAMPLING_RATE`) or batch-ship via OTLP.
- JSON-schema depth in `bind_tools` — most providers truncate schemas beyond 6-8 levels of nesting. Keep tool argument schemas flat.
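The context-window item in the list above mentions a token-aware filter. A minimal sketch of one follows, assuming passages arrive already ranked by relevance. `fit_to_budget` is a hypothetical helper, and the whitespace word count is a deliberately crude stand-in for the model's real tokenizer.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Crude estimate: one token per whitespace-separated word (assumption)."""
    return len(text.split())


def fit_to_budget(
    passages: List[str],
    budget_tokens: int,
    count: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Keep passages in ranked order until the next one would exceed the budget."""
    kept: List[str] = []
    used = 0
    for passage in passages:
        cost = count(passage)
        if used + cost > budget_tokens:
            break
        kept.append(passage)
        used += cost
    return kept


ranked = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(fit_to_budget(ranked, budget_tokens=5))
```

Passing a real tokenizer as `count` keeps the same logic while making the budget match what the upstream model actually enforces.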
Set explicit `max_retries` and `timeout` (on `ChatOpenAI`, `request_timeout` is the same setting under another name) on every ChatModel destined for production. The defaults are friendly but unsuited to a 429-shaped backpressure event from a busy Yobibyte tenant or NeoCloud peak hour; without an upper bound, retries can amplify load by 5-10x.
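To show why bounded retries matter, here is a standard-library sketch of a capped exponential backoff with full jitter that prefers a server-supplied `Retry-After` hint. `backoff_delay` and `worst_case_requests` are hypothetical helpers and the default base and cap are arbitrary; this is not how any particular client library schedules its retries.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # A server-provided hint wins, but never wait longer than the cap.
        return min(retry_after, cap)
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling] to de-synchronise clients.
    return random.uniform(0.0, ceiling)


def worst_case_requests(max_retries: int) -> int:
    """Upper bound on requests sent for one logical call."""
    return 1 + max_retries


for attempt in range(4):
    print(attempt, round(backoff_delay(attempt), 3))
print(worst_case_requests(max_retries=5))
```

With `max_retries=5`, one logical call can become six requests during an outage, which is where the amplification factor quoted above comes from.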
Observability
LangSmith is LangChain's first-party observability layer — a hosted (or self-hosted enterprise) platform for tracing, evaluation and prompt management. Every Runnable execution emits a span tree showing inputs, outputs, latencies, token counts, tool calls and errors. Enable with `LANGSMITH_TRACING=true` and an API key; no code changes. The `@traceable` decorator lets you fold arbitrary non-LangChain Python into the same trace.
For teams that prefer vendor-neutral observability, the OpenTelemetry GenAI semantic conventions describe an attribute schema that LangSmith, Langfuse, Phoenix, Helicone and OpenLLMetry all support. Auto-instrument the underlying provider SDK (OpenLLMetry via Traceloop) and the per-call spans appear in your existing OTLP backend without touching LangChain code. Yobitel customers who route inference through Yobibyte get this for free at the Yobibyte gateway — span fan-out happens upstream of the application.
- Per-call attributes worth recording: model name, provider, base_url, prompt token count, completion token count, cached prompt tokens, tool name, tool arguments hash, tool result size, latency, error class.
- Per-session attributes: agent type (ReAct, supervisor, swarm), StateGraph thread_id, checkpoint version, user id, region.
- LangSmith Datasets — promote production runs into a versioned dataset, replay against new models or prompts, score with LLM-as-judge or programmatic graders.
- LangSmith Comparisons — pairwise A/B between two prompts or two providers (e.g. Yobibyte-served Llama 70B vs Anthropic Claude Sonnet) over the same dataset.
- Prompt Hub — versioned prompt templates with diff and rollback; reduces the failure mode of prompts living in code reviews only.
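The per-call attributes listed above can be assembled into one flat record before being attached to a span. The sketch below is illustrative: `call_attributes` and `args_hash` are hypothetical helpers, and the key names are placeholders rather than the official OpenTelemetry GenAI attribute names.

```python
import hashlib
import json
from typing import Any, Dict


def args_hash(arguments: Dict[str, Any]) -> str:
    """Stable SHA-256 of tool arguments so raw values stay out of the trace."""
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def call_attributes(
    model: str,
    base_url: str,
    prompt_tokens: int,
    completion_tokens: int,
    tool_name: str,
    tool_arguments: Dict[str, Any],
    tool_result: str,
    latency_ms: float,
) -> Dict[str, Any]:
    """Flat attribute dict for one call; key names are illustrative only."""
    return {
        "llm.model": model,
        "llm.base_url": base_url,
        "llm.prompt_tokens": prompt_tokens,
        "llm.completion_tokens": completion_tokens,
        "tool.name": tool_name,
        "tool.arguments_hash": args_hash(tool_arguments),
        "tool.result_size": len(tool_result),
        "latency_ms": latency_ms,
    }


attrs = call_attributes(
    model="llama-3.1-70b-instruct",
    base_url="https://api.yobibyte.example/v1",
    prompt_tokens=812,
    completion_tokens=64,
    tool_name="search_orders",
    tool_arguments={"order_id": "ORD-12345678"},
    tool_result="Order ORD-12345678: shipped",
    latency_ms=431.0,
)
print(attrs["tool.arguments_hash"][:12], attrs["tool.result_size"])
```

Hashing the arguments with sorted keys means two calls with the same inputs can be correlated across traces without storing the inputs themselves.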
# Vendor-neutral tracing alongside LangSmith
import os
from traceloop.sdk import Traceloop
from langchain_openai import ChatOpenAI

# (1) LangSmith: native LangChain runs
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "lsv2_..."
os.environ["LANGSMITH_PROJECT"] = "support-agent-prod"

# (2) OpenLLMetry: emits OTel GenAI spans to your existing collector
Traceloop.init(
    app_name="support-agent",
    api_endpoint=os.environ["OTLP_ENDPOINT"],
    disable_batch=False,
)

# Both run side-by-side; LangSmith sees the LCEL tree, your OTel
# backend sees per-call provider spans for fleet-wide latency views.
llm = ChatOpenAI(model="llama-3.1-70b-instruct",
                 base_url=os.environ["LLM_BASE_URL"],
                 api_key=os.environ["LLM_API_KEY"])

Cost and FinOps
LangChain itself is free (MIT licence). The cost lives in three places: the upstream LLM tokens, LangSmith ingestion (paid above the free tier), and the operational cost of running the application that hosts the agent. The single largest controllable cost is repeated tool / system prompt context: every agent turn re-sends the tool catalogue and system prompt. Where the upstream supports caching, marking the catalogue as a cache breakpoint typically recovers 70-90 percent of catalogue token cost.
Yobibyte exposes prompt caching transparently on supported models; running LangChain against Yobibyte requires no client-side change to benefit. OpenAI likewise caches long, repeated prompt prefixes automatically. Direct Anthropic use requires the application to emit `cache_control` markers on the blocks to cache; LangChain's ChatAnthropic integration has supported these markers since 0.2.x.
| Cost component | Typical USD range | Driver |
|---|---|---|
| LangChain licence | $0 | MIT, no surcharge. |
| LLM tokens via Yobibyte (Llama-70B FP8) | $0.40-1.20 per 1M input + $0.80-2.40 per 1M output | Workspace pricing; volume tiers apply. |
| LLM tokens via OpenAI / Anthropic (frontier) | $3-15 per 1M input + $15-75 per 1M output | List prices; prompt caching recovers 60-90%. |
| LangSmith ingestion (Plus) | $0-39/user/month + per-run | Per-run pricing tier above free quota. |
| LangSmith Enterprise (self-hosted) | Custom (annual) | For air-gapped deployments. |
| Application compute (small) | $30-200/month | 2x small containers + Postgres for checkpointer. |
| Application compute (busy) | $500-5,000/month | Scales with concurrent agent sessions, not LangChain itself. |
If you are running LangChain against a Yobibyte endpoint, you already get prompt caching automatically. If you are running it directly against Anthropic, mark the end of your system prompt and the end of your retrieved-context block as cache breakpoints, usually a one-line change that cuts the input bill by 70-90 percent. OpenAI caches matching prompt prefixes automatically, so there the lever is keeping the static content at the front of the prompt.
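The savings claim can be checked with simple arithmetic. The sketch below is a simplified model under stated assumptions: `input_cost_usd` is a hypothetical helper, the workload numbers are invented, cached tokens are assumed to be billed at a flat discount, and any cache-write surcharge is ignored.

```python
def input_cost_usd(
    turns: int,
    static_tokens: int,
    dynamic_tokens: int,
    price_per_million: float,
    cached_discount: float = 0.0,
) -> float:
    """Input-token cost for a session; turn 1 always pays full price."""
    per_token = price_per_million / 1_000_000
    first_turn = (static_tokens + dynamic_tokens) * per_token
    # From turn 2 onwards the static prefix is billed at the discounted rate.
    later_static = static_tokens * per_token * (1.0 - cached_discount)
    later_turns = (turns - 1) * (later_static + dynamic_tokens * per_token)
    return first_turn + later_turns


# Hypothetical workload: 10 turns, a 4,000-token system prompt plus tool
# catalogue, 500 fresh tokens per turn, $3 per 1M input tokens.
baseline = input_cost_usd(10, 4000, 500, 3.0)
cached = input_cost_usd(10, 4000, 500, 3.0, cached_discount=0.9)
print(f"baseline ${baseline:.4f}, cached ${cached:.4f}, saved {1 - cached / baseline:.0%}")
```

For this invented workload the saving lands near the low end of the 70-90 percent range; longer sessions and larger static prefixes push it higher.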
Security and compliance
LangChain inherits the security posture of the integrations it composes. A LangChain agent's attack surface is the upstream LLM endpoint, the retrievers, the tools and the LangSmith trace store. The checklist below is the working production-security stance; it aligns with the NCSC Cloud Security Principles that Yobibyte follows, UK GDPR Article 32 evidence requirements, and the spirit of the OWASP LLM Top 10.
- Tool injection — any tool that touches the network, the filesystem or a database is a candidate for prompt injection. Treat tool arguments as untrusted input, validate against tight JSON schemas with regex patterns and enum constraints, and route destructive operations through a HITL interrupt in LangGraph.
- Credential scope — the `api_key` configured on a ChatModel inherits whatever scope the upstream token has. Mint per-workspace, per-environment tokens with Yobibyte workspace scoping rather than reusing one human-owned key.
- Trace PII — LangSmith stores prompts and completions verbatim unless you redact. Configure `LANGSMITH_HIDE_INPUTS=true` / `LANGSMITH_HIDE_OUTPUTS=true` for sensitive projects, or pre-redact with Presidio before emit. Under UK GDPR an unredacted trace store is a personal-data store.
- Checkpointer encryption — Postgres / Redis state stores hold conversation history. Enable encryption at rest at the database tier; for sovereign workloads on Yobitel NeoCloud terminate the checkpointer inside the regulatory boundary and align KMS key control with NCSC Principle 11.
- Output sanitisation — LangChain output parsers do not sanitise HTML or escape SQL; any downstream renderer or executor must do that itself.
- Dependency pinning — pin the minor versions of langchain-core, langchain, langgraph and every langchain-<provider>. Cross-minor drift is the most common cause of silent behaviour change.
- Supply-chain — vet third-party `langchain-community` integrations the same way you would any other Python package; not every community integration has had a security review.
- For UK Sovereign deployments — terminate the OpenAI-compatible base_url inside the regulated boundary (Yobitel NeoCloud UK), enforce mTLS at the egress, and route LangSmith traces to a self-hosted enterprise instance rather than the public SaaS.
Do not let a LangChain agent execute arbitrary Python or shell as a tool. Code-executing tools belong behind a sandbox (Docker, Firecracker microVM, a managed sandbox API) with no network egress and read-only data mounts. The compromise model is the same as for AutoGen and CrewAI — see the AutoGen entry's code-execution callout.
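The tool-injection item in the checklist above calls for regex and enum constraints on tool arguments. A minimal sketch of that check follows, assuming a hypothetical order tool with an `order_id` and an `action` argument. `validate_tool_call` and the action names are invented for illustration; in a LangGraph agent the `needs_approval` flag would be what routes the call to a HITL interrupt.

```python
import re
from typing import Any, Dict

ORDER_ID = re.compile(r"ORD-\d{8}")
ALLOWED_ACTIONS = {"lookup", "cancel"}
DESTRUCTIVE_ACTIONS = {"cancel"}


def validate_tool_call(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Reject malformed arguments and flag calls that need human approval."""
    order_id = arguments.get("order_id")
    action = arguments.get("action")
    # fullmatch rejects trailing payloads such as "ORD-12345678; DROP TABLE".
    if not isinstance(order_id, str) or not ORDER_ID.fullmatch(order_id):
        raise ValueError("order_id must match ORD-NNNNNNNN")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action must be one of {sorted(ALLOWED_ACTIONS)}")
    return {
        "order_id": order_id,
        "action": action,
        "needs_approval": action in DESTRUCTIVE_ACTIONS,
    }


ok = validate_tool_call({"order_id": "ORD-12345678", "action": "cancel"})
print(ok)
```

Validation runs on the arguments the model produced, before the tool body executes, so a prompt-injected value never reaches the database or the network.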
Migration and alternatives
LangChain is the default but not the only choice. The table below captures the practical comparison with the three alternatives Yobitel customers most often weigh against it: LlamaIndex (RAG-centric), the raw provider SDK (Anthropic / OpenAI / Yobibyte directly), and AutoGen / CrewAI (multi-agent specialists). All four can target the same Yobibyte OpenAI-compatible endpoint without code-path divergence.
- From raw OpenAI / Anthropic SDK to LangChain — wrap the existing prompt and tool definitions in a ChatModel and `bind_tools`; the dispatch loop becomes a 5-line `create_react_agent` call. The provider JSON the model emits is unchanged.
- From LangChain 0.0.x or 0.1.x to 0.3+ — the migration is mostly mechanical via `langchain-cli migrate`; the substantive piece is replacing legacy `AgentExecutor` chains with LangGraph `create_react_agent` or a StateGraph.
- From LlamaIndex query engines to LangChain — wrap the LlamaIndex retriever with `LlamaIndexRetriever` and let LangChain handle the agent loop; or vice versa. The two frameworks compose more often than they substitute.
- From hand-rolled multi-agent dispatch to LangGraph — model your agents as nodes and your hand-offs as edges; HITL becomes `interrupt_before=[node]` rather than ad-hoc state.
- From LangChain self-hosted to Yobibyte managed — keep the LangChain code, change `base_url` and `api_key`; Yobibyte handles scaling, observability, billing and SLA. Recipe protection applies — Yobibyte's internal model routing is not part of the customer surface.
| Approach | Surface | Strengths | Weaknesses |
|---|---|---|---|
| LangChain (core + langchain + langgraph) | Runnable / ChatModel / Tool / StateGraph | Largest integration catalogue; cycles + HITL via LangGraph; LangSmith observability. | Surface area is wide; community packages vary in quality; minor-version pinning required. |
| LlamaIndex | DataConnectors / NodeParsers / Indexes / QueryEngines / Workflows | Deepest RAG ingestion + indexing primitives; LlamaParse for hard PDFs. | Lighter on multi-agent orchestration; smaller integration list for non-RAG tools. |
| Raw provider SDK (Anthropic / OpenAI) | Per-vendor JSON request / response | Total control; minimum dependencies; lowest latency. | Reinvent retry, parsing, tracing, agent loop per project; no multi-provider portability. |
| AutoGen | ConversableAgent / GroupChat | Multi-agent conversation patterns + code-executing agent. | Heavier abstraction; less natural for single-agent RAG. |
| CrewAI | Agent + Task + Crew with roles | Opinionated role-goal-backstory model; production-shaped. | Locked into the crew metaphor; smaller ecosystem than LangChain. |
| Yobibyte managed agent (alternative) | OpenAI-compatible endpoint + Yobibyte console | No framework to operate; SLA-backed; tracing and quotas included. | Tied to Yobitel; less flexibility than self-hosted LangChain when novel orchestration is required. |
Troubleshooting
The failure modes below are the ones LangChain operators hit repeatedly when running production agents against OpenAI-compatible endpoints — including Yobibyte, Yobitel NeoCloud vLLM, and third-party providers.
| Symptom | Likely cause | Fix |
|---|---|---|
| ImportError on `from langchain import X` | Package split in 0.1+; X lives in langchain-core or langchain-community now. | Run `langchain-cli migrate` or follow the migration notes. |
| AgentExecutor stops early with an iteration-limit message | `max_iterations` (default 15) is exhausted, usually by a tool that keeps failing and being retried. | Inspect the trace for repeated tool calls first; raise `max_iterations` to 25-50 only if the task needs that many steps. |
| ChatModel times out against Yobibyte / OpenAI-compatible endpoint | Default request timeout too aggressive at first-token latency. | Set `timeout=60-120` and configure `max_retries=3-5` with backoff. |
| Tool gets called with wrong arguments | Vague tool description; loose JSON schema. | Tighten the schema with `pattern`, `enum`, and required fields; rewrite the description with explicit when-to-call examples. |
| StateGraph hits recursion limit | Cycle in the graph or a node that always re-enters. | Add a conditional edge to terminate; increase `recursion_limit` only if the cycle is intentional. |
| LangSmith trace missing for parts of the run | Non-LangChain Python in the call path is invisible. | Decorate the missing functions with `@traceable` from `langsmith`. |
| Postgres checkpointer slow under load | Default JSON serialisation re-encodes the whole state each step. | Switch to MsgPack serialiser; index the state column on `(thread_id, ts)`. |
| LangChain agent works in dev, fails in prod with 429 | No retry policy; Yobibyte / OpenAI rate limit hit and not absorbed. | Set `max_retries=5`, `timeout=120`; respect `Retry-After`. |
| Streaming stops mid-response | Reverse proxy buffering `text/event-stream`. | Disable proxy buffering (Nginx `proxy_buffering off;`, ALB `idle timeout >= keepalive`). |
| `bind_tools` returns no tool call even when obvious | Model lacks tool-calling support, or the catalogue exceeds the provider's schema-depth cap. | Verify the model on the Yobibyte / provider side supports tool calling; flatten the JSON schema. |
| `with_structured_output` returns None | Provider returned malformed JSON. | Use `outlines` or `xgrammar` constrained decoding on the upstream (vLLM / Yobibyte tenant); fall back to a tolerant parser with retry. |
The single most effective debugging tool for LangChain agents is a LangSmith trace tree. When you cannot reproduce a customer-reported failure locally, find the trace for the failing run, fork it into the prompt playground, and step through the LLM calls — the failure cause is usually visible in 30 seconds.
Where this fits in the Yobitel stack
Yobitel treats LangChain as the recommended client framework for customers building agents against Yobibyte and Yobitel NeoCloud inference. Yobibyte exposes the OpenAI-compatible Chat Completions, Completions and Embeddings APIs that LangChain's ChatModel and OpenAIEmbeddings already speak — the integration is a configuration change, not a Yobitel-specific SDK. Customers consuming raw vLLM tenants on Yobitel NeoCloud get the same compatibility for free since vLLM ships the OpenAI surface upstream.
Internally, Yobitel's Customer Excellency and Professional Services teams use LangChain + LangGraph for the agent fabric of internal automation, and InferenceBench's pricing and latency feeds are consumed by LangChain agents that monitor model-provider drift. For UK Sovereign customers, the recommended pattern is to keep the LangChain code-base unchanged, terminate the ChatModel base_url inside the Yobibyte UK Sovereign boundary, persist LangGraph checkpoints in a regional Postgres, and route LangSmith traces to a self-hosted enterprise instance. None of this requires a Yobitel-specific framework — only standard LangChain composition pointed at Yobitel-operated endpoints.
References
- LangChain Documentation · LangChain
- LangGraph Documentation · LangChain
- LangSmith Documentation · LangChain
- LangChain on GitHub · GitHub
- LangGraph on GitHub · GitHub
- OpenTelemetry GenAI Semantic Conventions · OpenTelemetry