How to Build AI Agents with LangChain: Complete 2026 Tutorial
Step-by-step tutorial to build production-ready AI agents with LangChain. From setup to deployment with tools, memory, evaluation, and error handling.
What You'll Build
By the end of this tutorial, you'll have a working AI agent that can:
- Search the web for real-time information
- Query a database for structured data
- Maintain conversation memory across sessions
- Handle errors gracefully with retry logic
- Deploy to production with proper monitoring
This is not a "hello world" demo. This is a production-ready agent pattern you can adapt for real applications.
"The best way to learn AI agents is to build one that solves a real problem."
Prerequisites:
- Python 3.10+
- Basic understanding of LLMs and prompts
- OpenAI API key (or Anthropic/other providers)
- 30 minutes of focused time
Step 1: Project Setup
First, create your project structure and install dependencies.
```bash
# Create project directory
mkdir langchain-agent && cd langchain-agent

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install langchain langchain-openai langchain-community
pip install python-dotenv tavily-python chromadb
```
Create your .env file for API keys:
```bash
# .env
OPENAI_API_KEY=your-openai-key
TAVILY_API_KEY=your-tavily-key  # For web search
```
Create the main project structure:
```text
langchain-agent/
├── .env
├── agent.py           # Main agent logic
├── tools.py           # Custom tools
├── memory.py          # Memory configuration
├── rag.py             # Document retrieval (Step 5)
├── error_handling.py  # Retry logic (Step 6)
├── evaluation.py      # Testing and evaluation
├── api.py             # FastAPI deployment (Step 8)
└── requirements.txt
```
Step 2: Your First Agent
Let's build a simple agent that can answer questions using web search.
```python
# agent.py
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.prompts import PromptTemplate

load_dotenv()

# Initialize the LLM
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=1000,
)

# Initialize tools
search_tool = TavilySearchResults(
    max_results=3,
    search_depth="advanced",
)
tools = [search_tool]

# Define the agent prompt
template = """You are a helpful AI assistant with access to web search.
Answer the user's question using the tools available. Always cite your sources.

You have access to the following tools:
{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
{agent_scratchpad}"""

prompt = PromptTemplate.from_template(template)

# Create the agent
agent = create_react_agent(llm, tools, prompt)

# Create the executor with error handling
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
    max_iterations=5,
)

# Test the agent
if __name__ == "__main__":
    result = agent_executor.invoke({
        "input": "What are the latest developments in AI agents as of January 2026?"
    })
    print(result["output"])
```
Run your agent:
```bash
python agent.py
```
You should see the agent reasoning through the problem, searching the web, and providing a sourced answer.
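Under the hood, `AgentExecutor` is running a loop: the model emits a Thought and an Action, the executor runs the named tool, appends the Observation to the scratchpad, and repeats until the model emits a Final Answer (or `max_iterations` is hit). Here is a stripped-down sketch of that control flow with a stand-in model; the names (`fake_llm`, `react_loop`) are invented for illustration and are not LangChain internals:

```python
# A toy ReAct loop: not LangChain's implementation, just the control flow it automates.

def fake_llm(scratchpad: str) -> str:
    """Stand-in for the model: first asks for a search, then answers."""
    if "Observation:" not in scratchpad:
        return "Thought: I should search.\nAction: search\nAction Input: AI agents"
    return "Thought: I now know the final answer\nFinal Answer: Agents are LLM-driven loops."

def fake_search(query: str) -> str:
    """Stand-in for a search tool."""
    return f"Top result for '{query}': agents combine LLMs with tools."

def react_loop(question: str, max_iterations: int = 5) -> str:
    tools = {"search": fake_search}
    scratchpad = f"Question: {question}\n"
    for _ in range(max_iterations):
        step = fake_llm(scratchpad)
        scratchpad += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        # Parse the Action / Action Input lines and run the named tool
        action = step.split("Action:", 1)[1].split("\n", 1)[0].strip()
        action_input = step.split("Action Input:", 1)[1].split("\n", 1)[0].strip()
        scratchpad += f"Observation: {tools[action](action_input)}\n"
    return "Stopped: max iterations reached"

print(react_loop("What are AI agents?"))
```

The real executor also has to survive malformed model output at the parsing step, which is exactly what `handle_parsing_errors=True` covers.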
Step 3: Add Custom Tools
Real agents need custom tools for specific tasks. Let's add a database query tool.
```python
# tools.py
import json

from langchain.tools import tool

# Simulated database (replace with a real database connection)
MOCK_DATABASE = {
    "users": [
        {"id": 1, "name": "Alice", "role": "engineer", "projects": 5},
        {"id": 2, "name": "Bob", "role": "designer", "projects": 3},
        {"id": 3, "name": "Charlie", "role": "manager", "projects": 8},
    ],
    "projects": [
        {"id": 1, "name": "AI Dashboard", "status": "active", "team_size": 4},
        {"id": 2, "name": "Data Pipeline", "status": "completed", "team_size": 3},
        {"id": 3, "name": "Mobile App", "status": "planning", "team_size": 2},
    ],
}

@tool
def query_database(query: str) -> str:
    """
    Query the internal database for user and project information.

    Args:
        query: Natural language query like "list all engineers" or "active projects"

    Returns:
        JSON string with query results
    """
    query_lower = query.lower()

    # Simple query routing (in production, use SQL or proper query parsing)
    if "user" in query_lower or "engineer" in query_lower or "designer" in query_lower:
        if "engineer" in query_lower:
            results = [u for u in MOCK_DATABASE["users"] if u["role"] == "engineer"]
        elif "designer" in query_lower:
            results = [u for u in MOCK_DATABASE["users"] if u["role"] == "designer"]
        else:
            results = MOCK_DATABASE["users"]
        return json.dumps({"type": "users", "count": len(results), "data": results})

    if "project" in query_lower:
        if "active" in query_lower:
            results = [p for p in MOCK_DATABASE["projects"] if p["status"] == "active"]
        elif "completed" in query_lower:
            results = [p for p in MOCK_DATABASE["projects"] if p["status"] == "completed"]
        else:
            results = MOCK_DATABASE["projects"]
        return json.dumps({"type": "projects", "count": len(results), "data": results})

    return json.dumps({"error": "Query not understood. Try asking about users or projects."})

@tool
def calculate_metrics(data_type: str) -> str:
    """
    Calculate metrics for users or projects.

    Args:
        data_type: Either "users" or "projects"

    Returns:
        JSON string with calculated metrics
    """
    if data_type == "users":
        users = MOCK_DATABASE["users"]
        total_projects = sum(u["projects"] for u in users)
        avg_projects = total_projects / len(users)
        return json.dumps({
            "total_users": len(users),
            "total_projects_assigned": total_projects,
            "avg_projects_per_user": round(avg_projects, 2),
            "roles": list(set(u["role"] for u in users)),
        })

    if data_type == "projects":
        projects = MOCK_DATABASE["projects"]
        total_team = sum(p["team_size"] for p in projects)
        statuses = {}
        for p in projects:
            statuses[p["status"]] = statuses.get(p["status"], 0) + 1
        return json.dumps({
            "total_projects": len(projects),
            "total_team_members": total_team,
            "status_breakdown": statuses,
        })

    return json.dumps({"error": "Unknown data type. Use 'users' or 'projects'."})
```
Update your agent to use these tools:
```python
# Updated agent.py
from tools import query_database, calculate_metrics

# Add custom tools to the tools list
tools = [search_tool, query_database, calculate_metrics]
```
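Before handing these tools to the agent, it helps to sanity-check the routing logic on its own. Here is a standalone copy of the user branch (the same logic as `query_database`, minus the `@tool` wrapper and with a trimmed dataset), so you can test it without an LLM in the loop:

```python
import json

# Trimmed copy of the mock data for a standalone test
USERS = [
    {"id": 1, "name": "Alice", "role": "engineer", "projects": 5},
    {"id": 2, "name": "Bob", "role": "designer", "projects": 3},
]

def route_user_query(query: str) -> dict:
    """Mirror of query_database's user branch, for direct unit testing."""
    q = query.lower()
    if "engineer" in q:
        results = [u for u in USERS if u["role"] == "engineer"]
    elif "designer" in q:
        results = [u for u in USERS if u["role"] == "designer"]
    else:
        results = USERS
    return {"type": "users", "count": len(results), "data": results}

assert route_user_query("list all engineers")["count"] == 1
print(json.dumps(route_user_query("all users"), indent=2))
```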
Step 4: Implement Memory
Agents without memory forget everything between conversations. Let's add persistent memory.
```python
# memory.py
from langchain.memory import ConversationBufferWindowMemory
from langchain_community.chat_message_histories import SQLChatMessageHistory

def create_memory(session_id: str, window_size: int = 10):
    """
    Create a memory instance with conversation history.

    Args:
        session_id: Unique identifier for the conversation
        window_size: Number of recent messages to keep in context

    Returns:
        Configured memory instance
    """
    # Use SQLite for persistent storage (can use PostgreSQL, Redis, etc.)
    message_history = SQLChatMessageHistory(
        session_id=session_id,
        connection_string="sqlite:///chat_history.db",
    )

    memory = ConversationBufferWindowMemory(
        chat_memory=message_history,
        k=window_size,
        return_messages=True,
        memory_key="chat_history",
    )
    return memory

def clear_memory(session_id: str):
    """Clear conversation history for a session."""
    message_history = SQLChatMessageHistory(
        session_id=session_id,
        connection_string="sqlite:///chat_history.db",
    )
    message_history.clear()
```
Update your agent to use memory:
```python
# agent_with_memory.py
from memory import create_memory

# Create memory for this session
memory = create_memory(session_id="user_123")

# Update the prompt to include chat history
template_with_memory = """You are a helpful AI assistant with access to web search and internal databases.

Previous conversation:
{chat_history}

Answer the user's question using the tools available. Always cite your sources.

You have access to the following tools:
{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
{agent_scratchpad}"""

# Rebuild the agent with the memory-aware prompt
prompt = PromptTemplate.from_template(template_with_memory)
agent = create_react_agent(llm, tools, prompt)

# Create executor with memory
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True,
    handle_parsing_errors=True,
    max_iterations=5,
)
```
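The `window_size=10` window simply means only the most recent exchanges are injected into the `{chat_history}` slot of the prompt. Stripped to plain Python, the idea looks like this (an illustration of the windowing behavior, not LangChain's implementation):

```python
from collections import deque

class WindowedMemory:
    """Keep only the most recent `k` (human, ai) message pairs."""

    def __init__(self, k: int = 10):
        self.messages = deque(maxlen=2 * k)  # each turn adds 2 messages

    def add_turn(self, human: str, ai: str) -> None:
        self.messages.append(("human", human))
        self.messages.append(("ai", ai))

    def as_prompt(self) -> str:
        """Render the surviving window as prompt text."""
        return "\n".join(f"{role}: {text}" for role, text in self.messages)

memory = WindowedMemory(k=2)
for i in range(5):
    memory.add_turn(f"question {i}", f"answer {i}")

# Only the last 2 turns survive; older ones fell off the deque
print(memory.as_prompt())
```

Because the deque has a fixed `maxlen`, old turns are discarded automatically, which is what keeps the prompt from growing without bound.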
Step 5: Add RAG (Retrieval-Augmented Generation)
For agents that need to answer questions from documents, add RAG capabilities.
```python
# rag.py
from langchain_community.document_loaders import TextLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.tools import tool

# Initialize embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = None

def initialize_vectorstore(documents_path: str):
    """Load documents and create vector store."""
    global vectorstore

    # Load documents (use PyPDFLoader for PDFs, TextLoader for plain text)
    loader = TextLoader(documents_path)
    documents = loader.load()

    # Split into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
    )
    splits = text_splitter.split_documents(documents)

    # Create vector store
    vectorstore = Chroma.from_documents(
        documents=splits,
        embedding=embeddings,
        persist_directory="./chroma_db",
    )
    return vectorstore

@tool
def search_documents(query: str) -> str:
    """
    Search internal documents for relevant information.

    Args:
        query: The search query

    Returns:
        Relevant document excerpts
    """
    if vectorstore is None:
        return "Document store not initialized."

    results = vectorstore.similarity_search(query, k=3)
    if not results:
        return "No relevant documents found."

    formatted_results = []
    for i, doc in enumerate(results, 1):
        formatted_results.append(f"[{i}] {doc.page_content[:500]}...")
    return "\n\n".join(formatted_results)
```
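`chunk_size` and `chunk_overlap` control the split: each chunk shares its tail with the head of the next, so context that straddles a chunk boundary is not lost. Here is a rough character-level illustration; the real `RecursiveCharacterTextSplitter` prefers to break on paragraph and sentence separators rather than at fixed offsets:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Naive sliding-window chunking: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = chunk_text("abcdefghij" * 50, chunk_size=100, chunk_overlap=20)
print(len(chunks), len(chunks[0]))  # 6 chunks of 100 characters each
# The last 20 chars of one chunk equal the first 20 of the next
print(chunks[0][80:] == chunks[1][:20])  # True
```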
Step 6: Error Handling and Retry Logic
Production agents need robust error handling.
```python
# error_handling.py
import time
from functools import wraps
from typing import Callable, Any

def retry_with_backoff(
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    exceptions: tuple = (Exception,),
):
    """
    Decorator for retry logic with exponential backoff.
    """
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            retries = 0
            while retries < max_retries:
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    retries += 1
                    if retries == max_retries:
                        raise
                    # First retry waits base_delay, then 2x, 4x, ... capped at max_delay
                    delay = min(base_delay * (2 ** (retries - 1)), max_delay)
                    print(f"Attempt {retries} failed: {e}. Retrying in {delay}s...")
                    time.sleep(delay)
            return None
        return wrapper
    return decorator

class AgentError(Exception):
    """Base exception for agent errors."""
    pass

class ToolError(AgentError):
    """Error during tool execution."""
    pass

class AgentMemoryError(AgentError):
    """Error with memory operations (named to avoid shadowing the builtin MemoryError)."""
    pass

def safe_tool_execution(tool_func: Callable, *args, **kwargs) -> dict:
    """
    Safely execute a tool with error capture.
    """
    try:
        result = tool_func(*args, **kwargs)
        return {"success": True, "result": result}
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "error_type": type(e).__name__,
        }
```
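To see the decorator in action, wrap a function that fails twice before succeeding. This is a condensed, self-contained copy of `retry_with_backoff` with a short `base_delay` so the demo runs instantly:

```python
import time
from functools import wraps

def retry_with_backoff(max_retries=3, base_delay=0.01, max_delay=1.0, exceptions=(Exception,)):
    """Condensed version of the decorator above, for demonstration."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise
                    time.sleep(min(base_delay * (2 ** (attempt - 1)), max_delay))
        return wrapper
    return decorator

calls = {"n": 0}

@retry_with_backoff(max_retries=3)
def flaky_api_call():
    """Fails on the first two attempts, succeeds on the third."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

print(flaky_api_call(), "after", calls["n"], "attempts")  # ok after 3 attempts
```

The same decorator applies naturally to LLM and tool calls that can hit transient network or rate-limit errors.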
Step 7: Evaluation and Testing
Never ship an agent without testing. Here's how to evaluate your agent.
```python
# evaluation.py
from typing import List, Dict
from dataclasses import dataclass

@dataclass
class TestCase:
    input: str
    expected_action: str  # Expected tool to be called
    expected_keywords: List[str]  # Keywords expected in output
    min_confidence: float = 0.7

TEST_CASES = [
    TestCase(
        input="How many engineers do we have?",
        expected_action="query_database",
        expected_keywords=["engineer", "user"],
    ),
    TestCase(
        input="What is the latest news about LangChain?",
        expected_action="tavily_search_results_json",
        expected_keywords=["LangChain"],
    ),
    TestCase(
        input="Calculate user metrics",
        expected_action="calculate_metrics",
        expected_keywords=["total", "average"],
    ),
]

def run_evaluation(agent_executor, test_cases: List[TestCase]) -> Dict:
    """
    Run evaluation suite on the agent. (Keyword checks only; to also verify
    expected_action, invoke with return_intermediate_steps=True.)
    """
    results = {
        "total": len(test_cases),
        "passed": 0,
        "failed": 0,
        "details": [],
    }

    for i, test in enumerate(test_cases):
        print(f"\nRunning test {i + 1}/{len(test_cases)}: {test.input[:50]}...")
        try:
            response = agent_executor.invoke({"input": test.input})
            output = response.get("output", "").lower()

            # Check if expected keywords are present
            keywords_found = sum(
                1 for kw in test.expected_keywords
                if kw.lower() in output
            )
            keyword_score = keywords_found / len(test.expected_keywords)
            passed = keyword_score >= test.min_confidence

            results["details"].append({
                "test": test.input,
                "passed": passed,
                "keyword_score": keyword_score,
                "output_preview": output[:200],
            })

            if passed:
                results["passed"] += 1
            else:
                results["failed"] += 1
        except Exception as e:
            results["failed"] += 1
            results["details"].append({
                "test": test.input,
                "passed": False,
                "error": str(e),
            })

    results["pass_rate"] = results["passed"] / results["total"]
    return results

if __name__ == "__main__":
    from agent import agent_executor

    results = run_evaluation(agent_executor, TEST_CASES)
    print(f"\n{'=' * 50}")
    print("Evaluation Results:")
    print(f"Pass Rate: {results['pass_rate'] * 100:.1f}%")
    print(f"Passed: {results['passed']}/{results['total']}")
```
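The pass/fail decision is just keyword coverage compared against a threshold. Isolated from the agent, the scoring looks like this:

```python
def keyword_score(output: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords found in the (lowercased) output."""
    output = output.lower()
    found = sum(1 for kw in expected_keywords if kw.lower() in output)
    return found / len(expected_keywords)

score = keyword_score("We have 1 engineer and 3 users total.", ["engineer", "user"])
print(score)  # 1.0
assert score >= 0.7  # would count as a pass with min_confidence=0.7
```

Keyword matching is a blunt instrument; it catches regressions cheaply but should be complemented with LLM-as-judge or human review for nuanced outputs.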
Step 8: Deploy to Production
Here's a simple FastAPI deployment.
```python
# api.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional
import uvicorn

from agent import agent_executor
from memory import create_memory, clear_memory

app = FastAPI(title="AI Agent API")

class QueryRequest(BaseModel):
    query: str
    session_id: Optional[str] = "default"

class QueryResponse(BaseModel):
    response: str
    session_id: str

@app.post("/query", response_model=QueryResponse)
async def query_agent(request: QueryRequest):
    """Send a query to the AI agent."""
    try:
        # Note: to truly isolate sessions, build an executor per session_id
        # using create_memory(request.session_id)
        result = agent_executor.invoke({
            "input": request.query
        })
        return QueryResponse(
            response=result["output"],
            session_id=request.session_id,
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/clear-memory/{session_id}")
async def clear_session_memory(session_id: str):
    """Clear conversation memory for a session."""
    clear_memory(session_id)
    return {"message": f"Memory cleared for session {session_id}"}

@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
Deploy with:
```bash
# Local testing
python api.py

# Production (use gunicorn)
pip install gunicorn
gunicorn api:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```
Common Errors and Fixes
Error: "Agent stopped due to max iterations"
Cause: The agent is stuck in a loop or taking too many steps.
Fix: Increase max_iterations or improve your prompt to be more specific:
```python
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,  # Increase from the 5 we set earlier
    # Note: early_stopping_method="generate" is not supported with agents
    # built via create_react_agent; the default ("force") stops cleanly.
)
```
Error: "Could not parse LLM output"
Cause: The LLM response doesn't match the expected format.
Fix: Enable error handling and add better prompt instructions:
```python
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    handle_parsing_errors=True,  # Feed parse errors back to the LLM to retry
    verbose=True,
)
```
Error: "Rate limit exceeded"
Cause: Too many API calls in a short time.
Fix: Add rate limiting and caching:
```python
from langchain_community.cache import SQLiteCache
from langchain.globals import set_llm_cache

# Enable caching
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
```
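Caching pays off because identical prompts recur: retries, evaluation runs, repeated user questions. The effect in miniature, with a plain dict standing in for `SQLiteCache`:

```python
cache: dict[str, str] = {}
llm_calls = {"n": 0}

def cached_llm(prompt: str) -> str:
    """Stand-in for an LLM call: only cache misses hit the (paid) API."""
    if prompt not in cache:
        llm_calls["n"] += 1  # this is the part that costs money
        cache[prompt] = f"answer to: {prompt}"
    return cache[prompt]

for _ in range(3):
    cached_llm("What is LangChain?")
print(llm_calls["n"])  # 1 -- two of the three calls were free
```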
Error: "Context length exceeded"
Cause: Too much text in the prompt (memory + tools + query).
Fix: Use windowed memory and summarization:
```python
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=2000,  # Summarize when exceeding this
    return_messages=True,
)
```
FAQ
Q: Which LLM should I use?
For agents, use models with strong reasoning: GPT-4o, Claude 3.5 Sonnet, or Gemini 1.5 Pro. Avoid smaller models for complex agent tasks.
Q: How do I reduce costs?
- Cache responses with set_llm_cache()
- Use smaller models for simple tools
- Limit memory window size
- Set max_iterations appropriately
Q: Can I use local models?
Yes! Use Ollama or vLLM:
```python
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.1:70b")
```
Q: How do I add authentication?
Use FastAPI middleware:
```python
from fastapi import Depends, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

security = HTTPBearer()

@app.post("/query")
async def query_agent(request: QueryRequest, token: HTTPAuthorizationCredentials = Depends(security)):
    # Validate token.credentials against your auth backend here
    ...
```
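Whatever you validate the token against, compare it in constant time so the check doesn't leak information through response timing. A minimal sketch using a static secret (the token value here is invented; real deployments should verify JWTs or call an auth provider instead):

```python
import hmac

EXPECTED_TOKEN = "s3cret-token"  # in practice, load from env or a secrets manager

def is_valid_token(presented: str) -> bool:
    """Constant-time comparison to avoid timing side channels."""
    return hmac.compare_digest(presented, EXPECTED_TOKEN)

print(is_valid_token("s3cret-token"))  # True
print(is_valid_token("wrong"))         # False
```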
Complete GitHub Template
Get the full code with additional features:
```bash
git clone https://github.com/LMouhssine/langchain-agent-template
cd langchain-agent-template
pip install -r requirements.txt
cp .env.example .env
# Add your API keys to .env
python agent.py
```
What's Next?
You now have a production-ready AI agent. Here's where to go from here:
- Add more tools - Connect to your specific APIs and databases
- Implement guardrails - Add safety checks for sensitive operations
- Set up monitoring - Track usage, errors, and costs
- Build a UI - Connect to a frontend or Slack/Discord
Related guides
Continue your AI agent journey:
- Why AI Agents Fail (And How to Fix Them) — Debug production agent failures with this checklist.
- The AI Orchestrator Battle Guide 2026 — Level up from single agents to multi-agent systems.
- AI-Powered Developer Workflows — Compare tools for building with AI.
If this tutorial helped you build your first agent, share it with another developer. The best way to learn is to build.