How to Build AI Agents with LangChain: Complete 2026 Tutorial

Mouhssine Lakhili
January 30, 2026 · 12 min read

Step-by-step tutorial to build production-ready AI agents with LangChain. From setup to deployment with tools, memory, evaluation, and error handling.


What You'll Build

By the end of this tutorial, you'll have a working AI agent that can:

  • Search the web for real-time information
  • Query a database for structured data
  • Maintain conversation memory across sessions
  • Handle errors gracefully with retry logic
  • Deploy to production with proper monitoring

This is not a "hello world" demo. This is a production-ready agent pattern you can adapt for real applications.

"The best way to learn AI agents is to build one that solves a real problem."

Prerequisites:

  • Python 3.10+
  • Basic understanding of LLMs and prompts
  • OpenAI API key (or Anthropic/other providers)
  • 30 minutes of focused time

Step 1: Project Setup

First, create your project structure and install dependencies.

# Create project directory
mkdir langchain-agent && cd langchain-agent

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install langchain langchain-openai langchain-community
pip install python-dotenv tavily-python chromadb

Create your .env file for API keys:

# .env
OPENAI_API_KEY=your-openai-key
TAVILY_API_KEY=your-tavily-key  # For web search

Create the main project structure:

langchain-agent/
├── .env
├── agent.py           # Main agent logic
├── tools.py           # Custom tools
├── memory.py          # Memory configuration
├── evaluation.py      # Testing and evaluation
└── requirements.txt

Step 2: Your First Agent

Let's build a simple agent that can answer questions using web search.

# agent.py
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.prompts import PromptTemplate

load_dotenv()

# Initialize the LLM
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    max_tokens=1000
)

# Initialize tools
search_tool = TavilySearchResults(
    max_results=3,
    search_depth="advanced"
)
tools = [search_tool]

# Define the agent prompt
template = """You are a helpful AI assistant with access to web search.

Answer the user's question using the tools available. Always cite your sources.

You have access to the following tools:
{tools}

Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
{agent_scratchpad}
"""

prompt = PromptTemplate.from_template(template)

# Create the agent
agent = create_react_agent(llm, tools, prompt)

# Create the executor with error handling
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
    max_iterations=5
)

# Test the agent
if __name__ == "__main__":
    result = agent_executor.invoke({
        "input": "What are the latest developments in AI agents as of January 2026?"
    })
    print(result["output"])

Run your agent:

python agent.py

You should see the agent reasoning through the problem, searching the web, and providing a sourced answer.
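If you're curious what the LLM actually receives at each step, you can render the ReAct template by hand. Plain `str.format` stands in for `PromptTemplate` here, and the tool description string is a placeholder for illustration; the agent fills `agent_scratchpad` with the running Thought/Action/Observation trace on every iteration:

```python
# A condensed copy of the ReAct template from agent.py, rendered with
# plain str.format so you can inspect the final prompt text.
TEMPLATE = """You are a helpful AI assistant with access to web search.

You have access to the following tools:
{tools}

Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action

Question: {input}
{agent_scratchpad}"""

rendered = TEMPLATE.format(
    tools="tavily_search_results_json: Search the web for current information.",
    tool_names="tavily_search_results_json",
    input="What are the latest developments in AI agents?",
    agent_scratchpad="",  # grows with each Thought/Action/Observation cycle
)
print(rendered)
```

Seeing the rendered prompt makes the parsing errors in Step 6 easier to debug: the agent fails exactly when the LLM's reply stops matching this format.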

Step 3: Add Custom Tools

Real agents need custom tools for specific tasks. Let's add a database query tool.

# tools.py
from langchain.tools import tool
from typing import Optional
import json

# Simulated database (replace with real database connection)
MOCK_DATABASE = {
    "users": [
        {"id": 1, "name": "Alice", "role": "engineer", "projects": 5},
        {"id": 2, "name": "Bob", "role": "designer", "projects": 3},
        {"id": 3, "name": "Charlie", "role": "manager", "projects": 8},
    ],
    "projects": [
        {"id": 1, "name": "AI Dashboard", "status": "active", "team_size": 4},
        {"id": 2, "name": "Data Pipeline", "status": "completed", "team_size": 3},
        {"id": 3, "name": "Mobile App", "status": "planning", "team_size": 2},
    ]
}

@tool
def query_database(query: str) -> str:
    """
    Query the internal database for user and project information.

    Args:
        query: Natural language query like "list all engineers" or "active projects"

    Returns:
        JSON string with query results
    """
    query_lower = query.lower()

    # Simple query routing (in production, use SQL or proper query parsing)
    if "user" in query_lower or "engineer" in query_lower or "designer" in query_lower:
        if "engineer" in query_lower:
            results = [u for u in MOCK_DATABASE["users"] if u["role"] == "engineer"]
        elif "designer" in query_lower:
            results = [u for u in MOCK_DATABASE["users"] if u["role"] == "designer"]
        else:
            results = MOCK_DATABASE["users"]
        return json.dumps({"type": "users", "count": len(results), "data": results})

    if "project" in query_lower:
        if "active" in query_lower:
            results = [p for p in MOCK_DATABASE["projects"] if p["status"] == "active"]
        elif "completed" in query_lower:
            results = [p for p in MOCK_DATABASE["projects"] if p["status"] == "completed"]
        else:
            results = MOCK_DATABASE["projects"]
        return json.dumps({"type": "projects", "count": len(results), "data": results})

    return json.dumps({"error": "Query not understood. Try asking about users or projects."})


@tool
def calculate_metrics(data_type: str) -> str:
    """
    Calculate metrics for users or projects.

    Args:
        data_type: Either "users" or "projects"

    Returns:
        JSON string with calculated metrics
    """
    if data_type == "users":
        users = MOCK_DATABASE["users"]
        total_projects = sum(u["projects"] for u in users)
        avg_projects = total_projects / len(users)
        return json.dumps({
            "total_users": len(users),
            "total_projects_assigned": total_projects,
            "avg_projects_per_user": round(avg_projects, 2),
            "roles": list(set(u["role"] for u in users))
        })

    if data_type == "projects":
        projects = MOCK_DATABASE["projects"]
        total_team = sum(p["team_size"] for p in projects)
        statuses = {}
        for p in projects:
            statuses[p["status"]] = statuses.get(p["status"], 0) + 1
        return json.dumps({
            "total_projects": len(projects),
            "total_team_members": total_team,
            "status_breakdown": statuses
        })

    return json.dumps({"error": "Unknown data type. Use 'users' or 'projects'."})
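Under the `@tool` decorator these are plain functions, so you can unit-test the logic before the agent ever touches it. Here's a standalone check of the metrics math, with the decorator omitted so it runs without LangChain installed (data copied from `MOCK_DATABASE`):

```python
import json

# Same user records as MOCK_DATABASE["users"] in tools.py
users = [
    {"id": 1, "name": "Alice", "role": "engineer", "projects": 5},
    {"id": 2, "name": "Bob", "role": "designer", "projects": 3},
    {"id": 3, "name": "Charlie", "role": "manager", "projects": 8},
]

def user_metrics(users: list) -> dict:
    """Mirror of the calculate_metrics 'users' branch, minus the @tool wrapper."""
    total = sum(u["projects"] for u in users)
    return {
        "total_users": len(users),
        "total_projects_assigned": total,
        "avg_projects_per_user": round(total / len(users), 2),
    }

metrics = user_metrics(users)
print(json.dumps(metrics))  # 16 projects across 3 users -> avg 5.33
```

Catching a bad branch here is far cheaper than discovering it mid-conversation when the agent quietly returns wrong numbers.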

Update your agent to use these tools:

# Updated agent.py
from tools import query_database, calculate_metrics

# Add custom tools to the tools list
tools = [search_tool, query_database, calculate_metrics]

Step 4: Implement Memory

Agents without memory forget everything between conversations. Let's add persistent memory.

# memory.py
from langchain.memory import ConversationBufferWindowMemory
from langchain_community.chat_message_histories import SQLChatMessageHistory

def create_memory(session_id: str, window_size: int = 10):
    """
    Create a memory instance with conversation history.

    Args:
        session_id: Unique identifier for the conversation
        window_size: Number of recent messages to keep in context

    Returns:
        Configured memory instance
    """
    # Use SQLite for persistent storage (can use PostgreSQL, Redis, etc.)
    message_history = SQLChatMessageHistory(
        session_id=session_id,
        connection_string="sqlite:///chat_history.db"
    )

    memory = ConversationBufferWindowMemory(
        chat_memory=message_history,
        k=window_size,
        return_messages=True,
        memory_key="chat_history"
    )

    return memory


def clear_memory(session_id: str):
    """Clear conversation history for a session."""
    message_history = SQLChatMessageHistory(
        session_id=session_id,
        connection_string="sqlite:///chat_history.db"
    )
    message_history.clear()
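The window behavior is easy to reason about: only the last `k` exchanges stay in context, and older ones fall off the front. A plain-Python sketch of the same idea using `collections.deque` (no LangChain required; the message tuples here are just illustrative):

```python
from collections import deque

# Keep only the last window_size exchanges, like
# ConversationBufferWindowMemory with k=window_size.
window_size = 3
window = deque(maxlen=window_size * 2)  # 2 messages per exchange

for turn in range(5):
    window.append(("human", f"question {turn}"))
    window.append(("ai", f"answer {turn}"))

# Only exchanges 2, 3, and 4 survive; 0 and 1 were evicted.
print(list(window))
```

This is also why windowed memory bounds your token costs: the prompt can never grow beyond `2 * k` messages of history.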

Update your agent to use memory:

# agent_with_memory.py
from memory import create_memory

# Create memory for this session
memory = create_memory(session_id="user_123")

# Update the prompt to include chat history
template_with_memory = """You are a helpful AI assistant with access to web search and internal databases.

Previous conversation:
{chat_history}

Answer the user's question using the tools available. Always cite your sources.

You have access to the following tools:
{tools}

Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
{agent_scratchpad}
"""

# Create executor with memory
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True,
    handle_parsing_errors=True,
    max_iterations=5
)

Step 5: Add RAG (Retrieval-Augmented Generation)

For agents that need to answer questions from documents, add RAG capabilities.

# rag.py
from langchain_community.document_loaders import TextLoader, PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.tools import tool

# Initialize embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = None

def initialize_vectorstore(documents_path: str):
    """Load documents and create vector store."""
    global vectorstore

    # Load documents (adjust loader based on file type)
    loader = TextLoader(documents_path)
    documents = loader.load()

    # Split into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )
    splits = text_splitter.split_documents(documents)

    # Create vector store
    vectorstore = Chroma.from_documents(
        documents=splits,
        embedding=embeddings,
        persist_directory="./chroma_db"
    )
    return vectorstore


@tool
def search_documents(query: str) -> str:
    """
    Search internal documents for relevant information.

    Args:
        query: The search query

    Returns:
        Relevant document excerpts
    """
    if vectorstore is None:
        return "Document store not initialized."

    results = vectorstore.similarity_search(query, k=3)

    if not results:
        return "No relevant documents found."

    formatted_results = []
    for i, doc in enumerate(results, 1):
        formatted_results.append(f"[{i}] {doc.page_content[:500]}...")

    return "\n\n".join(formatted_results)
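`chunk_size` and `chunk_overlap` are the two knobs that matter most for retrieval quality. To build intuition for what they do, here's a simplified fixed-stride chunker; the real `RecursiveCharacterTextSplitter` additionally tries to break on paragraph and sentence boundaries rather than at exact character offsets:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Fixed-stride chunking: each chunk shares chunk_overlap characters
    with the previous one, so content at boundaries isn't lost."""
    stride = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), stride)]

doc = "x" * 2500
chunks = chunk_text(doc)
print(len(chunks), [len(c) for c in chunks])
```

Larger overlap means more duplication in the vector store (higher cost) but less risk of splitting an answer across two chunks; 10–20% of `chunk_size` is a common starting point.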

Step 6: Error Handling and Retry Logic

Production agents need robust error handling.

# error_handling.py
import time
from functools import wraps
from typing import Callable, Any

def retry_with_backoff(
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    exceptions: tuple = (Exception,)
):
    """
    Decorator for retry logic with exponential backoff.
    """
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            retries = 0
            while retries < max_retries:
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    retries += 1
                    if retries == max_retries:
                        raise e

                    delay = min(base_delay * (2 ** retries), max_delay)
                    print(f"Attempt {retries} failed: {e}. Retrying in {delay}s...")
                    time.sleep(delay)

            return None
        return wrapper
    return decorator


class AgentError(Exception):
    """Base exception for agent errors."""
    pass


class ToolError(AgentError):
    """Error during tool execution."""
    pass


class AgentMemoryError(AgentError):
    """Error with memory operations (named to avoid shadowing Python's built-in MemoryError)."""
    pass


def safe_tool_execution(tool_func: Callable, *args, **kwargs) -> dict:
    """
    Safely execute a tool with error capture.
    """
    try:
        result = tool_func(*args, **kwargs)
        return {"success": True, "result": result}
    except Exception as e:
        return {
            "success": False,
            "error": str(e),
            "error_type": type(e).__name__
        }
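Here's the retry decorator in action against a flaky function that fails twice before succeeding. This is a condensed copy of `retry_with_backoff` above so the demo runs standalone, with delays shrunk so it finishes instantly:

```python
import time
from functools import wraps

def retry_with_backoff(max_retries=3, base_delay=0.01, max_delay=1.0,
                       exceptions=(Exception,)):
    """Condensed copy of the decorator above, for a standalone run."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            retries = 0
            while retries < max_retries:
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    retries += 1
                    if retries == max_retries:
                        raise
                    time.sleep(min(base_delay * (2 ** retries), max_delay))
        return wrapper
    return decorator

attempts = []

@retry_with_backoff(max_retries=3)
def flaky_search(query: str) -> str:
    # Simulate a transient network failure on the first two calls
    attempts.append(query)
    if len(attempts) < 3:
        raise ConnectionError("transient network error")
    return f"results for {query!r}"

result = flaky_search("langchain agents")
print(result, "after", len(attempts), "attempts")
```

In production you'd narrow `exceptions` to transient errors only (timeouts, rate limits); retrying an authentication failure three times just wastes time.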

Step 7: Evaluation and Testing

Never ship an agent without testing. Here's how to evaluate your agent.

# evaluation.py
import json
from typing import List, Dict
from dataclasses import dataclass

@dataclass
class TestCase:
    input: str
    expected_action: str  # Expected tool to be called
    expected_keywords: List[str]  # Keywords expected in output
    min_confidence: float = 0.7

TEST_CASES = [
    TestCase(
        input="How many engineers do we have?",
        expected_action="query_database",
        expected_keywords=["engineer", "user"]
    ),
    TestCase(
        input="What is the latest news about LangChain?",
        expected_action="tavily_search",
        expected_keywords=["LangChain"]
    ),
    TestCase(
        input="Calculate user metrics",
        expected_action="calculate_metrics",
        expected_keywords=["total", "average"]
    ),
]


def run_evaluation(agent_executor, test_cases: List[TestCase]) -> Dict:
    """
    Run evaluation suite on the agent.
    """
    results = {
        "total": len(test_cases),
        "passed": 0,
        "failed": 0,
        "details": []
    }

    for i, test in enumerate(test_cases):
        print(f"\nRunning test {i+1}/{len(test_cases)}: {test.input[:50]}...")

        try:
            response = agent_executor.invoke({"input": test.input})
            output = response.get("output", "").lower()

            # Check if expected keywords are present
            keywords_found = sum(
                1 for kw in test.expected_keywords
                if kw.lower() in output
            )
            keyword_score = keywords_found / len(test.expected_keywords)

            passed = keyword_score >= test.min_confidence

            results["details"].append({
                "test": test.input,
                "passed": passed,
                "keyword_score": keyword_score,
                "output_preview": output[:200]
            })

            if passed:
                results["passed"] += 1
            else:
                results["failed"] += 1

        except Exception as e:
            results["failed"] += 1
            results["details"].append({
                "test": test.input,
                "passed": False,
                "error": str(e)
            })

    results["pass_rate"] = results["passed"] / results["total"]
    return results


if __name__ == "__main__":
    from agent import agent_executor

    results = run_evaluation(agent_executor, TEST_CASES)
    print(f"\n{'='*50}")
    print(f"Evaluation Results:")
    print(f"Pass Rate: {results['pass_rate']*100:.1f}%")
    print(f"Passed: {results['passed']}/{results['total']}")

Step 8: Deploy to Production

Here's a simple FastAPI deployment.

# api.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional
import uvicorn

from agent import agent_executor
from memory import create_memory, clear_memory

app = FastAPI(title="AI Agent API")

class QueryRequest(BaseModel):
    query: str
    session_id: Optional[str] = "default"

class QueryResponse(BaseModel):
    response: str
    session_id: str

@app.post("/query", response_model=QueryResponse)
async def query_agent(request: QueryRequest):
    """Send a query to the AI agent."""
    try:
        result = agent_executor.invoke({
            "input": request.query
        })
        return QueryResponse(
            response=result["output"],
            session_id=request.session_id
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/clear-memory/{session_id}")
async def clear_session_memory(session_id: str):
    """Clear conversation memory for a session."""
    clear_memory(session_id)
    return {"message": f"Memory cleared for session {session_id}"}

@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "healthy"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

Deploy with:

# Local testing
python api.py

# Production (use gunicorn)
pip install gunicorn
gunicorn api:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

Common Errors and Fixes

Error: "Agent stopped due to max iterations"

Cause: The agent is stuck in a loop or taking too many steps.

Fix: Increase max_iterations or improve your prompt to be more specific:

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10  # Raise from the 5 we set earlier
)

Note: with agents built via create_react_agent, leave early_stopping_method at its default "force"; the "generate" option only works with the legacy agent classes and raises an error here.

Error: "Could not parse LLM output"

Cause: The LLM response doesn't match the expected format.

Fix: Enable error handling and add better prompt instructions:

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    handle_parsing_errors=True,  # Auto-retry on parse errors
    verbose=True
)

Error: "Rate limit exceeded"

Cause: Too many API calls in a short time.

Fix: Add rate limiting and caching:

from langchain_community.cache import SQLiteCache
from langchain.globals import set_llm_cache

# Enable caching
set_llm_cache(SQLiteCache(database_path=".langchain.db"))

Error: "Context length exceeded"

Cause: Too much text in the prompt (memory + tools + query).

Fix: Use windowed memory and summarization:

from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=2000,  # Summarize when exceeding this
    return_messages=True
)

FAQ

Q: Which LLM should I use?

For agents, use models with strong reasoning: GPT-4o, Claude 3.5 Sonnet, or Gemini 1.5 Pro. Avoid smaller models for complex agent tasks.

Q: How do I reduce costs?

  1. Cache responses with set_llm_cache()
  2. Use smaller models for simple tools
  3. Limit memory window size
  4. Set max_iterations appropriately

Q: Can I use local models?

Yes! Use Ollama or vLLM:

from langchain_community.llms import Ollama
llm = Ollama(model="llama3.1:70b")

Q: How do I add authentication?

Use FastAPI middleware:

from fastapi import Depends, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
import os

security = HTTPBearer()

def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
    # Compare against a token from the environment (swap in your real auth check)
    if credentials.credentials != os.environ.get("API_TOKEN"):
        raise HTTPException(status_code=401, detail="Invalid token")

@app.post("/query")
async def query_agent(request: QueryRequest, _=Depends(verify_token)):
    ...

Complete GitHub Template

Get the full code with additional features:

git clone https://github.com/LMouhssine/langchain-agent-template
cd langchain-agent-template
pip install -r requirements.txt
cp .env.example .env
# Add your API keys to .env
python agent.py

What's Next?

You now have a production-ready AI agent. Here's where to go from here:

  1. Add more tools - Connect to your specific APIs and databases
  2. Implement guardrails - Add safety checks for sensitive operations
  3. Set up monitoring - Track usage, errors, and costs
  4. Build a UI - Connect to a frontend or Slack/Discord

If this tutorial helped you build your first agent, share it with another developer. The best way to learn is to build.

Build with AI and ship with confidence

Need a developer who can turn ideas into production work?

I help teams ship React, Next.js, Node.js, AI, and automation work with clear scope, practical guardrails, and fast execution.
