
Strengths
- Provides a complete toolchain for RAG (Retrieval-Augmented Generation)
- Supports building complex AI agents and workflows
- Compatible with all mainstream model providers, including OpenAI, Anthropic, and Hugging Face
- LangSmith provides full debugging and monitoring capabilities
- Active open-source community with a large number of ready-made integrations
Best for
- Building an enterprise knowledge-base question-answering system (RAG)
- Developing AI agents that use tools
- Building multi-step AI workflows (chains)
- Document processing and information extraction
- Building conversational AI applications
RAG (Retrieval-Augmented Generation)
RAG is the most common LangChain use case: it lets the model answer questions grounded in your private documents.
Build a local document question and answer system
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
# 1. Load document
loader = PyPDFLoader("company_manual.pdf")
documents = loader.load()
# 2. Split text
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, chunk_overlap=200
)
chunks = splitter.split_documents(documents)
# 3. Create vector database
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
# 4. Build a question and answer chain
qa_chain = RetrievalQA.from_chain_type(
llm=ChatOpenAI(model="gpt-4o"),
retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
return_source_documents=True
)
# 5. Ask a question
result = qa_chain.invoke({"query": "What is the company's annual leave policy?"})
print(result["result"])
print("source:", [doc.metadata for doc in result["source_documents"]])The system will:
1. Retrieve the 3 most relevant text chunks from the PDF
2. Pass those chunks as context to GPT-4o
3. Have GPT-4o answer the question based on that context
4. Return the answer along with source-document metadata
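Step 1 of that flow (top-k retrieval) can be sketched in plain Python as cosine similarity over embedding vectors. The two-dimensional vectors below are made up for illustration; a real vector store runs the same ranking over high-dimensional learned embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy chunk embeddings; index 0 and 1 point the same way as the query
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
print(top_k([1.0, 0.0], doc_vecs, k=3))  # [0, 1, 3]
```

This is what `search_kwargs={"k": 3}` controls in the retriever above: how many of the highest-scoring chunks get passed to the model as context.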
The chunk_size and chunk_overlap settings have a large effect on retrieval quality; tune them to match your document type.
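To make the two parameters concrete, here is a simplified character-based splitter (not LangChain's actual recursive implementation, which also respects paragraph and sentence boundaries): each chunk is chunk_size characters long, and the window slides forward by chunk_size minus chunk_overlap, so consecutive chunks share text.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Slide a chunk_size window forward by (chunk_size - chunk_overlap) per step."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap ensures a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which is why some overlap usually improves retrieval.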
Multi-document knowledge base system
```python
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader

# Load every PDF under the directory
loader = DirectoryLoader(
    "./docs",
    glob="**/*.pdf",
    loader_cls=PyPDFLoader
)
documents = loader.load()
print(f"{len(documents)} document pages loaded")

# The subsequent steps are the same as for a single document;
# the vector store handles multiple documents automatically
```

This approach:
- Loads all PDFs in the directory at once
- Builds a unified knowledge base
- Supports cross-document question answering and retrieval
For large document libraries, use a persistent vector store (e.g. Chroma with persistence enabled, or a managed service such as Pinecone) so you don't re-embed everything on every run.
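As a sketch of Chroma persistence (requires the `langchain-openai` and `chromadb` packages plus an OpenAI API key, so treat it as a configuration fragment rather than a runnable example; `chunks` is the output of the splitting step above):

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings()

# First run: build the index and write it to disk
vectorstore = Chroma.from_documents(
    chunks, embeddings, persist_directory="./chroma_db"
)

# Later runs: reload the persisted index instead of re-embedding everything
vectorstore = Chroma(
    persist_directory="./chroma_db", embedding_function=embeddings
)
```

Embedding is the expensive step for a large corpus, so persisting the index turns subsequent startups from minutes into seconds.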
AI Agent development
LangChain agents can decide on their own which tools to call in order to complete a task.
Building an agent that can search and calculate

```python
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.tools import tool
from langchain import hub

# Define the tools
search = DuckDuckGoSearchRun()

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression such as '2 + 2 * 3'."""
    # eval() is fine for a demo; use a safe math parser in production
    return str(eval(expression))

tools = [search, calculate]

# Create the agent
llm = ChatOpenAI(model="gpt-4o")
prompt = hub.pull("hwchase17/openai-tools-agent")
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Run the agent
result = agent_executor.invoke({
    "input": "Search for today's Bitcoin price and calculate how much 0.5 BTC would be worth"
})
print(result["output"])
```

The agent will decide on its own to:
1. Call the search tool to look up the Bitcoin price
2. Call the calculate tool to work out the value of 0.5 BTC
3. Combine the results into a final answer
verbose=True prints the complete reasoning trace, which is very helpful when debugging agent behavior; turn it off in production.
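Under the hood, the executor loop is roughly: the model emits tool calls, the executor runs them, and the observations are fed back to the model until it produces a final answer. A toy version of the dispatch step, with stub tools and a made-up call format (not LangChain's internal types):

```python
def run_tool_calls(tool_calls, tools):
    """Execute each tool call the model requested and collect observations."""
    return [tools[call["name"]](call["args"]) for call in tool_calls]

# Stub tools standing in for the real search/calculate tools above
tools = {
    "search": lambda query: "BTC is trading at $60,000",  # made-up price
    "calculate": lambda expr: str(0.5 * 60000),           # stand-in evaluator
}

observations = run_tool_calls(
    [{"name": "search", "args": "bitcoin price today"},
     {"name": "calculate", "args": "0.5 * 60000"}],
    tools,
)
print(observations)  # ['BTC is trading at $60,000', '30000.0']
```

The real AgentExecutor adds error handling, iteration limits, and the model round-trips, but the tool lookup and dispatch shown here is the core of it.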
Debugging with LangSmith
LangSmith is LangChain’s debugging and monitoring platform that can trace every LLM call.
Configuring LangSmith tracing
```python
import os

# Set the LangSmith environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-rag-project"

# All subsequent LangChain calls are traced automatically
# and can be inspected at smith.langchain.com:
# - Input and output of each call
# - Token usage and cost
# - Latency and error messages
# - The complete call chain
```
In the LangSmith console you can see:
- The complete call chain (Chain → LLM → Tool)
- Input and output of each step
- Token consumption and cost statistics
- Error locations and debugging information
LangSmith's free tier is sufficient for personal projects, and it is an essential tool for debugging complex agents.
Comparison with similar tools
| Tool | Strength | Best for | Pricing |
|---|---|---|---|
| LangChain (this tool) | Most comprehensive feature set, most active community, widest range of integrations | Developers building complex RAG and agent applications | Open source, free / LangSmith paid |
| LlamaIndex | More focused on data connectivity and RAG | Data retrieval and knowledge-base construction | Open source, free |
| LangGraph | More flexible agent workflow orchestration | Agents that need complex state management | Open source, free |
| AutoGen | Strong multi-agent collaboration | Tasks that require multiple AI agents working together | Open source, free |
Sources & references:
- LangChain official documentation (2025-03)
- LangChain GitHub (2025-03)
- LangSmith Documentation (2025-03)