diff --git a/README.md b/README.md
index fdcbb18..8223c4f 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # AI Agent - 智能助手系统
 
-一个基于 LangGraph + FastAPI 的智能对话助手，支持多模型切换、RAG 知识库检索、联网搜索、可视化图表、以及多个专业子图模块（通讯录、词典、资讯分析）等功能。
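+An intelligent conversational assistant built on LangGraph + FastAPI. It supports multi-model switching, web search, chart visualization, and several specialized subgraph modules (contacts, dictionary, news analysis).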
 
 ---
 
@@ -8,12 +8,11 @@
 
 - [核心功能](#-核心功能) - 面向用户的功能和技术特性
 - [技术架构](#️-技术架构) - 技术栈、系统架构图、工作流流程图
-- [核心算法与实现原理](#-核心算法与实现原理) - RAG 检索、LangGraph 工作流、多模型路由、SSE 流式响应
-- [RAG 系统完整架构](#-rag-系统完整架构) - 离线索引构建、在线检索生成、演进路线
+- [核心算法与实现原理](#-核心算法与实现原理) - LangGraph 工作流、多模型路由、SSE 流式响应
 - [快速开始](#-快速开始) - Docker 和本地部署指南
 - [使用指南](#-使用指南) - 基础对话、工具调用、多模型切换
 - [开发指南](#-开发指南) - 添加工具、添加模型、Docker 部署
-- [实现指南与最佳实践](#️-实现指南与最佳实践) - RAG 构建、性能优化、扩展开发、部署实践
+- [实现指南与最佳实践](#️-实现指南与最佳实践) - 性能优化、扩展开发、部署实践
 - [环境配置](#️-环境配置) - 配置文件、环境变量
 - [故障排查](#-故障排查) - 常见问题
 
@@ -24,7 +23,6 @@
 ### 面向用户的功能
 
 - 💬 **智能对话**：支持多轮对话，自动记忆上下文
-- 🔍 **知识库检索（RAG）**：基于向量数据库的智能问答
 - 🌐 **联网搜索**：免费使用 DuckDuckGo 搜索，无需 API Key，支持引用溯源
 - 📊 **可视化图表**：支持 Mermaid 图表和 matplotlib 图表生成
 - 🔄 **多模型切换**：前端可选择不同大语言模型
@@ -56,26 +54,26 @@
 
 ### 技术栈总览
 
-| 层级 | 组件 | 技术选型 | 版本 | 说明 |
-|------|------|---------|------|------|
-| **LLM 服务** | 云端模型 | 智谱 AI (glm-5.1) | v5.1 | 快速响应，适合日常对话 |
-| | | DeepSeek (deepseek-v4-pro) | v4 | 深度推理，适合复杂问题 |
-| | 本地模型 | Gemma-4-E4B-it | v4 | 本地部署，保护隐私 |
-| **模型服务层** | Chat 服务 | chat_services.py | - | 统一的生成式大模型接口 |
-| | Embedding 服务 | embedding_services.py | - | 统一的嵌入模型接口 |
-| | Rerank 服务 | rerank_services.py | - | 统一的重排序接口 |
-| | 服务基类 | base.py | - | BaseServiceProvider + FallbackServiceChain + SingletonServiceManager |
-| **Embedding** | 向量嵌入 | llama.cpp server | latest | 本地 embedding 服务，支持多种模型 |
-| **Agent 框架** | 工作流编排 | LangGraph + LangChain | latest | 状态机驱动的智能体工作流 |
-| **子图系统** | 模块化子图 | subgraphs/ | - | 通讯录、词典、资讯分析等子图 |
-| | 核心工具 | core/ | - | 状态基类、意图理解、格式化输出、人工审核、**联网搜索**、**可视化图表** |
-| **主图系统** | 主流程 | main_graph/ | - | 主图节点、工具、构建器 |
-| **向量数据库** | 向量检索 | Qdrant | v1.12+ | 高性能向量相似度检索（远程服务器） |
-| **后端框架** | API 服务 | FastAPI + Uvicorn | v0.115+ | RESTful API + WebSocket + SSE 流式输出 |
-| **前端框架** | Web 界面 | Streamlit | v1.40+ | 交互式对话界面，组件化设计 |
-| **关系数据库** | 持久化存储 | PostgreSQL | v16 | 对话记忆持久化（远程服务器） |
-| **容器化** | 服务编排 | Docker + Docker Compose | v24+ | 一键部署所有服务 |
-| **CI/CD** | 自动化部署 | Gitea Workflows | latest | 代码推送自动构建部署 |
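+| Layer | Component | Technology | Notes |
+|------|------|---------|------|
+| **Agent framework** | Workflow orchestration | LangGraph + LangChain | State-machine-driven agent workflow |
+| **Main graph** | Main flow | main_graph/ | Hybrid routing + ReAct loop + tool execution |
+| **Subgraphs** | Modular subgraphs | subgraphs/ | Contacts, dictionary, news analysis, and other subgraphs |
+| | Core utilities | core/ | Intent understanding, output formatting, human review, web search, chart visualization |
+| **Vector database** | Vector retrieval | Qdrant | High-performance vector similarity search (remote server) |
+| **Backend framework** | API service | FastAPI + Uvicorn | RESTful API + SSE streaming output |
+| **Frontend framework** | Web UI | Streamlit | Interactive chat interface |
+| **Relational database** | Persistent storage | PostgreSQL | Persists conversation memory (remote server) |
+| **Containerization** | Service orchestration | Docker + Docker Compose | One-command deployment of all services |
+| **CI/CD** | Automated deployment | Gitea Workflows | Builds and deploys automatically on push |
+| **LLM services** | Cloud models | Zhipu AI (glm-4-plus) | Fast responses, suited to everyday conversation |
+| | | DeepSeek (deepseek-chat-v3) | Deeper reasoning, suited to complex questions |
+| | | OpenAI (gpt-4o-mini) | General-purpose conversation |
+| | Local model | Qwen3.5-9B.Q4_K_M | Locally deployed, GGUF format |
+| **Model service layer** | Chat service | chat_services.py | Unified interface for generative LLMs |
+| | Embedding service | embedding_services.py | Unified interface for embedding models |
+| | Rerank service | rerank_services.py | Unified interface for reranking |
+| **Embedding** | Vector embedding | llama.cpp server | Local embedding service (:18001) |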
 
 ### 系统架构流程图
 
@@ -85,81 +83,109 @@ graph TB
     Frontend -->|REST API| Backend[FastAPI 后端 :8079]
 
     Backend --> AgentService[AIAgentService]
-    Backend --> SubgraphAPI[子图 API 端点<br>/subgraph/contact<br>/subgraph/dictionary<br>/subgraph/news]
-
-    AgentService -->|模型路由| ChatServices[模型服务层 chat_services]
-    ChatServices -->|自动降级| FallbackChain[FallbackServiceChain]
-    FallbackChain -->|创建| Zhipu[智谱 GLM-5.1]
-    FallbackChain -->|创建| DeepSeek[DeepSeek V4-Pro]
-    FallbackChain -->|创建| LocalGemma[本地 Gemma-4]
 
     AgentService -->|初始化| LangGraph[LangGraph 工作流引擎]
 
-    LangGraph -->|节点1| RetrieveMemory[记忆检索]
-    LangGraph -->|节点2| MemoryTrigger[记忆触发]
-    LangGraph -->|节点3| LLMCall[LLM调用 React模式]
-    LangGraph -->|节点4| ToolNode[工具执行]
-    LangGraph -->|节点5| Summarize[记忆摘要]
-    LangGraph -->|节点6| Finalize[最终处理]
+    LangGraph -->|阶段1| RetrieveMemory[记忆检索 retrieve_memory]
+    LangGraph -->|阶段2| MemoryTrigger[记忆触发 memory_trigger]
+    LangGraph -->|阶段3| InitState[初始化状态 init_state]
+    LangGraph -->|阶段4| HybridRouter[⭐ 混合路由 hybrid_router]
+    LangGraph -->|阶段5| ReactLoop[React 循环 react_loop]
+    LangGraph -->|阶段6| FastPath[快速路径 fast_*]
+    LangGraph -->|阶段7| LLMCall[LLM 调用 llm_call]
+    LangGraph -->|阶段8| Summarize[记忆摘要 summarize]
+    LangGraph -->|阶段9| Finalize[最终处理 finalize]
 
-    ToolNode -->|调用| Tools[工具集合]
-    Tools -->|RAG| RAGTool[知识库检索]
-    Tools -->|子图| SubgraphTools[子图工具: 通讯录/词典/资讯]
+    HybridRouter -->|闲聊| FastChitchat[fast_chitchat]
+    HybridRouter -->|知识查询| FastRAG[fast_rag]
+    HybridRouter -->|工具调用| FastTool[fast_tool]
+    HybridRouter -->|复杂任务| ReactLoop
 
-    RAGTool -->|检索| Qdrant[Qdrant向量库]
-    RAGTool -->|重排序| RerankService[rerank_services]
-    RAGTool -->|嵌入| EmbeddingService[embedding_services]
+    ReactLoop -->|推理| ReactReason[react_reason 推理节点]
+    ReactLoop -->|RAG检索| RAGRetrieve[rag_retrieve]
+    ReactLoop -->|联网搜索| WebSearch[web_search]
+    ReactLoop -->|通讯录| ContactSubgraph[contact_subgraph]
+    ReactLoop -->|词典| DictionarySubgraph[dictionary_subgraph]
+    ReactLoop -->|资讯分析| NewsSubgraph[news_analysis_subgraph]
+    ReactLoop -->|错误处理| HandleError[handle_error]
 
-    RetrieveMemory -->|存储| PostgreSQL[PostgreSQL]
+    RAGRetrieve -->|向量检索| Qdrant[Qdrant向量库]
+    RAGRetrieve -->|重排序| RerankService[Rerank服务]
+    RAGRetrieve -->|嵌入| EmbeddingService[Embedding服务]
+
+    AgentService -->|模型路由| ChatServices[模型服务层 chat_services]
+    ChatServices -->|自动降级| FallbackChain[FallbackServiceChain]
+    FallbackChain -->|创建| Zhipu[智谱 GLM-4]
+    FallbackChain -->|创建| DeepSeek[DeepSeek V3]
+    FallbackChain -->|创建| OpenAI[OpenAI GPT-4o]
+    FallbackChain -->|创建| LocalQwen[本地 Qwen3.5-9B]
+
+    RetrieveMemory -->|存储/读取| PostgreSQL[PostgreSQL]
     Summarize -->|存储| PostgreSQL
 
-    SubgraphAPI -->|路由| Subgraphs[子图系统 subgraphs]
-    Subgraphs --> Contact[通讯录子图]
-    Subgraphs --> Dictionary[词典子图]
-    Subgraphs --> NewsAnalysis[资讯分析子图]
-    Subgraphs --> Core[核心工具 core/]
-    Core --> Intent[意图理解 React]
-    Core --> HumanReview[人工审核]
-    Core --> Formatter[格式化输出]
-    Core --> StateBase[状态基类]
-    Core --> WebSearch[⭐ 联网搜索 DuckDuckGo]
-    Core --> Visualization[⭐ 可视化图表 Mermaid]
-
-    Contact -->|数据库| ContactDB[PostgreSQL联系人]
-    Dictionary -->|数据库| DictionaryDB[PostgreSQL生词本]
-    NewsAnalysis -->|检索| NewsQdrant[Qdrant向量检索]
-
     style User fill:#e1f5ff
     style Frontend fill:#fff4e1
     style Backend fill:#e8f5e9
+    style HybridRouter fill:#fff3e0,stroke:#ff9800,stroke-width:3px
+    style ReactLoop fill:#f3e5f5
+    style FastPath fill:#e3f2fd
+    style LangGraph fill:#c8e6c9
     style ChatServices fill:#c8e6c9
-    style LangGraph fill:#f3e5f5
-    style Subgraphs fill:#fff3e0
     style PostgreSQL fill:#ffebee
     style Qdrant fill:#ffebee
 ```
 
 ---
 
-### 主图与子图的关联架构
+### 主图与子图架构
 
 ```mermaid
 graph TB
     subgraph "主图 MainGraph"
         StartMain[START]
-        IntentMain[意图分类节点<br>判断用户意图]
-        ChatNode[普通对话节点<br>调用主 LLM]
-        SubgraphCaller[子图调用器<br>调用对应子图]
+        RetrieveMemory[记忆检索]
+        MemoryTrigger[记忆触发]
+        InitState[初始化状态]
+        HybridRouter[⭐ 混合路由<br>规则分流 + LLM意图分类]
+        FastChitchat[fast_chitchat<br>闲聊快速路径]
+        FastRAG[fast_rag<br>RAG快速路径]
+        FastTool[fast_tool<br>工具快速路径]
+        ReactReason[react_reason<br>React推理节点]
+        LLMCall[llm_call<br>LLM调用节点]
         FinalMain[最终响应]
         EndMain[END]
 
-        StartMain -->|用户输入| IntentMain
-        IntentMain -->|chat| ChatNode
-        IntentMain -->|contact| SubgraphCaller
-        IntentMain -->|dictionary| SubgraphCaller
-        IntentMain -->|news| SubgraphCaller
-        ChatNode --> FinalMain
-        SubgraphCaller --> FinalMain
+        StartMain -->|用户输入| RetrieveMemory
+        RetrieveMemory --> MemoryTrigger
+        MemoryTrigger --> InitState
+        InitState --> HybridRouter
+
+        HybridRouter -->|闲聊| FastChitchat
+        HybridRouter -->|知识查询| FastRAG
+        HybridRouter -->|工具调用| FastTool
+        HybridRouter -->|复杂任务| ReactReason
+
+        FastChitchat -->|成功| LLMCall
+        FastChitchat -->|失败| ReactReason
+        FastRAG -->|成功| LLMCall
+        FastRAG -->|失败| ReactReason
+        FastTool -->|成功| LLMCall
+        FastTool -->|失败| ReactReason
+
+        ReactReason -->|rag_retrieve| RAGRetrieve[RAG检索]
+        ReactReason -->|web_search| WebSearchNode[联网搜索]
+        ReactReason -->|contact_subgraph| ContactNode[通讯录子图]
+        ReactReason -->|dictionary_subgraph| DictNode[词典子图]
+        ReactReason -->|news_analysis_subgraph| NewsNode[资讯子图]
+        ReactReason -->|llm_call| LLMCall
+
+        RAGRetrieve --> ReactReason
+        WebSearchNode --> ReactReason
+        ContactNode --> ReactReason
+        DictNode --> ReactReason
+        NewsNode --> ReactReason
+
+        LLMCall --> FinalMain
         FinalMain --> EndMain
     end
 
@@ -243,76 +269,161 @@ graph TB
         FormatNews --> EndNews
     end
 
-    SubgraphCaller -.->|调用<br>状态传递| StartContact
-    SubgraphCaller -.->|调用<br>状态传递| StartDict
-    SubgraphCaller -.->|调用<br>状态传递| StartNews
+    ReactReason -.->|调用<br>状态传递| StartContact
+    ReactReason -.->|调用<br>状态传递| StartDict
+    ReactReason -.->|调用<br>状态传递| StartNews
 
-    style IntentMain fill:#ffe0b2
-    style ChatNode fill:#e0e0e0
-    style SubgraphCaller fill:#bbdefb
-    style FinalMain fill:#fff59d
-    style IntentContact fill:#c8e6c9
-    style IntentDict fill:#e1bee7
-    style IntentNews fill:#ffcdd2
-```
+    style HybridRouter fill:#fff3e0,stroke:#ff9800,stroke-width:3px
+    style ReactReason fill:#e8eaf6
+```
 
-### LangGraph 工作流详细流程
+### Indexing Workflow (Offline Build)
 
 ```mermaid
-stateDiagram-v2
-    [*] --> RetrieveMemory: 用户输入消息
+flowchart TB
+    subgraph 文档输入
+        A1[文档源]
+        A2[PDF/DOCX/TXT/Markdown]
+    end
 
-    RetrieveMemory --> MemoryTrigger: 检索历史记忆
-    MemoryTrigger --> LLMCall: 检查记忆触发条件
+    subgraph 文档加载
+        B1[rag_indexer/loaders.py]
+        B2[UnstructuredLoader]
+        B3[PyMuPDFLoader]
+        B4[TextLoader]
+    end
 
-    %% ⭐ React (Reasoning → Acting → Observing) 循环开始
-    LLMCall --> CheckTools: LLM 输出
+    subgraph 文本切分
+        C1[rag_indexer/splitters.py]
+        C2[RecursiveCharacterTextSplitter<br/>按分隔符递归切分]
+        C3[SemanticChunker<br/>基于语义相似度]
+        C4[ParentChildSplitter<br/>父子块切分]
+    end
 
-    CheckTools --> ToolNode: 需要调用工具
-    CheckTools --> CheckSummary: 直接回复
+    subgraph 嵌入生成
+        D1[Embedding 生成]
+        D2[稠密向量<br/>Qwen3-Embedding-0.6B<br/>llama.cpp server:18001]
+        D3[稀疏向量 BM25<br/>FastEmbed]
+    end
 
-    ToolNode --> ExecuteTool: 执行工具
-    ExecuteTool --> LLMCall: ⭐ 工具结果返回（Observing → Reasoning 循环）
-    %% ⭐ React 循环结束
+    subgraph 向量存储
+        E1[Qdrant Vector Store]
+        E2[稠密向量索引<br/>HNSW 算法]
+        E3[稀疏向量索引<br/>BM25]
+        E4[rag_core/vector_store.py]
+    end
 
-    CheckSummary --> Summarize: 达到摘要阈值
-    CheckSummary --> Finalize: 未达阈值，直接结束
+    A1 --> A2
+    A2 --> B1
+    B1 --> B2
+    B2 --> C1
+    C1 --> C2
+    C1 --> C3
+    C1 --> C4
+    C2 --> D1
+    C3 --> D1
+    C4 --> D1
+    D1 --> D2
+    D1 --> D3
+    D2 --> E1
+    D3 --> E1
+    E1 --> E2
+    E1 --> E3
 
-    Summarize --> PostgreSQL: 保存摘要
-    PostgreSQL --> Finalize: 继续对话
-
-    Finalize --> FormatResponse: 格式化响应
-    FormatResponse --> [*]: SSE 流式输出
-
-    note right of LLMCall
-        ⭐ Reasoning
-        LLM 思考：
-        - 需要调用工具吗？
-        - 调用什么工具？
-    end note
-
-    note right of ToolNode
-        ⭐ Acting
-        执行工具：
-        - 天气查询
-        - 文件读取
-        - RAG 检索
-        - 等等
-    end note
-
-    note right of ExecuteTool
-        ⭐ Observing
-        观察工具结果，
-        返回给 LLM 再次思考
-    end note
-
-    note right of LLMCall
-        ⭐ React 循环
-        Reasoning → Acting → Observing → Reasoning...
-        可以多次循环
-    end note
+    style A1 fill:#e3f2fd
+    style B1 fill:#fff3e0
+    style C1 fill:#f3e5f5
+    style D1 fill:#e8f5e9
+    style E1 fill:#ffebee
 ```
 
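+**Component notes:**
+
+| Component | Technology | Notes |
+|------|---------|------|
+| Document loading | Unstructured / PyMuPDF / TextLoader | Supports multiple document formats |
+| Text splitting | RecursiveCharacterTextSplitter | Splits recursively by separator; default chunk size is 500 characters |
+| Semantic splitting | SemanticChunker | Splits automatically where embedding similarity between adjacent sentences drops |
+| Parent-child splitting | ParentChildSplitter | Large parent chunks keep context; small child chunks are used for retrieval |
+| Dense embedding | Qwen3-Embedding-0.6B-Q8_0 | llama.cpp server (:18001) |
+| Sparse embedding | FastEmbed BM25 | Computed locally; no extra service required |
+| Vector storage | Qdrant | HNSW index for high-performance ANN retrieval |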
+
+### Retrieval Workflow (Online Query)
+
+```mermaid
+flowchart TB
+    subgraph 查询输入
+        Q1[用户查询]
+        Q2["Query: What is the company reimbursement process?"]
+    end
+
+    subgraph 查询处理
+        R1[查询改写 MultiQuery]
+        R2[rag_indexer/splitters.py]
+        R3[使用 chat_services 生成多角度查询]
+    end
+
+    subgraph 混合检索
+        S1[并行检索]
+        S2[稠密向量检索<br/>Embedding → 向量相似度]
+        S3[稀疏 BM25 检索<br/>词频统计]
+        S4[rag_core/sparse_embedder.py]
+    end
+
+    subgraph 结果融合
+        F1[RRF 融合]
+        F2[rag_indexer/fusion.py]
+        F3["RRF(d) = Σ 1/(k + rank(d))"]
+        F4[Qdrant Fusion API<br/>服务端融合]
+    end
+
+    subgraph 重排序
+        P1[Cross-Encoder 重排]
+        P2[bge-reranker-v2-m3]
+        P3[rerank_services.py<br/>llama.cpp server:18002]
+        P4[Query-Document 交互编码]
+    end
+
+    subgraph LLM 生成
+        G1[LLM 生成回答]
+        G2[chat_services.py]
+        G3[Context + 生成回答]
+    end
+
+    Q1 --> Q2
+    Q2 --> R1
+    R1 --> R2
+    R2 --> S1
+    S1 --> S2
+    S1 --> S3
+    S2 --> F1
+    S3 --> F1
+    F1 --> P1
+    P1 --> P2
+    P2 --> G1
+    G1 --> G2
+    G2 --> G3
+
+    style Q1 fill:#e3f2fd
+    style R1 fill:#fff3e0
+    style S1 fill:#f3e5f5
+    style F1 fill:#e8f5e9
+    style P1 fill:#ffebee
+    style G1 fill:#e1f5ff
+```
+
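+**Component notes:**
+
+| Stage | Technology | Notes |
+|------|---------|------|
+| Query rewriting | MultiQuery | Uses the LLM to generate 3-5 query variants from different angles |
+| Dense retrieval | Qwen3-Embedding | Ranks by cosine similarity between query and document vectors |
+| Sparse retrieval | FastEmbed BM25 | BM25 keyword scoring based on term frequency (TF) and inverse document frequency (IDF) |
+| Result fusion | Qdrant Fusion API | RRF fusion runs server-side, so candidate lists need not be sent to the client for merging |
+| Reranking | bge-reranker-v2-m3 | Cross-Encoder scores each query-document pair jointly; more accurate than Bi-Encoder similarity but slower |
+| LLM generation | chat_services | Unified LLM service interface |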
+
 ### 数据流向图
 
 ```
@@ -333,15 +444,26 @@ stateDiagram-v2
           │      │      ├─→ user_id: 用户标识
           │      │      └─→ metadata: 元数据
           │      │
-          │      ├─→ 执行工作流
-          │      │      ├─→ retrieve_memory: 从 PostgreSQL 检索历史
-          │      │      ├─→ memory_trigger: 判断是否触发记忆
-          │      │      ├─→ llm_call: 调用 LLM 生成响应
-          │      │      ├─→ tool_node: 执行工具（如有需要）
-          │      │      ├─→ summarize: 生成对话摘要
-          │      │      └─→ finalize: 格式化最终响应
-          │      │
-          │      └─→ 返回 SSE 流
+          │      └─→ 执行工作流（混合路由 + React 循环）
+          │             ├─→ retrieve_memory: 从 PostgreSQL 检索历史
+          │             ├─→ memory_trigger: 判断是否触发记忆
+          │             ├─→ init_state: 初始化状态
+          │             ├─→ hybrid_router: 混合路由决策
+          │             │      ├─→ fast_chitchat: 闲聊快速路径
+          │             │      ├─→ fast_rag: RAG 快速路径
+          │             │      ├─→ fast_tool: 工具快速路径
+          │             │      └─→ react_loop: React 循环（兜底）
+          │             │
+          │             ├─→ React 循环（react_reason 节点）
+          │             │      ├─→ rag_retrieve: RAG 检索
+          │             │      ├─→ web_search: 联网搜索
+          │             │      ├─→ contact_subgraph: 通讯录子图
+          │             │      ├─→ dictionary_subgraph: 词典子图
+          │             │      ├─→ news_analysis_subgraph: 资讯子图
+          │             │      └─→ llm_call: LLM 调用
+          │             │
+          │             ├─→ summarize: 生成对话摘要（如需要）
+          │             └─→ finalize: 格式化最终响应
           │
           └─→ 持久化存储
                  ├─→ PostgreSQL: 对话历史和摘要
@@ -372,35 +494,41 @@ Agent1/
 │   │   │
 │   │   ├── agent/                 # ⭐ Agent 服务层
 │   │   │   ├── __init__.py
-│   │   │   ├── service.py         # Agent 服务核心（使用 chat_services）
+│   │   │   ├── agent_service.py  # Agent 服务核心（使用 chat_services）
 │   │   │   ├── history.py         # 历史查询服务
 │   │   │   └── prompts.py         # 提示词模板
 │   │   │
-│   │   ├── main_graph/            # ⭐ 主图 - LangGraph 主流程
+│   │   ├── main_graph/            # ⭐ 主图 - LangGraph 主流程（混合路由 + React 循环）
 │   │   │   ├── __init__.py
-│   │   │   ├── state.py           # 主图状态定义
-│   │   │   ├── graph_builder.py   # LangGraph 图构建器
+│   │   │   ├── state.py           # 主图状态定义 MainGraphState
+│   │   │   ├── graph.py           # LangGraph 组件导出
+│   │   │   ├── config.py          # 主图配置
 │   │   │   ├── nodes/             # 主图节点
 │   │   │   │   ├── __init__.py
-│   │   │   │   ├── router.py      # 路由决策
-│   │   │   │   ├── llm_call.py    # LLM 调用节点（React 模式）
-│   │   │   │   ├── tool_call.py   # 工具执行节点
+│   │   │   │   ├── _utils.py     # 节点公共工具
+│   │   │   │   ├── reasoning.py   # React 推理节点
+│   │   │   │   ├── hybrid_router.py # 混合路由节点
+│   │   │   │   ├── fast_paths.py # 快速路径节点
+│   │   │   │   ├── llm_call.py   # LLM 调用节点
+│   │   │   │   ├── routing.py    # 路由决策（init_state, route_by_reasoning）
+│   │   │   │   ├── rag_nodes.py  # RAG 检索节点
+│   │   │   │   ├── web_search.py # 联网搜索节点
 │   │   │   │   ├── retrieve_memory.py # 记忆检索节点
-│   │   │   │   ├── summarize.py   # 记忆存储节点
-│   │   │   │   ├── finalize.py    # 最终处理节点
 │   │   │   │   ├── memory_trigger.py # 记忆触发节点
-│   │   │   │   ├── rag_nodes.py   # RAG 集成节点
-│   │   │   │   └── react_nodes.py # React 模式节点
+│   │   │   │   ├── summarize.py   # 记忆摘要节点
+│   │   │   │   ├── finalize.py   # 最终处理节点
+│   │   │   │   ├── tool_call.py  # 工具执行节点
+│   │   │   │   └── error_handling.py # 错误处理节点
 │   │   │   ├── tools/             # 主图工具
 │   │   │   │   ├── __init__.py
-│   │   │   │   ├── graph_tools.py # 工具定义
-│   │   │   │   └── subgraph_tools.py # 子图调用工具
+│   │   │   │   ├── common_tools.py # 通用工具
+│   │   │   │   ├── subgraph_tools.py # 子图调用工具
+│   │   │   │   └── graph_tools.py # 图工具
 │   │   │   └── utils/             # 主图工具函数
 │   │   │       ├── __init__.py
-│   │   │       ├── retry_utils.py # 重试工具
-│   │   │       ├── subgraph_builder.py # 子图构建器
-│   │   │       ├── rag_initializer.py # RAG 初始化工具
-│   │   │       └── visualize_graph.py # 图可视化工具
+│   │   │       ├── main_graph_builder.py # 主图构建器
+│   │   │       ├── rag_initializer.py # RAG 初始化
+│   │   │       └── retry_utils.py # 重试工具
 │   │   │
 │   │   ├── subgraphs/             # ⭐ 子图模块
 │   │   │   ├── __init__.py
@@ -430,6 +558,18 @@ Agent1/
 │   │   │   ├── embedding_services.py # 嵌入模型服务
 │   │   │   └── rerank_services.py # 重排序服务
 │   │   │
+│   │   ├── mcp/                   # MCP 模块
+│   │   │   ├── __init__.py
+│   │   │   ├── mcp_manager.py    # MCP 管理器
+│   │   │   ├── mcp_client.py     # MCP 客户端
+│   │   │   ├── adapters/         # MCP 适配器
+│   │   │   │   ├── __init__.py
+│   │   │   │   ├── base_adapter.py
+│   │   │   │   ├── contact_adapter.py
+│   │   │   │   ├── dictionary_adapter.py
+│   │   │   │   └── news_adapter.py
+│   │   │   └── mcp_example.py
+│   │   │
 │   │   ├── memory/                # 记忆模块
 │   │   │   ├── __init__.py
 │   │   │   └── mem0_client.py     # Mem0 客户端封装
@@ -451,17 +591,16 @@ Agent1/
 │   │   │   └── init_db.py         # 数据库初始化
 │   │   │
 │   │   └── utils/                 # 工具模块
-│   │       └── __init__.py
-│   └── rag_core/
+│   │       ├── __init__.py
+│   │       └── logging.py         # 日志工具
+│   └── rag_core/                # ⭐ RAG 核心库（统一组件）
 │       ├── __init__.py
-│       ├── client.py              # RAG 核心客户端
-│       ├── embedders.py           # 嵌入模型
-│       ├── vector_store.py        # 向量存储
-│       ├── retriever_factory.py   # 检索器工厂
-│       └── store/
-│           ├── __init__.py
-│           ├── factory.py         # 存储工厂
-│           └── postgres.py        # PostgreSQL 存储
+│       ├── config.py            # RAG 配置
+│       ├── client.py            # RAG 核心客户端
+│       ├── embedders.py        # 嵌入模型
+│       ├── sparse_embedder.py  # BM25 稀疏嵌入
+│       ├── vector_store.py     # 向量存储（Dense + Sparse）
+│       └── doc_store.py        # 文档存储
 ├── frontend/
 │   ├── run.py                   # 前端启动脚本
 │   ├── requirements.txt
@@ -516,288 +655,197 @@ Agent1/
 
 ### 1. RAG 检索算法
 
-#### 1.1 文本切分策略
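+The project uses a hybrid dense + sparse retrieval architecture, combined with RRF fusion and Cross-Encoder reranking, for high-precision knowledge-base question answering.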
 
-项目支持三种文本切分策略，适应不同场景需求：
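+**Key features:**
+- Three text-splitting strategies: recursive character splitting, semantic splitting, and parent-child splitting
+- Dense vector retrieval (embeddings) plus sparse BM25 retrieval
+- RRF fusion to merge results from multiple retrievers
+- Cross-Encoder reranking to improve relevance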
 
-**递归字符切分（Recursive Character Splitting）**
-```
-算法思路：
-  按优先级分隔符逐级切分：["\n\n", "\n", "。", "！", "？", " ", ""]
-  ↓
-  确保每个块不超过 chunk_size（默认 500 字符）
-  ↓
-  保留 chunk_overlap（默认 50 字符）避免上下文丢失
-  
-优势：简单高效，适合结构化文档
-实现：langchain_text_splitters.RecursiveCharacterTextSplitter
-```
-
-**语义切分（Semantic Chunking）**
-```
-算法思路：
-  1. 将文档按句子边界切分
-  2. 使用 Embedding 模型计算相邻句子的向量相似度
-  3. 计算相似度变化率（breakpoint threshold）
-  4. 在相似度骤降处切分（语义主题变化点）
-  
-阈值策略：
-  - percentile（百分位数）：默认，取相似度分布的 95 百分位
-  - standard_deviation（标准差）：低于均值 1.5 标准差处切分
-  - interquartile（四分位距）：使用 IQR 方法检测异常点
-  
-优势：保持语义完整性，切分更自然
-实现：langchain_experimental.text_splitter.SemanticChunker
-```
-
-**父子块切分（Parent-Child Chunking）**
-```
-算法思路：
-  父块（大块）：1000 字符，保留完整上下文
-    ↓
-  子块（小块）：语义切分，用于向量检索
-    ↓
-  建立映射关系：child_id → parent_id
-    ↓
-  检索时：用子块检索，返回时扩展为父块上下文
-  
-检索流程：
-  用户查询 → 向量检索匹配子块 → 通过映射获取父块 → 返回完整上下文
-  
-优势：检索精度高，上下文完整
-实现：自定义 ParentChildSplitter 类
-```
-
-#### 1.2 混合检索（Dense + Sparse）
-
-```
-检索架构：
-┌─────────────────────────────────────────────┐
-│              用户查询                         │
-└──────────────────┬──────────────────────────┘
-                   │
-        ┌──────────┴──────────┐
-        ↓                     ↓
-┌───────────────┐    ┌───────────────┐
-│  稠密向量检索   │    │  稀疏 BM25 检索 │
-│  (语义相似)    │    │  (关键词匹配)   │
-│               │    │               │
-│  Embedding    │    │  Token 化     │
-│  → 向量相似度  │    │  → 词频统计    │
-└───────┬───────┘    └───────┬───────┘
-        │                    │
-        └──────────┬─────────┘
-                   ↓
-          ┌────────────────┐
-          │   结果融合      │
-          │  (RRF 算法)     │
-          └────────┬───────┘
-                   ↓
-          ┌────────────────┐
-          │  Cross-Encoder  │
-          │   重排序        │
-          └────────┬───────┘
-                   ↓
-          ┌────────────────┐
-          │  Top-K 结果返回  │
-          └────────────────┘
-```
-
-**稠密向量检索（Dense Retrieval）**
-- 使用 Embedding 模型将查询和文档映射到同一向量空间
-- 计算余弦相似度，返回语义最相似的文档
-- 适合：语义理解、同义词匹配、概念检索
-
-**稀疏向量检索（Sparse Retrieval - BM25）**
-- 基于词频（TF）和逆文档频率（IDF）计算相关性
-- 适合：专有名词、精确匹配、术语检索
-
-#### 1.3 RRF 融合算法（Reciprocal Rank Fusion）
-
-```python
-# 核心算法实现
-def reciprocal_rank_fusion(doc_lists: List[List[Document]], k: int = 60) -> List[Document]:
-    """
-    RRF 公式：RRF(d) = Σ (1 / (k + rank(d)))
-    
-    参数说明：
-    - k: 平滑常数，通常设为 60
-      - k 值越大，排名影响越小，结果越平滑
-      - k 值越小，高排名文档优势越大
-    
-    算法步骤：
-    1. 遍历每个检索结果列表
-    2. 对每个文档，根据其排名计算 RRF 得分
-    3. 累加同一文档在不同列表中的得分
-    4. 按总得分降序排序
-    
-    示例：
-    文档 A 在列表 1 中排名第 1，在列表 2 中排名第 3
-    RRF(A) = 1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 = 0.0323
-    
-    文档 B 在列表 1 中排名第 2，在列表 2 中排名第 2
-    RRF(B) = 1/(60+2) + 1/(60+2) = 0.0161 + 0.0161 = 0.0322
-    
-    结果：A 排名高于 B
-    """
-    doc_to_score: Dict[str, float] = {}
-    doc_map: Dict[str, Document] = {}
-    
-    for docs in doc_lists:
-        for rank, doc in enumerate(docs, start=1):
-            doc_id = f"{doc.page_content[:200]}_{doc.metadata.get('source', '')}"
-            if doc_id not in doc_map:
-                doc_map[doc_id] = doc
-            score = doc_to_score.get(doc_id, 0.0) + 1.0 / (k + rank)
-            doc_to_score[doc_id] = score
-    
-    sorted_ids = sorted(doc_to_score.keys(), key=lambda x: doc_to_score[x], reverse=True)
-    return [doc_map[doc_id] for doc_id in sorted_ids]
-```
-
-**RRF 算法优势：**
-- 无需归一化：不同检索器的分数范围可能不同，RRF 直接使用排名
-- 参数简单：仅需调整 k 值，默认 60 在大多数场景表现良好
-- 鲁棒性强：对异常值和噪声不敏感
-
-#### 1.4 Cross-Encoder 重排序
-
-```
-重排序流程：
-┌──────────────────────────────────────────────┐
-│  输入：RRF 融合后的 Top-20 文档                │
-└──────────────────┬───────────────────────────┘
-                   │
-                   ↓
-┌──────────────────────────────────────────────┐
-│  Cross-Encoder 模型（bge-reranker-v2-m3）      │
-│                                              │
-│  工作原理：                                    │
-│  - 将 Query 和 Document 拼接输入模型           │
-│  - "[CLS] Query [SEP] Document [SEP]"         │
-│  - 通过 Transformer 计算交互注意力              │
-│  - 输出相关性得分（0-1）                       │
-│                                              │
-│  与 Bi-Encoder 的区别：                        │
-│  - Bi-Encoder：分别编码，计算余弦相似度（快）    │
-│  - Cross-Encoder：联合编码，计算交互得分（准）   │
-└──────────────────┬───────────────────────────┘
-                   │
-                   ↓
-┌──────────────────────────────────────────────┐
-│  按相关性得分排序，返回 Top-5                  │
-└──────────────────────────────────────────────┘
-```
-
-**实现细节：**
-- 使用远程 llama.cpp 服务部署重排序模型
-- 兼容 OpenAI Rerank API 格式
-- 超时保护：60 秒超时，失败时降级为原始排序
+**Detailed documentation:**
+- Algorithm details: [backend/docs/RAG_ALGORITHM.md](backend/docs/RAG_ALGORITHM.md)
+- System architecture: [backend/docs/RAG_ARCHITECTURE.md](backend/docs/RAG_ARCHITECTURE.md)
 
 ---
 
-### 1.5 RAG 评估方法 ⭐
+### LangGraph Workflow in Detail
 
-如何评估 RAG 系统的召回率和相关性？
+#### 1. Hybrid Routing + ReAct Loop Architecture ⭐⭐
 
-**核心指标：**
-- **Recall@k**：前 k 个结果中包含多少比例的相关文档
-- **Precision@k**：前 k 个结果中有多少比例是相关文档
-- **F1@k**：召回率和精确率的调和平均数
-- **MRR**：平均倒数排名
-- **相关性评分**：0-5 分的相关性评估
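+**Design rationale**: the hybrid router makes an up-front routing decision; fast paths handle simple requests, and the ReAct loop serves as the fallback for complex tasks.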
 
-**详细指南：**
-参见 [backend/docs/RAG_EVALUATION_GUIDE.md](backend/docs/RAG_EVALUATION_GUIDE.md)
-
-**快速使用：**
-```bash
-# 运行评估脚本
-cd backend
-python scripts/evaluate_rag.py
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                        主图执行流程                                          │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │  阶段1: 记忆检索 (retrieve_memory)                                      │    │
+│  │     - 从 PostgreSQL 检索用户历史对话                                    │    │
+│  │     - 生成 memory_context 供后续使用                                   │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+│                                  ↓                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │  阶段2: 记忆触发 (memory_trigger)                                     │    │
+│  │     - 判断是否需要激活记忆上下文                                       │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+│                                  ↓                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │  阶段3: 初始化状态 (init_state)                                        │    │
+│  │     - 初始化 MainGraphState                                           │    │
+│  │     - 设置 user_query、messages 等                                    │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+│                                  ↓                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │  阶段4: 混合路由 (hybrid_router) ⭐                                     │    │
+│  │     - 规则分流：闲聊关键词、子图关键词（<5ms）                           │    │
+│  │     - LLM 分类：使用轻量级模型进行意图分类（chitchat/knowledge/tool）    │    │
+│  │     - 输出决策：fast_chitchat / fast_rag / fast_tool / react_loop      │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+│                                  ↓                                         │
+│        ┌──────────────────────┼──────────────────────┐                    │
+│        ↓                      ↓                      ↓                      │
+│  ┌──────────┐          ┌──────────┐          ┌──────────┐                    │
+│  │快速路径  │          │快速路径  │          │快速路径  │                    │
+│  │fast_*    │          │fast_rag  │          │fast_tool │                    │
+│  └────┬─────┘          └────┬─────┘          └────┬─────┘                    │
+│       │                     │                     │                          │
+│       │    ┌────────────────┘     ┌──────────────┘                          │
+│       │    │                      │                                         │
+│       ↓    ↓                      ↓                                         │
+│  ┌─────────────────────────────────────────────────────────────┐             │
+│  │  阶段5: React 循环 (react_reason)                            │             │
+│  │     - 调用 app/core/intent.py 的 react_reason_async()        │             │
+│  │     - 使用 app/model_services/ 获取 chat 服务                │             │
+│  │     - 推理下一步动作（rag_retrieve/web_search/子图/llm_call） │             │
+│  └─────────────────────────────────────────────────────────────┘             │
+│                                  ↓                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │  React 循环内节点                                                     │    │
+│  │     rag_retrieve → react_reason（回到推理）                            │    │
+│  │     web_search → react_reason（回到推理）                              │    │
+│  │     contact_subgraph → react_reason（回到推理）                        │    │
+│  │     dictionary_subgraph → react_reason（回到推理）                      │    │
+│  │     news_analysis_subgraph → react_reason（回到推理）                   │    │
+│  │     handle_error → react_reason（错误处理后继续推理）                    │    │
+│  │     llm_call → 退出循环，进入完成阶段                                  │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+│                                  ↓                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │  阶段6: LLM 调用 (llm_call)                                           │    │
+│  │     - 调用主 LLM 生成最终回答                                         │    │
+│  │     - 使用 llm.bind_tools(tools) 绑定工具                             │    │
+│  │     - 支持流式输出到前端                                              │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+│                                  ↓                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │  阶段7: 记忆摘要 (summarize) / 最终处理 (finalize)                      │    │
+│  │     - 对话轮数 >= 5 时触发摘要                                        │    │
+│  │     - 保存对话到 PostgreSQL                                           │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
 ```
 
----
+#### 2. ReAct Reasoning Loop in Detail
 
-### 2. LangGraph 工作流算法
-
-#### 2.1 React (Reasoning → Acting → Observing) 模式 ⭐
-
-**设计理念**：让 LLM 先思考（Reasoning），再行动（Acting），然后观察结果（Observing），可以多次循环。
+The ReAct reasoning loop is driven by the `react_reason_async()` function in `app/core/intent.py`:
 
 ```
 ┌─────────────────────────────────────────────────────────────────┐
-│                    React 模式循环                                 │
+│                    React 推理循环                                 │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                 │
 │  ┌───────────────────────────────────────────────────────────┐  │
-│  │  1. Reasoning (思考)                                        │  │
-│  │     LLMCall 节点                                             │  │
-│  │     - 分析用户问题                                          │  │
-│  │     - 决定是否需要调用工具                                  │  │
-│  │     - 决定调用哪个工具                                      │  │
+│  │  1. Reasoning (推理) - react_reason 节点                    │  │
+│  │     - 调用 react_reason_async()                            │  │
+│  │     - 传入上下文：retrieved_docs、reasoning_history、        │  │
+│  │       previous_actions、messages、errors                     │  │
+│  │     - LLM 决定下一步 action                                │  │
+│  │     - 记录到 reasoning_history                             │  │
 │  └───────────────────────────────────────────────────────────┘  │
 │                            ↓                                    │
 │  ┌───────────────────────────────────────────────────────────┐  │
-│  │  2. Acting (行动)                                           │  │
-│  │     ToolNode 节点                                           │  │
-│  │     - 执行工具调用                                          │  │
-│  │     - 天气查询 / 文件读取 / RAG 检索等                      │  │
+│  │  2. Acting (行动)                                          │  │
+│  │     - rag_retrieve: RAG 检索                               │  │
+│  │     - web_search: 联网搜索                                  │  │
+│  │     - contact_subgraph: 通讯录子图                         │  │
+│  │     - dictionary_subgraph: 词典子图                          │  │
+│  │     - news_analysis_subgraph: 资讯分析子图                   │  │
+│  │     - handle_error: 错误处理                               │  │
 │  └───────────────────────────────────────────────────────────┘  │
 │                            ↓                                    │
 │  ┌───────────────────────────────────────────────────────────┐  │
-│  │  3. Observing (观察)                                        │  │
-│  │     ExecuteTool → LLMCall                                   │  │
-│  │     - 观察工具结果                                          │  │
-│  │     - 返回给 LLM 再次思考                                    │  │
+│  │  3. Observing (观察) / 循环                                 │  │
+│  │     - 工具结果返回给 react_reason                           │  │
+│  │     - 再次推理下一步                                         │  │
+│  │     - 最多 10 次循环 (max_steps=10)                         │  │
+│  │     - 或直到推理决定 llm_call                              │  │
 │  └───────────────────────────────────────────────────────────┘  │
 │                            ↓                                    │
 │  ┌───────────────────────────────────────────────────────────┐  │
-│  │  4. 循环或结束                                               │  │
-│  │     should_continue 路由                                    │  │
-│  │     - 还需要调用工具吗？ → 继续循环                          │  │
-│  │     - 不需要了 → 结束流程                                   │  │
+│  │  4. 退出条件                                               │  │
+│  │     - action == llm_call: 退出循环，进入 llm_call 节点       │  │
+│  │     - max_steps 达到: 强制退出到 llm_call                   │  │
+│  │     - 错误累积过多: 进入 handle_error                       │  │
 │  └───────────────────────────────────────────────────────────┘  │
 │                                                                 │
 └─────────────────────────────────────────────────────────────────┘
 ```
 
 **Key implementation points**:
-1. **`llm.bind_tools(tools)`** - 在 `create_llm_call_node` 中，让 LLM 知道可以调用哪些工具
-2. **`should_continue` 路由函数** - 检查 LLM 输出是否包含 `tool_calls`
-3. **`tool_node → llm_call` 循环边** - 工具结果返回给 LLM 再次思考
-4. **可以多次循环** - LLM 可以调用多个工具，或者同一个工具多次
-
-**实际代码位置**：
-- `backend/app/main_graph/graph_builder.py` 第 79 行：`builder.add_edge("tool_node", "llm_call")`
-- `backend/app/main_graph/nodes/router.py`：`should_continue` 函数检查 `last_message.tool_calls`
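+1. **`react_reason_async()`** - defined in `app/core/intent.py`; obtains an LLM through chat_services and uses it for reasoning
+2. **`route_by_reasoning`** - routing function that picks the next node from the reasoning result
+3. **Loop edges** - after a tool node runs, control returns to react_reason for another round of reasoning
+4. **Automatic escalation** - when a fast path fails, control escalates to react_reason and the full reasoning loop takes over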
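
The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's `react_reason_async()` implementation: `LoopState`, `run_react_loop`, and `toy_reasoner` are hypothetical names, and the reasoner is a stub rather than an LLM call. Only the `max_steps=10` cap and the `llm_call` exit condition come from the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_STEPS = 10  # mirrors max_steps=10 from the loop description


@dataclass
class LoopState:
    """Simplified stand-in for the graph state (hypothetical, not MainGraphState)."""
    query: str
    previous_actions: List[str] = field(default_factory=list)
    observations: List[str] = field(default_factory=list)
    current_step: int = 0


def run_react_loop(
    state: LoopState,
    reason: Callable[[LoopState], str],
    tools: Dict[str, Callable[[str], str]],
) -> LoopState:
    """Alternate reasoning and acting until the reasoner picks llm_call or steps run out."""
    while state.current_step < MAX_STEPS:
        action = reason(state)                # reasoning: choose the next action
        state.current_step += 1
        state.previous_actions.append(action)
        if action == "llm_call":              # exit: hand over to the final LLM call
            return state
        tool = tools.get(action)
        if tool is None:                      # unknown action: record it, keep reasoning
            state.observations.append(f"error: unknown action {action}")
            continue
        state.observations.append(tool(state.query))  # act, then observe
    state.previous_actions.append("llm_call")  # step budget exhausted: forced exit
    return state


def toy_reasoner(state: LoopState) -> str:
    """Search once, then answer. A real reasoner would ask an LLM instead."""
    return "web_search" if not state.observations else "llm_call"


final = run_react_loop(
    LoopState(query="weather in Beijing"),
    toy_reasoner,
    {"web_search": lambda q: f"results for {q}"},
)
print(final.previous_actions)
```

Keeping the step counter in the state, rather than in a local variable, is what lets a router such as `route_by_reasoning` enforce the cap from outside the reasoning node.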
 
 #### 2.2 State machine design
 
 ```python
 # Core state definition
-class AgentState(TypedDict):
-    messages: Annotated[list, add_messages]  # 对话历史（自动合并）
-    user_id: str                              # 用户标识
-    memory_context: str                       # 检索到的记忆上下文
-    should_summarize: bool                    # 是否需要生成摘要
-    tool_calls: list                          # 工具调用列表
-    final_response: str                       # 最终响应
+class MainGraphState(TypedDict):
+    messages: Annotated[list, add_messages]        # conversation history (merged automatically)
+    user_id: str                                    # user identifier
+    user_query: str                                 # the user's query
+    memory_context: str                             # retrieved memory context
+    memory_triggered: bool                          # whether memory was triggered
+    should_summarize: bool                          # whether a summary should be generated
+    retrieved_docs: list                            # documents retrieved by RAG
+    reasoning_history: list                         # React reasoning history
+    previous_actions: list                          # actions taken so far
+    errors: list                                    # accumulated errors
+    current_step: int                               # current step
+    max_steps: int                                  # maximum number of steps
+    llm_override: str                               # LLM override
+    final_response: str                             # final response
 ```
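
The difference between a field with a reducer (`Annotated[list, add_messages]`) and a plain field is easy to miss, so here is a small standard-library sketch of the idea. It is an assumption-laden illustration, not LangGraph's code: `DemoState` and `apply_update` are hypothetical, and `operator.add` stands in for `add_messages`, which additionally handles message IDs.

```python
import operator
from typing import Annotated, Any, Dict, TypedDict, get_type_hints


class DemoState(TypedDict):
    messages: Annotated[list, operator.add]  # reducer attached: new items are appended
    current_step: int                        # no reducer: the new value overwrites


def apply_update(schema: type, state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update into the state, honouring per-field reducers."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] exposes its extra arguments via __metadata__.
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and key in state:
            merged[key] = reducer(state[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["hi"], "current_step": 0}
state = apply_update(DemoState, state, {"messages": ["hello!"], "current_step": 1})
print(state)
```

This is why a node can return only the keys it changed: list-like fields accumulate through their reducer, while scalar fields such as `current_step` are simply replaced.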
 
 **State transition rules**:
 ```
-初始状态 → retrieve_memory → memory_trigger → [条件分支]
-                              ↓
-              ┌───────────────────┼───────────────┐
-              ↓                   ↓               ↓
-        should_summarize     直接回复         需要工具
-              ↓                   ↓               ↓
-        summarize → save      finalize          tool_node
-              ↓                   ↓               ↓
-        llm_call ←───────────────┘         llm_call ←┘
-              ↓
-          finalize
+initial state → retrieve_memory → memory_trigger → init_state → hybrid_router
+
+hybrid_router → fast_chitchat | fast_rag | fast_tool
+    fast path succeeds → llm_call
+    fast path fails    → react_reason
+
+react_reason → rag_retrieve | web_search | contact_subgraph | dictionary_subgraph
+             | news_analysis_subgraph | handle_error
+    each of these nodes → react_reason   (loop continues)
+react_reason → llm_call                  (loop exits)
+
+llm_call → summarize → finalize
 ```
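
The fast-path rule (success goes to `llm_call`, failure escalates to `react_reason`) can be written as a small routing function. This is a sketch under assumptions: `route_with_fallback`, the handler signature, and the convention that a handler returns `None` on failure are all hypothetical. Only the node names come from the transition rules.

```python
from typing import Callable, Dict, Optional

# Hypothetical fast-path handler: returns context for the answer, or None on failure.
FastHandler = Callable[[str], Optional[str]]


def route_with_fallback(query: str, route: str, fast_paths: Dict[str, FastHandler]) -> Dict[str, str]:
    """Try the chosen fast path; fall back to the full reasoning loop if it fails."""
    handler = fast_paths.get(route)
    if handler is not None:
        try:
            context = handler(query)
        except Exception:
            context = None  # an exception counts as a failed fast path
        if context is not None:
            return {"next": "llm_call", "context": context}
    # Unknown route or failed fast path: escalate to the reasoning loop.
    return {"next": "react_reason", "context": ""}


def broken_tool(query: str) -> Optional[str]:
    raise TimeoutError("tool backend did not answer")


fast_paths: Dict[str, FastHandler] = {
    "fast_chitchat": lambda q: "Hello!",
    "fast_rag": lambda q: None,   # nothing retrieved
    "fast_tool": broken_tool,     # raises
}

for route in ("fast_chitchat", "fast_rag", "fast_tool"):
    print(route, "->", route_with_fallback("hi", route, fast_paths)["next"])
```

Treating exceptions and empty results the same way keeps the router simple: anything short of a usable result hands the request to the slower but more capable loop.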
 
 #### 2.3 记忆管理算法
@@ -834,17 +882,17 @@ class AgentState(TypedDict):
 │  用户选择模型                                 │
 └──────────────────┬──────────────────────────┘
                    │
-        ┌──────────┼──────────┐
-        ↓          ↓          ↓
-    ┌──────┐  ┌──────┐  ┌──────┐
-    │ zhipu│  │deep  │  │local │
-    └──┬───┘  └──┬───┘  └──┬───┘
-       │         │         │
-       ↓         ↓         ↓
-   ChatZhipu  ChatOpenAI ChatOpenAI
-   (官方SDK)  (DeepSeek)  (vLLM/Gemma)
-       │         │         │
-       └─────────┼─────────┘
+        ┌──────────┼──────────┬──────────┐
+        ↓          ↓          ↓          ↓
+    ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐
+    │ zhipu│  │deep  │  │openai│  │local │
+    └──┬───┘  └──┬───┘  └──┬───┘  └──┬───┘
+       │         │         │         │
+       ↓         ↓         ↓         ↓
+   ChatZhipu  ChatOpenAI ChatOpenAI ChatOpenAI
+   (官方SDK)  (DeepSeek)  (OpenAI) (llama.cpp/Qwen)
+       │         │         │         │
+       └─────────┴─────────┴─────────┘
                  ↓
         ┌────────────────┐
         │  统一接口输出   │
@@ -1052,14 +1100,11 @@ RAG 系统分为两个独立但协同的阶段：
 
 实现代码：
   from app.rag.tools import search_knowledge_base
-  from app.graph.graph_builder import GraphBuilder
-  
-  # 将 RAG 工具添加到工具列表
-  tools = AVAILABLE_TOOLS + [search_knowledge_base]
-  
+  from app.main_graph.utils.main_graph_builder import MainGraphBuilder
+
   # 构建图
-  builder = GraphBuilder(llm, tools, tools_by_name)
-  graph = builder.build().compile(checkpointer=checkpointer)
+  builder = MainGraphBuilder()
+  graph = builder.build_graph().compile(checkpointer=checkpointer)
 ```
 
 #### Level 5: GraphRAG 集成 (基于图和关系的 RAG)
@@ -1546,27 +1591,30 @@ streamlit run frontend/src/frontend_main.py
 
 ```
 你好，请介绍一下自己
-今天北京天气怎么样？
-帮我总结一下 a.txt 的内容
+帮我写一个 Python 脚本
 ```
 
-### 工具调用示例
+### Main features
 
-| 功能 | 示例提问 |
-|------|---------|
-| 🌤️ 天气查询 | "上海今天天气如何？" |
-| 📄 读取文本 | "读取 a.txt 的内容" |
-| 📑 解析 PDF | "总结 b.pdf 的主要内容" |
-| 📊 Excel 数据 | "显示 c.xlsx 的数据" |
-| 🌐 网页抓取 | "抓取 https://example.com 的内容" |
-| 🔍 长期记忆 | "记住我喜欢吃川菜" → "我有什么饮食偏好？" |
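+| Feature | Description | Example prompt |
+|------|------|---------|
+| 🧠 Hybrid routing | Detects the task type automatically and picks the best path | Just chat naturally |
+| ⚡ Fast paths | Chitchat, RAG queries, and tool calls can take a fast path | "Hello", "What is RAG?" |
+| 🔄 React reasoning loop | Complex tasks go through the full reason-act-observe loop | "Help me analyze this document" |
+| 🌐 Web search | Free DuckDuckGo search | "What's the weather like in Beijing today?" |
+| 📚 RAG knowledge-base retrieval | Searches the local knowledge base | "How do I configure the system?" |
+| 📇 Contacts management | Contact CRUD, email handling | "Look up Zhang San's contact details" |
+| 📖 Smart dictionary | Translation, vocabulary book, terminology extraction | "Translate this sentence for me" |
+| 📰 News analysis | News retrieval and content analysis | "Help me analyze this news article" |
+| 📊 Charts | Mermaid diagram generation | "Draw me a flowchart" |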
 
 ### Switching models
 
 1. Choose a model in the left sidebar:
-   - **智谱 GLM-5.1**：在线服务，速度快
-   - **DeepSeek V4-Pro**：深度推理模型
-   - **本地 Gemma-4**：本地部署，隐私性好
+   - **Zhipu GLM-4**: hosted service, fast responses
+   - **DeepSeek V3**: deep-reasoning model
+   - **OpenAI GPT-4o-mini**: general-purpose chat model
+   - **Local Qwen3.5-9B**: runs locally, keeps data private
 
 2. You can switch at any time, even within the same conversation
 
@@ -1578,17 +1626,20 @@ streamlit run frontend/src/frontend_main.py
 
 ### Adding a new tool
 
-Add a new `@tool`-decorated function in [backend/app/main_graph/tools/graph_tools.py](file:///home/huang/Study/AIProject/Agent1/backend/app/main_graph/tools/graph_tools.py):
+Add a new `@tool`-decorated function in [backend/app/main_graph/tools/common_tools.py](file:///home/huang/Study/AIProject/Agent1/backend/app/main_graph/tools/common_tools.py):
 
 ```python
+from langchain_core.tools import tool
+from typing import Optional
+
 @tool
 def my_new_tool(param: str) -> str:
     """
     工具描述（会显示给 LLM）
-    
+
     Args:
         param: 参数说明
-        
+
     Returns:
         返回值说明
     """
@@ -1596,11 +1647,11 @@ def my_new_tool(param: str) -> str:
     return result
 ```
 
-Tools are registered in the `AVAILABLE_TOOLS` list automatically.
+Then register it in the `AVAILABLE_TOOLS` list in [backend/app/main_graph/tools/graph_tools.py](file:///home/huang/Study/AIProject/Agent1/backend/app/main_graph/tools/graph_tools.py).
 
 ### Adding a new model
 
-Add a new service provider in [backend/app/model_services/chat_services.py](file:///root/projects/ailine/backend/app/model_services/chat_services.py):
+Add a new service provider in [backend/app/model_services/chat_services.py](file:///home/huang/Study/AIProject/Agent1/backend/app/model_services/chat_services.py):
 
 ```python
 class NewModelChatProvider(BaseServiceProvider[BaseChatModel]):
@@ -1647,58 +1698,127 @@ class NewModelChatProvider(BaseServiceProvider[BaseChatModel]):
 
 ```python
 CHAT_PROVIDERS: Dict[str, Callable[[], BaseServiceProvider[BaseChatModel]]] = {
-    "local": lambda: LocalVLLMChatProvider(),
+    "local": lambda: LocalChatProvider(),
     "zhipu": lambda: ZhipuChatProvider(),
     "deepseek": lambda: DeepSeekChatProvider(),
+    "openai": lambda: OpenAIChatProvider(),
     "new_model": lambda: NewModelChatProvider(),  # 新增
 }
 ```
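
A registry of zero-argument factories like `CHAT_PROVIDERS` lends itself to lazy construction with fallback. The sketch below illustrates that idea with standard-library stand-ins. `ChatModel`, `PROVIDERS`, and `first_available` are hypothetical names, the priority order is an assumption, and how the project actually chains its providers is not shown in this section.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in for a chat model: takes a prompt, returns a reply (hypothetical type).
ChatModel = Callable[[str], str]


def _local_unreachable() -> ChatModel:
    """Simulates a local server that is down."""
    raise ConnectionError("local server not reachable")


# Lazy factories, in the spirit of CHAT_PROVIDERS: nothing is built until called.
PROVIDERS: Dict[str, Callable[[], ChatModel]] = {
    "local": _local_unreachable,
    "zhipu": lambda: (lambda prompt: f"[zhipu] {prompt}"),
    "deepseek": lambda: (lambda prompt: f"[deepseek] {prompt}"),
}


def first_available(
    providers: Dict[str, Callable[[], ChatModel]],
    priority: List[str],
) -> Tuple[str, ChatModel]:
    """Return the first provider in priority order whose factory succeeds."""
    failures = []
    for name in priority:
        factory = providers.get(name)
        if factory is None:
            continue  # not registered: skip
        try:
            return name, factory()
        except Exception as exc:  # any construction error means "unavailable"
            failures.append(f"{name}: {exc}")
    raise RuntimeError("no chat provider available (" + "; ".join(failures) + ")")


name, model = first_available(PROVIDERS, ["local", "zhipu", "deepseek"])
print(name, model("hello"))
```

Because the dictionary stores factories rather than instances, registering a provider costs nothing until it is selected, and a misconfigured provider only fails when something asks for it.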
 
-### 使用模型服务
+### Adding a new subgraph
 
-#### 生成式大模型
-```python
-from app.model_services import get_chat_service, get_all_chat_services
+#### 1. Create the subgraph directory structure
 
-# 自动选择可用服务（优先本地，降级智谱，再降级 DeepSeek）
-llm = get_chat_service()
+Create a new subgraph directory under `backend/app/subgraphs/`:
 
-# 获取所有可用模型（用于多模型切换）
-all_llms = get_all_chat_services()  # Dict[str, BaseChatModel]
+```
+backend/app/subgraphs/
+└── my_subgraph/
+    ├── __init__.py
+    ├── state.py         # subgraph state definition
+    ├── nodes.py         # subgraph node implementations
+    ├── graph.py         # subgraph construction
+    └── api_client.py    # external API client (optional)
 ```
 
-#### 嵌入服务
+#### 2. Define the subgraph state
+
+Define it in `state.py`:
+
 ```python
-from app.model_services import get_embedding_service
+from typing import TypedDict, Annotated, Literal
+from langgraph.graph.message import add_messages
 
-# 自动选择可用服务（优先本地，降级智谱）
-embeddings = get_embedding_service()
-
-# 使用
-vector = embeddings.embed_query("hello")
+class MySubgraphState(TypedDict):
+    """子图状态"""
+    messages: Annotated[list, add_messages]
+    user_id: str
+    query: str
+    result: str
+    step: Literal["init", "process", "format", "end"]
 ```
 
-#### 重排服务
+#### 3. Implement the subgraph nodes
+
+Implement the node functions in `nodes.py`:
+
 ```python
-from app.model_services import get_rerank_service
-from app.rag.rerank import create_document_reranker
+from .state import MySubgraphState
 
-# 获取原始重排服务（仅计算分数）
-rerank_service = get_rerank_service()
-scores = rerank_service.compute_scores("query", ["doc1", "doc2"])
+def process_query(state: MySubgraphState) -> dict:
+    """Process the query and return a partial state update."""
+    query = state["query"]
+    # Processing logic goes here
+    return {
+        "step": "format",
+        "result": f"Result for: {query}"
+    }
 
-# 使用业务逻辑层（完整的文档重排）
-reranker = create_document_reranker()
-sorted_docs = reranker.compress_documents(docs, "query", top_n=5)
+def format_output(state: MySubgraphState) -> dict:
+    """Format the result and return a partial state update."""
+    result = state["result"]
+    return {
+        "step": "end",
+        "result": f"Formatted result: {result}"
+    }
 ```
 
+#### 4. Build the subgraph
+
+Build it in `graph.py`:
+
+```python
+from langgraph.graph import StateGraph, END
+from .state import MySubgraphState
+from .nodes import process_query, format_output
+
+def create_my_subgraph() -> StateGraph:
+    """创建子图"""
+    graph = StateGraph(MySubgraphState)
+
+    graph.add_node("process_query", process_query)
+    graph.add_node("format_output", format_output)
+
+    graph.set_entry_point("process_query")
+
+    graph.add_edge("process_query", "format_output")
+    graph.add_edge("format_output", END)
+
+    return graph
+```
+
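+#### 5. Register the subgraph tool in the main graph
+
+Add a tool that invokes the subgraph in [backend/app/main_graph/tools/subgraph_tools.py](file:///home/huang/Study/AIProject/Agent1/backend/app/main_graph/tools/subgraph_tools.py):
+
+```python
+from langchain_core.tools import tool
+
+# Import path assumes the directory layout from step 1
+from app.subgraphs.my_subgraph.graph import create_my_subgraph
+
+# Compile once at import time rather than on every call
+_my_subgraph = create_my_subgraph().compile()
+
+@tool
+async def my_subgraph_tool(query: str) -> str:
+    """
+    Describe what the subgraph does (this text is shown to the LLM)
+
+    Args:
+        query: the user's query
+
+    Returns:
+        The result produced by the subgraph
+    """
+    # Run the subgraph; its nodes fill in `step` and `result`
+    final_state = await _my_subgraph.ainvoke({"messages": [], "query": query})
+    return final_state["result"]
+```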
+
+#### 6. Add routing in React Reason
+
+Add support for the subgraph tool to the `react_reason_async` function in [backend/app/core/intent.py](file:///home/huang/Study/AIProject/Agent1/backend/app/core/intent.py).
+
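 ### Docker deployment
 
 The project ships with a complete Docker setup:
 
 - **docker-compose.yml**: service orchestration (Backend + Frontend, connecting to remote databases)
-- **Dockerfile.backend**: backend image build
-- **Dockerfile.frontend**: frontend image build
+- **docker/Dockerfile.backend**: backend image build
+- **docker/Dockerfile.frontend**: frontend image build
 - **.gitea/workflows/deploy.yml**: automated CI/CD deployment
 
 See the Docker deployment section of [QUICKSTART.md](QUICKSTART.md) for details.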
@@ -1721,27 +1841,38 @@ sorted_docs = reranker.compress_documents(docs, "query", top_n=5)
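 - **Local development**: `cp .env.docker .env`, then change the addresses to your local services
 - **Docker deployment**: `cp .env.docker .env`, using the remote server addresses
 
-### Required environment variables
+### Key settings (required)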
 
-| 变量名 | 说明 | 本地开发示例 | Docker 部署示例 |
-|--------|------|------------|----------------|
-| `ZHIPUAI_API_KEY` | 智谱AI API密钥 | `your-api-key` | `your-api-key` |
-| `DEEPSEEK_API_KEY` | DeepSeek API密钥 | `your-api-key` | `your-api-key` |
-| `LLAMACPP_API_KEY` | llama.cpp API 密钥 | `token-abc123` | `token-abc123` |
-| `LLM_API_KEY` | 主 LLM 服务 API 密钥 | `token-abc123` | `token-abc123` |
-| `VLLM_BASE_URL` | LLM 服务地址 | `http://127.0.0.1:8081/v1` | `http://your-server:8081/v1` |
-| `LLAMACPP_EMBEDDING_URL` | Embedding 服务地址 | `http://127.0.0.1:8082/v1` | `http://your-server:8082/v1` |
-| `LLAMACPP_RERANKER_URL` | Rerank 服务地址 | `http://127.0.0.1:8083/v1` | `http://your-server:8083/v1` |
-| `ZHIPU_EMBEDDING_MODEL` | 智谱嵌入模型（可选） | `embedding-3` | `embedding-3` |
-| `ZHIPU_RERANK_MODEL` | 智谱重排模型（可选） | `rerank-2` | `rerank-2` |
-| `ZHIPU_API_BASE` | 智谱 API 基础地址（可选） | `https://open.bigmodel.cn/api/paas/v4` | 同左 |
-| `DB_URI` | PostgreSQL 连接字符串 | `postgresql://...@115.190.121.151:5432/langgraph_db` | 同左 |
-| `QDRANT_URL` | Qdrant 向量数据库地址 | `http://115.190.121.151:6333` | 同左 |
-| `QDRANT_API_KEY` | Qdrant API 密钥 | `your-api-key` | `your-api-key` |
-| `QDRANT_COLLECTION_NAME` | Qdrant 集合名称 | `rag_documents` | `rag_documents` |
-| `LOG_LEVEL` | 日志级别 | `INFO` | `WARNING` |
-| `ENABLE_GRAPH_TRACE` | 是否启用图流转追踪 | `true` | `false` |
-| `MEMORY_SUMMARIZE_INTERVAL` | 对话摘要生成间隔 | `10` | `10` |
+| Variable | Description | Example |
+|--------|------|-------|
+| `ZHIPUAI_API_KEY` | Zhipu AI API key | `your-api-key` |
+| `DEEPSEEK_API_KEY` | DeepSeek API key | `your-api-key` |
+| `LLAMACPP_API_KEY` | llama.cpp API key | `your-api-key` |
+| `VLLM_BASE_URL` | Main LLM service URL | `http://127.0.0.1:18000/v1` |
+| `LLAMACPP_EMBEDDING_URL` | Embedding service URL | `http://127.0.0.1:18001/v1` |
+| `LLAMACPP_RERANKER_URL` | Rerank service URL | `http://127.0.0.1:18002/v1` |
+| `DB_HOST` | PostgreSQL host | `115.190.121.151` |
+| `DB_PORT` | PostgreSQL port | `5432` |
+| `DB_USER` | PostgreSQL user | `postgres` |
+| `DB_PASSWORD` | PostgreSQL password | `your-password` |
+| `DB_NAME` | PostgreSQL database name | `langgraph_db` |
+| `QDRANT_URL` | Qdrant vector database URL | `http://115.190.121.151:6333` |
+| `QDRANT_API_KEY` | Qdrant API key | `your-api-key` |
+
+### Other settings (with defaults)
+
+| Variable | Description | Default |
+|--------|------|-------|
+| `BACKEND_PORT` | Backend service port | `8079` |
+| `API_URL` | Backend URL called by the frontend | `http://backend:8079/chat` |
+| `MEMORY_SUMMARIZE_INTERVAL` | Conversation summarization interval | `10` |
+| `ENABLE_GRAPH_TRACE` | Whether graph tracing is enabled | `true` |
+| `FASTEMBED_CACHE_PATH` | FastEmbed cache path | `/app/fastembed_cache` |
+| `RAG_COLLECTION_NAME` | RAG collection name | `rag_documents` |
+| `RAG_STRATEGY` | RAG chunking strategy | `parent-child` |
+| `RAG_STORAGE_TYPE` | RAG storage type | `postgres` |
+| `LOG_LEVEL` | Log level | `DEBUG` |
+| `DEBUG` | Debug mode | `true` |
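
Since the database is now configured through separate `DB_*` variables, something has to assemble them into a connection string. The sketch below shows one way to do that and to apply the documented defaults. `build_db_uri` and `load_optional` are hypothetical helpers, not functions from `app.config`, and the `postgresql://` URI layout is an assumption about what the backend expects.

```python
import os
from typing import Any, Dict, Mapping
from urllib.parse import quote


def build_db_uri(env: Mapping[str, str] = os.environ) -> str:
    """Assemble a PostgreSQL URI from the required DB_* variables."""
    # Percent-encode credentials so characters like '@' or '/' cannot break the URI.
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    return f"postgresql://{user}:{password}@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"


def load_optional(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read optional settings, falling back to the defaults listed in the table."""
    return {
        "backend_port": int(env.get("BACKEND_PORT", "8079")),
        "summarize_interval": int(env.get("MEMORY_SUMMARIZE_INTERVAL", "10")),
        "graph_trace": env.get("ENABLE_GRAPH_TRACE", "true").lower() == "true",
        "log_level": env.get("LOG_LEVEL", "DEBUG"),
    }


demo_env = {
    "DB_USER": "postgres",
    "DB_PASSWORD": "p@ss/word",
    "DB_HOST": "127.0.0.1",
    "DB_PORT": "5432",
    "DB_NAME": "langgraph_db",
}
print(build_db_uri(demo_env))
print(load_optional({}))
```

Required variables are read with `env[...]` so a missing one fails loudly with a `KeyError` at startup, while optional ones use `env.get(...)` with the documented default.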
 
 ### 注意事项
 
@@ -1751,460 +1882,7 @@ sorted_docs = reranker.compress_documents(docs, "query", top_n=5)
 
 ---
 
-## �️ 实现指南与最佳实践
-
-### 1. RAG 知识库构建指南
-
-#### 1.1 离线索引构建流程
-
-```bash
-# 步骤 1：准备文档
-# 将文档放入指定目录（如 ./documents/）
-# 支持格式：TXT, PDF, Markdown, DOCX
-
-# 步骤 2：选择切分策略
-# 根据文档类型选择：
-# - 结构化文档（如手册、API 文档）→ Recursive Splitting
-# - 非结构化文档（如文章、报告）→ Semantic Chunking
-# - 长文档（如书籍、论文）→ Parent-Child Chunking
-
-# 步骤 3：构建索引
-cd rag_indexer
-python cli.py build \
-    --input-dir ../documents \
-    --collection-name my_knowledge \
-    --splitter-type semantic \
-    --chunk-size 500
-
-# 步骤 4：验证索引
-python cli.py query \
-    --collection-name my_knowledge \
-    --query "如何配置系统？" \
-    --top-k 5
-```
-
-#### 1.2 切分策略选择建议
-
-| 文档类型 | 推荐策略 | chunk_size | 说明 |
-|---------|---------|------------|------|
-| API 文档 | Recursive | 300-500 | 结构清晰，按章节切分 |
-| 技术文章 | Semantic | 自适应 | 保持语义完整性 |
-| 法律合同 | Parent-Child | 父:1000, 子:200 | 需要完整上下文 |
-| 问答对 | Recursive | 200-300 | 短文本，精确匹配 |
-| 产品手册 | Parent-Child | 父:1500, 子:300 | 长文档，跨章节检索 |
-
-#### 1.3 Embedding 模型选择
-
-| 模型 | 维度 | 语言 | 速度 | 精度 | 适用场景 |
-|------|------|------|------|------|---------|
-| bge-large-zh-v1.5 | 1024 | 中文 | 中 | 高 | 中文文档检索 |
-| bge-m3 | 1024 | 多语言 | 中 | 高 | 多语言混合文档 |
-| text-embedding-3-small | 1536 | 英文 | 快 | 中 | 英文文档检索 |
-| gte-large | 1024 | 中文 | 慢 | 很高 | 高精度要求场景 |
-
-### 2. 性能优化指南
-
-#### 2.1 检索性能优化
-
-```python
-# 优化 1：调整检索参数
-search_kwargs = {
-    "k": 20,              # 召回数量（增加召回，提高精度）
-    "score_threshold": 0.3,  # 相似度阈值（过滤低质量结果）
-    "fetch_k": 50,        # 初始召回数量（用于 MMR 去重）
-    "lambda_mult": 0.7,   # MMR 多样性参数（0=去重，1=不去重）
-}
-
-# 优化 2：使用缓存
-from functools import lru_cache
-
-@lru_cache(maxsize=1000)
-def cached_retrieve(query: str) -> List[Document]:
-    """缓存常见查询结果"""
-    return retriever.invoke(query)
-
-# 优化 3：批量 Embedding
-# 构建索引时，使用批量处理提高效率
-batch_size = 32
-for i in range(0, len(documents), batch_size):
-    batch = documents[i:i+batch_size]
-    embeddings = embedder.embed_documents([doc.page_content for doc in batch])
-```
-
-#### 2.2 LLM 调用优化
-
-```python
-# 优化 1：使用流式响应
-# 减少首字延迟，提升用户体验
-response = await llm.astream(messages)
-async for chunk in response:
-    yield chunk.content
-
-# 优化 2：控制上下文长度
-MAX_CONTEXT_LENGTH = 4000
-
-def truncate_context(context: str, max_length: int = MAX_CONTEXT_LENGTH) -> str:
-    """截断上下文，保留开头和结尾"""
-    if len(context) <= max_length:
-        return context
-    half = max_length // 2
-    return context[:half] + "\n...\n" + context[-half:]
-
-# 优化 3：Prompt 优化
-# 使用结构化 Prompt，提高 LLM 理解能力
-prompt_template = """
-你是一个智能助手。请根据以下上下文回答用户问题。
-
-上下文：
-{context}
-
-问题：{question}
-
-回答要求：
-1. 仅基于上下文回答，不要编造信息
-2. 如果上下文中没有答案，明确告知用户
-3. 引用上下文中的具体信息时，注明来源
-"""
-```
-
-#### 2.3 数据库性能优化
-
-```sql
--- PostgreSQL 优化
--- 1. 为对话历史表添加索引
-CREATE INDEX idx_conversations_user_id ON conversations(user_id);
-CREATE INDEX idx_conversations_created_at ON conversations(created_at);
-
--- 2. 为记忆摘要表添加索引
-CREATE INDEX idx_summaries_user_id ON summaries(user_id);
-
--- 3. 定期清理旧数据（保留最近 90 天）
-DELETE FROM conversations 
-WHERE created_at < NOW() - INTERVAL '90 days';
-
--- Qdrant 优化
--- 1. 使用 HNSW 索引（默认已启用）
--- 2. 调整 ef_construct 和 m 参数平衡速度和精度
--- 3. 定期优化集合
-curl -X POST http://localhost:6333/collections/my_docs/actions \
-  -H "Content-Type: application/json" \
-  -d '{"actions": [{"optimize": {}}]}'
-```
-
-### 3. 扩展开发指南
-
-#### 3.1 添加自定义工具
-
-```python
-# 在 backend/app/main_graph/tools/graph_tools.py 中添加
-
-from langchain_core.tools import tool
-from typing import Optional
-
-@tool
-def calculate_expression(
-    expression: str,
-    precision: int = 2
-) -> str:
-    """
-    计算数学表达式的值。
-    
-    支持基本运算：加减乘除、幂运算、括号。
-    
-    Args:
-        expression: 数学表达式，如 "2 + 3 * 4"
-        precision: 结果精度（小数位数），默认 2
-        
-    Returns:
-        计算结果
-    """
-    try:
-        # 安全计算（避免 eval 的安全风险）
-        import ast
-        import operator
-        
-        # 定义允许的操作符
-        ops = {
-            ast.Add: operator.add,
-            ast.Sub: operator.sub,
-            ast.Mult: operator.mul,
-            ast.Div: operator.truediv,
-            ast.Pow: operator.pow,
-        }
-        
-        def eval_node(node):
-            if isinstance(node, ast.Num):
-                return node.n
-            elif isinstance(node, ast.BinOp):
-                left = eval_node(node.left)
-                right = eval_node(node.right)
-                return ops[type(node.op)](left, right)
-            else:
-                raise ValueError(f"不支持的操作: {type(node)}")
-        
-        tree = ast.parse(expression, mode='eval')
-        result = eval_node(tree.body)
-        return f"{result:.{precision}f}"
-    except Exception as e:
-        return f"计算错误: {str(e)}"
-
-# 工具会自动注册，无需手动添加到 AVAILABLE_TOOLS
-```
-
-#### 3.2 添加自定义 LLM 提供商
-
-```python
-# 在 backend/app/agent/llm_factory.py 中添加
-
-from langchain_openai import ChatOpenAI
-from langchain_core.language_models import BaseChatModel
-
-class LLMFactory:
-    @staticmethod
-    def create_anthropic_model() -> BaseChatModel:
-        """创建 Anthropic Claude 模型"""
-        api_key = os.getenv("ANTHROPIC_API_KEY")
-        if not api_key:
-            raise ValueError("ANTHROPIC_API_KEY 环境变量未设置")
-        
-        return ChatAnthropic(
-            model="claude-3-sonnet-20240229",
-            temperature=0.1,
-            max_tokens=4096,
-            api_key=api_key,
-        )
-    
-    # 注册到 CREATORS
-    CREATORS = {
-        "local": create_local,
-        "deepseek": create_deepseek,
-        "zhipu": create_zhipu,
-        "anthropic": create_anthropic_model,  # 新增
-    }
-```
-
-#### 3.3 添加自定义 RAG 检索器
-
-```python
-# 在 backend/app/rag/retriever.py 中添加
-
-from langchain_core.retrievers import BaseRetriever
-from langchain_core.documents import Document
-
-class HybridRetriever(BaseRetriever):
-    """混合检索器：结合多种检索策略"""
-    
-    dense_retriever: BaseRetriever
-    sparse_retriever: BaseRetriever
-    reranker: Optional[BaseReranker] = None
-    top_k: int = 5
-    
-    def _get_relevant_documents(self, query: str) -> List[Document]:
-        # 并行检索
-        dense_docs = self.dense_retriever.invoke(query)
-        sparse_docs = self.sparse_retriever.invoke(query)
-        
-        # RRF 融合
-        fused_docs = reciprocal_rank_fusion([dense_docs, sparse_docs])
-        
-        # 重排序
-        if self.reranker:
-            fused_docs = self.reranker.compress_documents(fused_docs[:20], query)
-        
-        return fused_docs[:self.top_k]
-```
-
-### 4. 部署最佳实践
-
-#### 4.1 生产环境配置
-
-```yaml
-# docker-compose.prod.yml
-version: '3.8'
-
-services:
-  backend:
-    build:
-      context: .
-      dockerfile: docker/Dockerfile.backend
-    environment:
-      - LOG_LEVEL=WARNING  # 生产环境降低日志级别
-      - ENABLE_GRAPH_TRACE=false  # 关闭图追踪（提升性能）
-      - MEMORY_SUMMARIZE_INTERVAL=5  # 更频繁的摘要生成
-    deploy:
-      resources:
-        limits:
-          cpus: '2'
-          memory: 4G
-        reservations:
-          cpus: '1'
-          memory: 2G
-    restart: unless-stopped
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8083/health"]
-      interval: 30s
-      timeout: 10s
-      retries: 3
-
-  frontend:
-    build:
-      context: .
-      dockerfile: docker/Dockerfile.frontend
-    environment:
-      - STREAMLIT_SERVER_PORT=8501
-      - STREAMLIT_SERVER_HEADLESS=true
-    deploy:
-      resources:
-        limits:
-          cpus: '1'
-          memory: 2G
-    restart: unless-stopped
-```
-
-#### 4.2 监控与告警
-
-```python
-# 添加健康检查端点
-from fastapi import FastAPI
-from datetime import datetime
-
-app = FastAPI()
-
-@app.get("/health")
-async def health_check():
-    """健康检查端点"""
-    return {
-        "status": "healthy",
-        "timestamp": datetime.now().isoformat(),
-        "services": {
-            "postgresql": await check_postgresql(),
-            "qdrant": await check_qdrant(),
-            "llm": await check_llm_service(),
-        }
-    }
-
-async def check_postgresql() -> bool:
-    try:
-        # 尝试连接数据库
-        async with asyncpg.connect(DB_URI) as conn:
-            await conn.fetchval("SELECT 1")
-        return True
-    except Exception:
-        return False
-
-async def check_qdrant() -> bool:
-    try:
-        client = QdrantClient(url=QDRANT_URL)
-        client.get_collections()
-        return True
-    except Exception:
-        return False
-```
-
-#### 4.3 安全最佳实践
-
-```python
-# 1. 使用环境变量管理密钥
-# 永远不要硬编码 API Key
-import os
-from typing import Optional
-
-def get_api_key(env_var: str, default: Optional[str] = None) -> str:
-    api_key = os.getenv(env_var, default)
-    if not api_key:
-        raise ValueError(f"{env_var} 环境变量未设置")
-    return api_key
-
-# 2. 输入验证
-from pydantic import BaseModel, validator
-
-class ChatRequest(BaseModel):
-    message: str
-    user_id: str
-    model: str = "zhipu"
-    
-    @validator('message')
-    def validate_message(cls, v):
-        if len(v) > 4000:
-            raise ValueError("消息长度不能超过 4000 字符")
-        return v.strip()
-    
-    @validator('model')
-    def validate_model(cls, v):
-        allowed_models = {"zhipu", "deepseek", "local"}
-        if v not in allowed_models:
-            raise ValueError(f"不支持的模型: {v}")
-        return v
-
-# 3. 速率限制
-from slowapi import Limiter
-from slowapi.util import get_remote_address
-
-limiter = Limiter(key_func=get_remote_address)
-
-@app.post("/api/chat")
-@limiter.limit("10/minute")  # 每分钟最多 10 次请求
-async def chat(request: Request, chat_request: ChatRequest):
-    ...
-```
-
-### 5. 调试与测试
-
-#### 5.1 启用调试模式
-
-```bash
-# 启用详细日志
-export LOG_LEVEL=DEBUG
-
-# 启用 LangGraph 图追踪
-export ENABLE_GRAPH_TRACE=true
-
-# 查看图执行流程
-# 启动后访问：http://localhost:8083/graph/trace
-```
-
-#### 5.2 单元测试示例
-
-```python
-# tests/test_rag.py
-import pytest
-from app.rag.fusion import reciprocal_rank_fusion
-from langchain_core.documents import Document
-
-def test_rrf_fusion():
-    # 准备测试数据
-    docs1 = [
-        Document(page_content="文档 A"),
-        Document(page_content="文档 B"),
-    ]
-    docs2 = [
-        Document(page_content="文档 B"),
-        Document(page_content="文档 C"),
-    ]
-    
-    # 执行 RRF 融合
-    result = reciprocal_rank_fusion([docs1, docs2])
-    
-    # 验证结果
-    assert len(result) == 3
-    assert result[0].page_content == "文档 B"  # B 在两个列表中都出现，应该排第一
-
-@pytest.mark.asyncio
-async def test_retriever():
-    from app.rag.retriever import create_base_retriever
-    
-    retriever = create_base_retriever(
-        collection_name="test_collection",
-        embeddings=mock_embeddings,
-    )
-    
-    docs = await retriever.ainvoke("测试查询")
-    assert len(docs) > 0
-```
-
----
-
-## �� 故障排查
+## 🔍 故障排查
 
 ### 常见问题
 
@@ -2218,15 +1896,20 @@ curl http://115.190.121.151:6333/collections
 ```
 
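 **Q: The backend fails to start?**
-- Make sure port 8083 is not already in use
+- Make sure port 8079 is not already in use
 - Check that the API keys in `.env` are correct
 - Check the startup log to confirm the models initialized successfully
 
 **Q: No response after switching models?**
 - Check that the selected model is configured correctly
-- Make sure the vLLM container is running (if you use the local model)
+- Make sure the llama.cpp service is running (if you use the local model)
 - Try switching to another model
 
+**Q: Hybrid routing misbehaves?**
+- Set `ENABLE_GRAPH_TRACE=true` to see the detailed execution flow
+- Make sure the fast-path tools are registered correctly
+- Inspect the output of the React Reason node
+
 For more troubleshooting, see [QUICKSTART.md](QUICKSTART.md)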
 
 ---
diff --git a/backend/app/agent/agent_service.py b/backend/app/agent/agent_service.py
index d5d29b6..aa8a8c8 100644
--- a/backend/app/agent/agent_service.py
+++ b/backend/app/agent/agent_service.py
@@ -10,7 +10,6 @@ import asyncio
 from ..main_graph.utils.main_graph_builder import build_react_main_graph
 from ..main_graph.tools.graph_tools import AVAILABLE_TOOLS, TOOLS_BY_NAME
 from ..main_graph.config import set_stream_writer
-from ..model_services.chat_services import get_all_chat_services, LocalVLLMChatProvider
 from ..main_graph.utils.rag_initializer import init_rag_tool
 from ..core.intent_classifier import get_intent_classifier
 from ..logger import info, warning, error
@@ -33,18 +32,10 @@ class AIAgentService:
     async def initialize(self):
         # 0. 初始化 Mem0 客户端
         from ..memory.mem0_client import Mem0Client
-        # 创建一个临时的 LLM 用于 Mem0（用第一个可用的）
-        chat_services = get_all_chat_services()
-        temp_llm = None
-        if chat_services:
-            temp_llm = list(chat_services.values())[0]
-        self.mem0_client = Mem0Client(temp_llm)
+        self.mem0_client = Mem0Client()
         
         # 1. 初始化 RAG 工具（如果需要）
-        def create_local_llm():
-            provider = LocalVLLMChatProvider()
-            return provider.get_service()
-        rag_tool = await init_rag_tool(create_local_llm)
+        rag_tool = await init_rag_tool()
         if rag_tool:
             self.tools.append(rag_tool)
             self.tools_by_name[rag_tool.name] = rag_tool
diff --git a/backend/app/main_graph/utils/rag_initializer.py b/backend/app/main_graph/utils/rag_initializer.py
index 7351635..becd5e7 100644
--- a/backend/app/main_graph/utils/rag_initializer.py
+++ b/backend/app/main_graph/utils/rag_initializer.py
@@ -20,12 +20,11 @@ def is_initialized() -> bool:
     return _initialized
 
 
-async def init_rag_tool(local_llm_creator, force: bool = False):
+async def init_rag_tool(force: bool = False):
     """
-    初始化 RAG 工具（注册到模块级变量）
+    初始化 RAG 工具（注册到模块级变量，内部获取所需服务）
 
     Args:
-        local_llm_creator: 返回 LLM 实例的函数
         force: 是否强制重新初始化
 
     Returns:
@@ -39,20 +38,22 @@ async def init_rag_tool(local_llm_creator, force: bool = False):
         return _rag_tool
 
     try:
+        from app.model_services.chat_services import get_chat_service
+
         info("🔄 正在初始化 RAG 检索系统...")
         embeddings = get_embedding_service()
         retriever = create_parent_hybrid_retriever(
             collection_name="rag_documents",
             search_k=5,
-            embeddings=embeddings
+            embeddings=embeddings,
         )
-        rewrite_llm = local_llm_creator()
+        rewrite_llm = get_chat_service()
 
         rag_tool = create_rag_tool(
             retriever=retriever,
             llm=rewrite_llm,
             num_queries=3,
-            rerank_top_n=5
+            rerank_top_n=5,
         )
 
         _rag_tool = rag_tool
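
Review note on the hunk above: `init_rag_tool` no longer takes an LLM factory and instead looks up its own services, caching the result at module level. A minimal sketch of that pattern follows. The two service getters are hypothetical stand-ins for `get_embedding_service` and `get_chat_service`, and the failure handling is an assumption, since the real `except` branch is outside this hunk.

```python
import asyncio
from typing import Optional

# Hypothetical stand-ins for the project's service getters.
def get_embedding_service() -> str:
    return "embeddings"

def get_chat_service() -> str:
    return "chat-llm"

# Module-level cache, mirroring _rag_tool / _initialized in the diff.
_tool: Optional[dict] = None
_initialized = False

async def init_tool(force: bool = False) -> Optional[dict]:
    """Build the tool once; rebuild only when force=True."""
    global _tool, _initialized
    if _initialized and not force:
        return _tool
    try:
        # Dependencies are resolved here, not passed in by the caller.
        _tool = {
            "embeddings": get_embedding_service(),
            "llm": get_chat_service(),
        }
        _initialized = True
    except Exception:
        # Assumed behaviour: a failed build leaves the cache empty.
        _tool = None
        _initialized = False
    return _tool
```

Resolving dependencies internally gives callers a simpler signature, but tests can no longer inject a fake LLM through the constructor and have to patch the getters instead.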
diff --git a/backend/app/memory/mem0_client.py b/backend/app/memory/mem0_client.py
index f004fe6..1c53bf7 100644
--- a/backend/app/memory/mem0_client.py
+++ b/backend/app/memory/mem0_client.py
@@ -1,33 +1,36 @@
-from app.config import (
-    LLM_API_KEY, ZHIPUAI_API_KEY,
-    VLLM_BASE_URL, QDRANT_URL, QDRANT_COLLECTION_NAME, QDRANT_API_KEY,
-    LLAMACPP_EMBEDDING_URL, LLAMACPP_API_KEY,
-    ZHIPU_EMBEDDING_MODEL, ZHIPU_API_BASE
-)
-from ..model_services import get_embedding_service
-from app.logger import info, warning, error
-import time
 """
 Mem0 记忆层客户端封装模块
 负责 Mem0 的初始化、检索和存储
 """
 
 import asyncio
-from typing import Optional, List, Dict
+import time
+from typing import Optional, List
+
 from mem0 import AsyncMemory
 
+from app.config import (
+    LLM_API_KEY,
+    ZHIPUAI_API_KEY,
+    VLLM_BASE_URL,
+    QDRANT_URL,
+    QDRANT_COLLECTION_NAME,
+    QDRANT_API_KEY,
+    LLAMACPP_EMBEDDING_URL,
+    LLAMACPP_API_KEY,
+    ZHIPU_EMBEDDING_MODEL,
+    ZHIPU_API_BASE,
+)
+from app.logger import info, warning, error
+from app.model_services import get_embedding_service
+from app.model_services.chat_services import get_chat_service
+
 
 class Mem0Client:
     """Mem0 异步客户端封装类"""
 
-    def __init__(self, llm_instance):
-        """
-        初始化 Mem0 客户端
-
-        Args:
-            llm_instance: LangChain LLM 实例（用于事实提取）
-        """
-        self.llm = llm_instance
+    def __init__(self):
+        """Initialize the Mem0 client (required services are obtained internally)."""
         self.mem0: Optional[AsyncMemory] = None
         self._initialized = False
 
@@ -35,7 +38,7 @@ class Mem0Client:
         """异步初始化 Mem0 客户端，并进行实际连接测试"""
         if self._initialized:
             return
-        
+
         try:
             # 获取可用的 embedding 服务并确定维度
             info("🔄 正在获取嵌入服务...")
@@ -43,14 +46,16 @@ class Mem0Client:
             test_embedding = embeddings.embed_query("test")
             embedding_dim = len(test_embedding)
             info(f"✅ 嵌入服务可用，向量维度: {embedding_dim}")
-            
-            # 构建 embedder 配置 - 改进的方法
-            # 检查本地 provider
-            from ..model_services.embedding_services import LocalLlamaCppEmbeddingProvider, ZhipuEmbeddingProvider
-            
+
+            # Build the embedder configuration
+            from app.model_services.embedding_services import (
+                LocalLlamaCppEmbeddingProvider,
+                ZhipuEmbeddingProvider,
+            )
+
             embedder_config = None
             local_provider = LocalLlamaCppEmbeddingProvider()
-            
+
             if local_provider.is_available():
                 info("✅ 使用本地 llama.cpp 作为 mem0 embedder")
                 embedder_config = {
@@ -59,22 +64,20 @@ class Mem0Client:
                         "model": "Qwen3-Embedding-0.6B-Q8_0",
                         "api_key": LLAMACPP_API_KEY or "dummy-key",
                         "openai_base_url": LLAMACPP_EMBEDDING_URL,
-                    }
+                    },
                 }
             else:
                 # 检查智谱
                 zhipu_provider = ZhipuEmbeddingProvider()
                 if zhipu_provider.is_available():
                     info("✅ 使用智谱 API 作为 mem0 embedder")
-                    # 使用自定义 embedder 或者 openai 兼容方式
-                    # 注意：这里我们使用一个特殊的配置方法
                     embedder_config = {
                         "provider": "openai",
                         "config": {
                             "model": ZHIPU_EMBEDDING_MODEL,
                             "api_key": ZHIPUAI_API_KEY,
                             "openai_base_url": ZHIPU_API_BASE,
-                        }
+                        },
                     }
                 else:
                     # 都不可用，使用 dummy 配置并警告
@@ -83,12 +86,17 @@ class Mem0Client:
                         "provider": "openai",
                         "config": {
                             "model": "text-embedding-ada-002",
-                            "api_key": "dummy-key",
+                            "api_key": "***",
                             "openai_base_url": "http://localhost:8080/v1",
-                        }
+                        },
                     }
-            
-            # Mem0 配置 - 简化配置，先确保能启动
+
+            # Obtain the LLM service internally
+            info("🔄 正在获取 LLM 服务...")
+            chat_llm = get_chat_service()
+            info("✅ LLM 服务获取成功")
+
+            # Mem0 configuration
             info("🔄 正在构建 Mem0 配置...")
             config = {
                 "vector_store": {
@@ -98,7 +106,7 @@ class Mem0Client:
                         "api_key": QDRANT_API_KEY,
                         "collection_name": QDRANT_COLLECTION_NAME,
                         "embedding_model_dims": embedding_dim,
-                    }
+                    },
                 },
                 "llm": {
                     "provider": "openai",
@@ -108,31 +116,30 @@ class Mem0Client:
                         "openai_base_url": VLLM_BASE_URL or ZHIPU_API_BASE,
                         "temperature": 0.1,
                         "max_tokens": 2000,
-                    }
+                    },
                 },
                 "embedder": embedder_config,
-                "version": "v1.1"
+                "version": "v1.1",
             }
-            
+
             info("🔄 正在初始化 Mem0 实例...")
             self.mem0 = AsyncMemory.from_config(config)
             info("✅ Mem0 配置加载成功")
-            
+
             # 尝试进行连接测试，但失败不会阻止初始化
             try:
                 info("🔄 正在测试 Mem0 连接...")
-                # 使用短超时的测试
                 await asyncio.wait_for(
                     self.mem0.search("ping", user_id="test", limit=1),
-                    timeout=10.0
+                    timeout=10.0,
                 )
                 info("✅ Mem0 连接测试成功")
             except Exception as e:
                 warning(f"⚠️ Mem0 连接测试遇到问题（但继续使用）: {e}")
-            
+
             self._initialized = True
             info("🎉 Mem0 初始化完成")
-            
+
         except asyncio.TimeoutError:
             error("❌ Mem0 初始化超时")
             self.mem0 = None
@@ -140,11 +147,14 @@ class Mem0Client:
         except Exception as e:
             error(f"❌ Mem0 初始化失败: {e}")
             import traceback
+
             error(f"详细错误信息:\n{traceback.format_exc()}")
             self.mem0 = None
             self._initialized = False
 
-    async def search_memories(self, query: str, user_id: str, limit: int = 5) -> List[str]:
+    async def search_memories(
+        self, query: str, user_id: str, limit: int = 5
+    ) -> List[str]:
         """
         检索相关记忆
 
@@ -163,7 +173,7 @@ class Mem0Client:
         try:
             memories = await asyncio.wait_for(
                 self.mem0.search(query, user_id=user_id, limit=limit),
-                timeout=30.0
+                timeout=30.0,
             )
 
             if memories and "results" in memories:
@@ -183,17 +193,25 @@ class Mem0Client:
             return []
 
     async def add_memories(self, messages, user_id):
-      if not self.mem0:
-        return False
-      try:
-        start = time.time()
-        info(f"📝 开始 Mem0 add，消息数: {len(messages)}")
-        await asyncio.wait_for(
-            self.mem0.add(messages, user_id=user_id, metadata={"type": "conversation"}),
-            timeout=60.0
-        )
-        info(f"✅ Mem0 add 完成，耗时: {time.time() - start:.2f}s")
-        return True
-      except asyncio.TimeoutError:
-        error(f"❌ Mem0 记忆添加超时 (60s)，已等待 {time.time() - start:.2f}s")
-        return False
\ No newline at end of file
+        """Add memories; return True on success, False otherwise."""
+        if not self.mem0:
+            return False
+        try:
+            start = time.time()
+            info(f"📝 开始 Mem0 add，消息数: {len(messages)}")
+            await asyncio.wait_for(
+                self.mem0.add(
+                    messages, user_id=user_id, metadata={"type": "conversation"}
+                ),
+                timeout=60.0,
+            )
+            info(f"✅ Mem0 add 完成，耗时: {time.time() - start:.2f}s")
+            return True
+        except asyncio.TimeoutError:
+            error(
+                f"❌ Mem0 记忆添加超时 (60s)，已等待 {time.time() - start:.2f}s"
+            )
+            return False
+        except Exception as e:
+            error(f"❌ Mem0 add 失败: {e}")
+            return False
diff --git a/tools/test/test_graph_branches.py b/tools/test/test_graph_branches.py
index 32ca0f3..a25aa80 100644
--- a/tools/test/test_graph_branches.py
+++ b/tools/test/test_graph_branches.py
@@ -31,25 +31,25 @@ TEST_CASES = [
         "query": "吕布的事迹？",
         "description": "测试快速 RAG 分支"
     },
-    # # 测试3: 需要推理的复杂问题 - 应该直接到 React 循环
-    # {
-    #     "name": "复杂推理测试",
-    #     "query": "请帮我分析：如果我有10万元，想要在一年内获得15%的收益，有哪些低风险的投资方案？",
-    #     "description": "测试 React 循环推理分支"
-    # },
+    # Test 3: a complex question that needs reasoning; should go straight to the React loop
+    {
+        "name": "复杂推理测试",
+        "query": "请帮我分析：如果我有10万元，想要在一年内获得15%的收益，有哪些低风险的投资方案？",
+        "description": "测试 React 循环推理分支"
+    },
     # # 测试4: 需要工具调用的问题
     # {
     #     "name": "工具调用测试",
     #     "query": "搜索一下今天的天气怎么样",
     #     "description": "测试工具调用分支"
     # },
-    # # 测试5: 带记忆的对话
-    # {
-    #     "name": "记忆测试",
-    #     "query": "你刚才回答了我什么问题？",
-    #     "description": "测试记忆检索分支",
-    #     "thread_id": "test_memory_thread"
-    # }
+    # Test 5: conversation with memory
+    {
+        "name": "记忆测试",
+        "query": "你刚才回答了我什么问题？",
+        "description": "测试记忆检索分支",
+        "thread_id": "test_memory_thread"
+    }
 ]