diff --git a/.gitignore b/.gitignore
index e9074b3..5c86892 100644
--- a/.gitignore
+++ b/.gitignore
@@ -26,6 +26,7 @@
 !.gitignore
 !README.md
 !QUICKSTART.md
+!REACT_MODE_SUMMARY.md
 !LICENSE
 !requirement.txt
 !.env.docker
diff --git a/REACT_MODE_SUMMARY.md b/REACT_MODE_SUMMARY.md
new file mode 100644
index 0000000..f912de7
--- /dev/null
+++ b/REACT_MODE_SUMMARY.md
@@ -0,0 +1,149 @@
+# React Mode Architecture Summary
+
+---
+
+## ✅ Current Architecture: Hybrid Routing + React Loop
+
+This project uses a **two-layer hybrid architecture**:
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ Layer 1: Front hybrid router (low latency)                  │
+│   ├─ Rule-based fast dispatch (no LLM)                      │
+│   ├─ Lightweight intent classification (smallLLM)           │
+│   └─ Fast paths (fast_chitchat, fast_rag, fast_tool)        │
+└───────────────────────┬─────────────────────────────────────┘
+                        ↓ (automatic escalation on failure)
+┌─────────────────────────────────────────────────────────────┐
+│ Layer 2: Full React loop (fallback for complex tasks)       │
+│   └─ Reason → Act → Observe (up to 40 steps)                │
+└─────────────────────────────────────────────────────────────┘
+```
+
+---
+
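+## 🎯 Layer 1: Front Hybrid Router (New)
+
+### Core Features
+
+| Feature | Description |
+|------|------|
+| Rule-based fast dispatch | No LLM, millisecond-level response; handles greetings, thanks, subgraph keywords, etc. |
+| Lightweight intent classification | Uses smallLLM, reduced to 4 classes: chitchat, knowledge, tool, complex |
+| Fast paths | Three fast-path nodes: fast_chitchat, fast_rag, fast_tool |
+| Automatic escalation | When a fast path fails, the request falls back to the full React loop automatically |
+| Enhanced SSE events | intent_classified, path_decision, fast_path_*, escalation |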
+
+### Quick Flow Diagram
+
+```
+START
+  ↓
+init_state
+  ↓
+hybrid_router (front router) ←─────────┐
+  ↓                                    │
+  ├─ rule dispatch → fast_chitchat →───┤
+  │                     ↓              │
+  ├─ LLM classify  → fast_rag →────────┤
+  │                     ↓              │
+  ├─                 fast_tool →───────┤
+  │                     ↓              │
+  └─                 react_loop →──────┤
+                        ↓              │
+              success or escalate? ────┘
+                 ↓         ↓
+              finalize  react_reason
+```
+
+### Key Files
+
+| File | Description |
+|------|------|
+| `backend/app/main_graph/nodes/hybrid_router.py` | Full hybrid router implementation |
+| `backend/app/model_services/chat_services.py` | get_chat_service() + get_small_llm_service() |
+| `backend/app/main_graph/utils/main_graph_builder.py` | Integrates the hybrid router into the main graph |
+
+### Configuration
+
+```python
+# Choose the mode when building the graph
+graph = build_react_main_graph(use_hybrid_router=True)  # enable hybrid routing (default)
+graph = build_react_main_graph(use_hybrid_router=False) # disable it: pure React loop
+```
+
+---
+
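+## 🎯 Layer 2: Full React Loop (Retained)
+
+### Core Characteristics
+
+| Characteristic | Description |
+|------|------|
+| Iterative reasoning | Each round reasons about the next step; up to 40 steps |
+| Structured errors | ErrorRecord + ErrorSeverity |
+| Timeout retries | RAG up to 2 times, subgraphs up to 1 time |
+| Subgraph integration | contact, dictionary, news_analysis |
+| RAG retrieval | Supports re-retrieval (re_retrieve) |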
+
+### Flow Diagram
+
+```
+react_reason (reasoning) ←─────────────┐
+  ↓                                    │
+conditional routing                    │
+  ├─→ rag_retrieve (with retry) →──────┤
+  ├─→ contact_subgraph →───────────────┤
+  ├─→ dictionary_subgraph →────────────┤
+  ├─→ news_analysis_subgraph →─────────┤
+  ├─→ handle_error → (retry/degrade) →─┤
+  └─→ finalize
+  ↓
+END
+```
+
+---
+
+## 📁 Key File List
+
+| File | Description |
+|------|------|
+| `backend/app/main_graph/utils/main_graph_builder.py` | Main graph builder (supports the hybrid-router switch) |
+| `backend/app/main_graph/nodes/react_nodes.py` | React loop nodes |
+| `backend/app/main_graph/nodes/hybrid_router.py` | Hybrid router node (new) |
+| `backend/app/main_graph/nodes/rag_nodes.py` | RAG retrieval nodes |
+| `backend/app/main_graph/utils/retry_utils.py` | Timeout and retry utilities |
+| `backend/app/main_graph/state.py` | Main state |
+| `backend/app/core/intent.py` | React-mode intent reasoner |
+| `backend/app/model_services/chat_services.py` | Dual-model service (llm + smallLLM) |
+
+---
+
+## 🚀 Quick Start
+
+```python
+from backend.app.main_graph.utils.main_graph_builder import build_react_main_graph
+
+# Build the graph (hybrid routing is enabled by default)
+graph = build_react_main_graph(use_hybrid_router=True)
+compiled_graph = graph.compile()
+
+# Invoke
+result = compiled_graph.invoke({"user_query": "Hello", "user_id": "test"})
+print(result["final_result"])  # assumes invoke() returns the final state as a dict
+```
+
+---
+
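+## 🎉 Full Feature Summary
+
+✅ Dual-model service (llm + smallLLM)  
+✅ Front hybrid router (rule-based fast dispatch + lightweight intent classification)  
+✅ Three fast paths (fast_chitchat, fast_rag, fast_tool)  
+✅ Automatic escalation (fast path fails → full React loop)  
+✅ Enhanced SSE events (intent_classified, path_decision, fast_path_*, escalation)  
+✅ Full React loop (up to 40 steps)  
+✅ Structured error handling  
+✅ Timeout and retry policies  
+✅ Subgraph integration (contact, dictionary, news_analysis)  
+✅ Backward compatible (use_hybrid_router=True/False)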
diff --git a/README.md b/README.md
index 1588ae3..fdcbb18 100644
--- a/README.md
+++ b/README.md
@@ -45,6 +45,10 @@
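 - ✅ **Subgraph system**: modular subgraph architecture that shares common tools (intent understanding, human review, formatted output)
 - ✅ **Common tool library**: general-purpose tools such as web search and chart visualization, available to every subgraph and to the main graph
 - ✅ **React mode** ⭐: a Reasoning → Acting → Observing loop in which the LLM thinks before it acts and can call tools multiple times
+- ✅ **Hybrid routing architecture** ⭐⭐: front fast routing (rule-based dispatch + lightweight intent classification) + full React loop (fallback)
+- ✅ **Dual-model service** ⭐: get_chat_service() (large model) + get_small_llm_service() (lightweight model)
+- ✅ **Automatic escalation**: when a fast path fails, the request falls back to the full React loop automatically
+- ✅ **Backward compatible**: switch between hybrid routing and pure React mode with use_hybrid_router=True/False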
 
 ---
 
diff --git a/backend/app/README.md b/backend/app/README.md
index 2ebdea6..eedbbf9 100644
--- a/backend/app/README.md
+++ b/backend/app/README.md
@@ -36,6 +36,7 @@ app/
 │   ├── nodes/               # main graph nodes
 │   │   ├── __init__.py
 │   │   ├── react_nodes.py   # React mode nodes (reasoning, routing)
+│   │   ├── hybrid_router.py # ⭐ hybrid router node (front fast routing + automatic escalation)
 │   │   ├── llm_call.py      # LLM call node
 │   │   ├── retrieve_memory.py # memory retrieval node
 │   │   ├── memory_trigger.py # memory trigger node
@@ -52,7 +53,7 @@ app/
 │   │
 │   └── utils/               # main graph utility functions
 │       ├── __init__.py
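-│       ├── main_graph_builder.py # main graph builder
+│       ├── main_graph_builder.py # main graph builder (supports the hybrid-router switch)
 │       ├── retry_utils.py   # retry utilities
 │       ├── rag_initializer.py # RAG initialization utility
 │       └── visualize_graph.py # graph visualization utility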
diff --git a/backend/app/model_services/README.md b/backend/app/model_services/README.md
index 1107214..221516b 100644
--- a/backend/app/model_services/README.md
+++ b/backend/app/model_services/README.md
@@ -1,31 +1,85 @@
 """
 Model services module (model_services)
 
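-Provides a unified interface for obtaining embedding and rerank model services, with automatic fallback:
-1. Prefer the local llama.cpp service
-2. When the local service is unavailable, fall back automatically to the Zhipu API service
+Provides a unified interface for obtaining embedding, rerank, and generative LLM services, with automatic fallback.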
 
-Usage:
+---
 
-from app.model_services import get_embedding_service, get_rerank_service, BaseReranker
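+## 📚 Generative LLM Service (Chat)
+
+### Dual-Model Service
+| Function | Description |
+|------|------|
+| `get_chat_service()` | Returns the large-model service (for complex reasoning and generation) |
+| `get_small_llm_service()` | Returns the lightweight model service (for simple intent classification and quick Q&A) |
+| `get_all_chat_services()` | Returns all available generative LLM services (for switching between models) |
+
+### Usage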
+
+```python
+from app.model_services import get_chat_service, get_small_llm_service
+
+# Get the large-model service (complex tasks)
+llm = get_chat_service()
+response = llm.invoke("What is LangGraph?")
+
+# Get the lightweight model service (simple tasks)
+small_llm = get_small_llm_service()
+response = small_llm.invoke("Classify the user intent: 'Hello'")
+```
+
+---
+
+## 📚 Embedding Model Service (Embedding)
+
+| Function | Description |
+|------|------|
+| `get_embedding_service()` | Returns the embedding model service (with automatic fallback) |
+
+### Usage
+
+```python
+from app.model_services import get_embedding_service
 
 # Get the embedding service (LangChain-compatible Embeddings)
 embeddings = get_embedding_service()
+```
+
+---
+
+## 📚 Rerank Model Service (Rerank)
+
+| Function | Description |
+|------|------|
+| `get_rerank_service()` | Returns the rerank model service (with automatic fallback) |
+
+### Usage
+
+```python
+from app.model_services import get_rerank_service
 
 # Get the rerank service
 reranker = get_rerank_service()
 sorted_docs = reranker.compress_documents(documents, query, top_n=5)
+```
 
-Environment variable configuration:
+---
 
+## 🔧 Environment Variable Configuration
+
+```env
 # Zhipu API configuration
-ZHIPUAI_API_KEY=your_api_key
+ZHIPUAI_API_KEY=***
 ZHIPU_EMBEDDING_MODEL=embedding-3  # options: embedding-2, embedding-3
 ZHIPU_RERANK_MODEL=rerank-2        # options: rerank-1, rerank-2
 ZHIPU_API_BASE=https://open.bigmodel.cn/api/paas/v4
 
+# DeepSeek API configuration (used for the large model)
+DEEPSEEK_API_KEY=***
+
 # Local llama.cpp service configuration (existing settings are unchanged)
 LLAMACPP_EMBEDDING_URL=http://localhost:port/v1
 LLAMACPP_RERANKER_URL=http://localhost:port/v1
-LLAMACPP_API_KEY=your_api_key
+LLAMACPP_API_KEY=***
+```
 """
diff --git a/backend/docs/HYBRID_ROUTER.md b/backend/docs/HYBRID_ROUTER.md
deleted file mode 100644
index ff5d426..0000000
--- a/backend/docs/HYBRID_ROUTER.md
+++ /dev/null
@@ -1,269 +0,0 @@
-# Hybrid Agent Routing Architecture Documentation
-
-## Architecture Overview
-
-```
-                    +-----------------+
-                    |   User input    |
-                    +--------+--------+
-                             |
-                             v
-                    +-----------------+
-                    |Intent classifier|
-                    +--------+--------+
-                             |
-            +----------------+-----------------+
-            |                |                 |
-            v                v                 v
-       +---------+     +---------+      +----------------+
-       |Knowledge|     |Tool use |      | Complex task   |
-       +----+----+     +----+----+      +-------+--------+
-            |                |                  |
-            v                v                  v
-      +-----------+   +------------+    +---------------+
-      | Fast RAG  |   | Fast tool  |    |  React loop   |
-      +-----+-----+   +-----+------+    +-------+-------+
-            |                |                 |
-            +----------------+-----------------+
-                             |
-                             v
-                      +-------------+
-                      |Final answer |
-                      +-------------+
-```
-
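-## Intent Types
-
-| Type | Description | Example | Path |
-|------|------|------|------|
-| `knowledge` | Knowledge query | "What is the company's expense reimbursement policy?" | Fast RAG |
-| `realtime` | Real-time data query | "Check the status of order 123" | Fast tool |
-| `action` | Perform an action | "Help me request a refund" | Fast tool |
-| `chitchat` | Small talk | "Hello" | Direct answer |
-| `clarify` | Needs clarification | "I want to look something up..." | Clarifying question |
-| `mixed` | Complex task | "Check order + refund policy + write email" | React loop |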
-
-## Routing Rules
-
-```
-confidence < 0.6 → React loop (safe mode)
-
-confidence >= 0.6
-    ├─ knowledge → fast RAG
-    ├─ realtime → fast tool
-    ├─ action → fast tool
-    ├─ chitchat → direct answer
-    ├─ clarify → clarifying question
-    └─ mixed → React loop
-```
-
-## File Structure
-
-```
-backend/app/agent/
-├── intent_classifier.py    # intent classifier
-├── hybrid_router.py        # hybrid router implementation
-└── service.py              # Agent service (updated)
-```
-
-## SSE Events
-
-### New Events
-
-| Event | Description | Data structure |
-|------|------|---------|
-| `intent_classified` | Intent classification finished | `{type: "intent_classified", intent: string, confidence: float, reasoning: string}` |
-| `path_decision` | Path decision finished | `{type: "path_decision", path: "fast\|react_loop", intent: string}` |
-
-### Full Event Flow
-
-```
-User message
-  ↓
-intent_classified  (new!)
-  ↓
-path_decision      (new!)
-  ↓
-[node_start] llm_call
-  ↓
-[reasoning] reasoning process
-  ↓
-[tool_call_start] tool call starts
-  ↓
-[tool_call_end] tool call ends
-  ↓
-[llm_token] final answer
-  ↓
-[human_review_request] human review (if any)
-  ↓
-[done]
-```
-
-## Usage Examples
-
-### Fast Path Example
-
-```
-# Input
-User: "Hello"
-
-# Response
-intent_classified: {
-  intent: "chitchat",
-  confidence: 0.95,
-  reasoning: "simple greeting"
-}
-path_decision: {
-  path: "fast",
-  intent: "chitchat"
-}
-llm_token: "Hel"...
-llm_token: "lo"...
-```
-
-### React Loop Example
-
-```
-# Input
-User: "Look up my order, then draft an email"
-
-# Response
-intent_classified: {
-  intent: "mixed",
-  confidence: 0.92,
-  reasoning: "Requires looking up the order and drafting an email; a multi-step task"
-}
-path_decision: {
-  path: "react_loop",
-  intent: "mixed"
-}
-node_start: llm_call
-reasoning: "I need to look up the order first..."
-tool_call_start: get_order
-tool_call_end: result
-...
-```
-
-## Quick Start
-
-### 1. Initialize the Intent Classifier
-
-```python
-from app.agent.intent_classifier import get_intent_classifier
-
-classifier = get_intent_classifier()
-
-# Classify the intent
-result = await classifier.classify("What is the company's expense reimbursement policy?")
-print(f"Intent: {result.intent_type}")
-print(f"Confidence: {result.confidence}")
-print(f"Reasoning: {result.reasoning}")
-```
-
-### 2. Use the Hybrid Router
-
-```python
-from app.agent.hybrid_router import HybridRouter
-from app.agent.intent_classifier import get_intent_classifier
-
-classifier = get_intent_classifier()
-router = HybridRouter(
-    intent_classifier=classifier,
-    rag_pipeline=None,  # pass in the RAG pipeline
-    tool_registry={},   # pass in the tools
-    react_graph=None    # pass in the graph
-)
-
-# Routing decision
-decision = await router.route("Hello")
-print(f"Decision: {decision.action}")
-
-# Execute
-result = await router.execute(decision, "Hello", "thread_123")
-```
-
-## Configuration Options
-
-### Confidence Threshold
-
-```python
-# Edit the _make_decision method in backend/app/agent/hybrid_router.py
-if confidence < 0.6:  # change this value
-    ...  # take the React loop
-```
-
-### Adding a New Intent Type
-
-1. Add the new type to the `IntentType` enum
-2. Add a routing rule to `routing_map`
-3. Add examples to `_build_examples`
-
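-## Key Advantages
-
-1. **Performance** - simple questions take the fast path
-2. **User experience** - fast responses
-3. **Extensibility** - new intents are easy to add
-4. **Safety and reliability** - low-confidence inputs go through the full loop
-5. **Observability** - the frontend displays the path decision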
-
-## Testing Recommendations
-
-### Test Cases
-
-```python
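-test_cases = [
-    # Knowledge query
-    ("What is the company's expense reimbursement policy?", "knowledge"),
-
-    # Real-time query
-    ("Check the status of order 123", "realtime"),
-
-    # Perform an action
-    ("Help me request a refund", "action"),
-
-    # Small talk
-    ("Hello", "chitchat"),
-
-    # Clarification
-    ("I want to look something up...", "clarify"),
-
-    # Complex task
-    ("Check order + refund policy + write email", "mixed"),
-]
-
-for query, expected_intent in test_cases:
-    result = await classifier.classify(query)
-    print(f"{query} → {result.intent_type}")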
-```
-
-## Extension Guide
-
-### Adding a New Fast Path
-
-```python
-# Add to HybridRouter
-async def _execute_custom_path(self, user_input: str) -> str:
-    # custom path logic
-    pass
-```
-
-### Adding a Cache Layer
-
-```python
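-# Cache results in IntentClassifier. A plain dict is used because functools.lru_cache
-# on an async method would cache the coroutine object, which cannot be awaited twice.
-class IntentClassifier:
-    _cache: dict = {}  # shared across instances; unbounded in this sketch
-    async def classify_cached(self, user_input: str):
-        if user_input not in self._cache:
-            self._cache[user_input] = await self.classify(user_input)
-        return self._cache[user_input]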
-```
-
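-## Notes
-
-1. Make sure the fallback strategy is sound
-2. Monitor intent classification accuracy
-3. Tune the confidence threshold based on real-world results
-4. The frontend must handle the new SSE events
-5. Maintain backward compatibility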
\ No newline at end of file