refactor!: make the RAG system fully asynchronous and remove the LangChain ParentDocumentRetriever dependency
Some checks failed
Build and deploy AI Agent service / deploy (push) Failing after 6m34s

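- Rewrite rag_core/vector_store.py: fully asynchronous implementation of aadd_documents and asimilarity_search
- Rewrite app/rag/retriever.py: asynchronous hybrid retrieval; remove the synchronous compatibility code
- Modify rag_indexer/index_builder.py: asynchronous calls along the entire pipeline
- Delete rag_core/retriever_factory.py: LangChain ParentDocumentRetriever is no longer used
- Clean up redundant imports and code: remove the model_services compatibility code and unneeded exception imports
- Update rag_indexer/README.md: reflect the new architecture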
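
As an illustration of the asynchronous hybrid retrieval mentioned above, here is a minimal sketch that runs a dense and a sparse lookup concurrently and merges the two rankings. The functions `dense_search` and `sparse_search` are hypothetical in-memory stand-ins, and reciprocal rank fusion is an assumed merge strategy; the commit message does not say how app/rag/retriever.py combines results.

```python
import asyncio
from collections import defaultdict


# Hypothetical stand-in for a dense-vector lookup; returns ranked chunk IDs.
async def dense_search(query: str, k: int) -> list[str]:
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return ["c1", "c2", "c3"][:k]


# Hypothetical stand-in for a sparse (BM25-style) lookup.
async def sparse_search(query: str, k: int) -> list[str]:
    await asyncio.sleep(0)
    return ["c2", "c4", "c1"][:k]


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists with reciprocal rank fusion (assumed strategy)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


async def hybrid_search(query: str, k: int = 3) -> list[str]:
    # Both lookups are awaited concurrently rather than one after the other.
    dense, sparse = await asyncio.gather(
        dense_search(query, k), sparse_search(query, k)
    )
    return rrf_fuse([dense, sparse])[:k]


if __name__ == "__main__":
    print(asyncio.run(hybrid_search("example query")))
```

With `asyncio.gather`, the total latency is roughly that of the slower lookup instead of the sum of both, which is the usual motivation for removing synchronous code paths from a retriever.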

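Core improvements:
- Fully asynchronous: index building and retrieval use async/await end to end
- Custom implementation: no longer depends on LangChain's ParentDocumentRetriever
- Dual-vector support: child documents store both dense and sparse vectors in Qdrant
- Clear architecture: rag_core for shared components, rag_indexer for indexing, app/rag for retrieval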
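
The custom replacement for ParentDocumentRetriever could follow the usual parent-document pattern: search over child chunks, collect the distinct parent IDs in score order, then load the parents from the document store. The sketch below assumes that pattern; `ChildHit`, `InMemoryDocStore`, its `amget` method, and `retrieve_parents` are hypothetical names for illustration, not the code in this commit.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildHit:
    """A child-chunk search hit (hypothetical shape)."""
    chunk_id: str
    parent_id: str
    score: float


class InMemoryDocStore:
    """Hypothetical stand-in for an async parent-document store."""

    def __init__(self, docs: dict[str, str]) -> None:
        self._docs = dict(docs)

    async def amget(self, ids: list[str]) -> list[Optional[str]]:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return [self._docs.get(i) for i in ids]


async def retrieve_parents(
    hits: list[ChildHit], docstore: InMemoryDocStore, limit: int = 2
) -> list[str]:
    # Rank children by score, then keep the first occurrence of each parent ID.
    ordered = sorted(hits, key=lambda h: h.score, reverse=True)
    parent_ids = list(dict.fromkeys(h.parent_id for h in ordered))[:limit]
    parents = await docstore.amget(parent_ids)
    # Skip parents that are missing from the store.
    return [p for p in parents if p is not None]


if __name__ == "__main__":
    store = InMemoryDocStore({"A": "doc A", "B": "doc B", "C": "doc C"})
    hits = [
        ChildHit("a1", "A", 0.9),
        ChildHit("b1", "B", 0.8),
        ChildHit("a2", "A", 0.7),
        ChildHit("c1", "C", 0.6),
    ]
    print(asyncio.run(retrieve_parents(hits, store)))
```

Deduplicating with `dict.fromkeys` preserves the score order of the best child per parent, so a parent matched by several chunks is returned once.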
2026-05-04 14:33:12 +08:00
parent 4209386c77
commit a07e398739
14 changed files with 651 additions and 592 deletions


@@ -1,14 +1,13 @@
"""
RAG Core - shared RAG component package
Provides shared embedding-model, vector-store, and document-store functionality used by both rag_indexer and app/rag.
"""
from .embedders import LlamaCppEmbedder
from .vector_store import QdrantVectorStore
from .embedders import get_embeddings, get_embedding_dimension
from .vector_store import QdrantHybridStore
from .sparse_embedder import BM25SparseEmbedder, get_sparse_embedder
from .store import PostgresDocStore, create_docstore
from .retriever_factory import create_parent_retriever
from .client import create_qdrant_client, create_async_qdrant_client
from .config import (
QDRANT_URL,
QDRANT_API_KEY,
@@ -20,8 +19,9 @@ from .config import (
__all__ = [
"LlamaCppEmbedder",
"QdrantVectorStore",
"get_embeddings",
"get_embedding_dimension",
"QdrantHybridStore",
"BM25SparseEmbedder",
"get_sparse_embedder",
"QDRANT_URL",
@@ -32,5 +32,6 @@ __all__ = [
"DOCSTORE_URI",
"PostgresDocStore",
"create_docstore",
"create_parent_retriever",
"create_qdrant_client",
"create_async_qdrant_client",
]