ailine/rag_indexer/__init__.py
root a07e398739
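refactor!: make the RAG system fully asynchronous and remove the LangChain ParentDocumentRetriever dependency
- Rewrite rag_core/vector_store.py: fully async implementations of aadd_documents and asimilarity_search
- Rewrite app/rag/retriever.py: async hybrid retrieval; remove the synchronous compatibility code
- Update rag_indexer/index_builder.py: async calls end to end
- Delete rag_core/retriever_factory.py: LangChain ParentDocumentRetriever is no longer used
- Clean up redundant imports and code: remove the model_services compatibility code and unneeded exception imports
- Update rag_indexer/README.md to reflect the new architecture

Core improvements:
- Fully asynchronous: index building and retrieval use async/await end to end
- Custom implementation: no longer depends on LangChain's ParentDocumentRetriever
- Dual-vector support: each child document stores both dense and sparse vectors in Qdrant
- Clear architecture: rag_core for shared components, rag_indexer for indexing, app/rag for retrieval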
2026-05-04 14:33:12 +08:00
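
The parent-child retrieval flow that takes over from ParentDocumentRetriever in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `ChildHit`, `InMemoryChildIndex`, `InMemoryParentStore`, and `retrieve_parents` are hypothetical names, and the in-memory classes merely stand in for the Qdrant child index and the PostgreSQL parent store.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ChildHit:
    """One child-chunk search hit (hypothetical shape, not the project's type)."""
    parent_id: str
    score: float


class InMemoryChildIndex:
    """Stand-in for the vector store: returns pre-scored hits, best first."""

    def __init__(self, hits):
        self._hits = hits

    async def search(self, query, k):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        ranked = sorted(self._hits, key=lambda h: h.score, reverse=True)
        return ranked[:k]


class InMemoryParentStore:
    """Stand-in for the parent document store, keyed by parent id."""

    def __init__(self, docs):
        self._docs = docs

    async def get_many(self, ids):
        await asyncio.sleep(0)
        return [self._docs[i] for i in ids if i in self._docs]


async def retrieve_parents(index, store, query, k=4):
    """Search child chunks, then return their parents in best-hit order."""
    hits = await index.search(query, k)
    # dict.fromkeys deduplicates parent ids while preserving ranking order
    parent_ids = list(dict.fromkeys(h.parent_id for h in hits))
    return await store.get_many(parent_ids)
```

Several child chunks of the same parent can match a query, so the parent ids are deduplicated before the parent store is queried; each parent document is returned once, ordered by its best-scoring child.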


"""
Offline RAG Indexer module.
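Provides a complete offline index-building pipeline, including:
- Document loading (PDF, Word, TXT, etc.)
- Text splitting (recursive, semantic, parent-child chunks)
- Vector embedding (llama.cpp supported)
- Vector storage (Qdrant)
- Parent document storage (PostgreSQL)

Example usage: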
>>> from rag_indexer import IndexBuilder, IndexBuilderConfig, SplitterType
>>>
>>> config = IndexBuilderConfig(
...     collection_name="my_docs",
...     splitter_type=SplitterType.PARENT_CHILD,
... )
>>> builder = IndexBuilder(config)
>>>
>>> # Or pass arguments directly (backward compatible)
>>> builder = IndexBuilder(collection_name="my_docs")
>>>
>>> # Must run inside an async function (or an asyncio-aware REPL)
>>> await builder.build_from_file("document.pdf")
"""
from .index_builder import IndexBuilder, IndexBuilderConfig, DocstoreConfig
from .loaders import DocumentLoader
from .splitters import SplitterType, get_splitter
from .config import (
    QDRANT_URL,
    QDRANT_API_KEY,
    LLAMACPP_EMBEDDING_URL,
    LLAMACPP_API_KEY,
    DB_URI,
    DOCSTORE_URI,
    RAG_OCR_LANGUAGES,
    RAG_DOC_LANGUAGES,
)
# Re-export commonly used components from rag_core
from backend.rag_core import (
    get_embeddings,
    get_embedding_dimension,
    QdrantHybridStore,
    PostgresDocStore,
    create_docstore,
)
__version__ = "2.0.0"
__all__ = [
    # Core builder and configuration
    "IndexBuilder",
    "IndexBuilderConfig",
    "DocstoreConfig",
    # Loaders
    "DocumentLoader",
    # Splitting
    "SplitterType",
    "get_splitter",
    # Configuration
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "LLAMACPP_EMBEDDING_URL",
    "LLAMACPP_API_KEY",
    "DB_URI",
    "DOCSTORE_URI",
    "RAG_OCR_LANGUAGES",
    "RAG_DOC_LANGUAGES",
    # Embeddings and vector storage
    "get_embeddings",
    "get_embedding_dimension",
    "QdrantHybridStore",
    # Document storage
    "PostgresDocStore",
    "create_docstore",
]
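
Because `build_from_file` is awaited in the docstring example, a plain synchronous script needs an event loop to drive it. Below is a minimal sketch using `asyncio.run`, assuming only that the builder exposes an async `build_from_file(path)` method. `StubIndexBuilder`, `build_all`, and `main` are hypothetical stand-ins so the example runs without the real package.

```python
import asyncio


class StubIndexBuilder:
    """Hypothetical stand-in for IndexBuilder; it only mimics an async build method."""

    def __init__(self, collection_name):
        self.collection_name = collection_name
        self.indexed = []

    async def build_from_file(self, path):
        await asyncio.sleep(0)  # placeholder for loading, splitting, and embedding
        self.indexed.append(path)


async def build_all(builder, paths):
    """Index the files one after another and return how many were processed."""
    for path in paths:
        await builder.build_from_file(path)
    return len(builder.indexed)


def main():
    builder = StubIndexBuilder(collection_name="my_docs")
    # asyncio.run creates an event loop, runs the coroutine, then closes the loop
    return asyncio.run(build_all(builder, ["document.pdf", "notes.txt"]))
```

With the real package, the stub would presumably be swapped for `IndexBuilder`; whether the real builder needs extra setup or teardown around the loop is not shown in this file.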