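1. Limitations of RAG and Advanced RAG

Basic RAG (retrieval-augmented generation) has clear weaknesses: low retrieval precision, no multi-hop reasoning, and poor handling of complex queries. Advanced RAG applies query rewriting, reranking, and knowledge-graph augmentation to move RAG from simple retrieval toward in-depth question answering. LlamaIndex is a go-to framework for building advanced RAG systems and offers a rich set of index structures and retrieval strategies.

2. LlamaIndex Core Architecture

Core components:

- Document/Node: documents and their chunks
- Index: indexes (vector / keyword / knowledge graph)
- Retriever: retrieves candidate nodes
- ResponseSynthesizer: synthesizes the response
- QueryEngine: the query engine
- Tool/Agent: tools and agents

3. Environment Setup

```shell
pip install llama-index llama-index-llms-openai
pip install llama-index-embeddings-openai
pip install llama-index-graph-stores-nebula
```

```python
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"
```

4. Basic RAG vs. Advanced RAG

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Basic RAG
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is a microservice architecture?")
print(response)

# Advanced RAG: add post-retrieval processing
from llama_index.core.postprocessor import SentenceTransformerRerank

rerank = SentenceTransformerRerank(
    top_n=3,
    model="cross-encoder/ms-marco-MiniLM-L-2-v2",
)
query_engine = index.as_query_engine(
    similarity_top_k=10,           # retrieve 10 candidates first
    node_postprocessors=[rerank],  # then rerank and keep the top 3
)
response = query_engine.query(
    "What are the core differences between microservices and a monolith?"
)
print(response)
```

5. Query Rewriting: HyDE

```python
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

# HyDE: have the LLM write a hypothetical document first,
# then retrieve with that hypothetical document
hyde = HyDEQueryTransform(include_original=True)
query_engine = index.as_query_engine()
hyde_query_engine = TransformQueryEngine(query_engine, query_transform=hyde)

# Compare the results
question = "How do you design a high-concurrency system?"
normal = query_engine.query(question)
hyde_result = hyde_query_engine.query(question)
print("Plain RAG:", normal)
print("HyDE RAG:", hyde_result)
```

6. Multi-Hop Reasoning: Sub-Question Decomposition

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Build a separate index for each document set.
# sql_docs and java_docs are assumed to be lists of Document objects
# loaded beforehand, for example with SimpleDirectoryReader.
sql_index = VectorStoreIndex.from_documents(sql_docs)
java_index = VectorStoreIndex.from_documents(java_docs)

sql_tool = QueryEngineTool(
    query_engine=sql_index.as_query_engine(),
    metadata=ToolMetadata(
        name="sql_docs",
        description="Documents about SQL optimization",
    ),
)
java_tool = QueryEngineTool(
    query_engine=java_index.as_query_engine(),
    metadata=ToolMetadata(
        name="java_docs",
        description="Documents about Java performance tuning",
    ),
)

# Sub-question decomposition engine
sub_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[sql_tool, java_tool]
)

# A complex query is decomposed into sub-questions automatically
response = sub_engine.query(
    "How can database query performance be optimized in a Java application, "
    "considering both the Java layer and the SQL layer?"
)
print(response)
```

7. Knowledge Graph RAG

```python
from llama_index.core import KnowledgeGraphIndex, StorageContext
from llama_index.core.graph_stores import SimpleGraphStore

# Build the knowledge graph index. The graph store is passed in through
# a StorageContext; a bare graph_store= keyword is not picked up by the index.
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=5,
    storage_context=storage_context,
    include_embeddings=True,
)

# Knowledge graph queries support multi-hop relational reasoning
kg_query_engine = kg_index.as_query_engine(
    include_text=True,
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=5,
)
response = kg_query_engine.query(
    "What is the full Spring Boot auto-configuration flow, "
    "and which core annotations are involved?"
)
print(response)
```

8. Hybrid Retrieval: Vector + Keyword + Knowledge Graph

```python
from llama_index.core import SimpleKeywordTableIndex
from llama_index.core.retrievers import QueryFusionRetriever

# Vector retriever
vector_retriever = index.as_retriever(similarity_top_k=5)

# Keyword retriever. A VectorStoreIndex has no keyword mode, so a separate
# keyword table index is built over the same documents.
keyword_index = SimpleKeywordTableIndex.from_documents(documents)
keyword_retriever = keyword_index.as_retriever(retriever_mode="simple")

# Fusion retriever (Reciprocal Rank Fusion).
# A knowledge graph retriever such as kg_index.as_retriever() can be
# added to the retrievers list in the same way.
fusion_retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, keyword_retriever],
    num_queries=3,  # queries to run in total, including the original
    similarity_top_k=10,
    mode="reciprocal_rerank",
)

nodes = fusion_retriever.retrieve(
    "How do distributed transactions guarantee consistency?"
)
for node in nodes:
    print(f"Score: {node.score:.4f} | {node.text[:80]}")
```

9. Integrating with Spring Boot

```java
import java.util.Map;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class AdvancedRAGService {

    private final RestTemplate restTemplate = new RestTemplate();

    @Value("${llama-index.service.url}")
    private String llamaServiceUrl;

    public String query(String question, String mode) {
        Map<String, Object> body = Map.of(
            "question", question,
            "mode", mode, // basic / hyde / sub_question / kg
            "top_k", 5
        );
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        HttpEntity<Map<String, Object>> entity = new HttpEntity<>(body, headers);

        // POST the request to the Python service that wraps LlamaIndex
        ResponseEntity<Map> resp = restTemplate.exchange(
            llamaServiceUrl + "/query", HttpMethod.POST, entity, Map.class);
        return (String) resp.getBody().get("answer");
    }
}
```

10. Evaluation and Optimization

```python
from llama_index.core import Settings
from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator

# LLM used as the judge; here the globally configured default
llm = Settings.llm

# Faithfulness: is the answer grounded in the retrieved content?
faith_eval = FaithfulnessEvaluator(llm=llm)
# Relevancy: does the answer address the question?
rel_eval = RelevancyEvaluator(llm=llm)

# Batch evaluation
questions = [
    "What is RAG?",
    "How do you choose a vector database?",
    "How do you build a knowledge graph?",
]
for q in questions:
    response = query_engine.query(q)
    faith_result = faith_eval.evaluate_response(query=q, response=response)
    rel_result = rel_eval.evaluate_response(query=q, response=response)
    print(f"Q: {q}")
    print(f"  Faithfulness: {faith_result.passing}")
    print(f"  Relevancy: {rel_result.passing}")
```

11. Best Practices

- Chunking strategy: choose a chunk_size (256-1024) that suits the document type.
- Hybrid retrieval: fusing vector and keyword retrieval works better than either retriever alone.
- Reranking: retrieve with a large top_k, then rerank and keep a small number of results for higher precision.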
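
To make the hybrid-retrieval advice more concrete, here is a minimal, library-free sketch of Reciprocal Rank Fusion, the idea behind the `reciprocal_rerank` mode used in the fusion retriever. It is an illustration, not LlamaIndex's internal implementation: the function name `reciprocal_rank_fusion`, the document ids, and the smoothing constant `k=60` (a commonly used value) are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_n=None):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Only ranks are used, so the raw scores of the
    underlying retrievers do not need to be comparable.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by id for determinism
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n is not None else fused


# Hypothetical results from a vector retriever and a keyword retriever
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits], top_n=3)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents that both retrievers rank highly (here `doc_b` and `doc_a`) rise to the top, while a document found by only one retriever still keeps a chance of making the final list. This is one reason fusion tends to be more robust than either retriever alone.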