- 本地文件检索与 rag_retrieve / table_rag_retrieve 不是同一数据源 - 不能因为 RAG 查到了,就假设本地文件也被覆盖 - 也不能因为本地文件没查到,就推断 RAG 知识库里也没有 - 进入 fallback 时,仍需按顺序继续尝试本地文件检索
58 lines
5.2 KiB
Markdown
58 lines
5.2 KiB
Markdown
# Retrieval Policy
|
|
|
|
### 1. Retrieval Order and Tool Selection
|
|
- Follow this section for source choice, tool choice, query rewrite, `top_k`, fallback, result handling, and citations.
|
|
- Use this default retrieval order and execute it sequentially: skill-enabled knowledge retrieval tools > `rag_retrieve` / `table_rag_retrieve` > local filesystem retrieval.
|
|
- Important: local filesystem retrieval and `rag_retrieve` / `table_rag_retrieve` do NOT share the same underlying data source. They are not equivalent, and one source cannot be used to assume coverage in the other.
|
|
- Do NOT answer from model knowledge first.
|
|
- Do NOT skip directly to local filesystem retrieval when an earlier retrieval source may answer the question.
|
|
- When a suitable skill-enabled knowledge retrieval tool is available, use it first.
|
|
- If no suitable skill-enabled retrieval tool is available, or if its result is insufficient, continue with `rag_retrieve` or `table_rag_retrieve`.
|
|
- Use `table_rag_retrieve` first for values, prices, quantities, inventory, specifications, rankings, comparisons, summaries, extraction, lists, tables, name lookup, historical coverage, mixed questions, and unclear cases.
|
|
- Use `rag_retrieve` first only for clearly pure concept, definition, workflow, policy, or explanation questions without structured data needs.
|
|
- After each retrieval step, evaluate sufficiency before moving to the next source. Do NOT run these retrieval sources in parallel.
|
|
|
|
### 2. Query Preparation
|
|
- Do NOT pass the raw user question unless it already works well for retrieval.
|
|
- Rewrite for recall: extract entity, time scope, attributes, and intent.
|
|
- Add useful variants: synonyms, aliases, abbreviations, related titles, historical names, and category terms.
|
|
- Expand list-style, extraction, overview, historical, roster, timeline, and archive queries more aggressively.
|
|
- Preserve meaning. Do NOT introduce unrelated topics.
|
|
|
|
### 3. Retrieval Breadth (`top_k`)
|
|
- Apply `top_k` only to `rag_retrieve`. Use the smallest sufficient value, then expand only if coverage is insufficient.
|
|
- Use `30` for simple fact lookup.
|
|
- Use `50` for moderate synthesis, comparison, summarization, or disambiguation.
|
|
- Use `100` for broad recall, such as comprehensive analysis, scattered knowledge, multiple entities or periods, or list / catalog / timeline / roster / overview requests.
|
|
- Raise `top_k` when keyword branches are many or results are too few, repetitive, incomplete, sparse, or too narrow.
|
|
- Use this expansion order: `30 -> 50 -> 100`. If unsure, use `100`.
|
|
|
|
### 4. Result Evaluation
|
|
- Treat results as insufficient if they are empty, start with `Error:`, say `no excel files found`, are off-topic, miss the core entity or scope, or provide no usable evidence.
|
|
- Also treat results as insufficient when they cover only part of the request, or when full-list, historical, comparison, or mixed data + explanation requests return only partial or truncated coverage.
|
|
|
|
### 5. Fallback and Sequential Retry
|
|
- If the first retrieval result is insufficient, call the next retrieval source in the default order before replying.
|
|
- Important: local filesystem retrieval and `rag_retrieve` / `table_rag_retrieve` do NOT share the same underlying data source. They are not equivalent, and results from one cannot be used to assume coverage in the other.
|
|
- Even if `rag_retrieve` or `table_rag_retrieve` returns relevant results, local filesystem retrieval may still contain different or additional information; likewise, local files may contain information absent from the RAG knowledge base.
|
|
- Therefore, when fallback is required, do NOT skip local filesystem retrieval on the assumption that RAG already covers the same documents, and do NOT treat a local-file miss as evidence that the RAG knowledge base also lacks the information.
|
|
- If the first RAG tool is insufficient, call the other RAG tool next before moving to local filesystem retrieval.
|
|
- If `table_rag_retrieve` is insufficient or empty, continue with `rag_retrieve`.
|
|
- If `rag_retrieve` is insufficient or empty, continue with `table_rag_retrieve`.
|
|
- If both `rag_retrieve` and `table_rag_retrieve` are insufficient, continue with local filesystem retrieval.
|
|
- Say no relevant information was found only after all applicable skill-enabled retrieval tools, both `rag_retrieve` and `table_rag_retrieve`, and local filesystem retrieval have been tried and still do not provide enough evidence.
|
|
- Do NOT reply that no relevant information was found before the final local filesystem fallback has also been tried.
|
|
|
|
### 6. Table RAG Result Handling
|
|
- Follow all `[INSTRUCTION]` and `[EXTRA_INSTRUCTION]` content in `table_rag_retrieve` results.
|
|
- If results are truncated, explicitly tell the user total matches (`N+M`), displayed count (`N`), and omitted count (`M`).
|
|
- Cite data sources using filenames from `file_ref_table`.
|
|
|
|
### 7. Citation Requirements for Retrieved Knowledge
|
|
- When using knowledge from `rag_retrieve` or `table_rag_retrieve`, you MUST generate `<CITATION ... />` tags.
|
|
- Follow the citation format returned by each tool.
|
|
- Place citations immediately after the paragraph or bullet list that uses the knowledge.
|
|
- Do NOT collect citations at the end.
|
|
- Use 1-2 citations per paragraph or bullet list when possible.
|
|
- If learned knowledge is used, include at least 1 `<CITATION ... />`.
|