qwen_agent/Retrieval_Policy.md at 2fa779e61e2eeb8e45c79f39c2d3877a7b9955b4

2026-04-16 20:09:02 +08:00

Retrieval Policy (Priority & Fallback)

If earlier context does not explicitly specify a knowledge retrieval priority, the default order is: skill-enabled knowledge retrieval tools > rag_retrieve / table_rag_retrieve > local filesystem retrieval (including datasets/ and any file browsing/search tools).
Follow this Retrieval Policy (Priority & Fallback) section for retrieval source selection, tool selection order, query rewrite, top_k, result handling, fallback, and citation requirements.
The local filesystem is the lowest-priority source. Do NOT start knowledge retrieval by browsing or searching files (for example with ls, glob, directory listing, or other filesystem tools) when the information may come from knowledge retrieval tools. Only use filesystem retrieval after higher-priority retrieval tools have been tried and are unavailable, insufficient, or clearly inapplicable.
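
The default ordering above can be sketched as a small selection helper. This is only an illustration: the tier labels (`skill_tool`, `rag`, `filesystem`) and the function name `pick_source` are hypothetical and are not identifiers defined by this policy.

```python
from typing import Iterable, Optional

# Default priority from the policy, highest first.
# The tier labels are illustrative assumptions, not real tool names.
DEFAULT_PRIORITY = ("skill_tool", "rag", "filesystem")


def pick_source(available: Iterable[str],
                already_failed: Iterable[str] = ()) -> Optional[str]:
    """Return the highest-priority tier that is available and has not failed yet."""
    usable = set(available) - set(already_failed)
    for tier in DEFAULT_PRIORITY:
        if tier in usable:
            return tier
    return None  # nothing left to try
```

With only `rag` and `filesystem` available, the helper returns `rag` first and reaches `filesystem` only once `rag` is listed in `already_failed`, which mirrors the rule that filesystem retrieval comes last.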

When knowledge retrieval is needed and no higher-priority skill-enabled retrieval tool is specified or available, you MUST start with rag_retrieve or table_rag_retrieve based on the question type. Do NOT answer from model knowledge before trying the appropriate retrieval tool.
Use table_rag_retrieve first for values, prices, quantities, inventory, specifications, rankings, comparisons, summaries, extraction, lists, tables, person / project / product name lookup, historical coverage, mixed questions, or any unclear case.
Use rag_retrieve first only for clearly pure concept / definition / workflow / policy / explanation questions that do not need structured data.
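
The choice of first tool can be sketched as follows, assuming the question has already been given a coarse type label by some earlier step. The labels and the `needs_structured_data` flag are hypothetical; only the two tool names come from this policy.

```python
# Question types that the policy treats as "clearly pure" conceptual questions.
# The label strings are illustrative assumptions.
CONCEPT_TYPES = {"concept", "definition", "workflow", "policy", "explanation"}


def first_tool(question_type: str, needs_structured_data: bool = False) -> str:
    """Pick the retrieval tool to try first for a given question type."""
    if question_type in CONCEPT_TYPES and not needs_structured_data:
        return "rag_retrieve"
    # Values, lists, lookups, mixed questions, and any unclear case.
    return "table_rag_retrieve"
```

Any label outside `CONCEPT_TYPES`, including an unknown one, falls through to `table_rag_retrieve`, which matches the "any unclear case" rule.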

Do NOT pass the user's raw question directly unless it already fits retrieval needs well.
Rewrite the query to improve recall: extract the core entity, time scope, attributes, and intent.
Add meaningful variants such as synonyms, aliases, abbreviations, related titles, historical names, and category terms.
Expand enumeration-style, historical, roster, timeline, overview, archive, extraction, and list-style queries more aggressively.
Preserve the original meaning and do not introduce unrelated topics. Use both the original query and rewritten variants whenever possible.
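
A minimal sketch of the rewrite step, assuming the core entity, aliases, and attributes have already been extracted from the question. The function name `build_queries` and the sample product name in the usage note are hypothetical.

```python
from typing import List


def build_queries(original: str, core: str,
                  aliases: List[str], attributes: List[str]) -> List[str]:
    """Return the original query plus rewritten variants, de-duplicated in order."""
    candidates = [original]
    for name in [core, *aliases]:
        # One compact variant per entity name: "<name> <attr> <attr> ..."
        candidates.append(" ".join([name, *attributes]).strip())

    seen = set()
    result = []
    for query in candidates:
        key = query.lower()
        if query and key not in seen:
            seen.add(key)
            result.append(query)
    return result
```

For example, `build_queries("What did the Acme X1 cost in 2023?", "Acme X1", ["X1"], ["price", "2023"])` keeps the original question first and adds `"Acme X1 price 2023"` and `"X1 price 2023"` as variants.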

top_k applies to rag_retrieve. Use the smallest sufficient top_k and expand only when coverage is insufficient.
Use 30 for simple fact lookup about one specific thing.
Use 50 for moderate synthesis, comparison, summarization, or disambiguation.
Use 100 for broad-recall queries needing high coverage, such as comprehensive analysis, scattered knowledge, multiple entities or periods, list / catalog / timeline / roster / overview requests, or requests for all items, historical succession, or many records.
Raise top_k when query rewrite produces many useful keyword branches or when results are too few, repetitive, incomplete, sparse, or too narrow in coverage. Do not raise top_k just because the query is longer.
Expansion sequence: 30 -> 50 -> 100. If it is unclear which tier fits the query, prefer 100.
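
The three tiers and the expansion sequence can be sketched as two small helpers. The complexity labels (`simple`, `moderate`, `broad`) and the function names are assumptions made for illustration; the numeric values are the ones stated above.

```python
from typing import Optional

TOP_K_LADDER = (30, 50, 100)


def initial_top_k(complexity: str) -> int:
    """Map a coarse complexity label to a starting top_k; unknown labels get 100."""
    return {"simple": 30, "moderate": 50, "broad": 100}.get(complexity, 100)


def next_top_k(current: int) -> Optional[int]:
    """Return the next step on the ladder, or None when already at the top."""
    for k in TOP_K_LADDER:
        if k > current:
            return k
    return None
```

A caller would start from `initial_top_k(...)` and call `next_top_k(...)` only when the results are too few, repetitive, or too narrow, not because the query is long.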

Treat the result as insufficient when it is empty, starts with Error:, says no excel files found, is off-topic, does not match the user's core entity / scope, or clearly contains no usable evidence.
Treat the result as insufficient when it only covers part of the user's request, or when the user asked for a complete list, historical coverage, comparison, or mixed data + explanation but the result is only partial or truncated.
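
The mechanical parts of this check can be sketched as below. Whether a result is on-topic or covers the full request needs judgment, so this sketch takes those as hypothetical boolean inputs rather than trying to compute them; matching the "no excel files found" message case-insensitively is also an assumption.

```python
def is_insufficient(result: str,
                    on_topic: bool = True,
                    covers_full_request: bool = True) -> bool:
    """Return True when a retrieval result should trigger expansion or fallback."""
    text = (result or "").strip()
    if not text:
        return True  # empty result
    if text.startswith("Error:"):
        return True  # tool reported an error
    if "no excel files found" in text.lower():
        return True  # table tool had nothing to query
    # Judgment-based conditions are supplied by the caller.
    return not (on_topic and covers_full_request)
```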

If the first retrieval tool returns empty results, errors, clearly irrelevant content, or only partial coverage of the user's request, you MUST try the other retrieval tool before replying to the user.
If the table_rag_retrieve result is empty, continue with rag_retrieve before concluding that no relevant data exists.
You may say that no relevant information was found only after both rag_retrieve and table_rag_retrieve have been tried and still do not provide enough evidence to answer.
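
The fallback rule can be sketched as a loop over the two tools. The tools are passed in as plain callables because their real signatures are not given here; `retrieve_with_fallback` and the `is_sufficient` callback are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple

TOOL_NAMES = ("rag_retrieve", "table_rag_retrieve")


def retrieve_with_fallback(
    query: str,
    first: str,
    tools: Dict[str, Callable[[str], str]],
    is_sufficient: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Try the first tool, then the other one; return (tool_name, result).

    Returns (None, None) only after both tools failed to give usable evidence.
    """
    if first not in TOOL_NAMES:
        raise ValueError(f"unknown tool: {first}")
    other = TOOL_NAMES[1] if first == TOOL_NAMES[0] else TOOL_NAMES[0]
    for name in (first, other):
        result = tools[name](query)
        if is_sufficient(result):
            return name, result
    return None, None
```

Only the `(None, None)` outcome corresponds to telling the user that no relevant information was found.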

When processing table_rag_retrieve results, follow all instructions in [INSTRUCTION] and [EXTRA_INSTRUCTION] sections of the response.
If Query result hint indicates truncation (for example, Only the first N rows are included; the remaining M rows were omitted), you MUST explicitly tell the user the total matches (N+M), displayed count (N), and omitted count (M).
Cite data sources using file names from file_ref_table in the response.
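
The truncation arithmetic can be sketched as below. The regular expression matches only the example wording quoted above; the actual hint text returned by the tool may differ, so treat the pattern and the function name `truncation_notice` as assumptions.

```python
import re
from typing import Optional

# Matches the example hint wording only; real hints may be phrased differently.
_TRUNCATION = re.compile(
    r"Only the first (\d+) rows are included; the remaining (\d+) rows were omitted"
)


def truncation_notice(hint: str) -> Optional[str]:
    """Build the user-facing notice with total (N+M), shown (N), and omitted (M)."""
    match = _TRUNCATION.search(hint)
    if match is None:
        return None  # no truncation reported in this hint
    shown, omitted = int(match.group(1)), int(match.group(2))
    total = shown + omitted
    return f"{total} rows matched in total; {shown} are shown and {omitted} were omitted."
```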

When your answer uses learned knowledge from rag_retrieve or table_rag_retrieve, you MUST generate <CITATION ... /> tags.
Follow the specific citation format instructions returned by each tool.
Citations MUST appear IMMEDIATELY AFTER the paragraph or bullet list that uses the knowledge.
NEVER collect all citations and place them at the end of your response.
Limit to 1-2 citations per paragraph or bullet list, combining related facts under one citation when possible.
If your answer uses learned knowledge, you MUST generate at least 1 <CITATION ... /> in the response.
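
The placement rules can be checked roughly with a sketch like the one below. It assumes paragraphs are separated by blank lines and that each citation tag sits directly under its paragraph with no blank line in between; the tag attributes are left open because each tool returns its own citation format. The function name and the problem labels are hypothetical.

```python
import re
from typing import List

# Matches any self-closing <CITATION ... /> tag without assuming its attributes.
CITATION = re.compile(r"<CITATION\b[^>]*/>")


def citation_problems(response: str) -> List[str]:
    """Return a list of rule violations found in a drafted response."""
    paragraphs = [p.strip() for p in response.split("\n\n") if p.strip()]
    counts = [len(CITATION.findall(p)) for p in paragraphs]
    if sum(counts) == 0:
        return ["no citation"]

    problems = []
    last_is_only_citations = CITATION.sub("", paragraphs[-1]).strip() == ""
    if len(paragraphs) > 1 and last_is_only_citations and sum(counts[:-1]) == 0:
        problems.append("citations collected at end")
    if any(c > 2 for c in counts):
        problems.append("more than 2 citations in one paragraph")
    return problems
```

An empty list means the draft passes these three mechanical checks; it does not verify that each citation actually supports the text above it.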