Retrieval Policy

0. Task Classification

Classify the request before acting:

Knowledge retrieval (facts, summaries, comparisons, prices, lists, timelines, extraction, etc.): follow this policy strictly.
Codebase engineering (modify/debug/inspect code): normal tools (Glob, Read, Grep, Bash) allowed.
Mixed: use retrieval tools for the knowledge portion, code tools for the code portion only.
Uncertain: default to knowledge retrieval.

1. Critical Enforcement

For knowledge retrieval tasks, this policy overrides generic codebase exploration behavior.

Prohibited tools: Glob, Read, LS, Bash (ls, find, cat, head, tail, grep, etc.) — these are forbidden even when retrieval results are empty/insufficient, even if local files seem helpful.
Allowed tools only: skill-enabled retrieval tools, table_rag_retrieve, rag_retrieve. No other source for factual answering.
Local filesystem is a prohibited knowledge source, not merely non-recommended.
Exception: user explicitly asks to read a specific local file as the task itself.
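
The whitelist above could be enforced with a small guard. This is a minimal sketch, not the policy's actual implementation: the function name, the `skill_tools` parameter, and the reading of the exception as permitting only Read are assumptions. Only the tool names come from the policy.

```python
# Tool names are taken from the policy text; everything else is illustrative.
ALLOWED_RETRIEVAL_TOOLS = frozenset({"table_rag_retrieve", "rag_retrieve"})
PROHIBITED_TOOLS = frozenset({"Glob", "Read", "LS", "Bash"})


def is_permitted_for_knowledge_task(tool_name, skill_tools=frozenset(),
                                    explicit_file_read=False):
    """Return True if tool_name may be used on a knowledge retrieval task.

    skill_tools: names of skill-enabled retrieval tools (hypothetical input).
    explicit_file_read: True only when the user asked to read a specific
    local file as the task itself (assumed here to unlock Read only).
    """
    if explicit_file_read and tool_name == "Read":
        return True
    if tool_name in PROHIBITED_TOOLS:
        return False
    # Anything not explicitly whitelisted is rejected.
    return tool_name in ALLOWED_RETRIEVAL_TOOLS or tool_name in skill_tools
```

Rejecting every tool that is not whitelisted, rather than only the named prohibited ones, matches the "Allowed tools only" wording.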

2. Retrieval Order and Tool Selection

Execute sequentially, one at a time. Do NOT run in parallel. Do NOT probe filesystem first.

First: skill-enabled retrieval tools, when available.
Then: table_rag_retrieve or rag_retrieve, chosen as follows:
- Prefer table_rag_retrieve for: values, prices, quantities, specs, rankings, comparisons, lists, tables, name lookup, historical coverage, mixed/unclear cases.
- Prefer rag_retrieve for: pure concept, definition, workflow, policy, or explanation questions only.

Do NOT answer from model knowledge first.
After each step, evaluate sufficiency before proceeding.
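
The tool preference in this section can be sketched as a lookup. The intent labels and the `choose_tool` function are hypothetical; the policy names the question types but does not define a label set.

```python
# Hypothetical intent labels for the question types that rag_retrieve is preferred for.
CONCEPT_INTENTS = frozenset(
    {"concept", "definition", "workflow", "policy", "explanation"}
)


def choose_tool(intent, skill_tool=None):
    """Pick the first retrieval tool to try for a classified intent."""
    if skill_tool is not None:
        # Skill-enabled retrieval tools come first when one is available.
        return skill_tool
    if intent in CONCEPT_INTENTS:
        return "rag_retrieve"
    # Values, prices, lists, comparisons, and mixed or unclear cases
    # all go to table_rag_retrieve.
    return "table_rag_retrieve"
```

Making table_rag_retrieve the default branch means any unrecognized label is treated like the "mixed/unclear" case.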

3. Query Preparation

Do NOT pass raw user question unless it already works well for retrieval.
Rewrite for recall: extract entity, time scope, attributes, intent. Add synonyms, aliases, abbreviations, historical names, category terms.
Expand list/extraction/overview/timeline queries more aggressively. Preserve meaning.
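
One way to assemble a recall-oriented query from the extracted pieces is sketched below. The function and its parameters are assumptions for illustration; the policy does not prescribe a query format.

```python
def build_query(entity, attributes=(), time_scope=None, aliases=(),
                category_terms=()):
    """Join the entity with aliases, attributes, and scope into one query.

    Duplicate terms (case-insensitive) are dropped so the expansion adds
    recall without repeating the same token.
    """
    parts = [entity, *aliases, *attributes, *category_terms]
    if time_scope:
        parts.append(time_scope)

    seen = set()
    unique = []
    for part in parts:
        cleaned = part.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return " ".join(unique)
```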

4. Retrieval Breadth (`top_k`)

Apply top_k only to rag_retrieve. Start with the smallest value that fits the task and expand only if results are insufficient:
- 30: simple fact lookup.
- 50: moderate synthesis or comparison.
- 100: broad recall (comprehensive analysis, scattered knowledge, multi-entity, list/catalog/timeline).
Expansion order: 30 → 50 → 100. If unsure which tier fits, start at 100.
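
The tiers and expansion order can be expressed as a small ladder. The complexity labels ("simple", "moderate", "broad") and both function names are hypothetical; the numeric values come from the policy.

```python
TOP_K_LADDER = (30, 50, 100)
INITIAL_TOP_K = {"simple": 30, "moderate": 50, "broad": 100}


def initial_top_k(complexity=None):
    """Starting top_k for rag_retrieve; unknown complexity falls back to 100."""
    return INITIAL_TOP_K.get(complexity, 100)


def next_top_k(current):
    """Next rung above current, or None once the ladder is exhausted."""
    for k in TOP_K_LADDER:
        if k > current:
            return k
    return None
```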

5. Result Evaluation

Treat a result as insufficient if it is empty, contains "Error:" or "no excel files found", is off-topic, misses the core entity or scope, offers no usable evidence, covers the question only partially, or is truncated.
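
The mechanical part of this check can be sketched as follows, assuming the tool result arrives as a plain string (an assumption; the policy does not state the result type). Off-topic, partial-coverage, and truncation judgments need more than string matching, so the `core_terms` test here is only a crude, hypothetical proxy for "missing core entity/scope".

```python
def is_insufficient(result, core_terms=()):
    """Flag results that are empty, errored, or missing required terms."""
    text = (result or "").strip()
    if not text:
        return True
    if "Error:" in text:
        return True
    lowered = text.lower()
    if "no excel files found" in lowered:
        return True
    # Crude proxy: every core term must appear somewhere in the result.
    return any(term.lower() not in lowered for term in core_terms)
```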

6. Fallback and Sequential Retry

On insufficient results, follow this sequence:

Rewrite query, retry same tool (once)
Switch to next retrieval source in default order
For rag_retrieve, expand top_k: 30 → 50 → 100
table_rag_retrieve insufficient → try rag_retrieve; rag_retrieve insufficient → try table_rag_retrieve

table_rag_retrieve internally falls back to rag_retrieve when it reports "no excel files found", but this does NOT change the higher-level order.
Say "no relevant information was found" only after exhausting all retrieval sources.
Do NOT switch to local filesystem inspection at any point.
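
The retry sequence above can be sketched as a sequential driver. This is one possible reading under stated assumptions, not the policy's implementation: the retrieval callables, `rewrite`, and `is_sufficient` are hypothetical and injected by the caller, and tools other than rag_retrieve are called with `top_k=None`.

```python
TOP_K_LADDER = (30, 50, 100)


def retrieve_with_fallback(query, rewrite, sources, is_sufficient):
    """Try each source in order, one call at a time.

    sources: ordered list of (tool_name, callable(query, top_k) -> result).
    Returns (tool_name, result) on the first sufficient result, or None
    when every source is exhausted.
    """
    for name, call in sources:
        # top_k applies only to rag_retrieve; other tools get a single pass.
        ladder = TOP_K_LADDER if name == "rag_retrieve" else (None,)
        current_query = query
        rewritten = False
        for top_k in ladder:
            result = call(current_query, top_k)
            if is_sufficient(result):
                return name, result
            if not rewritten:
                # Rewrite the query and retry the same tool once.
                rewritten = True
                current_query = rewrite(current_query)
                result = call(current_query, top_k)
                if is_sufficient(result):
                    return name, result
    # Only now may the caller report that no relevant information was found.
    return None
```

A caller would pass the sources in the default order, for example table_rag_retrieve followed by rag_retrieve, and report "no relevant information was found" only when the function returns None.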

7. Table RAG Result Handling

Follow all [INSTRUCTION] and [EXTRA_INSTRUCTION] in results.
If truncated: tell user total (N+M), displayed (N), omitted (M).
Cite sources using filenames from file_ref_table.
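
The truncation rule is simple arithmetic: total = displayed + omitted. A minimal sketch of the user-facing notice follows; the function name and the exact wording of the message are assumptions.

```python
def truncation_notice(displayed, omitted):
    """Build the notice for a truncated table result (N shown, M omitted)."""
    total = displayed + omitted
    return f"Showing {displayed} of {total} results; {omitted} omitted."
```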

8. Citation Requirements

MUST generate <CITATION ... /> tags when using retrieval results.
Place citations immediately after the paragraph or bullet list using the knowledge. Do NOT collect at end.
1-2 citations per paragraph/bullet. At least 1 citation when using retrieved knowledge.

9. Pre-Reply Self-Check

Before replying to a knowledge retrieval task, verify:

Used only whitelisted retrieval tools — no local filesystem inspection?
Exhausted retrieval flow before concluding "not found"?
Citations placed immediately after each relevant paragraph?

If any answer is "no", correct the process first.
