From 5138cd0abf3ea244d990afb918bf2935a5eddce3 Mon Sep 17 00:00:00 2001
From: 朱潮
Date: Mon, 9 Feb 2026 12:21:28 +0800
Subject: [PATCH 1/3] add page number

---
 prompt/system_prompt.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/prompt/system_prompt.md b/prompt/system_prompt.md
index 336a265..a8059cd 100644
--- a/prompt/system_prompt.md
+++ b/prompt/system_prompt.md
@@ -12,6 +12,18 @@
 **Language Requirement**: All user interactions and result outputs must be in [{language}].
 **Image Handling**: The content returned by the `rag_retrieve` tool may include images. Each image is exclusively associated with its nearest text or sentence. If multiple consecutive images appear near a text area, all of them are related to the nearest text content. Do not ignore these images, and always maintain their correspondence with the nearest text. Each sentence or key point in the response should be accompanied by relevant images (when they meet the established association criteria). Avoid placing all images at the end of the response.
 
+**Citation Requirement (RAG Only)**: When answering questions based on `rag_retrieve` tool results, you MUST add XML citation tags for factual claims derived from the knowledge base.
+
+**MANDATORY FORMAT**: `The cited factual claim `
+
+**Citation Rules**:
+- The citation tag MUST be placed immediately after the factual claim or paragraph
+- The `file` attribute MUST use the exact `File ID` from `rag_retrieve` document
+- The `page` attribute MUST use the exact `Page Number` from `rag_retrieve` document
+- If multiple sources support the same claim, include separate citation tags for each source
+- Example: `According to the policy, returns are accepted within 30 days .`
+- This requirement ONLY applies when using `rag_retrieve` results to answer questions
+
 ### Current Working Directory
 
 The filesystem backend is currently operating in: `{agent_dir_path}`
From becd36da9d4089299d08e36a42fba225f195fdf5 Mon Sep 17 00:00:00 2001
From: 朱潮
Date: Fri, 27 Mar 2026 12:12:50 +0800
Subject: [PATCH 2/3] Add highlighting feature
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 prompt/system_prompt.md | 87 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 78 insertions(+), 9 deletions(-)

diff --git a/prompt/system_prompt.md b/prompt/system_prompt.md
index 4b5884d..8162c7f 100644
--- a/prompt/system_prompt.md
+++ b/prompt/system_prompt.md
@@ -1,16 +1,85 @@
 {extra_prompt}
 
-**Citation Requirement (RAG Only)**: When answering questions based on `rag_retrieve` tool results, you MUST add XML citation tags for factual claims derived from the knowledge base.
+## CITATION REQUIREMENTS
 
-**MANDATORY FORMAT**: `The cited factual claim `
+### A. Regular Document Knowledge
+When answering questions based on `rag_retrieve` tool results, you MUST add XML citation tags for factual claims derived from the knowledge base.
+
+**Format:** ``
+- Use `file` attribute with the UUID from document markers
+- Use `filename` attribute with the actual filename from document markers
+- Use `page` attribute (singular) with the page number
+- `page` MUST be 0-based and must match the `pages:` values shown in the learned knowledge context
+
+### B. Table Knowledge (TABLE_KNOWLEDGE BEGIN/END)
+When answering questions based on `table_rag_retrieve` tool results, you MUST add XML citation tags for factual claims derived from the knowledge base.
+
+**!!! CRITICAL RULE: NEVER put on same line as bullet/row !!!**
+**Citations MUST be on separate lines AFTER the complete list/table.**
+**NEVER include the `__src` column in your response - it is internal metadata only.**
+
+Format: ``
+- Parse `__src`: `F1S2R5` = file_ref F1, sheet 2, row 5
+- Look up file_id in `file_ref_table`
+- Combine same-sheet rows into one citation: `rows=[2, 4, 6]`
+- **MANDATORY: Create SEPARATE citation for EACH (file, sheet) combination**
+
+✅ CORRECT (data from sheet 1 AND sheet 2 = 2 citations):
+1. Liam - male
+2. Noah - male
+3. Ethan - male
+4. Mason - male
+5. William - male
+
+
+
+❌ WRONG (citation on same line):
+1. Liam - male
+❌ WRONG (missing sheet 2 citation):
+...only 1 citation when data comes from 2 sheets...
+
+
+### C. Web Page Knowledge
+
+**Format:** ``
+- Use `url` attribute with the web page URL from the source metadata
+- Do not use `file`, `filename`, or `page` attributes for web sources
+- Web citations should appear immediately after the content they reference
+
+**!!! CRITICAL PLACEMENT RULES !!!**
+1. **Citations MUST appear IMMEDIATELY AFTER the paragraph or bullet list** that uses the knowledge
+2. **NEVER collect all citations and place them at the end of your response**
+3. **Limit to 1-2 citations per paragraph/bullet list** - combine related facts under one citation
+4. **If your answer uses learned knowledge, you MUST generate at least 1 `` in the response**
+5. **If any paragraph or bullet list is grounded in a web source, prefer a web citation with `url` over a file citation**
+
+✅ CORRECT (citation immediately after paragraph):
+氣候變遷的影響包括世界平均氣溫持續上升,2024年為有紀錄以來最熱的一年。
+
+具體影響包括:
+- 極端高溫事件頻率增加
+- 海洋熱浪
+- 暴雨強度和頻率增強
+
+✅ CORRECT (web citation):
+MIMURE位于东京港区高轮,是一家综合性商业设施。
+
+❌ WRONG (all citations at the end):
+氣候變遷的影響包括...(long response)...
+
+
+
+
+(13 citations dumped at the end)
+
+❌ WRONG (web citation with file attributes):
+MIMURE位于东京港区高轮,是一家综合性商业设施。
+
+❌ WRONG (too many citations for short content):
+2024年全球氣溫上升。
+世界各地發生災害。
+沙烏地阿拉伯熱浪。
 
-**Citation Rules**:
-- The citation tag MUST be placed immediately after the factual claim or paragraph
-- The `file` attribute MUST use the exact `File ID` from `rag_retrieve` document
-- The `page` attribute MUST use the exact `Page Number` from `rag_retrieve` document
-- If multiple sources support the same claim, include separate citation tags for each source
-- Example: `According to the policy, returns are accepted within 30 days .`
-- This requirement ONLY applies when using `rag_retrieve` results to answer questions
 
 ### Current Working Directory
 
From 6300eea61da452267f206d6aa79fd5cdd19442cf Mon Sep 17 00:00:00 2001
From: 朱潮
Date: Fri, 27 Mar 2026 12:30:20 +0800
Subject: [PATCH 3/3] refactor: move detailed citation prompts from the system prompt into RAG tool results for on-demand injection
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The citation rules in the system prompt (about 80 lines across three categories: document, table, and web) consume a large number of tokens. The detailed format requirements now live in rag_retrieve_server.py and are injected on demand as a prefix to the tool result; the system prompt keeps only a condensed set of general placement rules.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 mcp/rag_retrieve_server.py | 49 +++++++++++++++++++++--
 prompt/system_prompt.md | 82 +++-----------------------------------------
 2 files changed, 52 insertions(+), 79 deletions(-)

diff --git a/mcp/rag_retrieve_server.py b/mcp/rag_retrieve_server.py
index e44d50d..80a659f 100644
--- a/mcp/rag_retrieve_server.py
+++ b/mcp/rag_retrieve_server.py
@@ -29,6 +29,49 @@ from mcp_common import (
 BACKEND_HOST = os.getenv("BACKEND_HOST", "https://api-dev.gptbase.ai")
 MASTERKEY = os.getenv("MASTERKEY", "master")
 
+# Citation instruction prefixes injected into tool results
+DOCUMENT_CITATION_INSTRUCTIONS = """
+When using the retrieved knowledge below, you MUST add XML citation tags for factual claims.
+
+## Document Knowledge
+Format: ``
+- Use `file` attribute with the UUID from document markers
+- Use `filename` attribute with the actual filename from document markers
+- Use `page` attribute (singular) with the page number
+- `page` MUST be 0-based and must match the `pages:` values shown in the learned knowledge context
+
+## Web Page Knowledge
+Format: ``
+- Use `url` attribute with the web page URL from the source metadata
+- Do not use `file`, `filename`, or `page` attributes for web sources
+- If content is grounded in a web source, prefer a web citation with `url` over a file citation
+
+## Placement Rules
+- Citations MUST appear IMMEDIATELY AFTER the paragraph or bullet list that uses the knowledge
+- NEVER collect all citations and place them at the end of your response
+- Limit to 1-2 citations per paragraph/bullet list
+- If your answer uses learned knowledge, you MUST generate at least 1 `` in the response
+
+
+"""
+
+TABLE_CITATION_INSTRUCTIONS = """
+When using the retrieved table knowledge below, you MUST add XML citation tags for factual claims.
+
+Format: ``
+- Parse `__src`: `F1S2R5` = file_ref F1, sheet 2, row 5
+- Look up file_id in `file_ref_table`
+- Combine same-sheet rows into one citation: `rows=[2, 4, 6]`
+- MANDATORY: Create SEPARATE citation for EACH (file, sheet) combination
+- NEVER put on the same line as a bullet point or table row
+- Citations MUST be on separate lines AFTER the complete list/table
+- NEVER include the `__src` column in your response - it is internal metadata only
+- Citations MUST appear IMMEDIATELY AFTER the paragraph or bullet list that uses the knowledge
+- NEVER collect all citations and place them at the end of your response
+
+
+"""
+
 def rag_retrieve(query: str, top_k: int = 100) -> Dict[str, Any]:
     """调用RAG检索API"""
     try:
@@ -94,7 +137,7 @@ def rag_retrieve(query: str, top_k: int = 100) -> Dict[str, Any]:
             "content": [
                 {
                     "type": "text",
-                    "text": markdown_content
+                    "text": DOCUMENT_CITATION_INSTRUCTIONS + markdown_content
                 }
             ]
         }
@@ -107,7 +150,7 @@ def rag_retrieve(query: str, top_k: int = 100) -> Dict[str, Any]:
                 }
             ]
         }
-        
+
     except requests.exceptions.RequestException as e:
         return {
             "content": [
@@ -179,7 +222,7 @@ def table_rag_retrieve(query: str) -> Dict[str, Any]:
             "content": [
                 {
                     "type": "text",
-                    "text": markdown_content
+                    "text": TABLE_CITATION_INSTRUCTIONS + markdown_content
                 }
             ]
         }
diff --git a/prompt/system_prompt.md b/prompt/system_prompt.md
index 8162c7f..c978779 100644
--- a/prompt/system_prompt.md
+++ b/prompt/system_prompt.md
@@ -2,83 +2,13 @@
 
 ## CITATION REQUIREMENTS
 
-### A. Regular Document Knowledge
-When answering questions based on `rag_retrieve` tool results, you MUST add XML citation tags for factual claims derived from the knowledge base.
+When your answer uses learned knowledge, you MUST generate `` tags. Follow the specific citation format instructions returned by each tool (`rag_retrieve`, `table_rag_retrieve`).
 
-**Format:** ``
-- Use `file` attribute with the UUID from document markers
-- Use `filename` attribute with the actual filename from document markers
-- Use `page` attribute (singular) with the page number
-- `page` MUST be 0-based and must match the `pages:` values shown in the learned knowledge context
-
-### B. Table Knowledge (TABLE_KNOWLEDGE BEGIN/END)
-When answering questions based on `table_rag_retrieve` tool results, you MUST add XML citation tags for factual claims derived from the knowledge base.
-
-**!!! CRITICAL RULE: NEVER put on same line as bullet/row !!!**
-**Citations MUST be on separate lines AFTER the complete list/table.**
-**NEVER include the `__src` column in your response - it is internal metadata only.**
-
-Format: ``
-- Parse `__src`: `F1S2R5` = file_ref F1, sheet 2, row 5
-- Look up file_id in `file_ref_table`
-- Combine same-sheet rows into one citation: `rows=[2, 4, 6]`
-- **MANDATORY: Create SEPARATE citation for EACH (file, sheet) combination**
-
-✅ CORRECT (data from sheet 1 AND sheet 2 = 2 citations):
-1. Liam - male
-2. Noah - male
-3. Ethan - male
-4. Mason - male
-5. William - male
-
-
-
-❌ WRONG (citation on same line):
-1. Liam - male
-❌ WRONG (missing sheet 2 citation):
-...only 1 citation when data comes from 2 sheets...
-
-
-### C. Web Page Knowledge
-
-**Format:** ``
-- Use `url` attribute with the web page URL from the source metadata
-- Do not use `file`, `filename`, or `page` attributes for web sources
-- Web citations should appear immediately after the content they reference
-
-**!!! CRITICAL PLACEMENT RULES !!!**
-1. **Citations MUST appear IMMEDIATELY AFTER the paragraph or bullet list** that uses the knowledge
-2. **NEVER collect all citations and place them at the end of your response**
-3. **Limit to 1-2 citations per paragraph/bullet list** - combine related facts under one citation
-4. **If your answer uses learned knowledge, you MUST generate at least 1 `` in the response**
-5. **If any paragraph or bullet list is grounded in a web source, prefer a web citation with `url` over a file citation**
-
-✅ CORRECT (citation immediately after paragraph):
-氣候變遷的影響包括世界平均氣溫持續上升,2024年為有紀錄以來最熱的一年。
-
-具體影響包括:
-- 極端高溫事件頻率增加
-- 海洋熱浪
-- 暴雨強度和頻率增強
-
-✅ CORRECT (web citation):
-MIMURE位于东京港区高轮,是一家综合性商业设施。
-
-❌ WRONG (all citations at the end):
-氣候變遷的影響包括...(long response)...
-
-
-
-
-(13 citations dumped at the end)
-
-❌ WRONG (web citation with file attributes):
-MIMURE位于东京港区高轮,是一家综合性商业设施。
-
-❌ WRONG (too many citations for short content):
-2024年全球氣溫上升。
-世界各地發生災害。
-沙烏地阿拉伯熱浪。
+### General Placement Rules
+1. Citations MUST appear IMMEDIATELY AFTER the paragraph or bullet list that uses the knowledge
+2. NEVER collect all citations and place them at the end of your response
+3. Limit to 1-2 citations per paragraph/bullet list - combine related facts under one citation
+4. If your answer uses learned knowledge, you MUST generate at least 1 `` in the response
 
 
 ### Current Working Directory
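The table-citation rules in this series describe a small algorithm: decode each `__src` value such as `F1S2R5`, resolve the file reference through `file_ref_table`, then emit one citation per (file, sheet) pair with that sheet's rows merged. The Python sketch below shows one way to do that grouping. It assumes every `__src` value follows the `F1S2R5` pattern (file ref, sheet, row, each a decimal number) and that `file_ref_table` can be read as a plain dict from file ref to file ID. `parse_src`, `group_citations`, and the `file`/`sheet`/`rows` keys are hypothetical names; the real tag attributes are not visible in this copy of the patch.

```python
import re
from collections import defaultdict

# Assumed `__src` shape: "F" + file ref number, "S" + sheet, "R" + row, e.g. "F1S2R5".
SRC_PATTERN = re.compile(r"^(F\d+)S(\d+)R(\d+)$")


def parse_src(src: str) -> tuple[str, int, int]:
    """Split one `__src` value into (file_ref, sheet, row)."""
    match = SRC_PATTERN.match(src.strip())
    if match is None:
        raise ValueError(f"unrecognized __src value: {src!r}")
    file_ref, sheet, row = match.groups()
    return file_ref, int(sheet), int(row)


def group_citations(srcs: list[str], file_ref_table: dict[str, str]) -> list[dict]:
    """Return one citation record per (file_id, sheet) pair with merged, sorted rows.

    `file_ref_table` is assumed to map file refs like "F1" to file IDs.
    The "file"/"sheet"/"rows" keys are illustrative, not the real tag attributes.
    """
    rows_by_key: dict[tuple[str, int], set[int]] = defaultdict(set)
    for src in srcs:
        file_ref, sheet, row = parse_src(src)
        file_id = file_ref_table[file_ref]  # KeyError if the ref is unknown
        rows_by_key[(file_id, sheet)].add(row)  # same-sheet rows collapse into one set
    return [
        {"file": file_id, "sheet": sheet, "rows": sorted(rows)}
        for (file_id, sheet), rows in sorted(rows_by_key.items())
    ]
```

For example, `group_citations(["F1S1R2", "F1S1R4", "F1S2R6"], {"F1": "uuid-a"})` yields two records, sheet 1 with rows `[2, 4]` and sheet 2 with rows `[6]`, which matches the rule that each (file, sheet) combination gets its own citation.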
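The four general placement rules that remain in the system prompt are concrete enough to check mechanically, for instance in an evaluation harness for model answers. The sketch below is one hedged way to do it: it treats blank-line-separated chunks as "paragraph or bullet list" and assumes a self-closing `citation` XML element, because the real tag syntax was lost from this copy of the patch. `check_placement`, `uses_knowledge`, and `max_per_block` are hypothetical names.

```python
import re

# Hypothetical citation syntax: a self-closing XML element named "citation".
# The real tag is not visible in this copy of the patch.
CITATION_RE = re.compile(r"<citation\b[^>]*/>", re.IGNORECASE)


def check_placement(response: str, uses_knowledge: bool = True, max_per_block: int = 2) -> list[str]:
    """Return violations of the general placement rules (empty list = none found)."""
    # Blank-line-separated chunks stand in for "paragraph or bullet list".
    blocks = [b for b in re.split(r"\n\s*\n", response.strip()) if b.strip()]
    counts = [len(CITATION_RE.findall(block)) for block in blocks]
    problems: list[str] = []

    # Rule 4: an answer that uses learned knowledge needs at least one citation.
    if uses_knowledge and sum(counts) == 0:
        problems.append("no citation in a knowledge-based answer")

    # Rule 3: at most 1-2 citations per paragraph or list.
    for index, count in enumerate(counts):
        if count > max_per_block:
            problems.append(f"block {index} has {count} citations (limit {max_per_block})")

    # Rule 2: a final block containing only citations means they were collected at the end.
    if len(blocks) > 1 and counts[-1] > 0 and not CITATION_RE.sub("", blocks[-1]).strip():
        problems.append("citations collected at the end of the response")

    return problems
```

Splitting on blank lines is a simplification, but it has a useful side effect: a list whose citation line follows it directly stays in one chunk, which is exactly the placement the rules ask for.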
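The table-specific rules in `TABLE_CITATION_INSTRUCTIONS` (never put a citation on a bullet or row line, never show the `__src` column) operate line by line rather than per paragraph. The following is a minimal sketch of such a check, assuming a self-closing `citation` XML element since the real tag is not visible here. `check_table_answer` and its regular expressions are hypothetical, and the row pattern only recognizes `-`/`*` bullets, numbered items, and markdown table rows.

```python
import re

# Hypothetical citation syntax: a self-closing XML element named "citation".
CITATION_RE = re.compile(r"<citation\b[^>]*/>", re.IGNORECASE)
# Lines that count as a list item or table row: "-"/"*" bullets, "1." items, "|" rows.
ROW_RE = re.compile(r"^\s*(?:[-*]\s|\d+\.\s|\|)")
# Raw `__src` markers such as F1S2R5, which must never reach the user.
SRC_RE = re.compile(r"\bF\d+S\d+R\d+\b")


def check_table_answer(response: str) -> list[str]:
    """Flag table-citation rule violations, one message per offending line."""
    problems: list[str] = []
    for line_no, line in enumerate(response.splitlines(), start=1):
        if CITATION_RE.search(line) and ROW_RE.match(line):
            problems.append(f"line {line_no}: citation on the same line as a bullet/row")
        visible = CITATION_RE.sub("", line)  # text the user would actually read
        if "__src" in visible or SRC_RE.search(visible):
            problems.append(f"line {line_no}: internal __src metadata leaked")
    return problems
```

Removing citation elements before the `__src` check keeps a citation's own attributes from being mistaken for leaked metadata.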