Compare commits

...

70 Commits

Author SHA1 Message Date
朱潮
2911c67771 Update skill directory 2026-04-18 23:39:47 +08:00
朱潮
1535014c3e Update skill directory 2026-04-18 23:37:26 +08:00
朱潮
0adfbad600 Update skill directory 2026-04-18 23:36:32 +08:00
朱潮
2a661a3a19 Merge branch 'developing' into dev 2026-04-18 23:20:20 +08:00
朱潮
db783a7c3d Update skill directory 2026-04-18 23:19:46 +08:00
朱潮
2fa779e61e disable for autoload 2026-04-18 23:08:28 +08:00
朱潮
29fe5ac582 Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-17 19:54:26 +08:00
朱潮
60f2b32593 Refine retrieval-policy.md 2026-04-17 19:53:51 +08:00
朱潮
4470599cb7 Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-17 19:52:51 +08:00
朱潮
e905738bfb Refine retrieval-policy.md 2026-04-17 19:52:30 +08:00
朱潮
6be64307a5 Update CI/CD webhook 2026-04-17 19:14:01 +08:00
朱潮
18d49933e2 Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-17 18:01:49 +08:00
朱潮
d593eb9f46 Update CI/CD webhook 2026-04-17 17:34:30 +08:00
朱潮
a1a9d9fc47 Merge branch 'feature/rag_retrive_top_k' into developing 2026-04-17 16:29:00 +08:00
朱潮
63a2f094bf Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-17 15:55:03 +08:00
朱潮
ebec9e2cba Disable local file retrieval 2026-04-17 15:54:51 +08:00
朱潮
631f944340 Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-17 11:44:45 +08:00
朱潮
90229ffeaf Refine skill override logic 2026-04-17 11:43:20 +08:00
朱潮
80353a2824 Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-17 11:05:37 +08:00
朱潮
9f12a633bc Refine MCP config 2026-04-17 11:05:16 +08:00
朱潮
ed7da59cce Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-17 09:28:45 +08:00
朱潮
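4939a70209 Added this constraint:
- Local file retrieval and rag_retrieve / table_rag_retrieve are not the same data source
  - A hit in RAG does not mean the local files are covered as well
  - A miss in the local files does not mean the RAG knowledge base lacks the answer
  - When entering fallback, local file retrieval must still be tried in order
2026-04-16 21:37:43 +08:00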
朱潮
cb2fc7001a Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-16 20:09:18 +08:00
朱潮
9d47324a76 add rag_retrieve-only 2026-04-16 20:09:02 +08:00
朱潮
9eeb958abc Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-16 19:39:30 +08:00
朱潮
e1bf685314 add rag_retrieve autoload 2026-04-16 19:38:13 +08:00
朱潮
53fb98e44e Retrieval Policy 2026-04-16 17:55:34 +08:00
朱潮
2558949abc Merge branch 'onprem-release' into dev 2026-04-16 17:50:57 +08:00
朱潮
dfcd784cee Merge branch 'feature/rag_retrive_top_k' into onprem-release 2026-04-16 17:50:46 +08:00
朱潮
753f38a072 Refine knowledge base retrieval order 2026-04-16 17:50:35 +08:00
朱潮
a54e2f758f Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-16 17:23:07 +08:00
朱潮
d284e589d3 Merge branch 'feature/rag_retrive_top_k' into onprem-release 2026-04-16 17:22:45 +08:00
朱潮
393842b3f2 Refine prompts 2026-04-16 17:22:27 +08:00
朱潮
b4a96acf4a Merge branch 'feature/rag_retrive_top_k' into onprem-release 2026-04-16 15:14:13 +08:00
朱潮
07a782933d Merge branch 'developing' into dev 2026-04-16 14:56:25 +08:00
朱潮
2fde1a5465 Merge branch 'feature/rag_retrive_top_k' into developing 2026-04-16 14:56:12 +08:00
朱潮
86f9c227de Refine knowledge base priority 2026-04-16 14:55:56 +08:00
朱潮
f1487f6756 Merge branch 'feature/rag_retrive_top_k' into onprem-release 2026-04-16 13:01:12 +08:00
朱潮
7add92b304 Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-16 13:01:01 +08:00
朱潮
b9f042a11c Merge branch 'feature/rag_retrive_top_k' into developing 2026-04-16 13:00:58 +08:00
朱潮
7b5435fd0f Refine knowledge base priority 2026-04-16 12:55:31 +08:00
朱潮
f3162cb332 Merge branch 'developing' into dev 2026-04-16 12:22:33 +08:00
朱潮
ca3872c99e Merge branch 'feature/rag_retrive_top_k' into onprem-release 2026-04-16 12:19:44 +08:00
朱潮
7cb172a21f Refine dataset priority 2026-04-16 12:19:23 +08:00
朱潮
404f7d39c9 Merge branch 'developing' into feature/rag_retrive_top_k 2026-04-16 12:14:49 +08:00
朱潮
8aa23dea3f Refine top_k decision logic 2026-04-16 12:11:46 +08:00
朱潮
d73f3b2749 Merge branch 'feature/rag_retrive_top_k' into onprem-release 2026-04-16 11:08:24 +08:00
朱潮
aa3d9f3687 Merge branch 'feature/rag_retrive_top_k' into developing 2026-04-16 11:07:25 +08:00
朱潮
a6bda16442 Merge branch 'feature/rag_retrive_top_k' into dev 2026-04-16 11:07:11 +08:00
朱潮
7c48035f0e Refine top_k decision logic 2026-04-16 11:06:27 +08:00
朱潮
d9d78075aa add skill 2026-04-16 10:23:54 +08:00
朱潮
5bb09b22a5 Merge branch 'developing' into dev 2026-04-15 11:10:46 +08:00
朱潮
1d0dcb90b5 Merge branch 'developing' into onprem-release 2026-04-15 11:10:12 +08:00
朱潮
8c49997ed6 dataset to datsets 2026-04-15 11:09:55 +08:00
朱潮
7e27fd4266 Merge branch 'developing' into dev 2026-04-14 20:50:32 +08:00
朱潮
6af391f936 add 12+skills 2026-04-14 20:49:56 +08:00
朱潮
7bcb9da5e2 Merge branch 'developing' into dev 2026-04-13 18:54:24 +08:00
朱潮
d4bdff64f7 Merge branch 'feature/moshui20260410-file-path-fix' into developing 2026-04-13 18:54:09 +08:00
朱潮
6f48799ed9 Merge branch 'feature/moshui20260410-file-path-fix' into onprem-dev 2026-04-13 17:26:51 +08:00
朱潮
a94963461e Merge branch 'developing' into dev 2026-04-11 21:52:19 +08:00
朱潮
73d613b0d1 Merge branch 'feature/moshui20260411-deepagents-0_5_2' into developing 2026-04-11 21:52:06 +08:00
朱潮
cdf2338c66 Merge branch 'feature/moshui20260411-deepagents-0_5_2' into onprem-release 2026-04-11 21:51:20 +08:00
朱潮
21b8226f58 Merge branch 'feature/moshui20260411-deepagents-0_5_2' into dev 2026-04-11 21:01:02 +08:00
朱潮
032f9ba8a6 Merge branch 'feature/moshui20260411-deepagents-0_5_2' into onprem-dev 2026-04-11 20:48:00 +08:00
朱潮
3a0a150813 Merge branch 'developing' into dev 2026-04-11 20:00:27 +08:00
朱潮
9b6eacd554 Merge branch 'feature/moshui20260411-deepagents-0_5_2' into onprem-dev 2026-04-11 19:16:04 +08:00
朱潮
d7130337b0 Merge branch 'feature/moshui20260411-deepagents-0_5_2' into dev 2026-04-11 17:58:33 +08:00
朱潮
b47119b262 Merge branch 'feature/moshui20260411-deepagents-0_5_2' into onprem-dev 2026-04-11 17:57:32 +08:00
朱潮
f196b7cab1 Merge branch 'prod' into dev 2026-04-11 16:08:02 +08:00
朱潮
f750d153cb Merge branch 'prod' into onprem-release 2026-04-11 16:07:05 +08:00
257 changed files with 11964 additions and 324 deletions

View File

@ -49,14 +49,11 @@ jobs:
echo $CMD
ssh ${USER_NAME}@${HOST_NAME} ${CMD}
# Check whether the branch matches the regex; the pattern is derived from examples such as 2023.wk42,2023.wk43
if [[ "${CIRCLE_BRANCH}" =~ ^([0-9]{4}.wk[0-9]{2},?)+$ ]]; then
curl -X POST -H "Content-Type: application/json" -d '{"msg_type":"text","content":{"text":"'${CIRCLE_USERNAME}' 触发了正式环境分支 '${CIRCLE_BRANCH}' 更新与部署"}}' https://open.larksuite.com/open-apis/bot/v2/hook/68004e4a-1381-4886-a982-cd77d5f2e6a1
fi
# Check whether the branch matches the regex; the pattern is derived from examples such as canary.2023.wk42,canary.2023.wk43
if [[ "${CIRCLE_BRANCH}" =~ ^canary.([0-9]{4}.wk[0-9]{2},?)+$ ]]; then
curl -X POST -H "Content-Type: application/json" -d '{"msg_type":"text","content":{"text":"'${CIRCLE_USERNAME}' 触发了灰度环境分支 '${CIRCLE_BRANCH}' 更新与部署"}}' https://open.larksuite.com/open-apis/bot/v2/hook/68004e4a-1381-4886-a982-cd77d5f2e6a1
fi
case "${CIRCLE_BRANCH}" in
dev|staging|prod|onprem-dev)
curl -X POST -H "Content-Type: application/json" -d '{"msg_type":"text","content":{"text":"'${CIRCLE_USERNAME}' 触发了 '${CIRCLE_BRANCH}' 分支部署成功job='${CIRCLE_JOB}',详情:'${CIRCLE_BUILD_URL}'"}}' https://open.larksuite.com/open-apis/bot/v2/hook/3acf274a-1828-494b-a4a2-a3185f5e466d || echo "WARN: Lark notify failed"
;;
esac
docker-hub-build-push:
machine:
image: ubuntu-2404:current
@ -93,6 +90,10 @@ jobs:
docker push <<parameters.repo>>:<<parameters.docker-tag>>
docker push <<parameters.repo>>:$IMAGE_TAG
if [[ "${CIRCLE_BRANCH}" == "onprem-release" && "<<parameters.docker-tag>>" == "latest" ]]; then
curl -X POST -H "Content-Type: application/json" -d '{"msg_type":"text","content":{"text":"'${CIRCLE_USERNAME}' 触发了 onprem-release Docker Hub 推送成功,镜像:<<parameters.repo>>:'${IMAGE_TAG}'job='${CIRCLE_JOB}',详情:'${CIRCLE_BUILD_URL}'"}}' https://open.larksuite.com/open-apis/bot/v2/hook/3acf274a-1828-494b-a4a2-a3185f5e466d || echo "WARN: Lark notify failed"
fi
workflows:
version: 2
backend_build_and_push:

View File

@ -1,12 +1,15 @@
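# Skill Feature
> Scope: skill package management service - core implementation
> Last updated: 2025-02-11
> Last updated: 2026-04-18
## Current Status
The Skill system supports two sources: official skills (`./skills/`) and user skills (`projects/uploads/{bot_id}/skills/`). It supports a Hook system and MCP server configuration, with metadata defined through SKILL.md or plugin.json.
A batch of **pure `SKILL.md` business skill MVPs** has now been added for research, summarization, reporting, and intelligence orchestration; underlying file processing and external retrieval continue to reuse existing skills.
## Core Files
- `routes/skill_manager.py` - Skill upload/delete/list API
@ -18,10 +21,25 @@ The Skill system supports two sources: official skills (`./skills/`) and user skills (`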
## Recent Key Changes
- 2026-04-16: Added Python CLI script MVPs for `auto-daily-summary` and `competitor-news-intel`, both using the `argparse + JSON stdout` pattern
- 2026-04-16: Added 6 pure `SKILL.md` business skills: `market-academic-insight`, `financial-report-generator`, `contract-document-generator`, `sales-decision-report`, `auto-daily-summary`, `competitor-news-intel`
- 2026-04-18: `create_robot_project` now auto-loads every skill under `skills/autoload/{SKILLS_SUBDIR}` and skips same-name skills that were passed in explicitly
- 2026-04-18: The official library for `/api/v1/skill/list` now reads both `skills/common` and `skills/{SKILLS_SUBDIR}`, deduplicating in directory order
- 2026-04-18: `_extract_skills_to_robot` now selects the official skills subdirectory through the `SKILLS_SUBDIR` environment variable, defaulting to `skills/common`
- 2025-02-11: Initialized the skill feature memory
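## Gotchas (must-read for developers)
- ⚠️ Pure `SKILL.md` business skills are a good first home for workflows, input templates, and output templates; add `scripts/` once stable file output or automation is needed
- ⚠️ New business skills should reuse the existing foundational skills (`baidu-search`, `xlsx`, `docx`, `pdf`, `schedule-job`, `imap-smtp-email`) rather than redefining low-level tool capabilities
- ⚠️ New scripts should prefer `Python + argparse + JSON stdout`, which suits automation pipelines better than `argv[1] JSON`
- ⚠️ `auto-daily-summary` needs special care with Chinese sentence splitting, action boundary truncation, and risk window trimming; otherwise it tends to swallow whole sentences or paragraphs
- ⚠️ Payload validation for `competitor-news-intel` should be split per command (collect/analyze/run); do not share one minimal validation set
- ⚠️ `collect/run` in `competitor-news-intel` depend on `BAIDU_API_KEY`; without that environment variable, return a stable error JSON rather than degrading silently
- ⚠️ `_extract_skills_to_robot` reads official skills only from `skills/{SKILLS_SUBDIR}`, which defaults to `common`
- ⚠️ Scripts must be executed with absolute paths
- ⚠️ MCP config priority: Skill MCP > default MCP > user parameters
- ⚠️ Upload size limit: 50MB (ZIP), at most 500MB after extraction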
@ -88,9 +106,8 @@ skill-name/
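## Skill Loading Priority
1. Skill MCP config (highest)
2. Default MCP config (`mcp/mcp_settings.json`)
3. User-supplied parameters (override everything)
1. Skill MCP config
2. User-supplied parameters (override existing same-name config)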
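
The revised two-level priority can be read as a plain dictionary merge. The following is only a minimal sketch, assuming both sources are `mcpServers`-style mappings keyed by server name; `merge_mcp_servers` and the sample server entries are hypothetical and are not taken from the repository.

```python
from typing import Any, Dict


def merge_mcp_servers(
    skill_servers: Dict[str, Any],
    user_servers: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge two mcpServers mappings; user entries replace same-name skill entries."""
    merged = dict(skill_servers)  # start from the skill-provided servers
    merged.update(user_servers)   # user-supplied entries override same-name ones
    return merged


# Hypothetical sample data, for illustration only.
skill = {"rag": {"command": "python", "args": ["rag_server.py"]}}
user = {
    "rag": {"url": "http://localhost:8000/mcp"},
    "extra": {"url": "http://localhost:9000/mcp"},
}
result = merge_mcp_servers(skill, user)
```

Because the merge builds a new dictionary, the skill-provided mapping is left untouched, which keeps the override easy to reason about when several bots share the same skill.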
## Security Measures

View File

@ -396,7 +396,6 @@ dataset_name/
│ ├── document.txt # raw text content
│ ├── serialization.txt # structured data
│ └── schema.json # field definitions and metadata
├── mcp_settings.json # MCP tool config
└── system_prompt.md # system prompt (optional)
```
@ -405,7 +404,6 @@ dataset_name/
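- **document.txt**: raw Markdown text, providing full context
- **serialization.txt**: formatted structured data, one `field1:value1;field2:value2` record per line
- **schema.json**: field definitions, enum value mappings, and file associations
- **mcp_settings.json**: MCP tool config, defining the available data processing tools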
---
@ -565,8 +563,7 @@ qwen-agent/
│ ├── multi_keyword_search_server.py # multi-keyword search server
│ ├── excel_csv_operator_server.py # Excel/CSV operations server
│ ├── json_reader_server.py # JSON reader server
│ ├── mcp_settings.json # MCP config file
│ └── tools/ # tool definition files
│ └── tools/ # tool definition files
├── models/ # model files
├── projects/ # project directory
│ └── queue_data/ # queue data

View File

@ -117,13 +117,6 @@ def read_system_prompt():
return f.read().strip()
def read_mcp_settings():
"""Read the MCP tool configuration."""
with open("./mcp/mcp_settings.json", "r") as f:
mcp_settings_json = json.load(f)
return mcp_settings_json
async def get_tools_from_mcp(mcp):
"""Extract tools from the MCP config (cached)."""
start_time = time.time()
@ -195,8 +188,7 @@ async def init_agent(config: AgentConfig):
final_system_prompt = await load_system_prompt_async(config)
final_mcp_settings = await load_mcp_settings_async(config)
# If no MCP is provided, use mcp_settings from the config
mcp_settings = final_mcp_settings if final_mcp_settings else read_mcp_settings()
mcp_settings = final_mcp_settings if final_mcp_settings else []
system_prompt = final_system_prompt if final_system_prompt else read_system_prompt()
config.system_prompt = mcp_settings

View File

@ -38,8 +38,8 @@ class GuidelineMiddleware(AgentMiddleware):
if not self.guidelines:
self.guidelines = """
1. General Inquiries
Condition: User inquiries about products, policies, troubleshooting, factual questions, etc.
Action: Priority given to invoking the Knowledge Base Retrieval tool to query the knowledge base.
Condition: User inquiries about products, policies, troubleshooting, factual questions, definitions, workflows, data lookups, or other knowledge-seeking requests.
Action: First choose the most suitable Knowledge Base Retrieval tool by scenario. Use table_rag_retrieve first for structured data, lists, statistics, comparisons, extraction, mixed requests, or unclear cases. Use rag_retrieve first only for clearly pure concept / definition / workflow / policy explanation questions. If the first retrieval result is empty, errored, irrelevant, or only partially answers the request, call the other retrieval tool before replying. Only reply that no relevant information was found after both retrieval tools have been tried and still provide no sufficient evidence.
2.Social Dialogue
Condition: User intent involves small talk, greetings, expressions of thanks, compliments, or other non-substantive conversations.
@ -47,7 +47,7 @@ Action: Provide concise, friendly, and personified natural responses.
"""
if not self.tool_description:
self.tool_description = """
- **Knowledge Base Retrieval**: For knowledge queries/other inquiries, prioritize searching the knowledge base rag_retrieve-rag_retrieve
- **Knowledge Base Retrieval**: Choose retrieval order by scenario. Default to `table_rag_retrieve -> rag_retrieve` for structured, list, mixed, or unclear requests. Use `rag_retrieve -> table_rag_retrieve` only for clearly pure concept or workflow questions. Do not answer with "no result" until both tools have been tried when retrieval is needed.
"""
def get_guideline_prompt(self, config: AgentConfig) -> str:
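
The retrieval order described in the revised guideline can be sketched as a small fallback loop. This is a hedged illustration only: in the real system the model itself decides whether a question is a pure concept question and whether a result is relevant, so the `is_pure_concept` flag, `retrieve_with_fallback`, and the callable-based `tools` mapping are assumptions made for the example. The sketch treats only empty or errored results as insufficient.

```python
from typing import Callable, Dict, List, Optional

Retriever = Callable[[str], str]


def retrieval_order(is_pure_concept: bool) -> List[str]:
    """Return tool names in the order the guideline asks for."""
    if is_pure_concept:
        return ["rag_retrieve", "table_rag_retrieve"]
    return ["table_rag_retrieve", "rag_retrieve"]


def retrieve_with_fallback(
    query: str,
    is_pure_concept: bool,
    tools: Dict[str, Retriever],
) -> Optional[str]:
    """Try both tools in order; return None only after both came back empty or failed."""
    for name in retrieval_order(is_pure_concept):
        try:
            result = tools[name](query)
        except Exception:
            continue  # an errored call counts as "no result"; move on to the other tool
        if result and result.strip():
            return result
    return None
```

A caller would reply "no relevant information found" only when this function returns `None`, which matches the rule that both tools must be tried first.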

View File

@ -5,6 +5,7 @@ Hook loader for Claude Plugins mode
"""
import os
import json
import copy
import logging
import asyncio
import subprocess
@ -116,7 +117,8 @@ async def merge_skill_mcp_configs(bot_id: str) -> List[Dict]:
plugin_config = json.load(f)
servers = plugin_config.get('mcpServers', {})
if servers:
merged_servers.update(servers)
normalized_servers = _normalize_skill_mcp_servers(servers, skill_path)
merged_servers.update(normalized_servers)
logger.info(f"Loaded MCP config from skill: {skill_name}")
except Exception as e:
logger.error(f"Failed to load mcpServers from {skill_name}: {e}")
@ -127,6 +129,47 @@ async def merge_skill_mcp_configs(bot_id: str) -> List[Dict]:
return []
def _normalize_skill_mcp_servers(servers: Dict[str, Any], skill_path: str) -> Dict[str, Any]:
"""Normalize relative paths of stdio MCP servers in a skill plugin to absolute paths based on the skill directory."""
normalized_servers = copy.deepcopy(servers)
for server_name, server_config in normalized_servers.items():
if not isinstance(server_config, dict):
continue
transport = server_config.get('transport')
if not transport:
transport = 'http' if 'url' in server_config else 'stdio'
if transport != 'stdio':
continue
command = server_config.get('command')
if isinstance(command, str):
server_config['command'] = _resolve_skill_relative_path(command, skill_path)
args = server_config.get('args')
if isinstance(args, list):
server_config['args'] = [
_resolve_skill_relative_path(arg, skill_path) if isinstance(arg, str) else arg
for arg in args
]
return normalized_servers
def _resolve_skill_relative_path(value: str, skill_path: str) -> str:
"""Convert a path that starts with ./ or ../ and contains no placeholder to an absolute path based on the skill directory."""
if '{' in value or '}' in value:
return value
if not value.startswith(('./', '../')):
return value
normalized_path = os.path.abspath(os.path.join(skill_path, value))
logger.debug(f"Resolved skill MCP path: {value} -> {normalized_path}")
return normalized_path
def _load_plugin_config(plugin_json_path: str) -> Dict:
"""Load the plugin.json configuration."""
try:
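
To make the path resolution rule concrete, here is a standalone sketch that mirrors the logic of `_resolve_skill_relative_path` shown in this diff, re-implemented without logging so it runs on its own. The skill directory and the argument list are made up for illustration.

```python
import os


def resolve_skill_relative_path(value: str, skill_path: str) -> str:
    """Resolve ./ and ../ paths against the skill directory; leave everything else alone."""
    if "{" in value or "}" in value:
        return value  # placeholders such as {dataset_dir} are substituted later
    if not value.startswith(("./", "../")):
        return value  # bare commands, flags, and absolute paths pass through
    return os.path.abspath(os.path.join(skill_path, value))


# Hypothetical skill directory and arguments.
skill_path = "/opt/skills/demo-skill"
args = ["./server.py", "--data", "{dataset_dir}/index", "../shared/lib.py", "plain-arg"]
resolved = [resolve_skill_relative_path(arg, skill_path) for arg in args]
```

Only the first and fourth arguments change: they become absolute paths under the skill directory and its parent, while the flag, the placeholder argument, and the plain string are returned unchanged.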

View File

@ -203,10 +203,9 @@ async def load_mcp_settings_async(config) -> List[Dict]:
List[Dict]: merged list of MCP settings
Note:
Supports the {dataset_dir} placeholder in mcp_settings.json args
Supports the {dataset_dir} placeholder in the args of passed-in or merged mcp_settings
It is replaced with the actual path in init_modified_agent_service_with_files
"""
from agent.config_cache import config_cache
# Read parameters from config
project_dir = getattr(config, 'project_dir', None)
@ -222,33 +221,6 @@ async def load_mcp_settings_async(config) -> List[Dict]:
skill_mcp_servers = skill_mcp_settings[0].get('mcpServers', {})
logger.info(f"Loaded {len(skill_mcp_servers)} MCP servers from skills")
# ===========================================================================================
# 2. Read the default MCP settings (using the cache)
default_mcp_settings = []
try:
default_mcp_file = os.path.join("mcp", f"mcp_settings.json")
default_mcp_settings = await config_cache.get_json_file(default_mcp_file) or []
if default_mcp_settings:
logger.info(f"Using cached default mcp_settings from mcp folder")
except Exception as e:
logger.error(f"Failed to load default mcp_settings: {str(e)}")
default_mcp_settings = []
# 3. Merge default settings into merged_settings (skill settings override defaults)
if default_mcp_settings and len(default_mcp_settings) > 0:
default_mcp_servers = default_mcp_settings[0].get('mcpServers', {})
if merged_settings and len(merged_settings) > 0:
# Skill config exists: merge defaults into it (skill takes priority)
skill_mcp_servers = merged_settings[0].get('mcpServers', {})
# Only add servers that the skill config does not already define
for server_name, server_config in default_mcp_servers.items():
if server_name not in skill_mcp_servers:
skill_mcp_servers[server_name] = server_config
else:
# No skill config: use the default config directly
merged_settings = default_mcp_settings.copy()
# Iterate over the mcpServers tools and add env parameters to each
if merged_settings and len(merged_settings) > 0:
mcp_servers = merged_settings[0].get('mcpServers', {})
for server_name, server_config in mcp_servers.items():
@ -291,7 +263,7 @@ async def load_mcp_settings_async(config) -> List[Dict]:
# Compute dataset_dir, used to replace placeholders in the MCP config
# Only compute dataset_dir when project_dir is not None
dataset_dir = os.path.join(project_dir, "dataset") if project_dir is not None else None
dataset_dir = os.path.join(project_dir, "datasets") if project_dir is not None else None
# Replace the {dataset_dir} placeholder in the MCP config
if dataset_dir is None:
dataset_dir = ""

View File

@ -1,5 +0,0 @@
[
{
"mcpServers": {}
}
]

View File

@ -1,35 +0,0 @@
[
{
"name": "rag_retrieve",
"description": "Retrieve relevant documents from the knowledge base. Returns markdown format results containing relevant content.\n\n[CALLING STRATEGY] This tool is the SECONDARY choice. Only call this tool FIRST when the question is clearly a pure knowledge/concept query (e.g. \"What does XX mean?\", \"How to use XX?\", \"What is the workflow for XX?\") that has NO relation to data, lists, summaries, or tabular output. In ALL other cases, call table_rag_retrieve FIRST, then use this tool to supplement if table_rag results are insufficient or need additional context.\n\n[WHEN TO USE AS SUPPLEMENT] After calling table_rag_retrieve, call this tool if:\n- table_rag_retrieve returned insufficient results and you need document context\n- The answer requires background explanation beyond the structured data\n- The user's question involves both data retrieval and conceptual understanding",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Retrieval query content. Optimize the query by keeping core keywords and removing vague modifiers. For complex questions, split into multiple sub-queries."
},
"top_k": {
"type": "integer",
"description": "Number of top results to retrieve (default: 100)",
"default": 100
}
},
"required": ["query"]
}
},
{
"name": "table_rag_retrieve",
"description": "Retrieve relevant data from Excel/spreadsheet files in the knowledge base. Returns markdown format results containing table data analysis.\n\n[CALLING STRATEGY] This tool is the DEFAULT first choice. Call this tool FIRST in any of the following situations:\n- Questions involving specific values, prices, quantities, inventory, specifications, rankings, comparisons, statistics\n- Requests for tabular output (e.g. \"make a table\", \"list in a table\", \"一覧表にして\", \"整理成表格\")\n- Information extraction/organization requests (e.g. \"extract\", \"list\", \"summarize and list\", \"抽出\", \"提取\", \"列举\", \"汇总\")\n- Queries about specific person names, project names, or product names (e.g. \"XX議員の答弁を一覧にして\")\n- ANY question where you are unsure whether table data is needed — default to calling this tool first\n\n[RESPONSE HANDLING] When processing the returned results:\n1. Follow all instructions in [INSTRUCTION] and [EXTRA_INSTRUCTION] sections of the response (e.g. output format, source citation requirements)\n2. If Query result hint indicates truncation (e.g. \"Only the first N rows are included; the remaining M rows were omitted\"), you MUST explicitly tell the user: total matches (N+M), displayed count (N), and omitted count (M)\n3. If query result is empty, respond truthfully that no relevant data was found — do NOT fabricate data\n4. Cite data sources using file names from file_ref_table in the response",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Retrieval query content for table data. Optimize the query by keeping core keywords and removing vague modifiers. For complex questions, split into multiple sub-queries."
}
},
"required": ["query"]
}
}
]

View File

@ -1,7 +1,6 @@
{extra_prompt}
### Current Working Directory
# Current Working Directory
PROJECT_ROOT: `{agent_dir_path}`
The filesystem backend is currently operating in: `{agent_dir_path}`
@ -44,7 +43,7 @@ When executing scripts from SKILL.md files, you MUST convert relative paths to a
**4. Workspace Directory Structure**
- **`{agent_dir_path}/skills/`** - Skill packages with embedded scripts
- **`{agent_dir_path}/dataset/`** - Store file datasets and document data
- **`{agent_dir_path}/datasets/`** - Store file datasets and document data
- **`{agent_dir_path}/executable_code/`** - Place generated executable scripts here (not skill scripts)
- **`{agent_dir_path}/download/`** - Store downloaded files and content
@ -65,12 +64,12 @@ When creating scripts in `executable_code/`, follow these organization rules:
**Path Examples:**
- Skill script: `{agent_dir_path}/skills/rag-retrieve/scripts/rag_retrieve.py`
- Dataset file: `{agent_dir_path}/dataset/document.txt`
- Dataset file: `{agent_dir_path}/datasets/document.txt`
- Task-specific script: `{agent_dir_path}/executable_code/invoice_parser/parse.py`
- Temporary script (when needed): `{agent_dir_path}/executable_code/tmp/test.py`
- Downloaded file: `{agent_dir_path}/download/report.pdf`
## System Information
# System Information
<env>
Working directory: {agent_dir_path}
Current User: {user_identifier}
@ -80,6 +79,7 @@ Trace Id: {trace_id}
# Execution Guidelines
- **Tool-Driven**: All operations are implemented through tool interfaces.
- **No Premature File Exploration**: Do not inspect local files merely to "see what exists" before attempting earlier knowledge retrieval sources. Local filesystem retrieval is the final fallback, not the default path, but do not skip it when earlier retrieval sources are insufficient.
- **Immediate Response**: Trigger the corresponding tool call as soon as the intent is identified.
- **Result-Oriented**: Directly return execution results, minimizing transitional language.
- **Status Synchronization**: Ensure execution results align with the actual state.

View File

@ -213,7 +213,7 @@ async def reset_files_processing(dataset_id: str):
elif 'filename' in file_info:
# Fallback to old filename-based structure
filename_without_ext = os.path.splitext(file_info['filename'])[0]
dataset_dir = os.path.join("projects", "data", dataset_id, "dataset", filename_without_ext)
dataset_dir = os.path.join("projects", "data", dataset_id, "datasets", filename_without_ext)
if remove_file_or_directory(dataset_dir):
removed_files.append(dataset_dir)
@ -232,7 +232,7 @@ async def reset_files_processing(dataset_id: str):
removed_files.append(files_dir)
# Also remove the entire dataset directory (clean up any remaining files)
dataset_dir = os.path.join(project_dir, "dataset")
dataset_dir = os.path.join(project_dir, "datasets")
if remove_file_or_directory(dataset_dir):
removed_files.append(dataset_dir)
@ -465,4 +465,4 @@ async def cleanup_tasks(older_than_days: int = 7):
except Exception as e:
logger.error(f"Error cleaning up tasks: {str(e)}")
raise HTTPException(status_code=500, detail=f"清理任务记录失败: {str(e)}")
raise HTTPException(status_code=500, detail=f"清理任务记录失败: {str(e)}")

View File

@ -33,8 +33,8 @@ async def list_all_projects():
# Count files
file_count = 0
if os.path.exists(os.path.join(item_path, "dataset")):
for root, dirs, files in os.walk(os.path.join(item_path, "dataset")):
if os.path.exists(os.path.join(item_path, "datasets")):
for root, dirs, files in os.walk(os.path.join(item_path, "datasets")):
file_count += len(files)
robot_projects.append({
@ -173,4 +173,4 @@ async def get_project_tasks(dataset_id: str):
except Exception as e:
logger.error(f"Error getting project tasks: {str(e)}")
raise HTTPException(status_code=500, detail=f"获取项目任务失败: {str(e)}")
raise HTTPException(status_code=500, detail=f"获取项目任务失败: {str(e)}")

View File

@ -10,7 +10,7 @@ from typing import List, Optional
from dataclasses import dataclass
from fastapi import APIRouter, HTTPException, Query, UploadFile, File, Form
from pydantic import BaseModel
from utils.settings import SKILLS_DIR
from utils.settings import SKILLS_DIR, SKILLS_SUBDIR
import aiofiles
logger = logging.getLogger('app')
@ -427,27 +427,39 @@ def get_official_skills(base_dir: str) -> List[SkillItem]:
List of SkillItem objects
"""
skills = []
skill_names = set()
# Use SKILLS_DIR from settings, relative to base_dir
if os.path.isabs(SKILLS_DIR):
official_skills_dir = SKILLS_DIR
skills_root_dir = SKILLS_DIR
else:
official_skills_dir = os.path.join(base_dir, SKILLS_DIR)
skills_root_dir = os.path.join(base_dir, SKILLS_DIR)
if not os.path.exists(official_skills_dir):
logger.warning(f"Official skills directory not found: {official_skills_dir}")
return skills
official_skills_dirs = [
os.path.join(skills_root_dir, "common"),
os.path.join(skills_root_dir, SKILLS_SUBDIR),
]
for skill_name in os.listdir(official_skills_dir):
skill_path = os.path.join(official_skills_dir, skill_name)
if os.path.isdir(skill_path):
metadata = get_skill_metadata_legacy(skill_path)
if metadata:
skills.append(SkillItem(
name=metadata['name'],
description=metadata['description'],
user_skill=False
))
logger.debug(f"Found official skill: {metadata['name']}")
for official_skills_dir in official_skills_dirs:
if not os.path.exists(official_skills_dir):
logger.warning(f"Official skills directory not found: {official_skills_dir}")
continue
for skill_name in os.listdir(official_skills_dir):
if skill_name in skill_names:
continue
skill_path = os.path.join(official_skills_dir, skill_name)
if os.path.isdir(skill_path):
metadata = get_skill_metadata_legacy(skill_path)
if metadata:
skills.append(SkillItem(
name=metadata['name'],
description=metadata['description'],
user_skill=False
))
skill_names.add(skill_name)
logger.debug(f"Found official skill: {metadata['name']} from {official_skills_dir}")
return skills
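
The two-directory scan with de-duplication can be tried in isolation. The sketch below is a simplified stand-in, not the repository function: it returns directory names only, skips the metadata lookup, and sorts entries for a deterministic result (the code in this diff uses the raw `os.listdir` order). The helper name and the sample directory layout are hypothetical.

```python
import os
import tempfile
from typing import List


def list_official_skill_names(skills_root: str, subdir: str) -> List[str]:
    """List skill directories from common/ first, then subdir/, skipping repeated names."""
    seen = set()
    names: List[str] = []
    for directory in (os.path.join(skills_root, "common"), os.path.join(skills_root, subdir)):
        if not os.path.isdir(directory):
            continue  # a missing directory is skipped rather than treated as an error
        for name in sorted(os.listdir(directory)):
            if name in seen or not os.path.isdir(os.path.join(directory, name)):
                continue
            seen.add(name)
            names.append(name)
    return names


with tempfile.TemporaryDirectory() as root:
    for rel in ("common/pdf", "common/xlsx", "onprem/pdf", "onprem/report"):
        os.makedirs(os.path.join(root, rel))
    names = list_official_skill_names(root, "onprem")
    same_dir = list_official_skill_names(root, "common")
```

The second call shows why the name set matters even without a custom subdirectory: when the subdirectory is `common`, the same directory is scanned twice and the set keeps each skill from being listed twice.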
@ -498,7 +510,7 @@ async def list_skills(
SkillListResponse containing all skills
Notes:
- Official skills are read from the /skills directory
- Official skills are read from /skills/common and /skills/{SKILLS_SUBDIR}
- User skills are read from /projects/uploads/{bot_id}/skills directory
- User skills are marked with user_skill: true
"""

View File

@ -0,0 +1,218 @@
---
name: auto-daily-summary
description: Generate recurring summaries, daily reports, content digests, and concise action-oriented briefs from multiple inputs. Use when the user asks for daily summaries, periodic briefings, meeting digests, content condensation, or automated recurring report generation. Chinese trigger phrases include: 日报, 周报, 摘要, 会议纪要, 内容浓缩, 自动汇总, 每天发我一份总结.
---
# Auto Daily Summary
## Overview
This skill converts scattered information into concise, structured summaries for recurring or one-off use.
Typical scenarios:
- daily or weekly report generation
- long content condensation
- meeting or conversation summary
- multi-source digest
- action-item extraction
This skill focuses on **organization and summarization**, not source retrieval itself.
## Quick Start
When the user asks for a summary or report:
1. Identify the sources to summarize
2. Clarify the audience and desired level of detail
3. Determine whether the output is one-time or recurring
4. Summarize by theme, not by raw chronological dump unless requested
5. Extract action items and watch items when useful
### Chinese Task Mapping
- “帮我整理成日报” → `daily_report`
- “做个周报/周总结” → `digest` or `daily_report`
- “把这段会议内容整理一下” → `meeting_digest`
- “浓缩成 3-5 条重点” → `digest` + `short`
- “每天早上发我一份总结” → `plan-recurring` + `schedule-job`
## Input Requirements
| Field | Required | Description |
|-------|----------|-------------|
| source content | yes | Text, notes, messages, links, reports, logs, or mixed content |
| summary objective | yes | Inform, decide, archive, handoff, or monitor |
| audience | no | Self, team, manager, executive, customer |
| time scope | no | Today, this week, meeting duration, selected period |
| desired length | no | TL;DR, short, standard, detailed |
| output style | no | Daily report, digest, executive summary, bullet list |
| action extraction | no | Whether to extract todos, risks, blockers |
## Workflow Decision Tree
### Content Summary
Use when the user wants a concise summary of a long input.
### Daily / Weekly Report
Use when the user wants a periodic report with sections and status updates.
### Meeting Digest
Use when the user wants decisions, action items, and blockers from a discussion.
### Recurring Summary Workflow
Use when the user wants this to happen on a schedule. In that case, pair with `schedule-job`.
## Instructions
### Step 1: Identify source boundaries
Clarify what should and should not be included in the summary.
### Step 2: Determine the correct abstraction level
Choose the right level for the audience:
- executive audience -> implications and decisions
- working team -> concrete tasks and blockers
- archive -> structured factual recap
### Step 3: Group by theme
Prefer grouping by:
- progress
- decisions
- blockers
- risks
- next steps
Avoid copying source order unless chronology itself matters.
### Step 4: Extract action items
When appropriate, identify:
- owner
- task
- due timing
- dependency or blocker
If ownership is unclear, say so.
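
One way to hold the extracted fields is a small record type. This is only a sketch: the `ActionItem` class, its field names, and the markdown rendering are hypothetical and are not part of the skill's scripts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionItem:
    """One extracted action item; unknown fields stay None instead of being guessed."""
    task: str
    owner: Optional[str] = None
    due: Optional[str] = None
    blocker: Optional[str] = None

    def to_markdown(self) -> str:
        # Say so explicitly when ownership is unclear.
        owner = self.owner if self.owner else "owner unclear"
        line = f"- [{owner}] {self.task}"
        if self.due:
            line += f" (due: {self.due})"
        if self.blocker:
            line += f" (blocked by: {self.blocker})"
        return line
```

Keeping missing fields as `None` preserves the uncertainty in the source material rather than flattening it away.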
### Step 5: Prepare for automation if needed
If the user wants recurring output:
- use `schedule-job` for cadence
- use `imap-smtp-email` or other enabled notification skills for delivery
## Scripts
### CLI Usage
Use the following commands when you need stable structured outputs:
```bash
poetry run python skills/auto-daily-summary/scripts/summary_cli.py validate --input-json '<JSON>'
poetry run python skills/auto-daily-summary/scripts/summary_cli.py run --input-json '<JSON>' --output json
poetry run python skills/auto-daily-summary/scripts/summary_cli.py plan-recurring --input-json '<JSON>'
```
### Recommended Uses
- `validate` - check whether the summary request payload is complete
- `run` - generate summary JSON and markdown
- `plan-recurring` - generate a schedule-ready message payload for `schedule-job`
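
Building the `--input-json` argument programmatically avoids hand-quoting JSON on the command line. The sketch below is an assumption-laden example: the `language`, `data.style`, `data.length`, and `data.extract_actions` keys appear in the CLI's override logic, but the `sources` key and all sample values are hypothetical, since the full payload schema is not shown here.

```python
import json
import shlex

# Hypothetical payload; only the key names noted above are confirmed by the CLI code.
payload = {
    "language": "zh",
    "data": {
        "style": "daily_report",
        "length": "short",
        "extract_actions": True,
        "sources": ["Finished the API review.", "Blocked on the staging deploy."],
    },
}

command = [
    "poetry", "run", "python",
    "skills/auto-daily-summary/scripts/summary_cli.py",
    "run",
    "--input-json", json.dumps(payload, ensure_ascii=False),
    "--output", "json",
]

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

Passing the list form to `subprocess.run` would sidestep shell quoting entirely; the joined string is mainly useful for logs and documentation.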
### Daily Report
```markdown
# Daily Report
## Summary
[Short summary]
## Key Updates
- [Update]
## Decisions
- [Decision]
## Risks / Blockers
- [Risk or blocker]
## Next Actions
- [Action]
```
### Content Digest
```markdown
# Content Digest
## TL;DR
[Very short summary]
## Main Themes
### 1. [Theme]
- [Key point]
## Notable Details
- [Detail]
## Follow-up
- [Suggested follow-up]
```
## Quality Checklist
Before finalizing, verify:
- the summary matches the audience level
- repetition and noise are removed
- key decisions are not buried
- action items are explicit when relevant
- uncertainty is preserved rather than flattened away
- the result is shorter and clearer than the source material
## Fallback Strategy
If the input is too fragmented:
- produce a partial summary by theme
- list gaps or unclear areas
- ask for additional source material only if needed for the user's stated goal
## Related Skills
- `skills/schedule-job/SKILL.md` - automate recurring execution
- `skills/imap-smtp-email/SKILL.md` - send summaries via email
- `skills/market-academic-insight/SKILL.md` - use when the task is deeper research synthesis rather than pure summarization
- `skills/competitor-news-intel/SKILL.md` - use when competitor monitoring and intelligence is the real task
## Examples
**User**: "帮我把今天的工作内容整理成日报"
Expected output:
- summary
- key updates
- blockers
- next actions
**User**: "把这篇长文浓缩成 5 条重点"
Expected output:
- TL;DR
- 5 concise points
- optional follow-up note
**User**: "每天早上自动给我发新闻摘要"
Expected output:
- summary format definition
- recommendation to combine with `schedule-job`
- delivery method confirmation
**User**: "Turn this meeting transcript into meeting minutes"
Expected output:
- summary
- decisions
- action items
- blockers if any
**User**: "Give me a three-part summary of today"
Expected output:
- summary
- key updates
- next actions

@ -0,0 +1,164 @@
#!/usr/bin/env python3
import argparse
import json
import sys
from datetime import datetime, UTC

from summary_core import build_summary, validate_payload

ERROR_TEMPLATE = {
    "success": False,
    "code": "invalid_input",
    "message": "",
    "data": {},
    "meta": {},
    "errors": [],
}


def _now_iso() -> str:
    return datetime.now(UTC).isoformat()


def _emit_json(data: dict, pretty: bool, stream=None):
    print(json.dumps(data, ensure_ascii=False, indent=2 if pretty else None), file=stream or sys.stdout)


def _error_response(code: str, message: str, errors: list[str] | None = None) -> dict:
    return {
        **ERROR_TEMPLATE,
        "code": code,
        "message": message,
        "meta": {"generated_at": _now_iso()},
        "errors": errors or [],
    }


def _parse_bool(value: str | None) -> bool | None:
    if value is None:
        return None
    lowered = value.lower()
    if lowered in {"1", "true", "yes", "y"}:
        return True
    if lowered in {"0", "false", "no", "n"}:
        return False
    raise ValueError(f"invalid boolean value: {value}")


def _load_payload(raw: str) -> dict:
    return json.loads(raw)


def _apply_overrides(payload: dict, args: argparse.Namespace) -> dict:
    payload.setdefault("data", {})
    if args.lang:
        payload["language"] = args.lang
    if args.style:
        payload["data"]["style"] = args.style
    if args.length:
        payload["data"]["length"] = args.length
    if hasattr(args, "extract_actions"):
        extract_actions = _parse_bool(getattr(args, "extract_actions", None))
        if extract_actions is not None:
            payload["data"]["extract_actions"] = extract_actions
    if hasattr(args, "extract_risks"):
        extract_risks = _parse_bool(getattr(args, "extract_risks", None))
        if extract_risks is not None:
            payload["data"]["extract_risks"] = extract_risks
    return payload


def cmd_validate(args: argparse.Namespace):
    payload = _apply_overrides(_load_payload(args.input_json), args)
    errors = validate_payload(payload)
    result = {
        "success": not errors,
        "code": "ok" if not errors else "invalid_input",
        "message": "payload valid" if not errors else "payload invalid",
        "data": {"valid": not errors},
        "meta": {"generated_at": _now_iso()},
        "errors": errors,
    }
    target_stream = sys.stdout if not errors else sys.stderr
    _emit_json(result, args.pretty, target_stream)
    if errors:
        raise SystemExit(1)


def cmd_run(args: argparse.Namespace):
    payload = _apply_overrides(_load_payload(args.input_json), args)
    errors = validate_payload(payload)
    if errors:
        _emit_json(_error_response("invalid_input", "payload invalid", errors), args.pretty, sys.stderr)
        raise SystemExit(1)
    data = build_summary(payload)
    if args.output == "markdown":
        print(data["markdown"])
        return
    result = {
        "success": True,
        "code": "ok",
        "message": "summary generated",
        "data": data,
        "meta": {
            "generated_at": _now_iso(),
            "source_count": len(payload.get("data", {}).get("sources", [])),
        },
        "errors": [],
    }
    _emit_json(result, args.pretty)


def cmd_plan_recurring(args: argparse.Namespace):
    payload = _apply_overrides(_load_payload(args.input_json), args)
    errors = validate_payload(payload)
    if errors:
        _emit_json(_error_response("invalid_input", "payload invalid", errors), args.pretty, sys.stderr)
        raise SystemExit(1)
    data = build_summary(payload)
    result = {
        "success": True,
        "code": "ok",
        "message": "recurring plan generated",
        "data": {"schedule_payload": data["schedule_payload"]},
        "meta": {"generated_at": _now_iso()},
        "errors": [],
    }
    _emit_json(result, args.pretty)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate structured summaries")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ["validate", "run", "plan-recurring"]:
        sub = subparsers.add_parser(name)
        sub.add_argument("--input-json", required=True)
        sub.add_argument("--lang")
        sub.add_argument("--style")
        sub.add_argument("--length")
        sub.add_argument("--extract-actions")
        sub.add_argument("--extract-risks")
        sub.add_argument("--pretty", action="store_true")
        if name == "run":
            sub.add_argument("--output", choices=["json", "markdown"], default="json")
    subparsers.choices["validate"].set_defaults(func=cmd_validate)
    subparsers.choices["run"].set_defaults(func=cmd_run)
    subparsers.choices["plan-recurring"].set_defaults(func=cmd_plan_recurring)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    args = parser.parse_args()
    try:
        args.func(args)
    except json.JSONDecodeError as exc:
        _emit_json(_error_response("invalid_input", f"invalid json: {exc}", [str(exc)]), getattr(args, "pretty", False), sys.stderr)
        raise SystemExit(1)
    except ValueError as exc:
        _emit_json(_error_response("invalid_input", str(exc), [str(exc)]), getattr(args, "pretty", False), sys.stderr)
        raise SystemExit(1)
    except Exception as exc:
        _emit_json(_error_response("internal_error", "unexpected error", [str(exc)]), getattr(args, "pretty", False), sys.stderr)
        raise SystemExit(1)

@ -0,0 +1,228 @@
import re
from collections import Counter
from typing import Any

SUMMARY_LENGTH_LIMITS = {
    "tldr": 2,
    "short": 4,
    "standard": 6,
    "detailed": 10,
}
ACTION_PREFIX_PATTERNS = [
    r"^(?:TODO|待办|action)[:]?\s*(.+)$",
    r"^(?:需要|跟进)\s*(.+)$",
]
RISK_KEYWORDS = {
    "high": ["阻塞", "blocker", "故障", "失败", "严重", "不可用"],
    "medium": ["风险", "延迟", "异常", "超时", "报错"],
    "low": ["提醒", "注意", "观察", "待确认"],
}
BOOL_FIELDS = ["extract_actions", "extract_risks"]
VALID_STYLES = {"daily_report", "digest", "meeting_digest", "executive"}
VALID_LENGTHS = {"tldr", "short", "standard", "detailed"}


def _clean_text(text: str) -> str:
    return re.sub(r"\s+", " ", (text or "").strip())


def _normalize_key(text: str) -> str:
    text = _clean_text(text).lower()
    return re.sub(r"[^\w\u4e00-\u9fff]+", "", text)


def _split_sentences(text: str) -> list[str]:
    raw_parts = re.split(r"[。!?;.!?;]+|\n+", text)
    return [_clean_text(part) for part in raw_parts if _clean_text(part)]


def _sentence_tokens(sentence: str) -> list[str]:
    return re.findall(r"[A-Za-z0-9_-]+|[\u4e00-\u9fff]{2,}", sentence.lower())


def _top_sentences(texts: list[str], limit: int) -> list[str]:
    sentences: list[str] = []
    for text in texts:
        sentences.extend(_split_sentences(text))
    if not sentences:
        return []
    token_counter = Counter()
    for sentence in sentences:
        token_counter.update(_sentence_tokens(sentence))
    scored: list[tuple[int, int, str]] = []
    for index, sentence in enumerate(sentences):
        score = sum(token_counter[token] for token in _sentence_tokens(sentence))
        scored.append((score, -index, sentence))
    ranked = [sentence for _, _, sentence in sorted(scored, reverse=True)]
    unique_ranked = []
    seen = set()
    for sentence in ranked:
        key = _normalize_key(sentence)
        if key and key not in seen:
            seen.add(key)
            unique_ranked.append(sentence)
        if len(unique_ranked) >= limit:
            break
    return unique_ranked


def _trim_fragment(text: str, max_length: int = 80) -> str:
    fragment = re.split(r"[,。;;!?]", text, maxsplit=1)[0]
    fragment = _clean_text(fragment)
    return fragment[:max_length].strip()


def _extract_actions(texts: list[str]) -> list[dict[str, Any]]:
    items: list[dict[str, Any]] = []
    for text in texts:
        for sentence in _split_sentences(text):
            for pattern in ACTION_PREFIX_PATTERNS:
                match = re.match(pattern, sentence, re.IGNORECASE)
                if not match:
                    continue
                task = _trim_fragment(match.group(1))
                if len(task) < 2:
                    continue
                items.append({"task": task, "owner": None, "due_at": None, "blocker": None})
                break
    dedup = []
    seen = set()
    for item in items:
        key = _normalize_key(item["task"])
        if key and key not in seen:
            seen.add(key)
            dedup.append(item)
    return dedup[:10]


def _extract_risks(texts: list[str]) -> list[dict[str, Any]]:
    risks: list[dict[str, Any]] = []
    for text in texts:
        for sentence in _split_sentences(text):
            lowered = sentence.lower()
            for impact, keywords in RISK_KEYWORDS.items():
                matched = next((keyword for keyword in keywords if keyword.lower() in lowered), None)
                if not matched:
                    continue
                start = max(0, lowered.find(matched.lower()) - 18)
                end = min(len(sentence), lowered.find(matched.lower()) + len(matched) + 30)
                fragment = _clean_text(sentence[start:end])
                fragment = fragment[:120]
                if len(fragment) < 2:
                    continue
                risks.append({"risk": fragment, "impact": impact, "mitigation": None})
                break
    dedup = []
    seen = set()
    for item in risks:
        key = _normalize_key(item["risk"])
        if key and key not in seen:
            seen.add(key)
            dedup.append(item)
    return dedup[:10]


def _build_summary_line(sentences: list[str]) -> str:
    if not sentences:
        return "暂无可提炼的关键信息。"
    selected = sentences[:2]
    return "".join(selected)


def build_summary(payload: dict[str, Any]) -> dict[str, Any]:
    data = payload.get("data", {})
    sources = data.get("sources", [])
    texts = [_clean_text(source.get("content", "")) for source in sources if _clean_text(source.get("content", ""))]
    length = data.get("length", "standard")
    style = data.get("style", "daily_report")
    limit = SUMMARY_LENGTH_LIMITS.get(length, SUMMARY_LENGTH_LIMITS["standard"])
    top_sentences = _top_sentences(texts, limit)
    summary_line = _build_summary_line(top_sentences)
    summary_keys = {_normalize_key(sentence) for sentence in top_sentences[:2]}
    detail_sentences = [sentence for sentence in top_sentences if _normalize_key(sentence) not in summary_keys]
    sections = []
    if detail_sentences:
        if len(detail_sentences) == 1:
            sections = [{"title": "Key Updates", "bullets": detail_sentences}]
        else:
            midpoint = max(1, len(detail_sentences) // 2)
            sections = [
                {"title": "Key Updates", "bullets": detail_sentences[:midpoint]},
                {"title": "Notable Details", "bullets": detail_sentences[midpoint:]},
            ]
    action_items = _extract_actions(texts) if data.get("extract_actions") else []
    risk_items = _extract_risks(texts) if data.get("extract_risks") else []
    markdown_lines = ["# Summary", "", "## Summary", f"- {summary_line}"]
    for section in sections:
        if not section["bullets"]:
            continue
        markdown_lines.extend(["", f"## {section['title']}"])
        markdown_lines.extend(f"- {bullet}" for bullet in section["bullets"])
    if action_items:
        markdown_lines.extend(["", "## Action Items"])
        markdown_lines.extend(f"- {item['task']}" for item in action_items)
    if risk_items:
        markdown_lines.extend(["", "## Risks"])
        markdown_lines.extend(f"- [{item['impact']}] {item['risk']}" for item in risk_items)
    schedule_payload = {
        "suggested_name": "Daily Summary",
        "message": "[Scheduled Task Triggered] 请立即汇总最新内容并输出结构化摘要,如有行动项和风险请一并列出,然后选择合适的通知方式发送给用户。",
    }
    return {
        "summary": summary_line,
        "sections": sections,
        "action_items": action_items,
        "risk_items": risk_items,
        "markdown": "\n".join(markdown_lines),
        "schedule_payload": schedule_payload,
        "style": style,
    }


def _validate_source(source: Any, index: int) -> list[str]:
    errors = []
    if not isinstance(source, dict):
        return [f"data.sources[{index}] must be an object"]
    if not _clean_text(str(source.get("content", ""))):
        errors.append(f"data.sources[{index}].content is required")
    return errors


def validate_payload(payload: dict[str, Any]) -> list[str]:
    errors = []
    data = payload.get("data")
    if not isinstance(data, dict):
        return ["data must be an object"]
    sources = data.get("sources")
    if not isinstance(sources, list) or not sources:
        errors.append("data.sources must be a non-empty array")
    else:
        for index, source in enumerate(sources):
            errors.extend(_validate_source(source, index))
    objective = data.get("objective")
    if not isinstance(objective, str) or not objective.strip():
        errors.append("data.objective is required")
    style = data.get("style")
    if style is not None and style not in VALID_STYLES:
        errors.append(f"data.style must be one of {sorted(VALID_STYLES)}")
    length = data.get("length")
    if length is not None and length not in VALID_LENGTHS:
        errors.append(f"data.length must be one of {sorted(VALID_LENGTHS)}")
    for field in BOOL_FIELDS:
        value = data.get(field)
        if value is not None and not isinstance(value, bool):
            errors.append(f"data.{field} must be a boolean")
    return errors

@ -0,0 +1,22 @@
{
"name": "rag-retrieve",
"description": "Provides RAG and table RAG retrieval tools through a PrePrompt hook and MCP server.",
"hooks": {
"PrePrompt": [
{
"type": "command",
"command": "python hooks/pre_prompt.py"
}
]
},
"mcpServers": {
"rag_retrieve": {
"transport": "stdio",
"command": "python",
"args": [
"./rag_retrieve_server.py",
"{bot_id}"
]
}
}
}

@ -0,0 +1,99 @@
# RAG Retrieve
An example autoload skill that demonstrates how to integrate the `rag_retrieve` and `table_rag_retrieve` tools through Claude Plugins hooks and an MCP server.
## Overview
This skill uses a `PrePrompt` hook to inject retrieval guidance into the prompt, and starts an MCP server that exposes retrieval capabilities for the current bot.
### PrePrompt Hook
Runs when the system prompt is loaded and injects retrieval policy content.
- File: `hooks/pre_prompt.py`
- Purpose: load retrieval instructions and add them to the prompt context
### MCP Server
Provides retrieval tools over stdio for the current `bot_id`.
- File: `rag_retrieve_server.py`
- Purpose: expose the `rag_retrieve` and `table_rag_retrieve` tools to the agent
## Directory Structure
```text
rag-retrieve/
├── README.md # Skill documentation
├── .claude-plugin/
│ └── plugin.json # Hook and MCP server configuration
├── hooks/
│ ├── pre_prompt.py # PrePrompt hook script
│ └── retrieval-policy.md # Retrieval policy injected into the prompt
├── mcp_common.py # Shared MCP utilities
├── rag_retrieve_server.py # MCP server entrypoint
└── rag_retrieve_tools.json # Tool definitions
```
## `plugin.json` Format
```json
{
  "name": "rag-retrieve",
  "description": "rag-retrieve and table-rag-retrieve",
  "hooks": {
    "PrePrompt": [
      {
        "type": "command",
        "command": "python hooks/pre_prompt.py"
      }
    ]
  },
  "mcpServers": {
    "rag_retrieve": {
      "transport": "stdio",
      "command": "python",
      "args": [
        "./skills_autoload/rag-retrieve/rag_retrieve_server.py",
        "{bot_id}"
      ]
    }
  }
}
```
## Hook Script Behavior
The hook script runs as a subprocess, receives input through environment variables, and writes the injected content to stdout.
### Available Environment Variables
| Environment Variable | Description | Applies To |
|----------------------|-------------|------------|
| `ASSISTANT_ID` | Bot ID | All hooks |
| `USER_IDENTIFIER` | User identifier | All hooks |
| `SESSION_ID` | Session ID | All hooks |
| `LANGUAGE` | Language code | All hooks |
| `HOOK_TYPE` | Hook type | All hooks |
### PrePrompt Example
```python
#!/usr/bin/env python3
import os
import sys


def main():
    user_identifier = os.environ.get('USER_IDENTIFIER', '')
    bot_id = os.environ.get('ASSISTANT_ID', '')
    print(f"## Retrieval Context\n\nUser: {user_identifier}\nBot: {bot_id}")
    return 0


if __name__ == '__main__':
    sys.exit(main())
```
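For the host side, the following is a rough sketch of how a hook command could be run as a subprocess with context passed through environment variables and the injected content read from stdout. How the actual host invokes hooks is not shown in this repository excerpt, so `run_hook` and the sample values are assumptions.

```python
import os
import subprocess
import sys


def run_hook(command: list[str], context: dict[str, str]) -> str:
    """Hypothetical runner: execute a hook command and return what it printed."""
    # Merge the hook context into the inherited environment.
    env = {**os.environ, **context}
    completed = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return completed.stdout


# Inline stand-in for a hook script, used only to keep the sketch self-contained.
demo_hook = [
    sys.executable,
    "-c",
    "import os; print('Bot: ' + os.environ.get('ASSISTANT_ID', ''))",
]
print(run_hook(demo_hook, {"ASSISTANT_ID": "bot-123", "HOOK_TYPE": "PrePrompt"}))
```

Passing `check=True` makes a non-zero hook exit code raise `CalledProcessError` instead of silently injecting partial output.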
## Example Use Cases
1. **Prompt-time retrieval guidance**: inject retrieval rules before the model starts reasoning
2. **Bot-specific retrieval setup**: start the MCP server with the current `bot_id`
3. **Unified retrieval access**: expose RAG and table RAG tools through a single skill

@ -0,0 +1,55 @@
# Retrieval Policy
### 1. Retrieval Order and Tool Selection
- Follow this section for source choice, tool choice, query rewrite, `top_k`, fallback, result handling, and citations.
- Use this default retrieval order and execute it sequentially: skill-enabled knowledge retrieval tools > `rag_retrieve` / `table_rag_retrieve`.
- Do NOT answer from model knowledge first.
- Do NOT bypass the retrieval flow and inspect local filesystem documents on your own.
- Do NOT use local filesystem retrieval as a fallback knowledge source.
- Local filesystem documents are not a recommended retrieval source here because file formats are inconsistent and have not been normalized or parsed for reliable knowledge lookup.
- Knowledge must be retrieved through the supported knowledge tools only: skill-enabled retrieval scripts, `table_rag_retrieve`, and `rag_retrieve`.
- When a suitable skill-enabled knowledge retrieval tool is available, use it first.
- If no suitable skill-enabled retrieval tool is available, or if its result is insufficient, continue with `rag_retrieve` or `table_rag_retrieve`.
- Use `table_rag_retrieve` first for values, prices, quantities, inventory, specifications, rankings, comparisons, summaries, extraction, lists, tables, name lookup, historical coverage, mixed questions, and unclear cases.
- Use `rag_retrieve` first only for clearly pure concept, definition, workflow, policy, or explanation questions without structured data needs.
- After each retrieval step, evaluate sufficiency before moving to the next source. Do NOT run these retrieval sources in parallel.
### 2. Query Preparation
- Do NOT pass the raw user question unless it already works well for retrieval.
- Rewrite for recall: extract entity, time scope, attributes, and intent.
- Add useful variants: synonyms, aliases, abbreviations, related titles, historical names, and category terms.
- Expand list-style, extraction, overview, historical, roster, timeline, and archive queries more aggressively.
- Preserve meaning. Do NOT introduce unrelated topics.
### 3. Retrieval Breadth (`top_k`)
- Apply `top_k` only to `rag_retrieve`. Use the smallest sufficient value, then expand only if coverage is insufficient.
- Use `30` for simple fact lookup.
- Use `50` for moderate synthesis, comparison, summarization, or disambiguation.
- Use `100` for broad recall, such as comprehensive analysis, scattered knowledge, multiple entities or periods, or list / catalog / timeline / roster / overview requests.
- Raise `top_k` when keyword branches are many or results are too few, repetitive, incomplete, sparse, or too narrow.
- Use this expansion order: `30 -> 50 -> 100`. If unsure, use `100`.
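The `top_k` rules above can be sketched as two small helpers. The breadth labels (`"simple"`, `"moderate"`, `"broad"`) and the function names are illustrative assumptions; only the values `30`, `50`, `100` and their order come from this policy.

```python
from typing import Optional

TOP_K_STEPS = (30, 50, 100)


def initial_top_k(breadth: Optional[str]) -> int:
    """Pick a starting top_k; fall back to 100 when the breadth is unclear."""
    return {"simple": 30, "moderate": 50, "broad": 100}.get(breadth, 100)


def next_top_k(current: int) -> Optional[int]:
    """Return the next larger step in 30 -> 50 -> 100, or None when exhausted."""
    for step in TOP_K_STEPS:
        if step > current:
            return step
    return None
```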
### 4. Result Evaluation
- Treat results as insufficient if they are empty, start with `Error:`, say `no excel files found`, are off-topic, miss the core entity or scope, or provide no usable evidence.
- Also treat results as insufficient when they cover only part of the request, or when full-list, historical, comparison, or mixed data + explanation requests return only partial or truncated coverage.
### 5. Fallback and Sequential Retry
- If the first retrieval result is insufficient, call the next supported retrieval source in the default order before replying.
- `table_rag_retrieve` now performs an internal fallback to `rag_retrieve` when it returns `no excel files found`, but this does NOT change the higher-level retrieval order.
- If `table_rag_retrieve` is insufficient or empty, continue with `rag_retrieve`.
- If `rag_retrieve` is insufficient or empty, continue with `table_rag_retrieve`.
- Say no relevant information was found only after all applicable skill-enabled retrieval tools, `rag_retrieve`, and `table_rag_retrieve` have been tried and still do not provide enough evidence.
- Do NOT reply that no relevant information was found before the supported knowledge retrieval flow has been exhausted.
### 6. Table RAG Result Handling
- Follow all `[INSTRUCTION]` and `[EXTRA_INSTRUCTION]` content in `table_rag_retrieve` results.
- If results are truncated, explicitly tell the user total matches (`N+M`), displayed count (`N`), and omitted count (`M`).
- Cite data sources using filenames from `file_ref_table`.
### 7. Citation Requirements for Retrieved Knowledge
- When using knowledge from `rag_retrieve` or `table_rag_retrieve`, you MUST generate `<CITATION ... />` tags.
- Follow the citation format returned by each tool.
- Place citations immediately after the paragraph or bullet list that uses the knowledge.
- Do NOT collect citations at the end.
- Use 1-2 citations per paragraph or bullet list when possible.
- If retrieved knowledge is used, include at least one `<CITATION ... />` tag.

@ -0,0 +1,20 @@
#!/usr/bin/env python3
"""
PrePrompt hook: retrieval policy loader.
Runs when the system prompt is loaded.
Reads retrieval-policy.md from the same directory and prints it to stdout for injection into the prompt.
"""
import sys
from pathlib import Path


def main():
    prompt_file = Path(__file__).parent / "retrieval-policy.md"
    if prompt_file.exists():
        print(prompt_file.read_text(encoding="utf-8"))
    return 0


if __name__ == '__main__':
    sys.exit(main())

@ -0,0 +1,80 @@
# Retrieval Policy
## 0. Task Classification
Classify the request before acting:
- **Knowledge retrieval** (facts, summaries, comparisons, prices, lists, timelines, extraction, etc.): follow this policy strictly.
- **Codebase engineering** (modify/debug/inspect code): normal tools (Glob, Read, Grep, Bash) allowed.
- **Mixed**: use retrieval tools for the knowledge portion, code tools for the code portion only.
- **Uncertain**: default to knowledge retrieval.
## 1. Critical Enforcement
For knowledge retrieval tasks, **this policy overrides generic codebase exploration behavior**.
- **Prohibited tools**: `Glob`, `Read`, `LS`, Bash (`ls`, `find`, `cat`, `head`, `tail`, `grep`, etc.) — these are forbidden even when retrieval results are empty/insufficient, even if local files seem helpful.
- **Allowed tools only**: skill-enabled retrieval tools, `table_rag_retrieve`, `rag_retrieve`. No other source for factual answering.
- Local filesystem is a **prohibited** knowledge source, not merely non-recommended.
- Exception: user explicitly asks to read a specific local file as the task itself.
## 2. Retrieval Order and Tool Selection
Execute **sequentially, one at a time**. Do NOT run in parallel. Do NOT probe filesystem first.
1. **Skill-enabled retrieval tools** (use first when available)
2. **`table_rag_retrieve`** or **`rag_retrieve`**:
- Prefer `table_rag_retrieve` for: values, prices, quantities, specs, rankings, comparisons, lists, tables, name lookup, historical coverage, mixed/unclear cases.
- Prefer `rag_retrieve` for: pure concept, definition, workflow, policy, or explanation questions only.
- Do NOT answer from model knowledge first.
- After each step, evaluate sufficiency before proceeding.
## 3. Query Preparation
- Do NOT pass raw user question unless it already works well for retrieval.
- Rewrite for recall: extract entity, time scope, attributes, intent. Add synonyms, aliases, abbreviations, historical names, category terms.
- Expand list/extraction/overview/timeline queries more aggressively. Preserve meaning.
## 4. Retrieval Breadth (`top_k`)
- Apply `top_k` only to `rag_retrieve`. Use smallest sufficient value, expand if insufficient.
- `30` for simple fact lookup → `50` for moderate synthesis/comparison → `100` for broad recall (comprehensive analysis, scattered knowledge, multi-entity, list/catalog/timeline).
- Expansion order: `30 → 50 → 100`. If unsure, use `100`.
## 5. Result Evaluation
Treat as insufficient if: empty, `Error:`, `no excel files found`, off-topic, missing core entity/scope, no usable evidence, partial coverage, or truncated results.
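The mechanical part of this check can be sketched as below. It covers only the string-level signals (empty output, an `Error:` prefix, `no excel files found`); judging off-topic, partial, or truncated results still needs the model's own reading. The function name is an assumption.

```python
def is_insufficient(result_text: str) -> bool:
    """Flag results that are empty or carry a known failure marker."""
    text = (result_text or "").strip()
    if not text:
        return True
    if text.startswith("Error:"):
        return True
    # Case-insensitive match on the table-retrieval miss message.
    return "no excel files found" in text.lower()
```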
## 6. Fallback and Sequential Retry
On insufficient results, follow this sequence:
1. Rewrite query, retry same tool (once)
2. Switch to next retrieval source in default order
3. For `rag_retrieve`, expand `top_k`: `30 → 50 → 100`
4. `table_rag_retrieve` insufficient → try `rag_retrieve`; `rag_retrieve` insufficient → try `table_rag_retrieve`
- `table_rag_retrieve` internally falls back to `rag_retrieve` on `no excel files found`, but this does NOT change the higher-level order.
- Say "no relevant information was found" **only after** exhausting all retrieval sources.
- Do NOT switch to local filesystem inspection at any point.
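A simplified sketch of the sequential retry order described above, under the assumption that each tool is a plain callable taking a query string. `top_k` expansion is left out for brevity, and all names here are hypothetical.

```python
from typing import Callable, Optional

Tool = Callable[[str], str]


def retrieve_with_fallback(
    query: str,
    rewritten_query: str,
    first_tool: Tool,
    second_tool: Tool,
    is_sufficient: Callable[[str], bool],
) -> Optional[str]:
    """Run retrieval attempts one at a time and stop at the first sufficient result."""
    attempts = [
        (first_tool, query),             # initial attempt
        (first_tool, rewritten_query),   # rewrite the query and retry the same tool once
        (second_tool, rewritten_query),  # switch to the other retrieval source
    ]
    for tool, attempt_query in attempts:
        result = tool(attempt_query)
        if is_sufficient(result):
            return result
    # All sources were tried; only now may the caller report that nothing was found.
    return None
```

The attempts run strictly in order, so no two retrieval sources are ever queried in parallel.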
## 7. Table RAG Result Handling
- Follow all `[INSTRUCTION]` and `[EXTRA_INSTRUCTION]` in results.
- If truncated: tell user total (`N+M`), displayed (`N`), omitted (`M`).
- Cite sources using filenames from `file_ref_table`.
## 8. Citation Requirements
- MUST generate `<CITATION ... />` tags when using retrieval results.
- Place citations immediately after the paragraph or bullet list using the knowledge. Do NOT collect at end.
- 1-2 citations per paragraph/bullet. At least 1 citation when using retrieved knowledge.
## 9. Pre-Reply Self-Check
Before replying to a knowledge retrieval task, verify:
- Used only whitelisted retrieval tools — no local filesystem inspection?
- Exhausted retrieval flow before concluding "not found"?
- Citations placed immediately after each relevant paragraph?
If any answer is "no", correct the process first.

@ -0,0 +1,251 @@
#!/usr/bin/env python3
"""
Common utility functions for MCP servers.
Provides shared helpers for path handling, file validation, and request handling.
"""
import json
import os
import sys
import asyncio
from typing import Any, Dict, List, Optional, Union
import re


def get_allowed_directory():
    """Return the directory the server is allowed to access."""
    # Prefer the dataset_dir passed as a command-line argument
    if len(sys.argv) > 1:
        dataset_dir = sys.argv[1]
        return os.path.abspath(dataset_dir)
    # Otherwise read the project data directory from the environment
    project_dir = os.getenv("PROJECT_DATA_DIR", "./projects/data")
    return os.path.abspath(project_dir)


def resolve_file_path(file_path: str, default_subfolder: str = "default") -> str:
    """
    Resolve a file path. Supports both the folder/document.txt and document.txt forms.

    Args:
        file_path: The input file path
        default_subfolder: Default subfolder name used when only a file name is given

    Returns:
        The resolved full file path
    """
    # If the path contains a folder separator, use it directly
    if '/' in file_path or '\\' in file_path:
        clean_path = file_path.replace('\\', '/')
        # Remove the projects/ prefix if present
        if clean_path.startswith('projects/'):
            clean_path = clean_path[9:]  # strip 'projects/'
        elif clean_path.startswith('./projects/'):
            clean_path = clean_path[11:]  # strip './projects/'
    else:
        # Only a file name was given, so add the default subfolder
        clean_path = f"{default_subfolder}/{file_path}"
    # Get the allowed directory
    project_data_dir = get_allowed_directory()
    # Try to find the file in the project directory.
    # Note: lstrip('./') strips any leading '.' and '/' characters, not only a literal './' prefix.
    full_path = os.path.join(project_data_dir, clean_path.lstrip('./'))
    if os.path.exists(full_path):
        return full_path
    # If the direct path does not exist, try a recursive search
    found = find_file_in_project(clean_path, project_data_dir)
    if found:
        return found
    # For a bare file name missing from the default subfolder, try the root directory
    if '/' not in file_path and '\\' not in file_path:
        root_path = os.path.join(project_data_dir, file_path)
        if os.path.exists(root_path):
            return root_path
    raise FileNotFoundError(f"File not found: {file_path} (searched in {project_data_dir})")


def find_file_in_project(filename: str, project_dir: str) -> Optional[str]:
    """Search the project directory for a file."""
    # If filename contains a path, only look in that path
    if '/' in filename:
        parts = filename.split('/')
        target_file = parts[-1]
        search_dir = os.path.join(project_dir, *parts[:-1])
        if os.path.exists(search_dir):
            target_path = os.path.join(search_dir, target_file)
            if os.path.exists(target_path):
                return target_path
    else:
        # Bare file name: search the whole project directory recursively
        for root, dirs, files in os.walk(project_dir):
            if filename in files:
                return os.path.join(root, filename)
    return None


def load_tools_from_json(tools_file_name: str) -> List[Dict[str, Any]]:
    """Load tool definitions from a JSON file."""
    try:
        tools_file = os.path.join(os.path.dirname(__file__), tools_file_name)
        if os.path.exists(tools_file):
            with open(tools_file, 'r', encoding='utf-8') as f:
                return json.load(f)
        else:
            # If the JSON file does not exist, fall back to an empty definition list
            return []
    except Exception as e:
        # Log to stderr: stdout carries the JSON-RPC stream in stdio mode
        print(f"Warning: Unable to load tool definition JSON file: {str(e)}", file=sys.stderr)
        return []


def create_error_response(request_id: Any, code: int, message: str) -> Dict[str, Any]:
    """Create a standardized error response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": code,
            "message": message
        }
    }


def create_success_response(request_id: Any, result: Any) -> Dict[str, Any]:
    """Create a standardized success response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": result
    }


def create_initialize_response(request_id: Any, server_name: str, server_version: str = "1.0.0") -> Dict[str, Any]:
    """Create a standardized initialize response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "protocolVersion": "2024-11-05",
            "capabilities": {
                "tools": {}
            },
            "serverInfo": {
                "name": server_name,
                "version": server_version
            }
        }
    }


def create_ping_response(request_id: Any) -> Dict[str, Any]:
    """Create a standardized ping response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "pong": True
        }
    }


def create_tools_list_response(request_id: Any, tools: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Create a standardized tools list response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "tools": tools
        }
    }


def is_regex_pattern(pattern: str) -> bool:
    """Detect whether a string looks like a regular expression pattern."""
    # Check the /pattern/ form
    if pattern.startswith('/') and pattern.endswith('/') and len(pattern) > 2:
        return True
    # Check the r"pattern" or r'pattern' form
    if pattern.startswith(('r"', "r'")) and pattern.endswith(('"', "'")) and len(pattern) > 3:
        return True
    # Check for regex special characters
    regex_chars = {'*', '+', '?', '|', '(', ')', '[', ']', '{', '}', '^', '$', '\\', '.'}
    return any(char in pattern for char in regex_chars)


def compile_pattern(pattern: str) -> Union[re.Pattern, str, None]:
    """Compile a regex pattern; return the original string if it is not a regex."""
    if not is_regex_pattern(pattern):
        return pattern
    try:
        # Handle the /pattern/ form
        if pattern.startswith('/') and pattern.endswith('/'):
            regex_body = pattern[1:-1]
            return re.compile(regex_body)
        # Handle the r"pattern" or r'pattern' form
        if pattern.startswith(('r"', "r'")) and pattern.endswith(('"', "'")):
            regex_body = pattern[2:-1]
            return re.compile(regex_body)
        # Compile a string containing regex characters directly
        return re.compile(pattern)
    except re.error as e:
        # Return None to signal an invalid regex; log to stderr to keep stdout clean
        print(f"Warning: Regular expression '{pattern}' compilation failed: {e}", file=sys.stderr)
        return None


async def handle_mcp_streaming(request_handler):
    """Standard main loop for handling MCP requests."""
    try:
        loop = asyncio.get_running_loop()
        while True:
            # Read one line from stdin without blocking the event loop
            line = await loop.run_in_executor(None, sys.stdin.readline)
            if not line:
                break
            line = line.strip()
            if not line:
                continue
            request = None
            try:
                request = json.loads(line)
                response = await request_handler(request)
                # Write to stdout
                sys.stdout.write(json.dumps(response, ensure_ascii=False) + "\n")
                sys.stdout.flush()
            except json.JSONDecodeError:
                error_response = {
                    "jsonrpc": "2.0",
                    "id": None,  # JSON-RPC 2.0 uses a null id when the request cannot be parsed
                    "error": {
                        "code": -32700,
                        "message": "Parse error"
                    }
                }
                sys.stdout.write(json.dumps(error_response, ensure_ascii=False) + "\n")
                sys.stdout.flush()
            except Exception as e:
                error_response = {
                    "jsonrpc": "2.0",
                    "id": request.get("id") if isinstance(request, dict) else None,
                    "error": {
                        "code": -32603,
                        "message": f"Internal error: {str(e)}"
                    }
                }
                sys.stdout.write(json.dumps(error_response, ensure_ascii=False) + "\n")
                sys.stdout.flush()
    except KeyboardInterrupt:
        pass

@ -7,6 +7,7 @@ RAG retrieval MCP server
import asyncio
import hashlib
import json
import re
import sys
import os
from typing import Any, Dict, List
@ -218,6 +219,12 @@ def table_rag_retrieve(query: str) -> Dict[str, Any]:
if "markdown" in response_data:
markdown_content = response_data["markdown"]
if re.search(r"^no excel files found", markdown_content, re.IGNORECASE):
rag_result = rag_retrieve(query)
content = rag_result.get("content", [])
if content and content[0].get("type") == "text":
content[0]["text"] = "No table_rag_retrieve results were found. The content below is the fallback result from rag_retrieve\n\n" + content[0]["text"]
return rag_result
return {
"content": [
{

@ -0,0 +1,35 @@
[
{
"name": "rag_retrieve",
"description": "Retrieve relevant documents from the knowledge base. Returns markdown results. Use this tool first only for clearly pure concept, definition, workflow, policy, or explanation questions without structured data needs. If the result is insufficient, try table_rag_retrieve before replying with no result.",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Retrieval query content. Rewrite the query when needed to improve recall."
},
"top_k": {
"type": "integer",
"description": "Number of top results to retrieve. Choose dynamically based on retrieval breadth and coverage needs.",
"default": 100
}
},
"required": ["query"]
}
},
{
"name": "table_rag_retrieve",
"description": "Retrieve relevant table data from Excel or spreadsheet files in the knowledge base. Returns markdown results. Use this tool first for structured data, lists, statistics, extraction, mixed questions, and unclear cases. If the result is insufficient, try rag_retrieve before replying with no result.",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Retrieval query content for table data. Rewrite the query when needed to improve recall."
}
},
"required": ["query"]
}
}
]
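For illustration, a call to one of the tools defined above could be framed as a single line of JSON-RPC on stdin. This sketch assumes the server handles the standard MCP `tools/call` method, which is not shown in this diff; the helper name and the sample query are made up.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Hypothetical helper: serialize one newline-delimited JSON-RPC tool call."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request, ensure_ascii=False)


# `query` is required by the schema; `top_k` is optional for rag_retrieve.
line = build_tool_call(1, "rag_retrieve", {"query": "refund policy definition", "top_k": 30})
print(line)
```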

@ -0,0 +1,22 @@
{
"name": "rag-retrieve",
"description": "Provides RAG and table RAG retrieval tools through a PrePrompt hook and MCP server.",
"hooks": {
"PrePrompt": [
{
"type": "command",
"command": "python hooks/pre_prompt.py"
}
]
},
"mcpServers": {
"rag_retrieve": {
"transport": "stdio",
"command": "python",
"args": [
"./rag_retrieve_server.py",
"{bot_id}"
]
}
}
}


@ -0,0 +1,99 @@
# RAG Retrieve
An example autoload skill that demonstrates how to integrate the `rag_retrieve` and `table_rag_retrieve` tools through Claude Plugins hooks and an MCP server.
## Overview
This skill uses a `PrePrompt` hook to inject retrieval guidance into the prompt, and starts an MCP server that exposes retrieval capabilities for the current bot.
### PrePrompt Hook
Runs when the system prompt is loaded and injects retrieval policy content.
- File: `hooks/pre_prompt.py`
- Purpose: load retrieval instructions and add them to the prompt context
### MCP Server
Provides retrieval tools over stdio for the current `bot_id`.
- File: `rag_retrieve_server.py`
- Purpose: expose the `rag_retrieve` and `table_rag_retrieve` tools to the agent
## Directory Structure
```text
rag-retrieve/
├── README.md                   # Skill documentation
├── .claude-plugin/
│   └── plugin.json             # Hook and MCP server configuration
├── hooks/
│   ├── pre_prompt.py           # PrePrompt hook script
│   └── retrieval-policy.md     # Retrieval policy injected into the prompt
├── mcp_common.py               # Shared MCP utilities
├── rag_retrieve_server.py      # MCP server entrypoint
└── rag_retrieve_tools.json     # Tool definitions
```
## `plugin.json` Format
```json
{
  "name": "rag-retrieve",
  "description": "rag-retrieve and table-rag-retrieve",
  "hooks": {
    "PrePrompt": [
      {
        "type": "command",
        "command": "python hooks/pre_prompt.py"
      }
    ]
  },
  "mcpServers": {
    "rag_retrieve": {
      "transport": "stdio",
      "command": "python",
      "args": [
        "./skills_autoload/rag-retrieve/rag_retrieve_server.py",
        "{bot_id}"
      ]
    }
  }
}
```
## Hook Script Behavior
The hook script runs as a subprocess, receives input through environment variables, and writes the injected content to stdout.
### Available Environment Variables
| Environment Variable | Description | Applies To |
|----------------------|-------------|------------|
| `ASSISTANT_ID` | Bot ID | All hooks |
| `USER_IDENTIFIER` | User identifier | All hooks |
| `SESSION_ID` | Session ID | All hooks |
| `LANGUAGE` | Language code | All hooks |
| `HOOK_TYPE` | Hook type | All hooks |
### PrePrompt Example
```python
#!/usr/bin/env python3
import os
import sys


def main():
    # Hook input arrives through environment variables
    user_identifier = os.environ.get('USER_IDENTIFIER', '')
    bot_id = os.environ.get('ASSISTANT_ID', '')
    # Whatever is printed to stdout is injected into the prompt
    print(f"## Retrieval Context\n\nUser: {user_identifier}\nBot: {bot_id}")
    return 0


if __name__ == '__main__':
    sys.exit(main())
```
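The hook contract described above (subprocess, environment variables in, stdout out) can be exercised from the host side roughly as follows. This is a minimal sketch, not the actual loader: `run_hook`, the inline `HOOK_SOURCE` stand-in script, and the choice to ignore a failing hook are all assumptions made for illustration.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical stand-in for hooks/pre_prompt.py so the sketch is self-contained.
HOOK_SOURCE = textwrap.dedent("""
    import os
    print("Bot: " + os.environ.get("ASSISTANT_ID", ""))
""")


def run_hook(script_path, env_overrides, timeout=10.0):
    """Run a hook script as a subprocess and return its stdout (hypothetical helper)."""
    # Pass hook input through environment variables on top of the current environment.
    env = {**os.environ, **env_overrides}
    completed = subprocess.run(
        [sys.executable, script_path],
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Assumption: a failing hook injects nothing instead of breaking prompt assembly.
    if completed.returncode != 0:
        return ""
    return completed.stdout


def demo():
    """Write the stand-in hook to a temp directory and run it once."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "pre_prompt.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write(HOOK_SOURCE)
        return run_hook(script, {"ASSISTANT_ID": "bot-123", "HOOK_TYPE": "PrePrompt"})


if __name__ == "__main__":
    print(demo())
```

Keeping the hook in its own process means a misbehaving script can be bounded with a timeout and cannot modify the host's state directly.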
## Example Use Cases
1. **Prompt-time retrieval guidance**: inject retrieval rules before the model starts reasoning
2. **Bot-specific retrieval setup**: start the MCP server with the current `bot_id`
3. **Unified retrieval access**: expose RAG and table RAG tools through a single skill
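
To illustrate the stdio transport, the sketch below frames a `tools/call` request as one line of JSON and reads the text item back out of a response line. It assumes the server exchanges newline-delimited JSON-RPC 2.0 messages and returns tool results as a `content` list of `{"type": "text", ...}` items; `build_tools_call` and `extract_text` are hypothetical helper names, not part of the skill.

```python
import json


def build_tools_call(request_id, tool_name, arguments):
    """Serialize a JSON-RPC 2.0 `tools/call` request as one newline-terminated line."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request, ensure_ascii=False) + "\n"


def extract_text(response_line):
    """Return the first text item of a tool result, or raise on a JSON-RPC error."""
    response = json.loads(response_line)
    if "error" in response:
        raise RuntimeError(response["error"]["message"])
    for item in response["result"].get("content", []):
        if item.get("type") == "text":
            return item["text"]
    return ""


if __name__ == "__main__":
    line = build_tools_call(1, "rag_retrieve", {"query": "refund policy", "top_k": 30})
    print(line, end="")
```

A client would write the request line to the server's stdin and pass each line read from its stdout to `extract_text`.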


@ -0,0 +1,55 @@
# Retrieval Policy
### 1. Retrieval Order and Tool Selection
- Follow this section for source choice, tool choice, query rewrite, `top_k`, fallback, result handling, and citations.
- Use this default retrieval order and execute it sequentially: skill-enabled knowledge retrieval tools > `rag_retrieve` / `table_rag_retrieve`.
- Do NOT answer from model knowledge first.
- Do NOT bypass the retrieval flow and inspect local filesystem documents on your own.
- Do NOT use local filesystem retrieval as a fallback knowledge source.
- Local filesystem documents are not a recommended retrieval source here because file formats are inconsistent and have not been normalized or parsed for reliable knowledge lookup.
- Knowledge must be retrieved through the supported knowledge tools only: skill-enabled retrieval scripts, `table_rag_retrieve`, and `rag_retrieve`.
- When a suitable skill-enabled knowledge retrieval tool is available, use it first.
- If no suitable skill-enabled retrieval tool is available, or if its result is insufficient, continue with `rag_retrieve` or `table_rag_retrieve`.
- Use `table_rag_retrieve` first for values, prices, quantities, inventory, specifications, rankings, comparisons, summaries, extraction, lists, tables, name lookup, historical coverage, mixed questions, and unclear cases.
- Use `rag_retrieve` first only for clearly pure concept, definition, workflow, policy, or explanation questions without structured data needs.
- After each retrieval step, evaluate sufficiency before moving to the next source. Do NOT run these retrieval sources in parallel.
### 2. Query Preparation
- Do NOT pass the raw user question unless it already works well for retrieval.
- Rewrite for recall: extract entity, time scope, attributes, and intent.
- Add useful variants: synonyms, aliases, abbreviations, related titles, historical names, and category terms.
- Expand list-style, extraction, overview, historical, roster, timeline, and archive queries more aggressively.
- Preserve meaning. Do NOT introduce unrelated topics.
### 3. Retrieval Breadth (`top_k`)
- Apply `top_k` only to `rag_retrieve`. Use the smallest sufficient value, then expand only if coverage is insufficient.
- Use `30` for simple fact lookup.
- Use `50` for moderate synthesis, comparison, summarization, or disambiguation.
- Use `100` for broad recall, such as comprehensive analysis, scattered knowledge, multiple entities or periods, or list / catalog / timeline / roster / overview requests.
- Raise `top_k` when keyword branches are many or results are too few, repetitive, incomplete, sparse, or too narrow.
- Use this expansion order: `30 -> 50 -> 100`. If unsure, use `100`.
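
The `top_k` rules above can be read as a small lookup plus an escalation ladder. The sketch below is one possible encoding; the breadth labels (`simple`, `moderate`, `broad`) and both function names are assumptions introduced for illustration, not part of the tool API.

```python
# Expansion ladder taken from the policy: 30 -> 50 -> 100.
TOP_K_LADDER = (30, 50, 100)

# Hypothetical breadth labels mapped to the starting values named in the policy.
INITIAL_TOP_K = {"simple": 30, "moderate": 50, "broad": 100}


def initial_top_k(breadth):
    """Pick the starting top_k; anything unrecognized falls back to 100 ("if unsure")."""
    return INITIAL_TOP_K.get(breadth, 100)


def next_top_k(current):
    """Return the next larger value on the ladder, or None when already at the maximum."""
    for value in TOP_K_LADDER:
        if value > current:
            return value
    return None


if __name__ == "__main__":
    k = initial_top_k("simple")
    while k is not None:
        print(k)
        k = next_top_k(k)
```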
### 4. Result Evaluation
- Treat results as insufficient if they are empty, start with `Error:`, say `no excel files found`, are off-topic, miss the core entity or scope, or provide no usable evidence.
- Also treat results as insufficient when they cover only part of the request, or when full-list, historical, comparison, or mixed data + explanation requests return only partial or truncated coverage.
### 5. Fallback and Sequential Retry
- If the first retrieval result is insufficient, call the next supported retrieval source in the default order before replying.
- `table_rag_retrieve` now performs an internal fallback to `rag_retrieve` when it returns `no excel files found`, but this does NOT change the higher-level retrieval order.
- If `table_rag_retrieve` is insufficient or empty, continue with `rag_retrieve`.
- If `rag_retrieve` is insufficient or empty, continue with `table_rag_retrieve`.
- Say no relevant information was found only after all applicable skill-enabled retrieval tools, `rag_retrieve`, and `table_rag_retrieve` have been tried and still do not provide enough evidence.
- Do NOT reply that no relevant information was found before the supported knowledge retrieval flow has been exhausted.
### 6. Table RAG Result Handling
- Follow all `[INSTRUCTION]` and `[EXTRA_INSTRUCTION]` content in `table_rag_retrieve` results.
- If results are truncated, explicitly tell the user total matches (`N+M`), displayed count (`N`), and omitted count (`M`).
- Cite data sources using filenames from `file_ref_table`.
### 7. Citation Requirements for Retrieved Knowledge
- When using knowledge from `rag_retrieve` or `table_rag_retrieve`, you MUST generate `<CITATION ... />` tags.
- Follow the citation format returned by each tool.
- Place citations immediately after the paragraph or bullet list that uses the knowledge.
- Do NOT collect citations at the end.
- Use 1-2 citations per paragraph or bullet list when possible.
- If learned knowledge is used, include at least 1 `<CITATION ... />`.


@ -0,0 +1,20 @@
#!/usr/bin/env python3
"""
PrePrompt hook: retrieval policy loader.
Runs when the system prompt is loaded.
Reads retrieval-policy.md from the same directory and prints it to stdout
so that it can be injected into the prompt.
"""
import sys
from pathlib import Path


def main():
    prompt_file = Path(__file__).parent / "retrieval-policy.md"
    if prompt_file.exists():
        print(prompt_file.read_text(encoding="utf-8"))
    return 0


if __name__ == '__main__':
    sys.exit(main())


@ -0,0 +1,80 @@
# Retrieval Policy
## 0. Task Classification
Classify the request before acting:
- **Knowledge retrieval** (facts, summaries, comparisons, prices, lists, timelines, extraction, etc.): follow this policy strictly.
- **Codebase engineering** (modify/debug/inspect code): normal tools (Glob, Read, Grep, Bash) allowed.
- **Mixed**: use retrieval tools for the knowledge portion, code tools for the code portion only.
- **Uncertain**: default to knowledge retrieval.
## 1. Critical Enforcement
For knowledge retrieval tasks, **this policy overrides generic codebase exploration behavior**.
- **Prohibited tools**: `Glob`, `Read`, `LS`, Bash (`ls`, `find`, `cat`, `head`, `tail`, `grep`, etc.) — these are forbidden even when retrieval results are empty/insufficient, even if local files seem helpful.
- **Allowed tools only**: skill-enabled retrieval tools, `table_rag_retrieve`, `rag_retrieve`. No other source for factual answering.
- Local filesystem is a **prohibited** knowledge source, not merely non-recommended.
- Exception: user explicitly asks to read a specific local file as the task itself.
## 2. Retrieval Order and Tool Selection
Execute **sequentially, one at a time**. Do NOT run in parallel. Do NOT probe filesystem first.
1. **Skill-enabled retrieval tools** (use first when available)
2. **`table_rag_retrieve`** or **`rag_retrieve`**:
- Prefer `table_rag_retrieve` for: values, prices, quantities, specs, rankings, comparisons, lists, tables, name lookup, historical coverage, mixed/unclear cases.
- Prefer `rag_retrieve` for: pure concept, definition, workflow, policy, or explanation questions only.
- Do NOT answer from model knowledge first.
- After each step, evaluate sufficiency before proceeding.
## 3. Query Preparation
- Do NOT pass raw user question unless it already works well for retrieval.
- Rewrite for recall: extract entity, time scope, attributes, intent. Add synonyms, aliases, abbreviations, historical names, category terms.
- Expand list/extraction/overview/timeline queries more aggressively. Preserve meaning.
## 4. Retrieval Breadth (`top_k`)
- Apply `top_k` only to `rag_retrieve`. Use smallest sufficient value, expand if insufficient.
- `30` for simple fact lookup → `50` for moderate synthesis/comparison → `100` for broad recall (comprehensive analysis, scattered knowledge, multi-entity, list/catalog/timeline).
- Expansion order: `30 → 50 → 100`. If unsure, use `100`.
## 5. Result Evaluation
Treat as insufficient if: empty, `Error:`, `no excel files found`, off-topic, missing core entity/scope, no usable evidence, partial coverage, or truncated results.
## 6. Fallback and Sequential Retry
On insufficient results, follow this sequence:
1. Rewrite query, retry same tool (once)
2. Switch to next retrieval source in default order
3. For `rag_retrieve`, expand `top_k`: `30 → 50 → 100`
4. `table_rag_retrieve` insufficient → try `rag_retrieve`; `rag_retrieve` insufficient → try `table_rag_retrieve`
- `table_rag_retrieve` internally falls back to `rag_retrieve` on `no excel files found`, but this does NOT change the higher-level order.
- Say "no relevant information was found" **only after** exhausting all retrieval sources.
- Do NOT switch to local filesystem inspection at any point.
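
One way to read the sequence above is as an ordered plan of attempts that stops at the first sufficient result. The sketch below is an illustration under assumptions: `is_insufficient` covers only the mechanically checkable cases (empty, `Error:`, `no excel files found`), `plan_attempts` and `retrieve_sequentially` are hypothetical names, `call_tool` stands in for the real tool invocation, and `top_k` expansion is left out for brevity.

```python
def is_insufficient(text):
    """Mechanical subset of the sufficiency rules; relevance still needs judgment."""
    stripped = text.strip()
    if not stripped or stripped.startswith("Error:"):
        return True
    return "no excel files found" in stripped.lower()


def plan_attempts(first_tool, query, rewritten_query):
    """Order attempts: same tool with a rewritten query once, then the other tool."""
    second_tool = "rag_retrieve" if first_tool == "table_rag_retrieve" else "table_rag_retrieve"
    return [(tool, q) for tool in (first_tool, second_tool) for q in (query, rewritten_query)]


def retrieve_sequentially(plan, call_tool):
    """Run attempts one at a time; return (tool, text) for the first sufficient result."""
    for tool, q in plan:
        text = call_tool(tool, q)
        if not is_insufficient(text):
            return tool, text
    # Every source was tried: only now may the caller report "not found".
    return None


if __name__ == "__main__":
    for step in plan_attempts("table_rag_retrieve", "price of X", "X unit price list"):
        print(step)
```

Nothing in the plan touches the local filesystem, which matches the prohibition stated in this section.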
## 7. Table RAG Result Handling
- Follow all `[INSTRUCTION]` and `[EXTRA_INSTRUCTION]` in results.
- If truncated: tell user total (`N+M`), displayed (`N`), omitted (`M`).
- Cite sources using filenames from `file_ref_table`.
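
As an illustration of citing from `file_ref_table`, the sketch below groups row markers by (file, sheet) and emits one citation per group. It assumes `__src` values of the form `F1S2R5` (file_ref F1, sheet 2, row 5) as described in the tool's citation instructions; the shape of `file_ref_table` used here (file_ref mapped to a `(file_id, filename)` pair) and both function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed marker layout: F<file_ref number>S<sheet>R<row>, for example F1S2R5.
SRC_PATTERN = re.compile(r"^(F\d+)S(\d+)R(\d+)$")


def parse_src(src):
    """Split a `__src` marker into (file_ref, sheet, row)."""
    match = SRC_PATTERN.match(src.strip())
    if match is None:
        raise ValueError(f"Unrecognized __src value: {src!r}")
    file_ref, sheet, row = match.groups()
    return file_ref, int(sheet), int(row)


def build_citations(src_values, file_ref_table):
    """Emit one citation per (file, sheet), combining rows from the same sheet."""
    rows_by_sheet = defaultdict(set)
    for src in src_values:
        file_ref, sheet, row = parse_src(src)
        rows_by_sheet[(file_ref, sheet)].add(row)
    citations = []
    for (file_ref, sheet), rows in sorted(rows_by_sheet.items()):
        file_id, filename = file_ref_table[file_ref]
        row_list = ", ".join(str(r) for r in sorted(rows))
        citations.append(
            f'<CITATION file="{file_id}" filename="{filename}" sheet={sheet} rows=[{row_list}] />'
        )
    return citations


if __name__ == "__main__":
    table = {"F1": ("id-a", "a.xlsx"), "F2": ("id-b", "b.xlsx")}
    for line in build_citations(["F1S2R5", "F1S2R3", "F2S1R1"], table):
        print(line)
```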
## 8. Citation Requirements
- MUST generate `<CITATION ... />` tags when using retrieval results.
- Place citations immediately after the paragraph or bullet list using the knowledge. Do NOT collect at end.
- 1-2 citations per paragraph/bullet. At least 1 citation when using retrieved knowledge.
## 9. Pre-Reply Self-Check
Before replying to a knowledge retrieval task, verify:
- Used only whitelisted retrieval tools — no local filesystem inspection?
- Exhausted retrieval flow before concluding "not found"?
- Citations placed immediately after each relevant paragraph?
If any answer is "no", correct the process first.


@ -0,0 +1,251 @@
#!/usr/bin/env python3
"""
Common utility functions for MCP servers.
Provides shared helpers for path handling, file validation, and request handling.
"""
import json
import os
import sys
import asyncio
from typing import Any, Dict, List, Optional, Union
import re
def get_allowed_directory():
    """Return the directory this server is allowed to access."""
    # Prefer the dataset_dir passed as a command line argument
    if len(sys.argv) > 1:
        dataset_dir = sys.argv[1]
        return os.path.abspath(dataset_dir)
    # Otherwise read the project data directory from the environment
    project_dir = os.getenv("PROJECT_DATA_DIR", "./projects/data")
    return os.path.abspath(project_dir)
def resolve_file_path(file_path: str, default_subfolder: str = "default") -> str:
    """
    Resolve a file path. Supports both the folder/document.txt and document.txt forms.
    Args:
        file_path: The input file path
        default_subfolder: Default subfolder used when only a filename is given
    Returns:
        The resolved full file path
    """
    # If the path contains a folder separator, use it directly
    if '/' in file_path or '\\' in file_path:
        clean_path = file_path.replace('\\', '/')
        # Remove the projects/ prefix if present
        if clean_path.startswith('projects/'):
            clean_path = clean_path[9:]  # Remove the 'projects/' prefix
        elif clean_path.startswith('./projects/'):
            clean_path = clean_path[11:]  # Remove the './projects/' prefix
    else:
        # Only a filename was given: add the default subfolder
        clean_path = f"{default_subfolder}/{file_path}"
    # Get the allowed directory
    project_data_dir = get_allowed_directory()
    # Try to find the file in the project directory
    # Note: lstrip('./') strips any leading '.' and '/' characters, not just a literal './' prefix
    full_path = os.path.join(project_data_dir, clean_path.lstrip('./'))
    if os.path.exists(full_path):
        return full_path
    # If the direct path does not exist, try a recursive search
    found = find_file_in_project(clean_path, project_data_dir)
    if found:
        return found
    # If it is a bare filename and was not found in the default subfolder, try the root directory
    if '/' not in file_path and '\\' not in file_path:
        root_path = os.path.join(project_data_dir, file_path)
        if os.path.exists(root_path):
            return root_path
    raise FileNotFoundError(f"File not found: {file_path} (searched in {project_data_dir})")
def find_file_in_project(filename: str, project_dir: str) -> Optional[str]:
    """Recursively search the project directory for a file."""
    # If filename contains a path, search only that path
    if '/' in filename:
        parts = filename.split('/')
        target_file = parts[-1]
        search_dir = os.path.join(project_dir, *parts[:-1])
        if os.path.exists(search_dir):
            target_path = os.path.join(search_dir, target_file)
            if os.path.exists(target_path):
                return target_path
    else:
        # Bare filename: walk the whole project directory
        for root, dirs, files in os.walk(project_dir):
            if filename in files:
                return os.path.join(root, filename)
    return None
def load_tools_from_json(tools_file_name: str) -> List[Dict[str, Any]]:
    """Load tool definitions from a JSON file next to this module."""
    try:
        tools_file = os.path.join(os.path.dirname(__file__), tools_file_name)
        if os.path.exists(tools_file):
            with open(tools_file, 'r', encoding='utf-8') as f:
                return json.load(f)
        else:
            # The JSON file does not exist; the caller falls back to its default definitions
            return []
    except Exception as e:
        # Log to stderr: stdout carries the JSON-RPC response stream
        print(f"Warning: Unable to load tool definition JSON file: {str(e)}", file=sys.stderr)
        return []
def create_error_response(request_id: Any, code: int, message: str) -> Dict[str, Any]:
    """Create a standardized error response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": code,
            "message": message
        }
    }
def create_success_response(request_id: Any, result: Any) -> Dict[str, Any]:
    """Create a standardized success response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": result
    }
def create_initialize_response(request_id: Any, server_name: str, server_version: str = "1.0.0") -> Dict[str, Any]:
    """Create a standardized initialize response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "protocolVersion": "2024-11-05",
            "capabilities": {
                "tools": {}
            },
            "serverInfo": {
                "name": server_name,
                "version": server_version
            }
        }
    }
def create_ping_response(request_id: Any) -> Dict[str, Any]:
    """Create a standardized ping response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "pong": True
        }
    }
def create_tools_list_response(request_id: Any, tools: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Create a standardized tools list response."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "tools": tools
        }
    }
def is_regex_pattern(pattern: str) -> bool:
    """Detect whether a string looks like a regular expression pattern."""
    # Check the /pattern/ form
    if pattern.startswith('/') and pattern.endswith('/') and len(pattern) > 2:
        return True
    # Check the r"pattern" or r'pattern' form
    if pattern.startswith(('r"', "r'")) and pattern.endswith(('"', "'")) and len(pattern) > 3:
        return True
    # Check for regex special characters
    regex_chars = {'*', '+', '?', '|', '(', ')', '[', ']', '{', '}', '^', '$', '\\', '.'}
    return any(char in pattern for char in regex_chars)
def compile_pattern(pattern: str) -> Union[re.Pattern, str, None]:
    """Compile a regex pattern; return the original string if it is not a regex."""
    if not is_regex_pattern(pattern):
        return pattern
    try:
        # Handle the /pattern/ form
        if pattern.startswith('/') and pattern.endswith('/'):
            regex_body = pattern[1:-1]
            return re.compile(regex_body)
        # Handle the r"pattern" or r'pattern' form
        if pattern.startswith(('r"', "r'")) and pattern.endswith(('"', "'")):
            regex_body = pattern[2:-1]
            return re.compile(regex_body)
        # Compile strings that contain regex characters directly
        return re.compile(pattern)
    except re.error as e:
        # If compilation fails, return None to signal an invalid regex
        # Log to stderr: stdout carries the JSON-RPC response stream
        print(f"Warning: Regular expression '{pattern}' compilation failed: {e}", file=sys.stderr)
        return None
async def handle_mcp_streaming(request_handler):
    """Standard main loop for handling MCP requests over stdio."""
    try:
        loop = asyncio.get_running_loop()
        while True:
            # Read from stdin without blocking the event loop
            line = await loop.run_in_executor(None, sys.stdin.readline)
            if not line:
                break
            line = line.strip()
            if not line:
                continue
            try:
                request = json.loads(line)
                response = await request_handler(request)
                # Write to stdout
                sys.stdout.write(json.dumps(response, ensure_ascii=False) + "\n")
                sys.stdout.flush()
            except json.JSONDecodeError:
                error_response = {
                    "jsonrpc": "2.0",
                    # JSON-RPC 2.0 requires a null id when the request could not be parsed
                    "id": None,
                    "error": {
                        "code": -32700,
                        "message": "Parse error"
                    }
                }
                sys.stdout.write(json.dumps(error_response, ensure_ascii=False) + "\n")
                sys.stdout.flush()
            except Exception as e:
                error_response = {
                    "jsonrpc": "2.0",
                    "error": {
                        "code": -32603,
                        "message": f"Internal error: {str(e)}"
                    }
                }
                sys.stdout.write(json.dumps(error_response, ensure_ascii=False) + "\n")
                sys.stdout.flush()
    except KeyboardInterrupt:
        pass


@ -0,0 +1,351 @@
#!/usr/bin/env python3
"""
RAG retrieval MCP server.
Calls the RAG backend API for document retrieval.
"""
import asyncio
import hashlib
import json
import re
import sys
import os
from typing import Any, Dict, List
try:
    import requests
except ImportError:
    # Log to stderr: stdout carries the JSON-RPC response stream
    print("Error: requests module is required. Please install it with: pip install requests", file=sys.stderr)
    sys.exit(1)
from mcp_common import (
create_error_response,
create_success_response,
create_initialize_response,
create_ping_response,
create_tools_list_response,
load_tools_from_json,
handle_mcp_streaming
)
BACKEND_HOST = os.getenv("BACKEND_HOST", "https://api-dev.gptbase.ai")
MASTERKEY = os.getenv("MASTERKEY", "master")
# Citation instruction prefixes injected into tool results
DOCUMENT_CITATION_INSTRUCTIONS = """<CITATION_INSTRUCTIONS>
When using the retrieved knowledge below, you MUST add XML citation tags for factual claims.
## Document Knowledge
Format: `<CITATION file="file_uuid" filename="name.pdf" page=3 />`
- Use `file` attribute with the UUID from document markers
- Use `filename` attribute with the actual filename from document markers
- Use `page` attribute (singular) with the page number
- `page` MUST be 0-based and must match the `pages:` values shown in the learned knowledge context
## Web Page Knowledge
Format: `<CITATION url="https://example.com/page" />`
- Use `url` attribute with the web page URL from the source metadata
- Do not use `file`, `filename`, or `page` attributes for web sources
- If content is grounded in a web source, prefer a web citation with `url` over a file citation
## Placement Rules
- Citations MUST appear IMMEDIATELY AFTER the paragraph or bullet list that uses the knowledge
- NEVER collect all citations and place them at the end of your response
- Limit to 1-2 citations per paragraph/bullet list
- If your answer uses learned knowledge, you MUST generate at least 1 `<CITATION ... />` in the response
</CITATION_INSTRUCTIONS>
"""
TABLE_CITATION_INSTRUCTIONS = """<CITATION_INSTRUCTIONS>
When using the retrieved table knowledge below, you MUST add XML citation tags for factual claims.
Format: `<CITATION file="file_id" filename="name.xlsx" sheet=1 rows=[2, 4] />`
- Parse `__src`: `F1S2R5` = file_ref F1, sheet 2, row 5
- Look up file_id in `file_ref_table`
- Combine same-sheet rows into one citation: `rows=[2, 4, 6]`
- MANDATORY: Create SEPARATE citation for EACH (file, sheet) combination
- NEVER put <CITATION> on the same line as a bullet point or table row
- Citations MUST be on separate lines AFTER the complete list/table
- NEVER include the `__src` column in your response - it is internal metadata only
- Citations MUST appear IMMEDIATELY AFTER the paragraph or bullet list that uses the knowledge
- NEVER collect all citations and place them at the end of your response
</CITATION_INSTRUCTIONS>
"""
def rag_retrieve(query: str, top_k: int = 100) -> Dict[str, Any]:
    """Call the RAG retrieval API."""
    try:
        bot_id = ""
        if len(sys.argv) > 1:
            bot_id = sys.argv[1]
        # The URL string is never empty, so validate the bot ID it is built from
        if not bot_id:
            return {
                "content": [
                    {
                        "type": "text",
                        "text": "Error: bot_id not provided. Please provide the bot ID as a command line argument."
                    }
                ]
            }
        url = f"{BACKEND_HOST}/v1/rag_retrieve/{bot_id}"
        # Derive the auth token from the master key and bot ID
        masterkey = MASTERKEY
        token_input = f"{masterkey}:{bot_id}"
        auth_token = hashlib.md5(token_input.encode()).hexdigest()
        headers = {
            "content-type": "application/json",
            "authorization": f"Bearer {auth_token}"
        }
        data = {
            "query": query,
            "top_k": top_k
        }
        # Send the POST request
        response = requests.post(url, json=data, headers=headers, timeout=30)
        if response.status_code != 200:
            return {
                "content": [
                    {
                        "type": "text",
                        "text": f"Error: RAG API returned status code {response.status_code}. Response: {response.text}"
                    }
                ]
            }
        # Parse the response
        try:
            response_data = response.json()
        except json.JSONDecodeError as e:
            return {
                "content": [
                    {
                        "type": "text",
                        "text": f"Error: Failed to parse API response as JSON. Error: {str(e)}, Raw response: {response.text}"
                    }
                ]
            }
        # Extract the markdown field
        if "markdown" in response_data:
            markdown_content = response_data["markdown"]
            return {
                "content": [
                    {
                        "type": "text",
                        "text": DOCUMENT_CITATION_INSTRUCTIONS + markdown_content
                    }
                ]
            }
        else:
            return {
                "content": [
                    {
                        "type": "text",
                        "text": f"Error: 'markdown' field not found in API response. Response: {json.dumps(response_data, indent=2, ensure_ascii=False)}"
                    }
                ]
            }
    except requests.exceptions.RequestException as e:
        return {
            "content": [
                {
                    "type": "text",
                    "text": f"Error: Failed to connect to RAG API. {str(e)}"
                }
            ]
        }
    except Exception as e:
        return {
            "content": [
                {
                    "type": "text",
                    "text": f"Error: {str(e)}"
                }
            ]
        }
def table_rag_retrieve(query: str) -> Dict[str, Any]:
    """Call the Table RAG retrieval API."""
    try:
        bot_id = ""
        if len(sys.argv) > 1:
            bot_id = sys.argv[1]
        url = f"{BACKEND_HOST}/v1/table_rag_retrieve/{bot_id}"
        masterkey = MASTERKEY
        token_input = f"{masterkey}:{bot_id}"
        auth_token = hashlib.md5(token_input.encode()).hexdigest()
        headers = {
            "content-type": "application/json",
            "authorization": f"Bearer {auth_token}"
        }
        data = {
            "query": query,
        }
        response = requests.post(url, json=data, headers=headers, timeout=300)
        if response.status_code != 200:
            return {
                "content": [
                    {
                        "type": "text",
                        "text": f"Error: Table RAG API returned status code {response.status_code}. Response: {response.text}"
                    }
                ]
            }
        try:
            response_data = response.json()
        except json.JSONDecodeError as e:
            return {
                "content": [
                    {
                        "type": "text",
                        "text": f"Error: Failed to parse API response as JSON. Error: {str(e)}, Raw response: {response.text}"
                    }
                ]
            }
        if "markdown" in response_data:
            markdown_content = response_data["markdown"]
            # No spreadsheets in the knowledge base: fall back to document retrieval
            if re.search(r"^no excel files found", markdown_content, re.IGNORECASE):
                rag_result = rag_retrieve(query)
                content = rag_result.get("content", [])
                if content and content[0].get("type") == "text":
                    content[0]["text"] = "No table_rag_retrieve results were found. The content below is the fallback result from rag_retrieve\n\n" + content[0]["text"]
                return rag_result
            return {
                "content": [
                    {
                        "type": "text",
                        "text": TABLE_CITATION_INSTRUCTIONS + markdown_content
                    }
                ]
            }
        else:
            return {
                "content": [
                    {
                        "type": "text",
                        "text": f"Error: 'markdown' field not found in API response. Response: {json.dumps(response_data, indent=2, ensure_ascii=False)}"
                    }
                ]
            }
    except requests.exceptions.RequestException as e:
        return {
            "content": [
                {
                    "type": "text",
                    "text": f"Error: Failed to connect to Table RAG API. {str(e)}"
                }
            ]
        }
    except Exception as e:
        return {
            "content": [
                {
                    "type": "text",
                    "text": f"Error: {str(e)}"
                }
            ]
        }
async def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Handle MCP request"""
    try:
        method = request.get("method")
        params = request.get("params", {})
        request_id = request.get("id")
        if method == "initialize":
            return create_initialize_response(request_id, "rag-retrieve")
        elif method == "ping":
            return create_ping_response(request_id)
        elif method == "tools/list":
            # Load tool definitions from the JSON file
            tools = load_tools_from_json("rag_retrieve_tools.json")
            if not tools:
                # If the JSON file is missing or empty, use the default definition
                tools = [
                    {
                        "name": "rag_retrieve",
                        "description": "Call the RAG retrieval API to retrieve documents relevant to the query. Returns markdown-formatted results containing the relevant content.",
                        "inputSchema": {
                            "type": "object",
                            "properties": {
                                "query": {
                                    "type": "string",
                                    "description": "Retrieval query content"
                                }
                            },
                            "required": ["query"]
                        }
                    }
                ]
            return create_tools_list_response(request_id, tools)
        elif method == "tools/call":
            tool_name = params.get("name")
            arguments = params.get("arguments", {})
            if tool_name == "rag_retrieve":
                query = arguments.get("query", "")
                top_k = arguments.get("top_k", 100)
                if not query:
                    return create_error_response(request_id, -32602, "Missing required parameter: query")
                result = rag_retrieve(query, top_k)
                return {
                    "jsonrpc": "2.0",
                    "id": request_id,
                    "result": result
                }
            elif tool_name == "table_rag_retrieve":
                query = arguments.get("query", "")
                if not query:
                    return create_error_response(request_id, -32602, "Missing required parameter: query")
                result = table_rag_retrieve(query)
                return {
                    "jsonrpc": "2.0",
                    "id": request_id,
                    "result": result
                }
            else:
                return create_error_response(request_id, -32601, f"Unknown tool: {tool_name}")
        else:
            return create_error_response(request_id, -32601, f"Unknown method: {method}")
    except Exception as e:
        return create_error_response(request.get("id"), -32603, f"Internal error: {str(e)}")
async def main():
    """Main entry point."""
    await handle_mcp_streaming(handle_request)
if __name__ == "__main__":
    asyncio.run(main())


@ -0,0 +1,35 @@
[
{
"name": "rag_retrieve",
"description": "Retrieve relevant documents from the knowledge base. Returns markdown results. Use this tool first only for clearly pure concept, definition, workflow, policy, or explanation questions without structured data needs. If the result is insufficient, try table_rag_retrieve before replying with no result.",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Retrieval query content. Rewrite the query when needed to improve recall."
},
"top_k": {
"type": "integer",
"description": "Number of top results to retrieve. Choose dynamically based on retrieval breadth and coverage needs.",
"default": 100
}
},
"required": ["query"]
}
},
{
"name": "table_rag_retrieve",
"description": "Retrieve relevant table data from Excel or spreadsheet files in the knowledge base. Returns markdown results. Use this tool first for structured data, lists, statistics, extraction, mixed questions, and unclear cases. If the result is insufficient, try rag_retrieve before replying with no result.",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Retrieval query content for table data. Rewrite the query when needed to improve recall."
}
},
"required": ["query"]
}
}
]

Some files were not shown because too many files have changed in this diff.