Compare commits

15 Commits

Author SHA1 Message Date
朱潮
f45f55b50a add /api/v3/llm/chat/completions 2026-06-07 10:57:34 +08:00
朱潮
f18d966123 add /api/v3/llm/chat/completions 2026-06-07 10:55:25 +08:00
朱潮
8466b0e710 add expense-approval-reviewer 2026-06-07 10:50:17 +08:00
朱潮
d009411360 Merge branch 'developing' into bot_manager 2026-06-07 10:26:43 +08:00
朱潮
bb74aee41b add table-query 2026-06-07 08:58:22 +08:00
朱潮
ecf332add5 modify ag_retrieve 2026-06-05 14:49:54 +08:00
朱潮
b618cb12d2 add mineru 2026-06-05 14:35:17 +08:00
朱潮
22b9ad4877 Merge branch 'feature/mcp-ui' into developing 2026-06-03 19:43:02 +08:00
朱潮
1925de0355 add feature memory 2026-06-01 11:51:21 +08:00
朱潮
96585886f8 skill categroy 2026-05-26 20:32:41 +08:00
朱潮
5594eab520 add category 2026-05-26 17:48:25 +08:00
朱潮
022781b145 merge 2026-05-26 17:43:43 +08:00
朱潮
3ada55af40 merge 2026-05-26 17:43:12 +08:00
朱潮
96ded5e598 Merge branch 'feature/mcp-ui' into bot_manager 2026-05-26 16:13:37 +08:00
朱潮
203dcf4a4e skill category 2026-05-26 16:13:26 +08:00
108 changed files with 5803 additions and 38 deletions

View File

@ -0,0 +1,55 @@
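---
feature: "memory"
scope: "Agent long-term memory (Mem0 + pgvector): cross-session recall plus fact extraction and storage"
updated_at: "2026-06-01"
status: active
---
# Memory (Long-Term Memory)
## Current State
Long-term memory for the agent, built on the **Mem0** library with **pgvector** (PostgreSQL vector storage) underneath.
Before the agent runs, relevant memories are fetched with `recall` and injected into the system prompt; after it runs, new facts are extracted and stored asynchronously on a background thread.
Tenants are isolated by `(user_id, agent_id)`, and each `agent_id` gets its own `mem0_{agent_id}` collection table.
> Note: the API and config fields were historically named `memori`. The name is kept for compatibility; the implementation actually uses **Mem0**.
## Configuration Switches
| Level | Field | Default | Location |
|------|------|------|------|
| Global master switch | `MEM0_ENABLED` (env) | `true` | `utils/settings.py:80` |
| Agent config | `enable_memori: bool` | `False` | `agent/agent_config.py:47` |
| API request | `enable_memory: bool` | `False` | `utils/api_models.py:56` |
| Recall count | `memori_semantic_search_top_k: int` | `20` | `agent/agent_config.py:48` |
| Recall count (env) | `MEM0_SEMANTIC_SEARCH_TOP_K` | `20` | `utils/settings.py:84` |
| Connection pool size | `MEM0_POOL_SIZE` (env) | `50` | `utils/settings.py:61` |
Enabling path: V1 uses `enable_memory` in the request body; V2 uses `enable_memory` in the bot config. Both are gated by the global `MEM0_ENABLED`.
The middleware is registered at `agent/deep_assistant.py:270` (`if config.enable_memori:`).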
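A minimal sketch of the gating rule described above, assuming the per-request or per-bot flag only takes effect when the global switch is on. The helper names `env_flag` and `memory_enabled` are hypothetical and are not taken from `utils/settings.py`.

```python
import os


def env_flag(name: str, default: bool = True) -> bool:
    """Read a boolean environment variable (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def memory_enabled(requested: bool, global_enabled: bool) -> bool:
    """Memory is active only when the global switch and the local flag are both on."""
    return global_enabled and requested
```

With this shape, a request that sets `enable_memory=True` still runs without memory when `MEM0_ENABLED` is off.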
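## Core Files
- `agent/mem0_manager.py`: Mem0 client manager. Covers instance creation and the LRU cache (at most 50), connection pool management, `recall_memories` / `add_memory` / `delete_all`, multi-tenant isolation, `CustomMem0Embedding`, and the `json_repair` patch.
- `agent/mem0_middleware.py`: middleware. `before_agent` recalls memories and writes them to `config._mem0_context` (lines 114/155); `after_agent` extracts and stores facts asynchronously in the background.
- `agent/mem0_config.py`: Mem0 config class. Holds user/agent/session ids, the memory prompt template, and loading of the custom extraction prompt (`PreMemoryPrompt` hook).
- `routes/memory.py`: memory management API (GET/POST/DELETE), used by the frontend to manage user memories.
- `drop_mem0_tables.py`: cleanup script that drops all `mem0_*` tables (for resets or clearing dirty data).
## Data Flow
**Write**: user + assistant messages → `after_agent` (background thread) → `add_memory` → `Mem0.add()` (the LLM extracts facts) → vectorized via pgvector and stored in `mem0_{agent_id}`.
**Read**: user query → `before_agent` → `recall_memories` → `Mem0.search()` (vector similarity, top_k) → formatted and written to `config._mem0_context` → injected into the system prompt (also consumed by the thinking feature, [[../thinking/MEMORY|thinking]]).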
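The read and write paths above can be sketched as follows. This is an illustration under stated assumptions, not the code in `agent/mem0_middleware.py`: `FakeMemoryStore` is a hypothetical in-memory stand-in for Mem0 + pgvector, and keyword overlap replaces vector similarity.

```python
import threading


class FakeMemoryStore:
    """Hypothetical stand-in for the Mem0 + pgvector backend."""

    def __init__(self):
        self._facts = {}
        self._lock = threading.Lock()

    def add(self, user_id, agent_id, text):
        # Facts are keyed by (user_id, agent_id) to mimic tenant isolation.
        with self._lock:
            self._facts.setdefault((user_id, agent_id), []).append(text)

    def search(self, user_id, agent_id, query, top_k=20):
        with self._lock:
            facts = list(self._facts.get((user_id, agent_id), []))
        words = set(query.lower().split())
        hits = [f for f in facts if words & set(f.lower().split())]
        return hits[:top_k]


def before_agent(store, config, user_id, agent_id, query):
    """Read path: recall memories and stash them for prompt injection."""
    hits = store.search(user_id, agent_id, query)
    config["_mem0_context"] = "\n".join(f"- {h}" for h in hits)
    return config


def after_agent(store, user_id, agent_id, user_msg, assistant_msg):
    """Write path: store the exchange on a background thread."""
    text = f"{user_msg} {assistant_msg}"
    worker = threading.Thread(
        target=store.add, args=(user_id, agent_id, text), daemon=True
    )
    worker.start()
    return worker
```

Returning the thread from `after_agent` is a choice made for this sketch so that callers and tests can wait for the write to finish.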
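## Key Design Decisions
- Reuse the embedding model the project has already loaded (`CustomMem0Embedding`) so Mem0 does not load SentenceTransformer a second time → `decisions/2026-06-custom-embedding.md`
- Proactively release pooled connections and keep instances in an LRU cache to prevent pool exhaustion → `decisions/2026-06-connection-pool.md`
## Gotchas (Required Reading for Developers)
- **Naming trap**: the config field is `enable_memori` (no "y"), the API field is `enable_memory`, and the implementation is Mem0. Do not mix up the three names.
- **Connection pool exhaustion**: Mem0's PGVector takes a connection in `__init__` and releases it in `__del__`. Call `_release_connection()` explicitly after every operation; otherwise high concurrency will fill `MEM0_POOL_SIZE`.
- **Fragile JSON**: the JSON the LLM returns during fact extraction often contains trailing commas or single quotes. Parsing has been monkey-patched to `json_repair.loads`; do not revert to the native parser.
- **Table sprawl**: one table per `agent_id` means that many long-running bots produce many tables. Clean up periodically with `drop_mem0_tables.py`.
- **Embedding dimension**: `paraphrase-multilingual-MiniLM-L12-v2` is 384-dimensional. Switching models requires changing the pgvector column dimension to match, or writes will fail.
## Index
- Design decisions: `decisions/`
- Changelog: `changelog/`
- Related docs: `docs/`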

View File

@ -0,0 +1,6 @@
# Changelog 2026 Q2 — Memory
## 2026-06-01
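- Initialized the feature memory docs.
- Recorded the current state: long-term memory on Mem0 + pgvector, with recall and injection in `before_agent` and background extraction and storage in `after_agent`.
- Archived design decisions: custom embedding reuse (custom-embedding), and proactive connection release plus LRU (connection-pool).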

View File

@ -0,0 +1,25 @@
---
date: "2026-06-01"
status: adopted
topic: "connection-pool"
impact: [memory, performance, stability]
---
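# Proactive Connection Release + LRU Cache of Mem0 Instances
## Background
Mem0's PGVector backend takes one connection from the pool in the instance's `__init__` and, in theory, returns it in `__del__`.
Python's GC timing is not deterministic, however, so under high concurrency connections that are returned late quickly fill `MEM0_POOL_SIZE` (default 50) and block subsequent requests.
In addition, creating a new Mem0 instance for every `(user_id, agent_id)` without reclaiming any would hold connections without bound.
## Decision
1. `Mem0Manager` uses an `OrderedDict` to keep an LRU cache of at most 50 Mem0 instances and evicts the oldest when the limit is exceeded.
2. After every memory operation (recall/add), call `_release_connection()` to return the connection to the pool immediately instead of waiting for GC.
## Impact
- The pool is no longer dragged down by slow GC, so behavior under high concurrency is stable.
- The number of instances is bounded, which keeps memory under control.
## Gotchas
- Do not hold a Mem0 instance's connection across multiple awaits in the operation path; doing so bypasses the release logic.
- The LRU limit (50) and `MEM0_POOL_SIZE` (50) are linked. When you change one, evaluate the other as well.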
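The two parts of the decision can be sketched with the standard library. This is a minimal illustration, assuming an instance exposes some release method; `FakeInstance`, `LRUInstanceCache`, and `run_and_release` are hypothetical names, not the contents of `agent/mem0_manager.py`.

```python
from collections import OrderedDict


class FakeInstance:
    """Hypothetical stand-in for a Mem0 instance holding one pooled connection."""

    def __init__(self, key):
        self.key = key
        self.released = 0

    def release_connection(self):
        self.released += 1


class LRUInstanceCache:
    """Bounded cache that evicts the least recently used instance."""

    def __init__(self, factory, max_size=50):
        self._factory = factory
        self._max_size = max_size
        self._items = OrderedDict()

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        instance = self._factory(key)
        self._items[key] = instance
        if len(self._items) > self._max_size:
            self._items.popitem(last=False)  # drop the oldest entry
        return instance

    def keys(self):
        return list(self._items.keys())


def run_and_release(instance, operation):
    """Run one memory operation and always hand the connection back."""
    try:
        return operation(instance)
    finally:
        instance.release_connection()
```

The `try`/`finally` is what makes the release independent of GC timing: the connection goes back even when the operation raises.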

View File

@ -0,0 +1,22 @@
---
date: "2026-06-01"
status: adopted
topic: "custom-embedding"
impact: [memory, performance]
---
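# Reuse the Project's Embedding Model Instead of Mem0's Own SentenceTransformer
## Background
By default Mem0 loads its own SentenceTransformer for embedding. The project already loads `paraphrase-multilingual-MiniLM-L12-v2` (384 dimensions) through `GlobalModelManager`.
If Mem0 were left to load its own copy, the same model would sit in memory twice, wasting GPU and host memory.
## Decision
Implement `CustomMem0Embedding` in `agent/mem0_manager.py` and wire Mem0's embedder to the global model the project has already loaded, so both share one set of weights.
## Impact
- Memory usage drops noticeably because the model is not loaded twice.
- The embedding dimension is fixed at 384, matching the project's main model. When the model changes, the pgvector column dimension must change with it.
## Notes
For the related connection pool and instance cache strategy, see [[2026-06-connection-pool]].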
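A minimal sketch of the adapter idea, assuming the shared model exposes an `encode(list_of_texts)` method. `SharedModel` and `SharedEmbedder` are hypothetical names for illustration; the real `CustomMem0Embedding` and the Mem0 embedder interface may differ.

```python
class SharedModel:
    """Hypothetical stand-in for the model held by GlobalModelManager."""

    dim = 384

    def encode(self, texts):
        # Fake vectors: real code would return sentence embeddings.
        return [[float(len(t))] * self.dim for t in texts]


class SharedEmbedder:
    """Adapter that reuses an already loaded model instead of loading its own."""

    def __init__(self, model, expected_dim=384):
        self._model = model
        self._expected_dim = expected_dim

    def embed(self, text):
        vector = self._model.encode([text])[0]
        if len(vector) != self._expected_dim:
            # Fail early rather than letting the vector store reject the write.
            raise ValueError(
                f"embedding has {len(vector)} dims, expected {self._expected_dim}"
            )
        return vector
```

The explicit dimension check is an addition made for this sketch; it turns a model swap without a matching column change into an immediate, readable error.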

View File

@ -0,0 +1,52 @@
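---
feature: "thinking"
scope: "Agent thinking feature (pre-answer auxiliary reasoning built on GuidelineMiddleware): generates one <think> block before the main answer"
updated_at: "2026-06-01"
status: active
---
# Thinking (Thinking Feature)
## Current State
The thinking feature is implemented by a custom **`GuidelineMiddleware`**: before the main agent runs, it calls the model once with a business-guideline prompt to "think",
then wraps the result in `<think>...</think>` tags and attaches `message_tag: "THINK"` metadata so the frontend can recognize and collapse it.
> Important: this is "one auxiliary request before the main request", **not** the Qwen model's built-in reasoning/extended-thinking mode. It is therefore model-agnostic and works with any LLM. It serves a purpose comparable to OpenAI o1 / Claude thinking, with a lighter implementation.
## Configuration Switches
| Level | Field | Default | Location |
|------|------|------|------|
| Agent config | `enable_thinking: bool` | `False` | `agent/agent_config.py:26` |
| API request | `enable_thinking: bool` | `False` | `utils/api_models.py:54` |
Enabling path: V1 uses `enable_thinking` in the request body; V2 uses `enable_thinking` in the bot config.
The middleware is registered at `agent/deep_assistant.py:294`: `if config.enable_thinking: middleware.append(GuidelineMiddleware(...))`.
## Core Files
- `agent/guideline_middleware.py`: main thinking logic. `get_guideline_prompt` (line 53+) assembles the guideline prompt; `before_agent`/`abefore_agent` call the model to generate the thinking, wrap it in `<think>` tags, and tag it `THINK` (lines 120-124 / 146-149).
- `agent/deep_assistant.py:294-295`: registers the middleware according to `enable_thinking`.
## Data Flow
1. `before_agent` loads the guidelines (the Guidelines block in the system prompt).
2. It extracts guidelines / tool_description / scenarios / terms_list from the system prompt.
3. It assembles `guideline_prompt` = business rules + chat history + **memory context** + tool descriptions + scenarios + term analysis.
4. It calls the model once: `SystemMessage(guideline_prompt)` + the user's last message → the thinking content.
5. The content is wrapped in `<think>...</think>`, with `additional_kwargs["message_tag"] = "THINK"`.
6. An empty `HumanMessage` is appended (for models that require the last message to be a user message).
7. The main agent continues and produces the final answer.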
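Steps 5 and 6 can be illustrated with plain dictionaries. This is a sketch only: the actual middleware uses LangChain message objects, and `wrap_thinking` / `append_thinking` are hypothetical helper names.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def wrap_thinking(content: str) -> Message:
    """Wrap model output in <think> tags and tag it for the frontend."""
    return {
        "role": "assistant",
        "content": f"<think>{content}</think>",
        "additional_kwargs": {"message_tag": "THINK"},
    }


def append_thinking(messages: List[Message], thinking: str) -> List[Message]:
    """Append the thinking message, then an empty user message to close the turn."""
    return [*messages, wrap_thinking(thinking), {"role": "user", "content": ""}]
```

The function returns a new list instead of mutating its input, which keeps the original history intact if the thinking call has to be retried.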
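## Coupling with the Memory Feature
`guideline_middleware.py:63` reads `config._mem0_context` (written by the `before_agent` of [[../memory/MEMORY|memory]]).
In other words, the thinking stage folds the recalled long-term memories into the guideline prompt, so its analysis can draw on them.
**Ordering dependency**: the memory middleware must run before thinking; otherwise `_mem0_context` has no value yet.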
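A small guard for the ordering dependency could look like the sketch below. It assumes middleware can be identified by simple string names such as `"memory"` and `"thinking"`; those labels and the function name are hypothetical and do not come from `agent/deep_assistant.py`.

```python
from typing import Sequence


def memory_runs_before_thinking(middleware_names: Sequence[str]) -> bool:
    """Return True when the registration order satisfies the dependency."""
    if "memory" not in middleware_names or "thinking" not in middleware_names:
        # With only one of the two registered there is nothing to order.
        return True
    return middleware_names.index("memory") < middleware_names.index("thinking")
```

A check like this fits naturally in a unit test that runs whenever the registration order is touched.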
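## Gotchas (Required Reading for Developers)
- **Thinking is not streamed**: the thinking content is generated in full inside `before_agent`; only the final answer is streamed. The frontend collapses it based on the `<think>` tags plus `message_tag:"THINK"`.
- **One extra model call**: every enabled request sends one more LLM request, adding latency and cost. Weigh this per scenario.
- **Not native model reasoning**: do not assume `enable_thinking` is passed through to Qwen. It is a custom implementation at the middleware layer.
- **Trailing empty HumanMessage**: an empty user message is appended after the thinking message. Do not remove it by accident when changing message-list handling.
- **Depends on memory context ordering**: if you change the middleware registration order, confirm that memory still runs before thinking.
## Index
- Design decisions: `decisions/`
- Changelog: `changelog/`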

View File

@ -0,0 +1,7 @@
# Changelog 2026 Q2 — Thinking
## 2026-06-01
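- Initialized the feature thinking docs.
- Recorded the current state: `GuidelineMiddleware` generates the `<think>` content in `before_agent` and tags it `message_tag:"THINK"`.
- Archived the design decision: implement thinking as middleware rather than native model reasoning (middleware-thinking).
- Recorded the ordering coupling with the memory feature (dependency on `_mem0_context`).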

View File

@ -0,0 +1,28 @@
---
date: "2026-06-01"
status: adopted
topic: "middleware-thinking"
impact: [thinking, model-compat]
---
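# Implement Thinking as Middleware Rather Than Relying on Native Model Reasoning
## Background
The "thinking feature" can be implemented in two ways:
A. Pass `enable_thinking` through to the underlying model and rely on its built-in reasoning/extended-thinking capability.
B. Add one auxiliary "guideline thinking" LLM call of our own before the main request.
Route A requires the underlying model to support native reasoning, and behavior and output format differ between models, which makes uniform frontend handling difficult.
## Decision
Adopt B: implement `GuidelineMiddleware`, which calls the model once with a business-guideline prompt in the `before_agent` stage to generate the thinking,
and wraps it uniformly as `<think>...</think>` + `message_tag:"THINK"`.
## Impact
- Decoupled from any specific model; works with any LLM (OpenAI/Claude/Qwen).
- The thinking stage can inject business rules, tool descriptions, term analysis, and memory context, which gives strong control.
- Cost: one extra LLM call per request (latency + cost), and the thinking content is not streamed.
## Gotchas
- Thinking depends on `config._mem0_context`, so the memory middleware must run before this one.
- An empty `HumanMessage` is appended after the thinking to support models that require the last message to be a user message. Do not remove it.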

View File

@ -18,8 +18,10 @@ from utils.fastapi_utils import (
     process_messages,
     create_project_directory, extract_api_key_from_auth, generate_v2_auth_token, fetch_bot_config, fetch_bot_config_from_db,
     call_preamble_llm,
-    create_stream_chunk
+    create_stream_chunk,
+    detect_provider, sanitize_model_kwargs
 )
+from langchain.chat_models import init_chat_model
 from langchain_core.messages import AIMessageChunk, ToolMessage, AIMessage, HumanMessage
 from utils.settings import MAX_OUTPUT_TOKENS
 from agent.agent_config import AgentConfig
@ -968,6 +970,135 @@ async def chat_completions_v3(request: ChatRequestV3, authorization: Optional[st
         raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")
 
 
+async def build_llm_from_bot_config(bot_id: str, user_identifier: Optional[str] = None):
+    """Build a direct LLM client from a bot's database config.
+
+    Reuses the v3 config-loading chain to resolve model / api_key / model_server,
+    then constructs a LangChain chat model without any agent logic.
+
+    Returns:
+        tuple: (llm_instance, model_name)
+    """
+    bot_config = await fetch_bot_config_from_db(bot_id, user_identifier)
+
+    model_name = bot_config.get("model", "")
+    api_key = bot_config.get("api_key", "")
+    model_server = bot_config.get("model_server", "")
+
+    if not model_name:
+        raise HTTPException(status_code=400, detail=f"No model configured for bot '{bot_id}'")
+
+    # Detect provider and sanitize kwargs (same as the agent path)
+    model_provider, base_url = detect_provider(model_name, model_server)
+    model_kwargs, _, _ = sanitize_model_kwargs(
+        model_name=model_name,
+        model_provider=model_provider,
+        base_url=base_url,
+        api_key=api_key,
+        generate_cfg={},
+        source="llm_passthrough"
+    )
+
+    llm = init_chat_model(**model_kwargs)
+    return llm, model_name
+
+
+@router.post("/api/v3/llm/chat/completions")
+async def llm_passthrough_v3(request: ChatRequestV3, authorization: Optional[str] = Header(None)):
+    """LLM passthrough API - direct LLM call, bypassing all agent logic.
+
+    Only model / api_key / model_server are read from the bot's database config
+    (resolved via bot_id). Messages are forwarded to the LLM as-is.
+
+    Required Parameters:
+        - bot_id: str - target bot id (used to look up LLM config from db)
+        - messages: List[Message] - conversation messages, passed through directly
+
+    Optional Parameters:
+        - stream: bool - whether to stream the output, default false
+        - user_identifier: str - used to resolve the api_key owner
+
+    Authentication:
+        - Authorization header is required: Bearer <token>
+        - token = md5(MASTERKEY:bot_id), same scheme as the v2 API
+
+    Returns:
+        Union[dict, StreamingResponse]: OpenAI-compatible completion or stream
+    """
+    try:
+        bot_id = request.bot_id
+        if not bot_id:
+            raise HTTPException(status_code=400, detail="bot_id is required")
+
+        # Authentication validation (same auth logic as v2: token = md5(MASTERKEY:bot_id))
+        expected_token = generate_v2_auth_token(bot_id)
+        provided_token = extract_api_key_from_auth(authorization)
+
+        if not provided_token:
+            raise HTTPException(
+                status_code=401,
+                detail="Authorization header is required"
+            )
+
+        if provided_token != expected_token:
+            raise HTTPException(
+                status_code=403,
+                detail=f"Invalid authentication token. Expected: {expected_token[:8]}..., Provided: {provided_token[:8]}..."
+            )
+
+        # Build the LLM client from db config
+        llm, model_name = await build_llm_from_bot_config(bot_id, request.user_identifier)
+
+        # Forward messages as-is (pure passthrough, no agent processing)
+        lc_messages = [{"role": msg.role, "content": msg.content} for msg in request.messages]
+
+        chunk_id = f"chatcmpl-{int(time.time())}"
+
+        # Streaming response
+        if request.stream:
+            async def generate():
+                try:
+                    async for chunk in llm.astream(lc_messages):
+                        content = chunk.content if isinstance(chunk.content, str) else str(chunk.content)
+                        if content:
+                            data = create_stream_chunk(chunk_id, model_name, content=content)
+                            yield f"data: {json.dumps(data, ensure_ascii=False)}\n\n"
+                    # Final chunk with finish_reason
+                    done = create_stream_chunk(chunk_id, model_name, finish_reason="stop")
+                    yield f"data: {json.dumps(done, ensure_ascii=False)}\n\n"
+                    yield "data: [DONE]\n\n"
+                except Exception as stream_error:
+                    logger.error(f"Error in LLM passthrough stream: {stream_error}")
+                    err = {"error": {"message": str(stream_error), "type": "internal_error"}}
+                    yield f"data: {json.dumps(err, ensure_ascii=False)}\n\n"
+
+            return StreamingResponse(generate(), media_type="text/event-stream")
+
+        # Non-streaming response
+        response = await llm.ainvoke(lc_messages)
+        content = response.content if isinstance(response.content, str) else str(response.content)
+        return {
+            "id": chunk_id,
+            "object": "chat.completion",
+            "created": int(time.time()),
+            "model": model_name,
+            "choices": [{
+                "index": 0,
+                "message": {"role": "assistant", "content": content},
+                "finish_reason": "stop"
+            }]
+        }
+
+    except HTTPException:
+        raise
+    except Exception as e:
+        error_details = traceback.format_exc()
+        logger.error(f"Error in llm_passthrough_v3: {str(e)}")
+        logger.error(f"Full traceback: {error_details}")
+        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")
+
+
 # ============================================================================
 # Chat history query endpoints
 # ============================================================================
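A client-side sketch for the passthrough endpoint added in this hunk. It assumes the token is the hex MD5 of the literal string `MASTERKEY:bot_id`, as the docstring states; the exact concatenation inside `generate_v2_auth_token` is not shown in the diff, and the base URL and function names here are hypothetical.

```python
import hashlib
import json


def passthrough_token(master_key: str, bot_id: str) -> str:
    """Assumed scheme: md5 over 'MASTERKEY:bot_id', hex encoded."""
    return hashlib.md5(f"{master_key}:{bot_id}".encode("utf-8")).hexdigest()


def build_passthrough_request(base_url, master_key, bot_id, messages, stream=False):
    """Assemble URL, headers, and JSON body without sending anything."""
    return {
        "url": f"{base_url.rstrip('/')}/api/v3/llm/chat/completions",
        "headers": {
            "Authorization": f"Bearer {passthrough_token(master_key, bot_id)}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(
            {"bot_id": bot_id, "messages": messages, "stream": stream},
            ensure_ascii=False,
        ),
    }
```

The returned dictionary can be handed to any HTTP client; keeping construction separate from sending makes the auth scheme easy to test offline.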

View File

@ -22,6 +22,7 @@ class SkillItem(BaseModel):
     name: str
     description: str
     user_skill: bool = False
+    category: str = "other"
 
 
 class SkillListResponse(BaseModel):
@ -35,6 +36,7 @@ class SkillValidationResult:
     valid: bool
     name: Optional[str] = None
     description: Optional[str] = None
+    category: Optional[str] = None
     error_message: Optional[str] = None
@ -267,7 +269,8 @@ def parse_plugin_json(plugin_json_path: str) -> SkillValidationResult:
         return SkillValidationResult(
             valid=True,
             name=plugin_config['name'],
-            description=plugin_config['description']
+            description=plugin_config['description'],
+            category=plugin_config.get('category'),
         )
 
     except json.JSONDecodeError as e:
@ -334,7 +337,8 @@ def parse_skill_frontmatter(skill_md_path: str) -> SkillValidationResult:
         return SkillValidationResult(
             valid=True,
             name=metadata['name'],
-            description=metadata['description']
+            description=metadata['description'],
+            category=metadata.get('category'),
         )
 
     except yaml.YAMLError as e:
@ -410,10 +414,13 @@ def get_skill_metadata_legacy(skill_path: str) -> Optional[dict]:
     """
     result = get_skill_metadata(skill_path)
     if result.valid:
-        return {
+        ret = {
             'name': result.name,
-            'description': result.description
+            'description': result.description,
         }
+        if result.category:
+            ret['category'] = result.category
+        return ret
     return None
@ -456,7 +463,8 @@ def get_official_skills(base_dir: str) -> List[SkillItem]:
                     skills.append(SkillItem(
                         name=metadata['name'],
                         description=metadata['description'],
-                        user_skill=False
+                        user_skill=False,
+                        category=metadata.get('category', 'other'),
                     ))
                     skill_names.add(skill_name)
                     logger.debug(f"Found official skill: {metadata['name']} from {official_skills_dir}")
@ -489,7 +497,8 @@ def get_user_skills(base_dir: str, bot_id: str) -> List[SkillItem]:
                     skills.append(SkillItem(
                         name=metadata['name'],
                         description=metadata['description'],
-                        user_skill=True
+                        user_skill=True,
+                        category=metadata.get('category', 'custom'),
                     ))
                     logger.debug(f"Found user skill: {metadata['name']}")
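The category fallback in these hunks can be summarized in one function. This is a sketch: `resolve_category` is a hypothetical helper that mirrors the two `metadata.get('category', ...)` calls above, where official skills default to `other` and user skills default to `custom`.

```python
def resolve_category(metadata: dict, user_skill: bool) -> str:
    """Pick the declared category, or the default for the skill's origin."""
    default = "custom" if user_skill else "other"
    return metadata.get("category", default)
```

Because `get_skill_metadata_legacy` only sets the `category` key when a value exists, a skill without a declared category falls through to the default.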

View File

@ -18,5 +18,6 @@
         "{bot_id}"
       ]
     }
-  }
+  },
+  "category": "Data & Retrieval"
 }

View File

@ -314,7 +314,12 @@ async def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
                 top_k = arguments.get("top_k", 100)
 
                 if not query:
-                    return create_error_response(request_id, -32602, "Missing required parameter: query")
+                    return create_success_response(request_id, {
+                        "content": [{
+                            "type": "text",
+                            "text": "Error: missing required parameter 'query'. Please call this tool again with a non-empty 'query' argument describing what you want to retrieve."
+                        }]
+                    })
 
                 result = rag_retrieve(query, top_k, trace_id)
@ -328,7 +333,12 @@ async def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
                 query = arguments.get("query", "")
 
                 if not query:
-                    return create_error_response(request_id, -32602, "Missing required parameter: query")
+                    return create_success_response(request_id, {
+                        "content": [{
+                            "type": "text",
+                            "text": "Error: missing required parameter 'query'. Please call this tool again with a non-empty 'query' argument describing what you want to retrieve."
+                        }]
+                    })
 
                 result = table_rag_retrieve(query, trace_id)
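The change above swaps a JSON-RPC protocol error for a normal tool result whose text tells the model how to retry. A sketch of the two response shapes follows; `jsonrpc_result`, `jsonrpc_error`, and `missing_query_result` are hypothetical helpers, and the real `create_success_response` / `create_error_response` signatures are assumed rather than known.

```python
from typing import Any, Dict


def jsonrpc_result(request_id: Any, result: Dict[str, Any]) -> Dict[str, Any]:
    """Successful JSON-RPC 2.0 response envelope."""
    return {"jsonrpc": "2.0", "id": request_id, "result": result}


def jsonrpc_error(request_id: Any, code: int, message: str) -> Dict[str, Any]:
    """Protocol-level JSON-RPC 2.0 error envelope."""
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


def missing_query_result(request_id: Any) -> Dict[str, Any]:
    """Report the problem as tool output so the calling model can read it and retry."""
    text = (
        "Error: missing required parameter 'query'. Please call this tool again "
        "with a non-empty 'query' argument describing what you want to retrieve."
    )
    return jsonrpc_result(request_id, {"content": [{"type": "text", "text": text}]})
```

The design trade-off is that a protocol error may surface to the host as a failed call, while a text result reaches the model as ordinary tool output it can act on.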

View File

@ -18,5 +18,6 @@
         "{bot_id}"
       ]
     }
-  }
+  },
+  "category": "Data & Retrieval"
 }

View File

@ -314,7 +314,12 @@ async def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
                 top_k = arguments.get("top_k", 100)
 
                 if not query:
-                    return create_error_response(request_id, -32602, "Missing required parameter: query")
+                    return create_success_response(request_id, {
+                        "content": [{
+                            "type": "text",
+                            "text": "Error: missing required parameter 'query'. Please call this tool again with a non-empty 'query' argument describing what you want to retrieve."
+                        }]
+                    })
 
                 result = rag_retrieve(query, top_k, trace_id)
@ -328,7 +333,12 @@ async def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
                 query = arguments.get("query", "")
 
                 if not query:
-                    return create_error_response(request_id, -32602, "Missing required parameter: query")
+                    return create_success_response(request_id, {
+                        "content": [{
+                            "type": "text",
+                            "text": "Error: missing required parameter 'query'. Please call this tool again with a non-empty 'query' argument describing what you want to retrieve."
+                        }]
+                    })
 
                 result = table_rag_retrieve(query, trace_id)

View File

@ -1,6 +1,7 @@
 {
   "name": "data-dashboard",
   "description": "Renders data as an interactive dashboard card UI using the mcp-ui protocol.",
+  "category": "Interactive UI",
   "hooks": {
     "PrePrompt": [
       {

View File

@ -2,6 +2,7 @@
 name: docx
 description: "Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When Claude needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks"
 license: Proprietary. LICENSE.txt has complete terms
+category: Document Processing
 ---
 
 # DOCX creation, editing, and analysis

View File

@ -16,6 +16,7 @@ metadata:
         - node
         - npm
     primaryEnv: SMTP_PASS
+category: Communication
 ---
 
 # IMAP/SMTP Email Tool

View File

@ -13,7 +13,11 @@
     "mcp_ui": {
       "transport": "stdio",
       "command": "python",
-      "args": ["./ui_render_server.py", "{bot_id}"]
+      "args": [
+        "./ui_render_server.py",
+        "{bot_id}"
+      ]
     }
-  }
+  },
+  "category": "Interactive UI"
 }

View File

@ -2,6 +2,7 @@
 name: pdf
 description: Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When Claude needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.
 license: Proprietary. LICENSE.txt has complete terms
+category: Document Processing
 ---
 
 # PDF Processing Guide

View File

@ -2,6 +2,7 @@
 name: pptx
 description: "Presentation creation, editing, and analysis. When Claude needs to work with presentations (.pptx files) for: (1) Creating new presentations, (2) Modifying or editing content, (3) Working with layouts, (4) Adding comments or speaker notes, or any other presentation tasks"
 license: Proprietary. LICENSE.txt has complete terms
+category: Document Processing
 ---
 
 # PPTX creation, editing, and analysis

View File

@ -5,6 +5,7 @@ compatibility: Requires Python 3.8+ and PyYAML. Uses AWS SigV4 signing (no exter
 metadata:
   author: foundra
   version: "2.1"
+category: Web Services
 ---
 
 # R2 Upload

View File

@ -8,5 +8,6 @@
         "command": "python scripts/schedule_manager.py list --format brief"
       }
     ]
-  }
+  },
+  "category": "Task Scheduling"
 }

View File

@ -1,6 +1,7 @@
 ---
 name: schedule-job
 description: Scheduled Task Management - Create, manage, and view scheduled tasks for users (supports cron recurring tasks and one-time tasks)
+category: Task Scheduling
 ---
 
 # Schedule Job - Scheduled Task Management

View File

@ -1,6 +1,7 @@
 ---
 name: skill-creator
 description: Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, update or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.
+category: Developer Tools
 ---
 
 # Skill Creator

View File

@ -2,6 +2,7 @@
 name: xlsx
 description: "Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas"
 license: Proprietary. LICENSE.txt has complete terms
+category: Document Processing
 ---
 
 # Requirements for Outputs

View File

@ -2,6 +2,7 @@
 name: ai-ppt-generator
 description: Generate PPT with Baidu AI. Smart template selection based on content.
 metadata: { "openclaw": { "emoji": "📑", "requires": { "bins": ["python3"], "env":["BAIDU_API_KEY"]},"primaryEnv":"BAIDU_API_KEY" } }
+category: Document Processing
 ---
 
 # AI PPT Generator
@ -82,4 +83,4 @@ python3 scripts/random_ppt_theme.py --query "企业年度总结" --category "企
 - **API integration**: Fetches real style_id from Baidu API for each template
 - **Error handling**: If template not found, falls back to random selection
 - **Timeout**: Generation takes 2-5 minutes, set sufficient timeout
-- **Streaming**: Uses streaming API, wait for `is_end: true` before considering complete
+- **Streaming**: Uses streaming API, wait for `is_end: true` before considering complete

View File

@ -8,5 +8,6 @@
   },
   "skills": [
     "./skills/catalog-search-agent"
-  ]
+  ],
+  "category": "Data & Retrieval"
 }

View File

@ -13,7 +13,11 @@
     "ecommerce_storefront": {
       "transport": "stdio",
       "command": "python",
-      "args": ["./ecommerce_server.py", "{bot_id}"]
+      "args": [
+        "./ecommerce_server.py",
+        "{bot_id}"
+      ]
     }
-  }
+  },
+  "category": "Developer Tools"
 }

View File

@ -0,0 +1,195 @@
---
name: expense-approval-reviewer
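description: Reviews expense reimbursement claims (travel, meals, office supplies, and similar) for compliance and authenticity. Checks invoices, amounts, expense category, reason, budget, and duplicate or split claims item by item, then gives a clear verdict and a recommended action. Always use this skill when you receive expense claim data or requests such as 报销审批, 费用审核, expense review, 报销合规检查, or 报销单审批助手, or when form data with fields like amount/category/reason/invoice must be judged for approval. Output structured text only; do not output JSON.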
category: Compliance & Security
---
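# Expense Approval Reviewer (报销审批审核助手)
## Overview
This skill targets the corporate OA reimbursement workflow. It performs an **automated first-pass review** of a single expense claim, identifies compliance and authenticity risks, and gives one clear verdict: **通过 (approve) / 需关注 (needs attention) / 审批不通过 (reject)**.
Positioning:
- You are the **first-pass review agent**. You only review and **output a text verdict**.
- The downstream OA system hands your text to another LLM for structured JSON extraction, so you must **never output JSON, code blocks, or pseudocode yourself**. Output only natural-language text in the format specified below.
- Your verdict decides whether the claim is **returned to the submitter for revision** (审批不通过) or **moves on to manual approval** (通过 / 需关注), so the judgment must be stable and explainable.
## Triggering Cues
Use this skill when any of the following applies:
- Chinese: 报销审批、报销审核、费用审核、报销单初审、报销合规检查、发票审核、差旅报销审核
- English: expense review, reimbursement approval, expense compliance check, invoice review
- You receive expense form data (with fields such as `amount`, `category`, `reason`, and `invoice_img`/invoice) and are asked to judge whether it can be approved.
## Input
You usually receive the field data of one expense claim. Common fields:
| Field key | Meaning | Notes |
|---|---|---|
| `amount` | Claimed amount (yuan) | Required |
| `category` | Expense category | travel / meal / office (office supplies) / other |
| `reason` | Reason for the claim | Free text |
| `invoice_img` | Invoice photo or receipt | URL or attachment identifier; **empty means no invoice was uploaded** |
| `date` / `occurred_at` | Date the expense occurred | May be absent |
| `creator` / `dept` | Submitter / department | May be absent |
When fields are missing: state it clearly if a required item (amount, invoice) is absent. If optional context (date, department, history) is absent, do not stall on it; give a gentle "insufficient information" note instead.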
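## Review Checklist (Core)
Check the following 8 categories one by one. Each entry states **what to check → when it counts as a problem → severity**. Severity is one of `高/中/低` (high/medium/low).
### 1. Invoice and receipt completeness: field `invoice_img`
- Check: whether an invoice or receipt is provided.
- Problem: **no invoice uploaded** → authenticity cannot be verified.
- Severity: **高** (a hard defect; usually a direct "审批不通过", returned for the invoice to be added).
### 2. Amount compliance: field `amount`
- Check: whether a single claim exceeds the limit, is 0 or negative, or is clearly abnormal.
- Problem:
- Single claim > 10000 yuan → exceeds the per-claim threshold and needs a supplementary approval note (**中**, goes to manual review with attention).
- Amount ≤ 0 or not a number → invalid data (**高**).
- Severity: 高 / 中 (as above).
### 3. Consistency of category and reason: fields `category` × `reason`
- Check: whether `category` matches the description in `reason`.
- Problem: a clear mismatch, such as category "travel" with reason "team dinner" (**中**).
- Severity: **中**.
### 4. Sufficiency of the reason: field `reason`
- Check: whether the reason is specific enough to explain the purpose.
- Problem: the reason is too brief (fewer than about 4 meaningful characters, or only "报销", "费用", and the like) → the purpose cannot be determined (**低**).
- Severity: **低**.
### 5. Plausibility of amount versus reason (authenticity sniff test): `amount` × `reason` × `category`
- Check: whether the amount is wildly out of line with the reason or category.
- Problem: for example, "one working lunch" claimed at over ten thousand yuan, or office supplies far beyond common sense → suspected anomaly (**中/高**, depending on the deviation).
- Severity: 中 / 高.
### 6. Suspected duplicate or split claims: `amount` × threshold
- Check: whether the amount sits "just below the threshold", or a large amount appears to have been split into several claims to dodge approval.
- Problem: the amount is close to and slightly below the limit (such as 9800 or 9900) with no reasonable explanation → suspected splitting (**中**).
- Hint: if historical claim context is provided, check for duplicates of recent claims (same amount, same date, same reason → **高**).
- Severity: 中 / 高.
### 7. Timeliness: field `date`/`occurred_at` (if present)
- Check: whether the expense date is overdue relative to today (for example, more than 90 days).
- Problem: clearly overdue with no explanation (**低/中**).
- Severity: 低 / 中. Skip this check and report nothing if the field is absent.
### 8. Invoice title and tax information: if the data contains invoice details
- Check: whether the invoice title is the company entity, whether it is a personal title, and whether the tax ID is missing.
- Problem: a personal invoice title used for a company expense, or key tax fields missing (**中**).
- Severity: **中**. Skip if no such data is present.
> Field labeling convention: when a finding points to a specific field, label it with the **English field key** (`amount`/`category`/`reason`/`invoice_img`, and so on) to make downstream structuring easier.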
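## Verdict Rules
Combine all findings into one **overall verdict** (choose one of three). The mapping is as follows (downstream routing depends on it):
| Overall verdict | Trigger | Downstream meaning |
|---|---|---|
| **审批不通过** (reject) | Any **高** hard defect exists (such as a missing invoice, an invalid amount, or suspected duplication or fraud) | Returned to the submitter to revise and resubmit |
| **需关注** (needs attention) | No hard defect, but one or more **中** risks | Goes to manual approval, with the approver prompted to look closely |
| **通过** (approve) | No risks, or only **低** notes | Manual approval is recommended to pass |
Confidence: give `高/中/低` (or a value in the 0-100% range) based on how complete the information is and how certain the judgment is. The more information is missing and the more subjective the judgment, the lower the confidence.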
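For maintainers writing regression tests around this skill, the verdict mapping in the table above can be expressed as a small function. This is a sketch for test tooling only, not something the agent should output; `overall_verdict` and `SEVERITY_RANK` are hypothetical names.

```python
from typing import Iterable

# Rank the three severity labels used by the skill.
SEVERITY_RANK = {"低": 1, "中": 2, "高": 3}


def overall_verdict(severities: Iterable[str]) -> str:
    """Map the highest severity among the findings to the overall verdict."""
    top = max((SEVERITY_RANK[s] for s in severities), default=0)
    if top >= 3:
        return "审批不通过"
    if top == 2:
        return "需关注"
    return "通过"
```

A test harness could compare this expected verdict with the one extracted from the agent's text to catch self-contradictory outputs.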
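## Output Format
**Output only structured Chinese text in the form below. Do not output JSON and do not wrap the text in a code block.** Organize it under the fixed labels so the downstream LLM can extract it:
```
【审核结论】通过 / 需关注 / 审批不通过(三选一)
【一句话摘要】用一句话说明结论原因。
【置信度】高 / 中 / 低(或百分比)
【建议动作】建议通过审批 / 建议进入人工审批并关注以下风险 / 建议退回发起人修改
【风险发现】
1. 字段：invoice_img｜严重度：高｜问题：未上传发票，无法核验报销真实性｜建议：补充发票凭证后重新提交
2. 字段：amount｜严重度：中｜问题：报销金额 15000 元超过单笔 10000 元阈值｜建议：拆分报销或补充审批说明
...(无风险时写“无”)
【说明与假设】列出做判断时假设的前提、缺失的上下文(如未提供历史报销记录、无费用日期等)。
```
Requirements:
- Every entry under `【风险发现】` has exactly four segments, `字段｜严重度｜问题｜建议`, separated by the full-width vertical bar `｜`.
- Severity uses only `高/中/低`.
- The verdict must be consistent with the findings: any "高" means "审批不通过", a maximum of "中" means "需关注", and all "低" or no findings means "通过".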
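## Workflow
1. Parse the expense fields in the input (amount/category/reason/invoice_img, and so on).
2. Go through the 8 categories of the Review Checklist and record each risk that is hit (field, severity, problem, suggestion).
3. Apply the Verdict Rules to produce the overall verdict and the recommended action.
4. Assess the confidence.
5. Output structured text following the Output Format. **Do not output JSON.**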
## Examples
### Example 1: missing invoice → 审批不通过
Input:
```
amount=800, category=meal, reason=团队聚餐, invoice_img=(空)
```
Output style:
```
【审核结论】审批不通过
【一句话摘要】未上传发票，无法核验报销真实性，需补票后重新提交。
【置信度】高
【建议动作】建议退回发起人修改
【风险发现】
1. 字段：invoice_img｜严重度：高｜问题：未上传发票照片，无法核验报销真实性｜建议：补充发票凭证后重新提交
【说明与假设】未提供历史报销记录，仅基于当前单据字段判断。
```
### Example 2: over the limit but with an invoice → 需关注 (goes to manual approval)
Input:
```
amount=15000, category=travel, reason=出差北京参加客户签约会议, invoice_img=http://img/inv.png
```
Output style:
```
【审核结论】需关注
【一句话摘要】金额超单笔阈值但发票齐全、事由清晰，建议进入人工审批并关注金额。
【置信度】中
【建议动作】建议进入人工审批并关注以下风险
【风险发现】
1. 字段：amount｜严重度：中｜问题：报销金额 15000 元超过单笔 10000 元阈值｜建议：补充超额审批说明或按规定拆分
【说明与假设】差旅标准与部门预算未提供，金额合理性以常识判断。
```
### Example 3: normal small amount → 通过
Input:
```
amount=120, category=office, reason=采购办公用打印纸, invoice_img=http://img/2.png
```
Output style:
```
【审核结论】通过
【一句话摘要】金额小、类型与事由一致、发票齐全，无明显风险。
【置信度】高
【建议动作】建议通过审批
【风险发现】无
【说明与假设】基于当前单据字段判断，未发现异常。
```
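## Guidelines
- **Output text only.** Never output JSON, code, or Markdown tables as the final verdict (the code blocks in the examples only demonstrate layout; the actual reply is plain text).
- The verdict must be **stable and reproducible**: the same input should yield the same verdict, which supports downstream extraction and regression testing.
- When optional context (history, budget standards, dates) is missing, say so under `【说明与假设】`. Do not invent data, and do not refuse to give a verdict because of it.
- This is **first-pass assistance** and does not replace the final judgment of finance or audit. Use wording such as "疑似/建议/需关注" (suspected / suggested / needs attention) and avoid absolute conclusions.
- Severity and the overall verdict must be consistent (see Verdict Rules); avoid contradictions such as a high risk paired with an approval.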


@ -1,6 +1,7 @@
---
name: managing-scripts
description: Manages shared scripts repository for reusable data analysis tools. Check scripts/README.md before writing, design generalized scripts with parameters, and keep documentation in sync.
category: Data & Retrieval
---
# Managing Scripts


@ -0,0 +1,49 @@
---
name: mineru
description: An AI-Native skill for parsing PDF / Office / image files into Markdown with MinerU — a fast, zero-config document parser for AI agents. Works with NO token via the Agent API and auto-upgrades to the Standard API (token) for large files, batches, and DOCX/HTML/LaTeX export. Use when converting PDF/Word/PPT/Excel/image documents, extracting text/tables/formulas, running OCR, or batch processing.
category: Document Processing
metadata:
author: Nebutra
version: "3.3.1"
argument-hint: <pdf-file-or-url>
---
# MinerU PDF Parser
Parse PDF, Office, and image documents into structured Markdown via the MinerU API.
## Quick Start
```bash
# Zero-config: no token, no install (free Agent API)
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/mineru.py" ./document.pdf --output ./output/
# Pipe Markdown back to an agent
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/mineru.py" ./document.pdf --stdout
# Power mode: token unlocks large files / batch / extra formats
export MINERU_TOKEN="..." # https://mineru.net/apiManage/token
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/mineru.py" ./pdfs/ --output ./output/ --workers 8 --resume
```
## Features
- **Auto-routing**: free Agent API by default, auto-upgrades to the Standard API (token) for large/batch/extra-format jobs
- **Multi-modal**: PDF, images, Word, PPT, Excel, HTML
- **High-performance OCR**: `--ocr` with language selection (`--lang`)
- **Formula & table recognition**: LaTeX formulas, structured tables
- **Multi-format export**: Markdown (default), plus DOCX / HTML / LaTeX
- **AI-Native output**: `--stdout` (Markdown) and `--json` (machine status)
- **Batch + resume**: parallel workers with `--resume`
- **Zero dependencies**: standard library only
## Authentication
A token is **optional** — the Agent API works without one. Set a token to unlock
the Standard API (≤ 200 MB / ≤ 200 pages, batch, DOCX/HTML/LaTeX):
```bash
export MINERU_TOKEN="your-token-here" # https://mineru.net/apiManage/token
```
Official API docs: https://mineru.net/apiManage/docs


@ -0,0 +1,170 @@
# MinerU API Reference
Official docs: https://mineru.net/apiManage/docs · Token: https://mineru.net/apiManage/token
MinerU exposes **two** document-parsing APIs. This skill auto-routes between them.
| | 🎯 Standard API | ⚡ Agent API (lightweight) |
|---|---|---|
| Base URL | `https://mineru.net/api/v4` | `https://mineru.net/api/v1/agent` |
| Token | **required** (`Bearer`) | **none** (IP rate-limited) |
| Models | `pipeline` / `vlm` / `MinerU-HTML` | fixed lightweight `pipeline` |
| File size | ≤ 200 MB | ≤ 10 MB |
| Pages | ≤ 200 | ≤ 20 |
| Batch | ≤ 50 per request | single file only |
| Output | zip (Markdown + JSON, optional DOCX/HTML/LaTeX) | Markdown only (CDN link) |
| Designed for | high-accuracy / complex / batch | AI-agent / quick / no-login |
Free Standard-API quota: **1000 pages/day at highest priority** (overflow is lower priority).
---
## Authentication (Standard API)
```
Authorization: Bearer YOUR_API_TOKEN
```
Get a token at https://mineru.net/apiManage/token.
> **Response envelopes.** Business endpoints return `{"code":0,"data":{…},"msg":"ok"}`.
> The auth/gateway layer returns a *different* shape on failure:
> `{"success":false,"msgCode":"A0202","msg":"user authenticate failed"}`.
> Clients must handle both — this skill maps `msgCode` to the same error hints.
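
One way a client could handle both envelope shapes is sketched below. This is an illustration under assumptions, not the skill's actual code: `MinerUError` and `unwrap` are hypothetical names, and a missing `code` key is treated as success.

```python
from typing import Any


class MinerUError(Exception):
    """Carries the error code from either envelope shape."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(payload: dict) -> Any:
    """Return `data` from a business response, or raise MinerUError."""
    # Gateway/auth failure: {"success": false, "msgCode": "...", "msg": "..."}
    if payload.get("success") is False:
        raise MinerUError(str(payload.get("msgCode", "")), str(payload.get("msg", "")))
    # Business response: {"code": 0, "data": {...}, "msg": "ok"}
    code = payload.get("code", 0)  # assumption: an absent code means success
    if code != 0:
        raise MinerUError(str(code), str(payload.get("msg", "")))
    return payload.get("data")
```

Normalizing both shapes into one exception type lets the caller look up a single error-hint table by `code`.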
---
## Standard API endpoints (`/api/v4`)
### Single URL — `POST /extract/task`
```json
{
"url": "https://example.com/doc.pdf",
"model_version": "vlm",
"is_ocr": false,
"enable_formula": true,
"enable_table": true,
"language": "ch",
"page_ranges": "1-10",
"extra_formats": ["docx", "html"],
"data_id": "my-document"
}
```
Response → `{ "code": 0, "data": { "task_id": "…" } }`. HTML inputs require `model_version: "MinerU-HTML"`.
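
As a rough sketch, the request above could be assembled with the standard library only. The helper name `build_extract_request` is hypothetical; the base URL and the `Bearer` header come from this reference, and the function builds the request without sending it.

```python
import json
import urllib.request

STANDARD_BASE = "https://mineru.net/api/v4"  # Standard API base URL from this reference


def build_extract_request(token: str, url: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST /extract/task request."""
    body = {"url": url, **options}  # options: model_version, is_ocr, language, ...
    return urllib.request.Request(
        f"{STANDARD_BASE}/extract/task",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit the task; the JSON reply then carries the `task_id` to poll.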
### Get task result — `GET /extract/task/{task_id}`
```json
{ "code": 0, "data": { "task_id": "…", "state": "done", "full_zip_url": "https://…", "err_msg": "" } }
```
### Batch local upload — `POST /file-urls/batch`
Returns signed upload URLs; PUT each file (no `Content-Type`). Up to **50** files / request.
```json
{ "files": [ { "name": "doc.pdf", "data_id": "doc" } ], "model_version": "vlm" }
```
Response → `{ "code": 0, "data": { "batch_id": "…", "file_urls": ["https://…"] } }`.
### Batch URL — `POST /extract/task/batch`
```json
{ "files": [ { "url": "https://…/doc.pdf", "data_id": "doc" } ], "model_version": "vlm" }
```
### Batch results — `GET /extract-results/batch/{batch_id}`
```json
{ "code": 0, "data": { "batch_id": "…", "extract_result": [
{ "file_name": "doc.pdf", "state": "done", "full_zip_url": "https://…" }
] } }
```
---
## Agent API endpoints (`/api/v1/agent`) — no token
### URL — `POST /parse/url`
```json
{ "url": "https://…/doc.pdf", "language": "ch", "enable_table": true, "is_ocr": false, "enable_formula": true, "page_range": "1-10" }
```
`page_range` accepts `from-to` or a single page only (no commas). Returns `{ "code": 0, "data": { "task_id": "…" } }`.
### File — `POST /parse/file`
```json
{ "file_name": "doc.pdf", "language": "ch" }
```
Response → `{ "data": { "task_id": "…", "file_url": "https://oss…" } }`; PUT the file to `file_url`.
### Result — `GET /parse/{task_id}`
```json
{ "code": 0, "data": { "task_id": "…", "state": "done", "markdown_url": "https://cdn…/full.md" } }
```
---
## Task states
`pending` (queued) · `running` (parsing) · `converting` (format conversion) ·
`uploading` (downloading source, Agent) · `waiting-file` (awaiting upload) ·
`done` (complete) · `failed` (error).
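
A polling loop over these states might look like the sketch below. It assumes `done` and `failed` are the only terminal states, and it takes the HTTP call as a caller-supplied function so the loop stays testable; `poll_until_done` and its default interval and timeout are hypothetical.

```python
import time
from typing import Callable

TERMINAL_STATES = {"done", "failed"}  # assumption: every other state is transient


def poll_until_done(
    fetch_state: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_state() until the task reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_state()  # expected to return the `data` object of a result call
        if data.get("state") in TERMINAL_STATES:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {data.get('state')!r} after {timeout}s")
        sleep(interval)
```

The caller still has to inspect the returned `state`, since `failed` is returned rather than raised.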
---
## Parameters
| Parameter | Type | Default | Notes |
|-----------|------|---------|-------|
| `model_version` | string | `pipeline` | `pipeline`, `vlm` (recommended), `MinerU-HTML` (HTML only) |
| `is_ocr` | bool | `false` | OCR for scanned docs (pipeline/vlm) |
| `enable_formula` | bool | `true` | Formula recognition |
| `enable_table` | bool | `true` | Table recognition |
| `language` | string | `ch` | OCR language (see official `language` table) |
| `page_ranges` | string | all | Standard: `"2,4-6"`; Agent `page_range`: `"1-10"` only |
| `extra_formats` | array | `[]` | `docx` / `html` / `latex` (Standard only) |
| `data_id` | string | | `[A-Za-z0-9_.-]`, ≤ 128 chars |
| `no_cache` | bool | `false` | Bypass URL cache (Standard) |
| `cache_tolerance` | int | `900` | Cache TTL seconds (Standard) |
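
A client may want to validate `data_id` and page ranges before sending a request. The sketch below covers only the forms shown in this table; the official API may accept more syntax, and the helper names are hypothetical. Treating an empty `data_id` as invalid is also an assumption.

```python
import re

# data_id: characters [A-Za-z0-9_.-], at most 128 of them.
DATA_ID_RE = re.compile(r"[A-Za-z0-9_.-]{1,128}")
# Standard API page_ranges: comma-separated pages or ranges, e.g. "2,4-6".
PAGE_RANGES_RE = re.compile(r"\d+(-\d+)?(,\d+(-\d+)?)*")
# Agent API page_range: a single page or one from-to range, e.g. "1-10".
PAGE_RANGE_RE = re.compile(r"\d+(-\d+)?")


def valid_data_id(value: str) -> bool:
    return DATA_ID_RE.fullmatch(value) is not None


def valid_pages(value: str, agent: bool = False) -> bool:
    pattern = PAGE_RANGE_RE if agent else PAGE_RANGES_RE
    return pattern.fullmatch(value) is not None
```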
---
## Limits
| | Standard | Agent |
|---|---|---|
| File size | 200 MB | 10 MB |
| Pages | 200 | 20 |
| Batch | 50 / request | 1 |
| Quota | 1000 pages/day priority | IP rate-limited (HTTP 429) |
Supported types: PDF, images (png/jpg/jpeg/jp2/webp/gif/bmp), Doc(x), Ppt(x), Xls(x); HTML is Standard-only.
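
The limits above suggest a simple routing rule, sketched here under stated assumptions: 1 MB is taken as 1024 × 1024 bytes, an unknown page count is passed as `None`, and the function name `choose_api` is hypothetical rather than the skill's real routing code.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: the limits are binary megabytes
AGENT_MAX_BYTES, AGENT_MAX_PAGES = 10 * MB, 20
STANDARD_MAX_BYTES, STANDARD_MAX_PAGES = 200 * MB, 200


def choose_api(
    size_bytes: int,
    pages: Optional[int],
    has_token: bool,
    batch: bool = False,
    extra_formats: bool = False,
) -> str:
    """Return "agent" or "standard" for one job, or raise if neither fits."""
    if size_bytes > STANDARD_MAX_BYTES or (pages is not None and pages > STANDARD_MAX_PAGES):
        raise ValueError("job exceeds the Standard API limits")
    needs_standard = (
        batch
        or extra_formats
        or size_bytes > AGENT_MAX_BYTES
        or (pages is not None and pages > AGENT_MAX_PAGES)
    )
    if not needs_standard:
        return "agent"  # free path is the default, even when a token is set
    if not has_token:
        raise ValueError("job needs the Standard API, which requires a token")
    return "standard"
```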
---
## Error codes
| Code | Meaning |
|------|---------|
| `A0202` | Invalid token |
| `A0211` | Token expired |
| `-500` | Parameter error |
| `-10001` / `-10002` | Service error / invalid params |
| `-60002` | Unsupported file format |
| `-60003` / `-60004` | File read failed / empty file |
| `-60005` | File too large (> 200 MB) |
| `-60006` | Too many pages (> 200) |
| `-60008` | File read timeout (URL unreachable) |
| `-60010` | Parse failed |
| `-60015` / `-60016` | File / format conversion failed |
| `-60018` | Daily quota reached |
| `-60022` | Web page read failed (rate-limited) |
| **Agent API** | |
| `-30001` | Exceeds Agent 10 MB limit → use Standard API |
| `-30002` | Unsupported file type for Agent |
| `-30003` | Exceeds Agent 20-page limit → use Standard API or `--pages` |
| `-30004` | Invalid request parameters |
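
The table can be turned into a lookup that also flags the two Agent errors the table says should move to the Standard API. This is a partial, illustrative mapping: `ERROR_HINTS`, `explain`, and `should_escalate` are hypothetical names, and only some of the codes are listed.

```python
# Hints copied from the error table; codes are kept as strings because
# gateway codes such as "A0202" are not numeric.
ERROR_HINTS = {
    "A0202": "Invalid token",
    "A0211": "Token expired",
    "-60002": "Unsupported file format",
    "-60005": "File too large (> 200 MB)",
    "-60006": "Too many pages (> 200)",
    "-60018": "Daily quota reached",
    "-30001": "Exceeds Agent 10 MB limit",
    "-30003": "Exceeds Agent 20-page limit",
}

# Agent errors that the table maps to "use Standard API".
ESCALATE_TO_STANDARD = {"-30001", "-30003"}


def explain(code) -> str:
    """Return a human-readable hint for an error code."""
    return ERROR_HINTS.get(str(code), f"Unknown error code {code}")


def should_escalate(code, has_token: bool) -> bool:
    """True when an Agent failure can be retried on the Standard API."""
    return has_token and str(code) in ESCALATE_TO_STANDARD
```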


@ -0,0 +1,193 @@
<!-- Web-researched competitive comparison (45 tools, 6 categories, adversarially fact-checked). Last researched 2026-05-31. Star counts / versions are point-in-time. -->
# MinerU Skill — Competitive Comparison Reference
This document gives an honest, sourced, per-tool breakdown of how **MinerU Skill** compares to the document-parsing landscape. Read the framing first: it determines how to interpret every "we win / they win" below.
## What MinerU Skill actually is (and is not)
MinerU Skill is a **zero-config, zero-dependency, agent-native convenience layer over [MinerU](https://github.com/opendatalab/MinerU)'s cloud API**, plus 17 turnkey delivery integrations to note/knowledge/content tools. Concretely (verified in this repo):
- Core script `scripts/mineru.py` is **~54KB / ~1,350 lines of pure Python standard library** — no `requests`/`aiohttp`, no model weights.
- A **genuinely token-free** default: the free **Agent API** path (`agent_parse` → `_agent_poll`) sends **no `Authorization` header** (the Bearer header is set only when a token is present). Files ≤10MB / ≤20 pages.
- **Auto-routing**: with a token, large/batched/extra-format jobs use the **Standard API** (≤200MB / ≤200 pages); the Agent path **auto-escalates** to Standard on size/page limits.
- **17 delivery sinks** (16 sink modules + `local.py` registering both `obsidian` and `logseq`): obsidian, logseq, siyuan, notion, confluence, onenote, coda, yuque, feishu, slack, dingtalk, wecom, ticktick, linear, airtable — all zero-dependency — plus **roam** (needs `roam-client`) and **wps** (needs `html-for-docx`) which lazy-load one library only when used.
- `--resume` dedup, parallel `--workers` (ThreadPoolExecutor), `--stdout`/`--json` agent output.
**Critical dependency:** our accuracy is **entirely downstream of, and capped by, what MinerU's cloud serves.** We own no models. Therefore:
- We have **no quality edge** over any other cloud wrapper that hits the same MinerU API — OCR/table/formula output is **identical**.
- Self-hosting the MinerU engine gives the **same or better** accuracy (version-controllable, no upload caps).
**Hard limits we cannot exceed:** 10MB/20-page free Agent tier, 200MB/200-page Standard tier, plus IP rate limits. Self-hosted tools have no such caps (only hardware).
**Our benchmark is latency-only.** `tests/test_live.py` measures end-to-end cloud round-trip latency (~13–14s for the official demo PDF). It is **not** an accuracy benchmark; we have no OmniDocBench/olmOCR-Bench numbers of our own.
### A note on the speed claim
Our ~13–14s/doc cloud round-trip is **not** a clean win over self-hosted GPU engines. A normal self-host with a GPU runs at ~0.18s/page (Marker) or ~2.12 pages/sec (MinerU on A100), which is far faster at any real scale. We only out-run **slow Apple-Silicon-CPU local runs of small docs** (e.g., M4 VLM at 32–148s/page). Do not frame "faster wall-clock" as a general win.
### A note on benchmarks
No single benchmark is authoritative. Different benchmarks favor different tools:
- **OmniDocBench** (v1.5/v1.6): MinerU2.5 **90.67** (v1.5), MinerU2.5-Pro **95.69** (v1.6) — leads, beating Gemini 2.5 Pro / GPT-4o / Qwen2.5-VL-72B on text/table/formula. Source: arXiv 2509.22186.
- **olmOCR-Bench** (Ai2, Oct 2025): olmOCR-2 **82.4** > Marker **76.1** > **MinerU 75.8**. Here MinerU **trails** — this is a real olmOCR win and must stay visible.
- **RD-TableBench**: Reducto 90.2% on complex tables — but Reducto authored this benchmark (vendor-biased).
- Mathpix is the de-facto formula-OCR standard (BLEU/edit-distance studies), though a PaddleOCR-VL-based tool claims to beat it on OmniDocBench v1.0 formula recognition, so the very top is contested.
> Star counts / versions below (e.g. MinerU "65.7k / v3.2.1") are point-in-time and not independently re-verified.
---
## Category 1 — Self-hosted / open-source parsing engines
These are the tools that close our single biggest gap: **fully offline / air-gapped / no cloud / no upload caps.**
### MinerU engine (opendatalab) — the engine we wrap
- **Source:** https://github.com/opendatalab/MinerU · arXiv 2509.22186 · https://huggingface.co/opendatalab/MinerU2.5-Pro-2604-1.2B
- **Strengths:** Owns the SOTA models (OmniDocBench 90.67 / 95.69-Pro v1.6). 109-language OCR, handwriting, cross-page table merge, formula→LaTeX (the source of *our* LaTeX). Fully self-hostable → offline, air-gappable, zero per-page cost, no caps. Pipeline backend runs pure CPU; VLM needs 8GB+ VRAM. Native MCP, Python/Go/TS SDKs, LangChain/LlamaIndex/Dify/FastGPT.
- **Weaknesses vs us:** Heavy install (multi-GB torch/vLLM + weights, 16GB RAM / 20GB disk floor); slow on Apple Silicon; no note/PKM delivery sinks; library/CLI rather than zero-config.
- **Verdict:** **Beats us** on offline, privacy, caps, accuracy ceiling, ecosystem. **We beat it** only on zero-install/zero-config and built-in delivery.
### Marker (datalab-to / VikParuchuri)
- **Source:** https://github.com/datalab-to/marker · https://allenai.org/blog/olmocr-2
- **Strengths:** Fully offline; very high batch throughput (~122 pages/sec/H100, 0.18s/page GPU); broad formats incl. EPUB; optional local-LLM (Ollama) quality boost with no data leaving the machine; ~35k+ stars, active.
- **Weaknesses:** **GPL-3.0** code + model weights under a modified RAIL-M (free only under ~$2M funding+revenue; commercial above that needs a Datalab license). olmOCR-Bench **76.1** — below olmOCR-2 and MinerU's OmniDocBench standing.
- **Verdict:** Beats us on offline/throughput; we beat it on zero-install and 17 delivery sinks. License gate is a real friction it has and we don't.
### Docling (IBM / DS4SD)
- **Source:** https://github.com/docling-project/docling · https://huggingface.co/ibm-granite/granite-docling-258M · arXiv 2408.09869
- **Strengths:** **Widest input modality set** (PDF/DOCX/PPTX/XLSX/HTML/AsciiDoc/LaTeX/CSV/images + **audio via ASR** + USPTO/JATS/XBRL). Tiny 258M Granite-Docling VLM runs on CPU/modest GPU. **MIT code + Apache-2.0 weights.** Deep framework ecosystem (LangChain/LlamaIndex/Haystack + official MCP), IBM-backed, 60k+ stars. Air-gapped by design.
- **Weaknesses:** Absolute accuracy lags MinerU on OmniDocBench/olmOCR-Bench; library-first (not a zero-config CLI); targets framework ingestion, not file delivery to note tools.
- **Verdict:** Beats us on offline, modality breadth, permissive license, ecosystem; we beat it on zero-install and note/PKM delivery. **Do not over-rank its MIT as uniquely best** — olmOCR's Apache-2.0 on *both* code and 7B weights is at least as commercially valuable.
### olmOCR (allenai)
- **Source:** https://github.com/allenai/olmocr · https://allenai.org/blog/olmocr-2 · https://huggingface.co/datasets/allenai/olmOCR-bench
- **Strengths:** **Leads Ai2's olmOCR-Bench (82.4 vs MinerU 75.8)** — a benchmark where MinerU trails. **Apache-2.0 on code AND the olmOCR-2-7B weights** (most commercial-friendly model reuse here). Built for million-page LLM-training linearization. Offline.
- **Weaknesses:** **PDF/image only** (no Office/HTML); **English-primary**, filters non-English (MinerU does 109-lang); **requires a 12GB+ NVIDIA GPU, no CPU mode at all**.
- **Verdict:** Beats us on offline, that-benchmark accuracy, license, scale. We beat it on modality breadth, multilingual, no-GPU, delivery, zero-install. **Keep the olmOCR-Bench lead visible — do not cherry-pick only OmniDocBench.**
### Nougat (facebookresearch / Meta AI)
- **Source:** https://github.com/facebookresearch/nougat · arXiv 2308.13418
- **Strengths:** Strong LaTeX/math on arXiv-style scientific PDFs (its trained niche). Offline.
- **Weaknesses:** **PDF + English/Latin-script only** (no CJK); **CC-BY-NC weights (non-commercial)**; effectively **unmaintained** (last release Aug 2023); known repetition/hallucination/[MISSING_PAGE] failures off-distribution.
- **Verdict:** Offline + niche math is its only edge; we beat it on general-purpose, multilingual, maintenance, commercial license, delivery.
### PyMuPDF4LLM (pymupdf / Artifex)
- **Source:** https://github.com/pymupdf/pymupdf4llm · https://pymupdf.io/blog/pymupdf-layout-10-faster-pdf-parsing-without-gpus
- **Strengths:** **Far faster and lighter than any ML tool on born-digital PDFs** (~hundreds of pages/sec on plain CPU; a C-optimized variant claims ~520 pages/sec). Lowest dependency/hardware footprint. Offline, no cloud, no caps. Ideal for huge clean-PDF corpora where speed > fidelity.
- **Weaknesses:** No ML → no real formula/LaTeX, weak complex tables, poor scanned/handwritten; slow external OCR; **AGPL-3.0 OR Artifex commercial**; Office formats need paid **PyMuPDF Pro**.
- **Verdict:** A genuine win for the speed-over-fidelity, clean-PDF use case. We beat it on hard-doc quality (MinerU's VLM), multilingual OCR, and delivery — but acknowledge its speed/footprint advantage honestly.
### Zerox (getomni-ai)
- **Source:** https://github.com/getomni-ai/zerox
- **Strengths:** Trivial provider-flexibility (OpenAI/Azure/Bedrock-Claude/Gemini/Vertex); JSON-Schema structured extraction (Node SDK); MIT code.
- **Weaknesses:** **NOT offline and NOT token-free** — mandates a paid cloud vision-LLM key; needs graphicsmagick+ghostscript; **no published benchmarks**; per-page LLM cost can exceed MinerU on large jobs.
- **Verdict:** We beat it on token-free start, benchmarked accuracy, dedicated formula/table models, system-dep footprint, and delivery. It beats us on provider-swap flexibility and typed JSON extraction.
---
## Category 2 — Commercial cloud document-parsing APIs
Mostly **stronger than us** on enterprise accuracy, SLAs, structured extraction, and RAG/MCP ecosystems. Our honest edges are narrow: token-free + zero-install hosted default, clean Markdown/LaTeX of academic PDFs, and 17 delivery sinks none of them offer.
### LlamaParse (LlamaIndex / LlamaCloud)
- **Source:** https://www.llamaindex.ai/pricing · LlamaCloud MCP docs
- **Beats us:** Official hosted **MCP server**; deep native RAG stack (parse→index→LlamaExtract/LlamaAgents); steerable NL parsing with frontier LLMs (GPT-4.1/Gemini 2.5 Pro); richer outputs (per-page JSON, XLSX, HTML tables, annotated PDF); enterprise SLAs; mature Python+TS SDKs.
- **We beat:** Token-free start (it needs a LlamaCloud key from page one); zero runtime deps; 17 note/PKM sinks (it delivers to RAG indexes, not note tools); built-in `--resume`/parallel batch CLI.
### Mathpix (Convert API)
- **Source:** https://mathpix.com/pricing/api · https://mathpix.com/image-to-latex
- **Beats us:** **Best-in-class formula/equation OCR (printed AND handwritten) → clean LaTeX — clearly better than MinerU for pure math fidelity; concede this, do not imply parity.** Mature Snip ecosystem + Overleaf workflows; very low per-image cost at scale.
- **We beat:** Token-free start (Mathpix API requires a paid PAYG account, **$19.99 setup fee**, card on file; **no recurring free monthly allowance** — only a one-time $29 test credit; the consumer Snip app's free quota does **not** apply to the API); general-purpose multi-modal Office parsing; 17 delivery sinks; built-in batch CLI.
### Unstructured.io
- **Source:** https://unstructured.io/pricing · https://github.com/Unstructured-IO/unstructured
- **Beats us:** **Apache-2.0 core library is fully self-hostable → 100% offline** (we cannot); official MCP + huge connector ecosystem (S3/SharePoint/vector DBs); built-in chunking+embedding (RAG-ready); 25+ file types; permissive license for product embedding.
- **We beat:** Token-free hosted default with zero install (its hosted API needs a key; self-host means running infra); cleaner human-readable Markdown out of the box (its primary output is JSON "elements"); 17 note/PKM sinks (it targets vector DBs/storage). *On parsing quality:* VLM parsing is generally stronger for complex layout/formula, but this is **not a benchmarked head-to-head** — state it as a tendency, not a measured win.
### Reducto
- **Source:** https://reducto.ai/pricing
- **Beats us:** **Best complex/financial table extraction (90.2% RD-TableBench — vendor-authored but the strongest public evidence)**; agentic multi-pass OCR; SOC2/HIPAA, on-prem/VPC/air-gapped, enterprise SLAs; schema-based extraction with bounding boxes/citations.
- **We beat:** Token-free start (it needs a key + credits); zero-install plain CLI; 17 delivery sinks; auto-routing/--resume/parallel batch.
### Chunkr (and similar RAG-native APIs)
- **Beats us:** Self-hostable (offline option we lack); RAG-native chunking + broad export (DOCX/HTML/LaTeX).
- **We beat:** Token-free start; zero-install; 17 note/PKM sinks.
- **Caveat (fact-check):** Do **not** claim "stronger VLM Markdown for formulas" — Chunkr cloud uses its own proprietary models and we have **no head-to-head benchmark**. Drop the quality claim; keep only the export-breadth and offline framing.
---
## Category 3 — Other MinerU wrappers, skills & MCP servers (our direct peers)
**Every cloud-backed wrapper here hits the same MinerU API we do, so its OCR/table/formula output is IDENTICAL to ours.** We have **no quality edge** over them — only DX differences. Claims of "better OCR/formula/Markdown" vs these are **invalid** and must not appear.
### Official MinerU MCP server (mineru-open-mcp / MinerU-Ecosystem)
- **Source:** https://github.com/opendatalab/MinerU-Ecosystem · https://pypi.org/project/mineru-open-mcp/
- **Beats us:** **Official, first-party** — tracks API/format changes day-one; native **MCP server** (stdio + streamable-http) in Claude Desktop/Cursor/Windsurf with zero glue; full ecosystem (Python/Go/TS SDKs, LangChain/LlamaIndex/Dify/FastGPT). **Same free no-token Flash tier as us** — our "free zero-token" edge is fully matched by the first party.
- **We beat:** Zero runtime deps (vs pip/uvx install); auto-routing Agent⇄Standard with auto-escalation; 17 delivery sinks; `--resume`/parallel batch; usable as a plain CLI outside any MCP host.
### MinerU-Document-Explorer (official, opendatalab)
- **Source:** https://github.com/opendatalab/MinerU-Document-Explorer
- **Beats us:** Different, **larger** value prop — a local agent-native **knowledge engine** (BM25/vector/hybrid retrieval + deep-reading + LLM-wiki) with 15 MCP tools; runs 100% locally for its core; MIT, 568 stars.
- **We beat:** We're a focused zero-dep converter; broader conversion modalities; 17 delivery sinks (it keeps content in its own index/wiki); no Node/local-model download.
### linxule/mineru-mcp (Node, cloud)
- **Source:** https://github.com/linxule/mineru-mcp
- **Beats us:** Native MCP server with 6 granular tools (explicit status-polling + batch-status pagination); first-class for Node/JS MCP stacks; batch up to 200 URLs/request.
- **We beat:** **Free no-token path** (it **requires** a token always); zero runtime deps (vs Node 18+); broader modalities (Excel/HTML); 17 delivery sinks; usable as plain CLI outside MCP.
### mineru-converter-mcp-server (AvatarGanymede/MinerU-MCP)
- **Source:** https://pypi.org/project/mineru-converter-mcp-server/
- **Beats us:** **Auto-splits PDFs >200MB and segments >600-page docs by page range — gracefully exceeding the 200MB/200-page cap we are bound by.** Turnkey Smithery + Render deploy (per-user key); explicit HTML input.
- **We beat:** Free no-token default (it requires a key); zero runtime deps; plain CLI (no MCP host/Render/Smithery needed); 17 sinks; auto-routing.
### grimoire-skill (LeoLin990405)
- **Source:** https://github.com/LeoLin990405/grimoire-skill
- **Beats us:** Higher-level knowledge-capture ("parse once, share twice" → Obsidian notes + reusable skill packs); ingests **video** (YouTube/Bilibili) + subtitles (modalities we don't touch); cross-agent skill management; content-aware Obsidian auto-filing.
- **We beat:** Free no-token default (it needs a token + `--cloud-ok` for local files); zero runtime deps (vs bash+jq+awk + optional yt-dlp/ffmpeg); 17 sinks vs primarily Obsidian; broader Office/HTML; cross-platform single-file portability.
### kesslerio/mineru-pdf-parser (openclaw/ClawHub skill, local CPU)
- **Source:** openclaw/skills · SKILL.md
- **Beats us:** **Fully local/offline (pure CPU, cross-platform)** — no cloud/token/caps; handles privacy-sensitive docs; native Markdown + JSON.
- **We beat:** Zero install (it needs a full local MinerU install + weights + shell wrapper); no GPU/heavy runtime; faster wall-clock **only vs slow local CPU**; broader modalities; 17 sinks; `--stdout`/`--json`; better docs.
### nilecui/mineru-parser-skills (Claude Agent SDK, cloud)
- **Source:** https://github.com/nilecui/mineru-parser-skills
- **Beats us:** Built directly on the Claude Agent SDK (slots into Agent-SDK apps). Honestly little else — it's a thinner cloud wrapper.
- **We beat:** Accepts local files/dirs **and** URLs (it is **URL-only** — cannot parse a local PDF); free no-token default; zero runtime deps; batch/`--resume`/parallel; 17 sinks; broader modalities; mature/documented vs a 4-commit, no-license repo. *Caveat:* our "benchmarked" claim means **latency-measured**, not accuracy-benchmarked.
### TINKPA/mcp-mineru (local MLX, Apple Silicon)
- **Source:** https://github.com/TINKPA/mcp-mineru
- **Beats us:** **Fully offline/local** via MinerU running on-device (MLX accel); no cloud/token/caps; data never leaves the Mac.
- **We beat:** Zero install/no weights/no GPU; **faster wall-clock only for typical multi-page docs vs its slow local inference (32–148s/page on M4)**, which is not a general speed win; broader modalities; batch/`--resume`/17 sinks; more active/documented; usable as plain CLI.
---
## Summary of mandatory concessions (do not bury these)
1. **Offline / air-gapped is our single biggest gap.** MinerU engine, Marker, Docling, olmOCR, Nougat, PyMuPDF4LLM, TINKPA, kesslerio, MinerU-Document-Explorer, and self-hostable Unstructured/Chunkr all run with **zero cloud dependency**. We are cloud-only and **cannot handle confidential/regulated/air-gapped content at all.**
2. **Data privacy:** every self-hosted competitor keeps documents on the machine; we **upload every file** to MinerU's cloud — a hard disqualifier for many regulated users.
3. **Accuracy is downstream of, and capped by, MinerU's cloud.** Self-hosting MinerU2.5-Pro gives the same-or-better accuracy with no caps. Same-backend wrappers yield **identical** quality to us.
4. **Hard caps:** 10MB/20-page (Agent), 200MB/200-page (Standard), IP rate limits. mineru-converter exceeds them via auto-split/segmentation.
5. **Mathpix beats us on formula/LaTeX OCR (incl. handwriting).**
6. **Reducto leads complex/financial tables; olmOCR leads olmOCR-Bench (82.4 vs MinerU 75.8).** Different benchmarks favor different tools — never cherry-pick only OmniDocBench.
7. **Official first-party advantage:** the official MinerU MCP/Document-Explorer + ecosystem track changes day-one and match our free tier; we are third-party, can lag, and ship **no MCP server**.
8. **Permissive-license wins we lack:** olmOCR (Apache-2.0 code + 7B weights), Docling (MIT + Apache-2.0 weights), Unstructured (Apache-2.0 core).
9. **PyMuPDF4LLM is far faster/lighter on born-digital PDFs** (clean-text corpora, speed > fidelity).
## Sources
- MinerU engine: https://github.com/opendatalab/MinerU · arXiv 2509.22186 · https://huggingface.co/opendatalab/MinerU2.5-Pro-2604-1.2B · https://neurohive.io/en/state-of-the-art/mineru2-5-open-source-1-2b-model-for-pdf-parsing-outperforms-gemini-2-5-pro-on-benchmarks/
- Official MCP / ecosystem: https://github.com/opendatalab/MinerU-Ecosystem · https://pypi.org/project/mineru-open-mcp/ · https://github.com/opendatalab/MinerU-Document-Explorer
- Marker: https://github.com/datalab-to/marker · https://allenai.org/blog/olmocr-2
- Docling: https://github.com/docling-project/docling · arXiv 2408.09869 · https://huggingface.co/ibm-granite/granite-docling-258M
- olmOCR: https://github.com/allenai/olmocr · https://allenai.org/blog/olmocr-2 · https://huggingface.co/datasets/allenai/olmOCR-bench
- Nougat: https://github.com/facebookresearch/nougat · arXiv 2308.13418
- PyMuPDF4LLM: https://github.com/pymupdf/pymupdf4llm · https://pymupdf.io/blog/pymupdf-layout-10-faster-pdf-parsing-without-gpus
- Zerox: https://github.com/getomni-ai/zerox
- LlamaParse: https://www.llamaindex.ai/pricing
- Mathpix: https://mathpix.com/pricing/api · https://mathpix.com/image-to-latex
- Unstructured: https://unstructured.io/pricing · https://github.com/Unstructured-IO/unstructured
- Reducto: https://reducto.ai/pricing
- Other wrappers: https://github.com/linxule/mineru-mcp · https://pypi.org/project/mineru-converter-mcp-server/ · https://github.com/LeoLin990405/grimoire-skill · https://github.com/nilecui/mineru-parser-skills · https://github.com/TINKPA/mcp-mineru


@ -0,0 +1,59 @@
# Delivery Integrations (`--to`)
After parsing, MinerU Skill can deliver the Markdown straight into your content
tools using each tool's **official ingestion path** — no fragile generic block
converters. Targets are pluggable sinks; select one or more with `--to NAME`
(repeatable). List them live with `python3 scripts/mineru.py --list-sinks`.
```bash
# Parse and fan out to several destinations at once
python3 scripts/mineru.py paper.pdf --to obsidian --to notion --to slack
```
Each sink reads its configuration from **environment variables** so an AI agent
can run it non-interactively. Delivery results appear in `--json` output under
each result's `sinks` array.
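
Because configuration is environment-driven, an agent can check for missing variables before it runs a delivery. The sketch below lists three sinks only, with variable names taken from the support matrix in this document; it assumes a trailing `?` in the matrix marks an optional variable, and `REQUIRED_ENV` and `missing_env` are hypothetical names.

```python
from typing import Mapping

# Required variables per sink; optional ones (marked "?" in the matrix) are omitted.
REQUIRED_ENV = {
    "obsidian": ("OBSIDIAN_VAULT",),
    "notion": ("NOTION_API_KEY", "NOTION_PARENT_PAGE_ID"),
    "slack": ("SLACK_BOT_TOKEN", "SLACK_CHANNEL"),
}


def missing_env(sink: str, env: Mapping[str, str]) -> list:
    """Return the required variables for `sink` that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV[sink] if not env.get(name)]
```

In practice `env` would be `os.environ`; passing a plain mapping keeps the check easy to test.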
## Support matrix
| Target | `--to` | Native path | Auth / config (env) | Markdown fidelity | Images |
|--------|--------|-------------|---------------------|-------------------|--------|
| **Obsidian** | `obsidian` (`ob`) | filesystem write + YAML frontmatter | `OBSIDIAN_VAULT`, `OBSIDIAN_SUBDIR?` | full | ✅ copied to `<note>.assets/` |
| **Logseq** | `logseq` | filesystem write, outline + `key:: value` | `LOGSEQ_GRAPH` | full (outline transform) | ✅ copied to `assets/` |
| **SiYuan** | `siyuan` | kernel `createDocWithMd` | `SIYUAN_TOKEN`, `SIYUAN_API_URL?`, `SIYUAN_NOTEBOOK?` | full (GFM) | ✅ `asset/upload` |
| **Notion** | `notion` | `POST /v1/pages` (blocks) | `NOTION_API_KEY`, `NOTION_PARENT_PAGE_ID`, `NOTION_VERSION?` | structure (headings/lists/code/quote) | ⚠️ text only¹ |
| **Linear** | `linear` | GraphQL `issueCreate` | `LINEAR_API_KEY`, `LINEAR_TEAM_ID` | full (Markdown-native) | ✅ base64-inlined |
| **Yuque 语雀** | `yuque` (`语雀`) | open API create doc | `YUQUE_TOKEN`, `YUQUE_NAMESPACE` | full (Markdown-native) | ⚠️ host publicly² |
| **Coda** | `coda` | page canvas `format:markdown` | `CODA_API_TOKEN`, `CODA_DOC_ID?` | full (Markdown-native) | ⚠️ public URL² |
| **Slack** | `slack` | external-upload `.md` file | `SLACK_BOT_TOKEN`, `SLACK_CHANNEL` | full (raw file) | ⚠️ not embedded |
| **Lark 飞书** | `feishu` (`lark`, `飞书`) | Drive `import_tasks` → Docx | `FEISHU_APP_ID`, `FEISHU_APP_SECRET`, `FEISHU_FOLDER_TOKEN?` | full (server-converted) | ⚠️ public URL² |
| **Confluence** | `confluence` | `POST /wiki/api/v2/pages` (storage) | `CONFLUENCE_BASE_URL`, `CONFLUENCE_EMAIL`, `CONFLUENCE_API_TOKEN`, `CONFLUENCE_SPACE_ID` | MD→HTML | ⚠️ not attached |
| **OneNote** | `onenote` | Graph `sections/{id}/pages` | `ONENOTE_TOKEN`³, `ONENOTE_SECTION_ID` | MD→HTML | ⚠️ remote only |
| **TickTick 滴答** | `ticktick` (`dida`, `滴答清单`) | `POST /open/v1/task` | `TICKTICK_TOKEN`, `TICKTICK_PROJECT_ID?` | task note | ❌ unsupported |
| **DingTalk 钉钉** | `dingtalk` (`钉钉`) | robot markdown webhook | `DINGTALK_WEBHOOK`, `DINGTALK_SECRET?` | markdown message | ⚠️ public URL only |
| **Airtable** | `airtable` | `POST /v0/{base}/{table}` record | `AIRTABLE_API_KEY`, `AIRTABLE_BASE_ID`, `AIRTABLE_TABLE`, `AIRTABLE_TITLE_FIELD?`, `AIRTABLE_BODY_FIELD?` | record field⁴ | ❌ not uploaded |
| **WeCom 企业微信** | `wecom` (`企业微信`) | app `message/send` markdown | `WECOM_CORPID`, `WECOM_CORPSECRET`, `WECOM_AGENTID`, `WECOM_TOUSER?` | message (subset, ≤2 KB)⁵ | ❌ unsupported |
| **Roam Research** ⁶ | `roam` | `batch-actions` block tree | `ROAM_API_TOKEN`, `ROAM_GRAPH_NAME` | full (Markdown→outline) | ⚠️ public URL |
| **WPS 金山文档** ⁶ | `wps` (`kdocs`, `金山`) | Markdown→DOCX → kdocs upload | `WPS_APP_ID`, `WPS_APP_SECRET`, `WPS_PARENT_PATH?` | DOCX (via html-for-docx) | embedded in DOCX |

Notes:

1. **Notion** images need a separate `file_uploads` upload-then-reference dance; v1 delivers text + structure and notes the count of un-embedded local images. (Roadmap: image upload.)
2. Hosted services that ingest Markdown by value but have no first-class CLI asset upload — local images must be hosted at a public URL to render. The Markdown is delivered intact; image links that are already URLs work.
3. **OneNote** `ONENOTE_TOKEN` is a Microsoft Graph access token (delegated, scope `Notes.Create`). Obtain it via the device-code OAuth flow; the sink itself stays non-interactive.
4. **Airtable** is a database, not a document store — the doc is stored as one record (title + body fields). A good "save this doc as a row" target, not a document publisher.
5. **WeCom** markdown messages are a limited subset (≤2048 bytes, no images/tables, not rendered in the workbench). Best as a notification/summary; for a full document deliver via Lark/Notion and send the link.
6. **Optional-dependency sinks** — these two rely on a third-party library that the sink lazy-imports only when used, so the core and the other 15 sinks stay zero-dependency. If the library is absent, the sink returns a clear `pip install …` hint. They are implemented to the official specs but, being credential/desktop-gated, are best-effort until validated against live accounts.
## Optional-dependency sinks (`[roam]`, `[wps]`)
```bash
pip install "mineru-skill[wps]" # html-for-docx (Markdown → DOCX)
pip install "mineru-skill[roam]" # official roam-client SDK (git, needs Python ≥3.11)
# roam-client is git-only; equivalently:
pip install "roam-client @ git+https://github.com/Roam-Research/backend-sdks.git#subdirectory=python"
```
- **Roam** — no library ingests Markdown into Roam, but the official `roam-client` SDK handles the genuinely error-prone transport (307/308 peer-host redirect, dual `Authorization`/`x-authorization` Bearer headers, `/write`). We depend on it for transport and build only the Markdown→outline tree, delivering the whole document in one `batch-actions` request. Images must be public URLs.
- **WPS / 金山文档** — Markdown→DOCX uses the maintained pure-pip `html-for-docx` (reusing this project's Markdown→HTML); the kdocs upload signs requests with the documented WPS-2 scheme (plain SHA-1) using only the standard library. Requires an approved kdocs developer app + provisioned appspace.

Adding more targets is a single small module; see `scripts/sinks/base.py`. PRs welcome.
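
As a rough illustration of what such a module might look like, here is a minimal sketch. The `ParsedDoc`, `SinkResult`, `Sink`, and `register` definitions below are simplified stand-ins written only so the block runs on its own; a real module would import them from `scripts/sinks/base.py`. The `EchoSink` target and the `ECHO_DIR` variable are hypothetical and not part of the project.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Simplified stand-ins for the types in scripts/sinks/base.py (assumption:
# the real module exposes the same names with richer behaviour).
REGISTRY: dict = {}


@dataclass
class ParsedDoc:
    title: str
    markdown: str


@dataclass
class SinkResult:
    sink: str
    ok: bool
    url: Optional[str] = None
    detail: Optional[str] = None
    error: Optional[str] = None


class Sink:
    name = "base"
    requires: tuple = ()

    def env(self, key, default=None):
        value = os.environ.get(key, default)
        return value.strip() if isinstance(value, str) else value

    def missing_config(self):
        return [k for k in self.requires if not self.env(k)]


def register(cls):
    """Instantiate the sink and store it under its name."""
    REGISTRY[cls.name] = cls()
    return cls


# Hypothetical target: writes the Markdown into the directory named by ECHO_DIR.
@register
class EchoSink(Sink):
    name = "echo"
    requires = ("ECHO_DIR",)

    def deliver(self, doc: ParsedDoc) -> SinkResult:
        target = Path(self.env("ECHO_DIR")) / f"{doc.title}.md"
        target.write_text(doc.markdown, encoding="utf-8")
        return SinkResult(sink=self.name, ok=True, detail=f"wrote {target.name}")


def demo() -> SinkResult:
    """Configure the sink through the environment, then deliver one document."""
    with tempfile.TemporaryDirectory() as tmp:
        os.environ["ECHO_DIR"] = tmp
        sink = REGISTRY["echo"]
        assert not sink.missing_config()
        return sink.deliver(ParsedDoc(title="hello", markdown="# Hello\n"))
```

The shape to notice is that configuration comes only from environment variables and failures are reported through the result object, which is what lets an agent run a sink without prompts.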

View File

@ -0,0 +1 @@
"""Importable package for MinerU Skill console entry points."""

View File

@ -0,0 +1,88 @@
"""Heading-aware Markdown chunking for RAG pipelines (zero-dependency).
``chunk_markdown`` splits a parsed Markdown document into retrieval-sized chunks
that preserve heading context, matching the RAG-friendliness of LlamaParse /
Unstructured without any dependency.
"""
from __future__ import annotations
import re
_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
def _slug(text: str) -> str:
    text = (text or "doc").strip().lower()
    text = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    return text or "doc"


def _split_by_size(text: str, max_chars: int) -> list:
    """Split text into <= max_chars pieces on paragraph boundaries (hard-split if needed)."""
    if len(text) <= max_chars:
        return [text]
    pieces: list = []
    current = ""
    for para in text.split("\n\n"):
        if len(para) > max_chars:
            # A single oversized paragraph: emit what we have, then hard-split it.
            if current:
                pieces.append(current)
                current = ""
            for i in range(0, len(para), max_chars):
                pieces.append(para[i:i + max_chars])
        elif not current:
            current = para
        elif len(current) + len(para) + 2 <= max_chars:
            current = f"{current}\n\n{para}"
        else:
            pieces.append(current)
            current = para
    if current:
        pieces.append(current)
    return pieces


def chunk_markdown(markdown: str, *, max_chars: int = 2000, source: str = "") -> list:
    """Chunk Markdown by heading, size-splitting long sections.

    Returns ``[{id, index, heading, text, chars, source}, ...]`` where ``heading``
    is the ``H1 > H2 > H3`` breadcrumb for the chunk.
    """
    lines = markdown.replace("\r\n", "\n").split("\n")
    chunks: list = []
    stack: list = []  # (level, text) heading breadcrumb
    buf: list = []
    base = _slug(source)

    def breadcrumb() -> str:
        return " > ".join(t for _, t in stack)

    def flush():
        text = "\n".join(buf).strip()
        buf.clear()
        if not text:
            return
        head = breadcrumb()
        for piece in _split_by_size(text, max_chars):
            idx = len(chunks)
            chunks.append({
                "id": f"{base}-{idx}",
                "index": idx,
                "heading": head,
                "text": piece,
                "chars": len(piece),
                "source": source,
            })

    in_fence = False
    for line in lines:
        stripped = line.strip()
        # Lines inside a fenced code block are never headings (e.g. "# comment").
        if stripped.startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else _HEADING.match(stripped)
        if match:
            flush()  # close the previous section under its own breadcrumb
            level = len(match.group(1))
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, match.group(2)))
        buf.append(line)
    flush()
    return chunks
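
The breadcrumb handling in `chunk_markdown` is easiest to follow in isolation. The sketch below reproduces only the heading stack (pop every entry at the same or a deeper level, then push the new heading) and records the breadcrumb in effect at each heading. `heading_breadcrumbs` is a name introduced for this illustration and is not part of the module.

```python
import re

_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def heading_breadcrumbs(markdown: str) -> list:
    """Return the breadcrumb in effect at each heading line."""
    stack = []   # (level, text) pairs, shallowest first
    crumbs = []
    for line in markdown.split("\n"):
        match = _HEADING.match(line.strip())
        if not match:
            continue
        level = len(match.group(1))
        # Drop headings at the same or a deeper level before pushing the new one.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, match.group(2)))
        crumbs.append(" > ".join(text for _, text in stack))
    return crumbs


sample = "# Guide\n## Install\n### Linux\n## Usage\n"
print(heading_breadcrumbs(sample))
```

Because a sibling heading pops everything at its own level, `## Usage` ends up directly under `# Guide` rather than under `## Install`.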

View File

@ -0,0 +1,59 @@
"""Optional fully-offline parsing backend for born-digital PDFs.
Our single biggest honest gap is being cloud-only. ``--engine local`` parses a
PDF **entirely offline** with the optional, lightweight ``pymupdf4llm`` library
(no GPU, no cloud, no upload caps), ideal for confidential or born-digital PDFs
where MinerU's cloud VLM is overkill. Scanned/complex docs still want the cloud
engine, so ``--engine auto`` only uses local when the PDF has real text.
pip install "mineru-skill[local]" # i.e. pip install pymupdf4llm
"""
from __future__ import annotations
from pathlib import Path
_HINT = (
"--engine local needs pymupdf4llm — pip install 'mineru-skill[local]' "
"(i.e. pip install pymupdf4llm)"
)
class LocalEngineError(Exception):
    """Raised when local parsing is requested but cannot be performed."""


def available() -> bool:
    try:
        import pymupdf4llm  # noqa: F401
        return True
    except ImportError:
        return False


def is_born_digital(path, min_chars: int = 200) -> bool:
    """True if the PDF has extractable text (so local parsing is appropriate)."""
    try:
        import pymupdf
    except ImportError:
        return False
    total = 0
    # Close the document deterministically, including on the early return.
    with pymupdf.open(str(path)) as doc:
        for page in doc:
            total += len(page.get_text().strip())
            if total >= min_chars:
                return True
    return total >= min_chars


def parse_local(path, output_dir=None) -> str:
    """Parse a PDF to Markdown fully offline. Returns the Markdown string."""
    try:
        import pymupdf4llm
    except ImportError as exc:
        raise LocalEngineError(_HINT) from exc
    if output_dir is not None:
        images = Path(output_dir) / "images"
        images.mkdir(parents=True, exist_ok=True)
        return pymupdf4llm.to_markdown(str(path), write_images=True, image_path=str(images))
    return pymupdf4llm.to_markdown(str(path))
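
The module docstring says `--engine auto` only goes local when the PDF has real text. The caller that makes that decision is not shown in this file, so the following is only a sketch of how the rule could be expressed; `choose_engine` and its parameters are hypothetical names, with `local_available` and `born_digital` standing in for the results of `available()` and `is_born_digital()`.

```python
def choose_engine(requested: str, *, local_available: bool, born_digital: bool) -> str:
    """Resolve 'cloud' / 'local' / 'auto' to the engine that should run."""
    if requested == "local":
        if not local_available:
            # An explicit request for local parsing should fail loudly.
            raise RuntimeError("local engine requested but pymupdf4llm is not installed")
        return "local"
    if requested == "auto" and local_available and born_digital:
        # Text is extractable and the optional library is present: stay offline.
        return "local"
    # Explicit 'cloud', or 'auto' on a scanned PDF / missing library.
    return "cloud"
```

Falling back to the cloud engine silently under `auto`, while raising under an explicit `local`, keeps confidential-document workflows from uploading a file the user asked to keep offline.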

File diff suppressed because it is too large

View File

@ -0,0 +1,178 @@
#!/usr/bin/env python3
"""Zero-dependency MCP server (stdio) for MinerU Skill.
Speaks newline-delimited JSON-RPC 2.0 over stdin/stdout using only the standard
library, so an MCP host (Claude, Cursor, Windsurf, ...) can call MinerU. Register:
{"command": "python3", "args": ["scripts/mineru_mcp.py"]}
Tools: ``mineru_parse``, ``mineru_parse_to``, ``mineru_list_sinks``.
"""
from __future__ import annotations
import json
import os
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parent))
import mineru # noqa: E402
PROTOCOL_VERSION = "2024-11-05"
SERVER_INFO = {"name": "mineru", "version": mineru.__version__}
TOOLS = [
{
"name": "mineru_parse",
"description": "Parse a PDF / Office / image file or URL into clean Markdown via MinerU.",
"inputSchema": {
"type": "object",
"properties": {
"input": {"type": "string", "description": "Local file path or http(s) URL"},
"output_dir": {"type": "string", "description": "Where to write output (default ./output)"},
"api": {"type": "string", "enum": ["auto", "agent", "standard"]},
"engine": {"type": "string", "enum": ["cloud", "local", "auto"]},
"ocr": {"type": "boolean"},
"lang": {"type": "string"},
},
"required": ["input"],
},
},
{
"name": "mineru_parse_to",
"description": "Parse a document and deliver the Markdown into content tools (Obsidian, Notion, Feishu, ...).",
"inputSchema": {
"type": "object",
"properties": {
"input": {"type": "string"},
"sinks": {"type": "array", "items": {"type": "string"}, "description": "Sink names, e.g. ['obsidian','notion']"},
"output_dir": {"type": "string"},
},
"required": ["input", "sinks"],
},
},
{
"name": "mineru_list_sinks",
"description": "List available delivery targets and their required environment variables.",
"inputSchema": {"type": "object", "properties": {}},
},
]
class MethodNotFound(Exception):
pass
def _text_result(text: str, is_error: bool = False) -> dict:
return {"content": [{"type": "text", "text": text}], "isError": is_error}
def _tool_parse(args: dict) -> dict:
opts = mineru.ParseOptions(is_ocr=bool(args.get("ocr")), language=args.get("lang", "ch"))
token = os.environ.get("MINERU_TOKEN")
output_dir = Path(args.get("output_dir") or "./output")
res = mineru.process_one(
args["input"], opts, token=token, output_dir=output_dir,
api=args.get("api", "auto"), engine=args.get("engine", "cloud"),
)
if res.state == "done":
return _text_result(res.markdown or "")
return _text_result(f"Parse failed: {res.error}", is_error=True)
def _tool_parse_to(args: dict) -> dict:
opts = mineru.ParseOptions()
token = os.environ.get("MINERU_TOKEN")
output_dir = Path(args.get("output_dir") or "./output")
res = mineru.process_one(args["input"], opts, token=token, output_dir=output_dir)
if res.state != "done":
return _text_result(f"Parse failed: {res.error}", is_error=True)
sinks = mineru._load_sinks()
if sinks is None:
return _text_result("Sinks package unavailable.", is_error=True)
doc = sinks.ParsedDoc(title=res.name, markdown=res.markdown, source=res.source,
modality=res.modality, markdown_path=res.markdown_path)
outcomes = [o.to_status() for o in sinks.deliver_all(doc, args["sinks"])]
any_fail = any(not o["ok"] for o in outcomes)
return _text_result(json.dumps({"name": res.name, "deliveries": outcomes}, ensure_ascii=False, indent=2),
is_error=any_fail)
def _tool_list_sinks(_args: dict) -> dict:
sinks = mineru._load_sinks()
if sinks is None:
return _text_result("Sinks package unavailable.", is_error=True)
listing = [{"name": n, "label": sinks.get_sink(n).label, "requires": list(sinks.get_sink(n).requires)}
for n in sinks.sink_names()]
return _text_result(json.dumps(listing, ensure_ascii=False, indent=2))
_TOOL_HANDLERS = {
"mineru_parse": _tool_parse,
"mineru_parse_to": _tool_parse_to,
"mineru_list_sinks": _tool_list_sinks,
}
def _route(method: str, params: dict):
if method == "initialize":
return {"protocolVersion": PROTOCOL_VERSION, "capabilities": {"tools": {}}, "serverInfo": SERVER_INFO}
if method == "tools/list":
return {"tools": TOOLS}
if method == "tools/call":
name = params.get("name")
handler = _TOOL_HANDLERS.get(name)
if handler is None:
return _text_result(f"Unknown tool: {name}", is_error=True)
try:
return handler(params.get("arguments") or {})
except Exception as exc: # noqa: BLE001 - report as a tool error, never crash the server
return _text_result(f"{type(exc).__name__}: {exc}", is_error=True)
raise MethodNotFound(method)
def dispatch(request: dict):
"""Handle one JSON-RPC request dict; return a response dict, or None for notifications."""
is_notification = "id" not in request
req_id = request.get("id")
try:
result = _route(request.get("method"), request.get("params") or {})
except MethodNotFound as exc:
if is_notification:
return None
return {"jsonrpc": "2.0", "id": req_id, "error": {"code": -32601, "message": f"Method not found: {exc}"}}
except Exception as exc: # noqa: BLE001
if is_notification:
return None
return {"jsonrpc": "2.0", "id": req_id, "error": {"code": -32603, "message": str(exc)}}
if is_notification:
return None
return {"jsonrpc": "2.0", "id": req_id, "result": result}
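def serve(stdin=None, stdout=None) -> None:
    """Read newline-delimited JSON-RPC from stdin, write responses to stdout."""
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        # JSON-RPC 2.0: answer malformed input with a null-id error instead of
        # dropping it, so the client is not left waiting for a reply.
        try:
            request = json.loads(line)
        except ValueError:
            error = {"code": -32700, "message": "Parse error"}
        else:
            error = None if isinstance(request, dict) else {"code": -32600, "message": "Invalid Request"}
        if error is not None:
            response = {"jsonrpc": "2.0", "id": None, "error": error}
        else:
            response = dispatch(request)
        if response is not None:
            stdout.write(json.dumps(response, ensure_ascii=False) + "\n")
            stdout.flush()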
def main() -> int:
serve()
return 0
if __name__ == "__main__":
sys.exit(main())
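
For orientation, this is roughly what a host would write to the server's stdin. The sketch only builds the newline-delimited request lines; it does not start the server. `rpc_line`, the `example-client` name, and the `paper.pdf` input are placeholders invented for this example, and the `initialize` parameters follow the general MCP handshake shape rather than anything this server inspects.

```python
import json
from typing import Optional


def rpc_line(method: str, params: Optional[dict] = None, req_id=None) -> str:
    """Serialize one JSON-RPC 2.0 message as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        message["params"] = params
    if req_id is not None:
        # Messages without an id are notifications and get no response.
        message["id"] = req_id
    return json.dumps(message, ensure_ascii=False) + "\n"


session = [
    rpc_line("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.0.0"},
    }, req_id=1),
    rpc_line("notifications/initialized"),
    rpc_line("tools/list", {}, req_id=2),
    rpc_line("tools/call", {
        "name": "mineru_parse",
        "arguments": {"input": "paper.pdf", "engine": "auto"},
    }, req_id=3),
]

for line in session:
    print(line, end="")
```

Each line maps onto one branch of `_route`; the notification carries no `id`, so `dispatch` returns `None` and nothing is written back for it.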

View File

@ -0,0 +1,75 @@
"""Pluggable delivery sinks for parsed Markdown.
Each submodule registers one or more :class:`Sink` implementations that deliver a
:class:`ParsedDoc` into a content tool using that tool's official ingestion path.
Importing this package populates the registry; a sink module that fails to import
is recorded in :data:`IMPORT_ERRORS` rather than breaking the others.
"""
from __future__ import annotations
import importlib
import sys
from .base import ( # noqa: F401
ParsedDoc,
Sink,
SinkError,
SinkResult,
get_sink,
sink_names,
REGISTRY,
)
# Sink modules to load. Order is cosmetic.
_MODULES = [
"local", # obsidian, logseq (filesystem)
"siyuan",
"notion",
"linear",
"yuque",
"coda",
"ticktick",
"dingtalk",
"airtable",
"wecom",
"slack",
"feishu",
"confluence",
"onenote",
"roam", # optional dependency (roam-client)
"wps", # optional dependency (html-for-docx)
]
IMPORT_ERRORS: dict = {}
for _name in _MODULES:
try:
importlib.import_module(f"{__name__}.{_name}")
except Exception as exc: # noqa: BLE001 - a bad sink shouldn't break the rest
IMPORT_ERRORS[_name] = f"{type(exc).__name__}: {exc}"
print(f"[sinks] failed to load {_name}: {exc}", file=sys.stderr)
def deliver_all(doc: ParsedDoc, names) -> list:
"""Deliver ``doc`` to each named sink, returning a list of :class:`SinkResult`."""
results = []
for name in names:
sink = get_sink(name)
if sink is None:
results.append(SinkResult(sink=name, ok=False, error=f"unknown sink '{name}'"))
continue
missing = sink.missing_config()
if missing:
results.append(SinkResult(
sink=sink.name, ok=False,
error=f"missing config: {', '.join(missing)}",
))
continue
try:
results.append(sink.deliver(doc))
except SinkError as exc:
results.append(SinkResult(sink=sink.name, ok=False, error=str(exc)))
except Exception as exc: # noqa: BLE001 - surface but never crash the run
results.append(SinkResult(sink=sink.name, ok=False, error=f"{type(exc).__name__}: {exc}"))
return results

View File

@ -0,0 +1,72 @@
"""Zero-dependency HTTP helpers shared by all sinks (stdlib urllib only).
``http_request`` is the single seam tests monkeypatch.
"""
from __future__ import annotations
import json
import mimetypes
import urllib.error
import urllib.request
from typing import Optional
USER_AGENT = "MinerU-Skill-sink/1.0"
def http_request(method, url, *, headers=None, data=None, timeout=60):
"""Perform one HTTP request. Returns ``(status_code, body_bytes)``."""
req = urllib.request.Request(url, data=data, method=method, headers=headers or {})
req.add_header("User-Agent", USER_AGENT)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
return resp.getcode(), resp.read()
except urllib.error.HTTPError as exc:
body = exc.read() if hasattr(exc, "read") else b""
return exc.code, body
def request_json(method, url, *, headers=None, payload=None, timeout=60):
"""JSON request helper. Returns ``(status_code, parsed_json_or_empty_dict)``."""
hdrs = dict(headers or {})
body = None
if payload is not None:
hdrs.setdefault("Content-Type", "application/json")
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
status, raw = http_request(method, url, headers=hdrs, data=body, timeout=timeout)
parsed: dict = {}
if raw:
try:
parsed = json.loads(raw.decode("utf-8"))
except (ValueError, UnicodeDecodeError):
parsed = {}
return status, parsed
def encode_multipart(fields=None, files=None):
"""Build a ``multipart/form-data`` body with stdlib only.
``fields``: dict of str -> str. ``files``: list of (field_name, filename, bytes).
Returns ``(content_type, body_bytes)``.
"""
boundary = "----MinerUSinkBoundary7MA4YWxkTrZu0gW"
crlf = b"\r\n"
parts = []
for name, value in (fields or {}).items():
parts.append(b"--" + boundary.encode())
parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
parts.append(b"")
parts.append(str(value).encode("utf-8"))
for field_name, filename, content in files or []:
ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
parts.append(b"--" + boundary.encode())
parts.append(
f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"'.encode()
)
parts.append(f"Content-Type: {ctype}".encode())
parts.append(b"")
parts.append(content)
parts.append(b"--" + boundary.encode() + b"--")
parts.append(b"")
body = crlf.join(parts)
return f"multipart/form-data; boundary={boundary}", body
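
One design note on the fixed boundary string: RFC 2046 requires that the boundary never occurs inside any encapsulated part. A constant is fine for Markdown and typical images, but a per-request boundary removes the edge case. The helper below is a sketch of that alternative, not code from this module; `make_boundary` is a hypothetical name.

```python
import uuid


def make_boundary(payloads) -> str:
    """Return a boundary string that does not occur in any of the byte payloads."""
    while True:
        boundary = "----Boundary" + uuid.uuid4().hex
        # Regenerate in the (very unlikely) case of a collision with the content.
        if not any(boundary.encode("ascii") in payload for payload in payloads):
            return boundary


print(make_boundary([b"# Title\n\nbody text", b"\x89PNG..."]))
```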

View File

@ -0,0 +1,244 @@
"""Small, dependency-free Markdown utilities used by sinks.
These are intentionally pragmatic, not a full CommonMark implementation: they
cover the constructs MinerU emits (headings, emphasis, code, lists, tables,
blockquotes, links, images) well enough to deliver faithful content to tools
that require HTML (Confluence, OneNote) or an outline (Logseq).
"""
from __future__ import annotations
import html
import re
from pathlib import Path
from typing import Optional
_IMAGE_RE = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<ref>[^)\s]+)(?:\s+\"[^\"]*\")?\)")
_ILLEGAL_FS = re.compile(r'[\\/:*?"<>|#^\[\]]+')
def slugify(text: str, default: str = "document") -> str:
"""Filesystem/URL-safe slug."""
text = text.strip().lower()
text = re.sub(r"[\s_]+", "-", text)
text = re.sub(r"[^a-z0-9\-]+", "", text)
text = re.sub(r"-{2,}", "-", text).strip("-")
return text or default
def safe_filename(title: str, default: str = "document") -> str:
"""Clean a title into a safe note filename (keeps unicode, drops illegal chars)."""
name = _ILLEGAL_FS.sub(" ", title).strip()
name = re.sub(r"\s{2,}", " ", name)
return name[:120] or default
def is_remote(ref: str) -> bool:
return ref.startswith("http://") or ref.startswith("https://") or ref.startswith("data:")
def find_local_images(markdown: str, base_dir) -> list:
"""Return ``[(alt, ref, Path)]`` for image refs that point at existing local files."""
base = Path(base_dir) if base_dir else None
found = []
seen = set()
for match in _IMAGE_RE.finditer(markdown):
ref = match.group("ref")
if is_remote(ref) or ref in seen:
continue
path = Path(ref)
if not path.is_absolute() and base is not None:
path = base / ref
if path.is_file():
found.append((match.group("alt"), ref, path))
seen.add(ref)
return found
def rewrite_images(markdown: str, mapping: dict) -> str:
"""Rewrite local image refs using ``{old_ref: new_ref}``."""
def repl(match):
ref = match.group("ref")
if ref in mapping:
return f"![{match.group('alt')}]({mapping[ref]})"
return match.group(0)
return _IMAGE_RE.sub(repl, markdown)
def yaml_frontmatter(props: dict) -> str:
"""Render a YAML frontmatter block. List values become ``- item`` lines."""
lines = ["---"]
for key, value in props.items():
if value is None or value == "" or value == []:
continue
if isinstance(value, (list, tuple)):
lines.append(f"{key}:")
for item in value:
lines.append(f" - {item}")
else:
lines.append(f"{key}: {value}")
lines.append("---")
return "\n".join(lines)
# --------------------------------------------------------------------------- #
# Inline + block Markdown -> HTML (pragmatic, XHTML-safe)
# --------------------------------------------------------------------------- #
def _inline(text: str) -> str:
    """Convert inline Markdown to HTML on already-escaped text."""
    # Callers pass text that already went through html.escape(), so refs and alt
    # text are inserted as-is; escaping again would turn "&amp;" into "&amp;amp;".
    # images first, then links
    text = _IMAGE_RE.sub(
        lambda m: f'<img src="{m.group("ref")}" alt="{m.group("alt")}" />',
        text,
    )
    text = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)",
                  lambda m: f'<a href="{m.group(2)}">{m.group(1)}</a>', text)
    text = re.sub(r"`([^`]+)`", r"<code>\1</code>", text)
    text = re.sub(r"\*\*([^*]+)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"(?<!\*)\*(?!\*)([^*]+)\*(?!\*)", r"<em>\1</em>", text)
    return text
def md_to_html(markdown: str) -> str:
"""Convert a Markdown document to a pragmatic, XHTML-safe HTML fragment."""
out = []
lines = markdown.replace("\r\n", "\n").split("\n")
i = 0
n = len(lines)
in_code = False
code_buf: list = []
list_stack: list = [] # 'ul' / 'ol'
def close_lists():
while list_stack:
out.append(f"</{list_stack.pop()}>")
while i < n:
line = lines[i]
fence = line.strip().startswith("```")
if fence and not in_code:
close_lists()
in_code = True
code_buf = []
i += 1
continue
if fence and in_code:
out.append("<pre><code>" + html.escape("\n".join(code_buf)) + "</code></pre>")
in_code = False
i += 1
continue
if in_code:
code_buf.append(line)
i += 1
continue
stripped = line.strip()
if not stripped:
close_lists()
i += 1
continue
# table block
if "|" in stripped and i + 1 < n and re.match(r"^\s*\|?[\s:|-]+\|?\s*$", lines[i + 1]):
close_lists()
header = [c.strip() for c in stripped.strip("|").split("|")]
rows = []
i += 2
while i < n and "|" in lines[i] and lines[i].strip():
rows.append([c.strip() for c in lines[i].strip().strip("|").split("|")])
i += 1
out.append("<table><thead><tr>"
+ "".join(f"<th>{_inline(html.escape(c))}</th>" for c in header)
+ "</tr></thead><tbody>")
for row in rows:
out.append("<tr>" + "".join(f"<td>{_inline(html.escape(c))}</td>" for c in row) + "</tr>")
out.append("</tbody></table>")
continue
heading = re.match(r"^(#{1,6})\s+(.*)$", stripped)
if heading:
close_lists()
level = len(heading.group(1))
out.append(f"<h{level}>{_inline(html.escape(heading.group(2)))}</h{level}>")
i += 1
continue
if stripped.startswith(">"):
close_lists()
out.append(f"<blockquote>{_inline(html.escape(stripped[1:].strip()))}</blockquote>")
i += 1
continue
if re.match(r"^([-*+])\s+", stripped):
if not list_stack or list_stack[-1] != "ul":
close_lists()
list_stack.append("ul")
out.append("<ul>")
item = re.sub(r"^([-*+])\s+", "", stripped)
out.append(f"<li>{_inline(html.escape(item))}</li>")
i += 1
continue
if re.match(r"^\d+\.\s+", stripped):
if not list_stack or list_stack[-1] != "ol":
close_lists()
list_stack.append("ol")
out.append("<ol>")
item = re.sub(r"^\d+\.\s+", "", stripped)
out.append(f"<li>{_inline(html.escape(item))}</li>")
i += 1
continue
if re.match(r"^([-*_])\1{2,}$", stripped):
close_lists()
out.append("<hr />")
i += 1
continue
close_lists()
out.append(f"<p>{_inline(html.escape(stripped))}</p>")
i += 1
if in_code:
out.append("<pre><code>" + html.escape("\n".join(code_buf)) + "</code></pre>")
close_lists()
return "\n".join(out)
# --------------------------------------------------------------------------- #
# Markdown -> Logseq outline
# --------------------------------------------------------------------------- #
def md_to_logseq(markdown: str, properties: Optional[dict] = None) -> str:
"""Convert flat Markdown into a Logseq outline.
Every line becomes a ``- `` block. Headings are top-level blocks; the content
that follows a heading nests one level beneath it. Page properties
(``key:: value``) go on the first block, as Logseq requires.
"""
out = []
if properties:
prop_lines = []
for key, value in properties.items():
if not value:
continue
if isinstance(value, (list, tuple)):
value = ", ".join(str(v) for v in value)
prop_lines.append(f"{key}:: {value}")
if prop_lines:
out.append("- " + prop_lines[0])
out.extend(f" {p}" for p in prop_lines[1:])
have_heading = False
for raw in markdown.replace("\r\n", "\n").split("\n"):
line = raw.strip()
if not line:
continue
if re.match(r"^#{1,6}\s+", line):
out.append(f"- {line}")
have_heading = True
elif have_heading:
out.append(f"\t- {line}")
else:
out.append(f"- {line}")
return "\n".join(out)

View File

@ -0,0 +1,50 @@
"""Airtable sink — store parsed Markdown as a record in a base/table.
Airtable is a database, not a document tool: the native ingestion path is a
record whose fields hold the title and the Markdown body. Field names are
configurable to match an existing table schema.
Docs: https://airtable.com/developers/web/api/create-records
(POST /v0/{baseId}/{tableIdOrName}).
"""
from __future__ import annotations
import urllib.parse
from . import _http
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
API_BASE = "https://api.airtable.com/v0"
@register
class AirtableSink(Sink):
    name = "airtable"
    requires = ("AIRTABLE_API_KEY", "AIRTABLE_BASE_ID", "AIRTABLE_TABLE")
    label = "Airtable record (database)"

    def deliver(self, doc: ParsedDoc) -> SinkResult:
        api_key = self.env("AIRTABLE_API_KEY")
        base = self.env("AIRTABLE_BASE_ID")
        table = self.env("AIRTABLE_TABLE")
        title_field = self.env("AIRTABLE_TITLE_FIELD", "Title")
        body_field = self.env("AIRTABLE_BODY_FIELD", "Notes")
        # safe="" so a "/" in the table name is percent-encoded, not read as a path separator.
        url = f"{API_BASE}/{base}/{urllib.parse.quote(table, safe='')}"
        headers = {"Authorization": f"Bearer {api_key}"}
        payload = {"fields": {title_field: doc.title, body_field: doc.markdown}}
        status, parsed = _http.request_json("POST", url, headers=headers, payload=payload)
        if parsed.get("error") or status >= 400:
            raise SinkError(str(parsed.get("error") or f"HTTP {status}"))
        if not parsed.get("id"):
            raise SinkError(f"Airtable returned no record id: {parsed}")
        return SinkResult(
            sink=self.name,
            ok=True,
            url=None,
            detail="stored as a database record (Airtable is a DB, not a doc)",
        )

View File

@ -0,0 +1,101 @@
"""Core types and the sink registry for delivering parsed Markdown to content tools.
A *sink* takes a :class:`ParsedDoc` (Markdown + local images + metadata) and
delivers it into one destination (Obsidian, Notion, Slack, Feishu, ...) using
that tool's OFFICIAL native ingestion path. Sinks read their configuration from
environment variables so an AI agent can run them without interactive prompts.
"""
from __future__ import annotations
import os
from dataclasses import dataclass, field
from typing import Optional
@dataclass
class ParsedDoc:
"""A parsed document ready for delivery."""
title: str
markdown: str
images: tuple = () # absolute paths to local image files
source: str = ""
modality: str = "unknown"
markdown_path: Optional[str] = None
@dataclass
class SinkResult:
"""Outcome of delivering a :class:`ParsedDoc` to one sink."""
sink: str
ok: bool
url: Optional[str] = None
detail: Optional[str] = None
error: Optional[str] = None
def to_status(self) -> dict:
return {
"sink": self.sink,
"ok": self.ok,
"url": self.url,
"detail": self.detail,
"error": self.error,
}
class SinkError(Exception):
"""Raised by a sink when delivery fails for a known reason."""
class Sink:
"""Base class for a delivery target.
Subclasses set ``name``/``aliases``/``requires`` and implement
:meth:`deliver`. ``requires`` lists the environment variables that must be
present for the sink to be usable.
"""
name: str = "base"
aliases: tuple = ()
requires: tuple = () # required env vars
label: str = "" # human description
local: bool = False # filesystem-only, no network/auth
def env(self, key: str, default: Optional[str] = None) -> Optional[str]:
value = os.environ.get(key, default)
return value.strip() if isinstance(value, str) else value
def missing_config(self) -> list:
return [k for k in self.requires if not self.env(k)]
def is_configured(self) -> bool:
return not self.missing_config()
def deliver(self, doc: ParsedDoc) -> SinkResult: # pragma: no cover - abstract
raise NotImplementedError
# --------------------------------------------------------------------------- #
# Registry
# --------------------------------------------------------------------------- #
REGISTRY: dict = {}
def register(cls):
"""Class decorator that instantiates a sink and registers it by name+aliases."""
inst = cls()
REGISTRY[inst.name] = inst
for alias in inst.aliases:
REGISTRY[alias] = inst
return cls
def get_sink(name: str) -> Optional[Sink]:
return REGISTRY.get(name.lower())
def sink_names() -> list:
"""Canonical sink names (no aliases), sorted."""
return sorted({s.name for s in REGISTRY.values()})

View File

@ -0,0 +1,72 @@
"""Coda sink: deliver Markdown as a page, into an existing doc or a new one.
Coda's API (``https://coda.io/apis/v1``) authenticates with a Bearer token.
Markdown is delivered as canvas page content. If ``CODA_DOC_ID`` is set, a new
page is added to that doc; otherwise a new doc is created with the content as its
initial page.
Coda canvas content embeds images by URL only, so local image refs are left
untouched; host images at a public URL for them to render.
"""
from __future__ import annotations
from pathlib import Path
from . import _http, _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
API = "https://coda.io/apis/v1"
def _canvas(markdown: str) -> dict:
return {"type": "canvas", "canvasContent": {"format": "markdown", "content": markdown}}
@register
class CodaSink(Sink):
name = "coda"
requires = ("CODA_API_TOKEN",)
label = "Coda page (REST API)"
def deliver(self, doc: ParsedDoc) -> SinkResult:
token = self.env("CODA_API_TOKEN")
doc_id = self.env("CODA_DOC_ID")
headers = {
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
}
base_dir = Path(doc.markdown_path).parent if doc.markdown_path else None
n_images = len(_md.find_local_images(doc.markdown, base_dir))
if doc_id:
status, parsed = _http.request_json(
"POST", f"{API}/docs/{doc_id}/pages", headers=headers, payload={
"name": doc.title,
"pageContent": _canvas(doc.markdown),
},
)
else:
status, parsed = _http.request_json(
"POST", f"{API}/docs", headers=headers, payload={
"title": doc.title,
"initialPage": {
"name": doc.title,
"pageContent": _canvas(doc.markdown),
},
},
)
if status >= 400:
raise SinkError(parsed.get("message") or f"HTTP {status}")
if n_images:
detail = f"text only ({n_images} local image(s); Coda embeds images by URL)"
else:
detail = "text only"
return SinkResult(
sink=self.name, ok=True,
url=parsed.get("browserLink"),
detail=detail,
)

View File

@ -0,0 +1,66 @@
"""Confluence sink: create a page from the parsed Markdown via the Cloud REST API.
Confluence Cloud ingests content as *storage-format* HTML. Delivery converts the
Markdown to HTML and creates a page with the v2 REST API
(``POST /wiki/api/v2/pages``) using Basic auth (email + API token).
Local images are not attached: Confluence storage HTML references attachments by
filename, which would require a separate upload step.
"""
from __future__ import annotations
import base64
from . import _http, _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
@register
class ConfluenceSink(Sink):
    name = "confluence"
    requires = (
        "CONFLUENCE_BASE_URL",
        "CONFLUENCE_EMAIL",
        "CONFLUENCE_API_TOKEN",
        "CONFLUENCE_SPACE_ID",
    )
    label = "Confluence Cloud page (storage HTML)"

    def deliver(self, doc: ParsedDoc) -> SinkResult:
        base = self.env("CONFLUENCE_BASE_URL").rstrip("/")
        email = self.env("CONFLUENCE_EMAIL")
        token = self.env("CONFLUENCE_API_TOKEN")
        space = self.env("CONFLUENCE_SPACE_ID")
        auth = base64.b64encode(f"{email}:{token}".encode("utf-8")).decode("ascii")
        headers = {
            "Authorization": f"Basic {auth}",
            "Content-Type": "application/json",
        }
        html = _md.md_to_html(doc.markdown)
        status, parsed = _http.request_json(
            "POST",
            f"{base}/wiki/api/v2/pages",
            headers=headers,
            payload={
                "spaceId": space,
                "status": "current",
                "title": doc.title,
                "body": {"representation": "storage", "value": html},
            },
        )
        if status >= 400:
            raise SinkError(
                parsed.get("title")
                or parsed.get("message")
                or f"Confluence HTTP {status}"
            )
        links = parsed.get("_links") or {}
        webui = links.get("webui")
        # ``webui`` is relative to the wiki root (``_links.base``), not the bare host.
        url = (links.get("base") or f"{base}/wiki") + webui if webui else None
        return SinkResult(
            sink=self.name, ok=True, url=url,
            detail="converted Markdown->storage HTML (local images not attached)",
        )

View File

@ -0,0 +1,65 @@
"""DingTalk (钉钉) sink — push parsed Markdown as a robot markdown message.
A DingTalk custom robot accepts a ``markdown`` message type. The official native
ingestion path is therefore a webhook POST carrying the document title and body.
When a signing secret is configured the request is HMAC-SHA256 signed per
DingTalk's spec. DingTalk's markdown renderer only fetches images over public
URLs, so local images won't render.
Docs: https://open.dingtalk.com/document/robots/custom-robot-access.
"""
from __future__ import annotations
import base64
import hashlib
import hmac
import time
import urllib.parse
from . import _http
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
@register
class DingTalkSink(Sink):
name = "dingtalk"
aliases = ("钉钉",)
requires = ("DINGTALK_WEBHOOK",)
label = "DingTalk robot markdown (钉钉)"
def _build_url(self) -> str:
webhook = self.env("DINGTALK_WEBHOOK")
if webhook.startswith("http"):
url = webhook
else:
url = f"https://oapi.dingtalk.com/robot/send?access_token={webhook}"
secret = self.env("DINGTALK_SECRET")
if secret:
timestamp = str(round(time.time() * 1000))
string_to_sign = f"{timestamp}\n{secret}"
hmac_code = hmac.new(
secret.encode(), string_to_sign.encode(), hashlib.sha256
).digest()
sign = urllib.parse.quote_plus(base64.b64encode(hmac_code))
url += f"&timestamp={timestamp}&sign={sign}"
return url
def deliver(self, doc: ParsedDoc) -> SinkResult:
url = self._build_url()
payload = {
"msgtype": "markdown",
"markdown": {"title": doc.title, "text": doc.markdown},
}
status, parsed = _http.request_json("POST", url, payload=payload)
if parsed.get("errcode") not in (0, None):
raise SinkError(parsed.get("errmsg") or f"DingTalk HTTP {status}: {parsed}")
return SinkResult(
sink=self.name,
ok=True,
url=None,
detail="robot markdown message (local images won't render; host publicly)",
)
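
The signing step in `_build_url` can be sketched on its own. This assumes the same scheme as the code above (HMAC-SHA256 over `"<timestamp>\n<secret>"`, keyed with the secret, then base64 and URL-encoding); `dingtalk_sign` is a hypothetical name used only for illustration.

```python
import base64
import hashlib
import hmac
import urllib.parse


def dingtalk_sign(timestamp_ms: str, secret: str) -> str:
    """Return the URL-encoded signature for a signed robot webhook call."""
    # The string to sign is "<timestamp>\n<secret>", keyed with the secret itself.
    string_to_sign = f"{timestamp_ms}\n{secret}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    # base64 output may contain "+", "/" and "=", so it must be URL-encoded.
    return urllib.parse.quote_plus(base64.b64encode(digest))
```

The result is appended to the webhook URL as `&timestamp=<timestamp_ms>&sign=<signature>`; the timestamp passed here must be the same one sent in the query string.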

@ -0,0 +1,124 @@
"""Feishu / Lark sink: import the parsed Markdown as a Docx document.
Feishu (飞书) / Lark ingests Markdown through its Drive import pipeline. Delivery
follows that official path:
1. ``tenant_access_token/internal``: exchange the app id/secret for a tenant
access token.
2. ``drive/v1/medias/upload_all``: upload the ``.md`` bytes as an import medium
and obtain a ``file_token``.
3. ``drive/v1/import_tasks``: kick off an import task converting the medium to a
Docx, returning a ``ticket``.
4. Poll ``drive/v1/import_tasks/{ticket}`` until the job finishes, surfacing the
resulting document URL.
Local images are not uploaded; they would need public URLs to render in Docx.
"""
from __future__ import annotations
import json
import time
from . import _http, _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
@register
class FeishuSink(Sink):
name = "feishu"
aliases = ("lark", "飞书")
requires = ("FEISHU_APP_ID", "FEISHU_APP_SECRET")
label = "Feishu / Lark Docx (Drive import)"
def deliver(self, doc: ParsedDoc) -> SinkResult:
app_id = self.env("FEISHU_APP_ID")
app_secret = self.env("FEISHU_APP_SECRET")
folder_token = self.env("FEISHU_FOLDER_TOKEN")
# Step 1: tenant access token.
status, parsed = _http.request_json(
"POST",
"https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal",
payload={"app_id": app_id, "app_secret": app_secret},
)
token = parsed.get("tenant_access_token")
if parsed.get("code") not in (0, None) or not token:
raise SinkError(parsed.get("msg") or f"Feishu auth failed (HTTP {status})")
headers = {"Authorization": f"Bearer {token}"}
# Step 2: upload the Markdown bytes as an import medium.
content = doc.markdown.encode("utf-8")
fname = _md.safe_filename(doc.title) + ".md"
ctype, body = _http.encode_multipart(
fields={
"file_name": fname,
"parent_type": "ccm_import_open",
"size": str(len(content)),
"extra": json.dumps({"obj_type": "docx", "file_extension": "md"}),
},
files=[("file", fname, content)],
)
up_status, raw = _http.http_request(
"POST",
"https://open.feishu.cn/open-apis/drive/v1/medias/upload_all",
headers={**headers, "Content-Type": ctype},
data=body,
)
parsed = _parse_json(raw)
if parsed.get("code") not in (0, None):
raise SinkError(parsed.get("msg") or f"Feishu media upload failed (HTTP {up_status})")
file_token = (parsed.get("data") or {}).get("file_token")
if not file_token:
raise SinkError("Feishu did not return a file_token")
# Step 3: create the import task.
status, parsed = _http.request_json(
"POST",
"https://open.feishu.cn/open-apis/drive/v1/import_tasks",
headers=headers,
payload={
"file_extension": "md",
"file_token": file_token,
"type": "docx",
"file_name": doc.title,
"point": {"mount_type": 1, "mount_key": folder_token or ""},
},
)
if parsed.get("code") not in (0, None):
raise SinkError(parsed.get("msg") or f"Feishu import task failed (HTTP {status})")
ticket = (parsed.get("data") or {}).get("ticket")
if not ticket:
raise SinkError("Feishu did not return an import ticket")
# Step 4: poll until the import job completes.
url = None
for _attempt in range(20):
status, parsed = _http.request_json(
"GET",
f"https://open.feishu.cn/open-apis/drive/v1/import_tasks/{ticket}",
headers=headers,
)
res = (parsed.get("data") or {}).get("result") or {}
job_status = res.get("job_status")
if job_status == 0:
url = res.get("url")
break
if job_status in (1, 2):
time.sleep(1)
continue
raise SinkError(res.get("job_error_msg") or "Feishu import failed")
# After the loop: exhausting all polls must not be reported as success.
if url is None:
raise SinkError("Feishu import did not yield a document URL (timed out or missing url)")
return SinkResult(
sink=self.name, ok=True, url=url,
detail="imported to Feishu Docx (local images need public URLs)",
)
def _parse_json(raw):
if not raw:
return {}
try:
return json.loads(raw.decode("utf-8"))
except (ValueError, UnicodeDecodeError):
return {}
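
Step 4's polling can be isolated into a small helper that also treats running out of attempts as an error. This is a sketch under stated assumptions: `poll_until_done` and `fetch_status` are hypothetical names, and the status codes mirror the ones checked in the loop above (0 done, 1 or 2 still running).

```python
import time


def poll_until_done(fetch_status, attempts=20, delay=1.0, sleep=time.sleep):
    """Poll ``fetch_status()`` until it reports success; raise otherwise.

    ``fetch_status`` is a hypothetical callable returning a dict shaped like
    the ``result`` object used above.
    """
    for _ in range(attempts):
        result = fetch_status()
        job_status = result.get("job_status")
        if job_status == 0:
            return result.get("url")
        if job_status in (1, 2):
            sleep(delay)
            continue
        raise RuntimeError(result.get("job_error_msg") or "import failed")
    # Running out of attempts is surfaced instead of silently returning.
    raise TimeoutError(f"import still running after {attempts} polls")
```

Passing `sleep` as a parameter is a design choice for testability: a test can supply a no-op and run the loop instantly.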

@ -0,0 +1,75 @@
"""Linear sink: create an issue from Markdown via the GraphQL API.
Linear's API is GraphQL at ``https://api.linear.app/graphql`` and authenticates
with a raw API key in the ``Authorization`` header (no ``Bearer`` prefix). The
issue description is Markdown; Linear renders inline ``data:`` image URIs, so
local images are read and embedded as base64 data URIs before delivery.
"""
from __future__ import annotations
import base64
from pathlib import Path
from . import _http, _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
API = "https://api.linear.app/graphql"
_MUTATION = (
"mutation IssueCreate($input: IssueCreateInput!)"
"{issueCreate(input:$input){success issue{id url identifier}}}"
)
_MIME = {
".png": "image/png",
".jpg": "image/jpeg",
".jpeg": "image/jpeg",
".gif": "image/gif",
".webp": "image/webp",
}
def _data_uri(path: Path) -> str:
mime = _MIME.get(path.suffix.lower(), "image/png")
b64 = base64.b64encode(path.read_bytes()).decode("ascii")
return f"data:{mime};base64,{b64}"
@register
class LinearSink(Sink):
name = "linear"
requires = ("LINEAR_API_KEY", "LINEAR_TEAM_ID")
label = "Linear issue (GraphQL API)"
def deliver(self, doc: ParsedDoc) -> SinkResult:
key = self.env("LINEAR_API_KEY")
team = self.env("LINEAR_TEAM_ID")
headers = {"Authorization": key, "Content-Type": "application/json"}
base_dir = Path(doc.markdown_path).parent if doc.markdown_path else None
images = _md.find_local_images(doc.markdown, base_dir)
mapping = {ref: _data_uri(path) for _alt, ref, path in images}
body = _md.rewrite_images(doc.markdown, mapping)
status, parsed = _http.request_json("POST", API, headers=headers, payload={
"query": _MUTATION,
"variables": {"input": {
"teamId": team,
"title": doc.title,
"description": body,
}},
})
if parsed.get("errors"):
raise SinkError(str(parsed["errors"]))
result = ((parsed.get("data") or {}).get("issueCreate")) or {}
if not result.get("success"):
raise SinkError(f"Linear did not create the issue (HTTP {status})")
issue = result.get("issue") or {}
return SinkResult(
sink=self.name, ok=True,
url=issue.get("url"),
detail=f"{len(mapping)} image(s) inlined",
)
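
The image-inlining idea can be shown without touching the filesystem. The sketch below assumes the same data-URI format as `_data_uri` above; `data_uri` and `issue_create_body` are hypothetical names, and the mutation string is copied from `_MUTATION`.

```python
import base64
import json

MIME_BY_SUFFIX = {".png": "image/png", ".jpg": "image/jpeg", ".gif": "image/gif"}


def data_uri(image_bytes: bytes, suffix: str) -> str:
    """Encode raw image bytes as a ``data:`` URI, defaulting to PNG."""
    mime = MIME_BY_SUFFIX.get(suffix.lower(), "image/png")
    return f"data:{mime};base64,{base64.b64encode(image_bytes).decode('ascii')}"


def issue_create_body(team_id: str, title: str, description: str) -> str:
    """Assemble the JSON body for the issueCreate GraphQL mutation."""
    mutation = (
        "mutation IssueCreate($input: IssueCreateInput!)"
        "{issueCreate(input:$input){success issue{id url identifier}}}"
    )
    variables = {"input": {"teamId": team_id, "title": title, "description": description}}
    return json.dumps({"query": mutation, "variables": variables})
```

Base64 inflates payloads by roughly a third, so documents with many large images may produce very long descriptions.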

@ -0,0 +1,105 @@
"""Local-first sinks: Obsidian and Logseq (filesystem writes, no auth).
Both tools are folders of Markdown files. The native ingestion is a filesystem
write following each tool's conventions:
* Obsidian: a flat note with YAML frontmatter; images in a per-note assets
folder, referenced with relative Markdown embeds.
* Logseq: an outline (every line a ``- `` block) with ``key:: value`` page
properties on the first block; images in ``assets/`` referenced as
``![](../assets/x.png)``.
"""
from __future__ import annotations
from pathlib import Path
from . import _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
def _copy_images(doc: ParsedDoc, dest_dir: Path, ref_prefix: str) -> dict:
"""Copy referenced local images into ``dest_dir``; return ``{old_ref: new_ref}``."""
base = Path(doc.markdown_path).parent if doc.markdown_path else None
mapping = {}
images = _md.find_local_images(doc.markdown, base)
if images:
dest_dir.mkdir(parents=True, exist_ok=True)
for _alt, ref, path in images:
target = dest_dir / path.name
target.write_bytes(path.read_bytes())
mapping[ref] = f"{ref_prefix}{path.name}"
return mapping
@register
class ObsidianSink(Sink):
name = "obsidian"
aliases = ("ob",)
requires = ("OBSIDIAN_VAULT",)
label = "Obsidian vault (local Markdown)"
local = True
def deliver(self, doc: ParsedDoc) -> SinkResult:
vault = Path(self.env("OBSIDIAN_VAULT")).expanduser()
if not vault.is_dir():
raise SinkError(f"Obsidian vault not found: {vault}")
subdir = self.env("OBSIDIAN_SUBDIR", "") or ""
note_dir = vault / subdir if subdir else vault
note_dir.mkdir(parents=True, exist_ok=True)
stem = _md.safe_filename(doc.title)
assets = note_dir / f"{stem}.assets"
mapping = _copy_images(doc, assets, f"{stem}.assets/")
body = _md.rewrite_images(doc.markdown, mapping)
front = _md.yaml_frontmatter({
"title": doc.title,
"source": doc.source,
"modality": doc.modality,
"tags": ["mineru", "parsed"],
})
note_path = note_dir / f"{stem}.md"
note_path.write_text(f"{front}\n\n{body}\n", encoding="utf-8")
return SinkResult(sink=self.name, ok=True, url=str(note_path),
detail=f"{len(mapping)} image(s)")
@register
class LogseqSink(Sink):
name = "logseq"
requires = ("LOGSEQ_GRAPH",)
label = "Logseq graph (local outline)"
local = True
def deliver(self, doc: ParsedDoc) -> SinkResult:
graph = Path(self.env("LOGSEQ_GRAPH")).expanduser()
if not graph.is_dir():
raise SinkError(f"Logseq graph not found: {graph}")
pages = graph / "pages"
assets = graph / "assets"
pages.mkdir(parents=True, exist_ok=True)
stem = _md.safe_filename(doc.title)
# Namespace asset names by page slug to avoid collisions in the shared assets/.
prefix = _md.slugify(doc.title)
mapping = {}
base = Path(doc.markdown_path).parent if doc.markdown_path else None
images = _md.find_local_images(doc.markdown, base)
if images:
assets.mkdir(parents=True, exist_ok=True)
for _alt, ref, path in images:
new_name = f"{prefix}-{path.name}"
(assets / new_name).write_bytes(path.read_bytes())
mapping[ref] = f"../assets/{new_name}"
body = _md.rewrite_images(doc.markdown, mapping)
outline = _md.md_to_logseq(body, properties={
"title": doc.title,
"source": doc.source,
"tags": "mineru, parsed",
})
page_path = pages / f"{stem}.md"
page_path.write_text(outline + "\n", encoding="utf-8")
return SinkResult(sink=self.name, ok=True, url=str(page_path),
detail=f"{len(mapping)} image(s)")
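
`_md.md_to_logseq` is not part of this diff, so the outline format can only be illustrated by assumption. The sketch below is a hypothetical stand-in, following the module docstring: every non-empty line becomes a `- ` block, and the properties sit on the first block as `key:: value` lines.

```python
def to_logseq_outline(markdown: str, properties: dict) -> str:
    """Hypothetical stand-in for ``_md.md_to_logseq`` (not shown in this diff)."""
    prop_lines = [f"{key}:: {value}" for key, value in properties.items()]
    out = []
    if prop_lines:
        # First block carries the page properties; continuation lines are
        # indented by two spaces so they stay inside that block.
        out.append("- " + prop_lines[0])
        out.extend("  " + line for line in prop_lines[1:])
    for raw in markdown.splitlines():
        text = raw.strip()
        if text:
            out.append("- " + text)
    return "\n".join(out)
```

The real helper may well handle nesting, code fences, and lists differently; this version only shows the flat case.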

@ -0,0 +1,130 @@
"""Notion sink: create a page under a parent page from Markdown blocks.
Notion's native ingestion is the block API: each Markdown line becomes a typed
block (heading, quote, code, list item, paragraph). A page is created with up to
100 children inline; any remainder is appended in 100-block chunks via the
``/blocks/{id}/children`` PATCH endpoint.
Notion has no inline image-from-bytes path (images must be uploaded or hosted
separately), so local image refs are intentionally left untouched.
"""
from __future__ import annotations
from pathlib import Path
from . import _http, _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
API = "https://api.notion.com/v1"
MAX_BLOCKS = 100
MAX_TEXT = 2000
def _rich(text: str) -> list:
return [{"type": "text", "text": {"content": text[:MAX_TEXT]}}]
def _block(block_type: str, text: str, **extra) -> dict:
inner = {"rich_text": _rich(text)}
inner.update(extra)
return {"object": "block", "type": block_type, block_type: inner}
def _is_numbered(text: str) -> bool:
head = text.split(".", 1)
return len(head) == 2 and head[0].isdigit() and head[1].startswith(" ")
def _blocks(markdown: str) -> list:
"""Convert flat Markdown lines into a list of Notion block dicts."""
blocks = []
in_code = False
code_buf: list = []
for raw in markdown.replace("\r\n", "\n").split("\n"):
stripped = raw.strip()
if stripped.startswith("```"):
if in_code:
blocks.append(_block("code", "\n".join(code_buf), language="plain text"))
in_code = False
code_buf = []
else:
in_code = True
code_buf = []
continue
if in_code:
code_buf.append(raw)
continue
if not stripped:
continue
if stripped.startswith("# "):
blocks.append(_block("heading_1", stripped[2:].strip()))
elif stripped.startswith("## "):
blocks.append(_block("heading_2", stripped[3:].strip()))
elif stripped.startswith("### "):
blocks.append(_block("heading_3", stripped[4:].strip()))
elif stripped.startswith("> "):
blocks.append(_block("quote", stripped[2:].strip()))
elif stripped.startswith("- ") or stripped.startswith("* "):
blocks.append(_block("bulleted_list_item", stripped[2:].strip()))
elif _is_numbered(stripped):
blocks.append(_block("numbered_list_item", stripped.split(".", 1)[1].strip()))
else:
blocks.append(_block("paragraph", stripped))
if in_code:
blocks.append(_block("code", "\n".join(code_buf), language="plain text"))
return blocks
@register
class NotionSink(Sink):
name = "notion"
requires = ("NOTION_API_KEY", "NOTION_PARENT_PAGE_ID")
label = "Notion page (blocks API)"
def deliver(self, doc: ParsedDoc) -> SinkResult:
key = self.env("NOTION_API_KEY")
parent = self.env("NOTION_PARENT_PAGE_ID")
version = self.env("NOTION_VERSION", "2022-06-28") or "2022-06-28"
headers = {
"Authorization": f"Bearer {key}",
"Notion-Version": version,
"Content-Type": "application/json",
}
# Count local images for the detail note (refs are left as-is).
base_dir = Path(doc.markdown_path).parent if doc.markdown_path else None
n_images = len(_md.find_local_images(doc.markdown, base_dir))
blocks = _blocks(doc.markdown)
status, parsed = _http.request_json("POST", f"{API}/pages", headers=headers, payload={
"parent": {"page_id": parent},
"properties": {"title": {"title": [{"text": {"content": doc.title}}]}},
"children": blocks[:MAX_BLOCKS],
})
if parsed.get("object") == "error":
raise SinkError(parsed.get("message") or f"Notion API error (HTTP {status})")
created_id = parsed.get("id")
if not created_id:
raise SinkError(f"Notion did not return a page id (HTTP {status})")
page_url = parsed.get("url")
for start in range(MAX_BLOCKS, len(blocks), MAX_BLOCKS):
chunk = blocks[start:start + MAX_BLOCKS]
ch_status, ch_parsed = _http.request_json(
"PATCH", f"{API}/blocks/{created_id}/children",
headers=headers, payload={"children": chunk},
)
if ch_parsed.get("object") == "error":
raise SinkError(ch_parsed.get("message")
or f"Notion block append failed (HTTP {ch_status})")
if n_images:
detail = (f"text+structure ({n_images} local images not embedded; "
f"Notion needs file upload)")
else:
detail = "text+structure"
return SinkResult(sink=self.name, ok=True, url=page_url, detail=detail)
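
The 100-block limit leads to a simple split: the first chunk goes inline with the page-create call, and the rest is appended chunk by chunk. A minimal sketch of that split, with `split_children` as a hypothetical helper name:

```python
MAX_BLOCKS = 100


def split_children(blocks, size=MAX_BLOCKS):
    """Return ``(inline, remainder_chunks)`` for a list of block dicts.

    ``inline`` is sent with the page-create request; each item in
    ``remainder_chunks`` would be sent in its own append request.
    """
    inline = blocks[:size]
    chunks = [blocks[i:i + size] for i in range(size, len(blocks), size)]
    return inline, chunks
```

For 250 blocks this yields one inline batch of 100 and two follow-up chunks of 100 and 50, which matches the `range(MAX_BLOCKS, len(blocks), MAX_BLOCKS)` loop in `deliver`.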

@ -0,0 +1,66 @@
"""OneNote sink: create a page from the parsed Markdown via Microsoft Graph.
OneNote pages are created by POSTing an HTML document to a section's ``pages``
endpoint with a pre-obtained Microsoft Graph access token (OAuth). Delivery
converts the Markdown to a full HTML document and creates the page.
Only remote images render: Graph fetches ``<img src>`` URLs, so local image
paths emitted by MinerU would need to be public URLs.
"""
from __future__ import annotations
import html
import json
from . import _http, _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
@register
class OneNoteSink(Sink):
name = "onenote"
aliases = ("msonenote",)
requires = ("ONENOTE_TOKEN", "ONENOTE_SECTION_ID")
label = "OneNote section page (Microsoft Graph)"
def deliver(self, doc: ParsedDoc) -> SinkResult:
token = self.env("ONENOTE_TOKEN")
section = self.env("ONENOTE_SECTION_ID")
body_html = _md.md_to_html(doc.markdown)
page = (
"<!DOCTYPE html><html><head>"
f"<title>{html.escape(doc.title)}</title>"
f"</head><body>{body_html}</body></html>"
)
status, raw = _http.http_request(
"POST",
f"https://graph.microsoft.com/v1.0/me/onenote/sections/{section}/pages",
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "text/html",
},
data=page.encode("utf-8"),
)
if status >= 400:
preview = raw.decode("utf-8", "replace") if raw else ""
raise SinkError(f"OneNote HTTP {status}: {preview[:200]}")
if status != 201:
raise SinkError(f"OneNote unexpected response (HTTP {status})")
parsed = {}
if raw:
try:
parsed = json.loads(raw.decode("utf-8"))
except (ValueError, UnicodeDecodeError):
parsed = {}
links = parsed.get("links") or {}
web = links.get("oneNoteWebUrl") or {}
url = web.get("href")
return SinkResult(
sink=self.name, ok=True, url=url,
detail="converted Markdown->HTML (remote images only; OAuth token required)",
)
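
The HTML wrapper that `deliver` posts can be sketched separately to show why the title is escaped. `onenote_page` is a hypothetical helper name; the markup matches the string built above.

```python
import html


def onenote_page(title: str, body_html: str) -> bytes:
    """Wrap already-rendered body HTML in a full document for the pages endpoint."""
    page = (
        "<!DOCTYPE html><html><head>"
        # The title is plain text, so it must be escaped; the body is
        # already HTML and is inserted unchanged.
        f"<title>{html.escape(title)}</title>"
        f"</head><body>{body_html}</body></html>"
    )
    return page.encode("utf-8")
```

Without the escape, a title such as `Q&A <draft>` would produce malformed markup.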

@ -0,0 +1,106 @@
"""Roam Research sink — optional dependency.
There is no library that ingests a Markdown document into Roam, but the official
``roam-client`` SDK correctly handles the parts that are easy to get wrong: the
307/308 peer-host redirect, the dual ``Authorization`` / ``x-authorization``
Bearer headers, and the ``/write`` plumbing. So we lazily depend on it for
transport and only build the Markdown block-tree ourselves, delivering the whole
document in a single ``batch-actions`` request (one HTTP round-trip).
Install the SDK (git-only, not on PyPI; needs Python 3.11):
pip install "roam-client @ git+https://github.com/Roam-Research/backend-sdks.git#subdirectory=python"
Config: ``ROAM_API_TOKEN`` (graph edit token), ``ROAM_GRAPH_NAME``.
"""
from __future__ import annotations
import itertools
import re
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
_INSTALL_HINT = (
'Roam sink needs the official SDK — pip install '
'"roam-client @ git+https://github.com/Roam-Research/backend-sdks.git#subdirectory=python"'
)
def md_to_roam_tree(markdown: str) -> list:
"""Convert Markdown into a nested Roam block tree.
Headings become parent blocks (``heading`` 1-3); the lines under a heading
nest beneath it. Returns ``[{"string", "heading"?, "children": [...]}, ...]``.
"""
roots: list = []
stack: list = [] # [(heading_level, node)]
for raw in markdown.replace("\r\n", "\n").split("\n"):
line = raw.strip()
if not line:
continue
match = _HEADING.match(line)
if match:
level = len(match.group(1))
node = {"string": match.group(2), "heading": min(level, 3), "children": []}
while stack and stack[-1][0] >= level:
stack.pop()
(stack[-1][1]["children"] if stack else roots).append(node)
stack.append((level, node))
else:
node = {"string": line, "children": []}
(stack[-1][1]["children"] if stack else roots).append(node)
return roots
def tree_to_actions(children: list, parent_uid: str, uidgen) -> list:
"""Flatten a block tree into ``create-block`` actions for one batch request."""
actions: list = []
for order, node in enumerate(children):
uid = uidgen()
block = {"string": node["string"], "uid": uid}
if node.get("heading"):
block["heading"] = node["heading"]
actions.append({
"action": "create-block",
"location": {"parent-uid": parent_uid, "order": order},
"block": block,
})
actions.extend(tree_to_actions(node.get("children", []), uid, uidgen))
return actions
@register
class RoamSink(Sink):
name = "roam"
aliases = ("roamresearch",)
requires = ("ROAM_API_TOKEN", "ROAM_GRAPH_NAME")
label = "Roam Research (batch-actions, optional dep)"
def deliver(self, doc: ParsedDoc) -> SinkResult:
try:
from roam_client.client import create_page, initialize_graph
except ImportError as exc: # pragma: no cover - exercised via SinkError path
raise SinkError(_INSTALL_HINT) from exc
token = self.env("ROAM_API_TOKEN")
graph = self.env("ROAM_GRAPH_NAME")
client = initialize_graph({"token": token, "graph": graph})
create_page(client, {"page": {"title": doc.title}})
counter = itertools.count(1)
actions = tree_to_actions(
md_to_roam_tree(doc.markdown), doc.title, lambda: f"mu{next(counter):07d}"
)
if actions:
client.call(
f"/api/graph/{graph}/write", "POST",
{"action": "batch-actions", "actions": actions},
)
return SinkResult(
sink=self.name, ok=True,
url=f"https://roamresearch.com/#/app/{graph}",
detail=f"{len(actions)} block(s) via batch-actions (images need public URLs)",
)
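
A worked example may make the flattening order easier to follow. The sketch below uses a simplified action shape (`parent`, `order`, `uid`, `string`) rather than the real `create-block` payload, and a hand-written tree for the Markdown `# Intro / some text / ## Detail / more text`; `flatten` is a hypothetical name.

```python
import itertools


def flatten(children, parent_uid, next_uid):
    """Depth-first flattening: each node is emitted before its children."""
    actions = []
    for order, node in enumerate(children):
        uid = next_uid()
        actions.append(
            {"parent": parent_uid, "order": order, "uid": uid, "string": node["string"]}
        )
        actions.extend(flatten(node.get("children", []), uid, next_uid))
    return actions


# Hand-written tree: "Detail" (level 2) nests under "Intro" (level 1).
tree = [{"string": "Intro", "children": [
    {"string": "some text", "children": []},
    {"string": "Detail", "children": [{"string": "more text", "children": []}]},
]}]
counter = itertools.count(1)
actions = flatten(tree, "page", lambda: f"mu{next(counter):07d}")
```

Because a parent is always emitted before its children, every `parent` uid refers to a block created earlier in the same batch.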

@ -0,0 +1,111 @@
"""SiYuan sink: create a new document from Markdown via the local kernel API.
SiYuan (思源笔记) exposes a kernel HTTP API (default ``http://127.0.0.1:6806``)
authenticated with an API token. Delivery follows SiYuan's native ingestion path:
1. Resolve the target notebook (``SIYUAN_NOTEBOOK`` or the first listed notebook).
2. Upload each referenced local image via ``/api/asset/upload`` and rewrite the
Markdown to point at the returned ``assets/...`` paths.
3. Create the document with ``/api/filetree/createDocWithMd``.
Every kernel response wraps its payload as ``{"code": 0, "msg": "", "data": ...}``;
a non-zero ``code`` is an error.
"""
from __future__ import annotations
import json
from pathlib import Path
from . import _http, _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
@register
class SiYuanSink(Sink):
name = "siyuan"
requires = ("SIYUAN_TOKEN",)
label = "SiYuan notebook (local kernel API)"
def _json_post(self, base: str, path: str, headers: dict, payload: dict):
"""POST JSON; return ``data`` after verifying ``code == 0``."""
try:
status, parsed = _http.request_json("POST", f"{base}{path}",
headers=headers, payload=payload)
except Exception as exc: # noqa: BLE001
raise self._unreachable(base, exc) from exc
return self._unwrap(base, status, parsed)
def _upload_post(self, base: str, headers: dict, content_type: str, body: bytes):
"""POST a multipart body; return ``data`` after verifying ``code == 0``."""
hdrs = dict(headers)
hdrs["Content-Type"] = content_type
try:
status, raw = _http.http_request("POST", f"{base}/api/asset/upload",
headers=hdrs, data=body)
except Exception as exc: # noqa: BLE001
raise self._unreachable(base, exc) from exc
parsed: dict = {}
if raw:
try:
parsed = json.loads(raw.decode("utf-8"))
except (ValueError, UnicodeDecodeError):
parsed = {}
return self._unwrap(base, status, parsed)
@staticmethod
def _unreachable(base: str, exc=None) -> SinkError:
suffix = f" ({exc})" if exc else ""
return SinkError(
f"SiYuan kernel not reachable at {base} — start SiYuan and enable "
f"the API token{suffix}"
)
def _unwrap(self, base: str, status: int, parsed: dict):
if status == 0:
raise self._unreachable(base)
if parsed.get("code") != 0:
raise SinkError(parsed.get("msg") or f"SiYuan API error (HTTP {status})")
return parsed.get("data")
def deliver(self, doc: ParsedDoc) -> SinkResult:
base = (self.env("SIYUAN_API_URL", "http://127.0.0.1:6806")
or "http://127.0.0.1:6806").rstrip("/")
token = self.env("SIYUAN_TOKEN")
headers = {"Authorization": f"Token {token}"}
notebook = self.env("SIYUAN_NOTEBOOK")
if not notebook:
data = self._json_post(base, "/api/notebook/lsNotebooks", headers, {})
notebooks = (data or {}).get("notebooks") or []
if not notebooks:
raise SinkError("SiYuan has no notebooks — create one before delivering")
notebook = notebooks[0]["id"]
base_dir = Path(doc.markdown_path).parent if doc.markdown_path else None
images = _md.find_local_images(doc.markdown, base_dir)
mapping = {}
for _alt, ref, path in images:
content_type, body = _http.encode_multipart(
fields={"assetsDirPath": "/assets/"},
files=[("file[]", path.name, path.read_bytes())],
)
data = self._upload_post(base, headers, content_type, body)
succ_map = (data or {}).get("succMap") or {}
if path.name in succ_map:
mapping[ref] = succ_map[path.name]
body_md = _md.rewrite_images(doc.markdown, mapping)
docid = self._json_post(base, "/api/filetree/createDocWithMd", headers, {
"notebook": notebook,
"path": "/" + _md.safe_filename(doc.title),
"markdown": body_md,
})
if not docid:
raise SinkError("SiYuan did not return a document id")
return SinkResult(
sink=self.name, ok=True,
url=f"siyuan://blocks/{docid}",
detail=f"{len(mapping)} image(s)",
)
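
The response envelope described in the module docstring can be handled by one small function. A minimal sketch, with `unwrap` as a hypothetical standalone version of `_unwrap` that raises a plain `RuntimeError` instead of `SinkError`:

```python
def unwrap(envelope: dict):
    """Return ``data`` from a kernel response, raising on a non-zero ``code``."""
    if envelope.get("code") != 0:
        raise RuntimeError(envelope.get("msg") or "SiYuan API error")
    return envelope.get("data")
```

Note that a missing `code` key is treated as an error here, as in the sink, since `None != 0`.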

@ -0,0 +1,95 @@
"""Slack sink: upload the parsed Markdown as a file via the external-upload flow.
Slack retired ``files.upload`` in favour of a three-step external upload.
Delivery follows that official path:
1. ``files.getUploadURLExternal``: reserve an upload URL + file id for the
given filename and byte length.
2. ``POST`` the raw bytes to the returned upload URL.
3. ``files.completeUploadExternal``: finalize the upload, attach it to the
target channel, and post an initial comment.
Images are *not* embedded: Markdown is uploaded as a single ``.md`` file.
"""
from __future__ import annotations
import urllib.parse
from . import _http, _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
@register
class SlackSink(Sink):
name = "slack"
requires = ("SLACK_BOT_TOKEN", "SLACK_CHANNEL")
label = "Slack channel (file upload)"
def deliver(self, doc: ParsedDoc) -> SinkResult:
token = self.env("SLACK_BOT_TOKEN")
channel = self.env("SLACK_CHANNEL")
auth = {"Authorization": f"Bearer {token}"}
content = doc.markdown.encode("utf-8")
filename = _md.slugify(doc.title) + ".md"
# Step 1: reserve an external upload URL + file id. This endpoint wants
# form-encoded data, so use http_request and parse the JSON response.
form = urllib.parse.urlencode({
"filename": filename,
"length": len(content),
}).encode("utf-8")
status, raw = _http.http_request(
"POST",
"https://slack.com/api/files.getUploadURLExternal",
headers={**auth, "Content-Type": "application/x-www-form-urlencoded"},
data=form,
)
parsed = _parse_json(raw)
if not parsed.get("ok"):
raise SinkError(parsed.get("error") or f"Slack getUploadURLExternal failed (HTTP {status})")
upload_url = parsed.get("upload_url")
file_id = parsed.get("file_id")
if not upload_url or not file_id:
raise SinkError("Slack did not return an upload URL / file id")
# Step 2: upload the raw bytes to the reserved URL.
up_status, _up_body = _http.http_request(
"POST", upload_url,
headers={"Content-Type": "application/octet-stream"},
data=content,
)
if up_status != 200:
raise SinkError(f"Slack file upload failed (HTTP {up_status})")
# Step 3: finalize the upload into the channel.
status, parsed = _http.request_json(
"POST",
"https://slack.com/api/files.completeUploadExternal",
headers=auth,
payload={
"files": [{"id": file_id, "title": doc.title}],
"channel_id": channel,
"initial_comment": f"Parsed: {doc.title}",
},
)
if not parsed.get("ok"):
raise SinkError(parsed.get("error") or f"Slack completeUploadExternal failed (HTTP {status})")
files = parsed.get("files") or [{}]
url = files[0].get("permalink")
return SinkResult(
sink=self.name, ok=True, url=url,
detail="uploaded .md file (images not embedded)",
)
def _parse_json(raw):
import json
if not raw:
return {}
try:
return json.loads(raw.decode("utf-8"))
except (ValueError, UnicodeDecodeError):
return {}
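
Step 1 sends form-encoded data rather than JSON, and `length` is the byte length of the encoded content, not the character count. A minimal sketch of that body, with `upload_url_form` as a hypothetical helper name:

```python
import urllib.parse


def upload_url_form(filename: str, content: bytes) -> bytes:
    """Build the form body for the upload-URL reservation request."""
    return urllib.parse.urlencode({
        "filename": filename,
        # Byte length of the payload that will be uploaded in step 2.
        "length": len(content),
    }).encode("utf-8")
```

For non-ASCII Markdown the byte length exceeds `len(text)`, which is why the sink encodes to UTF-8 before measuring.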

@ -0,0 +1,48 @@
"""TickTick (滴答清单) sink — create a task from parsed Markdown.
TickTick's Open API exposes a task object whose ``content`` field holds the body
text. The official native ingestion path for arbitrary Markdown is therefore a
task: the document title becomes the task title and the Markdown becomes the
task content. Tasks have no attachment/inline-image surface, so local images are
not delivered.
Docs: https://developer.ticktick.com/docs (POST /open/v1/task).
"""
from __future__ import annotations
from . import _http
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
API_URL = "https://api.ticktick.com/open/v1/task"
@register
class TickTickSink(Sink):
name = "ticktick"
aliases = ("dida", "滴答清单")
requires = ("TICKTICK_TOKEN",)
label = "TickTick task (滴答清单)"
def deliver(self, doc: ParsedDoc) -> SinkResult:
token = self.env("TICKTICK_TOKEN")
project_id = self.env("TICKTICK_PROJECT_ID")
payload = {"title": doc.title, "content": doc.markdown}
if project_id:
payload["projectId"] = project_id
headers = {"Authorization": f"Bearer {token}"}
status, parsed = _http.request_json("POST", API_URL, headers=headers, payload=payload)
if status >= 400:
raise SinkError(f"TickTick HTTP {status}: {parsed}")
if not parsed.get("id"):
raise SinkError(f"TickTick returned no task id: {parsed}")
return SinkResult(
sink=self.name,
ok=True,
url=None,
detail="task content (no inline images supported by TickTick)",
)

@ -0,0 +1,60 @@
"""WeCom (企业微信 / WeChat Work) sink — send parsed Markdown as an app message.
WeCom apps deliver content via the message-send API. The native ingestion path
is a ``markdown`` message from a self-built app: first an access token is fetched
with the corp id + secret, then the message is posted. WeCom's markdown is a
limited subset with a 2048-byte content cap and no inline images, so the body is
truncated to fit.
Docs: https://developer.work.weixin.qq.com/document/path/90236 (message/send),
https://developer.work.weixin.qq.com/document/path/91039 (gettoken).
"""
from __future__ import annotations
from . import _http
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
TOKEN_URL = "https://qyapi.weixin.qq.com/cgi-bin/gettoken"
SEND_URL = "https://qyapi.weixin.qq.com/cgi-bin/message/send"
@register
class WeComSink(Sink):
name = "wecom"
aliases = ("企业微信", "wechatwork")
requires = ("WECOM_CORPID", "WECOM_CORPSECRET", "WECOM_AGENTID")
label = "WeCom app markdown (企业微信)"
def deliver(self, doc: ParsedDoc) -> SinkResult:
corpid = self.env("WECOM_CORPID")
secret = self.env("WECOM_CORPSECRET")
agentid = self.env("WECOM_AGENTID")
touser = self.env("WECOM_TOUSER", "@all")
# Step 1: fetch an access token.
token_url = f"{TOKEN_URL}?corpid={corpid}&corpsecret={secret}"
status, parsed = _http.request_json("GET", token_url)
if parsed.get("errcode") not in (0, None) or not parsed.get("access_token"):
raise SinkError(parsed.get("errmsg") or f"WeCom token fetch failed: {parsed}")
token = parsed["access_token"]
# Step 2: send the markdown message.
send_url = f"{SEND_URL}?access_token={token}"
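payload = {
"touser": touser,
"msgtype": "markdown",
"agentid": int(agentid),
# The cap is 2048 bytes, so truncate the UTF-8 encoding rather than the characters.
"markdown": {"content": doc.markdown.encode("utf-8")[:2048].decode("utf-8", "ignore")},
}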
status, parsed = _http.request_json("POST", send_url, payload=payload)
if parsed.get("errcode") not in (0, None):
raise SinkError(parsed.get("errmsg") or f"WeCom send failed: {parsed}")
return SinkResult(
sink=self.name,
ok=True,
url=None,
detail="markdown notification (WeCom markdown is a limited subset, "
"2048-byte cap, no inline images)",
)
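
The content cap is stated in bytes, so a character-count slice can overshoot for non-ASCII text such as Chinese. A minimal sketch of byte-safe truncation, with `truncate_utf8` as a hypothetical helper name:

```python
def truncate_utf8(text: str, limit: int = 2048) -> str:
    """Truncate ``text`` so its UTF-8 encoding fits within ``limit`` bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # Cut on the byte boundary, then drop any partial trailing character.
    return encoded[:limit].decode("utf-8", errors="ignore")
```

Decoding with `errors="ignore"` discards an incomplete multi-byte sequence at the cut point, so the result is always valid text within the limit.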

@ -0,0 +1,104 @@
"""WPS / 金山文档 (Kingsoft kdocs) sink — optional dependency.
The native ingestion path is: Markdown → ``.docx`` → upload to the kdocs cloud
appspace. There is no official Python SDK, so:
* Markdown→DOCX uses the maintained, pure-pip ``html-for-docx`` package
(reusing this project's Markdown→HTML), lazily imported so the core stays
zero-dependency. Install with ``pip install mineru-skill[wps]``.
* The kdocs WPS-2 request signing (plain SHA-1) and multipart upload are done
with the standard library: small and fully documented.
Cloud upload requires an approved kdocs developer app (``WPS_APP_ID`` /
``WPS_APP_SECRET``) and a provisioned appspace; it is opt-in and surfaces the
raw kdocs error on failure. Docs: https://developer.kdocs.cn/server/guide/signature.html
"""
from __future__ import annotations
import email.utils
import hashlib
import io
import json
from . import _http, _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
KDOCS_UPLOAD = "https://developer.kdocs.cn/api/v1/openapi/appspace/files/upload"
def _markdown_to_docx_bytes(markdown: str) -> bytes:
    """Convert Markdown → HTML → DOCX bytes via the optional html-for-docx lib."""
    try:
        from html4docx import HtmlToDocx  # pip install html-for-docx
    except ImportError as exc:  # pragma: no cover - exercised via SinkError path
        raise SinkError(
            "WPS sink needs a Markdown→DOCX converter — "
            "pip install 'mineru-skill[wps]' (i.e. pip install html-for-docx)"
        ) from exc
    html = _md.md_to_html(markdown)
    document = HtmlToDocx().parse_html_string(html)
    buf = io.BytesIO()
    document.save(buf)
    return buf.getvalue()


def _wps2_headers(app_id: str, app_secret: str, body: bytes, content_type: str) -> dict:
    """Build kdocs WPS-2 auth headers.

    signature = sha1(app_secret + content_md5 + content_type + date) hex.
    Content-Md5 / Content-Type must match the exact wire body and header sent.
    """
    content_md5 = hashlib.md5(body).hexdigest()
    date = email.utils.formatdate(usegmt=True)  # RFC1123 GMT
    signature = hashlib.sha1(
        (app_secret + content_md5 + content_type + date).encode("utf-8")
    ).hexdigest()
    return {
        "Date": date,
        "Content-Md5": content_md5,
        "Content-Type": content_type,
        "Authorization": f"WPS-2:{app_id}:{signature}",
    }


@register
class WpsSink(Sink):
    name = "wps"
    aliases = ("kdocs", "金山文档", "金山")
    requires = ("WPS_APP_ID", "WPS_APP_SECRET")
    label = "WPS / 金山文档 (Markdown→DOCX upload, optional dep)"

    def deliver(self, doc: ParsedDoc) -> SinkResult:
        app_id = self.env("WPS_APP_ID")
        app_secret = self.env("WPS_APP_SECRET")
        docx_bytes = _markdown_to_docx_bytes(doc.markdown)
        filename = _md.safe_filename(doc.title) + ".docx"
        fields = {}
        parent_path = self.env("WPS_PARENT_PATH")
        parent_token = self.env("WPS_PARENT_TOKEN")
        if parent_path:
            fields["parent_path"] = parent_path
        if parent_token:
            fields["parent_token"] = parent_token
        content_type, body = _http.encode_multipart(
            fields=fields, files=[("file", filename, docx_bytes)]
        )
        headers = _wps2_headers(app_id, app_secret, body, content_type)
        status, raw = _http.http_request("POST", KDOCS_UPLOAD, headers=headers, data=body)
        try:
            parsed = json.loads(raw.decode("utf-8")) if raw else {}
        except (ValueError, UnicodeDecodeError):
            parsed = {}
        if status >= 400 or parsed.get("code") not in (0, None):
            raise SinkError(parsed.get("message") or parsed.get("msg") or f"kdocs HTTP {status}")
        file_token = (parsed.get("data") or {}).get("file_token")
        return SinkResult(
            sink=self.name, ok=True, url=file_token,
            detail="Markdown→DOCX uploaded to 金山文档 (experimental; needs a provisioned appspace)",
        )

View File

@ -0,0 +1,65 @@
"""Yuque (语雀) sink: create a Markdown doc in a repository via the open API.
Yuque's open API (``https://www.yuque.com/api/v2``) authenticates with an
``X-Auth-Token`` header and creates docs under a repository namespace. The body
is posted as raw Markdown.
Yuque's open API has no asset-upload endpoint, so local image refs are left
untouched; host images at a public URL for them to render.
"""
from __future__ import annotations
from pathlib import Path
from . import _http, _md
from .base import ParsedDoc, Sink, SinkError, SinkResult, register
API = "https://www.yuque.com/api/v2"
@register
class YuqueSink(Sink):
    name = "yuque"
    aliases = ("语雀",)
    requires = ("YUQUE_TOKEN", "YUQUE_NAMESPACE")
    label = "Yuque doc (open API)"

    def deliver(self, doc: ParsedDoc) -> SinkResult:
        token = self.env("YUQUE_TOKEN")
        namespace = self.env("YUQUE_NAMESPACE")
        headers = {
            "X-Auth-Token": token,
            "User-Agent": "MinerU-Skill/3.0",
            "Content-Type": "application/json",
        }
        base_dir = Path(doc.markdown_path).parent if doc.markdown_path else None
        n_images = len(_md.find_local_images(doc.markdown, base_dir))
        status, parsed = _http.request_json(
            "POST", f"{API}/repos/{namespace}/docs", headers=headers, payload={
                "title": doc.title,
                "slug": _md.slugify(doc.title),
                "public": 0,
                "format": "markdown",
                "body": doc.markdown,
            },
        )
        data = parsed.get("data")
        if not data:
            if status >= 400 or parsed.get("message"):
                raise SinkError(parsed.get("message") or f"HTTP {status}")
            raise SinkError(f"Yuque returned no doc data (HTTP {status})")
        slug = data.get("slug")
        if n_images:
            detail = f"text only ({n_images} local image(s); host images publicly to embed)"
        else:
            detail = "text only"
        return SinkResult(
            sink=self.name, ok=True,
            url=f"https://www.yuque.com/{namespace}/{slug}",
            detail=detail,
        )

View File

@ -0,0 +1,64 @@
"""Split oversized PDFs into cap-sized parts so they clear the MinerU API limits.
The MinerU cloud caps at 20 pages (free Agent API) / 200 pages (Standard API).
``--split`` slices a larger PDF into parts locally, parses each part, and merges
the Markdown back, so we are no longer bound by those page caps (the same
trick mineru-converter uses). Uses the optional ``pypdf`` library, lazily
imported, so the core stays zero-dependency.
pip install "mineru-skill[split]" # i.e. pip install pypdf
"""
from __future__ import annotations
from pathlib import Path
class SplitError(Exception):
    """Raised when splitting is requested but cannot be performed."""


def _load_pypdf():
    try:
        import pypdf  # noqa: F401
        return pypdf
    except ImportError as exc:
        raise SplitError(
            "--split needs the pypdf library — pip install 'mineru-skill[split]' "
            "(i.e. pip install pypdf)"
        ) from exc


def pdf_page_count(path) -> int:
    """Return the page count of a local PDF (requires pypdf)."""
    pypdf = _load_pypdf()
    return len(pypdf.PdfReader(str(path)).pages)


def split_pdf(path, max_pages: int, out_dir) -> list:
    """Slice ``path`` into ``max_pages``-page parts under ``out_dir``.

    Returns the list of part paths (a single-element list pointing at the original
    file if it already fits).
    """
    if max_pages < 1:
        raise SplitError("max_pages must be >= 1")
    pypdf = _load_pypdf()
    reader = pypdf.PdfReader(str(path))
    total = len(reader.pages)
    if total <= max_pages:
        return [Path(path)]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    parts = []
    for part_index, start in enumerate(range(0, total, max_pages), start=1):
        writer = pypdf.PdfWriter()
        for page in range(start, min(start + max_pages, total)):
            writer.add_page(reader.pages[page])
        part_path = out_dir / f"{stem}__part{part_index:03d}.pdf"
        with open(part_path, "wb") as handle:
            writer.write(handle)
        parts.append(part_path)
    return parts

View File

@ -1,6 +1,7 @@
--- ---
name: nfc-medicine-lookup name: nfc-medicine-lookup
description: 药品检索技能通过NFC芯片ID或药品名称查询药品信息。当用户提交NFC芯片ID、扫描药品标签、提到药品名称想了解用法、或提到"NFC"+"药"相关词汇时使用此技能。以语音助手身份向老人介绍药名、用途和用法用量。 description: 药品检索技能通过NFC芯片ID或药品名称查询药品信息。当用户提交NFC芯片ID、扫描药品标签、提到药品名称想了解用法、或提到"NFC"+"药"相关词汇时使用此技能。以语音助手身份向老人介绍药名、用途和用法用量。
category: Developer Tools
--- ---
# NFC 药品检索 # NFC 药品检索

View File

@ -8,5 +8,6 @@
"command": "python hooks/pre_prompt.py" "command": "python hooks/pre_prompt.py"
} }
] ]
} },
"category": "Developer Tools"
} }

View File

@ -17,5 +17,6 @@
"./pmda_server.py" "./pmda_server.py"
] ]
} }
} },
"category": "Developer Tools"
} }

View File

@ -1,6 +1,8 @@
--- ---
name: ppt-outline name: ppt-outline
description: "PPT outline and HTML presentation generator. PPT大纲、PPT模板、演示文稿、presentation、PowerPoint、幻灯片、slides、HTML演示文稿、HTML slides、浏览器演示、商业路演、pitch deck、BP商业计划书、business plan、工作汇报PPT、培训课件、课件大纲、产品介绍PPT、产品发布、keynote、演讲稿、述职PPT、答辩PPT、竞品分析PPT、毕业答辩、论文答辩、项目复盘、迭代复盘。Generate PPT outlines and standalone HTML presentations (open directly in browser, no dependencies). Use when: (1) creating PPT/presentation outlines, (2) building pitch deck/BP structures, (3) preparing work report slides, (4) designing training course outlines, (5) creating thesis defense PPT outlines, (6) building project review/retrospective PPTs, (7) generating HTML slide decks for browser-based presentations, (8) any PowerPoint/Keynote/Google Slides planning. 适用场景做PPT大纲、写路演BP、汇报PPT结构、培训课件大纲、毕业答辩PPT、项目复盘PPT、述职答辩PPT、生成HTML演示文稿浏览器直接打开支持dark/light/tech/minimal四种风格。" description: "PPT outline and HTML presentation generator. PPT大纲、PPT模板、演示文稿、presentation、PowerPoint、幻灯片、slides、HTML演示文稿、HTML slides、浏览器演示、商业路演、pitch deck、BP商业计划书、business plan、工作汇报PPT、培训课件、课件大纲、产品介绍PPT、产品发布、keynote、演讲稿、述职PPT、答辩PPT、竞品分析PPT、毕业答辩、论文答辩、项目复盘、迭代复盘。Generate PPT outlines and standalone HTML presentations (open directly in browser, no dependencies). Use when: (1) creating PPT/presentation outlines, (2) building pitch deck/BP structures, (3) preparing work report slides, (4) designing training course outlines, (5) creating thesis defense PPT outlines, (6) building project review/retrospective PPTs, (7) generating HTML slide decks for browser-based presentations, (8) any PowerPoint/Keynote/Google Slides planning. 适用场景做PPT大纲、写路演BP、汇报PPT结构、培训课件大纲、毕业答辩PPT、项目复盘PPT、述职答辩PPT、生成HTML演示文稿浏览器直接打开支持dark/light/tech/minimal四种风格。"
category: Document Processing
--- ---
# ppt-outline # ppt-outline

View File

@ -1,6 +1,7 @@
--- ---
name: rag-retrieve name: rag-retrieve
description: RAG retrieval skill for querying and retrieving relevant documents from knowledge base. Use this skill when users need to search documentation, retrieve knowledge base articles, or get context from a vector database. Supports semantic search with configurable top-k results. description: RAG retrieval skill for querying and retrieving relevant documents from knowledge base. Use this skill when users need to search documentation, retrieve knowledge base articles, or get context from a vector database. Supports semantic search with configurable top-k results.
category: Data & Retrieval
--- ---
# RAG Retrieve # RAG Retrieve

View File

@ -18,5 +18,6 @@
"{bot_id}" "{bot_id}"
] ]
} }
} },
"category": "Data & Retrieval"
} }

View File

@ -167,7 +167,12 @@ async def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
top_k = arguments.get("top_k", 100) top_k = arguments.get("top_k", 100)
if not query: if not query:
return create_error_response(request_id, -32602, "Missing required parameter: query") return create_success_response(request_id, {
"content": [{
"type": "text",
"text": "Error: missing required parameter 'query'. Please call this tool again with a non-empty 'query' argument describing what you want to retrieve."
}]
})
result = rag_retrieve(query, top_k) result = rag_retrieve(query, top_k)

View File

@ -1,6 +1,7 @@
--- ---
name: static-hosting name: static-hosting
description: Serve static HTML/CSS/JS/images from robot project directories via the built-in FastAPI static file server. Use when generating web pages, reports, or interactive content for a bot. description: Serve static HTML/CSS/JS/images from robot project directories via the built-in FastAPI static file server. Use when generating web pages, reports, or interactive content for a bot.
category: Web Services
--- ---
# Static Hosting # Static Hosting

View File

@ -0,0 +1,137 @@
---
name: table-query
description: Query structured spreadsheet/table data (Excel/CSV) to answer questions about values, prices, quantities, inventory, specifications, rankings, comparisons, summaries, aggregations, lists, or any numeric/tabular lookup. Use this skill whenever the answer likely comes from uploaded tables. You locate tables, read their schema, author SQLite SQL yourself, and run it — the backend does no LLM work, so it is fast.
category: Data & Retrieval
---
# Table Query
Answer table/spreadsheet questions by authoring and running SQLite SQL against the
bot's uploaded Excel data. The backend is a thin, fast SQL executor — **you** do the
thinking (rewrite the question, pick tables, write SQL). Row-level citations
(`__src`) are produced for you.
## When to use
Use `table-query` for: values, prices, quantities, inventory, specifications,
rankings, comparisons, summaries, aggregations (sum/avg/count), lists, person /
project / product lookups, monthly/period totals, or any question whose answer
comes from structured tables. For pure concept / definition / policy / explanation
questions, use the `rag_retrieve` document tool instead.
## Workflow (do this in order, once)
1. **search-tables** — rewrite the user's question into a retrieval query (core
entity + attributes + synonyms), then locate candidate tables. Call this **once**.
2. **get-schemas** — for the relevant subset of returned tables, fetch their
`CREATE TABLE` schema and sample rows. Never write SQL without seeing the schema.
3. **author SQL** — write a SQLite query plan as JSON (see below).
4. **run-sql** — execute the plan. It returns CSV with an `__src` column and a
`file_ref_table` mapping plus citation instructions.
5. **answer + cite** — write the answer and add `<CITATION ... />` tags built from
`__src` + `file_ref_table`. Never print the `__src` column to the user.
### Anti-waste rules
- Call **search-tables at most once** per question. Do not re-locate tables you
already have schemas for.
- If `run-sql` returns an error, fix the SQL and call **run-sql** again (at most ~2
tries). Do **NOT** restart from search-tables.
- If `search-tables` finds nothing, fall back to the `rag_retrieve` document tool.
## Commands
```bash
# 1. locate tables
python {SKILL_DIR}/scripts/table_query.py search-tables --query "2025 April May June sales total" --top-k 20
# 2. read schema + sample rows for the tables you picked
python {SKILL_DIR}/scripts/table_query.py get-schemas --tables "sales_2025,customers"
# 3. run your authored plan — pipe the JSON plan via stdin (no temp file needed)
python {SKILL_DIR}/scripts/table_query.py run-sql <<'PLAN'
{"queries":[{"step":1,"sql":"CREATE TEMP TABLE \"final_table_step1\" AS SELECT \"month\", SUM(\"amount\") AS \"total\" FROM \"sales_2025\" GROUP BY \"month\"","source_table_names":["sales_2025"],"destine_table_name":"final_table_step1","destine_table_type":"final","destine_table_description":"Monthly totals"}]}
PLAN
```
## Authoring the SQL plan
The plan is a JSON object `{ "queries": [ ... ] }` that you pass to `run-sql` **on
stdin via a quoted heredoc** (`<<'PLAN' ... PLAN`). The quoted delimiter keeps all
the double quotes, single quotes and `$` in your SQL intact — no shell escaping.
(You may instead write it to a file and use `--plan-file path.json` if a plan is very
large, but stdin is the default and needs no extra step.)
Each query is one SQL step:
```json
{
  "queries": [
    {
      "step": 1,
      "sql": "CREATE TEMP TABLE \"final_table_step1\" AS SELECT \"month\", SUM(\"amount\") AS \"total\" FROM \"sales_2025\" WHERE \"month\" IN ('2025-04','2025-05','2025-06') GROUP BY \"month\"",
      "source_table_names": ["sales_2025"],
      "destine_table_name": "final_table_step1",
      "destine_table_type": "final",
      "destine_table_description": "Monthly sales totals for Apr-Jun 2025"
    }
  ]
}
```
Field meaning:
- `step`: 1-based execution order.
- `sql`: a SQLite statement, normally `CREATE TEMP TABLE "..." AS SELECT ...`.
- `source_table_names`: tables this step reads (original tables, or earlier steps'
`destine_table_name` for multi-step plans).
- `destine_table_name`: the temp table this step creates. Convention:
`intermediate_table_stepN` or `final_table_stepN`.
- `destine_table_type`: `"final"` for results the user should see, `"intermediate"`
for helper steps. **At least one `final` is required.**
- `destine_table_description`: short human description of the result.
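
The field rules above can be checked before calling `run-sql`. The following is a minimal sketch, not part of the skill: `make_step` and `validate_plan` are hypothetical helper names, and the checks only cover what the field list states (ordered steps, no temp table read before it is created, at least one `final` table).

```python
import json


def make_step(step, sql, sources, dest, dest_type, description):
    """Return one plan step using the field names from the list above."""
    return {
        "step": step,
        "sql": sql,
        "source_table_names": sources,
        "destine_table_name": dest,
        "destine_table_type": dest_type,
        "destine_table_description": description,
    }


def validate_plan(plan):
    """Return a list of problems found in the plan (empty when none are found)."""
    problems = []
    queries = plan.get("queries") or []
    if not queries:
        problems.append("plan has no queries")
    # Every table name that some step of this plan creates.
    all_dests = {q["destine_table_name"] for q in queries}
    created = set()
    for q in sorted(queries, key=lambda q: q["step"]):
        for src in q["source_table_names"]:
            # A source that is created by this plan must already exist.
            if src in all_dests and src not in created:
                problems.append(f"step {q['step']} reads {src} before it is created")
        created.add(q["destine_table_name"])
    if not any(q["destine_table_type"] == "final" for q in queries):
        problems.append("at least one final table is required")
    return problems


# Two-step plan: an intermediate filter, then the final aggregation.
plan = {"queries": [
    make_step(
        1,
        'CREATE TEMP TABLE "intermediate_table_step1" AS '
        'SELECT "month", "amount" FROM "sales_2025" WHERE "amount" > 0',
        ["sales_2025"], "intermediate_table_step1", "intermediate",
        "Rows with a positive amount",
    ),
    make_step(
        2,
        'CREATE TEMP TABLE "final_table_step2" AS '
        'SELECT "month", SUM("amount") AS "total" '
        'FROM "intermediate_table_step1" GROUP BY "month"',
        ["intermediate_table_step1"], "final_table_step2", "final",
        "Monthly totals",
    ),
]}

print(validate_plan(plan))
print(json.dumps(plan, ensure_ascii=False))
```

The JSON printed on the last line is what would be piped to `run-sql` on stdin. The table and column names (`sales_2025`, `month`, `amount`) are illustrative and must come from the real `get-schemas` output.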
### SQL rules (important)
- **Quote every identifier** with double quotes: `"column name"`, `"table name"`.
- String literals use single quotes; escape `'` as `''`.
- Prefer **one logical result per `final` table**. For multiple separate results,
emit multiple `final` tables (e.g. step1, step2) — do **NOT** `UNION` unrelated results.
- For row-level citations to be precise, keep `final` steps as simple single-table
`SELECT`s (no `JOIN` / `GROUP BY` / aggregation). Aggregations still work but the
citation degrades to file+sheet level (`F1S2`) instead of an exact row (`F1S2R5`).
- Multi-step plans run in `step` order: build `intermediate_table_stepN` first, then
read it in a later step. Don't reference a temp table before it is created.
- **Sample rows are a format hint only** — never assume they represent the full data
or the row count. Your SQL must scan the whole table. Use `LIKE '%value%'` for free
text and `=` for enums/codes.
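
The quoting rules above are easy to get wrong by hand, so here is a minimal sketch of helpers that apply them, exercised against an in-memory SQLite database. The helper names (`quote_ident`, `quote_literal`, `like_pattern`) and the sample table are assumptions for illustration and are not part of the skill. Doubling an embedded `"` inside a quoted identifier is standard SQLite behaviour rather than something the rules state.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote a string literal, escaping ' as ''."""
    return "'" + value.replace("'", "''") + "'"


def like_pattern(value: str) -> str:
    """Build a '%value%' literal for free-text matching.

    Note: % and _ inside value are not escaped in this sketch.
    """
    return quote_literal(f"%{value}%")


conn = sqlite3.connect(":memory:")
table, col = "sales 2025", "customer name"   # names with spaces need quoting
name = "O'Brien Ltd"                         # value with an embedded single quote

conn.execute(
    f"CREATE TABLE {quote_ident(table)} "
    f"({quote_ident(col)} TEXT, {quote_ident('status')} TEXT)"
)
conn.execute(
    f"INSERT INTO {quote_ident(table)} "
    f"VALUES ({quote_literal(name)}, {quote_literal('open')})"
)

# LIKE for free text, = for the enum-like status column.
sql = (
    f"SELECT {quote_ident(col)} FROM {quote_ident(table)} "
    f"WHERE {quote_ident(col)} LIKE {like_pattern('Brien')} "
    f"AND {quote_ident('status')} = {quote_literal('open')}"
)
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```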
## Result handling & citations
- `run-sql` output begins with citation instructions, then `file_ref_table`, then the
result CSV (with `__src`).
- Parse `__src` (`F1S2R5` = file_ref F1, sheet 2, row 5) and `file_ref_table` to build
`<CITATION file="..." filename="..." sheet=N rows=[...] />`.
- Put citations on their own line **after** the list/table that uses the data; combine
same-(file,sheet) rows into one citation.
- If the result hint says rows were truncated (`Only the first N rows ...; the
remaining M ...`), tell the user the total (`N+M`), shown (`N`), and omitted (`M`).
- Never expose the `__src` column itself to the user.
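
The citation steps above can be sketched as follows. This is an illustration under assumptions: the exact shape of `file_ref_table` is not shown in this document, so the sketch assumes a dict keyed by file ref with hypothetical `file_id` and `filename` keys, and assumes one `__src` token per result row. Sheet-level tokens such as `F1S2` are skipped because their citation form is not specified here.

```python
import re
from collections import defaultdict

# Row-level token: F<file ref number> S<sheet> R<row>, e.g. F1S2R5.
SRC_RE = re.compile(r"^(F\d+)S(\d+)R(\d+)$")


def build_citations(src_tokens, file_ref_table):
    """Group row-level __src tokens by (file, sheet) and emit one tag per group."""
    grouped = defaultdict(set)
    for token in src_tokens:
        match = SRC_RE.match(token.strip())
        if not match:
            continue  # sheet-level or malformed token: not handled in this sketch
        file_ref, sheet, row = match.group(1), int(match.group(2)), int(match.group(3))
        grouped[(file_ref, sheet)].add(row)

    tags = []
    for (file_ref, sheet), rows in sorted(grouped.items()):
        info = file_ref_table[file_ref]  # assumed shape, see the lead-in
        tags.append(
            f'<CITATION file="{info["file_id"]}" filename="{info["filename"]}" '
            f'sheet={sheet} rows={sorted(rows)} />'
        )
    return tags


# Hypothetical data standing in for real run-sql output.
file_ref_table = {
    "F1": {"file_id": "abc123", "filename": "sales.xlsx"},
    "F2": {"file_id": "def456", "filename": "stock.xlsx"},
}
tokens = ["F1S2R5", "F1S2R2", "F2S1R3", "F1S2"]

for tag in build_citations(tokens, file_ref_table):
    print(tag)
```

Each printed tag goes on its own line after the list or table that used the data, and the `__src` values themselves stay out of the answer.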
### Controlling truncation
`run-sql` truncates results by default (total rows and per-cell characters) to keep
the context manageable. If a result comes back truncated and you genuinely need more,
re-run with higher limits — do **not** re-run search-tables:
```bash
python {SKILL_DIR}/scripts/table_query.py run-sql --max-rows 500 --cell-max 4000 <<'PLAN'
{"queries":[ ... ]}
PLAN
```
- `--max-rows`: max total rows across all `final` tables (default from backend config,
hard ceiling 2000). Prefer writing an aggregate query (SUM/COUNT/GROUP BY) over
pulling thousands of detail rows.
- `--cell-max`: max characters per cell before it is truncated with `..` (default from
backend config, hard ceiling 10000). Raise this when a long-text column (e.g. a
description/spec field) is getting cut off.
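
As a small sketch of the two limits above, the helper below builds the `run-sql` flags and keeps each value within the stated hard ceilings, and a second helper words the truncation summary (total `N+M`, shown `N`, omitted `M`). Both function names are hypothetical. Clamping on the client side is an assumption; the document does not say how the backend treats values above the ceiling.

```python
MAX_ROWS_CEILING = 2000    # hard ceiling stated for --max-rows
CELL_MAX_CEILING = 10000   # hard ceiling stated for --cell-max


def run_sql_flags(max_rows=None, cell_max=None):
    """Build run-sql flags, clamping each requested limit to 1..ceiling."""
    flags = []
    if max_rows is not None:
        flags += ["--max-rows", str(max(1, min(max_rows, MAX_ROWS_CEILING)))]
    if cell_max is not None:
        flags += ["--cell-max", str(max(1, min(cell_max, CELL_MAX_CEILING)))]
    return flags


def truncation_summary(shown, omitted):
    """Describe a truncated result: total, shown, and omitted row counts."""
    return f"{shown + omitted} rows in total: {shown} shown, {omitted} omitted."


print(run_sql_flags(max_rows=5000, cell_max=4000))
print(truncation_summary(200, 350))
```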

View File

@ -0,0 +1,213 @@
#!/usr/bin/env python3
"""
table-query CLI.
Fast, LLM-free table querying. Talks to the felo-mygpt table_query endpoints:
- search-tables : POST /v1/table_query/search_tables/{bot_id}
- get-schemas : POST /v1/table_query/get_schemas/{bot_id}
- run-sql : POST /v1/table_query/run_sql/{bot_id}
The agent drives the orchestration (rewrite -> locate -> author SQL -> run);
the backend only does cheap work, so each call returns in seconds.
"""
import argparse
import hashlib
import json
import os
import sys
try:
    import requests
except ImportError:
    print("Error: requests module is required. Please install it with: pip install requests")
    sys.exit(1)
DEFAULT_BACKEND_HOST = os.getenv("BACKEND_HOST", "https://api-dev.gptbase.ai")
DEFAULT_MASTERKEY = os.getenv("MASTERKEY", "master")
# Same citation contract the legacy table_rag_retrieve used, so the agent's
# <CITATION ... /> behaviour is unchanged.
TABLE_CITATION_INSTRUCTIONS = """<CITATION_INSTRUCTIONS>
When using the retrieved table knowledge below, you MUST add XML citation tags for factual claims.
Format: `<CITATION file="file_id" filename="name.xlsx" sheet=1 rows=[2, 4] />`
- Parse `__src`: `F1S2R5` = file_ref F1, sheet 2, row 5
- Look up file_id in `file_ref_table`
- Combine same-sheet rows into one citation: `rows=[2, 4, 6]`
- MANDATORY: Create SEPARATE citation for EACH (file, sheet) combination
- NEVER put <CITATION> on the same line as a bullet point or table row
- Citations MUST be on separate lines AFTER the complete list/table
- NEVER include the `__src` column in your response - it is internal metadata only
- Citations MUST appear IMMEDIATELY AFTER the paragraph or bullet list that uses the knowledge
- NEVER collect all citations and place them at the end of your response
</CITATION_INSTRUCTIONS>
"""
def load_config() -> dict:
    """Load robot_config.json from the robot project root (3 levels up from scripts/)."""
    config_path = os.path.join(os.path.dirname(__file__), '..', '..', '..', 'robot_config.json')
    if os.path.exists(config_path):
        try:
            with open(config_path, 'r', encoding='utf-8') as f:
                return json.load(f)
        except (json.JSONDecodeError, IOError) as e:
            print(f"Warning: failed to load robot_config.json: {e}", file=sys.stderr)
    return {}


def _resolve_bot_id(cli_bot_id: str) -> str:
    if cli_bot_id:
        return cli_bot_id
    return load_config().get('bot_id') or os.getenv("BOT_ID") or os.getenv("ASSISTANT_ID")


def _post(path: str, bot_id: str, payload: dict) -> dict:
    url = f"{DEFAULT_BACKEND_HOST}/v1/table_query/{path}/{bot_id}"
    auth_token = hashlib.md5(f"{DEFAULT_MASTERKEY}:{bot_id}".encode()).hexdigest()
    headers = {
        "content-type": "application/json",
        "authorization": f"Bearer {auth_token}",
    }
    trace_id = os.getenv("TRACE_ID") or os.getenv("X_REQUEST_ID")
    if trace_id:
        headers["X-Request-ID"] = trace_id
    resp = requests.post(url, json=payload, headers=headers, timeout=30)
    if resp.status_code != 200:
        raise RuntimeError(f"API {path} returned {resp.status_code}: {resp.text}")
    return resp.json()


def cmd_search_tables(args, bot_id: str) -> str:
    res = _post("search_tables", bot_id, {"query": args.query, "top_k": args.top_k})
    tables = res.get("tables", [])
    if not tables:
        return ("No matching tables found. If the question may be answered from documents "
                "instead of spreadsheets, fall back to the rag_retrieve document tool.")
    lines = [f"Found {len(tables)} candidate table(s). Pick the relevant ones and call "
             f"`get-schemas` for them next.\n"]
    for t in tables:
        lines.append(
            f"- table_name: {t['table_name']}\n"
            f" file: {t.get('file_name','')} | sheet: {t.get('sheet_name','')} "
            f"| score: {round(t.get('score', 0), 3)}\n"
            f" description: {t.get('table_description','')}"
        )
    return "\n".join(lines)


def cmd_get_schemas(args, bot_id: str) -> str:
    table_names = [t.strip() for t in args.tables.split(',') if t.strip()]
    res = _post("get_schemas", bot_id,
                {"table_names": table_names, "sample_rows": args.sample_rows})
    schemas = res.get("schemas", [])
    missing = res.get("missing_tables", [])
    if not schemas:
        return f"No schemas resolved. Missing tables: {missing}"
    blocks = []
    for s in schemas:
        block = [f"### Table: {s['table_name']}",
                 f"File: {s.get('file_name','')} | Sheet: {s.get('sheet_name','')}",
                 "```sql", s.get('sql_create', ''), "```"]
        sample = s.get('sample_rows') or []
        if sample:
            block.append("Sample rows (format hint only, NOT the row count):")
            block.append("```csv")
            for row in sample:
                block.append(",".join('"' + str(c).replace('"', '""') + '"' for c in row))
            block.append("```")
        blocks.append("\n".join(block))
    out = "\n\n".join(blocks)
    if missing:
        out += f"\n\nNote: these requested tables were not found: {missing}"
    out += ("\n\nNow author a SQLite plan and run it by piping the JSON to run-sql on stdin:\n"
            " run-sql <<'PLAN'\n"
            " {\"queries\": [{\"step\": 1, \"sql\": \"CREATE TEMP TABLE \\\"final_table_step1\\\" "
            "AS SELECT ...\", \"source_table_names\": [\"...\"], "
            "\"destine_table_name\": \"final_table_step1\", \"destine_table_type\": \"final\"}]}\n"
            " PLAN\n"
            "Quote all identifiers with double quotes.")
    return out
def cmd_run_sql(args, bot_id: str) -> str:
    # Read the plan from --plan-file if given, otherwise from stdin (heredoc).
    try:
        if args.plan_file:
            with open(args.plan_file, 'r', encoding='utf-8') as f:
                raw = f.read()
        else:
            raw = sys.stdin.read()
        if not raw.strip():
            return ("Error: no plan provided. Pipe the JSON plan via stdin, e.g.\n"
                    " python scripts/table_query.py run-sql <<'PLAN'\n"
                    " {\"queries\": [...]}\n"
                    " PLAN")
        plan = json.loads(raw)
    except (json.JSONDecodeError, IOError) as e:
        return f"Error: failed to read SQL plan: {e}"
    # accept either {"queries": [...]} or a bare [...] list
    queries = plan.get("queries") if isinstance(plan, dict) else plan
    if not queries:
        return "Error: the plan must contain a non-empty `queries` list."
    payload = {"queries": queries}
    if args.max_rows is not None:
        payload["max_rows"] = args.max_rows
    if args.cell_max is not None:
        payload["cell_max"] = args.cell_max
    res = _post("run_sql", bot_id, payload)
    if not res.get("success"):
        return (f"SQL execution failed: {res.get('error')}\n"
                "Fix your SQL and call run-sql again. Do NOT restart from search-tables.")
    parts = [TABLE_CITATION_INSTRUCTIONS]
    if res.get("instruction"):
        parts.append(res["instruction"])
    if res.get("knowledge"):
        parts.append(res["knowledge"])
    if res.get("extra_goal"):
        parts.append(res["extra_goal"])
    return "\n".join(parts)


def main():
    parser = argparse.ArgumentParser(description="table-query: fast LLM-free table querying")
    parser.add_argument("--bot-id", default=None, help="Bot id (defaults to robot_config.json)")
    sub = parser.add_subparsers(dest="command", required=True)

    p_search = sub.add_parser("search-tables", help="Vector-locate relevant tables")
    p_search.add_argument("--query", "-q", required=True, help="Rewritten retrieval query")
    p_search.add_argument("--top-k", "-k", type=int, default=20)

    p_schemas = sub.add_parser("get-schemas", help="Fetch CREATE TABLE schema + sample rows")
    p_schemas.add_argument("--tables", "-t", required=True, help="Comma-separated table names")
    p_schemas.add_argument("--sample-rows", type=int, default=3)

    p_run = sub.add_parser("run-sql", help="Execute an authored SQL plan (JSON via stdin or file)")
    p_run.add_argument("--plan-file", "-f", default=None,
                       help="Path to plan JSON file (optional; defaults to reading stdin)")
    p_run.add_argument("--max-rows", type=int, default=None,
                       help="Max total result rows (raise if a result came back truncated)")
    p_run.add_argument("--cell-max", type=int, default=None,
                       help="Max characters per cell before truncation")

    args = parser.parse_args()
    bot_id = _resolve_bot_id(args.bot_id)
    if not bot_id:
        print("Error: bot_id is required (robot_config.json / --bot-id / BOT_ID env)")
        sys.exit(1)
    try:
        if args.command == "search-tables":
            print(cmd_search_tables(args, bot_id))
        elif args.command == "get-schemas":
            print(cmd_get_schemas(args, bot_id))
        elif args.command == "run-sql":
            print(cmd_run_sql(args, bot_id))
    except Exception as e:
        print(f"Error: {str(e)}")
        sys.exit(1)


if __name__ == "__main__":
    main()

View File

@ -0,0 +1,25 @@
name: table-query
version: 1.0.0
description: Fast LLM-free table querying. Locate tables, fetch schema, author SQLite SQL, and run it with row-level citations.
author:
  name: sparticle
  email: support@gbase.ai
license: MIT
tags:
  - table
  - sql
  - excel
  - retrieval
  - citation
runtime:
  python: ">=3.7"
dependencies:
- requests
entry_point: scripts/table_query.py
commands:
  search-tables:
    description: Vector-locate relevant tables for a query
  get-schemas:
    description: Fetch CREATE TABLE schema + sample rows for given tables
  run-sql:
    description: Execute an authored SQLite plan and return CSV with __src citations

View File

@ -0,0 +1,67 @@
#!/usr/bin/env bash
#
# Manual verification for the new table_query endpoints.
# Run this against an environment where the feature/table-query-split branch is
# deployed (e.g. dev). It checks the 3 fast endpoints and diffs run_sql output
# against the legacy table_rag_retrieve for parity.
#
# Usage:
# HOST=https://api-dev.gptbase.ai BOT_ID=<bot> MASTERKEY=master ./verify_table_query.sh
#
set -euo pipefail
HOST="${HOST:-https://api-dev.gptbase.ai}"
# bot from the slow-request log (has the 案1_売上明細 xlsx). Override as needed.
BOT_ID="${BOT_ID:-c1fa021b-6c41-41d5-b1e6-adfb8896aaaa}"
MASTERKEY="${MASTERKEY:-master}"
QUERY="${QUERY:-2025年4月〜6月の売上実績}"
# auth token = MD5(masterkey:bot_id)
TOKEN=$(python3 -c "import hashlib,sys;print(hashlib.md5(f'{sys.argv[1]}:{sys.argv[2]}'.encode()).hexdigest())" "$MASTERKEY" "$BOT_ID")
AUTH="authorization: Bearer ${TOKEN}"
CT="content-type: application/json"
echo "=== HOST=$HOST BOT_ID=$BOT_ID ==="
echo
echo "### 1) search_tables ###"
curl -s --request POST "$HOST/v1/table_query/search_tables/$BOT_ID" \
--header "$AUTH" --header "$CT" \
--data "{\"query\": \"$QUERY\", \"top_k\": 20}" | python3 -m json.tool
echo
echo "### 2) get_schemas (EDIT --data table_names with names from step 1) ###"
echo "curl -s --request POST \"$HOST/v1/table_query/get_schemas/$BOT_ID\" \\"
echo " --header \"$AUTH\" --header \"$CT\" \\"
echo " --data '{\"table_names\": [\"<TABLE_NAME_FROM_STEP_1>\"], \"sample_rows\": 3}' | python3 -m json.tool"
echo
echo "### 3) run_sql (EDIT the sql to match the real table/columns from step 2) ###"
cat > /tmp/tq_plan.json <<'JSON'
{
"queries": [
{
"step": 1,
"sql": "CREATE TEMP TABLE \"final_table_step1\" AS SELECT \"計上日\", \"得意先名\", \"売上金額\" FROM \"<TABLE_NAME>\" LIMIT 10",
"source_table_names": ["<TABLE_NAME>"],
"destine_table_name": "final_table_step1",
"destine_table_type": "final",
"destine_table_description": "sample rows"
}
]
}
JSON
echo "Edit /tmp/tq_plan.json (replace <TABLE_NAME>), then:"
echo "curl -s --request POST \"$HOST/v1/table_query/run_sql/$BOT_ID\" \\"
echo " --header \"$AUTH\" --header \"$CT\" \\"
echo " --data @/tmp/tq_plan.json | python3 -m json.tool"
echo
echo "ASSERT: run_sql output 'knowledge' contains a '__src' column and 'file_ref_table'."
echo
echo "### 4) legacy table_rag_retrieve (parity reference, same question) ###"
echo "curl -s --request POST \"$HOST/v1/table_rag_retrieve/$BOT_ID\" \\"
echo " --header \"$AUTH\" --header \"$CT\" \\"
echo " --data '{\"query\": \"$QUERY\"}' | python3 -m json.tool"
echo
echo "Compare the __src tokens / result rows between #3 and #4 for the same SQL intent."

View File

@ -30,8 +30,11 @@
"mcpServers": { "mcpServers": {
"user-context-example": { "user-context-example": {
"command": "echo", "command": "echo",
"args": ["Example MCP server for user context loader"], "args": [
"Example MCP server for user context loader"
],
"comment": "这是一个示例 MCP 配置,实际使用时替换为真实的 MCP 服务器" "comment": "这是一个示例 MCP 配置,实际使用时替换为真实的 MCP 服务器"
} }
} },
"category": "Developer Tools"
} }

View File

@ -8,6 +8,7 @@ metadata:
bins: bins:
- python3 - python3
- google-chrome - google-chrome
category: Creative Generation
--- ---
# z-card-image # z-card-image

View File

@ -2,12 +2,13 @@
"name": "baidu-search", "name": "baidu-search",
"description": "百度搜索服务", "description": "百度搜索服务",
"mcpServers": { "mcpServers": {
"web-search-mcp-server": { "web-search-mcp-server": {
"transport": "http", "transport": "http",
"url": "https://qianfan.baidubce.com/v2/tools/web-search/mcp", "url": "https://qianfan.baidubce.com/v2/tools/web-search/mcp",
"headers": { "headers": {
"Authorization": "Bearer {BAIDU_API_KEY}" "Authorization": "Bearer {BAIDU_API_KEY}"
}
} }
} }
},
"category": "Search & Intelligence"
} }

View File

@ -2,6 +2,7 @@
name: baidu-search name: baidu-search
description: Search the web using Baidu AI Search Engine (BDSE). Use for live information, documentation, or research topics. description: Search the web using Baidu AI Search Engine (BDSE). Use for live information, documentation, or research topics.
metadata: { "openclaw": { "emoji": "🔍︎", "requires": { "bins": ["python3"], "env":["BAIDU_API_KEY"]},"primaryEnv":"BAIDU_API_KEY" } } metadata: { "openclaw": { "emoji": "🔍︎", "requires": { "bins": ["python3"], "env":["BAIDU_API_KEY"]},"primaryEnv":"BAIDU_API_KEY" } }
category: Search & Intelligence
--- ---
# Baidu Search # Baidu Search

View File

@ -9,6 +9,7 @@ description: |
- 用户要求 bot 安装、启用、禁用或卸载技能时(如"帮我装上这个技能包"、"把 XX 技能关掉" → 管理技能列表 - 用户要求 bot 安装、启用、禁用或卸载技能时(如"帮我装上这个技能包"、"把 XX 技能关掉" → 管理技能列表
- 用户要求 bot 配置 API 密钥或运行参数时(如"把 JINA_API_KEY 设置成 xxx" → 修改环境变量 - 用户要求 bot 配置 API 密钥或运行参数时(如"把 JINA_API_KEY 设置成 xxx" → 修改环境变量
- bot 需要自主进化、动态调整自身能力边界的自动化场景 - bot 需要自主进化、动态调整自身能力边界的自动化场景
category: Developer Tools
--- ---
# Bot Self-Modifier # Bot Self-Modifier

View File

@ -13,6 +13,7 @@ metadata:
"primaryEnv": "CAIYUN_WEATHER_API_TOKEN", "primaryEnv": "CAIYUN_WEATHER_API_TOKEN",
}, },
} }
category: Weather
--- ---
# 彩云天气 (Caiyun Weather) # 彩云天气 (Caiyun Weather)

View File

@ -1,6 +1,7 @@
--- ---
name: competitor-news-intel name: competitor-news-intel
description: Research competitor news, organize developments by company and theme, and produce actionable competitive intelligence with impact assessment and follow-up recommendations. Use when the user asks for competitor monitoring, competitor news tracking, market watch summaries, or business intelligence from external updates. 中文触发词包括:竞品跟踪、竞对情报、竞品新闻、市场监听、舆情观察、竞品周报、最近竞品有什么动作。 description: Research competitor news, organize developments by company and theme, and produce actionable competitive intelligence with impact assessment and follow-up recommendations. Use when the user asks for competitor monitoring, competitor news tracking, market watch summaries, or business intelligence from external updates. 中文触发词包括:竞品跟踪、竞对情报、竞品新闻、市场监听、舆情观察、竞品周报、最近竞品有什么动作。
category: Search & Intelligence
--- ---
# Competitor News Intelligence # Competitor News Intelligence

View File

@ -1,6 +1,7 @@
 ---
 name: contract-document-generator
 description: Draft contracts and formal business documents, rewrite clauses, identify risks, and organize negotiation-ready language. Use when the user asks for contract drafting, clause revision, legal-style document generation, formal agreement structuring, or document-ready policy and terms content. 中文触发词包括:合同起草、协议生成、条款修改、风险审查、保密协议、正式文档撰写。
+category: Writing & Reporting
 ---
 # Contract & Document Generator

@ -1,6 +1,7 @@
 ---
 name: financial-report-generator
 description: Generate management-friendly financial reporting outputs from structured financial data, including KPI summaries, variance analysis, risk notes, and reporting narratives. Use when the user asks for financial reports, management reporting, monthly or quarterly performance summaries, or finance-oriented document generation. 中文触发词包括:财务月报、财务季报、经营分析、管理层汇报、董事会报告、财务简报。
+category: Writing & Reporting
 ---
 # Financial Report Generator

@ -1,6 +1,7 @@
 ---
 name: market-academic-insight
 description: Generate structured market research and academic insight briefs with clear evidence, trends, risks, and opportunities. Use when the user asks for industry research, market trends, literature review, academic progress tracking, or evidence-based insight synthesis. 中文触发词包括:行业洞察、市场研究、学术综述、论文进展、趋势分析、研究简报。
+category: Search & Intelligence
 ---
 # Market & Academic Insight

@ -18,5 +18,6 @@
 "X-Dataset-Ids": "{dataset_ids}"
 }
 }
-}
+},
+"category": "Data & Retrieval"
 }

@ -1,6 +1,7 @@
 ---
 name: sales-decision-report
 description: Analyze sales data and produce decision-oriented reports with KPI summaries, anomaly explanation, channel and region analysis, and HTML-ready report structure. Use when the user asks for sales analysis, management dashboards, sales summaries, or decision reports from business data. 中文触发词包括销售分析、经营分析、销售周报、销售月报、数据决策报告、HTML 报表。
+category: Writing & Reporting
 ---
 # Sales Decision Report

@ -1,6 +1,7 @@
 ---
 name: seedream
 description: 使用火山引擎 Seedream/Seedance API 生成高质量图片和视频。适用于文生图、图生图、文生视频、图生视频以及生成关联组图的场景。
+category: Creative Generation
 ---
 # Seedream

@ -1,6 +1,7 @@
 ---
 name: static-hosting
 description: Serve static HTML/CSS/JS/images from robot project directories via the built-in FastAPI static file server. Use when generating web pages, reports, or interactive content for a bot.
+category: Web Services
 ---
 # Static Hosting

@ -17,6 +17,7 @@ triggers:
 - 从服务器下载
 - 浏览服务器文件
 - 读取服务器文件
+category: Web Services
 ---
 # Static Site Deploy

@ -1,6 +1,7 @@
 ---
 name: voice-notification
 description: Voice Notification - Push voice broadcast messages to active voice sessions for real-time TTS playback
+category: Communication
 ---
 # Voice Notification - Voice Broadcast

@ -5,6 +5,7 @@ version: 1.0.2
 tags: [weather, china, forecast, chinese, weather-cn, life-index, 7day-forecast]
 metadata: {"openclaw":{"emoji":"🌤️","requires":{"bins":["python3"]}}}
 allowed-tools: [exec]
+category: Weather
 ---
 # 中国天气预报查询 (China Weather)

@ -1,6 +1,7 @@
 ---
 name: kfs-answer
 description: Primary skill for answering ALL questions about the datasets knowledge base. Search files, run queries (SQL / markdown), and return answers with citations. MUST be used first for any data-related question.
+category: Data & Retrieval
 ---
 # kfs-answer

@ -18,5 +18,6 @@
 "{bot_id}"
 ]
 }
-}
+},
+"category": "Data & Retrieval"
 }

@ -193,7 +193,12 @@ async def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
 top_k = arguments.get("top_k", 100)
 if not query:
-    return create_error_response(request_id, -32602, "Missing required parameter: query")
+    return create_success_response(request_id, {
+        "content": [{
+            "type": "text",
+            "text": "Error: missing required parameter 'query'. Please call this tool again with a non-empty 'query' argument describing what you want to retrieve."
+        }]
+    })
 result = rag_retrieve(query, top_k)
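
Reviewer note: the hunk above stops returning a JSON-RPC error (code -32602) for a missing `query` and instead returns a successful result whose text content describes the problem, presumably so the calling model sees the message and can retry. A minimal, self-contained sketch of that pattern follows. The body of `create_success_response`, the `text_result` helper, the `handle_tool_call` wrapper, the shortened error text, and the `rag_retrieve` stub are all assumptions for illustration, not the repository's actual code.

```python
from typing import Any, Dict


def create_success_response(request_id: Any, result: Dict[str, Any]) -> Dict[str, Any]:
    # Assumed body: a plain JSON-RPC 2.0 success envelope.
    return {"jsonrpc": "2.0", "id": request_id, "result": result}


def text_result(text: str) -> Dict[str, Any]:
    # Hypothetical helper: builds the content payload shape used in the hunk.
    return {"content": [{"type": "text", "text": text}]}


def rag_retrieve(query: str, top_k: int) -> str:
    # Stub standing in for the real retrieval call, which the diff does not show.
    return f"results for {query!r} (top_k={top_k})"


def handle_tool_call(request_id: Any, arguments: Dict[str, Any]) -> Dict[str, Any]:
    query = arguments.get("query", "")
    top_k = arguments.get("top_k", 100)
    if not query:
        # Report the problem as ordinary tool output so the caller can retry,
        # rather than as a protocol-level error object.
        return create_success_response(request_id, text_result(
            "Error: missing required parameter 'query'. "
            "Please call this tool again with a non-empty 'query' argument."
        ))
    return create_success_response(request_id, text_result(rag_retrieve(query, top_k)))
```

One trade-off worth weighing in review: clients that branch on the JSON-RPC `error` field will no longer detect this case and would have to inspect the returned text instead.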

@ -1,6 +1,7 @@
 ---
 name: board-meeting-pack-helper
 description: Assemble board-meeting materials into a coherent pack with agenda logic, board-level KPIs, strategic risks, governance context, and decision-ready content. Use this whenever users ask for board materials, board pack, board meeting agenda, governance updates, director pre-read, 取締役会資料, or resolution-ready content for executive or board review; use it for board-level governance materials, not for generic executive one-pagers.
+category: Writing & Reporting
 ---
 # Board Meeting Pack Helper

@ -1,6 +1,7 @@
 ---
 name: customer-reply-tone
 description: Rewrite customer-facing replies in the right tone while preserving factual accuracy, accountability, and clear next steps across sensitive support, delivery, and account situations. Use this whenever users ask to soften, professionalize, de-escalate, polish, or reframe a customer email or chat response, including complaint reply, support response polish, or クレーム返信; use it for reply rewriting and de-escalation, not for sales follow-up or general Japanese business writing.
+category: Writing & Reporting
 ---
 # Customer Reply Tone

@ -1,6 +1,7 @@
 ---
 name: exec-brief-1pager
 description: Turn complex business, product, and operational topics into a one-page executive brief with decision-ready insights, options, and recommended actions. Use this whenever users ask for an executive summary, leadership brief, one-pager, decision memo, CEO brief, or key points at a glance for senior leadership; use it for one-page decision support, not for recurring status updates or board meeting packs.
+category: Writing & Reporting
 ---
 # Exec Brief 1Pager

@ -1,6 +1,7 @@
 ---
 name: incident-postmortem-ja
 description: Create structured postmortems and 障害報告書 for incidents, outages, and service failures with clear timelines, root-cause analysis, and preventive actions. Use this whenever users ask for an incident report, postmortem, RCA, incident review, 障害報告, 障害報告書, 振り返り, or 再発防止計画 focused on system and process improvement; use it for formal incident analysis, not for routine status updates or personal blame.
+category: Writing & Reporting
 ---
 # Incident Postmortem JA

@ -1,6 +1,7 @@
 ---
 name: japan-compliance-checker
 description: Review Japan-specific compliance risks in business text, campaign copy, contracts, and operating processes with clear, practical screening guidance. Use this whenever users ask for Japan compliance review, legal review, regulatory check, 法務チェック, コンプラ確認, 契約レビュー, or 広告審査 within the v1 scope of APPI, 景品表示法, and 下請法; use it for risk screening rather than drafting, anonymization, or legal advice.
+category: Compliance & Security
 ---
 # Japan Compliance Checker

@ -1,6 +1,7 @@
 ---
 name: japanese-business-writer
 description: Draft and polish formal Japanese business writing for emails, notices, request letters, cover notes, and workplace communication with clear structure and appropriate 敬語. Use this whenever users ask for Japanese business writing, formal JP writing, 敬語 polishing, 文面添削, 依頼メール, 案内文, 送付状, or 社内通知; use it for writing quality and business tone, not for compliance review or complaint de-escalation.
+category: Writing & Reporting
 ---
 # Japanese Business Writer

@ -1,6 +1,7 @@
 ---
 name: japanese-pii-redactor
 description: Redact, anonymize, and de-identify personal information in Japanese-language or mixed-language text and tabular data while preserving analytical usefulness. Use this whenever users ask for PII redaction, PII scrub, de-identification, 個人情報匿名化, 匿名加工, 仮名化, 秘匿化, or マスキング; use it for executing anonymization rules, not for legal interpretation or general writing polish.
+category: Compliance & Security
 ---
 # Japanese PII Redactor

@ -1,6 +1,7 @@
 ---
 name: kfs-answer
 description: Primary skill for answering ALL questions about the datasets knowledge base. Search files, run queries (SQL / markdown), and return answers with citations. MUST be used first for any data-related question.
+category: Data & Retrieval
 ---
 # kfs-answer

Some files were not shown because too many files have changed in this diff.