Merge branch 'developing' into bot_manager

2026-06-23 11:47:30 +08:00
commit 7f0c91d2d1
parent 2508d644fb 05ed2e8d45
2 changed files with 18 additions and 2 deletions
--- a/skills/common/web2md/SKILL.md
+++ b/skills/common/web2md/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: web2md
-description: Convert any web URL to Markdown. Triggers on "转成Markdown/转换/网页转Markdown/convert to Markdown + URL". Handles static sites, dynamic SPAs, WeChat, arXiv, Twitter/X.
+description: "Fetch any web URL and return its content as Markdown. ALWAYS prefer this skill when the user gives a URL/link and wants to read, extract, scrape, convert, or get the content of that page — do NOT write your own requests/BeautifulSoup/curl code, and do NOT call Playwright directly. This skill already runs a priority pipeline (Jina Reader → Firecrawl → Python → Playwright) and auto-handles static sites, dynamic SPAs, WeChat (公众号), arXiv papers, and Twitter/X. Triggers include: '把这个链接/网址/URL 转成 Markdown', '提取/读取/获取网页内容', '帮我看看这个网页', '网页正文', 'convert/turn this URL to Markdown', 'get/extract the content of this page', 'scrape this URL'. If the user wants a summary instead of raw content, use web2summary."
 ---

 # docai:web2md
@@ -13,6 +13,14 @@ User wants to convert a web page to Markdown. Common patterns:

 If user wants summary, use web2summary instead.

+## ⚠️ Do NOT roll your own
+When the user gives a URL, **always call this skill**. Do NOT:
+- write `requests.get(...)` + BeautifulSoup / markdownify yourself
+- run `curl` and parse HTML by hand
+- spawn Playwright / `mcp__pw__browser_*` to drive a browser
+
+This skill already runs the Jina → Firecrawl → Python → Playwright pipeline in priority order and returns the first successful result, with special handling for WeChat / arXiv / Twitter. Bypassing it wastes time and usually produces worse output.
+
 ## How to Execute
 ```bash
 python skills/web2md/tools/convert.py <URL> [--use-python] [-o <file>]
--- a/skills/common/web2summary/SKILL.md
+++ b/skills/common/web2summary/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: web2summary
-description: Summarize any web URL. Triggers on "summarize/总结/概括/摘要 + URL". Auto-detects content type (paper, news, tutorial, product, AI news) and generates adaptive structured summary.
+description: "Summarize any web URL into a structured, content-aware summary. ALWAYS prefer this skill when the user gives a URL/link and wants a summary, TL;DR, key points, or takeaways — do NOT fetch the page yourself with requests/curl/Playwright first. This skill internally calls web2md to get the content, then auto-detects the content type (paper / news / tutorial / product / AI news / generic) and outputs an adaptive structured summary. Triggers include: '总结/概括/摘要一下这个链接/网址/文章', '帮我看看这个网页讲了什么', '这篇说了啥', 'summarize this URL/page/article', 'TL;DR', 'give me the key points of this link'. If the user only wants the raw Markdown content without a summary, use web2md instead."
 ---

 # docai:web2summary
@@ -11,6 +11,14 @@ User wants to summarize a web page. Common patterns:
 - "summarize this URL", "give me a summary of"
 - Any URL + intent to understand/extract key points

+## ⚠️ Do NOT fetch the URL yourself
+When the user gives a URL + asks for a summary, **always call this skill**. Do NOT:
+- write `requests.get(...)` / `curl` / BeautifulSoup to grab the page first
+- spawn Playwright / `mcp__pw__browser_*` to read the page
+- call `web2md` manually and then summarize by hand
+
+Step 1 below already uses `web2md` internally to get the content — just follow Step 1 → Step 2.
+
 ## How to Execute

 ### Step 1: Fetch the web page content