diff --git a/skills/common/web2md/SKILL.md b/skills/common/web2md/SKILL.md
index 639ce16..a14a8d3 100644
--- a/skills/common/web2md/SKILL.md
+++ b/skills/common/web2md/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: web2md
-description: Convert any web URL to Markdown. Triggers on "转成Markdown/转换/网页转Markdown/convert to Markdown + URL". Handles static sites, dynamic SPAs, WeChat, arXiv, Twitter/X.
+description: "Fetch any web URL and return its content as Markdown. ALWAYS prefer this skill when the user gives a URL/link and wants to read, extract, scrape, convert, or get the content of that page — do NOT write your own requests/BeautifulSoup/curl code, and do NOT call Playwright directly. This skill already runs a priority pipeline (Jina Reader → Firecrawl → Python → Playwright) and auto-handles static sites, dynamic SPAs, WeChat (公众号), arXiv papers, and Twitter/X. Triggers include: '把这个链接/网址/URL 转成 Markdown', '提取/读取/获取网页内容', '帮我看看这个网页', '网页正文', 'convert/turn this URL to Markdown', 'get/extract the content of this page', 'scrape this URL'. If the user wants a summary instead of raw content, use web2summary."
 ---
 
 # docai:web2md
@@ -13,6 +13,14 @@ User wants to convert a web page to Markdown. Common patterns:
 
 If user wants summary, use web2summary instead.
 
+## ⚠️ Do NOT roll your own
+When the user gives a URL, **always call this skill**. Do NOT:
+- write `requests.get(...)` + BeautifulSoup / markdownify yourself
+- run `curl` and parse HTML by hand
+- spawn Playwright / `mcp__pw__browser_*` to drive a browser
+
+This skill already runs Jina → Firecrawl → Python → Playwright in parallel and returns the first successful result, with special handling for WeChat / arXiv / Twitter. Going around it wastes time and usually produces worse output.
+
 ## How to Execute
 ```bash
 python skills/web2md/tools/convert.py [--use-python] [-o ]
diff --git a/skills/common/web2summary/SKILL.md b/skills/common/web2summary/SKILL.md
index 0fb589d..899b99d 100644
--- a/skills/common/web2summary/SKILL.md
+++ b/skills/common/web2summary/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: web2summary
-description: Summarize any web URL. Triggers on "summarize/总结/概括/摘要 + URL". Auto-detects content type (paper, news, tutorial, product, AI news) and generates adaptive structured summary.
+description: "Summarize any web URL into a structured, content-aware summary. ALWAYS prefer this skill when the user gives a URL/link and wants a summary, TL;DR, key points, or takeaways — do NOT fetch the page yourself with requests/curl/Playwright first. This skill internally calls web2md to get the content, then auto-detects the content type (paper / news / tutorial / product / AI news / generic) and outputs an adaptive structured summary. Triggers include: '总结/概括/摘要一下这个链接/网址/文章', '帮我看看这个网页讲了什么', '这篇说了啥', 'summarize this URL/page/article', 'TL;DR', 'give me the key points of this link'. If the user only wants the raw Markdown content without a summary, use web2md instead."
 ---
 
 # docai:web2summary
@@ -11,6 +11,14 @@ User wants to summarize a web page. Common patterns:
 - "summarize this URL"、"give me a summary of"
 - Any URL + intent to understand/extract key points
 
+## ⚠️ Do NOT fetch the URL yourself
+When the user gives a URL + asks for a summary, **always call this skill**. Do NOT:
+- write `requests.get(...)` / `curl` / BeautifulSoup to grab the page first
+- spawn Playwright / `mcp__pw__browser_*` to read the page
+- call `web2md` manually and then summarize by hand
+
+Step 1 below already uses `web2md` internally to get the content — just follow Step 1 → Step 2.
+
 ## How to Execute
 
 ### Step 1 — 获取网页内容