Merge branch 'developing' into bot_manager

朱潮 2026-06-23 11:47:30 +08:00
commit 7f0c91d2d1
2 changed files with 18 additions and 2 deletions


@@ -1,6 +1,6 @@
---
name: web2md
-description: Convert any web URL to Markdown. Triggers on "转成Markdown/转换/网页转Markdown/convert to Markdown + URL". Handles static sites, dynamic SPAs, WeChat, arXiv, Twitter/X.
+description: "Fetch any web URL and return its content as Markdown. ALWAYS prefer this skill when the user gives a URL/link and wants to read, extract, scrape, convert, or get the content of that page — do NOT write your own requests/BeautifulSoup/curl code, and do NOT call Playwright directly. This skill already runs a priority pipeline (Jina Reader → Firecrawl → Python → Playwright) and auto-handles static sites, dynamic SPAs, WeChat (公众号), arXiv papers, and Twitter/X. Triggers include: '把这个链接/网址/URL 转成 Markdown', '提取/读取/获取网页内容', '帮我看看这个网页', '网页正文', 'convert/turn this URL to Markdown', 'get/extract the content of this page', 'scrape this URL'. If the user wants a summary instead of raw content, use web2summary."
---
# docai:web2md
@@ -13,6 +13,14 @@ User wants to convert a web page to Markdown. Common patterns:
If user wants summary, use web2summary instead.
## ⚠️ Do NOT roll your own
When the user gives a URL, **always call this skill**. Do NOT:
- write `requests.get(...)` + BeautifulSoup / markdownify yourself
- run `curl` and parse HTML by hand
- spawn Playwright / `mcp__pw__browser_*` to drive a browser
This skill already runs Jina → Firecrawl → Python → Playwright in parallel and returns the first successful result, with special handling for WeChat / arXiv / Twitter. Going around it wastes time and usually produces worse output.
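The "return the first successful result" behaviour described above can be sketched roughly as follows. This is an illustration only, not the skill's actual code: it takes the "in parallel" wording at face value, and `first_successful` and the fetcher callables are hypothetical stand-ins for the Jina, Firecrawl, Python, and Playwright backends.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence

# Hypothetical type: a fetcher takes a URL and returns Markdown text.
Fetcher = Callable[[str], str]


def first_successful(url: str, fetchers: Sequence[Fetcher]) -> Optional[str]:
    """Run all fetchers concurrently and return the first non-empty result.

    Returns None if every fetcher raises or returns an empty string.
    """
    # ThreadPoolExecutor rejects max_workers=0, so use at least one worker.
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = [pool.submit(fetcher, url) for fetcher in fetchers]
        for future in as_completed(futures):
            try:
                result = future.result()
            except Exception:
                # A failed backend is skipped; the others may still succeed.
                continue
            if result:
                # Leaving the `with` block still waits for any fetchers
                # that are running at this point to finish.
                return result
    return None
```

A caller would pass its own backends, for example `first_successful(url, [fetch_jina, fetch_firecrawl])`, where both function names are placeholders.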
## How to Execute
```bash
python skills/web2md/tools/convert.py <URL> [--use-python] [-o <file>]


@@ -1,6 +1,6 @@
---
name: web2summary
-description: Summarize any web URL. Triggers on "summarize/总结/概括/摘要 + URL". Auto-detects content type (paper, news, tutorial, product, AI news) and generates adaptive structured summary.
+description: "Summarize any web URL into a structured, content-aware summary. ALWAYS prefer this skill when the user gives a URL/link and wants a summary, TL;DR, key points, or takeaways — do NOT fetch the page yourself with requests/curl/Playwright first. This skill internally calls web2md to get the content, then auto-detects the content type (paper / news / tutorial / product / AI news / generic) and outputs an adaptive structured summary. Triggers include: '总结/概括/摘要一下这个链接/网址/文章', '帮我看看这个网页讲了什么', '这篇说了啥', 'summarize this URL/page/article', 'TL;DR', 'give me the key points of this link'. If the user only wants the raw Markdown content without a summary, use web2md instead."
---
# docai:web2summary
@@ -11,6 +11,14 @@ User wants to summarize a web page. Common patterns:
- "summarize this URL", "give me a summary of"
- Any URL + intent to understand/extract key points
## ⚠️ Do NOT fetch the URL yourself
When the user gives a URL and asks for a summary, **always call this skill**. Do NOT:
- write `requests.get(...)` / `curl` / BeautifulSoup to grab the page first
- spawn Playwright / `mcp__pw__browser_*` to read the page
- call `web2md` manually and then summarize by hand
Step 1 below already uses `web2md` internally to get the content — just follow Step 1 → Step 2.
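For orientation, invoking `web2md` from Python might look like the sketch below. It assumes the `convert.py` command line shown in the web2md skill (`<URL> [--use-python] [-o <file>]`). The helper names `build_web2md_command` and `run_web2md` are hypothetical, and what the script writes to stdout is not specified here.

```python
import subprocess
from typing import List, Optional

# Script path as it appears in the web2md skill's "How to Execute" section.
CONVERT_SCRIPT = "skills/web2md/tools/convert.py"


def build_web2md_command(
    url: str, use_python: bool = False, output: Optional[str] = None
) -> List[str]:
    """Build the argument list for the web2md converter (hypothetical helper)."""
    cmd = ["python", CONVERT_SCRIPT, url]
    if use_python:
        cmd.append("--use-python")
    if output is not None:
        cmd.extend(["-o", output])
    return cmd


def run_web2md(url: str, output: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run the converter and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(
        build_web2md_command(url, output=output),
        capture_output=True,
        text=True,
        check=True,
    )
```

Passing the arguments as a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.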
## How to Execute
### Step 1: Fetch the web page content