qwen_agent/skills/common/web2md/SKILL.md
2026-06-23 10:12:55 +08:00

2.0 KiB

name description
web2md Convert any web URL to Markdown. Triggers on "转成Markdown/转换/网页转Markdown/convert to Markdown + URL". Handles static sites, dynamic SPAs, WeChat, arXiv, Twitter/X.

docai:web2md

When to Trigger

User wants to convert a web page to Markdown. Common patterns:

  • "把这个链接转成 Markdown"、"网页转 Markdown"、"提取网页内容"
  • "convert this URL to Markdown"、"get the content of this page"
  • Any URL + intent to extract/read content (without summarization)

If user wants summary, use web2summary instead.

How to Execute

python skills/web2md/tools/convert.py <URL> [--use-python] [-o <file>]

Parameters

Parameter Required Description
url Yes Web page URL
--use-python No Force Python method (skip Jina/Firecrawl)
-o / --output No Save to file instead of stdout

Examples

# Basic conversion (parallel: Jina / Firecrawl / Python)
python skills/web2md/tools/convert.py https://example.com/article

# arXiv paper (auto HTML priority, PDF fallback)
python skills/web2md/tools/convert.py https://arxiv.org/abs/2601.04500v1

# Save to file
python skills/web2md/tools/convert.py https://mp.weixin.qq.com/s/... -o article.md

# Force Python method
python skills/web2md/tools/convert.py https://example.com --use-python

What It Does

Four methods run in parallel, returning the first successful result:

  1. Jina Reader API (fastest, zero install)
  2. Firecrawl API (if key configured)
  3. Python fallback (requests + BeautifulSoup)
  4. Playwright (headless browser for JS-rendered pages)

Special cases handled automatically: WeChat → Playwright first (mobile UA, JS rendering), arXiv → HTML priority, Twitter/X → Playwright.

Troubleshooting

  • arXiv PDF garbled: Requires pymupdfpip install pymupdf
  • Dynamic page empty: Script auto-detects SPAs and uses Playwright
  • All methods fail: Try --use-python to bypass API methods