qwen_agent/skills/japanese-pii-redactor/SKILL.md
2026-04-14 20:49:56 +08:00

name: japanese-pii-redactor
description: Redact, anonymize, and de-identify personal information in Japanese-language or mixed-language text and tabular data while preserving analytical usefulness. Use this whenever users ask for PII redaction, PII scrub, de-identification, 個人情報匿名化, 匿名加工, 仮名化, 秘匿化, or マスキング; use it for executing anonymization rules, not for legal interpretation or general writing polish.

Japanese PII Redactor

Overview

Detect and redact personal information in Japanese-language or mixed-language content for safer sharing and analysis.

Common targets:

  • Person names
  • Phone numbers
  • Email addresses
  • Home/work addresses
  • Account/member/employee identifiers
  • Free-text notes containing identifiable details
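
Two of the targets above, email addresses and phone numbers, are regular enough to detect with patterns. The following is a minimal sketch, not the skill's actual detector: the regular expressions and the `find_direct_identifiers` helper are illustrative assumptions, and they cover only common formats. Names, addresses, and free-text details need other techniques and manual review.

```python
import re
import unicodedata

# Illustrative patterns only (assumptions, not part of the skill definition).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hyphenated Japanese domestic numbers such as 03-1234-5678 or 090-1234-5678.
PHONE_RE = re.compile(r"0\d{1,4}-\d{1,4}-\d{4}")


def find_direct_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (field_type, matched_text) pairs found in NFKC-normalized text."""
    # NFKC folds full-width digits and symbols into their ASCII forms,
    # so one ASCII pattern covers both half-width and full-width input.
    normalized = unicodedata.normalize("NFKC", text)
    hits = [("email", m.group()) for m in EMAIL_RE.finditer(normalized)]
    hits += [("phone", m.group()) for m in PHONE_RE.finditer(normalized)]
    return hits
```

Normalizing first matters for Japanese input because phone numbers are often typed with full-width digits and hyphens, which an ASCII-only pattern would miss.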

Triggering Cues

Use this skill when user messages include:

  • Chinese cues: 脱敏、匿名化、个人信息、隐私遮蔽、PII处理、数据清洗
  • Japanese cues: 個人情報、匿名化、マスキング、伏字、漏えい対策、PII
  • English cues: redact PII, anonymize Japanese data, privacy masking

Input Requirements

Ask for or infer:

  1. Source text/table
  2. Target output format (text/table/json)
  3. Redaction strength (light/standard/strict)
  4. Whether reversible pseudonyms are needed
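
The four inputs above can be captured in a small request object so that missing or invalid values surface early. This is a sketch under assumptions: the `RedactionRequest` name and the default values are hypothetical, chosen only for illustration; the skill itself does not define defaults.

```python
from dataclasses import dataclass

OUTPUT_FORMATS = ("text", "table", "json")
STRENGTHS = ("light", "standard", "strict")


@dataclass(frozen=True)
class RedactionRequest:
    """Hypothetical container for the four inputs listed above."""

    source: str                    # source text or serialized table
    output_format: str = "text"    # assumed default
    strength: str = "standard"     # assumed default
    reversible: bool = False       # assumed default: no reversible pseudonyms

    def __post_init__(self) -> None:
        # Reject values outside the options the skill lists.
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output format: {self.output_format!r}")
        if self.strength not in STRENGTHS:
            raise ValueError(f"unknown redaction strength: {self.strength!r}")
```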

Output Format

Always output:

  1. Redacted Data
  2. Redaction Rules Applied
  3. Fields Preserved vs Masked
  4. Residual Risk Notes

For the Redaction Rules Applied section, use this schema:

Field Type | Detection Pattern | Redaction Method | Example
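
One way to keep the rules section aligned with this schema is to store each rule as a record with the same four fields and render the rows from it. The `RedactionRule` class, the `render_rules` helper, and the pipe-separated layout are assumptions made for this sketch.

```python
from dataclasses import astuple, dataclass

HEADERS = ("Field Type", "Detection Pattern", "Redaction Method", "Example")


@dataclass(frozen=True)
class RedactionRule:
    """One row of the rules section; field order mirrors HEADERS."""

    field_type: str
    detection_pattern: str
    redaction_method: str
    example: str


def render_rules(rules: list[RedactionRule]) -> str:
    """Render the header plus one pipe-separated line per rule."""
    rows = [HEADERS] + [astuple(rule) for rule in rules]
    return "\n".join(" | ".join(row) for row in rows)
```

Example values should show only the token side or synthetic data, never a raw original from the dataset.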

Workflow

  1. Detect direct identifiers first (email/phone/account IDs).
  2. Detect contextual identifiers (addresses, and combinations of details that together could identify a person).
  3. Apply consistent masking policy across the dataset.
  4. Keep analytical utility while minimizing re-identification risk.
  5. Report what was masked and why.
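
The steps above can be sketched for the direct-identifier case as follows. This is a simplified illustration, not the skill's implementation: the `RULES` list, the token names, and the `redact` function are assumptions, and the sketch does not attempt step 2 (contextual identifiers such as names or addresses).

```python
import re
import unicodedata

# Hypothetical rule set: (field type, detection pattern, replacement token).
RULES = [
    ("email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    ("phone", re.compile(r"0\d{1,4}-\d{1,4}-\d{4}"), "[PHONE]"),
]


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Apply every rule in order and report how many matches each one masked."""
    # NFKC folds full-width digits and symbols to ASCII before matching.
    # It also rewrites half-width katakana, so the output is normalized text.
    redacted = unicodedata.normalize("NFKC", text)
    report: dict[str, int] = {}
    for field_type, pattern, token in RULES:
        # subn returns the rewritten string and the number of replacements.
        redacted, count = pattern.subn(token, redacted)
        report[field_type] = count
    return redacted, report
```

Applying the same rule list to every record gives the consistent masking policy of step 3, and the returned counts feed the report in step 5. Surrounding text is left in place so the issue description and timeline stay readable.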

Examples

Example 1

Input:

  • Japanese-language customer-service conversation logs that need to be shared with an external analysis team.

Output style:

  • Replace identifiers with neutral tokens
  • Preserve issue semantics and timeline

Example 2

Input:

  • Employee roster (name, email, phone, home address, employee ID).

Output style:

  • Table output with masked fields and preserved non-sensitive columns
  • Explicit rule list for audit traceability
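
For a roster like the one in Example 2, column-level masking might look like the sketch below. The column names, the `mask_table` helper, and the choice of a bracketed column name as the token are all hypothetical; a real run would take the masked columns from the agreed redaction strength.

```python
def mask_table(
    rows: list[dict[str, str]], masked: set[str]
) -> tuple[list[dict[str, str]], dict[str, list[str]]]:
    """Replace values in masked columns and summarize preserved vs masked."""
    out = [
        {col: f"[{col.upper()}]" if col in masked else val for col, val in row.items()}
        for row in rows
    ]
    # Column order follows the first row; an empty table has no columns.
    columns = list(rows[0]) if rows else []
    summary = {
        "masked": [c for c in columns if c in masked],
        "preserved": [c for c in columns if c not in masked],
    }
    return out, summary
```

The summary maps directly onto the "Fields Preserved vs Masked" part of the output, while non-sensitive columns such as a department pass through unchanged for analysis.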

Guidelines

  • Prefer consistency: the same entity should map to the same token throughout one output.
  • Never expose raw originals in final output.
  • Mark uncertain detections as "Needs Manual Review".
  • State that redaction reduces risk but does not guarantee zero re-identification risk.
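
The consistency guideline can be illustrated with a small lookup that hands out one numbered token per distinct value. This is a sketch under assumptions: the `TokenMapper` class and the `[TYPE_n]` token format are hypothetical, and the internal mapping holds raw originals, so it must stay out of the final output.

```python
class TokenMapper:
    """Assign a stable token to each (field type, value) pair."""

    def __init__(self) -> None:
        # Holds raw originals: keep private and never include in the output.
        self._tokens: dict[tuple[str, str], str] = {}
        self._counters: dict[str, int] = {}

    def token_for(self, field_type: str, value: str) -> str:
        """Return the existing token for this value, or mint the next one."""
        key = (field_type, value)
        if key not in self._tokens:
            n = self._counters.get(field_type, 0) + 1
            self._counters[field_type] = n
            self._tokens[key] = f"[{field_type.upper()}_{n}]"
        return self._tokens[key]
```

Because repeated mentions of one person resolve to the same token, an analyst can still follow who did what across a conversation without seeing the underlying name.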