| name | description |
|---|---|
| japanese-pii-redactor | Redact, anonymize, and de-identify personal information in Japanese-language or mixed-language text and tabular data while preserving analytical usefulness. Use this whenever users ask for PII redaction, PII scrubbing, de-identification, 個人情報匿名化 (personal-information anonymization), 匿名加工 (anonymization processing), 仮名化 (pseudonymization), 秘匿化 (concealment), or マスキング (masking). Use it to execute anonymization rules, not for legal interpretation or general writing polish. |
Japanese PII Redactor
Overview
Detect and redact personal information in Japanese-language or mixed-language content for safer sharing and analysis.
Common targets:
- Person names
- Phone numbers
- Email addresses
- Home/work addresses
- Account/member/employee identifiers
- Free-text notes containing identifiable details
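Several of the targets above (email addresses, phone numbers, postal codes, IDs) have regular shapes and can be found with patterns. The sketch below shows one way to do that; every pattern is an assumption, including the hypothetical `EMP` + digits employee-ID format, and names are deliberately not covered because they need a dictionary or NER step rather than a regex.

```python
import re

# Hypothetical patterns for direct identifiers. Names are not covered: they
# need a dictionary or NER step. Real data also needs +81 numbers, unhyphenated
# numbers, and whatever ID formats the dataset actually uses.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Hyphenated Japanese landline/mobile numbers: 03-1234-5678, 090-1234-5678
    "PHONE": re.compile(r"(?<![\d-])0\d{1,4}-\d{1,4}-\d{4}(?![\d-])"),
    # Postal codes with or without the 〒 mark: 〒150-0001, 150-0001
    "POSTAL": re.compile(r"(?<![\d-])〒?\d{3}-\d{4}(?![\d-])"),
    # Assumed employee-ID format: "EMP" + 5-6 digits. \b is avoided because
    # Python treats kanji and kana as word characters.
    "EMPLOYEE_ID": re.compile(r"(?<![A-Za-z0-9])EMP\d{5,6}(?!\d)"),
}


def find_direct_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs in the order they appear in text."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), kind, m.group()))
    hits.sort()
    return [(kind, value) for _, kind, value in hits]


sample = "電話は090-1234-5678、メールはtaro.yamada@example.co.jp、〒150-0001、社員番号EMP12345。"
print(find_direct_identifiers(sample))
```

The lookarounds (`(?<![\d-])`, `(?![\d-])`) keep the postal-code pattern from matching the first half of a phone number such as `090-1234-5678`.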
Triggering Cues
Use this skill when user messages include:
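- Chinese cues: 脱敏 (desensitization), 匿名化 (anonymization), 个人信息 (personal information), 隐私遮蔽 (privacy masking), PII处理 (PII handling), 数据清洗 (data cleansing)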
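- Japanese cues: 個人情報 (personal information), 匿名化 (anonymization), マスキング (masking), 伏字 (blanking out characters, e.g. ○○), 漏えい対策 (leak prevention), PII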
- English cues: redact PII, anonymize Japanese data, privacy masking
Input Requirements
Ask for or infer:
- Source text/table
- Target output format (text/table/json)
- Redaction strength (light/standard/strict)
- Whether reversible pseudonyms are needed
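The four inputs above can be captured in a small request object so that unsupported values are rejected instead of guessed. This is a hypothetical sketch: the class name, the `"standard"` default, and the format-inference heuristic are assumptions, not part of the skill definition.

```python
import json
from dataclasses import dataclass

# Option values taken from the input requirements; the inference heuristic below is hypothetical.
OUTPUT_FORMATS = {"text", "table", "json"}
STRENGTHS = {"light", "standard", "strict"}


@dataclass(frozen=True)
class RedactionRequest:
    source: str
    output_format: str = "text"
    strength: str = "standard"  # assumed default when the user does not say
    reversible_pseudonyms: bool = False

    def __post_init__(self) -> None:
        # Reject values outside the documented options instead of guessing.
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")
        if self.strength not in STRENGTHS:
            raise ValueError(f"unsupported redaction strength: {self.strength!r}")


def infer_output_format(source: str) -> str:
    """Guess a default output format when the user did not specify one."""
    try:
        if isinstance(json.loads(source), (dict, list)):
            return "json"
    except json.JSONDecodeError:
        pass
    lines = source.strip().splitlines()
    # A delimiter in the first of several lines suggests a header row.
    if len(lines) >= 2 and ("," in lines[0] or "\t" in lines[0]):
        return "table"
    return "text"


text = "山田です。090-1234-5678までご連絡ください。"
request = RedactionRequest(source=text, output_format=infer_output_format(text))
```

An inferred value is only a default; if the user states a format or strength explicitly, that should win.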
Output Format
Always output:
- Redacted Data
- Redaction Rules Applied
- Fields Preserved vs Masked
- Residual Risk Notes
For the Redaction Rules Applied section, use this schema:
| Field Type | Detection Pattern | Redaction Method | Example |
|---|---|---|---|
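When rules are tracked as data, the schema can be rendered mechanically. The sketch below is hypothetical: `render_rules_table` and the sample rows are illustrative, and the Example column uses synthetic values so no raw original reaches the report.

```python
# Column order from the rules schema above.
RULE_COLUMNS = ("Field Type", "Detection Pattern", "Redaction Method", "Example")


def render_rules_table(rules: list[dict[str, str]]) -> str:
    """Render rule dicts as a Markdown table that follows the rules schema."""
    lines = [
        "| " + " | ".join(RULE_COLUMNS) + " |",
        "|" + "---|" * len(RULE_COLUMNS),
    ]
    for rule in rules:
        # Escape pipes so regex alternations do not split table cells.
        cells = [str(rule[col]).replace("|", "\\|") for col in RULE_COLUMNS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Illustrative rows; the Example column uses synthetic values, never raw originals.
rules = [
    {
        "Field Type": "Phone",
        "Detection Pattern": r"0\d{1,4}-\d{1,4}-\d{4}",
        "Redaction Method": "Consistent token",
        "Example": "090-0000-0000 → [PHONE_1]",
    },
    {
        "Field Type": "Postal code",
        "Detection Pattern": r"〒?\d{3}-\d{4}|\d{7}",
        "Redaction Method": "Removed",
        "Example": "〒000-0000 → (removed)",
    },
]
print(render_rules_table(rules))
```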
Workflow
- Detect direct identifiers first (email/phone/account IDs).
- Detect contextual identifiers: addresses and combinations of details (e.g., workplace, age, and date) that identify a person only when taken together.
- Apply consistent masking policy across the dataset.
- Keep analytical utility while minimizing re-identification risk.
- Report what was masked and why.
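The workflow does not prescribe how contextual identifiers are detected or how utility is balanced against risk. One common technique, swapped in here as an assumption, is to generalize quasi-identifiers (full address to prefecture, exact age to a decade band) and then run a k-anonymity-style check that flags combinations shared by fewer than `k` records. The function names and the prefecture regex are hypothetical.

```python
import re
from collections import Counter

# Hypothetical generalization rules: full address -> prefecture, exact age -> decade band.
PREFECTURE = re.compile(r"^(東京都|北海道|(?:京都|大阪)府|.{2,3}県)")


def generalize(row: dict) -> dict:
    m = PREFECTURE.match(row["address"])
    return {
        # Addresses that do not start with a recognizable prefecture are flagged, not guessed.
        "prefecture": m.group(1) if m else "Needs Manual Review",
        "age_band": f"{row['age'] // 10 * 10}s",
        "issue": row["issue"],  # analytical field kept as-is
    }


def rare_combinations(rows: list[dict], quasi_identifiers: list[str], k: int = 2) -> dict:
    """Return quasi-identifier combinations shared by fewer than k rows (k-anonymity check)."""
    counts = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


records = [
    {"address": "東京都渋谷区1-2-3", "age": 34, "issue": "billing"},
    {"address": "東京都新宿区4-5-6", "age": 37, "issue": "login"},
    {"address": "北海道札幌市中央区7-8", "age": 52, "issue": "billing"},
]
generalized = [generalize(r) for r in records]
risky = rare_combinations(generalized, ["prefecture", "age_band"], k=2)
```

Combinations returned in `risky` still single someone out, so they need coarser generalization or suppression, and belong in the Residual Risk Notes either way.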
Examples
Example 1
Input:
- Japanese customer-support chat logs that must be shared with an external analytics team.
Output style:
- Replace identifiers with neutral tokens
- Preserve issue semantics and timeline
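For the chat-log example above, one way to replace identifiers with neutral tokens while keeping the timeline is to leave timestamps and message wording untouched and tokenize only speakers, known names, emails, and phones. This is a sketch under stated assumptions: the `HH:MM Speaker: message` line format is hypothetical, and `known_names` is assumed to come from a roster or a separate NER step.

```python
import re

# Assumed log format: "HH:MM Speaker: message". Real logs will differ.
LINE = re.compile(r"^(?P<time>\d{2}:\d{2}) (?P<speaker>[^:]+): (?P<body>.*)$")
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"(?<![\d-])0\d{1,4}-\d{1,4}-\d{4}(?![\d-])")


def redact_chat(lines: list[str], known_names: set[str]) -> list[str]:
    """Tokenize speakers, known names, emails, and phones; keep timestamps and wording."""
    tokens: dict[tuple[str, str], str] = {}
    counts: dict[str, int] = {}

    def token_for(kind: str, value: str) -> str:
        # The same (kind, value) pair always gets the same token within one run.
        if (kind, value) not in tokens:
            counts[kind] = counts.get(kind, 0) + 1
            tokens[(kind, value)] = f"[{kind}_{counts[kind]}]"
        return tokens[(kind, value)]

    out = []
    for line in lines:
        m = LINE.match(line)
        if m is None:
            # Unparseable lines are withheld rather than passed through raw.
            out.append("[Needs Manual Review]")
            continue
        speaker = token_for("PERSON", m["speaker"])
        body = EMAIL.sub(lambda e: token_for("EMAIL", e.group()), m["body"])
        body = PHONE.sub(lambda p: token_for("PHONE", p.group()), body)
        # Longest names first so a full name is replaced before any shorter name inside it.
        for name in sorted(known_names, key=len, reverse=True):
            if name in body:
                body = body.replace(name, token_for("PERSON", name))
        out.append(f"{m['time']} {speaker}: {body}")
    return out


log = [
    "10:02 山田太郎: 注文の件です。電話は090-1234-5678です。",
    "10:03 佐藤: 山田太郎様、確認いたします。",
    "10:05 山田太郎: メールはtaro@example.jpにお願いします。",
]
redacted = redact_chat(log, {"山田太郎", "佐藤"})
```

Because speakers and in-message mentions share one token table, `山田太郎` becomes the same `[PERSON_1]` whether they are speaking or being addressed, which keeps the conversation readable for analysis.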
Example 2
Input:
- Employee roster (name, email, phone, address, employee ID).
Output style:
- Table output with masked fields and preserved non-sensitive columns
- Explicit rule list for audit traceability
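For the roster example, a per-column policy makes both the masked table and the audit rule list fall out of the same configuration. The column names, token formats, and policy sets below are hypothetical; columns without an explicit policy are withheld rather than passed through, in line with never exposing raw originals.

```python
import csv
import io
import re

# Hypothetical column policy for an employee roster.
TOKENIZE = {"name": "PERSON", "email": "EMAIL", "phone": "PHONE", "employee_id": "EMP"}
GENERALIZE = {"address"}
PRESERVE = {"department"}
PREFECTURE = re.compile(r"^(東京都|北海道|(?:京都|大阪)府|.{2,3}県)")


def mask_roster(csv_text: str) -> tuple[str, dict[str, str]]:
    """Return (masked CSV, per-column rule list) for audit traceability."""
    reader = csv.DictReader(io.StringIO(csv_text))
    tokens: dict[tuple[str, str], str] = {}
    counts: dict[str, int] = {}
    masked_rows = []
    for row in reader:
        masked = {}
        for col, value in row.items():
            if col in TOKENIZE:
                kind = TOKENIZE[col]
                if (kind, value) not in tokens:
                    counts[kind] = counts.get(kind, 0) + 1
                    tokens[(kind, value)] = f"{kind}_{counts[kind]:03d}"
                masked[col] = tokens[(kind, value)]
            elif col in GENERALIZE:
                m = PREFECTURE.match(value)
                masked[col] = m.group(1) if m else "Needs Manual Review"
            elif col in PRESERVE:
                masked[col] = value
            else:
                # Columns without an explicit policy are withheld, not passed through.
                masked[col] = "Needs Manual Review"
        masked_rows.append(masked)

    fieldnames = reader.fieldnames or []
    rules = {}
    for col in fieldnames:
        if col in TOKENIZE:
            rules[col] = f"tokenized as {TOKENIZE[col]}_nnn"
        elif col in GENERALIZE:
            rules[col] = "generalized to prefecture"
        elif col in PRESERVE:
            rules[col] = "preserved"
        else:
            rules[col] = "withheld (Needs Manual Review)"

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(masked_rows)
    return out.getvalue(), rules


roster = (
    "name,email,phone,address,employee_id,department\n"
    "山田太郎,taro@example.jp,090-1234-5678,東京都渋谷区1-2-3,E1001,営業\n"
    "佐藤花子,hanako@example.jp,080-9876-5432,大阪府大阪市北区4-5-6,E1002,開発\n"
)
masked_csv, rules = mask_roster(roster)
```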
Guidelines
- Prefer consistency: the same entity should map to the same token within one output.
- Never expose raw originals in final output.
- Mark uncertain detections as "Needs Manual Review".
- State that redaction reduces risk but does not guarantee zero re-identification risk.
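The consistency, no-raw-originals, and manual-review guidelines can be combined in one helper. The sketch below is hypothetical (`Pseudonymizer`, `review_notes`, and the 7-digit threshold are assumptions): tokens are assigned once per entity, the reverse table is only available when reversible pseudonyms were requested and is meant to be stored apart from the redacted output, and long digit runs left over after known patterns have been tokenized are flagged by position and length without echoing the digits.

```python
import re


class Pseudonymizer:
    """Consistent entity -> token mapping; the reverse table stays out of the redacted output."""

    def __init__(self, reversible: bool = False) -> None:
        self.reversible = reversible
        self._forward: dict[tuple[str, str], str] = {}
        self._counts: dict[str, int] = {}

    def token(self, kind: str, value: str) -> str:
        # Assign a new numbered token only the first time an entity is seen.
        key = (kind, value)
        if key not in self._forward:
            self._counts[kind] = self._counts.get(kind, 0) + 1
            self._forward[key] = f"[{kind}_{self._counts[kind]}]"
        return self._forward[key]

    def reverse_table(self) -> dict[str, str]:
        # Only for reversible pseudonyms; store this separately from the shared output.
        if not self.reversible:
            raise RuntimeError("reversible pseudonyms were not requested")
        return {tok: value for (_, value), tok in self._forward.items()}


# Long digit runs that none of the known patterns explained are flagged instead of guessed.
LONG_DIGITS = re.compile(r"\d{7,}")


def review_notes(redacted_text: str) -> list[str]:
    """Describe uncertain spans by offset and length only, never by their raw value."""
    return [
        f"Needs Manual Review: {m.end() - m.start()}-digit sequence at offset {m.start()}"
        for m in LONG_DIGITS.finditer(redacted_text)
    ]


p = Pseudonymizer(reversible=True)
first = p.token("PERSON", "山田太郎")
second = p.token("PERSON", "佐藤")
again = p.token("PERSON", "山田太郎")
notes = review_notes("会員番号12345678の件")
```

Offsets in the notes refer to the redacted text, since that is what the reviewer receives.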