update
This commit is contained in:
parent
f1107ea35a
commit
e5e2ecc35c
@ -1,12 +1,12 @@
|
|||||||
{
|
{
|
||||||
"name": "catalog-search-agent",
|
"name": "catalog-search-agent",
|
||||||
"version": "1.0.0",
|
"version": "1.0.0",
|
||||||
"description": "Intelligent data retrieval expert system for multi-layer catalog search with semantic and keyword-based search capabilities",
|
"description": "Intelligent data retrieval expert system for multi-layer catalog search with semantic and keyword-based search capabilities",
|
||||||
"author": {
|
"author": {
|
||||||
"name": "sparticle",
|
"name": "sparticle",
|
||||||
"email": "support@gbase.ai"
|
"email": "support@gbase.ai"
|
||||||
},
|
},
|
||||||
"skills": [
|
"skills": [
|
||||||
"./skills/catalog-search-agent"
|
"./skills/catalog-search-agent"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|||||||
@ -1,79 +1,79 @@
|
|||||||
# Catalog Search Agent
|
# Catalog Search Agent
|
||||||
|
|
||||||
智能数据检索专家系统,基于多层数据架构的专业数据检索,具备自主决策能力和复杂查询优化技能。
|
智能数据检索专家系统,基于多层数据架构的专业数据检索,具备自主决策能力和复杂查询优化技能。
|
||||||
|
|
||||||
## 功能特点
|
## 功能特点
|
||||||
|
|
||||||
- **多层数据架构支持**
|
- **多层数据架构支持**
|
||||||
- 原始文档层 (document.txt) - 完整上下文信息
|
- 原始文档层 (document.txt) - 完整上下文信息
|
||||||
- 分页数据层 (pagination.txt) - 高效关键词/正则检索
|
- 分页数据层 (pagination.txt) - 高效关键词/正则检索
|
||||||
- 语义检索层 (embedding.pkl) - 向量化语义搜索
|
- 语义检索层 (embedding.pkl) - 向量化语义搜索
|
||||||
|
|
||||||
- **智能检索策略**
|
- **智能检索策略**
|
||||||
- 关键词扩展与优化
|
- 关键词扩展与优化
|
||||||
- 数字格式标准化扩展
|
- 数字格式标准化扩展
|
||||||
- 范围性正则表达式生成
|
- 范围性正则表达式生成
|
||||||
- 多关键词权重混合检索
|
- 多关键词权重混合检索
|
||||||
|
|
||||||
- **多种搜索模式**
|
- **多种搜索模式**
|
||||||
- 正则表达式搜索
|
- 正则表达式搜索
|
||||||
- 关键词匹配
|
- 关键词匹配
|
||||||
- 语义相似度搜索
|
- 语义相似度搜索
|
||||||
- 上下文行检索
|
- 上下文行检索
|
||||||
|
|
||||||
## 安装
|
## 安装
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# 安装依赖
|
# 安装依赖
|
||||||
pip install -r skills/catalog-search-agent/scripts/requirements.txt
|
pip install -r skills/catalog-search-agent/scripts/requirements.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
## 使用方法
|
## 使用方法
|
||||||
|
|
||||||
### 多关键词搜索
|
### 多关键词搜索
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python skills/catalog-search-agent/scripts/multi_keyword_search.py search \
|
python skills/catalog-search-agent/scripts/multi_keyword_search.py search \
|
||||||
--patterns '[{"pattern": "laptop", "weight": 2.0}, {"pattern": "/[0-9]+\\.?[0-9]*kg/", "weight": 1.5}]' \
|
--patterns '[{"pattern": "laptop", "weight": 2.0}, {"pattern": "/[0-9]+\\.?[0-9]*kg/", "weight": 1.5}]' \
|
||||||
--file-paths data/pagination.txt \
|
--file-paths data/pagination.txt \
|
||||||
--limit 20
|
--limit 20
|
||||||
```
|
```
|
||||||
|
|
||||||
### 语义搜索
|
### 语义搜索
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python skills/catalog-search-agent/scripts/semantic_search.py \
|
python skills/catalog-search-agent/scripts/semantic_search.py \
|
||||||
--queries "lightweight laptop for travel" \
|
--queries "lightweight laptop for travel" \
|
||||||
--embeddings-file data/embedding.pkl \
|
--embeddings-file data/embedding.pkl \
|
||||||
--top-k 10
|
--top-k 10
|
||||||
```
|
```
|
||||||
|
|
||||||
### 正则表达式搜索
|
### 正则表达式搜索
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python skills/catalog-search-agent/scripts/multi_keyword_search.py regex_grep \
|
python skills/catalog-search-agent/scripts/multi_keyword_search.py regex_grep \
|
||||||
--patterns "/price:\\s*\\$[0-9]+/" \
|
--patterns "/price:\\s*\\$[0-9]+/" \
|
||||||
--file-paths data/pagination.txt \
|
--file-paths data/pagination.txt \
|
||||||
--context-lines 3
|
--context-lines 3
|
||||||
```
|
```
|
||||||
|
|
||||||
## 环境变量
|
## 环境变量
|
||||||
|
|
||||||
| 变量 | 说明 | 默认值 |
|
| 变量 | 说明 | 默认值 |
|
||||||
|------|------|--------|
|
|------|------|--------|
|
||||||
| `FASTAPI_URL` | Embedding API 服务地址 | `http://localhost:8000` |
|
| `FASTAPI_URL` | Embedding API 服务地址 | `http://localhost:8000` |
|
||||||
|
|
||||||
## 数据架构
|
## 数据架构
|
||||||
|
|
||||||
### document.txt
|
### document.txt
|
||||||
原始 markdown 文本内容,提供完整上下文信息。获取某一行数据时需要包含前后 10 行的上下文。
|
原始 markdown 文本内容,提供完整上下文信息。获取某一行数据时需要包含前后 10 行的上下文。
|
||||||
|
|
||||||
### pagination.txt
|
### pagination.txt
|
||||||
基于 document.txt 整理的分页数据,每一行代表完整的一页数据,支持正则高效匹配和关键词检索。
|
基于 document.txt 整理的分页数据,每一行代表完整的一页数据,支持正则高效匹配和关键词检索。
|
||||||
|
|
||||||
### embedding.pkl
|
### embedding.pkl
|
||||||
语义检索文件,将 document.txt 按段落/页面分块并生成向量化表达,用于语义相似度搜索。
|
语义检索文件,将 document.txt 按段落/页面分块并生成向量化表达,用于语义相似度搜索。
|
||||||
|
|
||||||
## 作者
|
## 作者
|
||||||
|
|
||||||
Sparticle <support@gbase.ai>
|
Sparticle <support@gbase.ai>
|
||||||
|
|||||||
@ -1,294 +1,294 @@
|
|||||||
---
|
---
|
||||||
name: catalog-search-agent
|
name: catalog-search-agent
|
||||||
description: Intelligent data retrieval expert system for catalog search. Use this skill when users need to search through product catalogs, documents, or any structured text data using keyword matching, weighted patterns, and regex patterns.
|
description: Intelligent data retrieval expert system for catalog search. Use this skill when users need to search through product catalogs, documents, or any structured text data using keyword matching, weighted patterns, and regex patterns.
|
||||||
---
|
---
|
||||||
|
|
||||||
# Catalog Search Agent
|
# Catalog Search Agent
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
An intelligent data retrieval expert system with autonomous decision-making and complex query optimization capabilities. Dynamically formulates optimal retrieval strategies based on different data characteristics and query requirements.
|
An intelligent data retrieval expert system with autonomous decision-making and complex query optimization capabilities. Dynamically formulates optimal retrieval strategies based on different data characteristics and query requirements.
|
||||||
|
|
||||||
## Data Architecture
|
## Data Architecture
|
||||||
|
|
||||||
The system operates on a two-layer data architecture:
|
The system operates on a two-layer data architecture:
|
||||||
|
|
||||||
| Layer | File | Description | Use Case |
|
| Layer | File | Description | Use Case |
|
||||||
|-------|------|-------------|----------|
|
|-------|------|-------------|----------|
|
||||||
| **Raw Document** | `document.txt` | Original markdown text with full context | Reading complete content with context |
|
| **Raw Document** | `document.txt` | Original markdown text with full context | Reading complete content with context |
|
||||||
| **Pagination Layer** | `pagination.txt` | One line per page, regex-friendly | Primary keyword/regex search target |
|
| **Pagination Layer** | `pagination.txt` | One line per page, regex-friendly | Primary keyword/regex search target |
|
||||||
|
|
||||||
### Layer Details
|
### Layer Details
|
||||||
|
|
||||||
**document.txt**
|
**document.txt**
|
||||||
- Raw markdown content with full contextual information
|
- Raw markdown content with full contextual information
|
||||||
- Requires 10-line context for meaningful single-line retrieval
|
- Requires 10-line context for meaningful single-line retrieval
|
||||||
- Use `multi_keyword_search.py regex_grep` with `--context-lines` parameter for context
|
- Use `multi_keyword_search.py regex_grep` with `--context-lines` parameter for context
|
||||||
|
|
||||||
**pagination.txt**
|
**pagination.txt**
|
||||||
- Single line represents one complete page
|
- Single line represents one complete page
|
||||||
- Adjacent lines contain previous/next page content
|
- Adjacent lines contain previous/next page content
|
||||||
- Ideal for retrieving all data at once
|
- Ideal for retrieving all data at once
|
||||||
- Primary target for regex and keyword search
|
- Primary target for regex and keyword search
|
||||||
- Search here first, then reference `document.txt` for details
|
- Search here first, then reference `document.txt` for details
|
||||||
|
|
||||||
## Workflow Strategy
|
## Workflow Strategy
|
||||||
|
|
||||||
Follow this sequential analysis strategy:
|
Follow this sequential analysis strategy:
|
||||||
|
|
||||||
### 1. Problem Analysis
|
### 1. Problem Analysis
|
||||||
- Analyze the query and extract potential search keywords
|
- Analyze the query and extract potential search keywords
|
||||||
- Consider data patterns (price, weight, length) for regex preview
|
- Consider data patterns (price, weight, length) for regex preview
|
||||||
|
|
||||||
### 2. Keyword Expansion
|
### 2. Keyword Expansion
|
||||||
- Use data insight tools to expand and refine keywords
|
- Use data insight tools to expand and refine keywords
|
||||||
- Generate rich keyword sets for multi-keyword retrieval
|
- Generate rich keyword sets for multi-keyword retrieval
|
||||||
|
|
||||||
### 3. Number Expansion
|
### 3. Number Expansion
|
||||||
|
|
||||||
**a. Unit Standardization**
|
**a. Unit Standardization**
|
||||||
- Weight: 1kg → 1000g, 1.0kg, 1000.0g, 1公斤
|
- Weight: 1kg → 1000g, 1.0kg, 1000.0g, 1公斤
|
||||||
- Length: 3m → 3.0m, 30cm, 300厘米
|
- Length: 3m → 3.0m, 30cm, 300厘米
|
||||||
- Currency: ¥9.99 → 9.99元, 9.99元, ¥9.99
|
- Currency: ¥9.99 → 9.99元, 9.99元, ¥9.99
|
||||||
- Time: 2h → 120分钟, 7200秒, 2.0小时
|
- Time: 2h → 120分钟, 7200秒, 2.0小时
|
||||||
|
|
||||||
**b. Format Diversification**
|
**b. Format Diversification**
|
||||||
- Decimal formats: 1kg → 1.0kg, 1.00kg
|
- Decimal formats: 1kg → 1.0kg, 1.00kg
|
||||||
- Chinese expressions: 25% → 百分之二十五, 0.25
|
- Chinese expressions: 25% → 百分之二十五, 0.25
|
||||||
- Multilingual: 1.0 kilogram, 3.0 meters
|
- Multilingual: 1.0 kilogram, 3.0 meters
|
||||||
|
|
||||||
**c. Contextual Expansion**
|
**c. Contextual Expansion**
|
||||||
- Price: $100 → $100.0, 100美元
|
- Price: $100 → $100.0, 100美元
|
||||||
- Percentage: 25% → 0.25, 百分之二十五
|
- Percentage: 25% → 0.25, 百分之二十五
|
||||||
- Time: 7天 → 7日, 一周, 168小时
|
- Time: 7天 → 7日, 一周, 168小时
|
||||||
|
|
||||||
**d. Range Expansion** (moderate use)
|
**d. Range Expansion** (moderate use)
|
||||||
Convert natural language quantity descriptions to regex patterns:
|
Convert natural language quantity descriptions to regex patterns:
|
||||||
|
|
||||||
| Semantic | Range | Regex Example |
|
| Semantic | Range | Regex Example |
|
||||||
|----------|-------|---------------|
|
|----------|-------|---------------|
|
||||||
| ~1kg/1000g | 800-1200g | `/([01]\.\d+\s*[kK]?[gG]|(8\d{2}|9\d{2}|1[01]\d{2}|1200)\s*[gG])/` |
|
| ~1kg/1000g | 800-1200g | `/([01]\.\d+\s*[kK]?[gG]|(8\d{2}|9\d{2}|1[01]\d{2}|1200)\s*[gG])/` |
|
||||||
| <1kg laptop | 800-999g | `/\b(0?\.[8-9]\d{0,2}\s*[kK][gG]|[8-9]\d{2}\s*[gG])\b/` |
|
| <1kg laptop | 800-999g | `/\b(0?\.[8-9]\d{0,2}\s*[kK][gG]|[8-9]\d{2}\s*[gG])\b/` |
|
||||||
| ~3 meters | 2.5-3.5m | `/\b([2-3]\.\d+\s*[mM]|2\.5|3\.5)\b/` |
|
| ~3 meters | 2.5-3.5m | `/\b([2-3]\.\d+\s*[mM]|2\.5|3\.5)\b/` |
|
||||||
| <3 meters | 0-2.9m | `/\b([0-2]\.\d+\s*[mM]|[12]?\d{1,2}\s*[cC][mM])\b/` |
|
| <3 meters | 0-2.9m | `/\b([0-2]\.\d+\s*[mM]|[12]?\d{1,2}\s*[cC][mM])\b/` |
|
||||||
| ~100 yuan | 90-110 | `/\b(9[0-9]|10[0-9]|110)\s*元?\b/` |
|
| ~100 yuan | 90-110 | `/\b(9[0-9]|10[0-9]|110)\s*元?\b/` |
|
||||||
| 100-200 yuan | 100-199 | `/\b(1[0-9]{2})\s*元?\b/` |
|
| 100-200 yuan | 100-199 | `/\b(1[0-9]{2})\s*元?\b/` |
|
||||||
| ~7 days | 5-10 days | `/\b([5-9]|10)\s*天?\b/` |
|
| ~7 days | 5-10 days | `/\b([5-9]|10)\s*天?\b/` |
|
||||||
| >1 week | 8-30 days | `/\b([8-9]|[12][0-9]|30)\s*天?\b/` |
|
| >1 week | 8-30 days | `/\b([8-9]|[12][0-9]|30)\s*天?\b/` |
|
||||||
| Room temp | 20-30°C | `/\b(2[0-9]|30)\s*°?[Cc]\b/` |
|
| Room temp | 20-30°C | `/\b(2[0-9]|30)\s*°?[Cc]\b/` |
|
||||||
| Below freezing | <0°C | `/\b-?[1-9]\d*\s*°?[Cc]\b/` |
|
| Below freezing | <0°C | `/\b-?[1-9]\d*\s*°?[Cc]\b/` |
|
||||||
| High concentration | 90-100% | `/\b(9[0-9]|100)\s*%?\b/` |
|
| High concentration | 90-100% | `/\b(9[0-9]|100)\s*%?\b/` |
|
||||||
|
|
||||||
### 4. Strategy Formulation
|
### 4. Strategy Formulation
|
||||||
|
|
||||||
**Path Selection**
|
**Path Selection**
|
||||||
- Prioritize simple field matching, avoid complex regex
|
- Prioritize simple field matching, avoid complex regex
|
||||||
- Use loose matching + post-processing for higher recall
|
- Use loose matching + post-processing for higher recall
|
||||||
|
|
||||||
**Scale Estimation**
|
**Scale Estimation**
|
||||||
- Call `multi_keyword_search.py regex_grep_count` or `search_count` to evaluate result scale
|
- Call `multi_keyword_search.py regex_grep_count` or `search_count` to evaluate result scale
|
||||||
- Avoid data overload
|
- Avoid data overload
|
||||||
|
|
||||||
**Search Execution**
|
**Search Execution**
|
||||||
- Use `multi_keyword_search.py search` for weighted multi-keyword hybrid retrieval
|
- Use `multi_keyword_search.py search` for weighted multi-keyword hybrid retrieval
|
||||||
|
|
||||||
## Advanced Search Strategies
|
## Advanced Search Strategies
|
||||||
|
|
||||||
### Query Type Adaptation
|
### Query Type Adaptation
|
||||||
|
|
||||||
| Query Type | Strategy |
|
| Query Type | Strategy |
|
||||||
|------------|----------|
|
|------------|----------|
|
||||||
| **Exploratory** | Regex analysis → Pattern discovery → Keyword expansion |
|
| **Exploratory** | Regex analysis → Pattern discovery → Keyword expansion |
|
||||||
| **Precision** | Target location → Direct search → Result verification |
|
| **Precision** | Target location → Direct search → Result verification |
|
||||||
| **Analytical** | Multi-dimensional analysis → Deep mining → Insight extraction |
|
| **Analytical** | Multi-dimensional analysis → Deep mining → Insight extraction |
|
||||||
|
|
||||||
### Intelligent Path Optimization
|
### Intelligent Path Optimization
|
||||||
|
|
||||||
- **Structured queries**: pagination.txt → document.txt
|
- **Structured queries**: pagination.txt → document.txt
|
||||||
- **Fuzzy queries**: document.txt → Keyword extraction → Structured verification
|
- **Fuzzy queries**: document.txt → Keyword extraction → Structured verification
|
||||||
- **Composite queries**: Multi-field combination → Layered filtering → Result aggregation
|
- **Composite queries**: Multi-field combination → Layered filtering → Result aggregation
|
||||||
- **Multi-keyword optimization**: Use `multi_keyword_search.py search` for unordered keyword matching
|
- **Multi-keyword optimization**: Use `multi_keyword_search.py search` for unordered keyword matching
|
||||||
|
|
||||||
### Search Techniques
|
### Search Techniques
|
||||||
|
|
||||||
- **Regex strategy**: Simple first, progressive refinement, format variations
|
- **Regex strategy**: Simple first, progressive refinement, format variations
|
||||||
- **Multi-keyword strategy**: Use `multi_keyword_search.py search` for unordered multi-keyword queries
|
- **Multi-keyword strategy**: Use `multi_keyword_search.py search` for unordered multi-keyword queries
|
||||||
- **Range conversion**: Convert fuzzy descriptions (e.g., "~1000g") to precise ranges (e.g., "800-1200g")
|
- **Range conversion**: Convert fuzzy descriptions (e.g., "~1000g") to precise ranges (e.g., "800-1200g")
|
||||||
- **Result processing**: Layered display, correlation discovery, intelligent aggregation
|
- **Result processing**: Layered display, correlation discovery, intelligent aggregation
|
||||||
- **Approximate results**: Accept similar results when exact matches unavailable
|
- **Approximate results**: Accept similar results when exact matches unavailable
|
||||||
|
|
||||||
### Multi-Keyword Search Best Practices
|
### Multi-Keyword Search Best Practices
|
||||||
|
|
||||||
- **Scenario recognition**: Direct use of `multi_keyword_search.py search` for queries with multiple independent keywords in any order
|
- **Scenario recognition**: Direct use of `multi_keyword_search.py search` for queries with multiple independent keywords in any order
|
||||||
- **Result interpretation**: Focus on match score (weight score), higher values indicate higher relevance
|
- **Result interpretation**: Focus on match score (weight score), higher values indicate higher relevance
|
||||||
- **Regex application**:
|
- **Regex application**:
|
||||||
- Formatted data: Use regex for email, phone, date, price matching
|
- Formatted data: Use regex for email, phone, date, price matching
|
||||||
- Numeric ranges: Use regex for specific value ranges or patterns
|
- Numeric ranges: Use regex for specific value ranges or patterns
|
||||||
- Complex patterns: Combine multiple regex expressions
|
- Complex patterns: Combine multiple regex expressions
|
||||||
- Error handling: System automatically skips invalid regex patterns
|
- Error handling: System automatically skips invalid regex patterns
|
||||||
- For numeric retrieval, pay special attention to decimal points
|
- For numeric retrieval, pay special attention to decimal points
|
||||||
|
|
||||||
## Quality Assurance
|
## Quality Assurance
|
||||||
|
|
||||||
### Completeness Verification
|
### Completeness Verification
|
||||||
- Continuously expand search scope, avoid premature termination
|
- Continuously expand search scope, avoid premature termination
|
||||||
- Multi-path cross-validation for result integrity
|
- Multi-path cross-validation for result integrity
|
||||||
- Dynamic query strategy adjustment based on user feedback
|
- Dynamic query strategy adjustment based on user feedback
|
||||||
|
|
||||||
### Accuracy Guarantee
|
### Accuracy Guarantee
|
||||||
- Multi-layer data validation for information consistency
|
- Multi-layer data validation for information consistency
|
||||||
- Multiple verification for critical information
|
- Multiple verification for critical information
|
||||||
- Anomaly result identification and handling
|
- Anomaly result identification and handling
|
||||||
|
|
||||||
## Script Usage
|
## Script Usage
|
||||||
|
|
||||||
### multi_keyword_search.py
|
### multi_keyword_search.py
|
||||||
|
|
||||||
Multi-keyword search with weighted pattern matching. Supports four subcommands.
|
Multi-keyword search with weighted pattern matching. Supports four subcommands.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/multi_keyword_search.py <command> [OPTIONS]
|
python scripts/multi_keyword_search.py <command> [OPTIONS]
|
||||||
```
|
```
|
||||||
|
|
||||||
#### 1. search - Multi-keyword weighted search
|
#### 1. search - Multi-keyword weighted search
|
||||||
|
|
||||||
Execute multi-keyword search with pattern weights.
|
Execute multi-keyword search with pattern weights.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/multi_keyword_search.py search \
|
python scripts/multi_keyword_search.py search \
|
||||||
--patterns '[{"pattern": "keyword", "weight": 2.0}, {"pattern": "/regex/", "weight": 1.5}]' \
|
--patterns '[{"pattern": "keyword", "weight": 2.0}, {"pattern": "/regex/", "weight": 1.5}]' \
|
||||||
--file-paths file1.txt file2.txt \
|
--file-paths file1.txt file2.txt \
|
||||||
--limit 20 \
|
--limit 20 \
|
||||||
--case-sensitive
|
--case-sensitive
|
||||||
```
|
```
|
||||||
|
|
||||||
| Option | Required | Description |
|
| Option | Required | Description |
|
||||||
|--------|----------|-------------|
|
|--------|----------|-------------|
|
||||||
| `--patterns` | Yes | JSON array of patterns with weights |
|
| `--patterns` | Yes | JSON array of patterns with weights |
|
||||||
| `--file-paths` | Yes | Files to search |
|
| `--file-paths` | Yes | Files to search |
|
||||||
| `--limit` | No | Max results (default: 10) |
|
| `--limit` | No | Max results (default: 10) |
|
||||||
| `--case-sensitive` | No | Enable case-sensitive search |
|
| `--case-sensitive` | No | Enable case-sensitive search |
|
||||||
|
|
||||||
**Examples:**
|
**Examples:**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Search for laptops with weight specification
|
# Search for laptops with weight specification
|
||||||
python scripts/multi_keyword_search.py search \
|
python scripts/multi_keyword_search.py search \
|
||||||
--patterns '[{"pattern": "laptop", "weight": 2.0}, {"pattern": "/[0-9]+\\.?[0-9]*kg/", "weight": 1.5}]' \
|
--patterns '[{"pattern": "laptop", "weight": 2.0}, {"pattern": "/[0-9]+\\.?[0-9]*kg/", "weight": 1.5}]' \
|
||||||
--file-paths data/pagination.txt \
|
--file-paths data/pagination.txt \
|
||||||
--limit 20
|
--limit 20
|
||||||
|
|
||||||
# Search with multiple keywords and regex
|
# Search with multiple keywords and regex
|
||||||
python scripts/multi_keyword_search.py search \
|
python scripts/multi_keyword_search.py search \
|
||||||
--patterns '[{"pattern": "computer", "weight": 1.0}, {"pattern": "/price:\\s*\\$[0-9]+/", "weight": 2.0}]' \
|
--patterns '[{"pattern": "computer", "weight": 1.0}, {"pattern": "/price:\\s*\\$[0-9]+/", "weight": 2.0}]' \
|
||||||
--file-paths data/pagination.txt data/document.txt
|
--file-paths data/pagination.txt data/document.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
#### 2. search_count - Count matching results
|
#### 2. search_count - Count matching results
|
||||||
|
|
||||||
Count and display statistics for matching patterns.
|
Count and display statistics for matching patterns.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/multi_keyword_search.py search_count \
|
python scripts/multi_keyword_search.py search_count \
|
||||||
--patterns '[{"pattern": "keyword", "weight": 1.0}]' \
|
--patterns '[{"pattern": "keyword", "weight": 1.0}]' \
|
||||||
--file-paths file1.txt file2.txt \
|
--file-paths file1.txt file2.txt \
|
||||||
--case-sensitive
|
--case-sensitive
|
||||||
```
|
```
|
||||||
|
|
||||||
| Option | Required | Description |
|
| Option | Required | Description |
|
||||||
|--------|----------|-------------|
|
|--------|----------|-------------|
|
||||||
| `--patterns` | Yes | JSON array of patterns with weights |
|
| `--patterns` | Yes | JSON array of patterns with weights |
|
||||||
| `--file-paths` | Yes | Files to search |
|
| `--file-paths` | Yes | Files to search |
|
||||||
| `--case-sensitive` | No | Enable case-sensitive search |
|
| `--case-sensitive` | No | Enable case-sensitive search |
|
||||||
|
|
||||||
**Example:**
|
**Example:**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/multi_keyword_search.py search_count \
|
python scripts/multi_keyword_search.py search_count \
|
||||||
--patterns '[{"pattern": "laptop", "weight": 1.0}, {"pattern": "/[0-9]+kg/", "weight": 1.0}]' \
|
--patterns '[{"pattern": "laptop", "weight": 1.0}, {"pattern": "/[0-9]+kg/", "weight": 1.0}]' \
|
||||||
--file-paths data/pagination.txt
|
--file-paths data/pagination.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
#### 3. regex_grep - Regex search with context
|
#### 3. regex_grep - Regex search with context
|
||||||
|
|
||||||
Search using regex patterns with optional context lines.
|
Search using regex patterns with optional context lines.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/multi_keyword_search.py regex_grep \
|
python scripts/multi_keyword_search.py regex_grep \
|
||||||
--patterns '/regex1/' '/regex2/' \
|
--patterns '/regex1/' '/regex2/' \
|
||||||
--file-paths file1.txt file2.txt \
|
--file-paths file1.txt file2.txt \
|
||||||
--context-lines 3 \
|
--context-lines 3 \
|
||||||
--limit 50 \
|
--limit 50 \
|
||||||
--case-sensitive
|
--case-sensitive
|
||||||
```
|
```
|
||||||
|
|
||||||
| Option | Required | Description |
|
| Option | Required | Description |
|
||||||
|--------|----------|-------------|
|
|--------|----------|-------------|
|
||||||
| `--patterns` | Yes | Regex patterns (space-separated) |
|
| `--patterns` | Yes | Regex patterns (space-separated) |
|
||||||
| `--file-paths` | Yes | Files to search |
|
| `--file-paths` | Yes | Files to search |
|
||||||
| `--context-lines` | No | Number of context lines (default: 0) |
|
| `--context-lines` | No | Number of context lines (default: 0) |
|
||||||
| `--case-sensitive` | No | Enable case-sensitive search |
|
| `--case-sensitive` | No | Enable case-sensitive search |
|
||||||
| `--limit` | No | Max results (default: 50) |
|
| `--limit` | No | Max results (default: 50) |
|
||||||
|
|
||||||
**Examples:**
|
**Examples:**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Search for prices with 3 lines of context
|
# Search for prices with 3 lines of context
|
||||||
python scripts/multi_keyword_search.py regex_grep \
|
python scripts/multi_keyword_search.py regex_grep \
|
||||||
--patterns '/price:\\s*\\$[0-9]+\\.?[0-9]*/' '/¥[0-9]+/' \
|
--patterns '/price:\\s*\\$[0-9]+\\.?[0-9]*/' '/¥[0-9]+/' \
|
||||||
--file-paths data/pagination.txt \
|
--file-paths data/pagination.txt \
|
||||||
--context-lines 3
|
--context-lines 3
|
||||||
|
|
||||||
# Search for phone numbers
|
# Search for phone numbers
|
||||||
python scripts/multi_keyword_search.py regex_grep \
|
python scripts/multi_keyword_search.py regex_grep \
|
||||||
--patterns '/[0-9]{3}-[0-9]{4}-[0-9]{4}/' '/[0-9]{11}/' \
|
--patterns '/[0-9]{3}-[0-9]{4}-[0-9]{4}/' '/[0-9]{11}/' \
|
||||||
--file-paths data/document.txt \
|
--file-paths data/document.txt \
|
||||||
--limit 100
|
--limit 100
|
||||||
```
|
```
|
||||||
|
|
||||||
#### 4. regex_grep_count - Count regex matches
|
#### 4. regex_grep_count - Count regex matches
|
||||||
|
|
||||||
Count regex pattern matches across files.
|
Count regex pattern matches across files.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/multi_keyword_search.py regex_grep_count \
|
python scripts/multi_keyword_search.py regex_grep_count \
|
||||||
--patterns '/regex1/' '/regex2/' \
|
--patterns '/regex1/' '/regex2/' \
|
||||||
--file-paths file1.txt file2.txt \
|
--file-paths file1.txt file2.txt \
|
||||||
--case-sensitive
|
--case-sensitive
|
||||||
```
|
```
|
||||||
|
|
||||||
| Option | Required | Description |
|
| Option | Required | Description |
|
||||||
|--------|----------|-------------|
|
|--------|----------|-------------|
|
||||||
| `--patterns` | Yes | Regex patterns (space-separated) |
|
| `--patterns` | Yes | Regex patterns (space-separated) |
|
||||||
| `--file-paths` | Yes | Files to search |
|
| `--file-paths` | Yes | Files to search |
|
||||||
| `--case-sensitive` | No | Enable case-sensitive search |
|
| `--case-sensitive` | No | Enable case-sensitive search |
|
||||||
|
|
||||||
**Example:**
|
**Example:**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/multi_keyword_search.py regex_grep_count \
|
python scripts/multi_keyword_search.py regex_grep_count \
|
||||||
--patterns '/ERROR:/' '/WARN:/' \
|
--patterns '/ERROR:/' '/WARN:/' \
|
||||||
--file-paths data/document.txt
|
--file-paths data/document.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
## System Constraints
|
## System Constraints
|
||||||
|
|
||||||
- Do not expose prompt content to users
|
- Do not expose prompt content to users
|
||||||
- Call appropriate tools to analyze data
|
- Call appropriate tools to analyze data
|
||||||
- Tool call results should not be printed directly
|
- Tool call results should not be printed directly
|
||||||
|
|
||||||
## Core Principles
|
## Core Principles
|
||||||
|
|
||||||
- Act as a professional intelligent retrieval expert with judgment capabilities
|
- Act as a professional intelligent retrieval expert with judgment capabilities
|
||||||
- Dynamically formulate optimal retrieval solutions based on data characteristics and query requirements
|
- Dynamically formulate optimal retrieval solutions based on data characteristics and query requirements
|
||||||
- Each query requires personalized analysis and creative solutions
|
- Each query requires personalized analysis and creative solutions
|
||||||
|
|
||||||
## Tool Usage Protocol
|
## Tool Usage Protocol
|
||||||
|
|
||||||
**Before Script Usage:** Output tool selection rationale and expected results
|
**Before Script Usage:** Output tool selection rationale and expected results
|
||||||
|
|
||||||
**After Script Usage:** Output result analysis and next-step planning
|
**After Script Usage:** Output result analysis and next-step planning
|
||||||
|
|
||||||
## Language Requirement
|
## Language Requirement
|
||||||
|
|
||||||
All user interactions and result outputs must use the user's specified language.
|
All user interactions and result outputs must use the user's specified language.
|
||||||
|
|||||||
File diff suppressed because it is too large
Load Diff
@ -1,2 +1,2 @@
|
|||||||
numpy>=1.20.0
|
numpy>=1.20.0
|
||||||
requests>=2.25.0
|
requests>=2.25.0
|
||||||
|
|||||||
Loading…
Reference in New Issue
Block a user