fix(skills): improve skill extraction and handling logic
- Refactor _extract_skills_to_robot to accept bot_id instead of robot_dir
- Add multi-directory skill search with priority order
- Switch from zip extraction to direct directory copying
- Add rag-retrieve skill directory
parent 92c82c24a4
commit f74f09c191
147
skills/rag-retrieve/SKILL.md
Normal file
@@ -0,0 +1,147 @@
+---
+name: rag-retrieve
+description: RAG retrieval skill for querying and retrieving relevant documents from knowledge base. Use this skill when users need to search documentation, retrieve knowledge base articles, or get context from a vector database. Supports semantic search with configurable top-k results.
+---
+
+# RAG Retrieve
+
+## Skill Structure
+
+This is a **self-contained skill package** that can be distributed independently. The skill includes its own scripts and configuration:
+
+```
+rag-retrieve/
+├── SKILL.md                 # Core instruction file (this file)
+├── skill.yaml               # Skill metadata
+├── scripts/                 # Executable scripts
+│   └── rag_retrieve.py      # Main RAG retrieval script
+```
+
+## Overview
+
+Query and retrieve relevant documents from a RAG (Retrieval-Augmented Generation) knowledge base using vector search. This skill provides semantic search capabilities with support for multiple bot instances and configurable result limits.
+
+## Required Parameters
+
+Before executing any retrieval, you MUST confirm the following required parameters with the user if they are not explicitly provided:
+
+| Parameter | Description | Type |
+|-----------|-------------|------|
+| **query** | Search query content | string |
+
+### Optional Parameters
+
+| Parameter | Description | Type | Default |
+|-----------|-------------|------|---------|
+| **top_k** | Maximum number of results | integer | 100 |
+
+### Confirmation Template
+
+When the required parameter is missing, ask the user:
+
+```
+I need some information to perform the RAG retrieval:
+
+1. Query: What would you like to search for?
+```
+
+## Quick Start
+
+Use the `scripts/rag_retrieve.py` script to execute RAG queries:
+
+```bash
+scripts/rag_retrieve.py --query "your search query"
+```
+
+## Usage Examples
+
+### Basic Query
+
+```bash
+scripts/rag_retrieve.py --query "How to configure authentication?"
+```
+
+### Search with Specific Top-K
+
+```bash
+scripts/rag_retrieve.py --query "API error handling" --top-k 50
+```
+
+### Common Use Cases
+
+**Scenario 1: Documentation Search**
+```bash
+scripts/rag_retrieve.py --query "deployment guide"
+```
+
+**Scenario 2: Troubleshooting**
+```bash
+scripts/rag_retrieve.py --query "connection timeout error"
+```
+
+**Scenario 3: Feature Information**
+```bash
+scripts/rag_retrieve.py --query "enterprise pricing plans"
+```
+
+## Script Usage
+
+### rag_retrieve.py
+
+Main script for executing RAG retrieval queries.
+
+```bash
+scripts/rag_retrieve.py [OPTIONS]
+```
+
+**Options:**
+
+| Option | Required | Description | Default |
+|--------|----------|-------------|---------|
+| `--query`, `-q` | Yes | Search query content | - |
+| `--top-k`, `-k` | No | Maximum number of results | 100 |
+
+**Examples:**
+
+```bash
+# Basic query
+scripts/rag_retrieve.py --query "authentication setup"
+
+# Custom top-k
+scripts/rag_retrieve.py --query "API reference" --top-k 20
+```
+
+## Common Workflows
+
+### Research Mode: Comprehensive Search
+
+```bash
+scripts/rag_retrieve.py --query "machine learning algorithms" --top-k 100
+```
+
+### Quick Answer Mode: Focused Search
+
+```bash
+scripts/rag_retrieve.py --query "password reset" --top-k 10
+```
+
+### Comparison Mode: Multiple Queries
+
+```bash
+# Search for related topics
+scripts/rag_retrieve.py --query "REST API" --top-k 30
+scripts/rag_retrieve.py --query "GraphQL API" --top-k 30
+```
+
+## Resources
+
+### scripts/rag_retrieve.py
+
+Executable Python script for RAG retrieval. Handles:
+- HTTP requests to RAG API
+- Authentication token generation
+- Configuration file loading
+- Error handling and reporting
+- Markdown response parsing
+
+The script can be executed directly without loading into context.
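Reviewer note: SKILL.md documents a CLI with `--query` (required) and `--top-k` (optional, default 100). A minimal sketch of building that command line from Python is shown here. `build_rag_command` and the `skill_root` default are hypothetical names used for illustration and are not part of this commit.

```python
import sys
from pathlib import Path
from typing import List, Optional


def build_rag_command(
    query: str,
    top_k: Optional[int] = None,
    skill_root: Path = Path("skills/rag-retrieve"),  # assumed install location
) -> List[str]:
    """Build the argv for scripts/rag_retrieve.py using only the documented flags."""
    if not query:
        raise ValueError("query is required")
    script = skill_root / "scripts" / "rag_retrieve.py"
    argv = [sys.executable, str(script), "--query", query]
    # Leave --top-k out when unset so the script's own default applies.
    if top_k is not None:
        argv += ["--top-k", str(top_k)]
    return argv
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)`; passing a list rather than a shell string avoids quoting problems when the query contains spaces or quotes.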
144
skills/rag-retrieve/scripts/rag_retrieve.py
Normal file
@@ -0,0 +1,144 @@
+#!/usr/bin/env python3
+"""
+RAG retrieval script
+Calls the local RAG API to retrieve documents
+"""
+
+import argparse
+import hashlib
+import json
+import os
+import sys
+
+try:
+    import requests
+except ImportError:
+    print("Error: requests module is required. Please install it with: pip install requests")
+    sys.exit(1)
+
+
+# Default configuration
+DEFAULT_BACKEND_HOST = os.getenv("BACKEND_HOST", "https://api-dev.gptbase.ai")
+DEFAULT_MASTERKEY = os.getenv("MASTERKEY", "master")
+
+
+def load_config() -> dict:
+    """
+    Load configuration from robot_config.json in the project root
+
+    Returns:
+        dict: Configuration dict
+    """
+    print(os.path.dirname(__file__))
+    config_path = os.path.join(os.path.dirname(__file__), '..', '..', '..', 'robot_config.json')
+
+    if os.path.exists(config_path):
+        try:
+            with open(config_path, 'r', encoding='utf-8') as f:
+                return json.load(f)
+        except (json.JSONDecodeError, IOError) as e:
+            print(f"Warning: Failed to load config file: {e}", file=sys.stderr)
+
+    return {}
+
+
+def rag_retrieve(query: str, top_k: int = 100, config: dict = None) -> str:
+    """
+    Call the RAG retrieval API
+
+    Args:
+        bot_id: Bot identifier (read from config if None)
+        query: Search query content
+        top_k: Number of results to return
+        config: Configuration dict (optional)
+
+    Returns:
+        str: Retrieval results in markdown format
+    """
+    if config is None:
+        config = {}
+
+    # Read settings from config.env; fall back to the defaults if absent
+    host = DEFAULT_BACKEND_HOST
+    masterkey = DEFAULT_MASTERKEY
+
+    bot_id = config.get('bot_id')
+
+    if not bot_id:
+        return "Error: bot_id is required"
+
+    if not query:
+        return "Error: query is required"
+
+    url = f"{host}/v1/rag_retrieve/{bot_id}"
+
+    # Generate the auth token
+    token_input = f"{masterkey}:{bot_id}"
+    auth_token = hashlib.md5(token_input.encode()).hexdigest()
+
+    headers = {
+        "content-type": "application/json",
+        "authorization": f"Bearer {auth_token}"
+    }
+    data = {
+        "query": query,
+        "top_k": top_k
+    }
+
+    try:
+        response = requests.post(url, json=data, headers=headers, timeout=30)
+
+        if response.status_code != 200:
+            return f"Error: RAG API returned status code {response.status_code}. Response: {response.text}"
+
+        try:
+            response_data = response.json()
+        except json.JSONDecodeError as e:
+            return f"Error: Failed to parse API response as JSON. Error: {str(e)}, Raw response: {response.text}"
+
+        # Extract the markdown field
+        if "markdown" in response_data:
+            return response_data["markdown"]
+        else:
+            return f"Error: 'markdown' field not found in API response. Response: {json.dumps(response_data, indent=2, ensure_ascii=False)}"
+
+    except requests.exceptions.RequestException as e:
+        return f"Error: Failed to connect to RAG API. {str(e)}"
+    except Exception as e:
+        return f"Error: {str(e)}"
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="RAG retrieval tool - retrieve relevant documents from the knowledge base"
+    )
+    parser.add_argument(
+        "--query",
+        "-q",
+        required=True,
+        help="Search query content"
+    )
+    parser.add_argument(
+        "--top-k",
+        "-k",
+        type=int,
+        default=100,
+        help="Number of results to return (default: 100)"
+    )
+
+    args = parser.parse_args()
+
+    # Load configuration
+    config = load_config()
+
+    result = rag_retrieve(
+        query=args.query,
+        top_k=args.top_k,
+        config=config
+    )
+
+    print(result)
+
+
+if __name__ == "__main__":
+    main()
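Reviewer note: the request that `rag_retrieve()` assembles can be reproduced without touching the network, which is handy for checking the token derivation. The sketch below mirrors the URL, header, and payload construction shown in the script; `build_rag_request` is a hypothetical helper name, and the host and bot ID in the usage line are placeholders.

```python
import hashlib
from typing import Dict, Tuple


def build_rag_request(
    host: str, masterkey: str, bot_id: str, query: str, top_k: int = 100
) -> Tuple[str, Dict[str, str], Dict[str, object]]:
    """Return (url, headers, payload) the way the script builds them."""
    url = f"{host}/v1/rag_retrieve/{bot_id}"
    # The bearer token is the MD5 hex digest of "<masterkey>:<bot_id>".
    token = hashlib.md5(f"{masterkey}:{bot_id}".encode()).hexdigest()
    headers = {
        "content-type": "application/json",
        "authorization": f"Bearer {token}",
    }
    payload = {"query": query, "top_k": top_k}
    return url, headers, payload


# Placeholder values, for illustration only.
url, headers, payload = build_rag_request(
    "https://example.invalid", "master", "demo-bot", "deployment guide", top_k=10
)
```

Because the token depends only on the master key and the bot ID, it is identical for every query against the same bot.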
26
skills/rag-retrieve/skill.yaml
Normal file
@@ -0,0 +1,26 @@
+name: rag-retrieve
+version: 1.0.0
+description: RAG retrieval skill for querying and retrieving relevant documents from knowledge base using vector search
+author:
+  name: sparticle
+  email: support@gbase.ai
+license: MIT
+tags:
+  - rag
+  - retrieval
+  - vector-search
+  - knowledge-base
+runtime:
+  python: ">=3.7"
+  dependencies:
+    - requests
+entry_point: scripts/rag_retrieve.py
+config:
+  query:
+    type: string
+    required: true
+    description: Search query content
+  top_k:
+    type: integer
+    default: 100
+    description: Maximum number of results
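Reviewer note: the `config` block in skill.yaml declares `query` as a required string and `top_k` as an integer defaulting to 100. Whether anything in the project reads this block is not shown in this diff, so the following is only a sketch of how a loader might apply it. The schema is mirrored as a plain dict to avoid assuming a YAML parser, and `resolve_params` is a hypothetical name.

```python
from typing import Any, Dict

# Mirrors the `config` block of skill.yaml as a plain dict (illustrative copy).
CONFIG_SCHEMA: Dict[str, Dict[str, Any]] = {
    "query": {"type": "string", "required": True},
    "top_k": {"type": "integer", "default": 100},
}

_PY_TYPES = {"string": str, "integer": int}


def resolve_params(
    params: Dict[str, Any], schema: Dict[str, Dict[str, Any]] = CONFIG_SCHEMA
) -> Dict[str, Any]:
    """Fill in defaults, reject missing required values, check basic types."""
    resolved: Dict[str, Any] = {}
    for name, spec in schema.items():
        if name in params:
            value = params[name]
        elif spec.get("required"):
            raise ValueError(f"missing required parameter: {name}")
        elif "default" in spec:
            value = spec["default"]
        else:
            continue
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {spec['type']}")
        resolved[name] = value
    return resolved
```

Calling `resolve_params({"query": "deployment guide"})` fills in `top_k` from the declared default, which matches the default used by the script's `--top-k` flag.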
@@ -407,7 +407,7 @@ def create_robot_project(dataset_ids: List[str], bot_id: str, force_rebuild: boo
         logger.info(f"Using existing robot project: {robot_dir}")
         # Process skills even when reusing an existing project (if provided)
         if skills:
-            _extract_skills_to_robot(robot_dir, skills, project_path)
+            _extract_skills_to_robot(bot_id, skills, project_path)
         return str(robot_dir)
 
     # Create the robot directory structure
@@ -479,7 +479,7 @@ def create_robot_project(dataset_ids: List[str], bot_id: str, force_rebuild: boo
 
     # Handle skills extraction
     if skills:
-        _extract_skills_to_robot(robot_dir, skills, project_path)
+        _extract_skills_to_robot(bot_id, skills, project_path)
 
     return str(robot_dir)
 
@@ -493,52 +493,61 @@ if __name__ == "__main__":
     logger.info(f"Created robot project at: {robot_dir}")
 
 
-def _extract_skills_to_robot(robot_dir: Path, skills: List[str], project_path: Path) -> None:
+def _extract_skills_to_robot(bot_id: str, skills: List[str], project_path: Path) -> None:
     """
-    Extract skills into the robot project's skills folder
+    Copy skills into the robot project's skills folder
+    - If it is a full path (e.g. "projects/uploads/xxx/skills/rag-retrieve_2.zip"), use that path directly
+    - If it is a simple name (e.g. "rag-retrieve"), search the following directories in priority order:
+      1. projects/uploads/{bot_id}/skills/
+      2. skills/
+
+    Search directory priority: search projects/uploads/{bot_id}/skills/ first, then skills/
 
     Args:
-        robot_dir: robot project directory
-        skills: list of skill file names (e.g. ["rag-retrieve", "device_controller.zip"])
+        bot_id: robot ID
+        skills: list of skill file names (e.g. ["rag-retrieve", "projects/uploads/{bot_id}/skills/rag-retrieve"])
         project_path: project path
     """
     import zipfile
 
-    # The skills source directory is under projects/skills; resolve the symlink to get the correct path
-    # project_path may be ~/.deepagents (symlink -> projects/robot)
-    # so the skills source directory is project_path.resolve().parent / "skills"
-    skills_source_dir = project_path / "skills"
-    skills_target_dir = robot_dir / "skills"
+    # Skills source directories (in priority order)
+    skills_source_dirs = [
+        project_path / "uploads" / bot_id / "skills",
+        Path("skills"),
+    ]
+    skills_target_dir = project_path / "robot" / bot_id / "skills"
 
-    # Clear skills_target_dir first, then extract again
+    # Clear skills_target_dir first, then copy again
     if skills_target_dir.exists():
         logger.info(f" Removing existing skills directory: {skills_target_dir}")
         shutil.rmtree(skills_target_dir)
 
     skills_target_dir.mkdir(parents=True, exist_ok=True)
-    logger.info(f"Extracting skills to {skills_target_dir}")
+    logger.info(f"Copying skills to {skills_target_dir}")
 
     for skill in skills:
-        # Normalize the file name (make sure it has a .zip suffix)
-        if not skill.endswith(".zip"):
-            skill_file = skill + ".zip"
-        else:
-            skill_file = skill
+        source_dir = None
 
-        skill_source_path = skills_source_dir / skill_file
+        # Simple name: search multiple directories in priority order
+        for base_dir in skills_source_dirs:
+            candidate_dir = base_dir / skill
+            if candidate_dir.exists():
+                source_dir = candidate_dir
+                logger.info(f" Found skill '{skill}' in {base_dir}")
+                break
 
-        if not skill_source_path.exists():
-            logger.warning(f" Skill file not found: {skill_source_path}")
+        if source_dir is None:
+            logger.warning(f" Skill directory '{skill}' not found in any source directory: {[str(d) for d in skills_source_dirs]}")
             continue
 
-        # Get the extracted folder name (strip the .zip suffix)
-        folder_name = skill_file.replace(".zip", "")
-        extract_target = skills_target_dir / folder_name
+        if not source_dir.exists():
+            logger.warning(f" Skill directory not found: {source_dir}")
+            continue
+
+        target_dir = skills_target_dir / os.path.basename(skill)
 
-        # Extract the files
         try:
-            with zipfile.ZipFile(skill_source_path, 'r') as zip_ref:
-                zip_ref.extractall(extract_target)
-            logger.info(f" Extracted: {skill_file} -> {extract_target}")
+            shutil.copytree(source_dir, target_dir)
+            logger.info(f" Copied: {source_dir} -> {target_dir}")
         except Exception as e:
-            logger.error(f" Failed to extract {skill_file}: {e}")
+            logger.error(f" Failed to copy {source_dir}: {e}")
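Reviewer note: the new docstring describes two kinds of entries in `skills`, a full path and a simple name, while the loop in this hunk looks up `base_dir / skill` in each source directory. A minimal sketch of one way to cover both cases explicitly is given here. `resolve_skill_source` is a hypothetical helper, and treating any entry that contains `/` as an explicit path is an assumption of this sketch, not behaviour confirmed by the commit.

```python
from pathlib import Path
from typing import List, Optional


def resolve_skill_source(skill: str, search_dirs: List[Path]) -> Optional[Path]:
    """Return the directory to copy for `skill`, or None if nothing matches."""
    # Case 1 (assumed rule): an entry containing "/" is an explicit path.
    if "/" in skill:
        explicit = Path(skill)
        return explicit if explicit.is_dir() else None
    # Case 2: simple name; the first search directory that contains it wins.
    for base_dir in search_dirs:
        candidate = base_dir / skill
        if candidate.is_dir():
            return candidate
    return None
```

With `search_dirs` set to `[project_path / "uploads" / bot_id / "skills", Path("skills")]`, a skill uploaded for a specific bot shadows a shared skill of the same name, which matches the priority order stated in the docstring.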