update _get_cache_key

朱潮 2025-11-07 10:19:20 +08:00
parent bff5817520
commit 764a723023
2 changed files with 202 additions and 101 deletions

README.md

@ -59,23 +59,36 @@ docker-compose up -d
---
## 📖 Usage Guide
# Catalog Agent API Documentation
Public network:
dev: https://catalog-agent-dev.gbase.ai
prod: https://catalog-agent.gbase.ai
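## Overview
This document describes the API endpoints of the Catalog Agent service, which supports multilingual chat, file upload, and file parsing.
Internal network: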
prod http://catalog-agent.default.svc.cluster.local
dev http://catalog-agent.gbase-dev.svc.cluster.local
## Service Addresses
### 1. Chat API (OpenAI-compatible)
### Public Addresses
- **Development**: `https://catalog-agent-dev.gbase.ai`
- **Production**: `https://catalog-agent.gbase.ai`
### Internal Addresses
- **Production**: `http://catalog-agent.default.svc.cluster.local`
- **Development**: `http://catalog-agent.gbase-dev.svc.cluster.local`
---
## API Endpoints
### 1. General Chat API V1 (OpenAI-compatible)
**Endpoint**: `POST /api/v1/chat/completions`
**Authentication**: Bearer Token (use the API key of the large language model)
**Request example**:
```bash
curl -X POST "{host}/api/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {api_key}"
-H "Authorization: Bearer {api_key}" \
-d '{
"messages": [
{
@ -88,51 +101,188 @@ curl -X POST "{host}/api/v1/chat/completions" \
"robot_type": "catalog_agent",
"model": "gpt-4.1",
"model_server": "https://one-dev.felo.me/v1",
"unique_id": "1624be71-5432-40bf-9758-f4aecffd4e9c",
"bot_id": "f4aecffd4e9c-624be71-5432-40bf-9758",
"dataset_ids": ["624be71-5432-40bf-9758-f4aecffd4e9c"],
"tool_response": false
}'
}'
```
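### 2. Asynchronous File Processing Queue
**Request parameters**:
| Parameter | Type | Required | Description |
|------|------|------|------|
| messages | array | Yes | List of conversation messages |
| stream | boolean | Yes | Whether to enable streaming output |
| language | string | Yes | Language code: zh/en/ja |
| robot_type | string | Yes | Fixed value: catalog_agent |
| model | string | Yes | AI model name |
| model_server | string | Yes | AI model server URL |
| bot_id | string | Yes | Unique bot identifier |
| dataset_ids | array | Yes | Array of knowledge base IDs |
| tool_response | boolean | Yes | Whether to return tool responses |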
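
The parameter table above can be turned into a small request builder. The following is a minimal sketch, not the service's own client: the function name `build_v1_chat_request` is hypothetical, and the default `model`, `model_server`, and `language` values are simply copied from the curl example.

```python
import json


def build_v1_chat_request(api_key, bot_id, dataset_ids, user_text,
                          model="gpt-4.1",
                          model_server="https://one-dev.felo.me/v1",
                          language="ja", stream=True):
    """Return (headers, body) for POST /api/v1/chat/completions.

    Hypothetical helper: it only assembles the fields listed in the
    parameter table and does not send anything over the network.
    """
    headers = {
        "Content-Type": "application/json",
        # V1 authenticates with the LLM provider's API key.
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
        "language": language,
        "robot_type": "catalog_agent",  # fixed value per the table
        "model": model,
        "model_server": model_server,
        "bot_id": bot_id,
        "dataset_ids": list(dataset_ids),
        "tool_response": False,
    }
    # ensure_ascii=False keeps Chinese/Japanese text readable in the body.
    return headers, json.dumps(payload, ensure_ascii=False)
```

The returned body can be passed to any HTTP client as the raw request data.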
#### Starting the Queue System
---
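### 2. GBase Chat API V2
**Endpoint**: `POST /api/v2/chat/completions`
**Authentication**: Bearer Token (generated as `md5(master:{bot_id})`)
> Note: for this endpoint, the model, model server, API key, and dataset information are read automatically from GBase and do not need to be passed in.
**Request example**: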
```bash
# Terminal 1: start the queue consumer
poetry run python task_queue/consumer.py --workers 2
# Terminal 2: start the API server
poetry run python fastapi_app.py
curl -X POST "{host}/api/v2/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {md5(master:{bot_id})}" \
-d '{
"messages": [
{
"role": "user",
"content": "1kg未満のートPCを知りたいので表で出力してください"
}
],
"stream": true,
"language": "ja",
"bot_id": "f4aecffd4e9c-624be71-5432-40bf-9758",
"tool_response": false
}'
```
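#### Submitting an Asynchronous Task
**Request parameters**:
| Parameter | Type | Required | Description |
|------|------|------|------|
| messages | array | Yes | List of conversation messages |
| stream | boolean | Yes | Whether to enable streaming output |
| language | string | Yes | Language code: zh/en/ja |
| bot_id | string | Yes | Unique bot identifier |
| tool_response | boolean | Yes | Whether to return tool responses |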
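
The V2 token is described only as `md5(master:{bot_id})`. The sketch below assumes this means the hexadecimal MD5 digest of the literal string `master:` followed by the bot ID, encoded as UTF-8. If `master` actually stands for a secret master key, substitute that value. The function names are hypothetical.

```python
import hashlib


def v2_bearer_token(bot_id: str, prefix: str = "master") -> str:
    """Compute the V2 token under the assumption stated above.

    Assumption: token = hex MD5 of "<prefix>:<bot_id>", with the prefix
    defaulting to the literal word "master".
    """
    raw = f"{prefix}:{bot_id}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()


def build_v2_headers(bot_id: str) -> dict:
    """Headers for POST /api/v2/chat/completions."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {v2_bearer_token(bot_id)}",
    }
```

Because the model, model server, API key, and datasets are read from GBase, the V2 request body only needs `messages`, `stream`, `language`, `bot_id`, and `tool_response`.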
---
### 3. File Upload
**Endpoint**: `POST /api/v1/upload`
**Request method**: form-data upload
**Parameter name**: `file`
**Response example**:
```json
{
"success": true,
"message": "文件上传成功",
"filename": "12345678-1234-5678-9abc-123456789def.pdf",
"original_filename": "document.pdf",
"file_path": "projects/uploads/12345678-1234-5678-9abc-123456789def.pdf"
}
```
---
### 4. File Parsing
#### 4.1 Full File Parsing
**Endpoint**: `POST /api/v1/files/process/async`
**Request example**:
```bash
curl -X POST "{host}/api/v1/files/process/async" \
-H "Content-Type: application/json" \
-d '{
"unique_id": "1624be71-5432-40bf-9758-f4aecffd4e9c",
"dataset_id": "624be71-5432-40bf-9758-f4aecffd4e9c",
"files": {
"group_name":{
"group_name1": [
"public/document.txt",
"public/data.zip",
"public/goods.xlsx"
}
],
"group_name2": [
"public/document.txt",
"public/data.zip",
"public/goods.xlsx"
]
}
}'
```
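
One way to assemble the full-parse request body is to collect the `file_path` values returned by the upload endpoint and group them. This is a minimal sketch: `build_full_parse_request` is a hypothetical helper, and it assumes the request is keyed by `dataset_id` with `files` mapping group names to lists of paths.

```python
def build_full_parse_request(dataset_id, uploads_by_group):
    """Build the JSON body for POST /api/v1/files/process/async.

    uploads_by_group maps a group name to a list of upload responses
    (dicts shaped like the upload response example). Hypothetical helper.
    """
    files = {}
    for group, responses in uploads_by_group.items():
        paths = []
        for resp in responses:
            # Refuse to reference a file whose upload did not succeed.
            if not resp.get("success"):
                raise ValueError(f"upload failed: {resp.get('message')}")
            paths.append(resp["file_path"])
        files[group] = paths
    return {"dataset_id": dataset_id, "files": files}
```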
**Response**:
#### 4.2 Incremental File Parsing
**Endpoint**: `POST /api/v1/files/process/incremental`
**Request example**:
```bash
curl -X POST "{host}/api/v1/files/process/incremental" \
-H "Content-Type: application/json" \
-d '{
"dataset_id": "624be71-5432-40bf-9758-f4aecffd4e9c",
"files_to_add": {
"group_name1": [
"projects/uploads/report_list/1090571652550791207.md"
],
"group_name2": [
"projects/uploads/report_list/1090570966043889684.md"
]
},
"files_to_remove": {
"group_name1": [
"projects/uploads/report_list/1090570712888283212.md"
],
"group_name2": [
"projects/uploads/report_list/1090570712888283214.md"
]
}
}'
```
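
The `files_to_add` and `files_to_remove` maps in the request above can be derived by comparing the previous and desired file lists per group. The helper below is a hypothetical client-side sketch and says nothing about how the server applies the change.

```python
def diff_file_groups(old, new):
    """Return (files_to_add, files_to_remove) for an incremental request.

    old and new map group names to lists of file paths. Groups with no
    changes are omitted from both results.
    """
    files_to_add, files_to_remove = {}, {}
    for group in sorted(set(old) | set(new)):
        before = set(old.get(group, []))
        after = set(new.get(group, []))
        added = sorted(after - before)
        removed = sorted(before - after)
        if added:
            files_to_add[group] = added
        if removed:
            files_to_remove[group] = removed
    return files_to_add, files_to_remove
```

The two results can be placed directly under `files_to_add` and `files_to_remove`, alongside `dataset_id`.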
**Parsing response**:
```json
{
"success": true,
"task_id": "abc-123-def",
"unique_id": "my_project_123",
"dataset_id": "624be71-5432-40bf-9758-f4aecffd4e9c",
"task_status": "pending",
"estimated_processing_time": 30
}
```
---
### 5. Querying Task Status
**Endpoint**: `GET /api/v1/task/{task_id}/status`
**Request example**:
```bash
curl "{host}/api/v1/task/{task_id}/status"
```
**Response example**:
```json
{
"success": true,
"task_id": "abc-123-def",
"status": "completed",
"dataset_id": "my_project_123",
"result": {
"status": "success",
"message": "成功处理了 2 个文档文件",
"processed_files": ["projects/my_project_123/dataset/docs/document.txt"]
}
}
```
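
Since parsing is asynchronous, a client typically polls this endpoint until the task reaches a terminal state. The sketch below keeps the HTTP call out of the loop: `fetch_status` is a hypothetical callable that returns the decoded JSON for a task ID. The terminal states `completed` and `failed`, the 2-second interval, and the timeout are assumptions.

```python
import time


def wait_for_task(fetch_status, task_id, interval=2.0, timeout=300.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until the task is completed or failed, or the timeout expires.

    fetch_status(task_id) must return a dict shaped like the status
    response example. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        data = fetch_status(task_id)
        if data.get("status") in ("completed", "failed"):
            return data
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
        sleep(interval)
```

In real use, `fetch_status` would wrap a `GET {host}/api/v1/task/{task_id}/status` call made with the HTTP client of your choice.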
---
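## Notes
1. **Authentication**: The V1 and V2 endpoints use different authentication mechanisms; choose the one that matches the endpoint version.
2. **Required parameters**: Every parameter marked as required must be provided.
3. **File paths**: File paths passed to the file parsing endpoints should be the `file_path` returned by the file upload endpoint.
4. **Task status**: File parsing is asynchronous; query the task status endpoint to track progress.
#### Project Directory Structure
After files are processed, the following structure is generated under `projects/{unique_id}/`: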
@ -156,66 +306,6 @@ projects/{unique_id}/
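- **Pagination layer (pagination.txt)**: data split into pages of 5,000 characters each for easier retrieval
- **Embedding layer (embedding.pkl)**: semantic vectors of the documents, enabling semantic search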
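
To illustrate the pagination layer, here is a minimal sketch that splits text into fixed-size pages of 5,000 characters. It is an assumption about the general idea only. The project's actual pagination logic, such as whether it respects line or row boundaries, is not shown in this diff.

```python
def paginate(text: str, page_size: int = 5000) -> list:
    """Split text into consecutive pages of at most page_size characters."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [text[i:i + page_size] for i in range(0, len(text), page_size)]
```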
#### Querying Task Status
```bash
# 🎯 Main endpoint: this is the only one you need to remember
curl "{host}/api/v1/task/{task_id}/status"
```
**Status response**:
```json
{
"success": true,
"task_id": "abc-123-def",
"status": "completed",
"unique_id": "my_project_123",
"result": {
"status": "success",
"message": "成功处理了 2 个文档文件",
"processed_files": ["projects/my_project_123/dataset/docs/document.txt"]
}
}
```
### 3. Python Client Example
```python
import requests
import time

def submit_and_monitor_task():
    # 1. Submit the task
    response = requests.post(
        "http://localhost:8001/api/v1/files/process/async",
        json={
            "unique_id": "my_project",
            "files": {"docs": ["public/file.txt"]}
        }
    )
    task_id = response.json()["task_id"]
    print(f"Task submitted: {task_id}")

    # 2. Monitor the task status
    while True:
        response = requests.get(f"http://localhost:8001/api/v1/task/{task_id}/status")
        data = response.json()
        status = data["status"]
        print(f"Task status: {status}")
        if status == "completed":
            print("🎉 Task completed!")
            break
        elif status == "failed":
            print("❌ Task failed!")
            break
        time.sleep(2)

submit_and_monitor_task()
```
### 4. Project Directory Tree API


@ -39,11 +39,17 @@ class FileLoadedAgentManager:
self.creation_times: Dict[str, float] = {} # creation timestamps
self.max_cached_agents = max_cached_agents
def _get_cache_key(self, bot_id: str, system_prompt: str = None, mcp_settings: List[Dict] = None) -> str:
"""Return a hash of bot_id, system_prompt, and mcp_settings to use as the cache key
def _get_cache_key(self, bot_id: str, model_name: str = None, api_key: str = None,
model_server: str = None, generate_cfg: Dict = None,
system_prompt: str = None, mcp_settings: List[Dict] = None) -> str:
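"""Return a hash of all relevant parameters to use as the cache key
Args:
bot_id: Bot project ID
model_name: Model name
api_key: API key
model_server: Model server URL
generate_cfg: Generation config
system_prompt: System prompt
mcp_settings: List of MCP settings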
@ -53,6 +59,10 @@ class FileLoadedAgentManager:
# Build a string containing all relevant parameters
cache_data = {
'bot_id': bot_id,
'model_name': model_name or '',
'api_key': api_key or '',
'model_server': model_server or '',
'generate_cfg': json.dumps(generate_cfg or {}, sort_keys=True),
'system_prompt': system_prompt or '',
'mcp_settings': json.dumps(mcp_settings or [], sort_keys=True)
}
@ -124,7 +134,8 @@ class FileLoadedAgentManager:
final_system_prompt = load_system_prompt(project_dir, language, system_prompt, robot_type, bot_id)
final_mcp_settings = load_mcp_settings(project_dir, mcp_settings, bot_id, robot_type)
cache_key = self._get_cache_key(bot_id, final_system_prompt, final_mcp_settings)
cache_key = self._get_cache_key(bot_id, model_name, api_key, model_server,
generate_cfg, final_system_prompt, final_mcp_settings)
# Check whether this assistant instance already exists
if cache_key in self.agents:
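
The effect of the widened cache key can be seen in a standalone sketch. The `cache_data` fields below mirror the hunk above, but the hashing step is not visible in this diff, so the use of SHA-256 over a sorted JSON dump is an assumption, as is the free-function name `get_cache_key`.

```python
import hashlib
import json
from typing import Dict, List, Optional


def get_cache_key(bot_id: str, model_name: Optional[str] = None,
                  api_key: Optional[str] = None,
                  model_server: Optional[str] = None,
                  generate_cfg: Optional[Dict] = None,
                  system_prompt: Optional[str] = None,
                  mcp_settings: Optional[List[Dict]] = None) -> str:
    """Hash every parameter that should force a distinct cached agent."""
    cache_data = {
        'bot_id': bot_id,
        'model_name': model_name or '',
        'api_key': api_key or '',
        'model_server': model_server or '',
        # sort_keys makes the key independent of dict insertion order.
        'generate_cfg': json.dumps(generate_cfg or {}, sort_keys=True),
        'system_prompt': system_prompt or '',
        'mcp_settings': json.dumps(mcp_settings or [], sort_keys=True),
    }
    # Assumption: the digest algorithm is not shown in the diff.
    cache_str = json.dumps(cache_data, sort_keys=True)
    return hashlib.sha256(cache_str.encode('utf-8')).hexdigest()
```

With this key, two requests for the same `bot_id` but different models, servers, API keys, or generation configs no longer share a cached agent instance, which appears to be the intent of the commit.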