Merge branch 'feature/enable_redis' into dev

# Conflicts: # poetry.lock
2026-06-08 19:43:27 +08:00 · 2026-06-08 19:43:27 +08:00 · fabb14c66a
commit fabb14c66a
parent 0983623a75 77079539c1
27 changed files with 165 additions and 4336 deletions
--- a/.features/skill/MEMORY.md
+++ b/.features/skill/MEMORY.md
@ -1,6 +1,7 @@
 # Skill 功能

 > 负责范围：技能包管理服务 - 核心实现
+> 最后更新：2026-05-26
 > 最后更新：2026-05-23
 > 最后更新：2026-04-20

@ -30,6 +31,10 @@ MCP UI 类 skill 已按 MCP Apps 模式改造：工具返回数据，静态 HTML

 ## 最近重要事项

+- [2026-05-26](changelog/2026-Q2.md): skill 引入 `category` 字段——`routes/skill_manager.py` 在 `SkillItem` / `SkillValidationResult` 增加 `category`，从 `plugin.json` 与 `SKILL.md` frontmatter 解析，official skill 默认 `"other"`、user skill 默认 `"custom"`；并通过 batch 给 common/developing/onprem/support 路径下大量 skill 元数据补 `category`，`data-dashboard` / `mcp-ui` 归类 `Interactive UI`（`203dcf4`, `3ada55a`, `9658588`）
+- [2026-05-26](changelog/2026-Q2.md): developing 分支大合并新增多个 skill：`ai-ppt-generator`（百度 AI PPT）、`nfc-medicine-lookup`（NFC 药品检索）、`ppt-outline`（PPT 大纲 / HTML 演示文稿）、`z-card-image`（配图 / 卡片图），同时 `skills/linggan/*` 系列 skill 经合并回归（`3ada55a`）
+- [2026-05-23](changelog/2026-Q2.md): 新增 MCP App 型 `skills/developing/ecommerce-storefront/`——含 `product-list` / `order-confirm` 两个 HTML App + 自带 `ecommerce_server.py` MCP server；同时落地 `docs/mcp-app-training.md`（约 1063 行）作为 MCP App 培训材料（`9d001c8`）
+- [2026-05-21](changelog/2026-Q2.md): Daytona 沙箱模式下 `init_agent` 在沙箱内写入 `BASH_ENV` 文件，注入 `ASSISTANT_ID` / `USER_IDENTIFIER` / `TRACE_ID` / `ENABLE_SELF_KNOWLEDGE` 与 `config.shell_env` 的 shell 环境变量（`776acc2`）
 - [2026-05-12](changelog/2026-Q2.md): 跨 6→10 个 skill 变体批量精修 `retrieval-policy*.md`，统一 onprem/support/autoload 各路径下的 policy 口径（`be96f24`, `7b4f03d`）
 - [2026-05-11](changelog/2026-Q2.md): 新增子 agent (SubAgent) 支持——skill 包通过 `agents/*.md` 暴露子 agent，由 `SubAgentMiddleware` 加载；附 `pmda-drug-info` skill 示例（`5b634bc`）
 - [2026-05-11](changelog/2026-Q2.md): `pmda-drug-info` 的 `pmda_server.py` 大改为 mock 实现（`a92096a`）
@ -77,6 +82,9 @@ MCP UI 类 skill 已按 MCP Apps 模式改造：工具返回数据，静态 HTML
 - ⚠️ **MCP `_meta.trace_id` 是全局 monkey-patch 注入**：`agent/mcp_trace_meta.patch_mcp_client_session_trace_meta()` 在 `get_tools_from_mcp()` 入口调用一次后，会把 `mcp.ClientSession.call_tool` 永久包装；仅对工具名在 `{"rag_retrieve", "table_rag_retrieve"}` 集合内的调用注入 `_meta.trace_id`，扩展白名单要直接改 `_TRACE_META_TOOL_NAMES` 常量。
 - ⚠️ **PrePrompt hook 内容位置由模板决定**：自 2026-04-23 起 hook 产出通过 `{hook_content}` 占位符注入 `prompt/system_prompt.md`，不再追加在 prompt 末尾；自定义模板必须包含 `{hook_content}` 占位符否则 hook 内容会丢失。
 - ⚠️ **`init_agent` 返回值已变 3 元素**：Daytona 改造后 `init_agent` 返回 `(agent, checkpointer, sandbox)`；调用方解构必须更新。
+- ⚠️ **skill `category` 默认值**：API 返回的 `SkillItem.category`——official skill fallback 为 `"other"`、user skill fallback 为 `"custom"`；前端做分类视图时需要同时识别这两个 sentinel，不要假设官方/用户 skill 用同一套缺省值。
+- ⚠️ **`category` 字段双入口**：同一 skill 可以同时在 `.claude-plugin/plugin.json` 和 `SKILL.md` frontmatter 写 `category`；`get_skill_metadata` 优先走 `parse_plugin_json`，若 skill 包没有 plugin.json 才回落到 `parse_skill_frontmatter`——两者写不一致时以 plugin.json 为准。
+- ⚠️ **Daytona shell_env 是文件注入而非 process env**：`init_agent` 通过 `cat > $REMOTE_BASH_ENV_PATH` 写入 `export VAR=...` 行，沙箱内必须由 shell（bash）的 `BASH_ENV` 加载才能生效；非 daytona 模式或不走 bash 启动的脚本拿不到这些变量。扩展注入项需直接改 `init_agent` 里的 `_shell_env` 字典。

 ## Skill 目录结构

--- a/.features/skill/changelog/2026-Q2.md
+++ b/.features/skill/changelog/2026-Q2.md
@ -4,6 +4,126 @@

 ---

+## 2026-05-26: skill `category` 字段全面接入
+
+**类型**：新功能
+
+**背景**：skill 数量越来越多（common / developing / onprem / support / linggan / autoload 各路径下数十个），列表 API 需要前端能按类别分组展示，元数据层面缺少 `category` 字段。
+
+**改动**：
+- `routes/skill_manager.py`：
+  - `SkillItem` model 新增 `category: str = "other"`。
+  - `SkillValidationResult` dataclass 新增可选 `category: Optional[str]`。
+  - `parse_plugin_json` 解析 `plugin_config.get('category')`；`parse_skill_frontmatter` 解析 frontmatter 的 `metadata.get('category')`。
+  - `get_official_skills` 中 fallback 为 `"other"`；`get_user_skills` 中 fallback 为 `"custom"`。
+  - `get_skill_metadata_legacy` 在 `category` 非空时写入返回 dict（保持向后兼容）。
+- 批量给 common / developing / onprem / support 多个 skill 的 `.claude-plugin/plugin.json` 与 `SKILL.md` frontmatter 添加 `category` 字段。
+- `data-dashboard` 与 `mcp-ui` 的 `category` 从 `"Data & Retrieval"` 修正为 `"Interactive UI"`（更贴切 MCP App 的渲染语义）。
+
+**根因**：N/A（新功能）
+
+**影响**：
+- `GET /api/v1/skill/list` 返回项现在包含 `category` 字段；前端可按 category 维度做分组/筛选。
+- skill 元数据约定扩展——新 skill 应在 plugin.json 或 SKILL.md frontmatter 中写明 `category`，否则会落到 `"other"` / `"custom"` 兜底。
+- `plugin.json.category` 与 `SKILL.md.category` 同时存在时以前者为准（`get_skill_metadata` 优先 plugin.json）。
+
+**相关文件**：
+- `routes/skill_manager.py`
+- `skills/common/data-dashboard/.claude-plugin/plugin.json`
+- `skills/common/mcp-ui/.claude-plugin/plugin.json`
+- 以及一批 `skills/{common,developing,onprem,support}/*/SKILL.md` 与 `.claude-plugin/plugin.json`
+
+**Commit/PR**：`203dcf4`, `3ada55a`, `9658588`
+
+---
+
+## 2026-05-26: developing 分支批量新增多类 skill
+
+**类型**：新功能
+
+**背景**：[待补充]——经 developing→staging 合并集中落地一批新 skill 与 linggan 系列 skill 回归。
+
+**改动**：
+- 新增 `skills/developing/ai-ppt-generator/`：调用百度 AI 生成 PPT，按 topic 自动选模板（商务/科技/教育/创意/中国风等）；`category: Document Processing`。
+- 新增 `skills/developing/nfc-medicine-lookup/`：通过 NFC 芯片 ID 或药品名称查询药品信息，面向老年用户的语音助手交互口径；`category: Developer Tools`。
+- 新增 `skills/developing/ppt-outline/`：PPT 大纲与独立 HTML 演示文稿生成（dark/light/tech/minimal 四种风格）；`category: Document Processing`。
+- 新增 `skills/developing/z-card-image/`：生成配图、封面图、卡片图、社媒帖子分享图等；依赖 `python3` + `google-chrome`。
+- `skills/developing/static-hosting/SKILL.md` 由 1 行说明扩展为完整 80 行 skill；同时一批已有 SKILL.md / plugin.json 补 `category`。
+- `skills/linggan/*` 系列 skill（baidu-search / bot-self-modifier / caiyun-weather / competitor-news-intel / contract-document-generator / financial-report-generator / market-academic-insight / ragflow-loader / sales-decision-report / seedream / static-hosting / static-site-deploy / voice-notification / weather-china）经合并回归 staging。
+
+**根因**：N/A
+
+**影响**：
+- developing skill 池扩张约 5 个新业务 skill；linggan 系列重新出现在 staging。
+- 新 skill 多为 SKILL.md 型业务 skill，符合"workflow + 模板"的纯 markdown 模式；其中 `ai-ppt-generator`、`z-card-image` 依赖外部 `BAIDU_API_KEY` 或 `google-chrome` 二进制。
+
+**相关文件**：
+- `skills/developing/ai-ppt-generator/SKILL.md`
+- `skills/developing/nfc-medicine-lookup/SKILL.md`
+- `skills/developing/ppt-outline/SKILL.md`
+- `skills/developing/z-card-image/SKILL.md`
+- `skills/developing/static-hosting/SKILL.md`
+- `skills/linggan/**`（回归）
+
+**Commit/PR**：`3ada55a`
+
+---
+
+## 2026-05-23: 新增 ecommerce-storefront skill（MCP App 型）+ MCP App 培训文档
+
+**类型**：新功能
+
+**背景**：MCP App 模式（host 加载静态 HTML + postMessage 传数据）已经在 `mcp-ui`、`data-dashboard` 上跑通，需要一个面向电商场景的样例 skill，演示产品浏览 / 选购 / 下单确认这类多步交互的 App 渲染；同时沉淀一份 MCP App 开发指南。
+
+**改动**：
+- 新增 `skills/developing/ecommerce-storefront/`：
+  - `apps/product-list.html`（288 行）与 `apps/order-confirm.html`（233 行）两个静态 App。
+  - `ecommerce_server.py`（213 行）作为自带 MCP server，`ecommerce_tools.json` 定义工具 schema。
+  - `hooks/ecommerce_guide.md` + `hooks/pre_prompt.py` 注入 skill 使用指引到 system prompt。
+  - `mcp_common.py`（252 行）复用 MCP 通用工具基类。
+  - `.claude-plugin/plugin.json` 配置 PrePrompt hook 与 stdio MCP server，`category: Developer Tools`。
+- 新增 `docs/mcp-app-training.md`（约 1063 行）：MCP App 模式的开发培训材料。
+
+**根因**：N/A
+
+**影响**：
+- developing skill 池新增一个 MCP App 型 skill，体例对齐 `mcp-ui` / `data-dashboard`。
+- MCP App 开发者有完整培训材料可参考。
+
+**相关文件**：
+- `skills/developing/ecommerce-storefront/**`
+- `docs/mcp-app-training.md`
+
+**Commit/PR**：`9d001c8`
+
+---
+
+## 2026-05-21: Daytona 沙箱注入 shell_env 到 BASH_ENV
+
+**类型**：新功能
+
+**背景**：Daytona 沙箱内的 skill 脚本需要能读取 `ASSISTANT_ID` / `USER_IDENTIFIER` / `TRACE_ID` 等运行时上下文，但宿主 process env 无法直接透传到沙箱里。
+
+**改动**：
+- `agent/deep_assistant.py` `init_agent`：当 `sandbox is not None and sandbox_type == "daytona"` 时，组装 `_shell_env` 字典（`ASSISTANT_ID` / `USER_IDENTIFIER` / `TRACE_ID` / `ENABLE_SELF_KNOWLEDGE` 加上 `config.shell_env`），构造 `cd {REMOTE_WORKSPACE_ROOT}\n` + `export VAR="..."` 行，通过 `sandbox.execute("cat > $REMOTE_BASH_ENV_PATH << 'ENVEOF' ... ENVEOF")` 写入沙箱内。
+- `utils/daytona_sync.py` 提供常量 `REMOTE_BASH_ENV_PATH` / `REMOTE_WORKSPACE_ROOT`。
+- `AgentConfig` 增加 `shell_env: Optional[Dict[str, str]]`（调用方可追加自定义 env）。
+
+**根因**：N/A
+
+**影响**：
+- 沙箱内通过 bash 启动的 skill 脚本可以 `os.environ.get("ASSISTANT_ID")` 等读到运行时上下文。
+- 仅 daytona 沙箱模式生效；本地或非 bash 启动的进程不会收到 `BASH_ENV` 注入的变量。
+- 扩展注入项（新增固定环境变量）需要直接改 `init_agent` 里的 `_shell_env` 字典。
+
+**相关文件**：
+- `agent/deep_assistant.py`
+- `utils/daytona_sync.py`
+
+**Commit/PR**：`776acc2`
+
+---
+
 ## 2026-05-12: 批量精修 retrieval policy 文案

 **类型**：内容调整
--- a/db_manager.py
+++ b/db_manager.py
@ -1,168 +0,0 @@
-#!/usr/bin/env python3
-"""
-SQLite task status database management tool
-"""
-
-import sqlite3
-import json
-import time
-from task_queue.task_status import task_status_store
-
-def view_database():
-    """View database contents"""
-    print("SQLite task status database contents")
-    print("=" * 40)
-    print(f"Database path: {task_status_store.db_path}")
-    
-    # Connect to the database
-    conn = sqlite3.connect(task_status_store.db_path)
-    cursor = conn.cursor()
-    
-    # View table schema
-    print(f"\nTable schema:")
-    cursor.execute("PRAGMA table_info(task_status)")
-    columns = cursor.fetchall()
-    for col in columns:
-        print(f"   {col[1]} ({col[2]})")
-    
-    # View all records
-    print(f"\nAll records:")
-    cursor.execute("SELECT * FROM task_status ORDER BY updated_at DESC")
-    rows = cursor.fetchall()
-    
-    if not rows:
-        print("   (empty database)")
-    else:
-        print(f"   Total {len(rows)} records:")
-        for i, row in enumerate(rows):
-            task_id, unique_id, status, created_at, updated_at, result, error = row
-            created_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(created_at))
-            updated_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(updated_at))
-            
-            print(f"   {i+1}. {task_id}")
-            print(f"      Project ID: {unique_id}")
-            print(f"      Status: {status}")
-            print(f"      Created: {created_str}")
-            print(f"      Updated: {updated_str}")
-            if result:
-                try:
-                    result_data = json.loads(result)
-                    print(f"      Result: {result_data.get('message', 'N/A')}")
-                except:
-                    print(f"      Result: {result[:50]}...")
-            if error:
-                print(f"      Error: {error}")
-            print()
-    
-    conn.close()
-
-def run_query(sql_query: str):
-    """Run a custom query"""
-    print(f"Running query: {sql_query}")
-    
-    try:
-        conn = sqlite3.connect(task_status_store.db_path)
-        conn.row_factory = sqlite3.Row
-        cursor = conn.cursor()
-        cursor.execute(sql_query)
-        rows = cursor.fetchall()
-        
-        if not rows:
-            print("   (no results)")
-        else:
-            print(f"   {len(rows)} results:")
-            for row in rows:
-                print(f"   {dict(row)}")
-        
-        conn.close()
-        
-    except Exception as e:
-        print(f"Query failed: {e}")
-
-def interactive_shell():
-    """Interactive database management"""
-    print("\n🖥️  Interactive database management")
-    print("Type 'help' to view available commands, or 'quit' to exit")
-    
-    while True:
-        try:
-            command = input("\n> ").strip()
-            
-            if command.lower() in ['quit', 'exit', 'q']:
-                break
-            elif command.lower() == 'help':
-                print("""
-Available commands:
-  view                - View all records
-  stats               - View statistics  
-  pending             - View pending tasks
-  completed           - View completed tasks
-  failed              - View failed tasks
-  sql <query>          - Run an SQL query
-  cleanup <days>       - Clean up records older than N days
-  count               - Count total tasks
-  help                - Show help
-  quit/exit/q         - Exit
-                """)
-            elif command.lower() == 'view':
-                view_database()
-            elif command.lower() == 'stats':
-                stats = task_status_store.get_statistics()
-                print(f"Statistics:")
-                print(f"  Total tasks: {stats['total_tasks']}")
-                print(f"  Status breakdown: {stats['status_breakdown']}")
-                print(f"  Last 24 hours: {stats['recent_24h']}")
-            elif command.lower() == 'pending':
-                tasks = task_status_store.search_tasks(status="pending")
-                print(f"Pending tasks ({len(tasks)}):")
-                for task in tasks:
-                    print(f"  - {task['task_id']}: {task['unique_id']}")
-            elif command.lower() == 'completed':
-                tasks = task_status_store.search_tasks(status="completed")
-                print(f"Completed tasks ({len(tasks)}):")
-                for task in tasks:
-                    print(f"  - {task['task_id']}: {task['unique_id']}")
-            elif command.lower() == 'failed':
-                tasks = task_status_store.search_tasks(status="failed")
-                print(f"Failed tasks ({len(tasks)}):")
-                for task in tasks:
-                    print(f"  - {task['task_id']}: {task['unique_id']}")
-            elif command.lower().startswith('sql '):
-                sql_query = command[4:]
-                run_query(sql_query)
-            elif command.lower().startswith('cleanup '):
-                try:
-                    days = int(command[8:])
-                    count = task_status_store.cleanup_old_tasks(days)
-                    print(f"Cleaned up {count} records older than {days} days")
-                except ValueError:
-                    print("Please enter a valid number of days")
-            elif command.lower() == 'count':
-                all_tasks = task_status_store.list_all()
-                print(f"Total tasks: {len(all_tasks)}")
-            else:
-                print("Unknown command. Type 'help' for help")
-                
-        except KeyboardInterrupt:
-            print("\nGoodbye!")
-            break
-        except Exception as e:
-            print(f"Execution error: {e}")
-
-def main():
-    """Main function"""
-    import sys
-    
-    if len(sys.argv) > 1:
-        if sys.argv[1] == 'view':
-            view_database()
-        elif sys.argv[1] == 'interactive':
-            interactive_shell()
-        else:
-            print("Usage: python db_manager.py [view|interactive]")
-    else:
-        view_database()
-        interactive_shell()
-
-if __name__ == "__main__":
-    main()
--- a/poetry.lock
+++ b/poetry.lock
@ -1790,21 +1790,6 @@ files = [
    {file = "httpx_sse-0.4.3.tar.gz", hash = "sha256:9b1ed0127459a66014aec3c56bebd93da3c1bc8bb6618c8082039a44889a755d"},
 ]

-[[package]]
-name = "huey"
-version = "2.5.3"
-description = "huey, a little task queue"
-optional = false
-python-versions = "*"
-groups = ["main"]
-files = [
-    {file = "huey-2.5.3.tar.gz", hash = "sha256:089fc72b97fd26a513f15b09925c56fad6abe4a699a1f0e902170b37e85163c7"},
-]
-
-[package.extras]
-backends = ["redis (>=3.0.0)"]
-redis = ["redis (>=3.0.0)"]
-
 [[package]]
 name = "huggingface-hub"
 version = "0.35.3"
@ -4825,6 +4810,23 @@ urllib3 = ">=1.26.14,<3"
 fastembed = ["fastembed (>=0.7,<0.8)"]
 fastembed-gpu = ["fastembed-gpu (>=0.7,<0.8)"]

+[[package]]
+name = "redis"
+version = "6.4.0"
+description = "Python client for Redis database and key-value store"
+optional = false
+python-versions = ">=3.9"
+groups = ["main"]
+files = [
+    {file = "redis-6.4.0-py3-none-any.whl", hash = "sha256:f0544fa9604264e9464cdf4814e7d4830f74b165d52f2a330a760a88dd248b7f"},
+    {file = "redis-6.4.0.tar.gz", hash = "sha256:b01bc7282b8444e28ec36b261df5375183bb47a07eb9c603f284e89cbc5ef010"},
+]
+
+[package.extras]
+hiredis = ["hiredis (>=3.2.0)"]
+jwt = ["pyjwt (>=2.9.0)"]
+ocsp = ["cryptography (>=36.0.1)", "pyopenssl (>=20.0.1)", "requests (>=2.31.0)"]
+
 [[package]]
 name = "referencing"
 version = "0.37.0"
@ -7132,4 +7134,4 @@ cffi = ["cffi (>=1.17,<2.0) ; platform_python_implementation != \"PyPy\" and pyt
 [metadata]
 lock-version = "2.1"
 python-versions = ">=3.12,<3.15"
-content-hash = "dc130664802ad1344adc341931036a343f9892934a41bbc15c48663d0146696b"
+content-hash = "f5b01d5a1e60672741f2c5e8cc6e2ec55534963a9a3791fd1cdf67d3c2fbd70b"
--- a/pyproject.toml
+++ b/pyproject.toml
@ -19,7 +19,7 @@ dependencies = [
    "numpy<2",
    "aiohttp",
    "aiofiles",
-    "huey (>=2.5.3,<3.0.0)",
+    "redis (>=4.0,<7.0)",
    "pandas>=1.5.0",
    "openpyxl>=3.0.0",
    "xlrd>=2.0.0",
--- a/requirements.txt
+++ b/requirements.txt
@ -58,7 +58,6 @@ httpcore==1.0.9 ; python_version >= "3.12" and python_version < "3.15"
 httptools==0.7.1 ; python_version >= "3.12" and python_version < "3.15" and platform_system != "Windows"
 httpx-sse==0.4.3 ; python_version >= "3.12" and python_version < "3.15"
 httpx==0.28.1 ; python_version >= "3.12" and python_version < "3.15"
-huey==2.5.3 ; python_version >= "3.12" and python_version < "3.15"
 huggingface-hub==0.35.3 ; python_version >= "3.12" and python_version < "3.15"
 hyperframe==6.1.0 ; python_version >= "3.12" and python_version < "3.15"
 idna==3.11 ; python_version >= "3.12" and python_version < "3.15"
@ -161,6 +160,7 @@ pywin32==311 ; python_version >= "3.12" and python_version < "3.15" and (sys_pla
 pyyaml==6.0.3 ; python_version >= "3.12" and python_version < "3.15"
 qdrant-client==1.12.1 ; python_version >= "3.13" and python_version < "3.15"
 qdrant-client==1.16.2 ; python_version == "3.12"
+redis==6.4.0 ; python_version >= "3.12" and python_version < "3.15"
 referencing==0.37.0 ; python_version >= "3.12" and python_version < "3.15"
 regex==2025.9.18 ; python_version >= "3.12" and python_version < "3.15"
 requests-toolbelt==1.0.0 ; python_version >= "3.12" and python_version < "3.15"
--- a/routes/files.py
+++ b/routes/files.py
@ -1,273 +1,18 @@
 import os
 import uuid
 import shutil
-import zipfile
 from datetime import datetime
-from typing import Optional, List
-from fastapi import APIRouter, HTTPException, Header, UploadFile, File, Form
-from pydantic import BaseModel
+from typing import Optional
+from fastapi import APIRouter, HTTPException, UploadFile, File, Form
 import logging

 logger = logging.getLogger('app')

-from utils import (
-    DatasetRequest, QueueTaskRequest, IncrementalTaskRequest, QueueTaskResponse,
-    load_processed_files_log, remove_file_or_directory, remove_dataset_directory_by_key
-)
 from utils.fastapi_utils import get_versioned_filename
-from task_queue.manager import queue_manager
-from task_queue.integration_tasks import process_files_async, process_files_incremental_async, cleanup_project_async
-from task_queue.task_status import task_status_store

 router = APIRouter()


-@router.post("/api/v1/files/process/async")
-async def process_files_async_endpoint(request: QueueTaskRequest, authorization: Optional[str] = Header(None)):
-    """
-    Queue-based API for asynchronous file processing.
-    Same functionality as /api/v1/files/process, but processed asynchronously through the queue.
-    
-    Args:
-        request: QueueTaskRequest containing dataset_id, files, system_prompt, mcp_settings, and queue options
-        authorization: Authorization header containing API key (Bearer <API_KEY>)
-        
-    Returns:
-        QueueTaskResponse: Processing result with task ID for tracking
-    """
-    try:
-        dataset_id = request.dataset_id
-        if not dataset_id:
-            raise HTTPException(status_code=400, detail="dataset_id is required")
-        
-        # Estimate processing time (based on file count)
-        estimated_time = 0
-        if request.upload_folder:
-            # For upload_folder, file count cannot be estimated in advance, so use the default time
-            estimated_time = 120  # Default: 2 minutes
-        elif request.files:
-            total_files = sum(len(file_list) for file_list in request.files.values())
-            estimated_time = max(30, total_files * 10)  # Estimated 10 seconds per file, minimum 30 seconds
-        
-        # Create task status record
-        import uuid
-        task_id = str(uuid.uuid4())
-        task_status_store.set_status(
-            task_id=task_id,
-            unique_id=dataset_id,
-            status="pending"
-        )
-        
-        # Submit async task
-        task = process_files_async(
-            dataset_id=dataset_id,
-            files=request.files,
-            upload_folder=request.upload_folder,
-            task_id=task_id
-        )
-        
-        # Build a more detailed message
-        message = f"File processing task has been submitted to the queue, project ID: {dataset_id}"
-        if request.upload_folder:
-            group_count = len(request.upload_folder)
-            message += f", files will be scanned automatically from {group_count} uploaded folders"
-        elif request.files:
-            total_files = sum(len(file_list) for file_list in request.files.values())
-            message += f", including {total_files} files"
-        
-        return QueueTaskResponse(
-            success=True,
-            message=message,
-            dataset_id=dataset_id,
-            task_id=task_id,  # Use our own task_id
-            task_status="pending",
-            estimated_processing_time=estimated_time
-        )
-        
-    except HTTPException:
-        raise
-    except Exception as e:
-        logger.error(f"Error submitting async file processing task: {str(e)}")
-        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")
-
-
-@router.post("/api/v1/files/process/incremental")
-async def process_files_incremental_endpoint(request: IncrementalTaskRequest, authorization: Optional[str] = Header(None)):
-    """
-    Queue-based API for incremental file processing, supporting file additions and deletions.
-    
-    Args:
-        request: IncrementalTaskRequest containing dataset_id, files_to_add, files_to_remove, system_prompt, mcp_settings, and queue options
-        authorization: Authorization header containing API key (Bearer <API_KEY>)
-        
-    Returns:
-        QueueTaskResponse: Processing result with task ID for tracking
-    """
-    try:
-        dataset_id = request.dataset_id
-        if not dataset_id:
-            raise HTTPException(status_code=400, detail="dataset_id is required")
-        
-        # Validate that there is at least one add or delete operation
-        if not request.files_to_add and not request.files_to_remove:
-            raise HTTPException(status_code=400, detail="At least one of files_to_add or files_to_remove must be provided")
-        
-        # Estimate processing time (based on file count)
-        estimated_time = 0
-        total_add_files = sum(len(file_list) for file_list in (request.files_to_add or {}).values())
-        total_remove_files = sum(len(file_list) for file_list in (request.files_to_remove or {}).values())
-        total_files = total_add_files + total_remove_files
-        estimated_time = max(30, total_files * 10)  # Estimated 10 seconds per file, minimum 30 seconds
-        
-        # Create task status record
-        import uuid
-        task_id = str(uuid.uuid4())
-        task_status_store.set_status(
-            task_id=task_id,
-            unique_id=dataset_id,
-            status="pending"
-        )
-        
-        # Submit incremental async task
-        task = process_files_incremental_async(
-            dataset_id=dataset_id,
-            files_to_add=request.files_to_add,
-            files_to_remove=request.files_to_remove,
-            system_prompt=request.system_prompt,
-            mcp_settings=request.mcp_settings,
-            task_id=task_id
-        )
-        
-        return QueueTaskResponse(
-            success=True,
-            message=f"Incremental file processing task has been submitted to the queue - added {total_add_files} files, removed {total_remove_files} files, project ID: {dataset_id}",
-            dataset_id=dataset_id,
-            task_id=task_id,
-            task_status="pending",
-            estimated_processing_time=estimated_time
-        )
-        
-    except HTTPException:
-        raise
-    except Exception as e:
-        logger.error(f"Error submitting incremental file processing task: {str(e)}")
-        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")
-
-
-@router.get("/api/v1/files/{dataset_id}/status")
-async def get_files_processing_status(dataset_id: str):
-    """Get the file processing status for the project."""
-    try:
-        # Load processed files log
-        processed_log = load_processed_files_log(dataset_id)
-        
-        # Get project directory info
-        project_dir = os.path.join("projects", "data", dataset_id)
-        project_exists = os.path.exists(project_dir)
-        
-        # Collect document.txt files
-        document_files = []
-        if project_exists:
-            for root, dirs, files in os.walk(project_dir):
-                for file in files:
-                    if file == "document.txt":
-                        document_files.append(os.path.join(root, file))
-        
-        return {
-            "dataset_id": dataset_id,
-            "project_exists": project_exists,
-            "processed_files_count": len(processed_log),
-            "processed_files": processed_log,
-            "document_files_count": len(document_files),
-            "document_files": document_files,
-            "log_file_exists": os.path.exists(os.path.join("projects", "data", dataset_id, "processed_files.json"))
-        }
-    except Exception as e:
-        raise HTTPException(status_code=500, detail=f"Failed to retrieve file processing status: {str(e)}")
-
-
-@router.post("/api/v1/files/{dataset_id}/reset")
-async def reset_files_processing(dataset_id: str):
-    """Reset the project's file processing status by deleting the processing log and all files."""
-    try:
-        project_dir = os.path.join("projects", "data", dataset_id)
-        log_file = os.path.join("projects", "data", dataset_id, "processed_files.json")
-        
-        # Load processed log to know what files to remove
-        processed_log = load_processed_files_log(dataset_id)
-        
-        removed_files = []
-        # Remove all processed files and their dataset directories
-        for file_hash, file_info in processed_log.items():
-            # Remove local file in files directory
-            if 'local_path' in file_info:
-                if remove_file_or_directory(file_info['local_path']):
-                    removed_files.append(file_info['local_path'])
-            
-            # Handle new key-based structure first
-            if 'key' in file_info:
-                # Remove dataset directory by key
-                key = file_info['key']
-                if remove_dataset_directory_by_key(dataset_id, key):
-                    removed_files.append(f"dataset/{key}")
-            elif 'filename' in file_info:
-                # Fallback to old filename-based structure
-                filename_without_ext = os.path.splitext(file_info['filename'])[0]
-                dataset_dir = os.path.join("projects", "data", dataset_id, "datasets", filename_without_ext)
-                if remove_file_or_directory(dataset_dir):
-                    removed_files.append(dataset_dir)
-            
-            # Also remove any specific dataset path if exists (fallback)
-            if 'dataset_path' in file_info:
-                if remove_file_or_directory(file_info['dataset_path']):
-                    removed_files.append(file_info['dataset_path'])
-        
-        # Remove the log file
-        if remove_file_or_directory(log_file):
-            removed_files.append(log_file)
-        
-        # Remove the entire files directory
-        files_dir = os.path.join(project_dir, "files")
-        if remove_file_or_directory(files_dir):
-            removed_files.append(files_dir)
-        
-        # Also remove the entire dataset directory (clean up any remaining files)
-        dataset_dir = os.path.join(project_dir, "datasets")
-        if remove_file_or_directory(dataset_dir):
-            removed_files.append(dataset_dir)
-        
-        # Remove README.md if exists
-        readme_file = os.path.join(project_dir, "README.md")
-        if remove_file_or_directory(readme_file):
-            removed_files.append(readme_file)
-        
-        return {
-            "message": f"File processing status reset successfully: {dataset_id}",
-            "removed_files_count": len(removed_files),
-            "removed_files": removed_files
-        }
-    except Exception as e:
-        raise HTTPException(status_code=500, detail=f"Failed to reset file processing status: {str(e)}")
-
-
-@router.post("/api/v1/files/{dataset_id}/cleanup/async")
-async def cleanup_project_async_endpoint(dataset_id: str, remove_all: bool = False):
-    """Asynchronously clean up project files."""
-    try:
-        task = cleanup_project_async(dataset_id=dataset_id, remove_all=remove_all)
-        
-        return {
-            "success": True,
-            "message": f"Project cleanup task has been submitted to the queue, project ID: {dataset_id}",
-            "dataset_id": dataset_id,
-            "task_id": task.id,
-            "action": "remove_all" if remove_all else "cleanup_logs"
-        }
-    except Exception as e:
-        logger.error(f"Error submitting cleanup task: {str(e)}")
-        raise HTTPException(status_code=500, detail=f"Failed to submit cleanup task: {str(e)}")
-
-
@router.post("/api/v1/upload")
 async def upload_file(file: UploadFile = File(...), folder: Optional[str] = Form(None)):
    """
@ -348,121 +93,3 @@ async def upload_file(file: UploadFile = File(...), folder: Optional[str] = Form
    except Exception as e:
        logger.error(f"Error uploading file: {str(e)}")
        raise HTTPException(status_code=500, detail=f"File upload failed: {str(e)}")
-
-
-# Task management routes that are related to file processing
-@router.get("/api/v1/task/{task_id}/status")
-async def get_task_status(task_id: str):
-    """Get task status - simple and reliable."""
-    try:
-        status_data = task_status_store.get_status(task_id)
-        
-        if not status_data:
-            return {
-                "success": False,
-                "message": "Task does not exist or has expired",
-                "task_id": task_id,
-                "status": "not_found"
-            }
-        
-        return {
-            "success": True,
-            "message": "Task status retrieved successfully",
-            "task_id": task_id,
-            "status": status_data["status"],
-            "unique_id": status_data["unique_id"],
-            "created_at": status_data["created_at"],
-            "updated_at": status_data["updated_at"],
-            "result": status_data.get("result"),
-            "error": status_data.get("error")
-        }
-        
-    except Exception as e:
-        logger.error(f"Error getting task status: {str(e)}")
-        raise HTTPException(status_code=500, detail=f"Failed to retrieve task status: {str(e)}")
-
-
-@router.delete("/api/v1/task/{task_id}")
-async def delete_task(task_id: str):
-    """Delete task record."""
-    try:
-        success = task_status_store.delete_status(task_id)
-        if success:
-            return {
-                "success": True,
-                "message": f"Task record deleted: {task_id}",
-                "task_id": task_id
-            }
-        else:
-            return {
-                "success": False,
-                "message": f"Task record does not exist: {task_id}",
-                "task_id": task_id
-            }
-    except Exception as e:
-        logger.error(f"Error deleting task: {str(e)}")
-        raise HTTPException(status_code=500, detail=f"Failed to delete task record: {str(e)}")
-
-
-@router.get("/api/v1/tasks")
-async def list_tasks(status: Optional[str] = None, dataset_id: Optional[str] = None, limit: int = 100):
-    """List tasks with optional filters."""
-    try:
-        if status or dataset_id:
-            # Use search function
-            tasks = task_status_store.search_tasks(status=status, unique_id=dataset_id, limit=limit)
-        else:
-            # Get all tasks
-            all_tasks = task_status_store.list_all()
-            tasks = list(all_tasks.values())[:limit]
-        
-        return {
-            "success": True,
-            "message": "Task list retrieved successfully",
-            "total_tasks": len(tasks),
-            "tasks": tasks,
-            "filters": {
-                "status": status,
-                "dataset_id": dataset_id,
-                "limit": limit
-            }
-        }
-        
-    except Exception as e:
-        logger.error(f"Error listing tasks: {str(e)}")
-        raise HTTPException(status_code=500, detail=f"Failed to retrieve task list: {str(e)}")
-
-
-@router.get("/api/v1/tasks/statistics")
-async def get_task_statistics():
-    """Get task statistics."""
-    try:
-        stats = task_status_store.get_statistics()
-        
-        return {
-            "success": True,
-            "message": "Statistics retrieved successfully",
-            "statistics": stats
-        }
-        
-    except Exception as e:
-        logger.error(f"Error getting statistics: {str(e)}")
-        raise HTTPException(status_code=500, detail=f"Failed to retrieve statistics: {str(e)}")
-
-
-@router.post("/api/v1/tasks/cleanup")
-async def cleanup_tasks(older_than_days: int = 7):
-    """Clean up old task records."""
-    try:
-        deleted_count = task_status_store.cleanup_old_tasks(older_than_days=older_than_days)
-        
-        return {
-            "success": True,
-            "message": f"Cleaned up {deleted_count} old task records",
-            "deleted_count": deleted_count,
-            "older_than_days": older_than_days
-        }
-        
-    except Exception as e:
-        logger.error(f"Error cleaning up tasks: {str(e)}")
-        raise HTTPException(status_code=500, detail=f"Failed to clean up task records: {str(e)}")
--- a/routes/projects.py
+++ b/routes/projects.py
@ -6,8 +6,6 @@ import logging

 logger = logging.getLogger('app')

-from task_queue.task_status import task_status_store
-
 router = APIRouter()


@ -155,22 +153,3 @@ async def list_datasets():
    except Exception as e:
        logger.error(f"Error listing datasets: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Failed to retrieve dataset list: {str(e)}")
-
-
-@router.get("/api/v1/projects/{dataset_id}/tasks")
-async def get_project_tasks(dataset_id: str):
-    """Get all tasks for the specified project."""
-    try:
-        tasks = task_status_store.get_by_unique_id(dataset_id)
-        
-        return {
-            "success": True,
-            "message": "Project tasks retrieved successfully",
-            "dataset_id": dataset_id,
-            "total_tasks": len(tasks),
-            "tasks": tasks
-        }
-        
-    except Exception as e:
-        logger.error(f"Error getting project tasks: {str(e)}")
-        raise HTTPException(status_code=500, detail=f"Failed to retrieve project tasks: {str(e)}")
--- a/start_all_optimized.sh
+++ b/start_all_optimized.sh
@ -1,5 +1,5 @@
 #!/bin/bash
-# Optimized startup script - integrates the FastAPI application and queue consumer
+# Optimized startup script for the FastAPI application

 set -e

@ -7,7 +7,6 @@ set -e
 DEFAULT_HOST="0.0.0.0"
 DEFAULT_PORT="8001"
 DEFAULT_API_WORKERS="4"
-DEFAULT_QUEUE_WORKERS="2"
 DEFAULT_PROFILE="balanced"
 DEFAULT_LOG_LEVEL="info"
 DEFAULT_MAX_RESTARTS="3"
@ -17,7 +16,6 @@ DEFAULT_CHECK_INTERVAL="5"
 HOST=${HOST:-$DEFAULT_HOST}
 PORT=${PORT:-$DEFAULT_PORT}
 API_WORKERS=${API_WORKERS:-$DEFAULT_API_WORKERS}
-QUEUE_WORKERS=${QUEUE_WORKERS:-$DEFAULT_QUEUE_WORKERS}
 PROFILE=${PROFILE:-$DEFAULT_PROFILE}
 LOG_LEVEL=${LOG_LEVEL:-$DEFAULT_LOG_LEVEL}
 MAX_RESTARTS=${MAX_RESTARTS:-$DEFAULT_MAX_RESTARTS}
@ -47,7 +45,6 @@ print_config() {
    print_color $GREEN "Startup configuration:"
    echo "- API server: http://$HOST:$PORT"
    echo "- API worker processes: $API_WORKERS"
-    echo "- Queue worker threads: $QUEUE_WORKERS"
    echo "- Performance profile: $PROFILE"
    echo "- Log level: $LOG_LEVEL"
    echo "- Maximum restarts: $MAX_RESTARTS"
@ -87,7 +84,6 @@ create_directories() {
    print_color $YELLOW "Creating project directories..."

    directories=(
-        "projects/queue_data"
        "projects/data"
        "projects/uploads"
        "projects/robot"
@ -161,16 +157,6 @@ start_services() {
    API_PID=$!
    echo "API server PID: $API_PID"

-    # Start the queue consumer
-    print_color $BLUE "Starting queue consumer..."
-    python3 task_queue/consumer.py \
-        --workers=$QUEUE_WORKERS \
-        --worker-type=threads \
-        > queue_consumer.log 2>&1 &
-
-    CONSUMER_PID=$!
-    echo "Queue consumer PID: $CONSUMER_PID"
-
    echo
    print_color $GREEN "All services started successfully!"
    print_color $GREEN "API server: http://$HOST:$PORT"
@ -179,7 +165,7 @@ start_services() {
 }

 monitor_services() {
-    local restart_counts=(0 0)  # API, Consumer
+    local restart_counts=(0)  # API

    while true; do
        # Check the API server
@ -205,26 +191,6 @@ monitor_services() {
            fi
        fi

-        # Check the queue consumer
-        if ! kill -0 $CONSUMER_PID 2>/dev/null; then
-            print_color $RED "Queue consumer stopped unexpectedly"
-
-            if [ ${restart_counts[1]} -lt $MAX_RESTARTS ]; then
-                print_color $YELLOW "Restarting queue consumer (${restart_counts[1]} + 1/$MAX_RESTARTS)..."
-                python3 task_queue/consumer.py \
-                    --workers=$QUEUE_WORKERS \
-                    --worker-type=threads \
-                    >> queue_consumer.log 2>&1 &
-
-                CONSUMER_PID=$!
-                restart_counts[1]=$((restart_counts[1] + 1))
-                print_color $GREEN "Queue consumer restarted successfully, PID: $CONSUMER_PID"
-            else
-                print_color $RED "Queue consumer restart limit reached, stopping all services"
-                break
-            fi
-        fi
-
        # Wait for the next check interval
        sleep $CHECK_INTERVAL
    done
@ -253,25 +219,6 @@ cleanup() {
        fi
    fi

-    # Stop the queue consumer
-    if [ ! -z "$CONSUMER_PID" ] && kill -0 $CONSUMER_PID 2>/dev/null; then
-        print_color $BLUE "Stopping queue consumer (PID: $CONSUMER_PID)..."
-        kill $CONSUMER_PID 2>/dev/null || true
-
-        # Wait for graceful shutdown
-        local count=0
-        while kill -0 $CONSUMER_PID 2>/dev/null && [ $count -lt 10 ]; do
-            sleep 1
-            count=$((count + 1))
-        done
-
-        # Force terminate if it is still running
-        if kill -0 $CONSUMER_PID 2>/dev/null; then
-            print_color $RED "Force stopping queue consumer..."
-            kill -9 $CONSUMER_PID 2>/dev/null || true
-        fi
-    fi
-
    print_color $GREEN "All services have been stopped"
    exit 0
 }
@ -288,7 +235,6 @@ main() {
        echo "  HOST               API bind host address (default: $DEFAULT_HOST)"
        echo "  PORT               API bind port (default: $DEFAULT_PORT)"
        echo "  API_WORKERS        Number of API worker processes (default: $DEFAULT_API_WORKERS)"
-        echo "  QUEUE_WORKERS      Number of queue worker threads (default: $DEFAULT_QUEUE_WORKERS)"
        echo "  PROFILE            Performance profile: low_memory, balanced, high_performance (default: $DEFAULT_PROFILE)"
        echo "  LOG_LEVEL          Log level: debug, info, warning, error (default: $DEFAULT_LOG_LEVEL)"
        echo "  MAX_RESTARTS       Maximum restart count (default: $DEFAULT_MAX_RESTARTS)"
@ -296,7 +242,7 @@ main() {
        echo
        echo "Examples:"
        echo "  PROFILE=high_performance API_WORKERS=8 $0"
-        echo "  PORT=8080 QUEUE_WORKERS=4 $0"
+        echo "  PORT=8080 API_WORKERS=4 $0"
        exit 0
    fi

--- a/start_unified.py
+++ b/start_unified.py
@ -1,6 +1,6 @@
 #!/usr/bin/env python3
 """
-Optimized unified startup script combining the FastAPI application and queue consumer.
+Optimized unified startup script for the FastAPI application.
 Supports performance monitoring, automatic restart, graceful shutdown, and related features.
 """

@ -17,7 +17,7 @@ from typing import List, Optional, Dict, Any


 class ProcessManager:
-    """Process manager that controls the API service and queue consumer."""
+    """Process manager that controls the API service."""
    
    def __init__(self):
        self.processes: Dict[str, subprocess.Popen] = {}
@ -78,44 +78,6 @@ class ProcessManager:
            print(f"Failed to start API server: {e}")
            return None
    
-    def start_queue_consumer(self, args) -> Optional[subprocess.Popen]:
-        """Start the queue consumer."""
-        print("Starting queue consumer...")
-
-        consumer_script = Path("task_queue/consumer.py")
-        if not consumer_script.exists():
-            consumer_script = consumer_script.with_suffix(".pyc")
-
-        # Build the queue consumer command
-        cmd = [
-            sys.executable,
-            str(consumer_script),
-            "--workers", str(args.queue_workers),
-            "--worker-type", args.worker_type
-        ]
-
-        try:
-            process = subprocess.Popen(
-                cmd,
-                stdout=subprocess.PIPE,
-                stderr=subprocess.STDOUT,
-                universal_newlines=True,
-                bufsize=1
-            )
-
-            # Start the output monitoring thread
-            threading.Thread(
-                target=self._monitor_output,
-                args=(process, "Queue consumer"),
-                daemon=True
-            ).start()
-
-            return process
-
-        except Exception as e:
-            print(f"Failed to start queue consumer: {e}")
-            return None
-    
    def _monitor_output(self, process: subprocess.Popen, name: str):
        """Monitor process output."""
        try:
@ -138,8 +100,6 @@ class ProcessManager:

        if name == "API server":
            new_process = self.start_api_server(args)
-        elif name == "Queue consumer":
-            new_process = self.start_queue_consumer(args)
        else:
            return False

@ -169,27 +129,19 @@ class ProcessManager:
            print("Failed to start API server; exiting")
            return False

-        queue_process = self.start_queue_consumer(args)
-        if not queue_process:
-            print("Failed to start queue consumer; exiting")
-            api_process.terminate()
-            return False
-
        self.processes["API server"] = api_process
-        self.processes["Queue consumer"] = queue_process

        print("\n" + "=" * 70)
        print("All services started successfully!")
        print(f"API server: http://{args.host}:{args.port}")
        print(f"API PID: {api_process.pid}")
-        print(f"Queue consumer PID: {queue_process.pid}")
        print("Press Ctrl+C to stop all services")
        print("=" * 70 + "\n")

        self.running = True

        # Main monitoring loop
-        restart_counts = {"API server": 0, "Queue consumer": 0}
+        restart_counts = {"API server": 0}
        max_restarts = args.max_restarts

        while self.running and not self.shutdown_event.is_set():
@ -262,7 +214,6 @@ class ProcessManager:
    def create_directories(self):
        """Create the required directories."""
        directories = [
-            "projects/queue_data",
            "projects/data",
            "projects/uploads",
            "projects/robot",
@ -313,11 +264,6 @@ def parse_args():
    parser.add_argument("--log-level", type=str, default="info",
                       choices=["debug", "info", "warning", "error"], help="Log level")

-    # Queue consumer configuration
-    parser.add_argument("--queue-workers", type=int, default=2, help="Number of queue consumer worker threads")
-    parser.add_argument("--worker-type", type=str, default="threads",
-                       choices=["threads", "greenlets", "gevent"], help="Queue worker type")
-
    # Performance profile
    parser.add_argument("--profile", type=str, default="low_memory",
                       choices=["low_memory", "balanced", "high_performance"], help="Performance profile")
--- a/task_queue/README.md
+++ b/task_queue/README.md
@ -1,154 +0,0 @@
-# 队列系统使用说明
-
-## 概述
-
-本项目集成了基于 huey 和 SqliteHuey 的异步队列系统，用于处理文件的异步处理任务。
-
-## 安装依赖
-
-```bash
-pip install huey
-```
-
-## 目录结构
-
-```
-queue/
-├── __init__.py          # 包初始化文件
-├── config.py           # 队列配置（SqliteHuey配置）
-├── tasks.py            # 文件处理任务定义
-├── manager.py          # 队列管理器
-├── consumer.py         # 队列消费者（工作进程）
-├── example.py          # 使用示例
-└── README.md           # 说明文档
-```
-
-## 核心功能
-
-### 1. 队列配置 (config.py)
- 使用 SqliteHuey 作为消息队列
- 数据库文件存储在 `queue_data/huey.db`
- 支持任务重试和错误存储
-
-### 2. 文件处理任务 (tasks.py)
- `process_file_async`: 异步处理单个文件
- `process_multiple_files_async`: 批量异步处理文件
- `process_zip_file_async`: 异步处理zip压缩文件
- `cleanup_processed_files`: 清理旧的文件
-
-### 3. 队列管理器 (manager.py)
- 任务提交和管理
- 队列状态监控
- 任务结果查询
- 任务记录清理
-
-## 使用方法
-
-### 1. 启动队列消费者
-
-```bash
-# 启动默认配置的消费者
-python queue/consumer.py
-
-# 指定工作线程数
-python queue/consumer.py --workers 4
-
-# 查看队列统计信息
-python queue/consumer.py --stats
-
-# 检查队列状态
-python queue/consumer.py --check
-
-# 清空队列
-python queue/consumer.py --flush
-```
-
-### 2. 在代码中使用队列
-
-```python
-from queue.manager import queue_manager
-
-# 处理单个文件
-task_id = queue_manager.enqueue_file(
-    project_id="my_project",
-    file_path="/path/to/file.txt",
-    original_filename="myfile.txt"
-)
-
-# 批量处理文件
-task_ids = queue_manager.enqueue_multiple_files(
-    project_id="my_project",
-    file_paths=["/path/file1.txt", "/path/file2.txt"],
-    original_filenames=["file1.txt", "file2.txt"]
-)
-
-# 处理zip文件
-task_id = queue_manager.enqueue_zip_file(
-    project_id="my_project",
-    zip_path="/path/to/archive.zip"
-)
-
-# 查看任务状态
-status = queue_manager.get_task_status(task_id)
-print(status)
-
-# 获取队列统计信息
-stats = queue_manager.get_queue_stats()
-print(stats)
-```
-
-### 3. 运行示例
-
-```bash
-python queue/example.py
-```
-
-## 配置说明
-
-### 队列配置参数 (config.py)
- `filename`: SQLite数据库文件路径
- `always_eager`: 是否立即执行任务（开发时可设为True）
- `utc`: 是否使用UTC时间
- `compression_level`: 压缩级别
- `store_errors`: 是否存储错误信息
- `max_retries`: 最大重试次数
- `retry_delay`: 重试延迟
-
-### 消费者参数 (consumer.py)
- `--workers`: 工作线程数（默认2）
- `--worker-type`: 工作类型（threads/greenlets/processes）
- `--stats`: 显示统计信息
- `--check`: 检查队列状态
- `--flush`: 清空队列
-
-## 任务状态
-
- `pending`: 等待处理
- `running`: 正在处理
- `complete/finished`: 处理完成
- `error`: 处理失败
- `scheduled`: 定时任务
-
-## 最佳实践
-
-1. **生产环境建议**:
-   - 设置合适的工作线程数（建议CPU核心数的1-2倍）
-   - 定期清理旧的任务记录
-   - 监控队列状态和任务执行情况
-
-2. **开发环境建议**:
-   - 可以设置 `always_eager=True` 立即执行任务进行调试
-   - 使用 `--check` 参数查看队列状态
-   - 运行示例代码了解功能
-
-3. **错误处理**:
-   - 任务失败后会自动重试（最多3次）
-   - 错误信息会存储在数据库中
-   - 可以通过 `get_task_status()` 查看错误详情
-
-## 故障排除
-
-1. **数据库锁定**: 确保只有一个消费者实例在运行
-2. **任务卡住**: 检查文件路径和权限
-3. **内存不足**: 调整工作线程数或使用进程模式
-4. **磁盘空间**: 定期清理旧文件和任务记录
--- a/task_queue/init.py
+++ b/task_queue/init.py
@ -1,23 +0,0 @@
-#!/usr/bin/env python3
-"""
-Queue package initialization.
-"""
-
-from .config import huey
-from .manager import QueueManager, queue_manager
-from .tasks import (
-    process_file_async,
-    process_multiple_files_async,
-    process_zip_file_async,
-    cleanup_processed_files
-)
-
-__all__ = [
-    "huey",
-    "QueueManager",
-    "queue_manager",
-    "process_file_async",
-    "process_multiple_files_async",
-    "process_zip_file_async",
-    "cleanup_processed_files"
-]
--- a/task_queue/config.py
+++ b/task_queue/config.py
@ -1,31 +0,0 @@
-#!/usr/bin/env python3
-"""
-Queue configuration using SqliteHuey for asynchronous file processing.
-"""
-
-import os
-import logging
-from huey import SqliteHuey
-from datetime import timedelta
-
-# Configure logging
-logger = logging.getLogger('app')
-
-# Ensure projects/queue_data directory exists
-queue_data_dir = os.path.join(os.path.dirname(__file__), '..', 'projects', 'queue_data')
-os.makedirs(queue_data_dir, exist_ok=True)
-
-# Initialize SqliteHuey
-huey = SqliteHuey(
-    filename=os.path.join(queue_data_dir, 'huey.db'),
-    name='file_processor',  # Queue name
-    always_eager=False,  # Set to False to enable async processing
-    utc=True,  # Use UTC time
-)
-
-# Set default task configuration
-huey.store_errors = True  # Store error information
-huey.max_retries = 3  # Maximum retry count
-huey.retry_delay = timedelta(seconds=60)  # Retry delay
-
-logger.info(f"SqliteHuey queue initialized, database path: {os.path.join(queue_data_dir, 'huey.db')}")
--- a/task_queue/consumer.py
+++ b/task_queue/consumer.py
@ -1,171 +0,0 @@
-#!/usr/bin/env python3
-"""
-Queue consumer for processing file tasks.
-"""
-
-import sys
-import os
-import time
-import signal
-import argparse
-from pathlib import Path
-
-# Add project root directory to Python path
-project_root = Path(__file__).parent.parent
-sys.path.insert(0, str(project_root))
-
-from task_queue.config import huey
-from task_queue.manager import queue_manager
-from task_queue.integration_tasks import process_files_async, cleanup_project_async
-from huey.consumer import Consumer
-
-
-class QueueConsumer:
-    """Queue consumer for processing async tasks."""
-
-    def __init__(self, worker_type: str = "threads", workers: int = 2):
-        self.huey = huey
-        self.worker_type = worker_type
-        self.workers = workers
-        self.running = False
-        self.consumer = None
-
-        # Register signal handlers
-        signal.signal(signal.SIGINT, self._signal_handler)
-        signal.signal(signal.SIGTERM, self._signal_handler)
-
-    def _signal_handler(self, signum, frame):
-        """Signal handler for graceful shutdown."""
-        print(f"\nReceived signal {signum}, shutting down queue consumer...")
-        self.running = False
-
-    def start(self):
-        """Start the queue consumer."""
-        print(f"Starting queue consumer...")
-        print(f"Worker threads: {self.workers}")
-        print(f"Worker type: {self.worker_type}")
-        print(f"Database: {os.path.join(os.path.dirname(__file__), '..', 'projects', 'queue_data', 'huey.db')}")
-        print("Press Ctrl+C to stop the consumer")
-
-        self.running = True
-
-        try:
-            # Create Huey consumer
-            self.consumer = Consumer(self.huey, workers=self.workers, worker_type=self.worker_type.rstrip('s'))
-
-            # Display queue statistics
-            stats = queue_manager.get_queue_stats()
-            print(f"Current queue status: {stats}")
-
-            # Start consumer run loop
-            print("Consumer starting task processing...")
-            self.consumer.run()
-
-        except KeyboardInterrupt:
-            print("\nReceived interrupt signal, shutting down...")
-        except Exception as e:
-            print(f"Queue consumer runtime error: {str(e)}")
-        finally:
-            self.stop()
-
-    def stop(self):
-        """Stop the queue consumer."""
-        print("Stopping queue consumer...")
-        try:
-            if self.consumer:
-                # Stop the consumer
-                self.consumer.stop()
-                self.consumer = None
-            print("Queue consumer stopped")
-        except Exception as e:
-            print(f"Error stopping queue consumer: {str(e)}")
-
-    def process_scheduled_tasks(self):
-        """Process scheduled tasks."""
-        print("Processing scheduled tasks...")
-        # Additional scheduled task processing logic can be added here
-
-
-def main():
-    """Main entry point."""
-    parser = argparse.ArgumentParser(description="File processing queue consumer")
-    parser.add_argument(
-        "--workers",
-        type=int,
-        default=2,
-        help="Number of worker threads (default: 2)"
-    )
-    parser.add_argument(
-        "--worker-type",
-        choices=["threads", "greenlets", "processes"],
-        default="threads",
-        help="Worker thread type (default: threads)"
-    )
-    parser.add_argument(
-        "--stats",
-        action="store_true",
-        help="Display queue statistics and exit"
-    )
-    parser.add_argument(
-        "--flush",
-        action="store_true",
-        help="Flush the queue and exit"
-    )
-    parser.add_argument(
-        "--check",
-        action="store_true",
-        help="Check queue status and exit"
-    )
-
-    args = parser.parse_args()
-
-    # Initialize consumer
-    consumer = QueueConsumer(
-        worker_type=args.worker_type,
-        workers=args.workers
-    )
-
-    # Handle different command-line options
-    if args.stats:
-        print("=== Queue Statistics ===")
-        stats = queue_manager.get_queue_stats()
-        print(f"Total tasks: {stats.get('total_tasks', 0)}")
-        print(f"Pending tasks: {stats.get('pending_tasks', 0)}")
-        print(f"Running tasks: {stats.get('running_tasks', 0)}")
-        print(f"Completed tasks: {stats.get('completed_tasks', 0)}")
-        print(f"Error tasks: {stats.get('error_tasks', 0)}")
-        print(f"Scheduled tasks: {stats.get('scheduled_tasks', 0)}")
-        print(f"Database: {stats.get('queue_database', 'N/A')}")
-        return
-
-    if args.flush:
-        print("=== Flushing Queue ===")
-        try:
-            # Flush all tasks
-            consumer.huey.flush()
-            print("Queue flushed")
-        except Exception as e:
-            print(f"Failed to flush queue: {str(e)}")
-        return
-
-    if args.check:
-        print("=== Checking Queue Status ===")
-        stats = queue_manager.get_queue_stats()
-        print(f"Queue status: OK" if "error" not in stats else f"Queue status: ERROR - {stats['error']}")
-
-        pending_tasks = queue_manager.list_pending_tasks(limit=10)
-        if pending_tasks:
-            print(f"\nPending tasks (showing up to 10):")
-            for task in pending_tasks:
-                print(f"  Task ID: {task['task_id']}, Status: {task['status']}, Created: {task['created_time']}")
-        else:
-            print("No pending tasks")
-        return
-
-    # Start consumer
-    print("=== Starting File Processing Queue Consumer ===")
-    consumer.start()
-
-
-if __name__ == "__main__":
-    main()
--- a/task_queue/example.py
+++ b/task_queue/example.py
@ -1,132 +0,0 @@
-#!/usr/bin/env python3
-"""
-Example usage of the queue system.
-"""
-
-import sys
-import time
-from pathlib import Path
-
-# Add project root directory to Python path
-project_root = Path(__file__).parent.parent
-sys.path.insert(0, str(project_root))
-
-from task_queue.manager import queue_manager
-from task_queue.tasks import process_file_async, process_multiple_files_async
-
-
-def example_single_file():
-    """Example: Process a single file."""
-    print("=== Example: Process a single file ===")
-
-    project_id = "test_project"
-    file_path = "public/test_document.txt"
-
-    # Enqueue file for processing
-    task_id = queue_manager.enqueue_file(
-        project_id=project_id,
-        file_path=file_path,
-        original_filename="example_document.txt"
-    )
-
-    print(f"Task submitted, task ID: {task_id}")
-
-    # Check task status
-    time.sleep(2)
-    status = queue_manager.get_task_status(task_id)
-    print(f"Task status: {status}")
-
-
-def example_multiple_files():
-    """Example: Batch process files."""
-    print("\n=== Example: Batch process files ===")
-
-    project_id = "test_project_batch"
-    file_paths = [
-        "public/test_document.txt",
-        "public/goods.xlsx"  # Assuming this file exists
-    ]
-    original_filenames = [
-        "batch_document_1.txt",
-        "batch_goods.xlsx"
-    ]
-
-    # Enqueue multiple files for processing
-    task_ids = queue_manager.enqueue_multiple_files(
-        project_id=project_id,
-        file_paths=file_paths,
-        original_filenames=original_filenames
-    )
-
-    print(f"Batch tasks submitted, task IDs: {task_ids}")
-
-
-def example_zip_file():
-    """Example: Process a zip file."""
-    print("\n=== Example: Process a zip file ===")
-
-    project_id = "test_project_zip"
-    zip_path = "public/all_hp_product_spec_book2506.zip"
-
-    # Enqueue zip file for processing
-    task_id = queue_manager.enqueue_zip_file(
-        project_id=project_id,
-        zip_path=zip_path
-    )
-
-    print(f"Zip task submitted, task ID: {task_id}")
-
-
-def example_queue_stats():
-    """Example: Get queue statistics."""
-    print("\n=== Example: Queue statistics ===")
-
-    stats = queue_manager.get_queue_stats()
-    print("Queue statistics:")
-    for key, value in stats.items():
-        if key != "recent_tasks":
-            print(f"  {key}: {value}")
-
-
-def example_cleanup():
-    """Example: Cleanup tasks."""
-    print("\n=== Example: Cleanup tasks ===")
-
-    project_id = "test_project"
-
-    # Enqueue cleanup task (delayed 10 seconds)
-    task_id = queue_manager.enqueue_cleanup_task(
-        project_id=project_id,
-        older_than_days=1,  # Clean files older than 1 day
-        delay=10
-    )
-
-    print(f"Cleanup task submitted, task ID: {task_id}")
-
-
-def main():
-    """Main entry point."""
-    print("Queue System Usage Examples")
-    print("=" * 50)
-
-    try:
-        # Run examples
-        example_single_file()
-        example_multiple_files()
-        example_zip_file()
-        example_queue_stats()
-        example_cleanup()
-
-        print("\n" + "=" * 50)
-        print("Examples completed!")
-        print("\nTo check task execution, run:")
-        print("python queue/consumer.py --check")
-        print("\nTo start the queue consumer, run:")
-        print("python queue/consumer.py")
-
-    except Exception as e:
-        print(f"Error running examples: {str(e)}")
-
-
-if __name__ == "__main__":
-    main()
--- a/task_queue/integration_tasks.py
+++ b/task_queue/integration_tasks.py
@ -1,499 +0,0 @@
-#!/usr/bin/env python3
-"""
-Queue tasks for file processing integration.
-"""
-
-import os
-import json
-import time
-import hashlib
-import shutil
-from typing import Dict, List, Optional, Any
-
-from task_queue.config import huey
-from task_queue.manager import queue_manager
-from task_queue.task_status import task_status_store
-from utils import download_dataset_files, save_processed_files_log, load_processed_files_log
-from utils.dataset_manager import remove_dataset_directory_by_key
-
-
-def scan_upload_folder(upload_dir: str) -> List[str]:
-    """
-    Scan all supported file formats in the upload folder.
-
-    Args:
-        upload_dir: Upload folder path
-
-    Returns:
-        List[str]: List of supported file paths
-    """
-    supported_extensions = {
-        # Text files
-        '.txt', '.md', '.rtf',
-        # Document files
-        '.doc', '.docx', '.pdf', '.odt',
-        # Spreadsheet files
-        '.xls', '.xlsx', '.csv', '.ods',
-        # Presentation files
-        '.ppt', '.pptx', '.odp',
-        # E-books
-        '.epub', '.mobi',
-        # Web files
-        '.html', '.htm',
-        # Config files
-        '.json', '.xml', '.yaml', '.yml',
-        # Code files
-        '.py', '.js', '.java', '.cpp', '.c', '.go', '.rs',
-        # Archive files
-        '.zip', '.rar', '.7z', '.tar', '.gz'
-    }
-
-    scanned_files = []
-
-    if not os.path.exists(upload_dir):
-        return scanned_files
-
-    for root, dirs, files in os.walk(upload_dir):
-        for file in files:
-            # Skip hidden files and system files
-            if file.startswith('.') or file.startswith('~'):
-                continue
-
-            file_path = os.path.join(root, file)
-            file_extension = os.path.splitext(file)[1].lower()
-
-            # Check if file extension is supported
-            if file_extension in supported_extensions:
-                scanned_files.append(file_path)
-            else:
-                # For files without extension, try to process them (may be text files)
-                if not file_extension:
-                    try:
-                        # Try reading the file header to determine if it's a text file
-                        with open(file_path, 'r', encoding='utf-8') as f:
-                            f.read(1024)  # Read the first 1KB
-                        scanned_files.append(file_path)
-                    except (UnicodeDecodeError, PermissionError):
-                        # Not a text file or unreadable, skip
-                        pass
-
-    return scanned_files
-
-
-@huey.task()
-def process_files_async(
-    dataset_id: str,
-    files: Optional[Dict[str, List[str]]] = None,
-    upload_folder: Optional[Dict[str, str]] = None,
-    task_id: Optional[str] = None
-) -> Dict[str, Any]:
-    """
-    Asynchronously process file tasks - compatible with existing files/process API.
-
-    Args:
-        dataset_id: Unique project ID
-        files: Dictionary of file paths grouped by key
-        upload_folder: Upload folder dictionary organized by group name, e.g. {'group1': 'my_project1', 'group2': 'my_project2'}
-        task_id: Task ID (for status tracking)
-
-    Returns:
-        Processing result dictionary
-    """
-    try:
-        print(f"Starting async file processing task, project ID: {dataset_id}")
-
-        # If task_id is provided, set initial status
-        if task_id:
-            task_status_store.set_status(
-                task_id=task_id,
-                unique_id=dataset_id,
-                status="running"
-            )
-
-        # Ensure project directory exists
-        project_dir = os.path.join("projects", "data", dataset_id)
-        if not os.path.exists(project_dir):
-            os.makedirs(project_dir, exist_ok=True)
-
-        # Process files: use key-grouped format
-        processed_files_by_key = {}
-
-        # If upload_folder is provided, scan files in those folders
-        if upload_folder and not files:
-            scanned_files_by_group = {}
-            total_scanned_files = 0
-
-            for group_name, folder_name in upload_folder.items():
-                # Security check: prevent path traversal attacks
-                safe_folder_name = os.path.basename(folder_name)
-                upload_dir = os.path.join("projects", "uploads", safe_folder_name)
-
-                if os.path.exists(upload_dir):
-                    scanned_files = scan_upload_folder(upload_dir)
-                    if scanned_files:
-                        scanned_files_by_group[group_name] = scanned_files
-                        total_scanned_files += len(scanned_files)
-                        print(f"Scanned {len(scanned_files)} files from upload folder '{safe_folder_name}' (group: {group_name})")
-                    else:
-                        print(f"No supported files found in upload folder '{safe_folder_name}' (group: {group_name})")
-                else:
-                    print(f"Upload folder does not exist: {upload_dir} (group: {group_name})")
-
-            if scanned_files_by_group:
-                files = scanned_files_by_group
-                print(f"Total scanned {total_scanned_files} files from {len(scanned_files_by_group)} groups")
-            else:
-                print("No supported files found in any upload folder")
-
-        if files:
-            # Use files from the request (grouped by key)
-            # Since this is an async task, call synchronously
-            import asyncio
-            try:
-                loop = asyncio.get_event_loop()
-            except RuntimeError:
-                loop = asyncio.new_event_loop()
-                asyncio.set_event_loop(loop)
-
-            processed_files_by_key = loop.run_until_complete(download_dataset_files(dataset_id, files))
-            total_files = sum(len(files_list) for files_list in processed_files_by_key.values())
-            print(f"Async processed {total_files} dataset files across {len(processed_files_by_key)} keys, project ID: {dataset_id}")
-        else:
-            print(f"No files provided in request, project ID: {dataset_id}")
-
-        # Collect all document.txt files in the project directory
-        document_files = []
-        for root, dirs, files_list in os.walk(project_dir):
-            for file in files_list:
-                if file == "document.txt":
-                    document_files.append(os.path.join(root, file))
-
-        # Generate project README.md file
-        try:
-            from utils.project_manager import save_project_readme
-            save_project_readme(dataset_id)
-            print(f"README.md generated, project ID: {dataset_id}")
-        except Exception as e:
-            print(f"Failed to generate README.md, project ID: {dataset_id}, error: {str(e)}")
-            # Does not affect main processing flow, continue
-
-        # Build result file list
-        result_files = []
-        for key in processed_files_by_key.keys():
-            # Add corresponding dataset document.txt path
-            document_path = os.path.join("projects", "data", dataset_id, "datasets", key, "document.txt")
-            if os.path.exists(document_path):
-                result_files.append(document_path)
-
-        # Also add document.txt files that exist but are not in processed_files_by_key
-        existing_document_paths = set(result_files)  # Avoid duplicates
-        for doc_file in document_files:
-            if doc_file not in existing_document_paths:
-                result_files.append(doc_file)
-
-        result = {
-            "status": "success",
-            "message": f"Successfully processed {len(result_files)} document files across {len(processed_files_by_key)} keys",
-            "dataset_id": dataset_id,
-            "processed_files": result_files,
-            "processed_files_by_key": processed_files_by_key,
-            "document_files": document_files,
-            "total_files_processed": sum(len(files_list) for files_list in processed_files_by_key.values()),
-            "processing_time": time.time()
-        }
-
-        # Update task status to completed
-        if task_id:
-            task_status_store.update_status(
-                task_id=task_id,
-                status="completed",
-                result=result
-            )
-
-        print(f"Async file processing task completed: {dataset_id}")
-        return result
-
-    except Exception as e:
-        error_msg = f"Error during async file processing: {str(e)}"
-        print(error_msg)
-
-        # Update task status to error
-        if task_id:
-            task_status_store.update_status(
-                task_id=task_id,
-                status="failed",
-                error=error_msg
-            )
-
-        return {
-            "status": "error",
-            "message": error_msg,
-            "dataset_id": dataset_id,
-            "error": str(e)
-        }
-
-
-@huey.task()
-def process_files_incremental_async(
-    dataset_id: str,
-    files_to_add: Optional[Dict[str, List[str]]] = None,
-    files_to_remove: Optional[Dict[str, List[str]]] = None,
-    system_prompt: Optional[str] = None,
-    mcp_settings: Optional[List[Dict]] = None,
-    task_id: Optional[str] = None
-) -> Dict[str, Any]:
-    """
-    Incremental file processing task - supports adding and removing files.
-
-    Args:
-        dataset_id: Unique project ID
-        files_to_add: Dictionary of file paths to add, grouped by key
-        files_to_remove: Dictionary of file paths to remove, grouped by key
-        system_prompt: System prompt
-        mcp_settings: MCP settings
-        task_id: Task ID (for status tracking)
-
-    Returns:
-        Processing result dictionary
-    """
-    try:
-        print(f"Starting incremental file processing task, project ID: {dataset_id}")
-
-        # If task_id is provided, set initial status
-        if task_id:
-            task_status_store.set_status(
-                task_id=task_id,
-                unique_id=dataset_id,
-                status="running"
-            )
-
-        # Ensure project directory exists
-        project_dir = os.path.join("projects", "data", dataset_id)
-        if not os.path.exists(project_dir):
-            os.makedirs(project_dir, exist_ok=True)
-
-        # Load existing processing log
-        processed_log = load_processed_files_log(dataset_id)
-        print(f"Loaded existing processing log with {len(processed_log)} file records")
-
-        removed_files = []
-        added_files = []
-
-        # 1. Process removals
-        if files_to_remove:
-            print(f"Starting removal processing across {len(files_to_remove)} key groups")
-            for key, file_list in files_to_remove.items():
-                if not file_list:  # If file list is empty, remove the entire key group
-                    print(f"Removing entire key group: {key}")
-                    if remove_dataset_directory_by_key(dataset_id, key):
-                        removed_files.append(f"dataset/{key}")
-
-                    # Remove all records for this key from the processing log
-                    keys_to_remove = [file_hash for file_hash, file_info in processed_log.items()
-                                    if file_info.get('key') == key]
-                    for file_hash in keys_to_remove:
-                        del processed_log[file_hash]
-                        removed_files.append(f"log_entry:{file_hash}")
-                else:
-                    # Remove specific files
-                    for file_path in file_list:
-                        print(f"Removing specific file: {key}/{file_path}")
-
-                        # Actually delete the file
-                        filename = os.path.basename(file_path)
-
-                        # Delete original file
-                        source_file = os.path.join("projects", "data", dataset_id, "files", key, filename)
-                        if os.path.exists(source_file):
-                            os.remove(source_file)
-                            removed_files.append(f"file:{key}/{filename}")
-
-                        # Delete processed file directory
-                        processed_dir = os.path.join("projects", "data", dataset_id, "processed", key, filename)
-                        if os.path.exists(processed_dir):
-                            shutil.rmtree(processed_dir)
-                            removed_files.append(f"processed:{key}/{filename}")
-
-                        # Compute file hash to find in log
-                        file_hash = hashlib.md5(file_path.encode('utf-8')).hexdigest()
-
-                        # Remove from processing log
-                        if file_hash in processed_log:
-                            del processed_log[file_hash]
-                            removed_files.append(f"log_entry:{file_hash}")
-
-        # 2. Process additions
-        processed_files_by_key = {}
-        if files_to_add:
-            print(f"Starting addition processing across {len(files_to_add)} key groups")
-            # Use async processing to download files
-            import asyncio
-            try:
-                loop = asyncio.get_event_loop()
-            except RuntimeError:
-                loop = asyncio.new_event_loop()
-                asyncio.set_event_loop(loop)
-
-            processed_files_by_key = loop.run_until_complete(download_dataset_files(dataset_id, files_to_add, incremental_mode=True))
-            total_added_files = sum(len(files_list) for files_list in processed_files_by_key.values())
-            print(f"Async processed {total_added_files} dataset files across {len(processed_files_by_key)} keys, project ID: {dataset_id}")
-
-            # Record added files
-            for key, files_list in processed_files_by_key.items():
-                for file_path in files_list:
-                    added_files.append(f"{key}/{file_path}")
-        else:
-            print(f"No files to add provided in request, project ID: {dataset_id}")
-
-        # Save updated processing log
-        save_processed_files_log(dataset_id, processed_log)
-        print(f"Updated processing log, now contains {len(processed_log)} file records")
-
-        # Save system_prompt and mcp_settings to project directory (if provided)
-        if system_prompt:
-            system_prompt_file = os.path.join(project_dir, "system_prompt.md")
-            with open(system_prompt_file, 'w', encoding='utf-8') as f:
-                f.write(system_prompt)
-            print(f"Saved system_prompt, project ID: {dataset_id}")
-
-        if mcp_settings:
-            mcp_settings_file = os.path.join(project_dir, "mcp_settings.json")
-            with open(mcp_settings_file, 'w', encoding='utf-8') as f:
-                json.dump(mcp_settings, f, ensure_ascii=False, indent=2)
-            print(f"Saved mcp_settings, project ID: {dataset_id}")
-
-        # Generate project README.md file
-        try:
-            from utils.project_manager import save_project_readme
-            save_project_readme(dataset_id)
-            print(f"README.md generated, project ID: {dataset_id}")
-        except Exception as e:
-            print(f"Failed to generate README.md, project ID: {dataset_id}, error: {str(e)}")
-            # Does not affect main processing flow, continue
-
-        # Collect all document.txt files in the project directory
-        document_files = []
-        for root, dirs, files_list in os.walk(project_dir):
-            for file in files_list:
-                if file == "document.txt":
-                    document_files.append(os.path.join(root, file))
-
-        # Build result file list
-        result_files = []
-        for key in processed_files_by_key.keys():
-            # Add corresponding dataset document.txt path
-            document_path = os.path.join("projects", "data", dataset_id, "datasets", key, "document.txt")
-            if os.path.exists(document_path):
-                result_files.append(document_path)
-
-        # Also add document.txt files that exist but are not in processed_files_by_key
-        existing_document_paths = set(result_files)  # Avoid duplicates
-        for doc_file in document_files:
-            if doc_file not in existing_document_paths:
-                result_files.append(doc_file)
-
-        result = {
-            "status": "success",
-            "message": f"Incremental processing complete - added {len(added_files)} files, removed {len(removed_files)} files, {len(result_files)} document files remaining",
-            "dataset_id": dataset_id,
-            "removed_files": removed_files,
-            "added_files": added_files,
-            "processed_files": result_files,
-            "processed_files_by_key": processed_files_by_key,
-            "document_files": document_files,
-            "total_files_added": sum(len(files_list) for files_list in processed_files_by_key.values()),
-            "total_files_removed": len(removed_files),
-            "final_files_count": len(result_files),
-            "processing_time": time.time()
-        }
-
-        # Update task status to completed
-        if task_id:
-            task_status_store.update_status(
-                task_id=task_id,
-                status="completed",
-                result=result
-            )
-
-        print(f"Incremental file processing task completed: {dataset_id}")
-        return result
-
-    except Exception as e:
-        error_msg = f"Error during incremental file processing: {str(e)}"
-        print(error_msg)
-
-        # Update task status to error
-        if task_id:
-            task_status_store.update_status(
-                task_id=task_id,
-                status="failed",
-                error=error_msg
-            )
-
-        return {
-            "status": "error",
-            "message": error_msg,
-            "dataset_id": dataset_id,
-            "error": str(e)
-        }
-
-
-@huey.task()
-def cleanup_project_async(
-    dataset_id: str,
-    remove_all: bool = False
-) -> Dict[str, Any]:
-    """
-    Asynchronously clean up project files.
-
-    Args:
-        dataset_id: Unique project ID
-        remove_all: Whether to remove the entire project directory
-
-    Returns:
-        Cleanup result dictionary
-    """
-    try:
-        print(f"Starting async project cleanup, project ID: {dataset_id}")
-
-        project_dir = os.path.join("projects", "data", dataset_id)
-        removed_items = []
-
-        if remove_all and os.path.exists(project_dir):
-            import shutil
-            shutil.rmtree(project_dir)
-            removed_items.append(project_dir)
-            result = {
-                "status": "success",
-                "message": f"Deleted entire project directory: {project_dir}",
-                "dataset_id": dataset_id,
-                "removed_items": removed_items,
-                "action": "remove_all"
-            }
-        else:
-            # Only clean processing log
-            log_file = os.path.join(project_dir, "processed_files.json")
-            if os.path.exists(log_file):
-                os.remove(log_file)
-                removed_items.append(log_file)
-
-            result = {
-                "status": "success",
-                "message": f"Cleaned project processing log, project ID: {dataset_id}",
-                "dataset_id": dataset_id,
-                "removed_items": removed_items,
-                "action": "cleanup_logs"
-            }
-
-        print(f"Async cleanup task completed: {dataset_id}")
-        return result
-
-    except Exception as e:
-        error_msg = f"Error during async project cleanup: {str(e)}"
-        print(error_msg)
-        return {
-            "status": "error",
-            "message": error_msg,
-            "dataset_id": dataset_id,
-            "error": str(e)
-        }
--- a/task_queue/manager.py
+++ b/task_queue/manager.py
@ -1,228 +0,0 @@
-#!/usr/bin/env python3
-"""
-Queue manager for handling file processing queues.
-"""
-
-import os
-import json
-import time
-import logging
-from typing import Dict, List, Optional, Any
-from huey import Huey
-from huey.api import Task
-from datetime import datetime, timedelta
-
-# Configure logging
-logger = logging.getLogger('app')
-
-from .config import huey
-from .tasks import process_file_async, process_multiple_files_async, process_zip_file_async, cleanup_processed_files
-
-
-class QueueManager:
-    """Queue manager for file processing tasks."""
-
-    def __init__(self):
-        self.huey = huey
-        logger.info(f"Queue manager initialized with database: {os.path.join(os.path.dirname(__file__), '..', 'projects', 'queue_data', 'huey.db')}")
-
-    def enqueue_file(
-        self,
-        project_id: str,
-        file_path: str,
-        original_filename: str = None,
-        delay: int = 0
-    ) -> str:
-        """
-        Add a file to the processing queue.
-
-        Args:
-            project_id: Project ID
-            file_path: File path
-            original_filename: Original filename
-            delay: Delay before execution in seconds
-
-        Returns:
-            Task ID
-        """
-        if delay > 0:
-            task = process_file_async.schedule(
-                args=(project_id, file_path, original_filename),
-                delay=timedelta(seconds=delay)
-            )
-        else:
-            task = process_file_async(project_id, file_path, original_filename)
-
-        logger.info(f"File queued for processing: {file_path}, task ID: {task.id}")
-        return task.id
-
-    def enqueue_multiple_files(
-        self,
-        project_id: str,
-        file_paths: List[str],
-        original_filenames: List[str] = None,
-        delay: int = 0
-    ) -> List[str]:
-        """
-        Add multiple files to the processing queue.
-
-        Args:
-            project_id: Project ID
-            file_paths: List of file paths
-            original_filenames: List of original filenames
-            delay: Delay before execution in seconds
-
-        Returns:
-            List of task IDs
-        """
-        if delay > 0:
-            task = process_multiple_files_async.schedule(
-                args=(project_id, file_paths, original_filenames),
-                delay=timedelta(seconds=delay)
-            )
-        else:
-            task = process_multiple_files_async(project_id, file_paths, original_filenames)
-
-        logger.info(f"Batch files queued for processing: {len(file_paths)} files, task ID: {task.id}")
-        return [task.id]
-
-    def enqueue_zip_file(
-        self,
-        project_id: str,
-        zip_path: str,
-        extract_to: str = None,
-        delay: int = 0
-    ) -> str:
-        """
-        Add a zip file to the processing queue.
-
-        Args:
-            project_id: Project ID
-            zip_path: Path to the zip file
-            extract_to: Extraction target directory
-            delay: Delay before execution in seconds
-
-        Returns:
-            Task ID
-        """
-        if delay > 0:
-            task = process_zip_file_async.schedule(
-                args=(project_id, zip_path, extract_to),
-                delay=timedelta(seconds=delay)
-            )
-        else:
-            task = process_zip_file_async(project_id, zip_path, extract_to)
-
-        logger.info(f"Zip file queued for processing: {zip_path}, task ID: {task.id}")
-        return task.id
-
-    def get_task_status(self, task_id: str) -> Dict[str, Any]:
-        """
-        Get task status.
-
-        Args:
-            task_id: Task ID
-
-        Returns:
-            Task status information
-        """
-        try:
-            # Try getting the task result from result storage
-            try:
-                # Use Huey's built-in result lookup when available
-                if hasattr(self.huey, 'result') and self.huey.result:
-                    result = self.huey.result(task_id)
-                    if result is not None:
-                        return {
-                            "task_id": task_id,
-                            "status": "complete",
-                            "result": result
-                        }
-            except Exception:
-                pass
-
-            # Check whether the task is in the pending queue
-            try:
-                pending_tasks = list(self.huey.pending())
-                for task in pending_tasks:
-                    if hasattr(task, 'id') and task.id == task_id:
-                        return {
-                            "task_id": task_id,
-                            "status": "pending"
-                        }
-            except Exception:
-                pass
-
-            # Check whether the task is in the scheduled queue
-            try:
-                scheduled_tasks = list(self.huey.scheduled())
-                for task in scheduled_tasks:
-                    if hasattr(task, 'id') and task.id == task_id:
-                        return {
-                            "task_id": task_id,
-                            "status": "scheduled"
-                        }
-            except Exception:
-                pass
-
-            # If not found anywhere, it may not exist or may have completed with cleaned results
-            return {
-                "task_id": task_id,
-                "status": "unknown",
-                "message": "Task status is unknown; it may already be complete or may not exist"
-            }
-
-        except Exception as e:
-            return {
-                "task_id": task_id,
-                "status": "error",
-                "message": f"Failed to get task status: {str(e)}"
-            }
-
-    def get_queue_stats(self) -> Dict[str, Any]:
-        """
-        Get queue statistics.
-
-        Returns:
-            Queue statistics information
-        """
-        try:
-            # Use a simplified approach for queue statistics
-            stats = {
-                "total_tasks": 0,
-                "pending_tasks": 0,
-                "running_tasks": 0,
-                "completed_tasks": 0,
-                "error_tasks": 0,
-                "scheduled_tasks": 0,
-                "recent_tasks": [],
-                "queue_database": os.path.join(os.path.dirname(__file__), '..', 'projects', 'queue_data', 'huey.db')
-            }
-
-            # Try to get the number of pending tasks
-            try:
-                pending_tasks = list(self.huey.pending())
-                stats["pending_tasks"] = len(pending_tasks)
-                stats["total_tasks"] += len(pending_tasks)
-            except Exception as e:
-                logger.error(f"Failed to get pending tasks: {e}")
-
-            # Try to get the number of scheduled tasks
-            try:
-                scheduled_tasks = list(self.huey.scheduled())
-                stats["scheduled_tasks"] = len(scheduled_tasks)
-                stats["total_tasks"] += len(scheduled_tasks)
-            except Exception as e:
-                logger.error(f"Failed to get scheduled tasks: {e}")
-
-            return stats
-
-        except Exception as e:
-            return {
-                "error": str(e),
-                "queue_database": os.path.join(os.path.dirname(__file__), '..', 'projects', 'queue_data', 'huey.db')
-            }
-
-
-# Global singleton instance
-queue_manager = QueueManager()
--- a/task_queue/optimized_consumer.py
+++ b/task_queue/optimized_consumer.py
@ -1,286 +0,0 @@
-#!/usr/bin/env python3
-"""
-Optimized queue consumer with integrated performance monitoring.
-"""
-
-import sys
-import os
-import time
-import signal
-import argparse
-import multiprocessing
-import logging
-from pathlib import Path
-from concurrent.futures import ThreadPoolExecutor
-import threading
-
-# Configure logging
-logger = logging.getLogger('app')
-
-# Add project root directory to Python path
-project_root = Path(__file__).parent.parent
-sys.path.insert(0, str(project_root))
-
-from task_queue.config import huey
-from task_queue.manager import queue_manager
-from task_queue.integration_tasks import process_files_async, cleanup_project_async
-from huey.consumer import Consumer
-
-
-class OptimizedQueueConsumer:
-    """Optimized queue consumer with integrated performance monitoring."""
-
-    def __init__(self, worker_type: str = "threads", workers: int = 2):
-        self.huey = huey
-        self.worker_type = worker_type
-        self.workers = workers
-        self.running = False
-        self.consumer = None
-        self.processed_count = 0
-        self.start_time = None
-
-        # Performance monitoring
-        self.performance_stats = {
-            'tasks_processed': 0,
-            'tasks_failed': 0,
-            'avg_processing_time': 0,
-            'start_time': None,
-            'last_activity': None
-        }
-
-        # Register signal handlers
-        signal.signal(signal.SIGINT, self._signal_handler)
-        signal.signal(signal.SIGTERM, self._signal_handler)
-
-    def _signal_handler(self, signum, frame):
-        """Signal handler for graceful shutdown."""
-        logger.info(f"\nReceived signal {signum}, shutting down queue consumer...")
-        self.running = False
-        if self.consumer:
-            self.consumer.stop()
-
-    def setup_optimizations(self):
-        """Set up performance optimizations."""
-        # Set environment variables
-        env_vars = {
-            'PYTHONUNBUFFERED': '1',
-            'PYTHONDONTWRITEBYTECODE': '1',
-        }
-
-        for key, value in env_vars.items():
-            os.environ[key] = value
-
-        # Optimize huey configuration
-        if hasattr(huey, 'immediate'):
-            huey.immediate = False
-
-        # Adjust based on worker type
-        if self.worker_type == "threads":
-            # Thread pool optimization
-            if hasattr(huey, 'worker_type'):
-                huey.worker_type = 'threads'
-
-            # Set thread pool size
-            if hasattr(huey, 'always_eager'):
-                huey.always_eager = False
-
-        logger.info("Queue consumer optimization setup complete:")
-        logger.info(f"- Worker type: {self.worker_type}")
-        logger.info(f"- Worker count: {self.workers}")
-
-    def monitor_performance(self):
-        """Performance monitoring thread."""
-        while self.running:
-            time.sleep(30)  # Output statistics every 30 seconds
-
-            if self.start_time:
-                elapsed = time.time() - self.start_time
-                rate = self.performance_stats['tasks_processed'] / max(1, elapsed)
-
-                logger.info(f"\n[Performance Stats]")
-                logger.info(f"- Uptime: {elapsed:.1f}s")
-                logger.info(f"- Tasks processed: {self.performance_stats['tasks_processed']}")
-                logger.info(f"- Failed tasks: {self.performance_stats['tasks_failed']}")
-                logger.info(f"- Average processing rate: {rate:.2f} tasks/s")
-
-                if self.performance_stats['avg_processing_time'] > 0:
-                    logger.info(f"- Average processing time: {self.performance_stats['avg_processing_time']:.2f}s")
-
-    def start(self):
-        """Start the queue consumer."""
-        logger.info("=" * 60)
-        logger.info("Optimized queue consumer starting")
-        logger.info("=" * 60)
-
-        # Apply optimizations
-        self.setup_optimizations()
-
-        logger.info(f"Database: {os.path.join(os.path.dirname(__file__), '..', 'projects', 'queue_data', 'huey.db')}")
-        logger.info("Press Ctrl+C to stop the consumer")
-
-        self.running = True
-        self.start_time = time.time()
-        self.performance_stats['start_time'] = self.start_time
-
-        # Start performance monitoring thread
-        monitor_thread = threading.Thread(target=self.monitor_performance, daemon=True)
-        monitor_thread.start()
-
-        try:
-            # Create consumer
-            self.consumer = Consumer(
-                self.huey,
-                workers=self.workers,
-                worker_type=self.worker_type,
-                max_delay=60.0,  # Maximum delay
-                check_delay=1.0,  # Check interval
-                periodic=True,    # Enable periodic tasks
-            )
-
-            logger.info("Queue consumer started, waiting for tasks...")
-
-            # Start the consumer
-            self.consumer.run()
-
-        except KeyboardInterrupt:
-            logger.info("\nReceived keyboard interrupt signal")
-        except Exception as e:
-            logger.error(f"Queue consumer runtime error: {e}")
-            import traceback
-            traceback.print_exc()
-        finally:
-            self.shutdown()
-
-    def shutdown(self):
-        """Shut down the queue consumer."""
-        logger.info("\nShutting down queue consumer...")
-        self.running = False
-
-        if self.consumer:
-            try:
-                self.consumer.stop()
-                logger.info("Queue consumer stopped")
-            except Exception as e:
-                logger.error(f"Error stopping queue consumer: {e}")
-
-        # Output final statistics
-        if self.start_time:
-            elapsed = time.time() - self.start_time
-            logger.info(f"\n[Final Stats]")
-            logger.info(f"- Total uptime: {elapsed:.1f}s")
-            logger.info(f"- Total tasks processed: {self.performance_stats['tasks_processed']}")
-            logger.info(f"- Total failed tasks: {self.performance_stats['tasks_failed']}")
-
-            if self.performance_stats['tasks_processed'] > 0:
-                rate = self.performance_stats['tasks_processed'] / elapsed
-                logger.info(f"- Average processing rate: {rate:.2f} tasks/s")
-
-
-def calculate_optimal_workers():
-    """Calculate the optimal number of worker threads."""
-    cpu_count = multiprocessing.cpu_count()
-
-    # Based on CPU core count and system resources
-    if cpu_count <= 2:
-        return 2
-    elif cpu_count <= 4:
-        return 4
-    else:
-        return min(8, cpu_count)
-
-
-def check_queue_status():
-    """Check queue status."""
-    try:
-        stats = queue_manager.get_queue_stats()
-
-        logger.info("\n[Queue Status]")
-        if isinstance(stats, dict):
-            if 'total_tasks' in stats:
-                logger.info(f"- Total tasks: {stats['total_tasks']}")
-            if 'pending_tasks' in stats:
-                logger.info(f"- Pending tasks: {stats['pending_tasks']}")
-            if 'scheduled_tasks' in stats:
-                logger.info(f"- Scheduled tasks: {stats['scheduled_tasks']}")
-
-            # Check database file
-            db_path = os.path.join(os.path.dirname(__file__), '..', 'projects', 'queue_data', 'huey.db')
-            if os.path.exists(db_path):
-                size = os.path.getsize(db_path)
-                logger.info(f"- Database size: {size} bytes")
-            else:
-                logger.info("- Database file: not found")
-
-    except Exception as e:
-        logger.error(f"Failed to get queue status: {e}")
-
-
-def main():
-    """Main entry point."""
-    parser = argparse.ArgumentParser(description="Optimized queue consumer")
-    parser.add_argument(
-        "--workers",
-        type=int,
-        default=calculate_optimal_workers(),
-        help=f"Number of worker threads (default: {calculate_optimal_workers()})"
-    )
-    parser.add_argument(
-        "--worker-type",
-        type=str,
-        default="threads",
-        choices=["threads", "greenlets", "gevent"],
-        help="Worker type (default: threads)"
-    )
-    parser.add_argument(
-        "--check-status",
-        action="store_true",
-        help="Check queue status and exit"
-    )
-    parser.add_argument(
-        "--profile",
-        type=str,
-        default="balanced",
-        choices=["low_memory", "balanced", "high_performance"],
-        help="Performance profile"
-    )
-
-    args = parser.parse_args()
-
-    # Apply performance profile
-    if args.profile == "low_memory":
-        os.environ['PYTHONOPTIMIZE'] = '1'
-        if args.workers > 2:
-            args.workers = 2
-            logger.info(f"Low memory mode: adjusted worker count to {args.workers}")
-    elif args.profile == "high_performance":
-        if args.workers < 4:
-            args.workers = 4
-            logger.info(f"High performance mode: adjusted worker count to {args.workers}")
-
-    # Check queue status
-    if args.check_status:
-        check_queue_status()
-        return
-
-    # Check environment
-    try:
-        import psutil
-        memory = psutil.virtual_memory()
-        logger.info("[System Info]")
-        logger.info(f"- CPU cores: {multiprocessing.cpu_count()}")
-        logger.info(f"- Available memory: {memory.available / (1024**3):.1f}GB")
-        logger.info(f"- Memory usage: {memory.percent:.1f}%")
-    except ImportError:
-        logger.info("[Tip] Install psutil to display system info: pip install psutil")
-
-    # Create and start the queue consumer
-    consumer = OptimizedQueueConsumer(
-        worker_type=args.worker_type,
-        workers=args.workers
-    )
-
-    consumer.start()
-
-
-if __name__ == "__main__":
-    main()
--- a/task_queue/task_status.py
+++ b/task_queue/task_status.py
@ -1,210 +0,0 @@
-#!/usr/bin/env python3
-"""
-Task status SQLite storage system.
-"""
-
-import json
-import os
-import sqlite3
-import time
-from typing import Dict, Optional, Any, List
-from pathlib import Path
-
-
-class TaskStatusStore:
-    """SQLite-based task status store."""
-
-    def __init__(self, db_path: str = "projects/queue_data/task_status.db"):
-        self.db_path = db_path
-        # Ensure directory exists
-        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
-        self._init_database()
-
-    def _init_database(self):
-        """Initialize database tables."""
-        with sqlite3.connect(self.db_path) as conn:
-            conn.execute('''
-                CREATE TABLE IF NOT EXISTS task_status (
-                    task_id TEXT PRIMARY KEY,
-                    unique_id TEXT NOT NULL,
-                    status TEXT NOT NULL,
-                    created_at REAL NOT NULL,
-                    updated_at REAL NOT NULL,
-                    result TEXT,
-                    error TEXT
-                )
-            ''')
-            conn.commit()
-
-    def set_status(self, task_id: str, unique_id: str, status: str,
-                   result: Optional[Dict] = None, error: Optional[str] = None):
-        """Set task status."""
-        current_time = time.time()
-
-        with sqlite3.connect(self.db_path) as conn:
-            conn.execute('''
-                INSERT OR REPLACE INTO task_status
-                (task_id, unique_id, status, created_at, updated_at, result, error)
-                VALUES (?, ?, ?, ?, ?, ?, ?)
-            ''', (
-                task_id, unique_id, status, current_time, current_time,
-                json.dumps(result) if result else None,
-                error
-            ))
-            conn.commit()
-
-    def get_status(self, task_id: str) -> Optional[Dict]:
-        """Get task status."""
-        with sqlite3.connect(self.db_path) as conn:
-            conn.row_factory = sqlite3.Row
-            cursor = conn.execute(
-                'SELECT * FROM task_status WHERE task_id = ?', (task_id,)
-            )
-            row = cursor.fetchone()
-
-            if not row:
-                return None
-
-            result = dict(row)
-            # Parse JSON field
-            if result['result']:
-                result['result'] = json.loads(result['result'])
-
-            return result
-
-    def update_status(self, task_id: str, status: str,
-                     result: Optional[Dict] = None, error: Optional[str] = None):
-        """Update task status."""
-        with sqlite3.connect(self.db_path) as conn:
-            # Check if task exists
-            cursor = conn.execute(
-                'SELECT task_id FROM task_status WHERE task_id = ?', (task_id,)
-            )
-            if not cursor.fetchone():
-                return False
-
-            # Update status
-            conn.execute('''
-                UPDATE task_status
-                SET status = ?, updated_at = ?, result = ?, error = ?
-                WHERE task_id = ?
-            ''', (
-                status, time.time(),
-                json.dumps(result) if result else None,
-                error, task_id
-            ))
-            conn.commit()
-            return True
-
-    def delete_status(self, task_id: str):
-        """Delete task status."""
-        with sqlite3.connect(self.db_path) as conn:
-            cursor = conn.execute(
-                'DELETE FROM task_status WHERE task_id = ?', (task_id,)
-            )
-            conn.commit()
-            return cursor.rowcount > 0
-
-    def list_all(self) -> Dict[str, Dict]:
-        """List all task statuses."""
-        with sqlite3.connect(self.db_path) as conn:
-            conn.row_factory = sqlite3.Row
-            cursor = conn.execute(
-                'SELECT * FROM task_status ORDER BY updated_at DESC'
-            )
-            all_tasks = {}
-            for row in cursor:
-                result = dict(row)
-                # Parse JSON field
-                if result['result']:
-                    result['result'] = json.loads(result['result'])
-                all_tasks[result['task_id']] = result
-            return all_tasks
-
-    def get_by_unique_id(self, unique_id: str) -> List[Dict]:
-        """Get all tasks for a given project ID."""
-        with sqlite3.connect(self.db_path) as conn:
-            conn.row_factory = sqlite3.Row
-            cursor = conn.execute(
-                'SELECT * FROM task_status WHERE unique_id = ? ORDER BY updated_at DESC',
-                (unique_id,)
-            )
-            tasks = []
-            for row in cursor:
-                result = dict(row)
-                if result['result']:
-                    result['result'] = json.loads(result['result'])
-                tasks.append(result)
-            return tasks
-
-    def cleanup_old_tasks(self, older_than_days: int = 7) -> int:
-        """Clean up old task records."""
-        cutoff_time = time.time() - (older_than_days * 24 * 3600)
-
-        with sqlite3.connect(self.db_path) as conn:
-            cursor = conn.execute(
-                'DELETE FROM task_status WHERE updated_at < ?',
-                (cutoff_time,)
-            )
-            conn.commit()
-            return cursor.rowcount
-
-    def get_statistics(self) -> Dict[str, Any]:
-        """Get task statistics."""
-        with sqlite3.connect(self.db_path) as conn:
-            # Total tasks
-            total = conn.execute('SELECT COUNT(*) FROM task_status').fetchone()[0]
-
-            # Status breakdown
-            status_stats = conn.execute('''
-                SELECT status, COUNT(*) as count
-                FROM task_status
-                GROUP BY status
-            ''').fetchall()
-
-            # Tasks in the last 24 hours
-            recent = time.time() - (24 * 3600)
-            recent_tasks = conn.execute(
-                'SELECT COUNT(*) FROM task_status WHERE updated_at > ?',
-                (recent,)
-            ).fetchone()[0]
-
-            return {
-                'total_tasks': total,
-                'status_breakdown': dict(status_stats),
-                'recent_24h': recent_tasks,
-                'database_path': self.db_path
-            }
-
-    def search_tasks(self, status: Optional[str] = None,
-                    unique_id: Optional[str] = None,
-                    limit: int = 100) -> List[Dict]:
-        """Search tasks."""
-        query = 'SELECT * FROM task_status WHERE 1=1'
-        params = []
-
-        if status:
-            query += ' AND status = ?'
-            params.append(status)
-
-        if unique_id:
-            query += ' AND unique_id = ?'
-            params.append(unique_id)
-
-        query += ' ORDER BY updated_at DESC LIMIT ?'
-        params.append(limit)
-
-        with sqlite3.connect(self.db_path) as conn:
-            conn.row_factory = sqlite3.Row
-            cursor = conn.execute(query, params)
-            tasks = []
-            for row in cursor:
-                result = dict(row)
-                if result['result']:
-                    result['result'] = json.loads(result['result'])
-                tasks.append(result)
-            return tasks
-
-
-# Global task status store instance
-task_status_store = TaskStatusStore()
--- a/task_queue/tasks.py
+++ b/task_queue/tasks.py
@ -1,359 +0,0 @@
-#!/usr/bin/env python3
-"""
-File processing tasks for the queue system.
-"""
-
-import os
-import json
-import time
-import shutil
-import logging
-from pathlib import Path
-from typing import Dict, List, Optional, Any
-from huey import crontab
-
-# Configure logging
-logger = logging.getLogger('app')
-
-from .config import huey
-from utils.file_utils import (
-    extract_zip_file,
-    get_file_hash,
-    load_processed_files_log,
-    save_processed_files_log,
-    get_document_preview
-)
-
-
-@huey.task()
-def process_file_async(
-    project_id: str,
-    file_path: str,
-    original_filename: str = None,
-    target_directory: str = "files"
-) -> Dict[str, Any]:
-    """
-    Asynchronously process a single file.
-
-    Args:
-        project_id: Project ID
-        file_path: File path
-        original_filename: Original filename
-        target_directory: Target directory
-
-    Returns:
-        Processing result dictionary
-    """
-    try:
-        logger.info(f"Starting file processing: {file_path}")
-
-        # Ensure project directory exists
-        project_dir = os.path.join("projects", project_id)
-        files_dir = os.path.join(project_dir, target_directory)
-        os.makedirs(files_dir, exist_ok=True)
-
-        # Get file hash as identifier
-        file_hash = get_file_hash(file_path)
-
-        # Check if file has already been processed
-        processed_log = load_processed_files_log(project_id)
-        if file_hash in processed_log:
-            logger.info(f"File already processed, skipping: {file_path}")
-            return {
-                "status": "skipped",
-                "message": "File already processed",
-                "file_hash": file_hash,
-                "project_id": project_id
-            }
-
-        # Process the file
-        result = _process_single_file(
-            file_path,
-            files_dir,
-            original_filename or os.path.basename(file_path)
-        )
-
-        # Update processing log
-        if result["status"] == "success":
-            processed_log[file_hash] = {
-                "original_path": file_path,
-                "original_filename": original_filename or os.path.basename(file_path),
-                "processed_at": str(time.time()),
-                "status": "processed",
-                "result": result
-            }
-            save_processed_files_log(project_id, processed_log)
-
-        result["file_hash"] = file_hash
-        result["project_id"] = project_id
-
-        logger.info(f"File processing complete: {file_path}, status: {result['status']}")
-        return result
-
-    except Exception as e:
-        error_msg = f"Error processing file: {str(e)}"
-        logger.error(error_msg)
-        return {
-            "status": "error",
-            "message": error_msg,
-            "file_path": file_path,
-            "project_id": project_id
-        }
-
-
-@huey.task()
-def process_multiple_files_async(
-    project_id: str,
-    file_paths: List[str],
-    original_filenames: List[str] = None
-) -> List[Dict[str, Any]]:
-    """
-    Asynchronously process multiple files in batch.
-
-    Args:
-        project_id: Project ID
-        file_paths: List of file paths
-        original_filenames: List of original filenames
-
-    Returns:
-        List of processing results
-    """
-    try:
-        logger.info(f"Starting batch processing of {len(file_paths)} files")
-
-        results = []
-        for i, file_path in enumerate(file_paths):
-            original_filename = original_filenames[i] if original_filenames and i < len(original_filenames) else None
-
-            # Create async task for each file
-            result = process_file_async(project_id, file_path, original_filename)
-            results.append(result)
-
-        logger.info(f"Batch file processing tasks submitted, total {len(results)} files")
-        return results
-
-    except Exception as e:
-        error_msg = f"Error during batch file processing: {str(e)}"
-        logger.error(error_msg)
-        return [{
-            "status": "error",
-            "message": error_msg,
-            "project_id": project_id
-        }]
-
-
-@huey.task()
-def process_zip_file_async(
-    project_id: str,
-    zip_path: str,
-    extract_to: str = None
-) -> Dict[str, Any]:
-    """
-    Asynchronously process a zip archive file.
-
-    Args:
-        project_id: Project ID
-        zip_path: Zip file path
-        extract_to: Extraction target directory
-
-    Returns:
-        Processing result dictionary
-    """
-    try:
-        logger.info(f"Starting zip file processing: {zip_path}")
-
-        # Set extraction directory
-        if extract_to is None:
-            extract_to = os.path.join("projects", project_id, "extracted", os.path.basename(zip_path))
-
-        os.makedirs(extract_to, exist_ok=True)
-
-        # Extract files
-        extracted_files = extract_zip_file(zip_path, extract_to)
-
-        if not extracted_files:
-            return {
-                "status": "error",
-                "message": "Extraction failed or no supported files found",
-                "zip_path": zip_path,
-                "project_id": project_id
-            }
-
-        # Batch process extracted files
-        result = process_multiple_files_async(project_id, extracted_files)
-
-        return {
-            "status": "success",
-            "message": f"Zip file processing complete, extracted {len(extracted_files)} files",
-            "zip_path": zip_path,
-            "extract_to": extract_to,
-            "extracted_files": extracted_files,
-            "project_id": project_id,
-            "batch_task_result": result
-        }
-
-    except Exception as e:
-        error_msg = f"Error processing zip file: {str(e)}"
-        logger.error(error_msg)
-        return {
-            "status": "error",
-            "message": error_msg,
-            "zip_path": zip_path,
-            "project_id": project_id
-        }
-
-
-@huey.task()
-def cleanup_processed_files(
-    project_id: str,
-    older_than_days: int = 30
-) -> Dict[str, Any]:
-    """
-    Clean up old processed files.
-
-    Args:
-        project_id: Project ID
-        older_than_days: Clean files older than this many days
-
-    Returns:
-        Cleanup result dictionary
-    """
-    try:
-        logger.info(f"Starting cleanup of files older than {older_than_days} days in project {project_id}")
-
-        project_dir = os.path.join("projects", project_id)
-        if not os.path.exists(project_dir):
-            return {
-                "status": "error",
-                "message": "Project directory does not exist",
-                "project_id": project_id
-            }
-
-        current_time = time.time()
-        cutoff_time = current_time - (older_than_days * 24 * 3600)
-        cleaned_files = []
-
-        # Walk through project directory
-        for root, dirs, files in os.walk(project_dir):
-            for file in files:
-                file_path = os.path.join(root, file)
-                file_mtime = os.path.getmtime(file_path)
-
-                if file_mtime < cutoff_time:
-                    try:
-                        os.remove(file_path)
-                        cleaned_files.append(file_path)
-                        logger.info(f"Deleted old file: {file_path}")
-                    except Exception as e:
-                        logger.error(f"Failed to delete file {file_path}: {str(e)}")
-
-        # Clean up empty directories
-        for root, dirs, files in os.walk(project_dir, topdown=False):
-            for dir in dirs:
-                dir_path = os.path.join(root, dir)
-                try:
-                    if not os.listdir(dir_path):
-                        os.rmdir(dir_path)
-                        logger.info(f"Deleted empty directory: {dir_path}")
-                except Exception as e:
-                    logger.error(f"Failed to delete directory {dir_path}: {str(e)}")
-
-        return {
-            "status": "success",
-            "message": f"Cleanup complete, deleted {len(cleaned_files)} files",
-            "project_id": project_id,
-            "cleaned_files": cleaned_files,
-            "older_than_days": older_than_days
-        }
-
-    except Exception as e:
-        error_msg = f"Error during file cleanup: {str(e)}"
-        logger.error(error_msg)
-        return {
-            "status": "error",
-            "message": error_msg,
-            "project_id": project_id
-        }
-
-
-def _process_single_file(
-    file_path: str,
-    target_dir: str,
-    original_filename: str
-) -> Dict[str, Any]:
-    """
-    Internal method for processing a single file.
-
-    Args:
-        file_path: Source file path
-        target_dir: Target directory
-        original_filename: Original filename
-
-    Returns:
-        Processing result dictionary
-    """
-    try:
-        # Check if file exists
-        if not os.path.exists(file_path):
-            return {
-                "status": "error",
-                "message": "Source file does not exist",
-                "file_path": file_path
-            }
-
-        # Get file info
-        file_size = os.path.getsize(file_path)
-        file_ext = os.path.splitext(original_filename)[1].lower()
-
-        # Different processing based on file type
-        supported_extensions = ['.txt', '.md', '.csv', '.xlsx', '.zip']
-
-        if file_ext not in supported_extensions:
-            return {
-                "status": "error",
-                "message": f"Unsupported file type: {file_ext}",
-                "file_path": file_path,
-                "supported_extensions": supported_extensions
-            }
-
-        # Copy file to target directory
-        target_file_path = os.path.join(target_dir, original_filename)
-
-        # If target file already exists, add timestamp
-        if os.path.exists(target_file_path):
-            name, ext = os.path.splitext(original_filename)
-            timestamp = int(time.time())
-            target_file_path = os.path.join(target_dir, f"{name}_{timestamp}{ext}")
-
-        shutil.copy2(file_path, target_file_path)
-
-        # Get file preview (if it's a text file)
-        preview = None
-        if file_ext in ['.txt', '.md']:
-            preview = get_document_preview(target_file_path, max_lines=5)
-
-        return {
-            "status": "success",
-            "message": "File processed successfully",
-            "original_path": file_path,
-            "target_path": target_file_path,
-            "file_size": file_size,
-            "file_extension": file_ext,
-            "preview": preview
-        }
-
-    except Exception as e:
-        return {
-            "status": "error",
-            "message": f"Error processing file: {str(e)}",
-            "file_path": file_path
-        }
-
-
-# Periodic task example: clean up files older than 30 days daily at 2 AM
-@huey.periodic_task(crontab(hour=2, minute=0))
-def daily_cleanup():
-    """Daily cleanup task."""
-    logger.info("Running daily cleanup task")
-    # Add cleanup logic here
-    return {"status": "completed", "message": "Daily cleanup task completed"}
--- a/utils/init.py
+++ b/utils/init.py
@ -13,23 +13,6 @@ from .file_utils import (
    save_processed_files_log
 )

-from .dataset_manager import (
-    download_dataset_files,
-    generate_dataset_structure,
-    remove_dataset_directory,
-    remove_dataset_directory_by_key
-)
-
-from .project_manager import (
-    generate_project_readme,
-    save_project_readme,
-    get_project_status,
-    remove_project,
-    list_projects,
-    get_project_stats
-)
-
-
 from .system_optimizer import (
    setup_system_optimizations
 )
@ -59,11 +42,6 @@ from .api_models import (
    ProjectListResponse,
    ProjectStatsResponse,
    ProjectActionResponse,
-    QueueTaskRequest,
-    IncrementalTaskRequest,
-    QueueTaskResponse,
-    QueueStatusResponse,
-    TaskStatusResponse,
    create_success_response,
    create_error_response,
    create_chat_response,
@ -90,20 +68,6 @@ __all__ = [
    'load_processed_files_log',
    'save_processed_files_log',
    
-    # dataset_manager
-    'download_dataset_files',
-    'generate_dataset_structure',
-    'remove_dataset_directory',
-    'remove_dataset_directory_by_key',
-    
-    # project_manager
-    'generate_project_readme',
-    'save_project_readme',
-    'get_project_status',
-    'remove_project',
-    'list_projects',
-    'get_project_stats',
-    
    # agent_pool
    'AgentPool',
    'get_agent_pool',
@ -128,10 +92,6 @@ __all__ = [
    'ProjectListResponse',
    'ProjectStatsResponse',
    'ProjectActionResponse',
-    'QueueTaskRequest',
-    'QueueTaskResponse',
-    'QueueStatusResponse',
-    'TaskStatusResponse',
    'create_success_response',
    'create_error_response',
    'create_chat_response',
--- a/utils/api_models.py
+++ b/utils/api_models.py
@ -224,133 +224,6 @@ def create_error_response(message: str, error_type: str = "error", **kwargs) ->
    }


-class QueueTaskRequest(BaseModel):
-    """Queue task request model"""
-    dataset_id: str
-    files: Optional[Dict[str, List[str]]] = Field(default=None, description="Files organized by key groups. Each key maps to a list of file paths (supports zip files)")
-    upload_folder: Optional[Dict[str, str]] = Field(default=None, description="Upload folders organized by group names. Each key maps to a folder name. Example: {'group1': 'my_project1', 'group2': 'my_project2'}")
-    priority: Optional[int] = Field(default=0, description="Task priority (higher number = higher priority)")
-    delay: Optional[int] = Field(default=0, description="Delay execution by N seconds")
-
-    model_config = ConfigDict(extra='allow')
-    
-    @field_validator('upload_folder', mode='before')
-    @classmethod
-    def validate_upload_folder(cls, v):
-        """Validate upload_folder dict format"""
-        if v is None:
-            return None
-        if isinstance(v, dict):
-            # Validate dict format
-            for key, value in v.items():
-                if not isinstance(key, str):
-                    raise ValueError(f"Key in upload_folder dict must be string, got {type(key)}")
-                if not isinstance(value, str):
-                    raise ValueError(f"Value in upload_folder dict must be string (folder name), got {type(value)} for key '{key}'")
-            return v
-        else:
-            raise ValueError(f"upload_folder must be a dict with group names as keys and folder names as values, got {type(v)}")
-    
-    @field_validator('files', mode='before')
-    @classmethod
-    def validate_files(cls, v):
-        """Validate dict format with key-grouped files"""
-        if v is None:
-            return None
-        if isinstance(v, dict):
-            # Validate dict format
-            for key, value in v.items():
-                if not isinstance(key, str):
-                    raise ValueError(f"Key in files dict must be string, got {type(key)}")
-                if not isinstance(value, list):
-                    raise ValueError(f"Value in files dict must be list, got {type(value)} for key '{key}'")
-                for item in value:
-                    if not isinstance(item, str):
-                        raise ValueError(f"File paths must be strings, got {type(item)} in key '{key}'")
-            return v
-        else:
-            raise ValueError(f"Files must be a dict with key groups, got {type(v)}")
-
-
-class IncrementalTaskRequest(BaseModel):
-    """Incremental file processing request model"""
-    dataset_id: str = Field(..., description="Dataset ID for the project")
-    files_to_add: Optional[Dict[str, List[str]]] = Field(default=None, description="Files to add organized by key groups")
-    files_to_remove: Optional[Dict[str, List[str]]] = Field(default=None, description="Files to remove organized by key groups")
-    system_prompt: Optional[str] = None
-    mcp_settings: Optional[List[Dict]] = None
-    priority: Optional[int] = Field(default=0, description="Task priority (higher number = higher priority)")
-    delay: Optional[int] = Field(default=0, description="Delay execution by N seconds")
-
-    model_config = ConfigDict(extra='allow')
-    
-    @field_validator('files_to_add', mode='before')
-    @classmethod
-    def validate_files_to_add(cls, v):
-        """Validate files_to_add dict format"""
-        if v is None:
-            return None
-        if isinstance(v, dict):
-            for key, value in v.items():
-                if not isinstance(key, str):
-                    raise ValueError(f"Key in files_to_add dict must be string, got {type(key)}")
-                if not isinstance(value, list):
-                    raise ValueError(f"Value in files_to_add dict must be list, got {type(value)} for key '{key}'")
-                for item in value:
-                    if not isinstance(item, str):
-                        raise ValueError(f"File paths must be strings, got {type(item)} in key '{key}'")
-            return v
-        else:
-            raise ValueError(f"files_to_add must be a dict with key groups, got {type(v)}")
-    
-    @field_validator('files_to_remove', mode='before')
-    @classmethod
-    def validate_files_to_remove(cls, v):
-        """Validate files_to_remove dict format"""
-        if v is None:
-            return None
-        if isinstance(v, dict):
-            for key, value in v.items():
-                if not isinstance(key, str):
-                    raise ValueError(f"Key in files_to_remove dict must be string, got {type(key)}")
-                if not isinstance(value, list):
-                    raise ValueError(f"Value in files_to_remove dict must be list, got {type(value)} for key '{key}'")
-                for item in value:
-                    if not isinstance(item, str):
-                        raise ValueError(f"File paths must be strings, got {type(item)} in key '{key}'")
-            return v
-        else:
-            raise ValueError(f"files_to_remove must be a dict with key groups, got {type(v)}")
-
-
-class QueueTaskResponse(BaseModel):
-    """Queue task response model"""
-    success: bool
-    message: str
-    dataset_id: str
-    task_id: Optional[str] = None
-    task_status: Optional[str] = None
-    estimated_processing_time: Optional[int] = None  # seconds
-
-
-class QueueStatusResponse(BaseModel):
-    """Queue status response model"""
-    success: bool
-    message: str
-    queue_stats: Dict[str, Any]
-    pending_tasks: List[Dict[str, Any]]
-
-
-class TaskStatusResponse(BaseModel):
-    """Task status response model"""
-    success: bool
-    message: str
-    task_id: str
-    task_status: Optional[str] = None
-    task_result: Optional[Dict[str, Any]] = None
-    error: Optional[str] = None
-
-
 def create_chat_response(
    messages: List[Message],
    model: str,
--- a/utils/data_merger.py
+++ b/utils/data_merger.py
@ -1,439 +0,0 @@
-#!/usr/bin/env python3
-"""
-Data merging functions for combining processed file results.
-"""
-
-import os
-import pickle
-import logging
-from typing import Dict
-from utils.settings import EMBEDDING_MODEL_NAME
-
-# Configure logger
-logger = logging.getLogger('app')
-
-# Try to import numpy, but handle if missing
-try:
-    import numpy as np
-    NUMPY_SUPPORT = True
-except ImportError:
-    logger.warning("NumPy not available, some embedding features may be limited")
-    NUMPY_SUPPORT = False
-
-
-def merge_documents_by_group(unique_id: str, group_name: str) -> Dict:
-    """Merge all document.txt files in a group into a single document."""
-    
-    processed_group_dir = os.path.join("projects", "data", unique_id, "processed", group_name)
-    dataset_group_dir = os.path.join("projects", "data", unique_id, "datasets", group_name)
-    os.makedirs(dataset_group_dir, exist_ok=True)
-    
-    merged_document_path = os.path.join(dataset_group_dir, "document.txt")
-    
-    result = {
-        "success": False,
-        "merged_document_path": merged_document_path,
-        "source_files": [],
-        "total_pages": 0,
-        "total_characters": 0,
-        "error": None
-    }
-    
-    try:
-        # Find all document.txt files in the processed directory
-        document_files = []
-        if os.path.exists(processed_group_dir):
-            for item in os.listdir(processed_group_dir):
-                item_path = os.path.join(processed_group_dir, item)
-                if os.path.isdir(item_path):
-                    document_path = os.path.join(item_path, "document.txt")
-                    if os.path.exists(document_path) and os.path.getsize(document_path) > 0:
-                        document_files.append((item, document_path))
-        
-        if not document_files:
-            result["error"] = "No document files found to merge"
-            return result
-        
-        # Merge all documents with page separators
-        merged_content = []
-        total_characters = 0
-        
-        for filename_stem, document_path in sorted(document_files):
-            try:
-                with open(document_path, 'r', encoding='utf-8') as f:
-                    content = f.read().strip()
-                
-                if content:
-                    merged_content.append(f"# Page {filename_stem}")
-                    merged_content.append(content)
-                    total_characters += len(content)
-                    result["source_files"].append(filename_stem)
-                    
-            except Exception as e:
-                logger.error(f"Error reading document file {document_path}: {str(e)}")
-                continue
-        
-        if merged_content:
-            # Write merged document
-            with open(merged_document_path, 'w', encoding='utf-8') as f:
-                f.write('\n\n'.join(merged_content))
-            
-            result["total_pages"] = len(document_files)
-            result["total_characters"] = total_characters
-            result["success"] = True
-            
-        else:
-            result["error"] = "No valid content found in document files"
-            
-    except Exception as e:
-        result["error"] = f"Document merging failed: {str(e)}"
-        logger.error(f"Error merging documents for group {group_name}: {str(e)}")
-    
-    return result
-
-
-def merge_paginations_by_group(unique_id: str, group_name: str) -> Dict:
-    """Merge all pagination.txt files in a group."""
-    
-    processed_group_dir = os.path.join("projects", "data", unique_id, "processed", group_name)
-    dataset_group_dir = os.path.join("projects", "data", unique_id, "datasets", group_name)
-    os.makedirs(dataset_group_dir, exist_ok=True)
-    
-    merged_pagination_path = os.path.join(dataset_group_dir, "pagination.txt")
-    
-    result = {
-        "success": False,
-        "merged_pagination_path": merged_pagination_path,
-        "source_files": [],
-        "total_lines": 0,
-        "error": None
-    }
-    
-    try:
-        # Find all pagination.txt files
-        pagination_files = []
-        if os.path.exists(processed_group_dir):
-            for item in os.listdir(processed_group_dir):
-                item_path = os.path.join(processed_group_dir, item)
-                if os.path.isdir(item_path):
-                    pagination_path = os.path.join(item_path, "pagination.txt")
-                    if os.path.exists(pagination_path) and os.path.getsize(pagination_path) > 0:
-                        pagination_files.append((item, pagination_path))
-        
-        if not pagination_files:
-            result["error"] = "No pagination files found to merge"
-            return result
-        
-        # Merge all pagination files
-        merged_lines = []
-        
-        for filename_stem, pagination_path in sorted(pagination_files):
-            try:
-                with open(pagination_path, 'r', encoding='utf-8') as f:
-                    lines = f.readlines()
-                
-                for line in lines:
-                    line = line.strip()
-                    if line:
-                        merged_lines.append(line)
-                        
-                result["source_files"].append(filename_stem)
-                
-            except Exception as e:
-                logger.error(f"Error reading pagination file {pagination_path}: {str(e)}")
-                continue
-        
-        if merged_lines:
-            # Write merged pagination
-            with open(merged_pagination_path, 'w', encoding='utf-8') as f:
-                for line in merged_lines:
-                    f.write(f"{line}\n")
-            
-            result["total_lines"] = len(merged_lines)
-            result["success"] = True
-            
-        else:
-            result["error"] = "No valid pagination data found"
-            
-    except Exception as e:
-        result["error"] = f"Pagination merging failed: {str(e)}"
-        logger.error(f"Error merging paginations for group {group_name}: {str(e)}")
-    
-    return result
-
-
-def merge_embeddings_by_group(unique_id: str, group_name: str) -> Dict:
-    """Merge all embedding.pkl files in a group."""
-    
-    processed_group_dir = os.path.join("projects", "data", unique_id, "processed", group_name)
-    dataset_group_dir = os.path.join("projects", "data", unique_id, "datasets", group_name)
-    os.makedirs(dataset_group_dir, exist_ok=True)
-    
-    merged_embedding_path = os.path.join(dataset_group_dir, "embedding.pkl")
-    
-    result = {
-        "success": False,
-        "merged_embedding_path": merged_embedding_path,
-        "source_files": [],
-        "total_chunks": 0,
-        "total_dimensions": 0,
-        "error": None
-    }
-    
-    try:
-        # Find all embedding.pkl files
-        embedding_files = []
-        if os.path.exists(processed_group_dir):
-            for item in os.listdir(processed_group_dir):
-                item_path = os.path.join(processed_group_dir, item)
-                if os.path.isdir(item_path):
-                    embedding_path = os.path.join(item_path, "embedding.pkl")
-                    if os.path.exists(embedding_path) and os.path.getsize(embedding_path) > 0:
-                        embedding_files.append((item, embedding_path))
-        
-        if not embedding_files:
-            result["error"] = "No embedding files found to merge"
-            return result
-        
-        # Load and merge all embedding data
-        all_chunks = []
-        all_embeddings = []  # Fix: collect all embedding vectors
-        total_chunks = 0
-        dimensions = 0
-        chunking_strategy = 'unknown'
-        chunking_params = {}
-        model_path = EMBEDDING_MODEL_NAME
-        
-        for filename_stem, embedding_path in sorted(embedding_files):
-            try:
-                with open(embedding_path, 'rb') as f:
-                    embedding_data = pickle.load(f)
-                
-                if isinstance(embedding_data, dict) and 'chunks' in embedding_data:
-                    chunks = embedding_data['chunks']
-                    
-                    # Get embedding vectors (critical fix)
-                    if 'embeddings' in embedding_data:
-                        embeddings = embedding_data['embeddings']
-                        all_embeddings.append(embeddings)
-                        
-                        # Get model metadata from the first file
-                        if 'model_path' in embedding_data:
-                            model_path = embedding_data['model_path']
-                        if 'chunking_strategy' in embedding_data:
-                            chunking_strategy = embedding_data['chunking_strategy']
-                        if 'chunking_params' in embedding_data:
-                            chunking_params = embedding_data['chunking_params']
-                    
-                    # Add source file metadata to each chunk
-                    for chunk in chunks:
-                        if isinstance(chunk, dict):
-                            chunk['source_file'] = filename_stem
-                            chunk['source_group'] = group_name
-                        elif isinstance(chunk, str):
-                            # If the chunk is a string, keep it unchanged
-                            pass
-                    
-                    all_chunks.extend(chunks)
-                    total_chunks += len(chunks)
-                    
-                    result["source_files"].append(filename_stem)
-                
-            except Exception as e:
-                logger.error(f"Error loading embedding file {embedding_path}: {str(e)}")
-                continue
-        
-        if all_chunks and all_embeddings:
-            # Merge all embedding vectors
-            try:
-                # Try merging tensors with torch
-                import torch
-                if all(isinstance(emb, torch.Tensor) for emb in all_embeddings):
-                    merged_embeddings = torch.cat(all_embeddings, dim=0)
-                    dimensions = merged_embeddings.shape[1]
-                else:
-                    # If the values are not tensors, try converting them to numpy
-                    import numpy as np
-                    if NUMPY_SUPPORT:
-                        np_embeddings = []
-                        for emb in all_embeddings:
-                            if hasattr(emb, 'numpy'):
-                                np_embeddings.append(emb.numpy())
-                            elif isinstance(emb, np.ndarray):
-                                np_embeddings.append(emb)
-                            else:
-                                # If conversion fails, skip this file
-                                logger.warning(f"Warning: Cannot convert embedding to numpy from file {filename_stem}")
-                                continue
-                        
-                        if np_embeddings:
-                            merged_embeddings = np.concatenate(np_embeddings, axis=0)
-                            dimensions = merged_embeddings.shape[1]
-                        else:
-                            result["error"] = "No valid embedding tensors could be merged"
-                            return result
-                    else:
-                        result["error"] = "NumPy not available for merging embeddings"
-                        return result
-                        
-            except ImportError:
-                # If torch is unavailable, try using numpy
-                if NUMPY_SUPPORT:
-                    import numpy as np
-                    np_embeddings = []
-                    for emb in all_embeddings:
-                        if hasattr(emb, 'numpy'):
-                            np_embeddings.append(emb.numpy())
-                        elif isinstance(emb, np.ndarray):
-                            np_embeddings.append(emb)
-                        else:
-                            logger.warning(f"Warning: Cannot convert embedding to numpy from file {filename_stem}")
-                            continue
-                    
-                    if np_embeddings:
-                        merged_embeddings = np.concatenate(np_embeddings, axis=0)
-                        dimensions = merged_embeddings.shape[1]
-                    else:
-                        result["error"] = "No valid embedding tensors could be merged"
-                        return result
-                else:
-                    result["error"] = "Neither torch nor numpy available for merging embeddings"
-                    return result
-            except Exception as e:
-                result["error"] = f"Failed to merge embedding tensors: {str(e)}"
-                logger.error(f"Error merging embedding tensors: {str(e)}")
-                return result
-            
-            # Create merged embedding data structure
-            merged_embedding_data = {
-                'chunks': all_chunks,
-                'embeddings': merged_embeddings,  # Critical fix: include the embeddings key
-                'total_chunks': total_chunks,
-                'dimensions': dimensions,
-                'source_files': result["source_files"],
-                'group_name': group_name,
-                'merged_at': str(__import__('time').time()),
-                'chunking_strategy': chunking_strategy,
-                'chunking_params': chunking_params,
-                'model_path': model_path
-            }
-            
-            # Save merged embeddings
-            with open(merged_embedding_path, 'wb') as f:
-                pickle.dump(merged_embedding_data, f)
-            
-            result["total_chunks"] = total_chunks
-            result["total_dimensions"] = dimensions
-            result["success"] = True
-            
-        else:
-            result["error"] = "No valid embedding data found"
-            
-    except Exception as e:
-        result["error"] = f"Embedding merging failed: {str(e)}"
-        logger.error(f"Error merging embeddings for group {group_name}: {str(e)}")
-    
-    return result
-
-
-def merge_all_data_by_group(unique_id: str, group_name: str) -> Dict:
-    """Merge documents, paginations, and embeddings for a group."""
-    
-    merge_results = {
-        "group_name": group_name,
-        "unique_id": unique_id,
-        "success": True,
-        "document_merge": None,
-        "pagination_merge": None, 
-        "embedding_merge": None,
-        "errors": []
-    }
-    
-    # Merge documents
-    document_result = merge_documents_by_group(unique_id, group_name)
-    merge_results["document_merge"] = document_result
-    
-    if not document_result["success"]:
-        merge_results["success"] = False
-        merge_results["errors"].append(f"Document merge failed: {document_result['error']}")
-    
-    # Merge paginations
-    pagination_result = merge_paginations_by_group(unique_id, group_name)
-    merge_results["pagination_merge"] = pagination_result
-    
-    if not pagination_result["success"]:
-        merge_results["success"] = False
-        merge_results["errors"].append(f"Pagination merge failed: {pagination_result['error']}")
-    
-    # Merge embeddings
-    embedding_result = merge_embeddings_by_group(unique_id, group_name)
-    merge_results["embedding_merge"] = embedding_result
-    
-    if not embedding_result["success"]:
-        merge_results["success"] = False
-        merge_results["errors"].append(f"Embedding merge failed: {embedding_result['error']}")
-    
-    return merge_results
-
-
-def get_group_merge_status(unique_id: str, group_name: str) -> Dict:
-    """Get the status of merged data for a group."""
-    
-    dataset_group_dir = os.path.join("projects", "data", unique_id, "datasets", group_name)
-    
-    status = {
-        "group_name": group_name,
-        "unique_id": unique_id,
-        "dataset_dir_exists": os.path.exists(dataset_group_dir),
-        "document_exists": False,
-        "document_size": 0,
-        "pagination_exists": False,
-        "pagination_size": 0,
-        "embedding_exists": False,
-        "embedding_size": 0,
-        "merge_complete": False
-    }
-    
-    if os.path.exists(dataset_group_dir):
-        document_path = os.path.join(dataset_group_dir, "document.txt")
-        pagination_path = os.path.join(dataset_group_dir, "pagination.txt")
-        embedding_path = os.path.join(dataset_group_dir, "embedding.pkl")
-        
-        if os.path.exists(document_path):
-            status["document_exists"] = True
-            status["document_size"] = os.path.getsize(document_path)
-        
-        if os.path.exists(pagination_path):
-            status["pagination_exists"] = True
-            status["pagination_size"] = os.path.getsize(pagination_path)
-        
-        if os.path.exists(embedding_path):
-            status["embedding_exists"] = True
-            status["embedding_size"] = os.path.getsize(embedding_path)
-        
-        # Check if all files exist and are not empty
-        if (status["document_exists"] and status["document_size"] > 0 and
-            status["pagination_exists"] and status["pagination_size"] > 0 and
-            status["embedding_exists"] and status["embedding_size"] > 0):
-            status["merge_complete"] = True
-    
-    return status
-
-
-def cleanup_dataset_group(unique_id: str, group_name: str) -> bool:
-    """Clean up merged dataset files for a group."""
-    
-    dataset_group_dir = os.path.join("projects", "data", unique_id, "datasets", group_name)
-    
-    try:
-        if os.path.exists(dataset_group_dir):
-            import shutil
-            shutil.rmtree(dataset_group_dir)
-            logger.info(f"Cleaned up dataset group: {group_name}")
-            return True
-        else:
-            return True  # Nothing to clean up
-            
-    except Exception as e:
-        logger.error(f"Error cleaning up dataset group {group_name}: {str(e)}")
-        return False
--- a/utils/dataset_manager.py
+++ b/utils/dataset_manager.py
@ -1,297 +0,0 @@
-#!/usr/bin/env python3
-"""
-Dataset management functions for organizing and processing datasets.
-New implementation with per-file processing and group merging.
-"""
-
-import os
-import json
-import logging
-from typing import Dict, List
-
-# Configure logger
-logger = logging.getLogger('app')
-
-# Import new modules
-from utils.file_manager import (
-    ensure_directories, sync_files_to_group, cleanup_orphaned_files,
-    get_group_files_list
-)
-from utils.single_file_processor import (
-    process_single_file, check_file_already_processed
-)
-from utils.data_merger import (
-    merge_all_data_by_group, cleanup_dataset_group
-)
-
-
-async def download_dataset_files(unique_id: str, files: Dict[str, List[str]], incremental_mode: bool = False) -> Dict[str, List[str]]:
-    """
-    Process dataset files with new architecture:
-    1. Sync files to group directories
-    2. Process each file individually
-    3. Merge results by group
-    4. Clean up orphaned files (only in non-incremental mode)
-    
-    Args:
-        unique_id: Project ID
-        files: Dictionary of files to process, grouped by key
-        incremental_mode: If True, preserve existing files and only process new ones
-    """
-    if not files:
-        return {}
-    
-    logger.info(f"Starting {'incremental' if incremental_mode else 'full'} file processing for project: {unique_id}")
-    
-    # Ensure project directories exist
-    ensure_directories(unique_id)
-    
-    # Step 1: Sync files to group directories
-    logger.info("Step 1: Syncing files to group directories...")
-    synced_files, failed_files = sync_files_to_group(unique_id, files, incremental_mode)
-    
-    # Step 2: Detect changes and cleanup orphaned files (only in non-incremental mode)
-    from utils.file_manager import detect_file_changes
-    changes = detect_file_changes(unique_id, files, incremental_mode)
-    
-    # Only cleanup orphaned files in non-incremental mode or when files are explicitly removed
-    if not incremental_mode and any(changes["removed"].values()):
-        logger.info("Step 2: Cleaning up orphaned files...")
-        removed_files = cleanup_orphaned_files(unique_id, changes)
-        logger.info(f"Removed orphaned files: {removed_files}")
-    elif incremental_mode:
-        logger.info("Step 2: Skipping cleanup in incremental mode to preserve existing files")
-    
-    # Step 3: Process individual files
-    logger.info("Step 3: Processing individual files...")
-    processed_files_by_group = {}
-    processing_results = {}
-    
-    for group_name, file_list in files.items():
-        processed_files_by_group[group_name] = []
-        processing_results[group_name] = []
-        
-        for file_path in file_list:
-            filename = os.path.basename(file_path)
-            
-            # Get local file path
-            local_path = os.path.join("projects", "data", unique_id, "files", group_name, filename)
-            
-            # Skip if file doesn't exist (might be remote file that failed to download)
-            if not os.path.exists(local_path) and not file_path.startswith(('http://', 'https://')):
-                logger.warning(f"Skipping non-existent file: {filename}")
-                continue
-            
-            # Check if already processed
-            if check_file_already_processed(unique_id, group_name, filename):
-                logger.info(f"Skipping already processed file: {filename}")
-                processed_files_by_group[group_name].append(filename)
-                processing_results[group_name].append({
-                    "filename": filename,
-                    "status": "existing"
-                })
-                continue
-            
-            # Process the file
-            logger.info(f"Processing file: {filename} (group: {group_name})")
-            result = await process_single_file(unique_id, group_name, filename, file_path, local_path)
-            processing_results[group_name].append(result)
-            
-            if result["success"]:
-                processed_files_by_group[group_name].append(filename)
-                logger.info(f"  Successfully processed {filename}")
-            else:
-                logger.error(f"  Failed to process {filename}: {result['error']}")
-    
-    # Step 4: Merge results by group
-    logger.info("Step 4: Merging results by group...")
-    merge_results = {}
-    
-    for group_name in processed_files_by_group.keys():
-        # Get all files in the group (including existing ones)
-        group_files = get_group_files_list(unique_id, group_name)
-        
-        if group_files:
-            logger.info(f"Merging group: {group_name} with {len(group_files)} files")
-            merge_result = merge_all_data_by_group(unique_id, group_name)
-            merge_results[group_name] = merge_result
-            
-            if merge_result["success"]:
-                logger.info(f"  Successfully merged group {group_name}")
-            else:
-                logger.error(f"  Failed to merge group {group_name}: {merge_result['errors']}")
-    
-    # Step 5: Save processing log
-    logger.info("Step 5: Saving processing log...")
-    await save_processing_log(unique_id, files, synced_files, processing_results, merge_results)
-    
-    logger.info(f"File processing completed for project: {unique_id}")
-    return processed_files_by_group
-
-
-async def save_processing_log(
-    unique_id: str,
-    requested_files: Dict[str, List[str]],
-    synced_files: Dict,
-    processing_results: Dict,
-    merge_results: Dict
-):
-    """Save comprehensive processing log."""
-    
-    log_data = {
-        "unique_id": unique_id,
-        "timestamp": str(os.path.getmtime("projects") if os.path.exists("projects") else 0),
-        "requested_files": requested_files,
-        "synced_files": synced_files,
-        "processing_results": processing_results,
-        "merge_results": merge_results,
-        "summary": {
-            "total_groups": len(requested_files),
-            "total_files_requested": sum(len(files) for files in requested_files.values()),
-            "total_files_processed": sum(
-                len([r for r in results if r.get("success", False)])
-                for results in processing_results.values()
-            ),
-            "total_groups_merged": len([r for r in merge_results.values() if r.get("success", False)])
-        }
-    }
-    
-    log_file_path = os.path.join("projects", "data", unique_id, "processing_log.json")
-    try:
-        with open(log_file_path, 'w', encoding='utf-8') as f:
-            json.dump(log_data, f, ensure_ascii=False, indent=2)
-        logger.info(f"Processing log saved to: {log_file_path}")
-    except Exception as e:
-        logger.error(f"Error saving processing log: {str(e)}")
-
-
-def generate_dataset_structure(unique_id: str) -> str:
-    """Generate a string representation of the dataset structure"""
-    project_dir = os.path.join("projects", "data", unique_id)
-    structure = []
-    
-    def add_directory_contents(dir_path: str, prefix: str = ""):
-        try:
-            if not os.path.exists(dir_path):
-                structure.append(f"{prefix}└── (not found)")
-                return
-                
-            items = sorted(os.listdir(dir_path))
-            for i, item in enumerate(items):
-                item_path = os.path.join(dir_path, item)
-                is_last = i == len(items) - 1
-                current_prefix = "└── " if is_last else "├── "
-                structure.append(f"{prefix}{current_prefix}{item}")
-                
-                if os.path.isdir(item_path):
-                    next_prefix = prefix + ("    " if is_last else "│   ")
-                    add_directory_contents(item_path, next_prefix)
-        except Exception as e:
-            structure.append(f"{prefix}└── Error: {str(e)}")
-    
-    # Add files directory structure
-    files_dir = os.path.join(project_dir, "files")
-    structure.append("files/")
-    add_directory_contents(files_dir, "")
-    
-    # Add processed directory structure  
-    processed_dir = os.path.join(project_dir, "processed")
-    structure.append("\nprocessed/")
-    add_directory_contents(processed_dir, "")
-    
-    # Add dataset directory structure
-    dataset_dir = os.path.join(project_dir, "datasets")
-    structure.append("\ndataset/")
-    add_directory_contents(dataset_dir, "")
-    
-    return "\n".join(structure)
-
-
-def get_processing_status(unique_id: str) -> Dict:
-    """Get comprehensive processing status for a project."""
-    
-    project_dir = os.path.join("projects", "data", unique_id)
-    
-    if not os.path.exists(project_dir):
-        return {
-            "project_exists": False,
-            "unique_id": unique_id
-        }
-    
-    status = {
-        "project_exists": True,
-        "unique_id": unique_id,
-        "directories": {
-            "files": os.path.exists(os.path.join(project_dir, "files")),
-            "processed": os.path.exists(os.path.join(project_dir, "processed")),
-            "dataset": os.path.exists(os.path.join(project_dir, "datasets"))
-        },
-        "groups": {},
-        "processing_log_exists": os.path.exists(os.path.join(project_dir, "processing_log.json"))
-    }
-    
-    # Check each group's status
-    files_dir = os.path.join(project_dir, "files")
-    if os.path.exists(files_dir):
-        for group_name in os.listdir(files_dir):
-            group_path = os.path.join(files_dir, group_name)
-            if os.path.isdir(group_path):
-                status["groups"][group_name] = {
-                    "files_count": len([
-                        f for f in os.listdir(group_path) 
-                        if os.path.isfile(os.path.join(group_path, f))
-                    ]),
-                    "merge_status": "pending"
-                }
-    
-    # Check merge status for each group
-    dataset_dir = os.path.join(project_dir, "datasets")
-    if os.path.exists(dataset_dir):
-        for group_name in os.listdir(dataset_dir):
-            group_path = os.path.join(dataset_dir, group_name)
-            if os.path.isdir(group_path):
-                if group_name in status["groups"]:
-                    # Check if merge is complete
-                    document_path = os.path.join(group_path, "document.txt")
-                    pagination_path = os.path.join(group_path, "pagination.txt")
-                    embedding_path = os.path.join(group_path, "embedding.pkl")
-                    
-                    if (os.path.exists(document_path) and os.path.exists(pagination_path) and 
-                        os.path.exists(embedding_path)):
-                        status["groups"][group_name]["merge_status"] = "completed"
-                    else:
-                        status["groups"][group_name]["merge_status"] = "incomplete"
-                else:
-                    status["groups"][group_name] = {
-                        "files_count": 0,
-                        "merge_status": "completed"
-                    }
-    
-    return status
-
-
-def remove_dataset_directory(unique_id: str, filename_without_ext: str):
-    """Remove a specific dataset directory (deprecated - use new structure)"""
-    # This function is kept for compatibility but delegates to new structure
-    dataset_path = os.path.join("projects", "data", unique_id, "processed", filename_without_ext)
-    if os.path.exists(dataset_path):
-        import shutil
-        shutil.rmtree(dataset_path)
-
-
-def remove_dataset_directory_by_key(unique_id: str, key: str):
-    """Remove dataset directory by key (group name)"""
-    # Remove files directory
-    files_group_path = os.path.join("projects", "data", unique_id, "files", key)
-    if os.path.exists(files_group_path):
-        import shutil
-        shutil.rmtree(files_group_path)
-    
-    # Remove processed directory
-    processed_group_path = os.path.join("projects", "data", unique_id, "processed", key)
-    if os.path.exists(processed_group_path):
-        import shutil
-        shutil.rmtree(processed_group_path)
-    
-    # Remove dataset directory
-    cleanup_dataset_group(unique_id, key)
--- a/utils/project_manager.py
+++ b/utils/project_manager.py
@ -1,343 +0,0 @@
-#!/usr/bin/env python3
-"""
-Project management functions for handling projects, README generation, and status tracking.
-"""
-
-import os
-import json
-import logging
-from typing import Dict, List, Optional
-from pathlib import Path
-
-# Configure logger
-logger = logging.getLogger('app')
-
-from utils.file_utils import get_document_preview, load_processed_files_log
-
-
-def generate_directory_tree(project_dir: str, unique_id: str, max_depth: int = 3) -> str:
-    """Generate dataset directory tree structure for the project"""
-    def _build_tree(path: str, prefix: str = "", is_last: bool = True, depth: int = 0) -> List[str]:
-        if depth > max_depth:
-            return []
-        
-        lines = []
-        try:
-            entries = sorted(os.listdir(path))
-            # Separate directories and files
-            dirs = [e for e in entries if os.path.isdir(os.path.join(path, e)) and not e.startswith('.')]
-            files = [e for e in entries if os.path.isfile(os.path.join(path, e)) and not e.startswith('.')]
-            
-            entries = dirs + files
-            
-            for i, entry in enumerate(entries):
-                entry_path = os.path.join(path, entry)
-                is_dir = os.path.isdir(entry_path)
-                is_last_entry = i == len(entries) - 1
-                
-                # Choose the appropriate tree symbols
-                if is_last_entry:
-                    connector = "└── "
-                    new_prefix = prefix + "    "
-                else:
-                    connector = "├── "
-                    new_prefix = prefix + "│   "
-                
-                # Add entry line
-                line = prefix + connector + entry
-                if is_dir:
-                    line += "/"
-                lines.append(line)
-                
-                # Recursively add subdirectories
-                if is_dir and depth < max_depth:
-                    sub_lines = _build_tree(entry_path, new_prefix, is_last_entry, depth + 1)
-                    lines.extend(sub_lines)
-                    
-        except PermissionError:
-            lines.append(prefix + "└── [Permission Denied]")
-        except Exception as e:
-            lines.append(prefix + "└── [Error: " + str(e) + "]")
-        
-        return lines
-    
-    # Start building tree from dataset directory
-    dataset_dir = os.path.join(project_dir, "datasets")
-    tree_lines = []
-    
-    if not os.path.exists(dataset_dir):
-        return "└── [No dataset directory found]"
-    
-    try:
-        entries = sorted(os.listdir(dataset_dir))
-        dirs = [e for e in entries if os.path.isdir(os.path.join(dataset_dir, e)) and not e.startswith('.')]
-        files = [e for e in entries if os.path.isfile(os.path.join(dataset_dir, e)) and not e.startswith('.')]
-        
-        entries = dirs + files
-        
-        if not entries:
-            tree_lines.append("└── [Empty dataset directory]")
-        else:
-            for i, entry in enumerate(entries):
-                entry_path = os.path.join(dataset_dir, entry)
-                is_dir = os.path.isdir(entry_path)
-                is_last_entry = i == len(entries) - 1
-                
-                if is_last_entry:
-                    connector = "└── "
-                    prefix = "    "
-                else:
-                    connector = "├── "
-                    prefix = "│   "
-                
-                line = connector + entry
-                if is_dir:
-                    line += "/"
-                tree_lines.append(line)
-                
-                # Recursively add subdirectories
-                if is_dir:
-                    sub_lines = _build_tree(entry_path, prefix, is_last_entry, 1)
-                    tree_lines.extend(sub_lines)
-                
-    except Exception as e:
-        tree_lines.append(f"└── [Error generating tree: {str(e)}]")
-    
-    return "\n".join(tree_lines)
-
-
-def generate_project_readme(unique_id: str) -> str:
-    """Generate README.md content for a project"""
-    project_dir = os.path.join("projects", "data", unique_id)
-    readme_content = f"""# Project: {unique_id}
-
-## Project Overview
-
-This project contains processed documents and their associated embeddings for semantic search.
-
-## Directory Structure
-
-"""
-    
-    # Generate directory tree
-    readme_content += "```\n"
-    readme_content += generate_directory_tree(project_dir, unique_id)
-    readme_content += "\n```\n\n"
-    
-    readme_content += """## Dataset Structure
-
-"""
-    
-    dataset_dir = os.path.join(project_dir, "datasets")
-    if not os.path.exists(dataset_dir):
-        readme_content += "No dataset files available.\n"
-    else:
-        # Get all document directories
-        doc_dirs = []
-        try:
-            for item in sorted(os.listdir(dataset_dir)):
-                item_path = os.path.join(dataset_dir, item)
-                if os.path.isdir(item_path):
-                    doc_dirs.append(item)
-        except Exception as e:
-            logger.error(f"Error listing dataset directories: {str(e)}")
-        
-        if not doc_dirs:
-            readme_content += "No document directories found.\n"
-        else:
-            for doc_dir in doc_dirs:
-                doc_path = os.path.join(dataset_dir, doc_dir)
-                document_file = os.path.join(doc_path, "document.txt")
-                pagination_file = os.path.join(doc_path, "pagination.txt")
-                embeddings_file = os.path.join(doc_path, "embedding.pkl")
-                
-                readme_content += f"### {doc_dir}\n\n"
-                readme_content += f"**Files:**\n"
-                readme_content += f"- `{doc_dir}/document.txt`"
-                if os.path.exists(document_file):
-                    readme_content += " ✓"
-                readme_content += "\n"
-                
-                readme_content += f"- `{doc_dir}/pagination.txt`"
-                if os.path.exists(pagination_file):
-                    readme_content += " ✓"
-                readme_content += "\n"
-                
-                readme_content += f"- `{doc_dir}/embedding.pkl`"
-                if os.path.exists(embeddings_file):
-                    readme_content += " ✓"
-                readme_content += "\n\n"
-                
-                # Add document preview
-                if os.path.exists(document_file):
-                    readme_content += f"**Content Preview (first 10 lines):**\n\n```\n"
-                    preview = get_document_preview(document_file, 10)
-                    readme_content += preview
-                    readme_content += "\n```\n\n"
-                else:
-                    readme_content += f"**Content Preview:** Not available\n\n"
-    
-    readme_content += f"""---
-*Generated on {__import__('datetime').datetime.now().strftime('%Y-%m-%d %H:%M:%S')}*
-"""
-    
-    return readme_content
-
-
-def save_project_readme(unique_id: str):
-    """Save README.md for a project"""
-    readme_content = generate_project_readme(unique_id)
-    readme_path = os.path.join("projects", "data", unique_id, "README.md")
-    
-    try:
-        os.makedirs(os.path.dirname(readme_path), exist_ok=True)
-        with open(readme_path, 'w', encoding='utf-8') as f:
-            f.write(readme_content)
-        logger.info(f"Generated README.md for project {unique_id}")
-        return readme_path
-    except Exception as e:
-        logger.error(f"Error generating README for project {unique_id}: {str(e)}")
-        return None
-
-
-def get_project_status(unique_id: str) -> Dict:
-    """Get comprehensive status of a project"""
-    project_dir = os.path.join("projects", "data", unique_id)
-    project_exists = os.path.exists(project_dir)
-    
-    if not project_exists:
-        return {
-            "unique_id": unique_id,
-            "project_exists": False,
-            "error": "Project not found"
-        }
-    
-    # Get processed log
-    processed_log = load_processed_files_log(unique_id)
-    
-    # Collect document.txt files
-    document_files = []
-    dataset_dir = os.path.join(project_dir, "datasets")
-    if os.path.exists(dataset_dir):
-        for root, dirs, files in os.walk(dataset_dir):
-            for file in files:
-                if file == "document.txt":
-                    document_files.append(os.path.join(root, file))
-    
-    # Check system prompt and MCP settings
-    system_prompt_file = os.path.join(project_dir, "system_prompt.txt")
-    mcp_settings_file = os.path.join(project_dir, "mcp_settings.json")
-    
-    status = {
-        "unique_id": unique_id,
-        "project_exists": True,
-        "project_path": project_dir,
-        "processed_files_count": len(processed_log),
-        "processed_files": processed_log,
-        "document_files_count": len(document_files),
-        "document_files": document_files,
-        "has_system_prompt": os.path.exists(system_prompt_file),
-        "has_mcp_settings": os.path.exists(mcp_settings_file),
-        "readme_exists": os.path.exists(os.path.join(project_dir, "README.md")),
-        "log_file_exists": os.path.exists(os.path.join(project_dir, "processed_files.json"))
-    }
-    
-    # Add dataset structure
-    try:
-        from utils.dataset_manager import generate_dataset_structure
-        status["dataset_structure"] = generate_dataset_structure(unique_id)
-    except Exception as e:
-        status["dataset_structure"] = f"Error generating structure: {str(e)}"
-    
-    return status
-
-
-def remove_project(unique_id: str) -> bool:
-    """Remove entire project directory"""
-    project_dir = os.path.join("projects", "data", unique_id)
-    try:
-        if os.path.exists(project_dir):
-            import shutil
-            shutil.rmtree(project_dir)
-            logger.info(f"Removed project directory: {project_dir}")
-            return True
-        else:
-            logger.warning(f"Project directory not found: {project_dir}")
-            return False
-    except Exception as e:
-        logger.error(f"Error removing project {unique_id}: {str(e)}")
-        return False
-
-
-def list_projects() -> List[str]:
-    """List all existing project IDs"""
-    projects_dir = "projects"
-    if not os.path.exists(projects_dir):
-        return []
-    
-    try:
-        return [item for item in os.listdir(projects_dir) 
-                if os.path.isdir(os.path.join(projects_dir, item))]
-    except Exception as e:
-        logger.error(f"Error listing projects: {str(e)}")
-        return []
-
-
-def get_project_stats(unique_id: str) -> Dict:
-    """Get statistics for a specific project"""
-    status = get_project_status(unique_id)
-    
-    if not status["project_exists"]:
-        return status
-    
-    stats = {
-        "unique_id": unique_id,
-        "total_processed_files": status["processed_files_count"],
-        "total_document_files": status["document_files_count"],
-        "has_system_prompt": status["has_system_prompt"],
-        "has_mcp_settings": status["has_mcp_settings"],
-        "has_readme": status["readme_exists"]
-    }
-    
-    # Calculate file sizes
-    total_size = 0
-    document_sizes = []
-    
-    for doc_file in status["document_files"]:
-        try:
-            size = os.path.getsize(doc_file)
-            document_sizes.append({
-                "file": doc_file,
-                "size": size,
-                "size_mb": round(size / (1024 * 1024), 2)
-            })
-            total_size += size
-        except Exception:
-            pass
-    
-    stats["total_document_size"] = total_size
-    stats["total_document_size_mb"] = round(total_size / (1024 * 1024), 2)
-    stats["document_files_detail"] = document_sizes
-    
-    # Check embeddings files
-    embedding_files = []
-    dataset_dir = os.path.join("projects", "data", unique_id, "datasets")
-    if os.path.exists(dataset_dir):
-        for root, dirs, files in os.walk(dataset_dir):
-            for file in files:
-                if file == "embedding.pkl":
-                    file_path = os.path.join(root, file)
-                    try:
-                        size = os.path.getsize(file_path)
-                        embedding_files.append({
-                            "file": file_path,
-                            "size": size,
-                            "size_mb": round(size / (1024 * 1024), 2)
-                        })
-                    except Exception:
-                        pass
-    
-    stats["embedding_files_count"] = len(embedding_files)
-    stats["embedding_files_detail"] = embedding_files
-    
-    return stats
--- a/utils/settings.py
+++ b/utils/settings.py
@ -77,6 +77,15 @@ CHECKPOINT_CLEANUP_INACTIVE_DAYS = int(os.getenv("CHECKPOINT_CLEANUP_INACTIVE_DA
 CHECKPOINT_CLEANUP_INTERVAL_HOURS = int(os.getenv("CHECKPOINT_CLEANUP_INTERVAL_HOURS", "24"))


+# ============================================================
+# Redis Configuration (Huey task queue backend)
+# ============================================================
+
+# Redis connection URL.
+# Format: redis://[:password]@host:port/db
+REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379/1")
+
+
 # ============================================================
 # Mem0 long-term memory configuration
 # ============================================================
--- a/utils/single_file_processor.py
+++ b/utils/single_file_processor.py
@ -1,301 +0,0 @@
-#!/usr/bin/env python3
-"""
-Single file processing functions for handling individual files.
-"""
-
-import os
-import tempfile
-import zipfile
-import logging
-from typing import Dict, List, Tuple, Optional
-from pathlib import Path
-
-# Configure logger
-logger = logging.getLogger('app')
-
-from utils.file_utils import download_file
-
-# Try to import excel/csv processor, but handle if dependencies are missing
-try:
-    from utils.excel_csv_processor import (
-        is_excel_file, is_csv_file, process_excel_file, process_csv_file
-    )
-    EXCEL_CSV_SUPPORT = True
-except ImportError as e:
-    logger.warning(f"Excel/CSV processing not available: {e}")
-    EXCEL_CSV_SUPPORT = False
-    
-    # Fallback functions
-    def is_excel_file(file_path):
-        return file_path.lower().endswith(('.xlsx', '.xls'))
-    
-    def is_csv_file(file_path):
-        return file_path.lower().endswith('.csv')
-    
-    def process_excel_file(file_path):
-        return "", []
-    
-    def process_csv_file(file_path):
-        return "", []
-
-
-async def process_single_file(
-    unique_id: str, 
-    group_name: str, 
-    filename: str, 
-    original_path: str,
-    local_path: str
-) -> Dict:
-    """
-    Process a single file and generate document.txt, pagination.txt, and embedding.pkl.
-    
-    Returns:
-        Dict with processing results and file paths
-    """
-    # Create output directory for this file
-    filename_stem = Path(filename).stem
-    output_dir = os.path.join("projects", "data", unique_id, "processed", group_name, filename_stem)
-    os.makedirs(output_dir, exist_ok=True)
-    
-    result = {
-        "success": False,
-        "filename": filename,
-        "group": group_name,
-        "output_dir": output_dir,
-        "document_path": os.path.join(output_dir, "document.txt"),
-        "pagination_path": os.path.join(output_dir, "pagination.txt"), 
-        "embedding_path": os.path.join(output_dir, "embedding.pkl"),
-        "error": None,
-        "content_size": 0,
-        "pagination_lines": 0,
-        "embedding_chunks": 0
-    }
-    
-    try:
-        # Download file if it's remote and not yet downloaded
-        if original_path.startswith(('http://', 'https://')):
-            if not os.path.exists(local_path):
-                logger.info(f"Downloading {original_path} -> {local_path}")
-                success = await download_file(original_path, local_path)
-                if not success:
-                    result["error"] = "Failed to download file"
-                    return result
-        
-        # Extract content from file
-        content, pagination_lines = await extract_file_content(local_path, filename)
-        
-        if not content or not content.strip():
-            result["error"] = "No content extracted from file"
-            return result
-        
-        # Write document.txt
-        with open(result["document_path"], 'w', encoding='utf-8') as f:
-            f.write(content)
-        result["content_size"] = len(content)
-        
-        # Write pagination.txt
-        if pagination_lines:
-            with open(result["pagination_path"], 'w', encoding='utf-8') as f:
-                for line in pagination_lines:
-                    if line.strip():
-                        f.write(f"{line}\n")
-            result["pagination_lines"] = len(pagination_lines)
-        else:
-            # Generate pagination from text content
-            pagination_lines = generate_pagination_from_text(result["document_path"], 
-                                                            result["pagination_path"])
-            result["pagination_lines"] = len(pagination_lines)
-        
-        # Generate embeddings
-        try:
-            embedding_chunks = await generate_embeddings_for_file(
-                result["document_path"], result["embedding_path"]
-            )
-            result["embedding_chunks"] = len(embedding_chunks) if embedding_chunks else 0
-            result["success"] = True
-            
-        except Exception as e:
-            result["error"] = f"Embedding generation failed: {str(e)}"
-        logger.error(f"Failed to generate embeddings for {filename}: {str(e)}")
-        
-    except Exception as e:
-        result["error"] = f"File processing failed: {str(e)}"
-        logger.error(f"Error processing file {filename}: {str(e)}")
-    
-    return result
-
-
-async def extract_file_content(file_path: str, filename: str) -> Tuple[str, List[str]]:
-    """Extract content from various file formats."""
-    
-    # Handle zip files
-    if filename.lower().endswith('.zip'):
-        return await extract_from_zip(file_path, filename)
-    
-    # Handle Excel files
-    elif is_excel_file(file_path):
-        return await extract_from_excel(file_path, filename)
-    
-    # Handle CSV files  
-    elif is_csv_file(file_path):
-        return await extract_from_csv(file_path, filename)
-    
-    # Handle text files
-    else:
-        return await extract_from_text(file_path, filename)
-
-
-async def extract_from_zip(zip_path: str, filename: str) -> Tuple[str, List[str]]:
-    """Extract content from zip file."""
-    content_parts = []
-    pagination_lines = []
-    
-    try:
-        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
-            # Extract to temporary directory
-            temp_dir = tempfile.mkdtemp(prefix=f"extract_{Path(filename).stem}_")
-            zip_ref.extractall(temp_dir)
-            
-            # Process extracted files
-            for root, dirs, files in os.walk(temp_dir):
-                for file in files:
-                    if file.lower().endswith(('.txt', '.md', '.xlsx', '.xls', '.csv')):
-                        file_path = os.path.join(root, file)
-                        
-                        try:
-                            file_content, file_pagination = await extract_file_content(file_path, file)
-                            
-                            if file_content:
-                                content_parts.append(f"# Page {file}")
-                                content_parts.append(file_content)
-                                pagination_lines.extend(file_pagination)
-                                
-                        except Exception as e:
-                            logger.error(f"Error processing extracted file {file}: {str(e)}")
-            
-            # Clean up temporary directory
-            import shutil
-            shutil.rmtree(temp_dir)
-            
-    except Exception as e:
-        logger.error(f"Error extracting zip file {filename}: {str(e)}")
-        return "", []
-    
-    return '\n\n'.join(content_parts), pagination_lines
-
-
-async def extract_from_excel(file_path: str, filename: str) -> Tuple[str, List[str]]:
-    """Extract content from Excel file."""
-    try:
-        document_content, pagination_lines = process_excel_file(file_path)
-        
-        if document_content:
-            content = f"# Page {filename}\n{document_content}"
-            return content, pagination_lines
-        else:
-            return "", []
-            
-    except Exception as e:
-        logger.error(f"Error processing Excel file {filename}: {str(e)}")
-        return "", []
-
-
-async def extract_from_csv(file_path: str, filename: str) -> Tuple[str, List[str]]:
-    """Extract content from CSV file."""
-    try:
-        document_content, pagination_lines = process_csv_file(file_path)
-        
-        if document_content:
-            content = f"# Page {filename}\n{document_content}"
-            return content, pagination_lines
-        else:
-            return "", []
-            
-    except Exception as e:
-        logger.error(f"Error processing CSV file {filename}: {str(e)}")
-        return "", []
-
-
-async def extract_from_text(file_path: str, filename: str) -> Tuple[str, List[str]]:
-    """Extract content from text file."""
-    try:
-        with open(file_path, 'r', encoding='utf-8') as f:
-            content = f.read().strip()
-        
-        if content:
-            return content, []
-        else:
-            return "", []
-            
-    except Exception as e:
-        logger.error(f"Error reading text file {filename}: {str(e)}")
-        return "", []
-
-
-def generate_pagination_from_text(document_path: str, pagination_path: str) -> List[str]:
-    """Generate pagination from text document."""
-    try:
-        # Import embedding module for pagination
-        import sys
-        sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'embedding'))
-        from embedding import split_document_by_pages
-        
-        pages = split_document_by_pages(str(document_path), str(pagination_path))
-        
-        # Return pagination lines
-        pagination_lines = []
-        with open(pagination_path, 'r', encoding='utf-8') as f:
-            for line in f:
-                if line.strip():
-                    pagination_lines.append(line.strip())
-        
-        return pagination_lines
-        
-    except Exception as e:
-        logger.error(f"Error generating pagination from text: {str(e)}")
-        return []
-
-
-async def generate_embeddings_for_file(document_path: str, embedding_path: str) -> Optional[List]:
-    """Generate embeddings for a document."""
-    try:
-        # Import embedding module
-        import sys
-        sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'embedding'))
-        from embedding import embed_document
-        
-        # Generate embeddings using paragraph chunking
-        embedding_data = embed_document(
-            str(document_path),
-            str(embedding_path),
-            chunking_strategy='paragraph'
-        )
-        
-        if embedding_data and 'chunks' in embedding_data:
-            return embedding_data['chunks']
-        else:
-            return None
-            
-    except Exception as e:
-        logger.error(f"Error generating embeddings: {str(e)}")
-        return None
-
-
-def check_file_already_processed(unique_id: str, group_name: str, filename: str) -> bool:
-    """Check if a file has already been processed."""
-    filename_stem = Path(filename).stem
-    output_dir = os.path.join("projects", "data", unique_id, "processed", group_name, filename_stem)
-    
-    document_path = os.path.join(output_dir, "document.txt")
-    pagination_path = os.path.join(output_dir, "pagination.txt")
-    embedding_path = os.path.join(output_dir, "embedding.pkl")
-    
-    # Check if all files exist and are not empty
-    if (os.path.exists(document_path) and os.path.exists(pagination_path) and 
-        os.path.exists(embedding_path)):
-        
-        if (os.path.getsize(document_path) > 0 and os.path.getsize(pagination_path) > 0 and
-            os.path.getsize(embedding_path) > 0):
-            return True
-    
-    return False