Commit Graph

123 Commits

Author SHA1 Message Date
朱潮
817b8cc014 LIBREOFFICE_PATH
2025-12-21 14:25:23 +08:00
朱潮
51481055d6 Ensure the folder exists
2025-12-19 13:54:10 +08:00
朱潮
545402617b Fix segmentation logic
2025-12-19 12:34:22 +08:00
朱潮
653ee4af13 add logs
2025-12-19 11:15:50 +08:00
朱潮
8b85ad33f0 Support segmentation for audio and video
2025-12-18 23:12:18 +08:00
朱潮
b16afa5299 Fix JSON parsing error in mineru
2025-12-18 12:59:14 +08:00
朱潮
5f9f2a9325 modify file status 2025-08-31 11:16:33 +08:00
朱潮
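5da36659c2 Fix critical issues in audio/video processing
1. Fix Paragraph model construction error:
   - Rename the meta parameter to status_meta
   - Add the required knowledge_id parameter

2. Fix the use of demo data:
   - Remove all demo data generation code
   - Call the actual audio processing logic instead
   - Perform the actual processing through MediaSplitHandle

3. Enhance MediaSplitHandle:
   - Support two modes: actual processing and default text
   - Select the processing mode based on the use_actual_processing parameter
   - Maintain backward compatibility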

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-31 01:44:54 +08:00
朱潮
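86ef54fb75 Change audio segmentation to produce default text
- Remove the actual audio processing logic
- Generate default demo text instead
- Generate demo content appropriate to the file type
- Support audio, video, and other media files
- Preserve complete metadata and timing information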

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-31 00:58:29 +08:00
朱潮
b05f42259e Fix File model import in media_learning.py
- Fixed import error by changing from 'oss.models' to 'knowledge.models'
- File model is correctly imported from knowledge.models module

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-31 00:55:34 +08:00
朱潮
dd0360fb6f modify file status
2025-08-29 09:29:02 +08:00
朱潮
7c16c954e6 os error
2025-08-27 11:16:30 +08:00
朱潮
575b04c10f modify model_id
2025-08-26 16:35:29 +08:00
朱潮
070b3e0057 modify model_id
2025-08-26 14:48:14 +08:00
朱潮
51f436d7f7 modify model_id
2025-08-26 14:10:15 +08:00
朱潮
623dda5bb7 modify model_id
2025-08-26 13:48:07 +08:00
朱潮
edc80888cc The passed-in llm_model_id and vision_model_id are now correctly propagated to the configuration
2025-08-26 00:58:18 +08:00
朱潮
0c9da8e2eb maxkb add log
2025-08-25 22:49:43 +08:00
朱潮
36da5e1bf3 docker platform adapter 2025-08-25 19:39:23 +08:00
朱潮
f1494fedea modify status parsing 2025-08-25 01:20:33 +08:00
朱潮
f0263bf189 add mineru 2025-08-24 17:45:40 +08:00
朱潮
35f9a4dbfe add mineru 2025-08-24 00:56:02 +08:00
CaptainB
4c9756839a chore: normalize with_filter parameter to boolean in split handle files
--bug=1057879 --user=刘瑞斌 [Knowledge Base] Automatic cleaning does not take effect in advanced segmentation https://www.tapd.cn/62980211/s/1727744
2025-07-10 15:06:19 +08:00
CaptainB
cb40d62162 refactor: allow loading of truncated images and increase max pixel limit in common_handle.py
--bug=1057749 --user=刘瑞斌 [Knowledge Base] Images in QA pair documents are not displayed after import https://www.tapd.cn/62980211/s/1723700
2025-07-04 15:53:37 +08:00
CaptainB
aa901c7fc7 fix: update file URL paths to use relative references 2025-07-02 22:45:11 +08:00
CaptainB
089915f488 refactor: improve error logging for image reading and enhance image handling logic
--bug=1057749 --user=刘瑞斌 [Knowledge Base] Images in QA pair documents are not displayed after import https://www.tapd.cn/62980211/s/1720856
2025-07-01 14:17:10 +08:00
CaptainB
0f1d57f0cb feat: enhance error logging for file processing in CSV, XLS, and DOC handlers 2025-06-30 12:49:50 +08:00
CaptainB
82a2203be6 fix: handle string type for limit and improve error logging in pdf_split_handle
--bug=1057493 --user=刘瑞斌 [Knowledge Base] Uploading a document with advanced segmentation raises an error https://www.tapd.cn/62980211/s/1720110
2025-06-30 12:47:47 +08:00
CaptainB
d49f448a5f fix: correct image path replacement logic in zip_split_handle 2025-06-26 17:02:34 +08:00
CaptainB
37ac79dc5a feat: import File model in zip_split_handle for enhanced functionality
--bug=1057478 --user=刘瑞斌 [Knowledge Base] Segmentation fails when uploading a ZIP file to a general knowledge base https://www.tapd.cn/62980211/s/1719181
2025-06-26 16:56:28 +08:00
CaptainB
e24a2001c5 feat: refine regex patterns in text_split_handle for improved comment detection
--bug=1057526 --user=刘瑞斌 [Knowledge Base] Code blocks display incorrectly in segment details after importing a Markdown file https://www.tapd.cn/62980211/s/1719131
2025-06-26 16:23:32 +08:00
CaptainB
a73e0b10f9 refactor: replace logging with maxkb_logger for consistent logging across modules 2025-06-25 17:00:18 +08:00
CaptainB
fe8f87834d refactor: replace logging with maxkb_logger for consistent logging across modules 2025-06-25 16:46:50 +08:00
CaptainB
3aa0847506 refactor: replace print statements with logging for improved error tracking 2025-06-25 16:18:19 +08:00
wxg0103
c253e8b696 refactor: remove print 2025-06-24 15:30:42 +08:00
CaptainB
45908b91ff refactor: update dataset_id to knowledge_id in zip_split_handle.py and tools.py 2025-06-18 21:28:33 +08:00
CaptainB
c0b770f41e refactor: update dataset_id to knowledge_id in zip_split_handle.py and tools.py 2025-06-18 21:15:53 +08:00
CaptainB
9a7281212d fix: update image URL paths to use OSS endpoints 2025-06-12 15:49:54 +08:00
wxg0103
b8b14884bd refactor: add application settings 2025-06-07 17:57:11 +08:00
wxg0103
93833849c1 refactor: file to oss 2025-06-06 11:42:31 +08:00
CaptainB
c3581be9bd fix: rename image_name to file_name in zip_split_handle and remove workspace_id assignment in document 2025-05-13 12:47:59 +08:00
CaptainB
e702af8c2b feat: enhance Document API with workspace ID support for get, put, and delete operations 2025-05-06 15:24:36 +08:00
CaptainB
43bef216d5 refactor: reorganize file handling imports into a structured directory 2025-04-30 16:08:17 +08:00
CaptainB
48297d81e5 feat: add initial implementations of various file handling classes for CSV, XLS, and XLSX formats 2025-04-30 15:52:58 +08:00
CaptainB
c78a6babb6 ci: v2 2025-04-11 15:47:59 +08:00
CaptainB
560890f717 fix: limit chapter title length to 256 characters in pdf_split_handle.py
--bug=1054363 --user=刘瑞斌 [Knowledge Base] Overlong segment titles are not automatically truncated when importing a PDF document https://www.tapd.cn/57709429/s/1681044
2025-04-07 10:54:59 +08:00
CaptainB
675adeeb63 fix: exclude macOS specific files from zip processing
--bug=1054264 --user=刘瑞斌 [Knowledge Base] In QA pair mode, importing a zip file compressed on a Mac produces 2 garbled documents https://www.tapd.cn/57709429/s/1681034
2025-04-07 10:37:06 +08:00
CaptainB
27bc01d442 fix: skip macOS specific metadata directories and files in zip parsing
--bug=1054264 --user=刘瑞斌 [Knowledge Base] In QA pair mode, importing a zip file compressed on a Mac produces 2 garbled documents https://www.tapd.cn/57709429/s/1679674
2025-04-02 16:06:36 +08:00
shaohuzhang1
9750c6d605 fix: garbled zip import file names (#2747) 2025-03-31 16:22:39 +08:00
shaohuzhang1
55cdd0a708 fix: Zip with title cannot be parsed (#2683) 2025-03-26 10:31:31 +08:00