From deb78a76253a3a3855965858443e92b9f0b2da8a Mon Sep 17 00:00:00 2001
From: autobee-sparticle
Date: Tue, 17 Mar 2026 10:37:49 +0900
Subject: [PATCH] fix: improve memory extraction for colloquial/informal speech (#16)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* chore: add .worktrees/ to .gitignore

Co-Authored-By: Claude Opus 4.6

* feat(CI): add build and deploy configuration for the onprem-dev environment

Add build-and-push and deploy jobs for the onprem-dev environment to the
CircleCI config, deploying to the onprem-dev namespace on cluster-for-B

Co-Authored-By: Claude Opus 4.6 (1M context)

* fix: improve memory extraction for colloquial/informal speech

Add semantic completeness rules and multilingual few-shot examples to
FACT_RETRIEVAL_PROMPT to prevent truncated or semantically incorrect
memory extraction. Specifically addresses Japanese casual speech where
particles (が, を, に) are often omitted.

Closes sparticleinc/mygpt-frontend#2125

Co-Authored-By: Claude Opus 4.6

---------

Co-authored-by: zhuchao
Co-authored-by: Claude Opus 4.6
Co-authored-by: shuirong
---
 .circleci/config.yml            | 27 +++++++++++++++++++++++++++
 prompt/FACT_RETRIEVAL_PROMPT.md | 20 ++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/.circleci/config.yml b/.circleci/config.yml
index b0a1532..d0b7173 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -193,3 +193,30 @@ workflows:
             branches:
               only:
                 - onprem
+      # Deploy to the onprem-dev environment
+      - build-and-push:
+          name: build-for-onprem-dev
+          context:
+            - ecr-new
+          path: .
+          dockerfile: Dockerfile
+          repo: catalog-agent
+          docker-tag: ''
+          filters:
+            branches:
+              only:
+                - onprem
+      - deploy:
+          name: deploy-for-onprem-dev
+          docker-tag: ''
+          path: '/home/ubuntu/cluster-for-B/onprem-dev/catalog-agent/deploy.yaml'
+          deploy-name: catalog-agent
+          deploy-namespace: onprem-dev
+          context:
+            - ecr-new
+          filters:
+            branches:
+              only:
+                - onprem
+          requires:
+            - build-for-onprem-dev
diff --git a/prompt/FACT_RETRIEVAL_PROMPT.md b/prompt/FACT_RETRIEVAL_PROMPT.md
index 27777e6..1c2f93e 100644
--- a/prompt/FACT_RETRIEVAL_PROMPT.md
+++ b/prompt/FACT_RETRIEVAL_PROMPT.md
@@ -83,6 +83,21 @@ Output: {{"facts" : ["Mike Smith helped with bug fix", "Contact: Mike Smith (col
 Input: Mike is coming to the meeting tomorrow.
 Output: {{"facts" : ["Mike Smith is coming to the meeting tomorrow", "Contact: Mike Smith (colleague, also referred as Mike) - DEFAULT when user says 'Mike'"]}}
 
+Input: 私は林檎好きです
+Output: {{"facts" : ["林檎が好き"]}}
+
+Input: コーヒー飲みたい、毎朝
+Output: {{"facts" : ["毎朝コーヒーを飲みたい"]}}
+
+Input: 昨日映画見た、すごくよかった
+Output: {{"facts" : ["昨日映画を見た", "映画がすごくよかった"]}}
+
+Input: 我喜欢吃苹果
+Output: {{"facts" : ["喜欢吃苹果"]}}
+
+Input: 나는 사과를 좋아해
+Output: {{"facts" : ["사과를 좋아함"]}}
+
 Return the facts and preferences in a json format as shown above.
 
 Remember the following:
@@ -93,6 +108,11 @@ Remember the following:
 - If you do not find anything relevant in the below conversation, you can return an empty list corresponding to the "facts" key.
 - Create the facts based on the user and assistant messages only. Do not pick anything from the system messages.
 - Make sure to return the response in the format mentioned in the examples. The response should be in json with a key as "facts" and corresponding value will be a list of strings.
+- **CRITICAL for Semantic Completeness**:
+  - Each extracted fact MUST preserve the complete semantic meaning. Never truncate or drop key parts of the meaning.
+  - For colloquial or grammatically informal expressions (common in spoken Japanese, Chinese, Korean, etc.), understand the full intended meaning and record it in a clear, semantically complete form.
+  - In Japanese, spoken language often omits particles (e.g., が, を, に). When extracting facts, include the necessary particles to make the meaning unambiguous. For example: "私は林檎好きです" should be understood as "林檎が好き" (likes apples), not literally "私は林檎好き".
+  - When the user expresses a preference or opinion in casual speech, record the core preference/opinion clearly. Remove the subject pronoun (私は/I) since facts are about the user by default, but keep all other semantic components intact.
 - **CRITICAL for Contact/Relationship Tracking**:
   - ALWAYS use the "Contact: [name] (relationship/context)" format when recording people
   - When you see a short name that matches a known full name, record as "Contact: [Full Name] (relationship, also referred as [Short Name])"
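
Reviewer note: the new few-shot pairs could double as regression cases for the extraction prompt. The following is a minimal sketch, not part of this patch. The names `FEW_SHOT_CASES`, `parse_facts`, and `facts_match` are hypothetical, and the sketch assumes the doubled braces in the prompt file are `str.format` escapes, which the patch itself does not confirm.

```python
import json

# Input/expected pairs copied from the few-shot examples added in this patch.
FEW_SHOT_CASES = [
    ("私は林檎好きです", ["林檎が好き"]),
    ("コーヒー飲みたい、毎朝", ["毎朝コーヒーを飲みたい"]),
    ("昨日映画見た、すごくよかった", ["昨日映画を見た", "映画がすごくよかった"]),
    ("我喜欢吃苹果", ["喜欢吃苹果"]),
    ("나는 사과를 좋아해", ["사과를 좋아함"]),
]


def parse_facts(response_text: str) -> list[str]:
    """Parse a response shaped like {"facts": [...]} into a list of strings.

    Raises ValueError if the text is not valid JSON or has the wrong shape
    (json.JSONDecodeError is a subclass of ValueError).
    """
    data = json.loads(response_text)
    facts = data.get("facts") if isinstance(data, dict) else None
    if not isinstance(facts, list) or not all(isinstance(f, str) for f in facts):
        raise ValueError('expected a JSON object with a "facts" list of strings')
    return facts


def facts_match(response_text: str, expected: list[str]) -> bool:
    """Return True when the parsed facts equal the expected list exactly."""
    return parse_facts(response_text) == expected


if __name__ == "__main__":
    # Assumption: the prompt is rendered with str.format, so "{{" becomes "{".
    rendered = '{{"facts" : ["林檎が好き"]}}'.format()
    user_input, expected = FEW_SHOT_CASES[0]
    print(user_input, facts_match(rendered, expected))
```

Exact string comparison is deliberately strict here: a truncated fact such as "林檎" fails the check, which is the failure mode the semantic completeness rules target. A real evaluation would likely need a looser comparison, since a model may phrase a complete fact differently.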