From deb78a76253a3a3855965858443e92b9f0b2da8a Mon Sep 17 00:00:00 2001
From: autobee-sparticle <support@sparticle.com>
Date: Tue, 17 Mar 2026 10:37:49 +0900
Subject: [PATCH] fix: improve memory extraction for colloquial/informal speech
 (#16)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* chore: add .worktrees/ to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(CI): add build and deploy configuration for the onprem-dev environment

Add build-and-push and deploy jobs for the onprem-dev environment to the CircleCI config, deploying to the onprem-dev namespace of cluster-for-B

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: improve memory extraction for colloquial/informal speech

Add semantic completeness rules and multilingual few-shot examples
to FACT_RETRIEVAL_PROMPT to prevent truncated or semantically incorrect
memory extraction. Specifically addresses Japanese casual speech where
particles (が, を, に) are often omitted.

Closes sparticleinc/mygpt-frontend#2125

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: zhuchao <zhuchaowe@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: shuirong <shuirong1997@icloud.com>
---
 .circleci/config.yml            | 27 +++++++++++++++++++++++++++
 prompt/FACT_RETRIEVAL_PROMPT.md | 20 ++++++++++++++++++++
 2 files changed, 47 insertions(+)
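
Reviewer note (not part of the commit): the new few-shot examples keep the
doubled braces already used in FACT_RETRIEVAL_PROMPT.md. The sketch below
illustrates one reading of that convention. It assumes the prompt is rendered
with Python's str.format (not confirmed by this patch) and that the model
replies with the JSON shape shown in the examples. TEMPLATE_EXCERPT,
render_excerpt, and parse_facts are hypothetical names, not code from this
repository.

```python
import json

# Hypothetical excerpt of the prompt. The braces are doubled on the
# assumption that the prompt is passed through str.format, which turns
# "{{" and "}}" into literal "{" and "}".
TEMPLATE_EXCERPT = (
    "Input: 私は林檎好きです\n"
    'Output: {{"facts" : ["林檎が好き"]}}\n'
)


def render_excerpt(template: str) -> str:
    """Render the template; with no fields, format() only unescapes braces."""
    return template.format()


def parse_facts(response_text: str) -> list:
    """Parse a model reply of the form {"facts": [...]} into a list of strings."""
    data = json.loads(response_text)
    facts = data.get("facts", [])
    if not isinstance(facts, list):
        raise ValueError('"facts" must be a list of strings')
    # Keep only non-empty string entries.
    return [f for f in facts if isinstance(f, str) and f.strip()]


rendered = render_excerpt(TEMPLATE_EXCERPT)
# After rendering, the example output line is plain JSON.
example_output = rendered.splitlines()[1].split("Output: ", 1)[1]
print(parse_facts(example_output))
```

If the prompt is rendered some other way, the doubled braces would reach the
model literally, so the escaping of the new examples is worth checking against
the actual rendering code.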

diff --git a/.circleci/config.yml b/.circleci/config.yml
index b0a1532..d0b7173 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -193,3 +193,30 @@ workflows:
             branches:
               only:
                 - onprem
+      # Deploy to the onprem-dev environment
+      - build-and-push:
+          name: build-for-onprem-dev
+          context:
+            - ecr-new
+          path: .
+          dockerfile: Dockerfile
+          repo: catalog-agent
+          docker-tag: ''
+          filters:
+            branches:
+              only:
+                - onprem
+      - deploy:
+          name: deploy-for-onprem-dev
+          docker-tag: ''
+          path: '/home/ubuntu/cluster-for-B/onprem-dev/catalog-agent/deploy.yaml'
+          deploy-name: catalog-agent
+          deploy-namespace: onprem-dev
+          context:
+            - ecr-new
+          filters:
+            branches:
+              only:
+                - onprem
+          requires:
+            - build-for-onprem-dev
diff --git a/prompt/FACT_RETRIEVAL_PROMPT.md b/prompt/FACT_RETRIEVAL_PROMPT.md
index 27777e6..1c2f93e 100644
--- a/prompt/FACT_RETRIEVAL_PROMPT.md
+++ b/prompt/FACT_RETRIEVAL_PROMPT.md
@@ -83,6 +83,21 @@ Output: {{"facts" : ["Mike Smith helped with bug fix", "Contact: Mike Smith (col
 Input: Mike is coming to the meeting tomorrow.
 Output: {{"facts" : ["Mike Smith is coming to the meeting tomorrow", "Contact: Mike Smith (colleague, also referred as Mike) - DEFAULT when user says 'Mike'"]}}
 
+Input: 私は林檎好きです
+Output: {{"facts" : ["林檎が好き"]}}
+
+Input: コーヒー飲みたい、毎朝
+Output: {{"facts" : ["毎朝コーヒーを飲みたい"]}}
+
+Input: 昨日映画見た、すごくよかった
+Output: {{"facts" : ["昨日映画を見た", "映画がすごくよかった"]}}
+
+Input: 我喜欢吃苹果
+Output: {{"facts" : ["喜欢吃苹果"]}}
+
+Input: 나는 사과를 좋아해
+Output: {{"facts" : ["사과를 좋아함"]}}
+
 Return the facts and preferences in a json format as shown above.
 
 Remember the following:
@@ -93,6 +108,11 @@ Remember the following:
 - If you do not find anything relevant in the below conversation, you can return an empty list corresponding to the "facts" key.
 - Create the facts based on the user and assistant messages only. Do not pick anything from the system messages.
 - Make sure to return the response in the format mentioned in the examples. The response should be in json with a key as "facts" and corresponding value will be a list of strings.
+- **CRITICAL for Semantic Completeness**:
+  - Each extracted fact MUST preserve the complete semantic meaning. Never truncate or drop key parts of the meaning.
+  - For colloquial or grammatically informal expressions (common in spoken Japanese, Chinese, Korean, etc.), understand the full intended meaning and record it in a clear, semantically complete form.
+  - In Japanese, spoken language often omits particles (e.g., が, を, に). When extracting facts, include the necessary particles to make the meaning unambiguous. For example: "私は林檎好きです" should be understood as "林檎が好き" (likes apples), not literally "私は林檎好き".
+  - When the user expresses a preference or opinion in casual speech, record the core preference/opinion clearly. Remove the subject pronoun (私は/I) since facts are about the user by default, but keep all other semantic components intact.
 - **CRITICAL for Contact/Relationship Tracking**:
   - ALWAYS use the "Contact: [name] (relationship/context)" format when recording people
   - When you see a short name that matches a known full name, record as "Contact: [Full Name] (relationship, also referred as [Short Name])"