fix: improve memory extraction for colloquial/informal speech (#16)

* chore: add .worktrees/ to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(CI): add build and deploy configuration for the onprem-dev environment

Add build-and-push and deploy jobs for the onprem-dev environment to the CircleCI config, deploying to the onprem-dev namespace of cluster-for-B.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: improve memory extraction for colloquial/informal speech

Add semantic completeness rules and multilingual few-shot examples to FACT_RETRIEVAL_PROMPT to prevent truncated or semantically incorrect memory extraction.

Specifically addresses Japanese casual speech, where particles (が, を, に) are often omitted.

Closes sparticleinc/mygpt-frontend#2125

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: zhuchao <zhuchaowe@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: shuirong <shuirong1997@icloud.com>
2026-03-17 10:37:49 +09:00 · deb78a7625
commit deb78a7625
parent f24c3ff78f
2 changed files with 47 additions and 0 deletions
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -193,3 +193,30 @@ workflows:
            branches:
              only:
                - onprem
+      # Deploy to the onprem-dev environment
+      - build-and-push:
+          name: build-for-onprem-dev
+          context:
+            - ecr-new
+          path: .
+          dockerfile: Dockerfile
+          repo: catalog-agent
+          docker-tag: ''
+          filters:
+            branches:
+              only:
+                - onprem
+      - deploy:
+          name: deploy-for-onprem-dev
+          docker-tag: ''
+          path: '/home/ubuntu/cluster-for-B/onprem-dev/catalog-agent/deploy.yaml'
+          deploy-name: catalog-agent
+          deploy-namespace: onprem-dev
+          context:
+            - ecr-new
+          filters:
+            branches:
+              only:
+                - onprem
+          requires:
+            - build-for-onprem-dev
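The two new jobs rely on CircleCI's `requires` key so that the deploy runs only after the image build, and both are gated on the `onprem` branch (the same branch filter used by the existing onprem jobs above). A minimal Python sketch of that ordering, assuming a hand-copied dictionary of just these two jobs; `JOBS` and `run_order` are hypothetical names, and the sketch ignores everything else CircleCI evaluates (contexts, parameters, approval jobs):

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory copy of the two new onprem-dev jobs, keyed by job name.
# Only the fields relevant to ordering are reproduced: "requires" and the
# branch filter ("filters.branches.only" in the YAML).
JOBS = {
    "build-for-onprem-dev": {"requires": [], "branches": ["onprem"]},
    "deploy-for-onprem-dev": {
        "requires": ["build-for-onprem-dev"],
        "branches": ["onprem"],
    },
}


def run_order(jobs: dict, branch: str) -> list:
    """Return the jobs selected for `branch`, ordered so each job follows its `requires`."""
    # Keep only jobs whose branch filter matches the pushed branch.
    selected = {name: spec for name, spec in jobs.items() if branch in spec["branches"]}
    # Build a dependency graph restricted to the selected jobs.
    graph = {
        name: set(spec["requires"]) & set(selected)
        for name, spec in selected.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(run_order(JOBS, "onprem"))
print(run_order(JOBS, "main"))
```

`graphlib` is in the standard library from Python 3.9 onward. With these two jobs, a push to `onprem` yields build then deploy, and any other branch selects neither job.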
--- a/prompt/FACT_RETRIEVAL_PROMPT.md
+++ b/prompt/FACT_RETRIEVAL_PROMPT.md
@@ -83,6 +83,21 @@ Output: {{"facts" : ["Mike Smith helped with bug fix", "Contact: Mike Smith (col
 Input: Mike is coming to the meeting tomorrow.
 Output: {{"facts" : ["Mike Smith is coming to the meeting tomorrow", "Contact: Mike Smith (colleague, also referred as Mike) - DEFAULT when user says 'Mike'"]}}

+Input: 私は林檎好きです
+Output: {{"facts" : ["林檎が好き"]}}
+
+Input: コーヒー飲みたい、毎朝
+Output: {{"facts" : ["毎朝コーヒーを飲みたい"]}}
+
+Input: 昨日映画見た、すごくよかった
+Output: {{"facts" : ["昨日映画を見た", "映画がすごくよかった"]}}
+
+Input: 我喜欢吃苹果
+Output: {{"facts" : ["喜欢吃苹果"]}}
+
+Input: 나는 사과를 좋아해
+Output: {{"facts" : ["사과를 좋아함"]}}
+
 Return the facts and preferences in a json format as shown above.

 Remember the following:
@@ -93,6 +108,11 @@ Remember the following:
 - If you do not find anything relevant in the below conversation, you can return an empty list corresponding to the "facts" key.
 - Create the facts based on the user and assistant messages only. Do not pick anything from the system messages.
 - Make sure to return the response in the format mentioned in the examples. The response should be in json with a key as "facts" and corresponding value will be a list of strings.
+- **CRITICAL for Semantic Completeness**:
+  - Each extracted fact MUST preserve the complete semantic meaning. Never truncate or drop key parts of the meaning.
+  - For colloquial or grammatically informal expressions (common in spoken Japanese, Chinese, Korean, etc.), understand the full intended meaning and record it in a clear, semantically complete form.
+  - In Japanese, spoken language often omits particles (e.g., が, を, に). When extracting facts, include the necessary particles to make the meaning unambiguous. For example: "私は林檎好きです" should be understood as "林檎が好き" (likes apples), not literally "私は林檎好き".
+  - When the user expresses a preference or opinion in casual speech, record the core preference/opinion clearly. Remove the subject pronoun (私は/I) since facts are about the user by default, but keep all other semantic components intact.
 - **CRITICAL for Contact/Relationship Tracking**:
  - ALWAYS use the "Contact: [name] (relationship/context)" format when recording people
  - When you see a short name that matches a known full name, record as "Contact: [Full Name] (relationship, also referred as [Short Name])"
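The few-shot outputs in this prompt escape braces as `{{`/`}}`, which suggests (an assumption; the code that loads the template is not part of this diff) that the file is passed through Python's `str.format`. After formatting, each example output, and each model response, should be a JSON object whose `"facts"` key holds a list of strings, as the "Remember the following" rules require. A minimal validation sketch; `TEMPLATE_LINE`, `render`, and `parse_facts` are hypothetical names, not code from this repository:

```python
import json

# Hypothetical: one few-shot line copied from FACT_RETRIEVAL_PROMPT.md.
# Assumption: the template is rendered with str.format(), so "{{" and "}}"
# collapse to literal "{" and "}".
TEMPLATE_LINE = 'Output: {{"facts" : ["毎朝コーヒーを飲みたい"]}}'


def render(line: str) -> str:
    """Apply str.format with no arguments; only escaped braces are present."""
    return line.format()


def parse_facts(raw: str) -> list:
    """Parse a response and check the shape the prompt asks for:
    a JSON object whose "facts" key maps to a list of strings (possibly empty)."""
    data = json.loads(raw)
    if not isinstance(data, dict) or "facts" not in data:
        raise ValueError('response must be a JSON object with a "facts" key')
    facts = data["facts"]
    if not isinstance(facts, list) or not all(isinstance(f, str) for f in facts):
        raise ValueError('"facts" must be a list of strings')
    return facts


rendered = render(TEMPLATE_LINE)
payload = rendered.removeprefix("Output: ")  # str.removeprefix needs Python 3.9+
print(parse_facts(payload))
```

Because `json.loads` decodes the Japanese text unchanged, the particle-restored fact (`毎朝コーヒーを飲みたい`) survives the round trip exactly as written in the example.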