fix: improve memory extraction for colloquial/informal speech (#16)

* chore: add .worktrees/ to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(CI): add build and deploy configuration for the onprem-dev environment

Add build-and-push and deploy jobs for the onprem-dev environment to the CircleCI config, deploying to the onprem-dev namespace of cluster-for-B

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: improve memory extraction for colloquial/informal speech

Add semantic completeness rules and multilingual few-shot examples
to FACT_RETRIEVAL_PROMPT to prevent truncated or semantically incorrect
memory extraction. Specifically addresses Japanese casual speech where
particles (が, を, に) are often omitted.
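
A minimal sketch of how the new few-shot pairs could be used as a regression check. It assumes the prompt is rendered with `str.format` or an f-string (suggested by the doubled braces in the diff, not confirmed by it). The names `FEW_SHOT_CASES`, `render_example`, `parse_facts`, and `mismatched_cases` are hypothetical and not part of this repository.

```python
import json

# Few-shot pairs copied from the examples added in this commit.
FEW_SHOT_CASES = {
    "私は林檎好きです": ["林檎が好き"],
    "コーヒー飲みたい、毎朝": ["毎朝コーヒーを飲みたい"],
    "昨日映画見た、すごくよかった": ["昨日映画を見た", "映画がすごくよかった"],
}

# Assumption: literal braces are doubled because the prompt goes through
# str.format (or an f-string), where "{{" renders as "{".
EXAMPLE_TEMPLATE = 'Input: {text}\nOutput: {{"facts" : {facts}}}'


def render_example(text: str, facts: list[str]) -> str:
    """Render one few-shot example; doubled braces become single braces."""
    return EXAMPLE_TEMPLATE.format(
        text=text, facts=json.dumps(facts, ensure_ascii=False)
    )


def parse_facts(response: str) -> list[str]:
    """Parse a response shaped like {"facts": [...]} into a list of strings."""
    data = json.loads(response)
    facts = data.get("facts", [])
    if not isinstance(facts, list) or not all(isinstance(f, str) for f in facts):
        raise ValueError("'facts' must be a list of strings")
    return facts


def mismatched_cases(responses: dict[str, str]) -> list[str]:
    """Return the inputs whose parsed facts differ from the expected output."""
    return [
        text
        for text, expected in FEW_SHOT_CASES.items()
        if parse_facts(responses.get(text, '{"facts": []}')) != expected
    ]
```

Feeding `mismatched_cases` a mapping from input sentence to raw model response returns the inputs whose extraction is still truncated or literal, for example a response that keeps "私は林檎好き" instead of "林檎が好き".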

Closes sparticleinc/mygpt-frontend#2125

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: zhuchao <zhuchaowe@163.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: shuirong <shuirong1997@icloud.com>
autobee-sparticle 2026-03-17 10:37:49 +09:00 committed by GitHub
parent f24c3ff78f
commit deb78a7625
2 changed files with 47 additions and 0 deletions


@@ -193,3 +193,30 @@ workflows:
            branches:
              only:
                - onprem
      # Deploy for the onprem-dev environment
      - build-and-push:
          name: build-for-onprem-dev
          context:
            - ecr-new
          path: .
          dockerfile: Dockerfile
          repo: catalog-agent
          docker-tag: ''
          filters:
            branches:
              only:
                - onprem
      - deploy:
          name: deploy-for-onprem-dev
          docker-tag: ''
          path: '/home/ubuntu/cluster-for-B/onprem-dev/catalog-agent/deploy.yaml'
          deploy-name: catalog-agent
          deploy-namespace: onprem-dev
          context:
            - ecr-new
          filters:
            branches:
              only:
                - onprem
          requires:
            - build-for-onprem-dev


@ -83,6 +83,21 @@ Output: {{"facts" : ["Mike Smith helped with bug fix", "Contact: Mike Smith (col
Input: Mike is coming to the meeting tomorrow.
Output: {{"facts" : ["Mike Smith is coming to the meeting tomorrow", "Contact: Mike Smith (colleague, also referred as Mike) - DEFAULT when user says 'Mike'"]}}
Input: 私は林檎好きです
Output: {{"facts" : ["林檎が好き"]}}
Input: コーヒー飲みたい、毎朝
Output: {{"facts" : ["毎朝コーヒーを飲みたい"]}}
Input: 昨日映画見た、すごくよかった
Output: {{"facts" : ["昨日映画を見た", "映画がすごくよかった"]}}
Input: 我喜欢吃苹果
Output: {{"facts" : ["喜欢吃苹果"]}}
Input: 나는 사과를 좋아해
Output: {{"facts" : ["사과를 좋아함"]}}
Return the facts and preferences in a json format as shown above.
Remember the following:
@@ -93,6 +108,11 @@ Remember the following:
- If you do not find anything relevant in the below conversation, you can return an empty list corresponding to the "facts" key.
- Create the facts based on the user and assistant messages only. Do not pick anything from the system messages.
- Make sure to return the response in the format mentioned in the examples. The response should be in json with a key as "facts" and corresponding value will be a list of strings.
- **CRITICAL for Semantic Completeness**:
  - Each extracted fact MUST preserve the complete semantic meaning. Never truncate or drop key parts of the meaning.
  - For colloquial or grammatically informal expressions (common in spoken Japanese, Chinese, Korean, etc.), understand the full intended meaning and record it in a clear, semantically complete form.
  - In Japanese, spoken language often omits particles (e.g., が, を, に). When extracting facts, include the necessary particles to make the meaning unambiguous. For example: "私は林檎好きです" should be understood as "林檎が好き" (likes apples), not literally "私は林檎好き".
  - When the user expresses a preference or opinion in casual speech, record the core preference/opinion clearly. Remove the subject pronoun (私は/I) since facts are about the user by default, but keep all other semantic components intact.
- **CRITICAL for Contact/Relationship Tracking**:
  - ALWAYS use the "Contact: [name] (relationship/context)" format when recording people
  - When you see a short name that matches a known full name, record as "Contact: [Full Name] (relationship, also referred as [Short Name])"