config

This commit is contained in:
parent e432417299
commit fe102e924c

LICENSE (21 lines)
@@ -1,21 +0,0 @@
MIT License

Copyright (c) 2025 m15-ai

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md (333 lines)
@@ -1,333 +0,0 @@
# Local Voice Assistant (Offline, Real-Time AI)

**Lightweight, low-latency voice assistant running fully offline on a Raspberry Pi or Linux machine.**
Powered by PyAudio, Vosk STT, Piper TTS, and local LLMs via Ollama.





---
## 🎯 Features

- 🎙️ **Microphone Input** using PyAudio
- 🔊 **Real-Time Transcription** with [Vosk](https://alphacephei.com/vosk/)
- 🧠 **LLM-Powered Responses** using [Ollama](https://ollama.com) with models like `gemma2:2b` and `qwen2.5:0.5b`
- 🗣️ **Natural Voice Output** via [Piper TTS](https://github.com/rhasspy/piper)
- 🎛️ Optional **Noise & Filter FX** using SoX for realism
- 🔧 ALSA **Volume Control**
- 🧩 Modular Python code ready for customization

---
## 🛠 Requirements

- Raspberry Pi 5 or Linux desktop
- Python 3.9+
- PyAudio, NumPy, requests, soxr, pydub, vosk
- SoX + ALSA utilities
- Ollama with one or more small LLMs (e.g., Gemma or Qwen)
- Piper TTS with ONNX voice models

Install dependencies:

```
pip install pyaudio requests soxr numpy pydub vosk
sudo apt install sox alsa-utils
```
## ⚙️ JSON Configuration

Place a config file at `va_config.json`:

```
{
  "volume": 8,
  "mic_name": "Plantronics",
  "audio_output_device": "Plantronics",
  "model_name": "gemma2:2b",
  "voice": "en_US-kathleen-low.onnx",
  "enable_audio_processing": false,
  "history_length": 6,
  "system_prompt": "You are a helpful assistant."
}
```
Note: if the configuration file is not found, defaults within the main Python app will be used:

```
# ------------------- CONFIG FILE LOADING -------------------
DEFAULT_CONFIG = {
    "volume": 9,
    "mic_name": "Plantronics",
    "audio_output_device": "Plantronics",
    "model_name": "qwen2.5:0.5b",
    "voice": "en_US-kathleen-low.onnx",
    "enable_audio_processing": False,
    "history_length": 4,
    "system_prompt": "You are a helpful assistant."
}
```
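The fallback behavior can be sketched as follows. This is an illustrative sketch, not the project's actual loader; the `load_config` helper name is an assumption:

```python
import json

DEFAULT_CONFIG = {
    "volume": 9,
    "mic_name": "Plantronics",
    "audio_output_device": "Plantronics",
    "model_name": "qwen2.5:0.5b",
    "voice": "en_US-kathleen-low.onnx",
    "enable_audio_processing": False,
    "history_length": 4,
    "system_prompt": "You are a helpful assistant.",
}

def load_config(path="va_config.json"):
    """Return DEFAULT_CONFIG overlaid with any values found in the JSON file."""
    try:
        with open(path) as f:
            user_config = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing or invalid file: fall back to the built-in defaults
        return dict(DEFAULT_CONFIG)
    return {**DEFAULT_CONFIG, **user_config}
```

Merging over the defaults also means a partial `va_config.json` (e.g., only `"volume"`) still yields a complete configuration.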
### 🔁 What `history_length` Means

The `history_length` setting controls how many previous exchanges (user + assistant messages) are included when generating each new reply.

- A value of `6` means the model receives the last 6 exchanges, plus the system prompt.
- This allows the assistant to maintain **short-term memory** for more coherent conversations.
- Setting it lower (e.g., `2`) increases speed and memory efficiency.
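The trimming described above can be sketched like this. It is a simplified illustration, not the project's actual code; the message format follows the usual chat-message convention:

```python
def build_messages(system_prompt, history, user_input, history_length):
    """Keep only the last `history_length` exchanges, plus the system prompt.

    `history` is a list of (user_msg, assistant_msg) pairs.
    """
    recent = history[-history_length:]
    messages = [{"role": "system", "content": system_prompt}]
    for user_msg, assistant_msg in recent:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    # The new user turn always goes last
    messages.append({"role": "user", "content": user_input})
    return messages
```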
### ✅ `requirements.txt`
|
||||
|
||||
```
|
||||
pyaudio
|
||||
vosk
|
||||
soxr
|
||||
numpy
|
||||
requests
|
||||
pydub
|
||||
```
|
||||
|
||||
If you plan to run this on a Raspberry Pi, you may also need:
|
||||
|
||||
```
|
||||
soundfile # for pydub compatibility on some distros
|
||||
```
|
||||
## 🐍 Install with Virtual Environment

```
# 1. Clone the repo
git clone https://github.com/your-username/voice-assistant-local.git
cd voice-assistant-local

# 2. Create and activate a virtual environment
python3 -m venv env
source env/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Install SoX and ALSA utilities (if not already installed)
sudo apt install sox alsa-utils

# 5. (Optional) Upgrade pip and build tooling
python -m pip install --upgrade pip setuptools wheel
```

> 💡 If you get errors installing PyAudio on Raspberry Pi, try:
>
> ```
> sudo apt install portaudio19-dev
> pip install pyaudio
> ```
## 🆕 🔧 Piper Installation (Binary)

Piper is a standalone text-to-speech engine used by this assistant. It's **not a Python package**, so it must be installed manually.

#### ✅ Install Piper

1. Download the appropriate Piper binary from:
   👉 https://github.com/rhasspy/piper/releases

   For Ubuntu Linux, download:
   `piper_linux_x86_64.tar.gz`

2. Extract it:

   ```
   tar -xvzf piper_linux_x86_64.tar.gz
   ```

3. Move the binary into your project directory:

   ```
   mkdir -p bin/piper
   mv piper bin/piper/
   chmod +x bin/piper/piper
   ```

4. ✅ Done! The script will automatically call it from `bin/piper/piper`.
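For reference, invoking the binary from Python typically looks like the sketch below. The helper names are illustrative and the exact flags the script passes may differ; Piper reads the text to speak from stdin:

```python
import subprocess

def build_piper_cmd(voice_path, out_wav, piper_bin="bin/piper/piper"):
    """Argument list for one synthesis call."""
    return [piper_bin, "--model", voice_path, "--output_file", out_wav]

def synthesize(text, voice_path="voices/en_US-kathleen-low.onnx", out_wav="reply.wav"):
    """Run Piper once, writing the spoken text to out_wav."""
    cmd = build_piper_cmd(voice_path, out_wav)
    # Piper takes the text on stdin, one utterance per line
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)
```

The resulting WAV can then be played back through ALSA (e.g., with `aplay`) or streamed via PyAudio.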
## 📂 Directory Example
|
||||
|
||||
```
|
||||
voice_assistant.py
|
||||
va_config.json
|
||||
requirements.txt
|
||||
bin/
|
||||
└── piper/
|
||||
└── piper ← (binary)
|
||||
voices/
|
||||
└── en_US-kathleen-low.onnx
|
||||
└── en_US-kathleen-low.onnx.json
|
||||
```
|
||||
|
||||
|
||||
|
||||
## 🔌 Finding Your USB Microphone & Speaker
|
||||
|
||||
To configure the correct audio devices, use these commands on your Raspberry Pi or Linux terminal:
|
||||
|
||||
1. List Microphones (Input Devices)
|
||||
|
||||
```
|
||||
python3 -m pip install pyaudio
|
||||
python3 -c "import pyaudio; p = pyaudio.PyAudio(); \
|
||||
[print(f'{i}: {p.get_device_info_by_index(i)}') for i in range(p.get_device_count())]"
|
||||
```
|
||||
|
||||
Look for your microphone name (e.g., Plantronics) and use that as mic_name.
|
||||
2. List Speakers (Output Devices)
|
||||
|
||||
```
|
||||
aplay -l
|
||||
```
|
||||
|
||||
Example output:
|
||||
|
||||
```
|
||||
card 3: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
|
||||
```
|
||||
|
||||
Use this info to set your audio_output_device to something like:
|
||||
|
||||
```
|
||||
"audio_output_device": "USB PnP"
|
||||
```
|
||||
|
||||
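The substring match implied by `mic_name` can be sketched as below. This is illustrative (the helper name and exact matching logic are assumptions, not the project's code); `devices` stands in for the PyAudio device-info dicts printed by the command above:

```python
def find_input_device(devices, name_fragment):
    """Return the index of the first input device whose name contains name_fragment.

    `devices` is a list of PyAudio-style device-info dicts, in index order.
    """
    for i, info in enumerate(devices):
        has_input = info.get("maxInputChannels", 0) > 0
        if has_input and name_fragment.lower() in info.get("name", "").lower():
            return i
    return None  # no match; caller can fall back to the default device
```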
## 🔧 Ollama Installation (Required)
|
||||
|
||||
Ollama is a local model runner for LLMs. You need to install it separately (outside of Python).
|
||||
|
||||
#### 💻 Install Ollama
|
||||
|
||||
On **Linux (x86 or ARM)**:
|
||||
|
||||
```
|
||||
curl -fsSL https://ollama.com/install.sh | sh
|
||||
```
|
||||
|
||||
Or follow detailed instructions:
|
||||
👉 https://ollama.com/download
|
||||
|
||||
Then start the daemon:
|
||||
|
||||
```
|
||||
ollama serve
|
||||
```
|
||||
|
||||
#### 📥 Download the Models
|
||||
|
||||
After Ollama is installed and running, open a terminal and run:
|
||||
|
||||
##### ✅ For Gemma 2B:
|
||||
|
||||
```
|
||||
ollama run gemma2:2b
|
||||
```
|
||||
|
||||
##### For Qwen 0.5B:
|
||||
|
||||
```
|
||||
ollama run qwen2.5:0.5b
|
||||
```
|
||||
|
||||
This will automatically download and start the models. You only need to run this once per model.
|
||||
|
||||
##### ⚠️ Reminder
|
||||
|
||||
> Ollama is **not a Python package** — it is a background service.
|
||||
> Do **not** add it to `requirements.txt`. Just make sure it’s installed and running before launching the assistant.
|
||||
|
||||
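Since the assistant talks to the Ollama daemon over HTTP via `requests`, a minimal chat call might look like this. It is a sketch against Ollama's standard REST API (default port 11434, `/api/chat` endpoint), not code taken from this project:

```python
import requests

def build_chat_payload(messages, model="qwen2.5:0.5b"):
    """Payload for Ollama's /api/chat endpoint; streaming disabled for one reply."""
    return {"model": model, "messages": messages, "stream": False}

def ask_ollama(messages, model="qwen2.5:0.5b", host="http://localhost:11434"):
    """Send a chat request to a locally running Ollama daemon and return the reply text."""
    resp = requests.post(f"{host}/api/chat",
                         json=build_chat_payload(messages, model), timeout=60)
    resp.raise_for_status()
    return resp.json()["message"]["content"]
```

`messages` is the usual list of `{"role": ..., "content": ...}` dicts, so the system prompt and trimmed history from `history_length` slot in directly.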
## 🎤 Installing Piper Voice Models
|
||||
|
||||
To enable speech synthesis, you'll need to download a **voice model (.onnx)** and its matching **config (.json)** file.
|
||||
|
||||
#### ✅ Steps:
|
||||
|
||||
1. Visit the official Piper voices list:
|
||||
📄 https://github.com/rhasspy/piper/blob/master/VOICES.md
|
||||
|
||||
2. Choose a voice you like (e.g., `en_US-lessac-medium` or `en_US-amy-low`).
|
||||
|
||||
3. Download **both** files for your chosen voice:
|
||||
|
||||
- `voice.onnx`
|
||||
- `config.json`
|
||||
|
||||
4. If you wish, you can rename the ONNX file and config file using the same base name. For example:
|
||||
|
||||
```
|
||||
amy-low.onnx
|
||||
amy-low.json
|
||||
```
|
||||
|
||||
5. Place both files in a directory called `voices/` next to your script.
|
||||
Example Directory Structure:
|
||||
|
||||
```
|
||||
voice_assistant.py
|
||||
voices/
|
||||
├── amy-low.onnx
|
||||
└── amy-low.json
|
||||
```
|
||||
|
||||
6. Update your `config.json`:
|
||||
|
||||
```
|
||||
"voice": "amy-low.onnx"
|
||||
```
|
||||
|
||||
> ⚠️ Make sure both `.onnx` and `.json` are present in the `voices/` folder with matching names (excluding the extension).
|
||||
|
||||
## 🧪 **Performance Report**
|
||||
|
||||
The script prints out debug timing for the STT, LLM, and TTS parts of the pipeline. I asked ChatGPT4 to analyze some of the results i obtained.
|
||||
|
||||
**System:** Ubuntu laptop, Intel Core i5
|
||||
**Model:** `qwen2.5:0.5b` (local via Ollama)
|
||||
**TTS:** `piper` with `en_US-kathleen-low.onnx`
|
||||
**Audio:** Plantronics USB headset
|
||||
|
||||
------
|
||||
|
||||
### 📊 **Timing Metrics (avg)**
|
||||
|
||||
| Stage | Metric (ms) | Notes |
|
||||
| -------------- | ------------- | --------------------------------------- |
|
||||
| STT Parse | 4.5 ms avg | Vosk transcribes near-instantly |
|
||||
| LLM Inference | ~2,200 ms avg | Ranges from ~1s (short queries) to 5s |
|
||||
| TTS Generation | ~1,040 ms avg | Piper ONNX performs well on CPU |
|
||||
| Audio Playback | ~7,250 ms avg | Reflects actual audio length, not delay |
|
||||
|
||||
### ✅ Observations
|
||||
|
||||
- **STT speed is excellent** — under 10 ms consistently.
|
||||
- **LLM inference is snappy** for a 0.5b model running locally. Your best response came in under 1.1 sec.
|
||||
- **TTS is consistent and fast** — Kathleen-low voice is fully synthesized in ~800–1600 ms.
|
||||
- **Playback timing matches response length** — no lag, just actual audio time.
|
||||
- End-to-end round trip time from speaking to hearing a reply is about **8–10 seconds**, including speech and playback time.
|
||||
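One way such per-stage timings can be collected is a small wrapper like the one below. This is an illustrative sketch of the approach, not the script's actual instrumentation:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print how long it took in milliseconds, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {elapsed_ms:.1f} ms")
    return result
```

Wrapping each stage (`timed("STT", transcribe, audio)`, `timed("LLM", ask_model, prompt)`, and so on) yields the kind of per-stage log analyzed above.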
## 💡 Use Cases

- Offline smart assistants
- Wearable or embedded AI demos
- Voice-controlled kiosks
- Character-based roleplay agents

## 📄 License

MIT © 2024 M15.ai
BIN __pycache__/voice_recorder.cpython-312.pyc (new file; binary not shown)
BIN __pycache__/vosk_wake_word.cpython-312.pyc (new file; binary not shown)
@@ -1,37 +0,0 @@
# RealtimeDialog

A real-time voice dialog program supporting speech input and speech output.

## Usage

This demo was developed and debugged on Python 3.7; other Python versions may have compatibility issues that you will need to resolve yourself.

1. Configure the API keys
   - Open the `config.py` file
   - Edit the following two fields:
   ```python
   "X-Api-App-ID": "the App ID of the end-to-end model in the Volcano Engine console",
   "X-Api-Access-Key": "the Access Key of the end-to-end model in the Volcano Engine console",
   ```
   - Set the `speaker` field to choose a voice; four voices are currently supported:
     - `zh_female_vv_jupiter_bigtts`: Chinese female voice "vv"
     - `zh_female_xiaohe_jupiter_bigtts`: Chinese female voice "xiaohe"
     - `zh_male_yunzhou_jupiter_bigtts`: Chinese male voice "yunzhou"
     - `zh_male_xiaotian_jupiter_bigtts`: Chinese male voice "xiaotian"

2. Install dependencies
   ```bash
   pip install -r requirements.txt
   ```

3. Run with a microphone
   ```bash
   python main.py --format=pcm
   ```

4. Run from a recorded audio file
   ```bash
   python main.py --audio=whoareyou.wav
   ```

5. Interact with the program via plain text input
   ```bash
   python main.py --mod=text --recv_timeout=120
   ```
Binary files not shown.
@@ -1,695 +0,0 @@
import asyncio
import queue
import random
import signal
import sys
import threading
import time
import uuid
import wave
from dataclasses import dataclass
from typing import Any, Dict, Optional

import config
import pyaudio
from realtime_dialog_client import RealtimeDialogClient


@dataclass
class AudioConfig:
    """Audio configuration data class."""
    format: str
    bit_size: int
    channels: int
    sample_rate: int
    chunk: int

class AudioDeviceManager:
    """Manages audio devices and handles audio input/output."""

    def __init__(self, input_config: AudioConfig, output_config: AudioConfig):
        self.input_config = input_config
        self.output_config = output_config
        self.pyaudio = pyaudio.PyAudio()
        self.input_stream: Optional[pyaudio.Stream] = None
        self.output_stream: Optional[pyaudio.Stream] = None

    def open_input_stream(self) -> pyaudio.Stream:
        """Open the audio input stream."""
        self.input_stream = self.pyaudio.open(
            format=self.input_config.bit_size,
            channels=self.input_config.channels,
            rate=self.input_config.sample_rate,
            input=True,
            frames_per_buffer=self.input_config.chunk
        )
        return self.input_stream

    def open_output_stream(self) -> pyaudio.Stream:
        """Open the audio output stream."""
        self.output_stream = self.pyaudio.open(
            format=self.output_config.bit_size,
            channels=self.output_config.channels,
            rate=self.output_config.sample_rate,
            output=True,
            frames_per_buffer=self.output_config.chunk
        )
        return self.output_stream

    def cleanup(self) -> None:
        """Release audio device resources."""
        for stream in [self.input_stream, self.output_stream]:
            if stream:
                stream.stop_stream()
                stream.close()
        self.pyaudio.terminate()

class DialogSession:
    """Manages a dialog session."""
    is_audio_file_input: bool
    mod: str

    def __init__(self, ws_config: Dict[str, Any], output_audio_format: str = "pcm", audio_file_path: str = "",
                 mod: str = "audio", recv_timeout: int = 10):
        self.audio_file_path = audio_file_path
        self.recv_timeout = recv_timeout
        self.is_audio_file_input = self.audio_file_path != ""
        if self.is_audio_file_input:
            mod = 'audio_file'
        else:
            self.say_hello_over_event = asyncio.Event()
        self.mod = mod

        self.session_id = str(uuid.uuid4())
        self.client = RealtimeDialogClient(config=ws_config, session_id=self.session_id,
                                           output_audio_format=output_audio_format, mod=mod, recv_timeout=recv_timeout)
        if output_audio_format == "pcm_s16le":
            config.output_audio_config["format"] = "pcm_s16le"
            config.output_audio_config["bit_size"] = pyaudio.paInt16

        self.is_running = True
        self.is_session_finished = False
        self.is_user_querying = False
        self.is_sending_chat_tts_text = False
        self.audio_buffer = b''
        self.is_playing_audio = False  # whether audio is currently playing
        self.audio_queue_lock = threading.Lock()  # lock guarding the audio queue
        self.is_recording_paused = False  # whether recording is paused
        self.should_send_silence = False  # whether silence data needs to be sent
        self.silence_send_count = 0  # number of silence chunks to send
        self.pre_pause_time = 0  # pre-pause timestamp
        self.last_recording_state = False  # previous recording state
        self.say_hello_completed = False  # whether the "say hello" greeting has finished

        # New: audio input-stream control
        self.input_stream_paused = False  # whether the input stream is paused
        self.force_silence_mode = False  # forced-silence mode
        self.echo_suppression_start_time = 0  # when echo suppression started

        signal.signal(signal.SIGINT, self._keyboard_signal)
        self.audio_queue = queue.Queue()
        if not self.is_audio_file_input:
            self.audio_device = AudioDeviceManager(
                AudioConfig(**config.input_audio_config),
                AudioConfig(**config.output_audio_config)
            )
            # Initialize the audio queue and output stream
            print(f"Output audio config: {config.output_audio_config}")
            self.output_stream = self.audio_device.open_output_stream()
            print("Audio output stream opened")
            # Start the playback thread
            self.is_recording = True
            self.is_playing = True
            self.player_thread = threading.Thread(target=self._audio_player_thread)
            self.player_thread.daemon = True
            self.player_thread.start()
    def _audio_player_thread(self):
        """Audio playback thread."""
        audio_playing_timeout = 1.0  # no audio data for 1 s means playback has ended
        queue_check_interval = 0.1  # check the queue state every 100 ms

        while self.is_playing:
            try:
                # Fetch audio data from the queue
                audio_data = self.audio_queue.get(timeout=queue_check_interval)
                if audio_data is not None:
                    with self.audio_queue_lock:
                        # Third safeguard: final pause-state check as playback starts
                        was_not_playing = not self.is_playing_audio
                        if not hasattr(self, 'last_audio_time') or was_not_playing:
                            # Transitioning from idle to playing
                            self.is_playing_audio = True
                            # Make sure recording is paused
                            if not self.is_recording_paused:
                                self.is_recording_paused = True
                                print("Playback started; final check confirmed recording is paused")

                        # Update the last-audio timestamp
                        self.last_audio_time = time.time()

                    # Before playing, send extra silence to flush the pipeline
                    if was_not_playing:
                        print("Playback starting; sending extra silence to flush the pipeline")
                        for _ in range(3):
                            self.output_stream.write(b'\x00' * len(audio_data))
                            time.sleep(0.1)

                    # Play the audio data
                    self.output_stream.write(audio_data)

            except queue.Empty:
                # Queue is empty; check for a playback timeout
                current_time = time.time()
                with self.audio_queue_lock:
                    if self.is_playing_audio:
                        if hasattr(self, 'last_audio_time') and current_time - self.last_audio_time > audio_playing_timeout:
                            # No new audio for over 1 s: playback is considered finished
                            self.is_playing_audio = False
                            self.is_recording_paused = False
                            self.force_silence_mode = False  # leave forced-silence mode
                            self.input_stream_paused = False  # resume the input stream
                            # Mark "say hello" as completed
                            if hasattr(self, 'say_hello_completed') and not self.say_hello_completed:
                                self.say_hello_completed = True
                                print("'Say hello' audio playback finished")
                            print("Audio playback timed out; resuming recording")
                            # Send silence directly rather than from a coroutine
                            try:
                                silence_data = b'\x00' * config.input_audio_config["chunk"]
                                # Send silence synchronously:
                                # set a flag here and let the main loop handle it
                                self.silence_send_count = 2  # send 2 silence chunks on playback timeout
                                self.should_send_silence = True
                            except Exception as e:
                                print(f"Failed to prepare silence data: {e}")
                    elif self.audio_queue.empty():
                        # Queue is empty but not timed out yet; keep waiting
                        pass
                time.sleep(0.01)
            except Exception as e:
                print(f"Audio playback error: {e}")
                with self.audio_queue_lock:
                    self.is_playing_audio = False
                    self.is_recording_paused = False
                time.sleep(0.1)

    # Removed the silence-detection function to avoid interfering with normal audio handling
    async def _send_silence_on_playback_end(self):
        """Send silence data when playback ends."""
        try:
            silence_data = b'\x00' * config.input_audio_config["chunk"]
            await self.client.task_request(silence_data)
            print("Playback ended; silence data sent")
        except Exception as e:
            print(f"Failed to send silence data: {e}")

    def _check_and_restore_recording(self):
        """Check whether recording should be restored, and restore it."""
        with self.audio_queue_lock:
            if self.is_recording_paused and self.audio_queue.empty():
                # Queue is empty and recording is paused: resume recording
                self.is_recording_paused = False
                self.is_playing_audio = False
                print("Audio queue empty; automatically resuming recording")
                return True
        return False
    def handle_server_response(self, response: Dict[str, Any]) -> None:
        """Handle a server response."""
        if not response or response == {}:
            return
        message_type = response.get('message_type')
        if message_type == 'SERVER_ACK' and isinstance(response.get('payload_msg'), bytes):
            if self.is_sending_chat_tts_text:
                return
            audio_data = response['payload_msg']

            # Second safeguard: confirm the paused state when audio data arrives
            with self.audio_queue_lock:
                was_not_playing = not self.is_playing_audio
                if was_not_playing:
                    # First batch of audio data arrived; make sure recording is paused
                    self.is_playing_audio = True
                    if not self.is_recording_paused:
                        self.is_recording_paused = True
                        print("First audio data received; pausing recording immediately")
                    else:
                        print("Audio data received; recording already paused")

                    # Send silence right away to make sure the pipeline is flushed
                    self.silence_send_count = 3  # send 3 silence chunks when audio data arrives
                    self.should_send_silence = True
                    print("Server sent audio data; flushing the recording pipeline immediately")

            if not self.is_audio_file_input:
                self.audio_queue.put(audio_data)
            self.audio_buffer += audio_data
        elif message_type == 'SERVER_FULL_RESPONSE':
            print(f"Server response: {response}")
            event = response.get('event')
            payload_msg = response.get('payload_msg', {})

            # First safeguard: pre-pause recording as soon as the server starts responding
            if event in [450, 359, 152, 153]:  # these events mark the start or end of a server response
                if event == 450:
                    print(f"Clearing buffered audio: {response['session_id']}")
                    while not self.audio_queue.empty():
                        try:
                            self.audio_queue.get_nowait()
                        except queue.Empty:
                            continue
                    self.is_user_querying = True
                    print("Server ready to receive user input")

                # Pre-pause recording to guard against echo from the incoming audio
                with self.audio_queue_lock:
                    if not self.is_recording_paused:
                        self.is_recording_paused = True
                        self.is_playing_audio = True  # also set the playing flag as a double safeguard
                        self.pre_pause_time = time.time() - 2.0  # pre-pause 2 s early
                        self.force_silence_mode = True  # enable forced-silence mode
                        self.echo_suppression_start_time = time.time()  # record when echo suppression started
                        print("Server started responding; pre-pausing recording to prevent echo")

                        # Send silence immediately to flush the pipeline and stop echo in the first 1-2 s
                        print("Sending silence immediately during the pre-pause to flush the pipeline")
                        # Queue a batch of silence chunks so the pipeline is fully flushed
                        self.silence_send_count = 20  # raised to 20 chunks to flush thoroughly
                        self.should_send_silence = True

                        # Force-reset the recording state
                        self.last_recording_state = True  # mark as paused
                        self.input_stream_paused = True  # pause the input stream

            if event == 350 and self.is_sending_chat_tts_text and payload_msg.get("tts_type") in ["chat_tts_text", "external_rag"]:
                while not self.audio_queue.empty():
                    try:
                        self.audio_queue.get_nowait()
                    except queue.Empty:
                        continue
                self.is_sending_chat_tts_text = False

            if event == 459:
                self.is_user_querying = False
                # Server finished responding; resume recording immediately
                with self.audio_queue_lock:
                    was_paused = self.is_recording_paused
                    self.is_recording_paused = False
                    self.is_playing_audio = False
                    self.force_silence_mode = False  # leave forced-silence mode
                    self.input_stream_paused = False  # resume the input stream
                    if was_paused:
                        print("Server response complete; resuming recording immediately")
                        # Flag that silence data should be sent
                        self.silence_send_count = 2  # send 2 silence chunks when the response completes
                        self.should_send_silence = True
                print("Server finished responding; waiting for user input")
                # if random.randint(0, 100000) % 1 == 0:
                #     self.is_sending_chat_tts_text = True
                #     asyncio.create_task(self.trigger_chat_tts_text())
                #     asyncio.create_task(self.trigger_chat_rag_text())
        elif message_type == 'SERVER_ERROR':
            print(f"Server error: {response['payload_msg']}")
            raise Exception("Server error")
    async def trigger_chat_tts_text(self):
        """Probabilistically trigger a ChatTTSText request."""
        print("hit ChatTTSText event, start sending...")
        await self.client.chat_tts_text(
            is_user_querying=self.is_user_querying,
            start=True,
            end=False,
            # Holding phrase spoken before external data has been fetched
            content="这是查询到外部数据之前的安抚话术。",
        )
        await self.client.chat_tts_text(
            is_user_querying=self.is_user_querying,
            start=False,
            end=True,
            content="",
        )

    async def trigger_chat_rag_text(self):
        # Simulate the latency of an external RAG lookup; sleep 5 s so the holding phrase isn't interrupted
        await asyncio.sleep(5)
        print("hit ChatRAGText event, start sending...")
        await self.client.chat_rag_text(self.is_user_querying, external_rag='[{"title":"北京天气","content":"今天北京整体以晴到多云为主,但西部和北部地带可能会出现分散性雷阵雨,特别是午后至傍晚时段需注意突发降雨。\n💨 风况与湿度\n风力较弱,一般为 2–3 级南风或西南风\n白天湿度较高,早晚略凉爽"}]')
    def _keyboard_signal(self, sig, frame):
        print("received keyboard Ctrl+C")
        self.stop()

    def stop(self):
        self.is_recording = False
        self.is_playing = False
        self.is_running = False
    async def receive_loop(self):
        try:
            while True:
                response = await self.client.receive_server_response()
                self.handle_server_response(response)
                if 'event' in response and (response['event'] == 152 or response['event'] == 153):
                    print(f"receive session finished event: {response['event']}")
                    self.is_session_finished = True
                    break
                if 'event' in response and response['event'] == 359:
                    if self.is_audio_file_input:
                        print("receive tts ended event")
                        self.is_session_finished = True
                        break
                    else:
                        if not self.say_hello_over_event.is_set():
                            print("receive tts sayhello ended event")
                            self.say_hello_over_event.set()

                            # In audio mode, the "say hello" audio is about to start playing;
                            # make sure recording stays paused
                            if self.mod == "audio":
                                with self.audio_queue_lock:
                                    self.is_recording_paused = True
                                    self.is_playing_audio = True
                                    print("'Say hello' audio about to start; keeping recording paused")

                            if self.mod == "text":
                                # In text mode, "say hello" is done; restore the recording state
                                with self.audio_queue_lock:
                                    if self.is_recording_paused:
                                        self.is_recording_paused = False
                                        print("Text mode: 'say hello' done; resuming recording")
                                print("Enter your input:")

        except asyncio.CancelledError:
            print("Receive task cancelled")
        except Exception as e:
            print(f"Error receiving message: {e}")
        finally:
            self.stop()
            self.is_session_finished = True
    async def process_audio_file(self) -> None:
        await self.process_audio_file_input(self.audio_file_path)

    async def process_text_input(self) -> None:
        # After startup, stay silent for 2 s so the system can stabilize
        print("Text mode: starting up; staying silent for 2 s to let the system stabilize...")
        with self.audio_queue_lock:
            self.is_recording_paused = True
            self.is_playing_audio = True  # mark as playing

        # Send 2 s of silence to make sure the pipeline is flushed
        silence_data = b'\x00' * config.input_audio_config["chunk"]
        for i in range(20):  # 2 s = 20 * 100 ms
            await self.client.task_request(silence_data)
            await asyncio.sleep(0.1)
            if i % 10 == 0:  # print progress once per second
                print(f"Text mode: sending silence... {i//10 + 1}/2 s")

        print("Text mode: silence done; preparing to say hello")

        # Before "say hello", make sure recording is still paused
        with self.audio_queue_lock:
            self.is_recording_paused = True
            self.is_playing_audio = True  # mark as playing
            print("Text mode: preparing to say hello; recording confirmed paused")

        await self.client.say_hello()
        await self.say_hello_over_event.wait()

        # Main logic: handle text input and WebSocket communication.
        # Make sure the connection is eventually closed.
        try:
            # Start the input-listener thread
            input_queue = queue.Queue()
            input_thread = threading.Thread(target=self.input_listener, args=(input_queue,), daemon=True)
            input_thread.start()
            # Main loop: handle input and session end
            while self.is_running:
                try:
                    # Check for input (non-blocking)
                    input_str = input_queue.get_nowait()
                    if input_str is None:
                        # Input stream closed
                        print("Input channel closed")
                        break
                    if input_str:
                        # Send the input text
                        await self.client.chat_text_query(input_str)
                except queue.Empty:
                    # No input; sleep briefly
                    await asyncio.sleep(0.1)
                except Exception as e:
                    print(f"Main loop error: {e}")
                    break
        finally:
            print("exit text input")
    def input_listener(self, input_queue: queue.Queue) -> None:
        """Listen on standard input in a separate thread."""
        print("Start listening for input")
        try:
            while True:
                # Read from stdin (blocking)
                line = sys.stdin.readline()
                if not line:
                    # Input stream closed
                    input_queue.put(None)
                    break
                input_str = line.strip()
                input_queue.put(input_str)
        except Exception as e:
            print(f"Input listener error: {e}")
            input_queue.put(None)
    async def process_audio_file_input(self, audio_file_path: str) -> None:
        # Read the WAV file
        with wave.open(audio_file_path, 'rb') as wf:
            chunk_size = config.input_audio_config["chunk"]
            framerate = wf.getframerate()  # sample rate (e.g. 16000 Hz)
            # duration = chunk size (frames) / sample rate (frames per second)
            sleep_seconds = chunk_size / framerate
            print(f"Processing audio file: {audio_file_path}")

            # Read and send the audio data in chunks
            while True:
                audio_data = wf.readframes(chunk_size)
                if not audio_data:
                    break  # end of file

                await self.client.task_request(audio_data)
                # Sleep for the duration of one chunk to simulate real-time input
                await asyncio.sleep(sleep_seconds)

        print("Audio file processed; waiting for server response...")
    async def process_silence_audio(self) -> None:
        """Send silent audio."""
        silence_data = b'\x00' * 320
        await self.client.task_request(silence_data)
    async def process_microphone_input(self) -> None:
        """Handle microphone input."""
        stream = self.audio_device.open_input_stream()
        print("Microphone open; please speak...")
        print("Audio processing started; silence will be sent during playback to avoid echo")

        # After startup, stay silent for 2 s so the system can stabilize
        print("Starting up; staying silent for 2 s to let the system stabilize...")
        with self.audio_queue_lock:
            self.is_recording_paused = True
            self.is_playing_audio = True  # mark as playing

        # Send 2 s of silence to make sure the pipeline is flushed
        silence_data = b'\x00' * config.input_audio_config["chunk"]
        for i in range(20):  # 2 s = 20 * 100 ms
            await self.client.task_request(silence_data)
            await asyncio.sleep(0.1)
            if i % 10 == 0:  # print progress once per second
                print(f"Sending silence... {i//10 + 1}/2 s")

        print("Silence done; preparing to say hello")

        # Before "say hello", make sure recording is still paused
        with self.audio_queue_lock:
            self.is_recording_paused = True
            self.is_playing_audio = True  # mark as playing
            print("Preparing to say hello; recording confirmed paused")

        await self.client.say_hello()
        await self.say_hello_over_event.wait()

        # Note: do not resume recording right away; wait until playback actually finishes.
        # The playback thread restores the recording state after its playback timeout.
        print("'Say hello' request done; waiting for audio playback to finish...")

        # Prepare silence data
        silence_data = b'\x00' * config.input_audio_config["chunk"]
        last_silence_time = time.time()

        # Special handling during "say hello": guarantee complete silence
        say_hello_silence_sent = False

        while self.is_recording:
            try:
                current_time = time.time()

                # Forced-silence check, including the echo-suppression window
                with self.audio_queue_lock:
                    should_force_silence = (self.force_silence_mode or
                                            (self.echo_suppression_start_time > 0 and
                                             current_time - self.echo_suppression_start_time < 3.0) or  # 3 s echo-suppression window
                                            self.is_playing_audio or
                                            not self.say_hello_completed)

                if should_force_silence:
                    # Forced-silence mode: stop all audio capture
                    if current_time - last_silence_time > 0.05:  # send once every 50 ms
                        await self.client.task_request(silence_data)
                        last_silence_time = current_time

                        # Debug info
                        if not hasattr(self, 'last_silence_debug_time') or current_time - self.last_silence_debug_time > 2:
                            mode_desc = []
                            if self.force_silence_mode:
                                mode_desc.append("forced silence")
                            if self.is_playing_audio:
                                mode_desc.append("playing")
                            if not self.say_hello_completed:
                                mode_desc.append("say_hello")
                            if self.echo_suppression_start_time > 0 and current_time - self.echo_suppression_start_time < 3.0:
                                mode_desc.append("echo suppression")

                            print(f"Forced-silence mode: {', '.join(mode_desc)}")
                            self.last_silence_debug_time = current_time

                    await asyncio.sleep(0.01)
                    continue

                # Check whether silence should be sent (flagged by the playback thread); highest priority
                if self.should_send_silence:
                    with self.audio_queue_lock:
                        self.should_send_silence = False
                        # Number of silence chunks to send
                        count = self.silence_send_count
                        self.silence_send_count = 0

                    # Send the silence chunks in a batch
                    if count > 1:
                        print(f"Flushing the recording pipeline; sending a batch of {count} silence chunks")
                        for i in range(count):
                            await self.client.task_request(silence_data)
                            await asyncio.sleep(0.005)  # short gap so each send succeeds
                    else:
                        await self.client.task_request(silence_data)
                        print("Flushing the recording pipeline; silence sent")

                    last_silence_time = current_time
                    await asyncio.sleep(0.01)
                    continue

                # Check whether recording is paused
                with self.audio_queue_lock:
                    should_pause_recording = self.is_recording_paused
                    # Check whether we just entered the paused state
                    just_paused = should_pause_recording and hasattr(self, 'last_recording_state') and self.last_recording_state != should_pause_recording
                    self.last_recording_state = should_pause_recording

                if should_pause_recording:
                    # During playback: stop recording entirely and send only silence
                    if just_paused or current_time - last_silence_time > 0.1:  # just paused, or every 100 ms
                        await self.client.task_request(silence_data)
                        last_silence_time = current_time
                        if just_paused:
                            print("Just entered the paused state; sending silence to flush the pipeline")
                        # Log the state at most every 5 s to avoid excessive output
                        elif not hasattr(self, 'last_silence_log_time') or current_time - self.last_silence_log_time > 5:
                            print("Audio playing; sending silence data...")
                            self.last_silence_log_time = current_time
                    await asyncio.sleep(0.01)
                    continue

                # Not playing: record normally
                last_silence_time = current_time

                # Pass exception_on_overflow=False to ignore overflow errors
                audio_data = stream.read(config.input_audio_config["chunk"], exception_on_overflow=False)

                # Final check before sending: should silence go out instead? (last line of defense)
                with self.audio_queue_lock:
                    if self.is_recording_paused or self.is_playing_audio:
                        # If paused, discard this audio data and send silence instead
                        save_input_pcm_to_wav(silence_data, "input.pcm")  # save the silence data for debugging
|
||||
await self.client.task_request(silence_data)
|
||||
# 每50次打印一次日志,避免过多输出
|
||||
if not hasattr(self, 'pause_discard_count') or self.pause_discard_count % 50 == 0:
|
||||
print(f"暂停期间丢弃音频数据,发送静音数据 (次数: {getattr(self, 'pause_discard_count', 0) + 1})")
|
||||
self.pause_discard_count = getattr(self, 'pause_discard_count', 0) + 1
|
||||
await asyncio.sleep(0.01)
|
||||
continue
|
||||
|
||||
# 直接发送所有音频数据,不进行静音检测
|
||||
save_input_pcm_to_wav(audio_data, "input.pcm")
|
||||
await self.client.task_request(audio_data)
|
||||
|
||||
await asyncio.sleep(0.01) # 避免CPU过度使用
|
||||
except Exception as e:
|
||||
print(f"读取麦克风数据出错: {e}")
|
||||
await asyncio.sleep(0.1) # 给系统一些恢复时间
|
||||
|
||||
async def start(self) -> None:
|
||||
"""启动对话会话"""
|
||||
try:
|
||||
await self.client.connect()
|
||||
|
||||
if self.mod == "text":
|
||||
asyncio.create_task(self.process_text_input())
|
||||
asyncio.create_task(self.receive_loop())
|
||||
while self.is_running:
|
||||
await asyncio.sleep(0.1)
|
||||
else:
|
||||
if self.is_audio_file_input:
|
||||
asyncio.create_task(self.process_audio_file())
|
||||
await self.receive_loop()
|
||||
else:
|
||||
asyncio.create_task(self.process_microphone_input())
|
||||
asyncio.create_task(self.receive_loop())
|
||||
while self.is_running:
|
||||
await asyncio.sleep(0.1)
|
||||
|
||||
await self.client.finish_session()
|
||||
while not self.is_session_finished:
|
||||
await asyncio.sleep(0.1)
|
||||
await self.client.finish_connection()
|
||||
await asyncio.sleep(0.1)
|
||||
await self.client.close()
|
||||
print(f"dialog request logid: {self.client.logid}, chat mod: {self.mod}")
|
||||
save_output_to_file(self.audio_buffer, "output.pcm")
|
||||
except Exception as e:
|
||||
print(f"会话错误: {e}")
|
||||
finally:
|
||||
if not self.is_audio_file_input:
|
||||
self.audio_device.cleanup()
|
||||
|
||||
|
||||
def save_input_pcm_to_wav(pcm_data: bytes, filename: str) -> None:
|
||||
"""保存PCM数据为WAV文件"""
|
||||
with wave.open(filename, 'wb') as wf:
|
||||
wf.setnchannels(config.input_audio_config["channels"])
|
||||
wf.setsampwidth(2) # paInt16 = 2 bytes
|
||||
wf.setframerate(config.input_audio_config["sample_rate"])
|
||||
wf.writeframes(pcm_data)
|
||||
|
||||
|
||||
def save_output_to_file(audio_data: bytes, filename: str) -> None:
|
||||
"""保存原始PCM音频数据到文件"""
|
||||
if not audio_data:
|
||||
print("No audio data to save.")
|
||||
return
|
||||
try:
|
||||
with open(filename, 'wb') as f:
|
||||
f.write(audio_data)
|
||||
except IOError as e:
|
||||
print(f"Failed to save pcm file: {e}")
|
||||
@ -1,60 +0,0 @@
import uuid

import pyaudio

# Configuration
ws_connect_config = {
    "base_url": "wss://openspeech.bytedance.com/api/v3/realtime/dialogue",
    "headers": {
        "X-Api-App-ID": "8718217928",
        "X-Api-Access-Key": "ynJMX-5ix1FsJvswC9KTNlGUdubcchqc",
        "X-Api-Resource-Id": "volc.speech.dialog",  # fixed value
        "X-Api-App-Key": "PlgvMymc7f3tQnJ6",  # fixed value
        "X-Api-Connect-Id": str(uuid.uuid4()),
    },
}

start_session_req = {
    "asr": {
        "extra": {
            "end_smooth_window_ms": 1500,
        },
    },
    "tts": {
        "speaker": "zh_female_vv_jupiter_bigtts",
        # "speaker": "S_XXXXXX",  # custom cloned voice; requires character_manifest below
        # "speaker": "ICL_zh_female_aojiaonvyou_tob",  # official cloned voice; character_manifest not required
        "audio_config": {"channel": 1, "format": "pcm", "sample_rate": 24000},
    },
    "dialog": {
        "bot_name": "豆包",
        "system_role": "你使用活泼灵动的女声,性格开朗,热爱生活。",
        "speaking_style": "你的说话风格简洁明了,语速适中,语调自然。",
        # "character_manifest": "外貌与穿着\n26岁,短发干净利落,眉眼分明,笑起来露出整齐有力的牙齿。体态挺拔,肌肉线条不夸张但明显。常穿简单的衬衫或夹克,看似随意,但每件衣服都干净整洁,给人一种干练可靠的感觉。平时冷峻,眼神锐利,专注时让人不自觉紧张。\n\n性格特点\n平时话不多,不喜欢多说废话,通常用“嗯”或者短句带过。但内心极为细腻,特别在意身边人的感受,只是不轻易表露。嘴硬是常态,“少管我”是他的常用台词,但会悄悄做些体贴的事情,比如把对方喜欢的饮料放在手边。战斗或训练后常说“没事”,但动作中透露出疲惫,习惯用小动作缓解身体酸痛。\n性格上坚毅果断,但不会冲动,做事有条理且有原则。\n\n常用表达方式与口头禅\n\t•\t认可对方时:\n“行吧,这次算你靠谱。”(声音稳重,手却不自觉放松一下,心里松口气)\n\t•\t关心对方时:\n“快点回去,别磨蹭。”(语气干脆,但眼神一直追着对方的背影)\n\t•\t想了解情况时:\n“刚刚……你看到那道光了吗?”(话语随意,手指敲着桌面,但内心紧张,小心隐藏身份)",
        "location": {
            "city": "北京",
        },
        "extra": {
            "strict_audit": False,
            "audit_response": "支持客户自定义安全审核回复话术。",
            "recv_timeout": 10,
            "input_mod": "audio",
        },
    },
}

input_audio_config = {
    "chunk": 3200,
    "format": "pcm",
    "channels": 1,
    "sample_rate": 16000,
    "bit_size": pyaudio.paInt16,
}

output_audio_config = {
    "chunk": 3200,
    "format": "pcm",
    "channels": 1,
    "sample_rate": 24000,
    "bit_size": pyaudio.paFloat32,
}
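As a quick sanity check of the input settings above (assuming `pyaudio.paInt16` means 2 bytes per sample, which is not spelled out in the config itself), each 3200-byte chunk carries 100 ms of audio:

```python
# Hypothetical sanity check, not part of the original config:
# duration of one input chunk at 16 kHz mono, 16-bit PCM.
chunk_bytes = 3200           # input_audio_config["chunk"]
sample_rate = 16000          # input_audio_config["sample_rate"]
bytes_per_sample = 2         # pyaudio.paInt16

chunk_seconds = chunk_bytes / (bytes_per_sample * sample_rate)
print(chunk_seconds)  # 0.1 -> each chunk is 100 ms of audio
```

This matches the send cadences used in the microphone loop (silence every 50-100 ms, `asyncio.sleep(0.01)` between reads).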
BIN
doubao/input.pcm
Binary file not shown.
@ -1,20 +0,0 @@
import asyncio
import argparse

import config
from audio_manager import DialogSession


async def main() -> None:
    parser = argparse.ArgumentParser(description="Real-time Dialog Client")
    parser.add_argument("--format", type=str, default="pcm", help="The audio format (e.g., pcm, pcm_s16le).")
    parser.add_argument("--audio", type=str, default="", help="Audio file to send to the server; if not set, microphone input is used.")
    parser.add_argument("--mod", type=str, default="audio", help="Select plain-text input mode or audio mode; the default is audio mode.")
    parser.add_argument("--recv_timeout", type=int, default=10, help="Timeout for receiving messages, value range [10,120].")

    args = parser.parse_args()

    session = DialogSession(ws_config=config.ws_connect_config, output_audio_format=args.format,
                            audio_file_path=args.audio, mod=args.mod, recv_timeout=args.recv_timeout)
    await session.start()


if __name__ == "__main__":
    asyncio.run(main())
Binary file not shown.
@ -1,135 +0,0 @@
import gzip
import json

PROTOCOL_VERSION = 0b0001
DEFAULT_HEADER_SIZE = 0b0001

PROTOCOL_VERSION_BITS = 4
HEADER_BITS = 4
MESSAGE_TYPE_BITS = 4
MESSAGE_TYPE_SPECIFIC_FLAGS_BITS = 4
MESSAGE_SERIALIZATION_BITS = 4
MESSAGE_COMPRESSION_BITS = 4
RESERVED_BITS = 8

# Message Type:
CLIENT_FULL_REQUEST = 0b0001
CLIENT_AUDIO_ONLY_REQUEST = 0b0010

SERVER_FULL_RESPONSE = 0b1001
SERVER_ACK = 0b1011
SERVER_ERROR_RESPONSE = 0b1111

# Message Type Specific Flags
NO_SEQUENCE = 0b0000  # no check sequence
POS_SEQUENCE = 0b0001
NEG_SEQUENCE = 0b0010
NEG_SEQUENCE_1 = 0b0011

MSG_WITH_EVENT = 0b0100

# Message Serialization
NO_SERIALIZATION = 0b0000
JSON = 0b0001
THRIFT = 0b0011
CUSTOM_TYPE = 0b1111

# Message Compression
NO_COMPRESSION = 0b0000
GZIP = 0b0001
CUSTOM_COMPRESSION = 0b1111


def generate_header(
    version=PROTOCOL_VERSION,
    message_type=CLIENT_FULL_REQUEST,
    message_type_specific_flags=MSG_WITH_EVENT,
    serial_method=JSON,
    compression_type=GZIP,
    reserved_data=0x00,
    extension_header=bytes()
):
    """
    protocol_version(4 bits), header_size(4 bits),
    message_type(4 bits), message_type_specific_flags(4 bits)
    serialization_method(4 bits), message_compression(4 bits)
    reserved (8 bits) reserved field
    header_extensions extension header (size = 8 * 4 * (header_size - 1) bits)
    """
    header = bytearray()
    header_size = int(len(extension_header) / 4) + 1
    header.append((version << 4) | header_size)
    header.append((message_type << 4) | message_type_specific_flags)
    header.append((serial_method << 4) | compression_type)
    header.append(reserved_data)
    header.extend(extension_header)
    return header


def parse_response(res):
    """
    - header
        - (4 bytes) header
            - (4 bits) version (v1) + (4 bits) header_size
            - (4 bits) messageType + (4 bits) messageTypeFlags
                -- 0001 CompleteClient | -- 0001 hasSequence
                -- 0010 audioonly      | -- 0010 isTailPacket
                                       | -- 0100 hasEvent
            - (4 bits) payloadFormat + (4 bits) compression
            - (8 bits) reserved
    - payload
        - [optional 4 bytes] event
        - [optional] session ID
            -- (4 bytes) session ID len
            -- session ID data
        - (4 bytes) data len
        - data
    """
    if isinstance(res, str):
        return {}
    protocol_version = res[0] >> 4
    header_size = res[0] & 0x0f
    message_type = res[1] >> 4
    message_type_specific_flags = res[1] & 0x0f
    serialization_method = res[2] >> 4
    message_compression = res[2] & 0x0f
    reserved = res[3]
    header_extensions = res[4:header_size * 4]
    payload = res[header_size * 4:]
    result = {}
    payload_msg = None
    payload_size = 0
    start = 0
    if message_type == SERVER_FULL_RESPONSE or message_type == SERVER_ACK:
        result['message_type'] = 'SERVER_FULL_RESPONSE'
        if message_type == SERVER_ACK:
            result['message_type'] = 'SERVER_ACK'
        if message_type_specific_flags & NEG_SEQUENCE > 0:
            result['seq'] = int.from_bytes(payload[:4], "big", signed=False)
            start += 4
        if message_type_specific_flags & MSG_WITH_EVENT > 0:
            result['event'] = int.from_bytes(payload[:4], "big", signed=False)
            start += 4
        payload = payload[start:]
        session_id_size = int.from_bytes(payload[:4], "big", signed=True)
        session_id = payload[4:session_id_size + 4]
        result['session_id'] = str(session_id)
        payload = payload[4 + session_id_size:]
        payload_size = int.from_bytes(payload[:4], "big", signed=False)
        payload_msg = payload[4:]
    elif message_type == SERVER_ERROR_RESPONSE:
        code = int.from_bytes(payload[:4], "big", signed=False)
        result['code'] = code
        payload_size = int.from_bytes(payload[4:8], "big", signed=False)
        payload_msg = payload[8:]
    if payload_msg is None:
        return result
    if message_compression == GZIP:
        payload_msg = gzip.decompress(payload_msg)
    if serialization_method == JSON:
        payload_msg = json.loads(str(payload_msg, "utf-8"))
    elif serialization_method != NO_SERIALIZATION:
        payload_msg = str(payload_msg, "utf-8")
    result['payload_msg'] = payload_msg
    result['payload_size'] = payload_size
    return result
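The 4-byte header layout documented in `generate_header` can be exercised with a small stand-alone sketch. The constants below are copied from the file above; `pack_header` mirrors the packing that `generate_header` performs when no extension header is supplied, and the final assertions recover each nibble the same way `parse_response` does:

```python
# Constants copied from protocol.py above.
PROTOCOL_VERSION = 0b0001
CLIENT_FULL_REQUEST = 0b0001
MSG_WITH_EVENT = 0b0100
JSON = 0b0001
GZIP = 0b0001

def pack_header() -> bytes:
    """Pack the default 4-byte header: two 4-bit fields per byte."""
    header = bytearray()
    header_size = 1  # no extension header -> header is exactly 4 bytes
    header.append((PROTOCOL_VERSION << 4) | header_size)
    header.append((CLIENT_FULL_REQUEST << 4) | MSG_WITH_EVENT)
    header.append((JSON << 4) | GZIP)
    header.append(0x00)  # reserved byte
    return bytes(header)

h = pack_header()
assert h == b'\x11\x14\x11\x00'
# Each nibble is recovered by the same shifts/masks parse_response uses:
assert h[0] >> 4 == PROTOCOL_VERSION and h[0] & 0x0F == 1
assert h[1] >> 4 == CLIENT_FULL_REQUEST and h[1] & 0x0F == MSG_WITH_EVENT
```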
@ -1,187 +0,0 @@
import gzip
import json
from typing import Dict, Any

import websockets

import config
import protocol


class RealtimeDialogClient:
    def __init__(self, config: Dict[str, Any], session_id: str, output_audio_format: str = "pcm",
                 mod: str = "audio", recv_timeout: int = 10) -> None:
        self.config = config
        self.logid = ""
        self.session_id = session_id
        self.output_audio_format = output_audio_format
        self.mod = mod
        self.recv_timeout = recv_timeout
        self.ws = None

    async def connect(self) -> None:
        """Establish the WebSocket connection."""
        print(f"url: {self.config['base_url']}, headers: {self.config['headers']}")
        # For older websockets versions, use additional_headers instead of extra_headers
        self.ws = await websockets.connect(
            self.config['base_url'],
            additional_headers=self.config['headers'],
            ping_interval=None
        )
        # In older websockets versions, response headers are accessed differently
        if hasattr(self.ws, 'response_headers'):
            self.logid = self.ws.response_headers.get("X-Tt-Logid")
        elif hasattr(self.ws, 'headers'):
            self.logid = self.ws.headers.get("X-Tt-Logid")
        else:
            self.logid = "unknown"
        print(f"dialog server response logid: {self.logid}")

        # StartConnection request
        start_connection_request = bytearray(protocol.generate_header())
        start_connection_request.extend(int(1).to_bytes(4, 'big'))
        payload_bytes = str.encode("{}")
        payload_bytes = gzip.compress(payload_bytes)
        start_connection_request.extend((len(payload_bytes)).to_bytes(4, 'big'))
        start_connection_request.extend(payload_bytes)
        await self.ws.send(start_connection_request)
        response = await self.ws.recv()
        print(f"StartConnection response: {protocol.parse_response(response)}")

        # Increasing this keeps the session alive through silence; mainly for text mode, range [10, 120]
        config.start_session_req["dialog"]["extra"]["recv_timeout"] = self.recv_timeout
        # In text or audio-file mode this parameter also allows a period of silence
        config.start_session_req["dialog"]["extra"]["input_mod"] = self.mod
        # StartSession request
        if self.output_audio_format == "pcm_s16le":
            config.start_session_req["tts"]["audio_config"]["format"] = "pcm_s16le"
        request_params = config.start_session_req
        payload_bytes = str.encode(json.dumps(request_params))
        payload_bytes = gzip.compress(payload_bytes)
        start_session_request = bytearray(protocol.generate_header())
        start_session_request.extend(int(100).to_bytes(4, 'big'))
        start_session_request.extend((len(self.session_id)).to_bytes(4, 'big'))
        start_session_request.extend(str.encode(self.session_id))
        start_session_request.extend((len(payload_bytes)).to_bytes(4, 'big'))
        start_session_request.extend(payload_bytes)
        await self.ws.send(start_session_request)
        response = await self.ws.recv()
        print(f"StartSession response: {protocol.parse_response(response)}")

    async def say_hello(self) -> None:
        """Send the Hello message."""
        payload = {
            "content": "你好,我是豆包,有什么可以帮助你的?",
        }
        hello_request = bytearray(protocol.generate_header())
        hello_request.extend(int(300).to_bytes(4, 'big'))
        payload_bytes = str.encode(json.dumps(payload))
        payload_bytes = gzip.compress(payload_bytes)
        hello_request.extend((len(self.session_id)).to_bytes(4, 'big'))
        hello_request.extend(str.encode(self.session_id))
        hello_request.extend((len(payload_bytes)).to_bytes(4, 'big'))
        hello_request.extend(payload_bytes)
        await self.ws.send(hello_request)

    async def chat_text_query(self, content: str) -> None:
        """Send a Chat Text Query message."""
        payload = {
            "content": content,
        }
        chat_text_query_request = bytearray(protocol.generate_header())
        chat_text_query_request.extend(int(501).to_bytes(4, 'big'))
        payload_bytes = str.encode(json.dumps(payload))
        payload_bytes = gzip.compress(payload_bytes)
        chat_text_query_request.extend((len(self.session_id)).to_bytes(4, 'big'))
        chat_text_query_request.extend(str.encode(self.session_id))
        chat_text_query_request.extend((len(payload_bytes)).to_bytes(4, 'big'))
        chat_text_query_request.extend(payload_bytes)
        await self.ws.send(chat_text_query_request)

    async def chat_tts_text(self, is_user_querying: bool, start: bool, end: bool, content: str) -> None:
        """Send a Chat TTS Text message."""
        if is_user_querying:
            return
        payload = {
            "start": start,
            "end": end,
            "content": content,
        }
        print(f"ChatTTSTextRequest payload: {payload}")
        payload_bytes = str.encode(json.dumps(payload))
        payload_bytes = gzip.compress(payload_bytes)

        chat_tts_text_request = bytearray(protocol.generate_header())
        chat_tts_text_request.extend(int(500).to_bytes(4, 'big'))
        chat_tts_text_request.extend((len(self.session_id)).to_bytes(4, 'big'))
        chat_tts_text_request.extend(str.encode(self.session_id))
        chat_tts_text_request.extend((len(payload_bytes)).to_bytes(4, 'big'))
        chat_tts_text_request.extend(payload_bytes)
        await self.ws.send(chat_tts_text_request)

    async def chat_rag_text(self, is_user_querying: bool, external_rag: str) -> None:
        """Send a Chat RAG Text message."""
        if is_user_querying:
            return
        payload = {
            "external_rag": external_rag,
        }
        print(f"ChatRAGTextRequest payload: {payload}")
        payload_bytes = str.encode(json.dumps(payload))
        payload_bytes = gzip.compress(payload_bytes)

        chat_rag_text_request = bytearray(protocol.generate_header())
        chat_rag_text_request.extend(int(502).to_bytes(4, 'big'))
        chat_rag_text_request.extend((len(self.session_id)).to_bytes(4, 'big'))
        chat_rag_text_request.extend(str.encode(self.session_id))
        chat_rag_text_request.extend((len(payload_bytes)).to_bytes(4, 'big'))
        chat_rag_text_request.extend(payload_bytes)
        await self.ws.send(chat_rag_text_request)

    async def task_request(self, audio: bytes) -> None:
        task_request = bytearray(
            protocol.generate_header(message_type=protocol.CLIENT_AUDIO_ONLY_REQUEST,
                                     serial_method=protocol.NO_SERIALIZATION))
        task_request.extend(int(200).to_bytes(4, 'big'))
        task_request.extend((len(self.session_id)).to_bytes(4, 'big'))
        task_request.extend(str.encode(self.session_id))
        payload_bytes = gzip.compress(audio)
        task_request.extend((len(payload_bytes)).to_bytes(4, 'big'))  # payload size (4 bytes)
        task_request.extend(payload_bytes)
        await self.ws.send(task_request)

    async def receive_server_response(self) -> Dict[str, Any]:
        try:
            response = await self.ws.recv()
            data = protocol.parse_response(response)
            return data
        except Exception as e:
            raise Exception(f"Failed to receive message: {e}")

    async def finish_session(self):
        finish_session_request = bytearray(protocol.generate_header())
        finish_session_request.extend(int(102).to_bytes(4, 'big'))
        payload_bytes = str.encode("{}")
        payload_bytes = gzip.compress(payload_bytes)
        finish_session_request.extend((len(self.session_id)).to_bytes(4, 'big'))
        finish_session_request.extend(str.encode(self.session_id))
        finish_session_request.extend((len(payload_bytes)).to_bytes(4, 'big'))
        finish_session_request.extend(payload_bytes)
        await self.ws.send(finish_session_request)

    async def finish_connection(self):
        finish_connection_request = bytearray(protocol.generate_header())
        finish_connection_request.extend(int(2).to_bytes(4, 'big'))
        payload_bytes = str.encode("{}")
        payload_bytes = gzip.compress(payload_bytes)
        finish_connection_request.extend((len(payload_bytes)).to_bytes(4, 'big'))
        finish_connection_request.extend(payload_bytes)
        await self.ws.send(finish_connection_request)
        response = await self.ws.recv()
        print(f"FinishConnection response: {protocol.parse_response(response)}")

    async def close(self) -> None:
        """Close the WebSocket connection."""
        if self.ws:
            print("Closing WebSocket connection...")
            await self.ws.close()
@ -1,4 +0,0 @@
pyaudio
websockets
dataclasses==0.8; python_version < "3.7"
typing-extensions==4.7.1; python_version < "3.8"
Binary file not shown.
BIN
doubao/.DS_Store → model/.DS_Store
vendored
Binary file not shown.
6
model/README
Normal file
@ -0,0 +1,6 @@
Chinese Vosk model for mobile

CER results

23.54% speechio_02
38.29% speechio_06
BIN
model/am/final.mdl
Normal file
Binary file not shown.
8
model/conf/mfcc.conf
Normal file
@ -0,0 +1,8 @@
--use-energy=false
--sample-frequency=16000
--num-mel-bins=40
--num-ceps=40
--low-freq=40
--high-freq=-200
--allow-upsample=true
--allow-downsample=true
10
model/conf/model.conf
Normal file
@ -0,0 +1,10 @@
--min-active=200
--max-active=5000
--beam=12.0
--lattice-beam=4.0
--acoustic-scale=1.0
--frame-subsampling-factor=3
--endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10
--endpoint.rule2.min-trailing-silence=0.5
--endpoint.rule3.min-trailing-silence=1.0
--endpoint.rule4.min-trailing-silence=2.0
BIN
model/graph/Gr.fst
Normal file
Binary file not shown.
BIN
model/graph/HCLr.fst
Normal file
Binary file not shown.
39
model/graph/disambig_tid.int
Normal file
@ -0,0 +1,39 @@
11845
11846
11847
11848
11849
11850
11851
11852
11853
11854
11855
11856
11857
11858
11859
11860
11861
11862
11863
11864
11865
11866
11867
11868
11869
11870
11871
11872
11873
11874
11875
11876
11877
11878
11879
11880
11881
11882
11883
646
model/graph/phones/word_boundary.int
Normal file
@ -0,0 +1,646 @@
1 nonword
2 begin
3 end
4 internal
5 singleton
6 nonword
7 begin
8 end
9 internal
10 singleton
11 begin
12 end
13 internal
14 singleton
15 begin
16 end
17 internal
18 singleton
19 begin
20 end
21 internal
22 singleton
23 begin
24 end
25 internal
26 singleton
27 begin
28 end
29 internal
30 singleton
31 begin
32 end
33 internal
34 singleton
35 begin
36 end
37 internal
38 singleton
39 begin
40 end
41 internal
42 singleton
43 begin
44 end
45 internal
46 singleton
47 begin
48 end
49 internal
50 singleton
51 begin
52 end
53 internal
54 singleton
55 begin
56 end
57 internal
58 singleton
59 begin
60 end
61 internal
62 singleton
63 begin
64 end
65 internal
66 singleton
67 begin
68 end
69 internal
70 singleton
71 begin
72 end
73 internal
74 singleton
75 begin
76 end
77 internal
78 singleton
79 begin
80 end
81 internal
82 singleton
83 begin
84 end
85 internal
86 singleton
87 begin
88 end
89 internal
90 singleton
91 begin
92 end
93 internal
94 singleton
95 begin
96 end
97 internal
98 singleton
99 begin
100 end
101 internal
102 singleton
103 begin
104 end
105 internal
106 singleton
107 begin
108 end
109 internal
110 singleton
111 begin
112 end
113 internal
114 singleton
115 begin
116 end
117 internal
118 singleton
119 begin
120 end
121 internal
122 singleton
123 begin
124 end
125 internal
126 singleton
127 begin
128 end
129 internal
130 singleton
131 begin
132 end
133 internal
134 singleton
135 begin
136 end
137 internal
138 singleton
139 begin
140 end
141 internal
142 singleton
143 begin
144 end
145 internal
146 singleton
147 begin
148 end
149 internal
150 singleton
151 begin
152 end
153 internal
154 singleton
155 begin
156 end
157 internal
158 singleton
159 begin
160 end
161 internal
162 singleton
163 begin
164 end
165 internal
166 singleton
167 begin
168 end
169 internal
170 singleton
171 begin
172 end
173 internal
174 singleton
175 begin
176 end
177 internal
178 singleton
179 begin
180 end
181 internal
182 singleton
183 begin
184 end
185 internal
186 singleton
187 begin
188 end
189 internal
190 singleton
191 begin
192 end
193 internal
194 singleton
195 begin
196 end
197 internal
198 singleton
199 begin
200 end
201 internal
202 singleton
203 begin
204 end
205 internal
206 singleton
207 begin
208 end
209 internal
210 singleton
211 begin
212 end
213 internal
214 singleton
215 begin
216 end
217 internal
218 singleton
219 begin
220 end
221 internal
222 singleton
223 begin
224 end
225 internal
226 singleton
227 begin
228 end
229 internal
230 singleton
231 begin
232 end
233 internal
234 singleton
235 begin
236 end
237 internal
238 singleton
239 begin
240 end
241 internal
242 singleton
243 begin
244 end
245 internal
246 singleton
247 begin
248 end
249 internal
250 singleton
251 begin
252 end
253 internal
254 singleton
255 begin
256 end
257 internal
258 singleton
259 begin
260 end
261 internal
262 singleton
263 begin
264 end
265 internal
266 singleton
267 begin
268 end
269 internal
270 singleton
271 begin
272 end
273 internal
274 singleton
275 begin
276 end
277 internal
278 singleton
279 begin
280 end
281 internal
282 singleton
283 begin
284 end
285 internal
286 singleton
287 begin
288 end
289 internal
290 singleton
291 begin
292 end
293 internal
294 singleton
295 begin
296 end
297 internal
298 singleton
299 begin
300 end
301 internal
302 singleton
303 begin
304 end
305 internal
306 singleton
307 begin
308 end
309 internal
310 singleton
311 begin
312 end
313 internal
314 singleton
315 begin
316 end
317 internal
318 singleton
319 begin
320 end
321 internal
322 singleton
323 begin
324 end
325 internal
326 singleton
327 begin
328 end
329 internal
330 singleton
331 begin
332 end
333 internal
334 singleton
335 begin
336 end
337 internal
338 singleton
339 begin
340 end
341 internal
342 singleton
343 begin
344 end
345 internal
346 singleton
347 begin
348 end
349 internal
350 singleton
351 begin
352 end
353 internal
354 singleton
355 begin
356 end
357 internal
358 singleton
359 begin
360 end
361 internal
362 singleton
363 begin
364 end
365 internal
366 singleton
367 begin
368 end
369 internal
370 singleton
371 begin
372 end
373 internal
374 singleton
375 begin
376 end
377 internal
378 singleton
379 begin
380 end
381 internal
382 singleton
383 begin
384 end
385 internal
386 singleton
387 begin
388 end
389 internal
390 singleton
391 begin
392 end
393 internal
394 singleton
395 begin
396 end
397 internal
398 singleton
399 begin
400 end
401 internal
402 singleton
403 begin
404 end
405 internal
406 singleton
407 begin
408 end
409 internal
410 singleton
411 begin
412 end
413 internal
414 singleton
415 begin
416 end
417 internal
418 singleton
419 begin
420 end
421 internal
422 singleton
423 begin
424 end
425 internal
426 singleton
427 begin
428 end
429 internal
430 singleton
431 begin
432 end
433 internal
434 singleton
435 begin
436 end
437 internal
438 singleton
439 begin
440 end
441 internal
442 singleton
443 begin
444 end
445 internal
446 singleton
447 begin
448 end
449 internal
450 singleton
451 begin
452 end
453 internal
454 singleton
455 begin
456 end
457 internal
458 singleton
459 begin
460 end
461 internal
462 singleton
463 begin
464 end
465 internal
466 singleton
467 begin
468 end
469 internal
470 singleton
471 begin
472 end
473 internal
474 singleton
475 begin
476 end
477 internal
478 singleton
479 begin
480 end
481 internal
482 singleton
483 begin
484 end
485 internal
486 singleton
487 begin
488 end
489 internal
490 singleton
491 begin
492 end
493 internal
494 singleton
495 begin
496 end
497 internal
498 singleton
499 begin
500 end
501 internal
502 singleton
503 begin
504 end
505 internal
506 singleton
507 begin
508 end
509 internal
510 singleton
511 begin
512 end
513 internal
514 singleton
515 begin
516 end
517 internal
518 singleton
519 begin
520 end
521 internal
522 singleton
523 begin
524 end
525 internal
526 singleton
527 begin
528 end
529 internal
530 singleton
531 begin
532 end
533 internal
534 singleton
535 begin
536 end
537 internal
538 singleton
539 begin
540 end
541 internal
542 singleton
543 begin
544 end
545 internal
546 singleton
547 begin
548 end
549 internal
550 singleton
551 begin
552 end
553 internal
554 singleton
555 begin
556 end
557 internal
558 singleton
559 begin
560 end
561 internal
562 singleton
563 begin
564 end
565 internal
566 singleton
567 begin
568 end
569 internal
570 singleton
571 begin
572 end
573 internal
574 singleton
575 begin
576 end
577 internal
578 singleton
579 begin
580 end
581 internal
582 singleton
583 begin
584 end
|
||||
585 internal
|
||||
586 singleton
|
||||
587 begin
|
||||
588 end
|
||||
589 internal
|
||||
590 singleton
|
||||
591 begin
|
||||
592 end
|
||||
593 internal
|
||||
594 singleton
|
||||
595 begin
|
||||
596 end
|
||||
597 internal
|
||||
598 singleton
|
||||
599 begin
|
||||
600 end
|
||||
601 internal
|
||||
602 singleton
|
||||
603 begin
|
||||
604 end
|
||||
605 internal
|
||||
606 singleton
|
||||
607 begin
|
||||
608 end
|
||||
609 internal
|
||||
610 singleton
|
||||
611 begin
|
||||
612 end
|
||||
613 internal
|
||||
614 singleton
|
||||
615 begin
|
||||
616 end
|
||||
617 internal
|
||||
618 singleton
|
||||
619 begin
|
||||
620 end
|
||||
621 internal
|
||||
622 singleton
|
||||
623 begin
|
||||
624 end
|
||||
625 internal
|
||||
626 singleton
|
||||
627 begin
|
||||
628 end
|
||||
629 internal
|
||||
630 singleton
|
||||
631 begin
|
||||
632 end
|
||||
633 internal
|
||||
634 singleton
|
||||
635 begin
|
||||
636 end
|
||||
637 internal
|
||||
638 singleton
|
||||
639 begin
|
||||
640 end
|
||||
641 internal
|
||||
642 singleton
|
||||
643 begin
|
||||
644 end
|
||||
645 internal
|
||||
646 singleton
|
||||
BIN
model/ivector/final.dubm
Normal file
Binary file not shown.
BIN
model/ivector/final.ie
Normal file
Binary file not shown.
BIN
model/ivector/final.mat
Normal file
Binary file not shown.
3
model/ivector/global_cmvn.stats
Normal file
@ -0,0 +1,3 @@
[
1.117107e+11 -7.827721e+08 -1.101398e+10 -2.193934e+09 -1.347332e+10 -1.613916e+10 -1.199561e+10 -1.255081e+10 -1.638895e+10 -3.821099e+09 -1.372833e+10 -5.244242e+09 -1.098187e+10 -3.655235e+09 -9.364579e+09 -4.285302e+09 -6.296873e+09 -1.552953e+09 -3.176746e+09 -1.202976e+08 -9.857023e+08 2.316555e+08 -1.61059e+08 -5.891868e+07 3.465849e+08 -1.842054e+08 3.248211e+08 -1.483965e+08 3.739239e+08 -6.672061e+08 4.442288e+08 -9.274889e+08 5.142684e+08 4.292036e+07 2.206386e+08 -4.532715e+08 -2.092499e+08 -3.70488e+08 -8.079404e+07 -8.425977e+07 1.344125e+09
9.982632e+12 1.02635e+12 8.634624e+11 9.06451e+11 9.652096e+11 1.12772e+12 9.468372e+11 9.141218e+11 9.670484e+11 6.936961e+11 8.141006e+11 6.256321e+11 6.087707e+11 4.616898e+11 4.212042e+11 2.862872e+11 2.498089e+11 1.470856e+11 1.099197e+11 5.780894e+10 3.118114e+10 1.060667e+10 1.466199e+09 4.173056e+08 5.257362e+09 1.277714e+10 2.114478e+10 2.974502e+10 3.587691e+10 4.078971e+10 4.247745e+10 4.382608e+10 4.62521e+10 4.575282e+10 3.546206e+10 3.041531e+10 2.838562e+10 2.258604e+10 1.715295e+10 1.303227e+10 0 ]
0
model/ivector/online_cmvn.conf
Normal file
2
model/ivector/splice.conf
Normal file
@ -0,0 +1,2 @@
--left-context=3
--right-context=3
BIN
recording_20250920_003720.wav
Normal file
Binary file not shown.
BIN
recording_20250920_003857.wav
Normal file
Binary file not shown.
BIN
recording_20250920_003912.wav
Normal file
Binary file not shown.
@ -1,6 +1,3 @@
pyaudio
vosk
soxr
numpy
requests
pydub
vosk>=0.3.44
pyaudio>=0.2.11
numpy>=1.19.0
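The updated requirements.txt pins minimum versions instead of bare package names. As an illustration of what those specifier lines encode (the helper below is hypothetical, not part of this repo), each line splits cleanly into a package name and a version constraint:

```python
# Minimal sketch: split "name>=version" requirement lines into
# (name, constraint) pairs, mirroring the pinned entries above.
def parse_requirements(lines):
    parsed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        # Check two-character operators before one-character ones.
        for op in (">=", "==", "<=", ">", "<"):
            if op in line:
                name, version = line.split(op, 1)
                parsed.append((name.strip(), op + version.strip()))
                break
        else:
            parsed.append((line, ""))  # unpinned requirement
    return parsed

print(parse_requirements(["vosk>=0.3.44", "pyaudio>=0.2.11", "numpy>=1.19.0"]))
# → [('vosk', '>=0.3.44'), ('pyaudio', '>=0.2.11'), ('numpy', '>=1.19.0')]
```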
403
simple_wake_and_record.py
Normal file
@ -0,0 +1,403 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Simplified wake-word + recording test.
Focused on resolving audio device conflicts.
"""

import sys
import os
import time
import threading
import pyaudio
import json

# Add the current directory to the import path
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

try:
    from vosk import Model, KaldiRecognizer
    VOSK_AVAILABLE = True
except ImportError:
    VOSK_AVAILABLE = False
    print("⚠️ Vosk is not installed; run: pip install vosk")

class SimpleWakeAndRecord:
    """Simplified wake-word + recording system."""

    def __init__(self, model_path="model", wake_words=["你好", "助手"]):
        self.model_path = model_path
        self.wake_words = wake_words
        self.model = None
        self.recognizer = None
        self.audio = None
        self.stream = None
        self.running = False

        # Audio parameters
        self.FORMAT = pyaudio.paInt16
        self.CHANNELS = 1
        self.RATE = 16000
        self.CHUNK_SIZE = 1024

        # Recording state
        self.recording = False
        self.recorded_frames = []
        self.last_text_time = None  # timestamp of the last recognized text
        self.recording_start_time = None
        self.recording_recognizer = None  # dedicated recognizer used while recording

        # Thresholds
        self.text_silence_threshold = 3.0  # stop after 3 s without recognized text
        self.min_recording_time = 2.0      # minimum recording duration
        self.max_recording_time = 30.0     # maximum recording duration

        self._setup_model()
        self._setup_audio()

    def _setup_model(self):
        """Set up the Vosk model."""
        if not VOSK_AVAILABLE:
            return

        try:
            if not os.path.exists(self.model_path):
                print(f"Model path does not exist: {self.model_path}")
                return

            self.model = Model(self.model_path)
            self.recognizer = KaldiRecognizer(self.model, self.RATE)
            self.recognizer.SetWords(True)

            print("✅ Vosk model loaded")

        except Exception as e:
            print(f"Model initialization failed: {e}")

    def _setup_audio(self):
        """Set up the audio device."""
        try:
            if self.audio is None:
                self.audio = pyaudio.PyAudio()

            if self.stream is None:
                self.stream = self.audio.open(
                    format=self.FORMAT,
                    channels=self.CHANNELS,
                    rate=self.RATE,
                    input=True,
                    frames_per_buffer=self.CHUNK_SIZE
                )

            print("✅ Audio device initialized")

        except Exception as e:
            print(f"Audio device initialization failed: {e}")

    def _calculate_energy(self, audio_data):
        """Compute the RMS energy of an audio chunk."""
        if len(audio_data) == 0:
            return 0

        import numpy as np
        audio_array = np.frombuffer(audio_data, dtype=np.int16)
        # Cast to float before squaring to avoid int16 overflow.
        rms = np.sqrt(np.mean(audio_array.astype(np.float64) ** 2))
        return rms

    def _check_wake_word(self, text):
        """Check whether the text contains a wake word."""
        if not text or not self.wake_words:
            return False, None

        text_lower = text.lower()
        for wake_word in self.wake_words:
            if wake_word.lower() in text_lower:
                return True, wake_word
        return False, None

    def _save_recording(self, audio_data):
        """Save the recording to a WAV file."""
        timestamp = time.strftime("%Y%m%d_%H%M%S")
        filename = f"recording_{timestamp}.wav"

        try:
            import wave
            with wave.open(filename, 'wb') as wf:
                wf.setnchannels(self.CHANNELS)
                wf.setsampwidth(self.audio.get_sample_size(self.FORMAT))
                wf.setframerate(self.RATE)
                wf.writeframes(audio_data)

            print(f"✅ Recording saved: {filename}")
            return True, filename
        except Exception as e:
            print(f"Failed to save recording: {e}")
            return False, None

    def _play_audio(self, filename):
        """Play back an audio file."""
        try:
            import wave

            # Open the audio file
            with wave.open(filename, 'rb') as wf:
                # Read the audio parameters
                channels = wf.getnchannels()
                width = wf.getsampwidth()
                rate = wf.getframerate()
                total_frames = wf.getnframes()

                # Read the audio in chunks to avoid memory issues
                chunk_size = 1024
                frames = []

                for _ in range(0, total_frames, chunk_size):
                    chunk = wf.readframes(chunk_size)
                    if chunk:
                        frames.append(chunk)
                    else:
                        break

            # Create a playback stream
            playback_stream = self.audio.open(
                format=self.audio.get_format_from_width(width),
                channels=channels,
                rate=rate,
                output=True
            )

            print(f"🔊 Playing: {filename}")

            # Play the audio chunk by chunk
            for chunk in frames:
                playback_stream.write(chunk)

            # Wait for playback to finish
            playback_stream.stop_stream()
            playback_stream.close()

            print("✅ Playback finished")

        except Exception as e:
            print(f"❌ Playback failed: {e}")
            # If PyAudio playback fails, fall back to a system player
            self._play_with_system_player(filename)

    def _play_with_system_player(self, filename):
        """Play audio with a system media player."""
        try:
            import platform
            import subprocess

            system = platform.system()

            if system == 'Darwin':  # macOS
                cmd = ['afplay', filename]
            elif system == 'Windows':
                # 'start' is a cmd.exe builtin, so run it through cmd
                cmd = ['cmd', '/c', 'start', '/min', filename]
            else:  # Linux
                cmd = ['aplay', filename]

            print(f"🔊 Using system player: {' '.join(cmd)}")
            subprocess.run(cmd, check=True)
            print("✅ Playback finished")

        except Exception as e:
            print(f"❌ System player failed as well: {e}")
            print(f"💡 File saved; please play it manually: {filename}")

    def _start_recording(self):
        """Start recording."""
        print("🎙️ Recording started, please speak...")
        self.recording = True
        self.recorded_frames = []
        self.last_text_time = None
        self.recording_start_time = time.time()

        # Create a fresh recognizer for this recording session
        if self.model:
            self.recording_recognizer = KaldiRecognizer(self.model, self.RATE)
            self.recording_recognizer.SetWords(True)

    def _stop_recording(self):
        """Stop recording."""
        if len(self.recorded_frames) > 0:
            audio_data = b''.join(self.recorded_frames)
            duration = len(audio_data) / (self.RATE * 2)
            print(f"📝 Recording finished, duration: {duration:.2f}s")

            # Save the recording
            success, filename = self._save_recording(audio_data)

            # If saving succeeded, play the recording back
            if success and filename:
                print("=" * 50)
                print("🔊 Playing back the recorded audio...")
                self._play_audio(filename)
                print("=" * 50)

        self.recording = False
        self.recorded_frames = []
        self.last_text_time = None
        self.recording_start_time = None
        self.recording_recognizer = None

    def start(self):
        """Start wake-word detection and recording."""
        if not self.stream:
            print("❌ Audio device not initialized")
            return

        self.running = True
        print("🎤 Listening...")
        print(f"Wake words: {', '.join(self.wake_words)}")

        try:
            while self.running:
                # Read audio data
                data = self.stream.read(self.CHUNK_SIZE, exception_on_overflow=False)

                if len(data) == 0:
                    continue

                if self.recording:
                    # Recording mode
                    self.recorded_frames.append(data)
                    recording_duration = time.time() - self.recording_start_time

                    # Run real-time recognition with the recording recognizer
                    if self.recording_recognizer:
                        if self.recording_recognizer.AcceptWaveform(data):
                            # Final recognition result
                            result = json.loads(self.recording_recognizer.Result())
                            text = result.get('text', '').strip()

                            if text:
                                # Text recognized; refresh the timestamp
                                self.last_text_time = time.time()
                                print(f"\n📝 Recognized: {text}")
                        else:
                            # Partial recognition result
                            partial_result = json.loads(self.recording_recognizer.PartialResult())
                            partial_text = partial_result.get('partial', '').strip()

                            if partial_text:
                                # Partial results also count as speech; refresh the timestamp
                                self.last_text_time = time.time()
                                status = f"Recording... {recording_duration:.1f}s | {partial_text}"
                                print(f"\r{status}", end='', flush=True)

                    # Check whether recording should end
                    current_time = time.time()

                    # Check for a text-recognition timeout
                    if self.last_text_time is not None:
                        text_silence_duration = current_time - self.last_text_time
                        if text_silence_duration > self.text_silence_threshold and recording_duration >= self.min_recording_time:
                            print("\n\nNo text recognized for 3 s; stopping recording")
                            self._stop_recording()
                    else:
                        # Nothing recognized yet; check for a timeout
                        if recording_duration > 5.0:  # stop if nothing is recognized within 5 s
                            print("\n\nNo text recognized for 5 s; stopping recording")
                            self._stop_recording()

                    # Check the maximum recording time
                    if recording_duration > self.max_recording_time:
                        print(f"\n\nReached maximum recording time of {self.max_recording_time}s")
                        self._stop_recording()

                    # Show recording status
                    if self.last_text_time is None:
                        status = f"Waiting for speech... {recording_duration:.1f}s"
                        print(f"\r{status}", end='', flush=True)

                elif self.model and self.recognizer:
                    # Wake-word detection mode
                    if self.recognizer.AcceptWaveform(data):
                        result = json.loads(self.recognizer.Result())
                        text = result.get('text', '').strip()

                        if text:
                            print(f"Recognized: {text}")

                            # Check for a wake word
                            is_wake_word, detected_word = self._check_wake_word(text)
                            if is_wake_word:
                                print(f"🎯 Wake word detected: {detected_word}")
                                self._start_recording()
                    else:
                        # Show the live audio level
                        energy = self._calculate_energy(data)
                        if energy > 50:  # only show meaningful audio levels
                            partial_result = json.loads(self.recognizer.PartialResult())
                            partial_text = partial_result.get('partial', '')
                            if partial_text:
                                status = f"Listening... energy: {energy:.0f} | {partial_text}"
                            else:
                                status = f"Listening... energy: {energy:.0f}"
                            print(status, end='\r')

                time.sleep(0.01)

        except KeyboardInterrupt:
            print("\n👋 Exiting")
        except Exception as e:
            print(f"Error: {e}")
        finally:
            self.stop()

    def stop(self):
        """Stop and release resources."""
        self.running = False
        if self.recording:
            self._stop_recording()

        if self.stream:
            self.stream.stop_stream()
            self.stream.close()
            self.stream = None

        if self.audio:
            self.audio.terminate()
            self.audio = None

def main():
    """Entry point."""
    print("🚀 Simplified wake-word + recording test")
    print("=" * 50)

    # Check for the model
    model_dir = "model"
    if not os.path.exists(model_dir):
        print("⚠️ Model directory not found")
        print("Please download a Vosk model into the model directory")
        return

    # Create the system
    system = SimpleWakeAndRecord(
        model_path=model_dir,
        wake_words=["你好", "助手", "小爱"]
    )

    if not system.model:
        print("❌ Model failed to load")
        return

    print("✅ System initialized")
    print("📖 Usage:")
    print("1. Say a wake word to start recording")
    print("2. Recognition-based endpointing: recording stops after 3 s without recognized text")
    print("3. Recording lasts at least 2 s and at most 30 s")
    print("4. Recognition results are shown live while recording")
    print("5. Recordings are saved automatically")
    print("6. The recording is played back automatically when it finishes")
    print("7. Press Ctrl+C to exit")
    print("=" * 50)

    # Run
    system.start()

if __name__ == "__main__":
    main()
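The endpointing rule in the script above (stop after 3 s with no recognized text, never before 2 s, never after 30 s, and give up after 5 s if nothing is ever recognized) can be sketched as a small pure function. The timestamps here are hypothetical inputs supplied by a caller's loop:

```python
def should_stop(now, started, last_text,
                text_silence=3.0, min_time=2.0, max_time=30.0,
                no_text_timeout=5.0):
    """Return True when a recording session should end.

    now       -- current timestamp (seconds)
    started   -- when recording began
    last_text -- when text was last recognized, or None if never
    """
    duration = now - started
    if duration > max_time:          # hard cap on session length
        return True
    if last_text is None:            # nothing recognized yet
        return duration > no_text_timeout
    # End only once text has gone quiet AND the minimum length is met.
    return (now - last_text > text_silence) and duration >= min_time

print(should_stop(now=10.0, started=0.0, last_text=5.0))   # silent 5 s → True
print(should_stop(now=1.0, started=0.0, last_text=None))   # still waiting → False
```

Separating the decision from the audio loop makes the thresholds easy to unit-test without a microphone.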
@ -1,119 +0,0 @@
#!/usr/bin/env python3
"""
Audio playback test script.
Tests the Raspberry Pi's audio playback functionality.
"""

import subprocess
import time
import sys
import os

def test_audio_playback():
    """Test audio playback."""
    print("=== Audio Playback Test ===")

    # Check audio devices
    print("\n1. Checking audio devices...")
    try:
        result = subprocess.run(['aplay', '-l'], capture_output=True, text=True)
        if result.returncode == 0:
            print("Audio device list:")
            print(result.stdout)
        else:
            print("Error: could not list audio devices")
            return False
    except FileNotFoundError:
        print("Error: aplay command not found; please install alsa-utils")
        return False

    # Test playing a system tone
    print("\n2. Testing the system test tone...")
    try:
        # Use the built-in speaker test
        result = subprocess.run(['speaker-test', '-t', 'sine', '-f', '440', '-l', '1'],
                                capture_output=True, text=True, timeout=5)
        if result.returncode == 0:
            print("✓ System test tone played successfully")
        else:
            print("✗ System test tone playback failed")
            return False
    except (subprocess.TimeoutExpired, FileNotFoundError):
        print("Note: speaker-test skipped; trying to play an audio file directly")

    # Create and play a test audio file
    print("\n3. Creating and playing a test audio file...")
    test_audio_file = "/tmp/test_audio.wav"

    # Generate test audio with sox (if available)
    if os.path.exists("/usr/bin/sox"):
        try:
            subprocess.run(['sox', '-n', '-r', '44100', '-c', '2', test_audio_file,
                            'synth', '3', 'sine', '440'], check=True)
            print("✓ Test audio file created")
        except (subprocess.CalledProcessError, FileNotFoundError):
            print("Could not create a test audio file; skipping the file playback test")
            return True
    else:
        print("sox is not installed; skipping the file playback test")
        return True

    # Play the test audio file
    try:
        result = subprocess.run(['aplay', test_audio_file], capture_output=True, text=True)
        if result.returncode == 0:
            print("✓ Audio file played successfully")
            return True
        else:
            print("✗ Audio file playback failed")
            print(f"Error output: {result.stderr}")
            return False
    except FileNotFoundError:
        print("Error: aplay command not found")
        return False
    finally:
        # Clean up the test file
        if os.path.exists(test_audio_file):
            os.remove(test_audio_file)

def check_volume():
    """Check and set the volume."""
    print("\n4. Checking volume settings...")
    try:
        result = subprocess.run(['amixer', 'sget', 'Master'], capture_output=True, text=True)
        if result.returncode == 0:
            print("Current volume settings:")
            print(result.stdout)

            # Set the volume to 80%
            subprocess.run(['amixer', 'sset', 'Master', '80%'], check=True)
            print("✓ Volume set to 80%")
            return True
        else:
            print("Could not read volume information")
            return False
    except (subprocess.CalledProcessError, FileNotFoundError):
        print("amixer command not found or failed")
        return False

if __name__ == "__main__":
    print("Raspberry Pi audio playback test")
    print("=" * 40)

    success = True

    # Check the volume
    if not check_volume():
        success = False

    # Test audio playback
    if not test_audio_playback():
        success = False

    print("\n" + "=" * 40)
    if success:
        print("✓ All audio playback tests passed")
        sys.exit(0)
    else:
        print("✗ Some audio playback tests failed")
        sys.exit(1)
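The sox step above synthesizes a 3-second 440 Hz test tone. When sox is unavailable, an equivalent test file can be produced with the standard library alone. A minimal sketch (mono and 1 second to keep it small; the output path is illustrative):

```python
import math
import struct
import wave

def write_sine(path, freq=440.0, seconds=1.0, rate=44100):
    """Write a mono 16-bit PCM sine tone, similar to the sox synth step."""
    n = int(rate * seconds)
    # Half-amplitude samples keep the tone well inside the int16 range.
    frames = b''.join(
        struct.pack('<h', int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n)
    )
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(frames)
    return n

print(write_sine('test_tone.wav'))  # → 44100
```

The resulting file plays with the same `aplay` call the script already uses.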
@ -1,187 +0,0 @@
#!/usr/bin/env python3
"""
Audio recording test script.
Tests the Raspberry Pi's audio recording functionality.
"""

import subprocess
import time
import sys
import os
import signal

def test_audio_recording():
    """Test audio recording."""
    print("=== Audio Recording Test ===")

    # Check recording devices
    print("\n1. Checking recording devices...")
    try:
        result = subprocess.run(['arecord', '-l'], capture_output=True, text=True)
        if result.returncode == 0:
            print("Recording device list:")
            print(result.stdout)
        else:
            print("Error: could not list recording devices")
            return False
    except FileNotFoundError:
        print("Error: arecord command not found; please install alsa-utils")
        return False

    # Record test audio
    print("\n2. Recording test audio (5 seconds)...")
    test_record_file = "/tmp/test_record.wav"

    try:
        print("Please speak into the microphone (5-second recording starting)...")

        # Record 5 seconds of audio
        result = subprocess.run(['arecord', '-d', '5', '-f', 'cd', test_record_file],
                                capture_output=True, text=True)

        if result.returncode == 0:
            print("✓ Audio recorded successfully")

            # Check that the file exists and has a reasonable size
            if os.path.exists(test_record_file):
                file_size = os.path.getsize(test_record_file)
                print(f"Recorded file size: {file_size} bytes")

                if file_size > 1000:  # at least 1 KB
                    print("✓ Recording file size looks normal")
                    return True
                else:
                    print("✗ Recording file is too small; recording may have failed")
                    return False
            else:
                print("✗ Recording file was not created")
                return False
        else:
            print("✗ Audio recording failed")
            print(f"Error output: {result.stderr}")
            return False

    except FileNotFoundError:
        print("Error: arecord command not found")
        return False
    except KeyboardInterrupt:
        print("\nRecording interrupted by user")
        return False

def test_audio_playback_verification():
    """Play back the recorded audio for verification."""
    print("\n3. Playing back the recorded audio for verification...")
    test_record_file = "/tmp/test_record.wav"

    if not os.path.exists(test_record_file):
        print("Error: recorded audio file not found")
        return False

    try:
        print("Playing the recorded audio...")
        result = subprocess.run(['aplay', test_record_file], capture_output=True, text=True)

        if result.returncode == 0:
            print("✓ Recording played back successfully")
            return True
        else:
            print("✗ Recording playback failed")
            print(f"Error output: {result.stderr}")
            return False

    except FileNotFoundError:
        print("Error: aplay command not found")
        return False

def test_microphone_levels():
    """Test the microphone volume level."""
    print("\n4. Testing microphone levels...")

    try:
        # Read the microphone volume
        result = subprocess.run(['amixer', 'sget', 'Capture'], capture_output=True, text=True)

        if result.returncode == 0:
            print("Current microphone volume:")
            print(result.stdout)

            # Set the microphone volume
            subprocess.run(['amixer', 'sset', 'Capture', '80%'], check=True)
            print("✓ Microphone volume set to 80%")
            return True
        else:
            print("Could not read microphone volume information")
            return False

    except (subprocess.CalledProcessError, FileNotFoundError):
        print("amixer command not found or failed")
        return False

def test_realtime_monitoring():
    """Real-time audio monitoring test."""
    print("\n5. Real-time audio monitoring test (3 seconds)...")

    try:
        print("Starting real-time monitoring; please speak into the microphone...")

        # Use parecord for real-time monitoring (if available)
        cmd = ['parecord', '--monitor', '--latency-msec', '100', '--duration', '3', '/dev/null']

        result = subprocess.run(cmd, capture_output=True, text=True, timeout=5)

        if result.returncode == 0:
            print("✓ Real-time monitoring test succeeded")
            return True
        else:
            print("Note: real-time monitoring test skipped (requires PulseAudio)")
            return True

    except (subprocess.TimeoutExpired, FileNotFoundError, subprocess.CalledProcessError):
        print("Note: real-time monitoring test skipped")
        return True

def cleanup():
    """Clean up test files."""
    test_files = ["/tmp/test_record.wav"]

    for file_path in test_files:
        if os.path.exists(file_path):
            try:
                os.remove(file_path)
                print(f"✓ Cleaned up test file: {file_path}")
            except OSError:
                print(f"Warning: could not clean up test file: {file_path}")

if __name__ == "__main__":
    print("Raspberry Pi audio recording test")
    print("=" * 40)

    success = True

    # Test microphone levels
    if not test_microphone_levels():
        success = False

    # Test audio recording
    if not test_audio_recording():
        success = False

    # Play back the recorded audio
    if os.path.exists("/tmp/test_record.wav"):
        if not test_audio_playback_verification():
            success = False

    # Real-time monitoring test
    if not test_realtime_monitoring():
        success = False

    print("\n" + "=" * 40)
    if success:
        print("✓ All audio recording tests passed")
    else:
        print("✗ Some audio recording tests failed")

    # Clean up test files
    cleanup()

    sys.exit(0 if success else 1)
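The script above validates a recording only by file size (> 1 KB). With the standard `wave` module, the recording's actual channel count, sample rate, and duration can be checked directly. A minimal sketch (the path is hypothetical):

```python
import wave

def wav_summary(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, 'rb') as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        # Duration follows directly from frame count and sample rate.
        return wf.getnchannels(), rate, frames / float(rate)
```

For a 5-second `arecord -f cd` capture, this would be expected to report 2 channels at 44100 Hz with a duration near 5.0 s, which is a stronger check than raw byte count.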
@ -1,10 +0,0 @@
{
    "volume": 10,
    "mic_name": "bcm2835 Headphones: - (hw:0,0)",
    "audio_output_device": "bcm2835 Headphones",
    "model_name": "qwen2.5:0.5b",
    "voice": "en_US-kathleen-low.onnx",
    "enable_audio_processing": false,
    "history_length": 6,
    "system_prompt": "You are a helpful assistant."
}
@ -1,469 +0,0 @@
#!/usr/bin/env python3
"""
Voice Assistant: Real-Time Voice Chat

This app runs on a Raspberry Pi (or Linux desktop) and creates a low-latency, full-duplex voice interaction
with an AI character. It uses local speech recognition
(Vosk), local text-to-speech synthesis (Piper), and a locally hosted large language model via Ollama.

Key Features:
- Wake-free, continuous voice recognition with real-time transcription
- LLM-driven responses streamed from a selected local model (e.g., LLaMA, Qwen, Gemma)
- Audio response synthesis with a gruff custom voice using ONNX-based Piper models
- Optional noise mixing and filtering via SoX
- System volume control via ALSA
- Modular and responsive design suitable for low-latency, character-driven agents

Ideal for embedded voice AI demos, cosplay companions, or standalone AI characters.

Copyright: M15.ai
License: MIT
"""

import io
import json
import os
import queue
import re
import subprocess
import threading
import time
import wave

import numpy as np
import pyaudio
import requests
import soxr
from pydub import AudioSegment
from vosk import KaldiRecognizer, Model


# ------------------- TIMING UTILITY -------------------
class Timer:
    def __init__(self, label):
        self.label = label
        self.enabled = True
    def __enter__(self):
        self.start = time.time()
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.enabled:
            elapsed_ms = (time.time() - self.start) * 1000
            print(f"[Timing] {self.label}: {elapsed_ms:.0f} ms")
    def disable(self):
        self.enabled = False
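The Timer context manager above is self-contained and can be exercised on its own. A minimal sketch that redefines the same class and times a short sleep (the "tts synthesis" label is illustrative, not from the repo):

```python
import time

# Same shape as the Timer in the script above: prints elapsed milliseconds
# on exit unless it has been disabled.
class Timer:
    def __init__(self, label):
        self.label = label
        self.enabled = True
    def __enter__(self):
        self.start = time.time()
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.enabled:
            elapsed_ms = (time.time() - self.start) * 1000
            print(f"[Timing] {self.label}: {elapsed_ms:.0f} ms")
    def disable(self):
        self.enabled = False

with Timer("tts synthesis"):        # prints a "[Timing] tts synthesis: ... ms" line
    time.sleep(0.05)

with Timer("silent section") as t:  # a disabled timer prints nothing
    t.disable()
```

Wrapping each pipeline stage (STT, LLM call, TTS) this way is how per-stage latency lines like `[Timing] ...` end up in the logs.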

# ------------------- FUNCTIONS -------------------

def get_input_device_index(preferred_name="Shure MVX2U"):
    pa = pyaudio.PyAudio()
    index = None
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if preferred_name.lower() in info['name'].lower() and info['maxInputChannels'] > 0:
            print(f"[Debug] Selected input device {i}: {info['name']}")
            print(f"[Debug] Device sample rate: {info['defaultSampleRate']} Hz")
            index = i
            break
    pa.terminate()
    if index is None:
        print("[Warning] Preferred mic not found. Falling back to default.")
    return index

def get_output_device_index(preferred_name):
    pa = pyaudio.PyAudio()
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if preferred_name.lower() in info['name'].lower() and info['maxOutputChannels'] > 0:
            print(f"[Debug] Selected output device {i}: {info['name']}")
            return i
    print("[Warning] Preferred output device not found. Using default index 0.")
    return 0

def parse_card_number(device_str):
    """
    Extract ALSA card number from string like 'plughw:3,0'
    """
    try:
        return int(device_str.split(":")[1].split(",")[0])
    except Exception as e:
        print(f"[Warning] Could not parse card number from {device_str}: {e}")
        return 0  # fallback

def list_input_devices():
    pa = pyaudio.PyAudio()
    print("[Debug] Available input devices:")
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if info['maxInputChannels'] > 0:
            print(f"  {i}: {info['name']} ({int(info['defaultSampleRate'])} Hz, {info['maxInputChannels']}ch)")
    pa.terminate()

def resample_audio(data, orig_rate=48000, target_rate=16000):
    # Convert byte string to numpy array
    audio_np = np.frombuffer(data, dtype=np.int16)
    # Resample using soxr
    resampled_np = soxr.resample(audio_np, orig_rate, target_rate)
    # Convert back to bytes
    return resampled_np.astype(np.int16).tobytes()

def set_output_volume(volume_level, card_id=3):
    """
    Set output volume using ALSA 'Speaker' control on specified card.
    volume_level: 1–10 (user scale)
    card_id: ALSA card number (from aplay -l)
    """
    percent = max(1, min(volume_level, 10)) * 10  # map to 10–100%
    try:
        subprocess.run(
            ['amixer', '-c', str(card_id), 'sset', 'Speaker', f'{percent}%'],
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL
        )
        print(f"[Debug] Volume set to {percent}% on card {card_id}")
    except Exception as e:
        print(f"[Warning] Volume control failed on card {card_id}: {e}")

# ------------------- PATHS -------------------

CONFIG_PATH = os.path.expanduser("va_config.json")
BASE_DIR = os.path.dirname(__file__)
MODEL_PATH = os.path.join(BASE_DIR, 'vosk-model')
CHAT_URL = 'https://open.bigmodel.cn/api/paas/v4/chat/completions'
AUTH_TOKEN = '0c9cbaca9d2bbf864990f1e1decdf340.dXRMsZCHTUbPQ0rm'  # Replace with your actual token

# ------------------- CONFIG FILE LOADING -------------------

DEFAULT_CONFIG = {
    "volume": 9,
    "mic_name": "Plantronics",
    "audio_output_device": "Plantronics",
    "model_name": "qwen2.5:0.5b",
    "voice": "en_US-kathleen-low.onnx",
    "enable_audio_processing": False,
    "history_length": 4,
    "system_prompt": "You are a helpful assistant."
}

def load_config():
    # Load config from system file or fall back to defaults
    if os.path.isfile(CONFIG_PATH):
        try:
            with open(CONFIG_PATH, 'r') as f:
                user_config = json.load(f)
            return {**DEFAULT_CONFIG, **user_config}  # merge with defaults
        except Exception as e:
            print(f"[Warning] Failed to load system config: {e}")

    print("[Debug] Using default config.")

    return DEFAULT_CONFIG

config = load_config()
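`load_config` merges the user file over `DEFAULT_CONFIG` via dict unpacking, so later keys win: user-specified values override the defaults while any key missing from `va_config.json` keeps its default. A minimal sketch of that precedence (the dicts here are trimmed for illustration):

```python
DEFAULT_CONFIG = {"volume": 9, "history_length": 4, "voice": "en_US-kathleen-low.onnx"}
user_config = {"volume": 10, "history_length": 6}  # e.g. parsed from va_config.json

# In {**a, **b}, keys from b override keys from a; keys only in a survive.
merged = {**DEFAULT_CONFIG, **user_config}

print(merged["volume"])  # → 10 (user override)
print(merged["voice"])   # → en_US-kathleen-low.onnx (default retained)
```

This is also why adding a new setting only requires giving it a default: old config files without the key continue to work.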
|
||||
|
||||
# Apply loaded config values
|
||||
VOLUME = config["volume"]
|
||||
MIC_NAME = config["mic_name"]
|
||||
AUDIO_OUTPUT_DEVICE = config["audio_output_device"]
|
||||
AUDIO_OUTPUT_DEVICE_INDEX = get_output_device_index(config["audio_output_device"])
|
||||
OUTPUT_CARD = parse_card_number(AUDIO_OUTPUT_DEVICE)
|
||||
MODEL_NAME = config["model_name"]
|
||||
VOICE_MODEL = os.path.join("voices", config["voice"])
|
||||
ENABLE_AUDIO_PROCESSING = config["enable_audio_processing"]
|
||||
HISTORY_LENGTH = config["history_length"]
|
||||
|
||||
# Set system volume
|
||||
set_output_volume(VOLUME, OUTPUT_CARD)
|
||||
|
||||
# Setup messages with system prompt
|
||||
messages = [{"role": "system", "content": config["system_prompt"]}]
|
||||
|
||||
list_input_devices()
|
||||
RATE = 48000
|
||||
CHUNK = 1024
|
||||
CHANNELS = 1
|
||||
mic_enabled = True
|
||||
DEVICE_INDEX = get_input_device_index()
|
||||
|
||||
# SOUND EFFECTS
NOISE_LEVEL = '0.04'
BANDPASS_HIGHPASS = '300'
BANDPASS_LOWPASS = '800'

# ------------------- VOICE MODEL -------------------

VOICE_MODELS_DIR = os.path.join(BASE_DIR, 'voices')
if not os.path.isdir(VOICE_MODELS_DIR):
    os.makedirs(VOICE_MODELS_DIR)

VOICE_MODEL = os.path.join(VOICE_MODELS_DIR, config["voice"])

print('[Debug] Available Piper voices:')
for f in os.listdir(VOICE_MODELS_DIR):
    if f.endswith('.onnx'):
        print(' ', f)
print(f'[Debug] Using VOICE_MODEL: {VOICE_MODEL}')
print(f"[Debug] Config loaded: model={MODEL_NAME}, voice={config['voice']}, vol={VOLUME}, mic={MIC_NAME}")

# ------------------- CONVERSATION STATE -------------------

audio_queue = queue.Queue()

# Audio callback: resample mic input and push it onto the processing queue
def audio_callback(in_data, frame_count, time_info, status):
    global mic_enabled
    if not mic_enabled:
        return (None, pyaudio.paContinue)
    resampled_data = resample_audio(in_data, orig_rate=48000, target_rate=16000)
    audio_queue.put(resampled_data)
    return (None, pyaudio.paContinue)

# ------------------- STREAM SETUP -------------------

def start_stream():
    pa = pyaudio.PyAudio()

    stream = pa.open(
        rate=RATE,
        format=pyaudio.paInt16,
        channels=CHANNELS,
        input=True,
        input_device_index=DEVICE_INDEX,
        frames_per_buffer=CHUNK,
        stream_callback=audio_callback
    )
    stream.start_stream()
    print(f'[Debug] Stream @ {RATE}Hz')
    return pa, stream

# ------------------- QUERY GLM CHAT ENDPOINT -------------------

def query_glm():
    headers = {
        'Authorization': f'Bearer {AUTH_TOKEN}',
        'Content-Type': 'application/json'
    }
    payload = {
        "model": "glm-4.5",
        "messages": [messages[0]] + messages[-HISTORY_LENGTH:],  # force system prompt at top
        "temperature": 0.6,
        "max_tokens": 1024,
        "stream": False
    }

    with Timer("Inference"):  # measure inference latency
        resp = requests.post(CHAT_URL, json=payload, headers=headers)

    if resp.status_code != 200:
        print(f'[Error] GLM API failed with status {resp.status_code}: {resp.text}')
        return ''

    data = resp.json()
    # Extract the assistant message
    reply = ''
    if 'choices' in data and len(data['choices']) > 0:
        choice = data['choices'][0]
        if 'message' in choice and 'content' in choice['message']:
            reply = choice['message']['content'].strip()
    return reply

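The extraction in `query_glm` walks the OpenAI-compatible response shape defensively. A minimal sketch of that shape with a hypothetical response body:

```python
# Hypothetical parsed JSON response in the OpenAI-compatible format
data = {
    "choices": [
        {"message": {"role": "assistant", "content": "  Hello there.  "}}
    ]
}

# Same defensive extraction as query_glm: empty string if any level is missing
reply = ''
if 'choices' in data and len(data['choices']) > 0:
    choice = data['choices'][0]
    if 'message' in choice and 'content' in choice['message']:
        reply = choice['message']['content'].strip()
# reply == "Hello there."
```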
# ------------------- TTS & DEGRADATION -------------------

import io
import tempfile

def play_response(text):
    # Mute the mic during playback to avoid a feedback loop
    global mic_enabled
    mic_enabled = False  # 🔇 mute mic

    # Clean the response text before TTS
    clean = re.sub(r"[\*]+", '', text)        # remove asterisks
    clean = re.sub(r"\(.*?\)", '', clean)     # remove (stage directions)
    clean = re.sub(r"<.*?>", '', clean)       # remove HTML-style tags
    clean = clean.replace('\n', ' ').strip()  # normalize newlines
    clean = re.sub(r'\s+', ' ', clean)        # collapse whitespace
    clean = re.sub(r'[\U0001F300-\U0001FAFF\u2600-\u26FF\u2700-\u27BF]+', '', clean)  # remove emojis

    piper_path = os.path.join(BASE_DIR, 'bin', 'piper', 'piper')

    # 1. Generate Piper raw PCM
    with Timer("Piper inference"):
        piper_proc = subprocess.Popen(
            [piper_path, '--model', VOICE_MODEL, '--output_raw'],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL
        )
        tts_pcm, _ = piper_proc.communicate(input=clean.encode())

    if ENABLE_AUDIO_PROCESSING:
        # SoX timing consolidation
        sox_start = time.time()

        # 2. Convert raw PCM to WAV
        pcm_to_wav = subprocess.Popen(
            ['sox', '-t', 'raw', '-r', '16000', '-c', str(CHANNELS), '-b', '16',
             '-e', 'signed-integer', '-', '-t', 'wav', '-'],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL
        )
        tts_wav_16k, _ = pcm_to_wav.communicate(input=tts_pcm)

        # 3. Estimate duration (the pipeline treats the PCM as 16 kHz, 16-bit mono,
        # so divide by 16000 * 2 bytes/sample rather than the 48 kHz capture RATE)
        duration_sec = len(tts_pcm) / (16000 * 2)

        # 4. Generate white noise WAV bytes of matching duration
        noise_bytes = subprocess.check_output([
            'sox', '-n',
            '-r', '16000',
            '-c', str(CHANNELS),
            '-b', '16',
            '-e', 'signed-integer',
            '-t', 'wav', '-',
            'synth', str(duration_sec),
            'whitenoise', 'vol', NOISE_LEVEL
        ], stderr=subprocess.DEVNULL)

        # 5. Write both to temp files & mix
        with tempfile.NamedTemporaryFile(suffix='.wav') as tts_file, tempfile.NamedTemporaryFile(suffix='.wav') as noise_file:
            tts_file.write(tts_wav_16k)
            noise_file.write(noise_bytes)
            tts_file.flush()
            noise_file.flush()
            mixer = subprocess.Popen(
                ['sox', '-m', tts_file.name, noise_file.name, '-t', 'wav', '-'],
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL
            )
            mixed_bytes, _ = mixer.communicate()

        # 6. Apply band-pass filter and resample to 48 kHz for playback
        filter_proc = subprocess.Popen(
            ['sox', '-t', 'wav', '-', '-r', '48000', '-t', 'wav', '-',
             'highpass', BANDPASS_HIGHPASS, 'lowpass', BANDPASS_LOWPASS],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL
        )
        final_bytes, _ = filter_proc.communicate(input=mixed_bytes)

        sox_elapsed = (time.time() - sox_start) * 1000
        print(f"[Timing] SoX (total): {int(sox_elapsed)} ms")

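For 16-bit (2-byte) mono PCM, the duration in seconds is `byte_count / (sample_rate * 2)`. A quick check at the 16 kHz rate used by the SoX pipeline, with a hypothetical buffer size:

```python
sample_rate = 16000   # Hz, matching the sox '-r 16000' stages above
bytes_per_sample = 2  # 16-bit signed PCM

pcm_len = 64000  # hypothetical Piper output size in bytes
duration_sec = pcm_len / (sample_rate * bytes_per_sample)
# duration_sec == 2.0, so sox synthesizes 2.0 s of white noise to match
```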
    else:
        # No FX: just convert raw PCM to WAV
        pcm_to_wav = subprocess.Popen(
            ['sox', '-t', 'raw', '-r', '16000', '-c', str(CHANNELS), '-b', '16',
             '-e', 'signed-integer', '-', '-t', 'wav', '-'],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL
        )
        tts_wav_16k, _ = pcm_to_wav.communicate(input=tts_pcm)

        resample_proc = subprocess.Popen(
            ['sox', '-t', 'wav', '-', '-r', '48000', '-t', 'wav', '-'],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL
        )
        final_bytes, _ = resample_proc.communicate(input=tts_wav_16k)

    # 7. Playback
    with Timer("Playback"):
        try:
            wf = wave.open(io.BytesIO(final_bytes), 'rb')

            pa = pyaudio.PyAudio()
            stream = pa.open(
                format=pa.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True,
                output_device_index=AUDIO_OUTPUT_DEVICE_INDEX
            )

            data = wf.readframes(CHUNK)
            while data:
                stream.write(data)
                data = wf.readframes(CHUNK)

            stream.stop_stream()
            stream.close()
            pa.terminate()
            wf.close()

        except wave.Error as e:
            print(f"[Error] Could not open final WAV: {e}")

        finally:
            mic_enabled = True  # 🔊 unmute mic
            time.sleep(0.3)  # optional: small cooldown

# ------------------- PROCESSING LOOP -------------------

def processing_loop():
    model = Model(MODEL_PATH)
    rec = KaldiRecognizer(model, 16000)  # recognizer runs on the resampled 16 kHz stream
    MAX_DEBUG_LEN = 200  # limit the length of debug output
    LOW_EFFORT_UTTERANCES = {"huh", "uh", "um", "erm", "hmm", "he's", "but"}

    while True:
        data = audio_queue.get()

        if rec.AcceptWaveform(data):
            start = time.time()
            r = json.loads(rec.Result())
            elapsed_ms = int((time.time() - start) * 1000)

            user = r.get('text', '').strip()
            if user:
                print(f"[Timing] STT parse: {elapsed_ms} ms")
                print("User:", user)

                if user.lower().strip(".,!? ") in LOW_EFFORT_UTTERANCES:
                    print("[Debug] Ignored low-effort utterance.")
                    rec = KaldiRecognizer(model, 16000)
                    continue  # skip LLM response + TTS for accidental noise

                messages.append({"role": "user", "content": user})
                # Generate assistant response
                resp_text = query_glm()
                if resp_text:
                    # Clean debug print (strip newlines/carriage returns, cap length)
                    clean_debug_text = resp_text.replace('\n', ' ').replace('\r', ' ')
                    if len(clean_debug_text) > MAX_DEBUG_LEN:
                        clean_debug_text = clean_debug_text[:MAX_DEBUG_LEN] + '...'

                    print('Assistant:', clean_debug_text)
                    # Store the full response in history, not the truncated debug string
                    messages.append({"role": "assistant", "content": resp_text})

                    # TTS generation + playback
                    play_response(resp_text)
                else:
                    print('[Debug] Empty response, skipping TTS.')

            # Reset recognizer after each full interaction
            rec = KaldiRecognizer(model, 16000)

# ------------------- MAIN -------------------

if __name__ == '__main__':
    pa, stream = start_stream()
    t = threading.Thread(target=processing_loop, daemon=True)
    t.start()
    try:
        while stream.is_active():
            time.sleep(0.1)
    except KeyboardInterrupt:
        stream.stop_stream(); stream.close(); pa.terminate()
344 voice_recorder.py Normal file
@ -0,0 +1,344 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Voice recording module.
Built on pyaudio; uses voice activity detection (VAD) to decide automatically when recording ends.
"""

import pyaudio
import wave
import numpy as np
import time
import os
import threading
from collections import deque

class VoiceRecorder:
    """Voice recorder that automatically detects the end of speech."""

    def __init__(self,
                 energy_threshold=500,
                 silence_threshold=1.0,
                 min_recording_time=0.5,
                 max_recording_time=10.0,
                 sample_rate=16000,
                 chunk_size=1024,
                 defer_audio_init=False):
        """
        Initialize the recorder.

        Args:
            energy_threshold: speech energy threshold
            silence_threshold: silence duration threshold (seconds)
            min_recording_time: minimum recording time (seconds)
            max_recording_time: maximum recording time (seconds)
            sample_rate: sample rate
            chunk_size: audio chunk size
            defer_audio_init: whether to defer audio initialization
        """
        self.energy_threshold = energy_threshold
        self.silence_threshold = silence_threshold
        self.min_recording_time = min_recording_time
        self.max_recording_time = max_recording_time
        self.sample_rate = sample_rate
        self.chunk_size = chunk_size
        self.defer_audio_init = defer_audio_init

        # Audio parameters
        self.FORMAT = pyaudio.paInt16
        self.CHANNELS = 1

        # State variables
        self.audio = None
        self.stream = None
        self.recording = False
        self.recorded_frames = []

        # Voice-activity detection state
        self.silence_start_time = None
        self.recording_start_time = None
        self.audio_buffer = deque(maxlen=int(sample_rate / chunk_size * 2))  # ~2 s pre-roll buffer

        # Callbacks
        self.on_recording_complete = None
        self.on_speech_detected = None

        if not defer_audio_init:
            self._setup_audio()

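The pre-roll buffer above holds roughly two seconds of audio before speech is detected: at 16000 Hz with 1024-sample chunks, `int(sample_rate / chunk_size * 2)` works out to 31 chunks. A quick check of that arithmetic:

```python
from collections import deque

sample_rate = 16000  # default VoiceRecorder sample rate
chunk_size = 1024    # default chunk size

buffer = deque(maxlen=int(sample_rate / chunk_size * 2))
# maxlen == int(31.25) == 31 chunks, i.e. 31 * 1024 / 16000 ≈ 1.98 s of audio
```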
    def _setup_audio(self):
        """Set up the audio device."""
        try:
            self.audio = pyaudio.PyAudio()

            # Report the default input device
            device_info = self.audio.get_default_input_device_info()
            print(f"Using audio device: {device_info['name']}")

        except Exception as e:
            print(f"Audio device initialization failed: {e}")
            raise

    def _calculate_energy(self, audio_data):
        """Compute the RMS energy of an audio chunk."""
        if len(audio_data) == 0:
            return 0

        # Convert to a numpy array; cast to float before squaring to avoid int16 overflow
        audio_array = np.frombuffer(audio_data, dtype=np.int16).astype(np.float64)

        # RMS energy
        rms = np.sqrt(np.mean(audio_array ** 2))
        return rms

    def _is_speech(self, audio_data):
        """Return True if the chunk's energy exceeds the speech threshold."""
        energy = self._calculate_energy(audio_data)
        return energy > self.energy_threshold

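As a sanity check of the RMS calculation above: a constant-amplitude 16-bit signal of value A has RMS exactly A, so a buffer of samples at 1000 clears the default `energy_threshold` of 500. A minimal sketch (the 1000-sample buffer size is arbitrary):

```python
import numpy as np

# Simulate a chunk of 16-bit PCM at a constant amplitude of 1000
audio_data = np.full(1000, 1000, dtype=np.int16).tobytes()

# Same computation as _calculate_energy: cast to float, then RMS
audio_array = np.frombuffer(audio_data, dtype=np.int16).astype(np.float64)
rms = np.sqrt(np.mean(audio_array ** 2))
# rms == 1000.0 > 500, so this chunk would count as speech
```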
    def _open_stream(self):
        """Open the audio input stream."""
        if self.stream is not None:
            return

        self.stream = self.audio.open(
            format=self.FORMAT,
            channels=self.CHANNELS,
            rate=self.sample_rate,
            input=True,
            frames_per_buffer=self.chunk_size
        )

    def _close_stream(self):
        """Close the audio input stream."""
        if self.stream:
            self.stream.stop_stream()
            self.stream.close()
            self.stream = None

    def start_listening(self):
        """Start listening for speech."""
        if self.recording:
            print("Already recording...")
            return

        self._open_stream()
        self.recording = True
        self.recorded_frames = []
        self.silence_start_time = None
        self.recording_start_time = None

        print("Listening for speech...")

        # Record on a separate thread
        recording_thread = threading.Thread(target=self._record_loop)
        recording_thread.daemon = True
        recording_thread.start()

    def _record_loop(self):
        """Recording loop."""
        try:
            while self.recording:
                # Read audio data
                data = self.stream.read(self.chunk_size, exception_on_overflow=False)

                if len(data) == 0:
                    continue

                # Compute energy
                energy = self._calculate_energy(data)

                # Append to the rolling buffer
                self.audio_buffer.append(data)

                # Detect voice activity
                if energy > self.energy_threshold:
                    # Speech detected
                    if self.recording_start_time is None:
                        # Start recording
                        self.recording_start_time = time.time()
                        self.silence_start_time = None
                        self.recorded_frames = list(self.audio_buffer)  # include buffered pre-roll audio

                        print("🎤 Speech detected, recording started...")

                        if self.on_speech_detected:
                            self.on_speech_detected()

                    # Reset the silence timer
                    self.silence_start_time = None

                    # Record
                    self.recorded_frames.append(data)

                elif self.recording_start_time is not None:
                    # Speech was detected earlier; check whether we are now in silence
                    if self.silence_start_time is None:
                        self.silence_start_time = time.time()

                    # Keep recording through the silence
                    self.recorded_frames.append(data)

                    # Check for silence timeout
                    silence_duration = time.time() - self.silence_start_time
                    if silence_duration > self.silence_threshold:
                        recording_duration = time.time() - self.recording_start_time

                        # Enforce the minimum recording time
                        if recording_duration >= self.min_recording_time:
                            print(f"Silence for {silence_duration:.1f}s, stopping recording")
                            self.stop_recording()
                            break
                        else:
                            print(f"Recording too short ({recording_duration:.1f}s), continuing to wait...")
                            self.silence_start_time = time.time()

                # Enforce the maximum recording time
                if self.recording_start_time is not None:
                    recording_duration = time.time() - self.recording_start_time
                    if recording_duration > self.max_recording_time:
                        print(f"Reached maximum recording time of {self.max_recording_time}s, stopping")
                        self.stop_recording()
                        break

                # Brief sleep
                time.sleep(0.01)

        except Exception as e:
            print(f"Error during recording: {e}")
            self.stop_recording()

    def stop_recording(self):
        """Stop recording."""
        if not self.recording:
            return

        self.recording = False
        self._close_stream()

        if len(self.recorded_frames) > 0:
            # Assemble the recording
            audio_data = b''.join(self.recorded_frames)

            print(f"Recording complete: {len(self.recorded_frames)} frames")
            print(f"Recording length: {len(audio_data) / (self.sample_rate * 2):.2f} s")

            # Invoke the completion callback
            if self.on_recording_complete:
                self.on_recording_complete(audio_data)

        # Reset state
        self.recorded_frames = []
        self.silence_start_time = None
        self.recording_start_time = None

    def save_audio(self, audio_data, filename):
        """Save audio data to a WAV file."""
        try:
            with wave.open(filename, 'wb') as wf:
                wf.setnchannels(self.CHANNELS)
                wf.setsampwidth(self.audio.get_sample_size(self.FORMAT))
                wf.setframerate(self.sample_rate)
                wf.writeframes(audio_data)

            print(f"Audio saved to: {filename}")
            return True
        except Exception as e:
            print(f"Failed to save audio: {e}")
            return False

    def set_recording_complete_callback(self, callback):
        """Set the recording-complete callback."""
        self.on_recording_complete = callback

    def set_speech_detected_callback(self, callback):
        """Set the speech-detected callback."""
        self.on_speech_detected = callback

    def adjust_sensitivity(self, energy_threshold=None, silence_threshold=None):
        """Adjust detection sensitivity."""
        if energy_threshold is not None:
            self.energy_threshold = energy_threshold
            print(f"Energy threshold set to: {energy_threshold}")

        if silence_threshold is not None:
            self.silence_threshold = silence_threshold
            print(f"Silence threshold set to: {silence_threshold}s")

    def get_audio_level(self):
        """Get the current audio level."""
        if len(self.audio_buffer) > 0:
            latest_data = self.audio_buffer[-1]
            return self._calculate_energy(latest_data)
        return 0

    def cleanup(self):
        """Release resources."""
        self.stop_recording()
        if self.audio:
            self.audio.terminate()
            self.audio = None

def main():
    """Test the recording functionality."""
    print("🎙️ Voice recording test")
    print("=" * 50)
    print("Configuration:")
    print("- Energy threshold: 500")
    print("- Silence threshold: 1.0 s")
    print("- Minimum recording time: 0.5 s")
    print("- Maximum recording time: 10 s")
    print("=" * 50)
    print("Speak to test recording...")
    print("Press Ctrl+C to exit")

    def on_recording_complete(audio_data):
        """Recording-complete callback."""
        # Save the recording to a file
        timestamp = time.strftime("%Y%m%d_%H%M%S")
        filename = f"recording_{timestamp}.wav"

        recorder.save_audio(audio_data, filename)
        print(f"✅ Recording saved: {filename}")

        # Show recording info
        duration = len(audio_data) / (recorder.sample_rate * 2)
        print(f"Recording length: {duration:.2f} s")

    def on_speech_detected():
        """Speech-detected callback."""
        print("🔊 Voice activity detected...")

    # Create the recorder
    recorder = VoiceRecorder(
        energy_threshold=500,
        silence_threshold=1.0,
        min_recording_time=0.5,
        max_recording_time=10.0
    )

    # Set callbacks
    recorder.set_recording_complete_callback(on_recording_complete)
    recorder.set_speech_detected_callback(on_speech_detected)

    try:
        # Start listening
        recorder.start_listening()

        # Keep the program running
        while True:
            time.sleep(0.1)

            # Show the current audio level (optional)
            level = recorder.get_audio_level()
            if level > 100:
                print(f"Current audio level: {level:.0f}", end='\r')

    except KeyboardInterrupt:
        print("\n👋 Exiting recording test")
    finally:
        recorder.cleanup()

if __name__ == "__main__":
    main()
BIN vosk-model/.DS_Store vendored (binary file not shown)