Echo pending
parent 0ab8e49ba5
commit aed69e9c54
190
README_multiprocess.md
Normal file
@ -0,0 +1,190 @
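# Multi-Process Audio Recording System

An audio pipeline built on process isolation: recording and playback run in separate processes, so switching between them does not require resetting the audio device.

## 🚀 Features

### Key advantages

- **Multi-process architecture**: the input and output processes are fully isolated, so no device reset is needed
- **No switching delay**: removes the record/playback switching delay of the original single-process design
- **Real-time response**: recording and playback are handled in parallel
- **Voice detection**: voice activity detection based on the zero-crossing rate (ZCR)
- **Streaming TTS**: audio is played as it is generated, which shortens the wait
- **Role play**: supports multiple AI characters and voices

### Architecture

```
Main control process ──┐
                       ├─ Input process (recording + voice detection)
                       ├─ Output process (audio playback)
                       └─ Online AI services (STT + LLM + TTS)
```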
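
The layout above can be sketched with `multiprocessing`. This is a minimal illustration under stated assumptions, not the project's code: `input_worker`, `output_worker`, and `start_workers` are hypothetical names, the workers only echo what they receive instead of touching audio hardware, and the `played` queue exists only so that the sketch has something observable.

```python
import multiprocessing as mp


def input_worker(command_queue, event_queue):
    """Stand-in for the input process: wait for commands, report events."""
    while True:
        command = command_queue.get()       # blocks until the controller sends something
        if command == "shutdown":
            break
        event_queue.put(("ack", command))   # a real worker would send recorded audio


def output_worker(audio_queue, played):
    """Stand-in for the output process: consume chunks until the None sentinel."""
    while True:
        chunk = audio_queue.get()
        if chunk is None:                   # None marks the end of playback
            break
        played.put(len(chunk))              # a real worker would write to the sound card


def start_workers():
    """Create the queues and start both workers as separate processes."""
    command_queue, event_queue = mp.Queue(), mp.Queue()
    audio_queue, played = mp.Queue(), mp.Queue()
    workers = [
        mp.Process(target=input_worker, args=(command_queue, event_queue)),
        mp.Process(target=output_worker, args=(audio_queue, played)),
    ]
    for worker in workers:
        worker.start()
    return workers, command_queue, event_queue, audio_queue, played
```

Call `start_workers()` from under an `if __name__ == "__main__":` guard so that it also works with the `spawn` start method. Sending `"shutdown"` on the command queue and `None` on the audio queue stops the two workers, after which they can be joined.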

## 📦 File structure

```
Local-Voice/
├── recorder.py               # original implementation (kept for reference)
├── multiprocess_recorder.py  # main program
├── audio_processes.py        # audio process module
├── control_system.py         # control system module
├── config.json               # configuration file
└── characters/               # character configuration directory
    ├── libai.json            # Li Bai character
    └── zhubajie.json         # Zhu Bajie character
```

## 🛠️ Installation and usage

### 1. Requirements

- Python 3.7+
- An audio input device (microphone)
- A network connection (for the online AI services)

### 2. Install dependencies

```bash
pip install pyaudio numpy requests websockets
```

### 3. Set the API key

```bash
export ARK_API_KEY='your_api_key_here'
```

If `ARK_API_KEY` is not set, the LLM step is disabled at startup.

### 4. Basic usage

```bash
# Use the default character (Li Bai)
python multiprocess_recorder.py

# Select a character
python multiprocess_recorder.py -c zhubajie

# List the available characters
python multiprocess_recorder.py -l

# Use a configuration file
python multiprocess_recorder.py --config config.json

# Create a sample configuration file
python multiprocess_recorder.py --create-config
```

## ⚙️ Configuration

### Main options

| Option | Description | Default |
|--------|-------------|---------|
| `recording.min_duration` | Minimum recording length (seconds) | 2.0 (the bundled `config.json` sets 3.0) |
| `recording.max_duration` | Maximum recording length (seconds) | 30.0 |
| `recording.silence_threshold` | Silence detection threshold (seconds) | 3.0 |
| `detection.zcr_min` | Lower ZCR bound for speech (zero crossings per second) | 2400 |
| `detection.zcr_max` | Upper ZCR bound for speech (zero crossings per second) | 12000 |
| `processing.max_tokens` | Maximum number of LLM tokens | 50 |
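
To make the two `detection.zcr_*` thresholds concrete, here is a small sketch of a ZCR range check. It follows the same idea as `audio_processes.py` (count sign changes in a chunk, scale to crossings per second, compare against the range) but is written with the standard library so that it runs without NumPy. The function names and the `tone` helper are illustrative assumptions, not part of the project.

```python
import math
from array import array

SAMPLE_RATE = 16000  # Hz, as in the audio parameters


def zero_crossing_rate(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> float:
    """Return the number of sign changes per second in 16-bit mono PCM."""
    samples = array("h")            # signed 16-bit integers, native byte order
    samples.frombytes(pcm)
    if len(samples) == 0:
        return 0.0
    signs = [(s > 0) - (s < 0) for s in samples]
    crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return crossings / len(samples) * sample_rate


def is_voice(zcr: float, zcr_min: float = 2400, zcr_max: float = 12000) -> bool:
    """Range check using the documented defaults for zcr_min and zcr_max."""
    return zcr_min < zcr < zcr_max


def tone(freq_hz: float, frames: int = 1024) -> bytes:
    """Hypothetical helper: a sine wave as 16-bit PCM, used only as test input."""
    return array("h", (
        int(10000 * math.sin(2 * math.pi * freq_hz * k / SAMPLE_RATE + 0.1))
        for k in range(frames)
    )).tobytes()
```

A 2 kHz tone crosses zero about 4000 times per second and falls inside the range, while a 100 Hz hum (about 200 crossings per second) and digital silence fall outside it. The check therefore selects a frequency band; it does not recognise speech as such.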

### Audio parameters

- Sample rate: 16 kHz
- Channels: 1 (mono)
- Bit depth: 16-bit
- Format: PCM
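
These parameters fix the arithmetic used throughout the code: how many bytes one second of audio occupies, how long a buffer of a given size lasts, and how many chunks make up a time window. The sketch below works this through; the 1024-frame chunk size is taken from `audio_processes.py`, and the function names are illustrative.

```python
SAMPLE_RATE = 16000      # samples per second
CHANNELS = 1             # mono
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHUNK_FRAMES = 1024      # frames per read in the input process


def bytes_per_second() -> int:
    return SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE


def duration_seconds(num_bytes: int) -> float:
    """Length of a raw PCM buffer in seconds."""
    return num_bytes / bytes_per_second()


def chunk_seconds(frames: int = CHUNK_FRAMES) -> float:
    """Length of one chunk in seconds."""
    return frames / SAMPLE_RATE


def chunks_for(seconds: float, frames: int = CHUNK_FRAMES) -> int:
    """Number of whole chunks that fit into a time window."""
    return int(seconds * SAMPLE_RATE / frames)
```

One second of audio is 32000 bytes, one chunk lasts 64 ms, and a 2-second pre-record window holds 31 chunks. Any counter expressed in chunks (for example a silence counter) can be converted to seconds the same way.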

## 🎭 Character system

### Available characters

- **libai**: Li Bai (李白), in the style of a refined poet
- **zhubajie**: Zhu Bajie (猪八戒), humorous and witty

### Custom characters

Create a JSON file in the `characters/` directory. The file name without `.json` is the character name, for example `characters/zhubajie.json` for `-c zhubajie`.

```json
{
  "name": "Character name",
  "description": "Character description",
  "system_prompt": "System prompt",
  "voice": "zh_female_wanqudashu_moon_bigtts",
  "max_tokens": 50
}
```
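
A character file like the one above can be checked before use. The following sketch loads a file from a characters directory and fills in a default for `max_tokens`. Which keys are mandatory is an assumption made here for illustration (`REQUIRED_KEYS`); the project itself only reads `name` and `voice` in the code shown in this commit.

```python
import json
import os

# Assumption: the keys treated as mandatory in this sketch.
REQUIRED_KEYS = ("name", "system_prompt", "voice")


def load_character(characters_dir: str, character_name: str):
    """Load <characters_dir>/<character_name>.json, or return None if it is missing."""
    path = os.path.join(characters_dir, f"{character_name}.json")
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"{path} is missing keys: {', '.join(missing)}")
    config.setdefault("max_tokens", 50)   # documented default
    return config
```

Raising on missing keys surfaces a broken character file at startup rather than in the middle of a conversation; returning `None` for a missing file mirrors how the control system treats an unknown character.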

## 🔧 Troubleshooting

### Common problems

1. **Audio device problems**
   ```bash
   # Check the audio devices
   python multiprocess_recorder.py --check-env
   ```

2. **Missing dependencies**
   ```bash
   # Reinstall the dependencies
   pip install --upgrade pyaudio numpy requests websockets
   ```

3. **Network problems**
   - Check the network connection
   - Confirm that the API key is correct
   - Check the firewall settings

4. **Permission problems**
   ```bash
   # On Linux, the user may need to be in the audio group
   sudo usermod -a -G audio $USER
   ```

### Debug mode

```bash
# Enable verbose output
python multiprocess_recorder.py -v
```
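
## 📊 Performance comparison

| Metric | Original single process | Multi-process architecture | Improvement |
|--------|-------------------------|----------------------------|-------------|
| Switching delay | 1-2 s | 0 s | 100% |
| CPU utilization | Single core | Multiple cores | Higher |
| Responsiveness | Slow | Real time | Significantly better |
| Stability | Fair | Excellent | Much better |

## 🔄 Comparison with the original version

### Original version (recorder.py)
- Single process
- Needs frequent audio device resets
- Cannot record and play at the same time
- Noticeable switching delay

### New version (multiprocess_recorder.py)
- Multi-process architecture
- Input and output fully isolated
- No switching delay
- True parallel processing
- Better stability and extensibility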
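
## 📝 Development notes

### Architecture

- **Input process**: recording and voice detection only
- **Output process**: audio playback only
- **Main control process**: coordinates the system and runs the AI processing

### Inter-process communication

- Uses `multiprocessing.Queue` for communication between processes
- Supports control commands and event notifications
- Thread- and process-safe transfer of audio data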
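
The command side of this protocol can be sketched as a non-blocking drain loop, similar in spirit to `_check_commands` in `audio_processes.py`. The `drain_commands` function and the `state` dictionary are illustrative assumptions; the command names are the ones used in the source. The sketch works with any queue whose `get_nowait()` raises `queue.Empty`, which includes `multiprocessing.Queue`.

```python
import queue
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ControlCommand:
    command: str
    # default_factory gives every command its own dict instead of a shared None
    parameters: Dict[str, Any] = field(default_factory=dict)


def drain_commands(command_queue, state: Dict[str, Any]) -> bool:
    """Apply every pending command without blocking.

    Returns False once a 'shutdown' command has been seen, True otherwise.
    """
    while True:
        try:
            cmd = command_queue.get_nowait()
        except queue.Empty:
            return True                      # nothing left to process
        if cmd.command == "enable_recording":
            state["recording_enabled"] = True
        elif cmd.command == "disable_recording":
            state["recording_enabled"] = False
        elif cmd.command == "shutdown":
            return False
```

Draining the whole queue on each pass keeps the worker responsive even if the controller sends several commands between two audio reads, and the boolean return value gives the worker loop a single place to decide whether to keep running.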

### State management

- A clear state machine design
- Thorough error handling
- Graceful process shutdown
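
The state machine can be written down as a transition table. The four states are those of `RecordingState` in `audio_processes.py`; the event names other than `recording_complete` are hypothetical labels chosen for this sketch, and the `PLAYING` to `IDLE` transition is an assumption about how a conversation round ends.

```python
from enum import Enum


class RecordingState(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"
    PLAYING = "playing"


# (current state, event) -> next state
TRANSITIONS = {
    (RecordingState.IDLE, "start"): RecordingState.RECORDING,
    (RecordingState.RECORDING, "recording_complete"): RecordingState.PROCESSING,
    (RecordingState.PROCESSING, "processing_ok"): RecordingState.PLAYING,
    (RecordingState.PROCESSING, "processing_failed"): RecordingState.IDLE,
    # Assumption: playback completion starts the next round.
    (RecordingState.PLAYING, "playback_complete"): RecordingState.IDLE,
}


def next_state(state: RecordingState, event: str) -> RecordingState:
    """Return the next state, or stay in place if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in one table makes it easy to see that every state has a way back to `IDLE`, and an event that does not apply to the current state is simply ignored.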

## 📄 License

This project is for learning and research use only.

## 🤝 Contributing

Issues and pull requests that improve the project are welcome.
527
audio_processes.py
Normal file
@ -0,0 +1,527 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
多进程音频处理模块
|
||||
定义输入进程和输出进程的类
|
||||
"""
|
||||
|
||||
import multiprocessing as mp
|
||||
import queue
|
||||
import time
|
||||
import threading
|
||||
import numpy as np
|
||||
import pyaudio
|
||||
from enum import Enum
|
||||
from dataclasses import dataclass
|
||||
from typing import Optional, List, Dict, Any
|
||||
import json
|
||||
import wave
|
||||
import os
|
||||
|
||||
class RecordingState(Enum):
|
||||
"""录音状态枚举"""
|
||||
IDLE = "idle"
|
||||
RECORDING = "recording"
|
||||
PROCESSING = "processing"
|
||||
PLAYING = "playing"
|
||||
|
||||
@dataclass
class AudioSegment:
    """A recorded audio segment."""
    audio_data: bytes
    start_time: float
    end_time: float
    duration: float
    metadata: Optional[Dict[str, Any]] = None


@dataclass
class ControlCommand:
    """A control command sent from the main process to a worker."""
    command: str
    parameters: Optional[Dict[str, Any]] = None


@dataclass
class ProcessEvent:
    """An event sent from a worker process to the main process."""
    event_type: str
    data: Optional[bytes] = None
    metadata: Optional[Dict[str, Any]] = None
|
||||
|
||||
class InputProcess:
|
||||
"""输入进程 - 专门负责录音和语音检测"""
|
||||
|
||||
    def __init__(self, command_queue: mp.Queue, event_queue: mp.Queue, config: Optional[Dict[str, Any]] = None):
        self.command_queue = command_queue  # main process -> input process
        self.event_queue = event_queue      # input process -> main process

        # Configuration
        self.config = config or self._get_default_config()

        # Audio parameters
        self.FORMAT = pyaudio.paInt16
        self.CHANNELS = 1
        self.RATE = 16000
        self.CHUNK_SIZE = 1024

        # State
        self.recording_enabled = True   # whether recording is allowed
        self.is_recording = False       # whether a recording is in progress
        self.recording_buffer = []      # chunks of the current recording
        self.pre_record_buffer = []     # rolling buffer of audio captured before speech starts
        self.voice_detected = False
        self.silence_start_time = None
        self.recording_start_time = None

        # ZCR detection parameters
        self.zcr_history = []
        self.max_zcr_history = 50
        self.consecutive_silence_count = 0
        self.silence_threshold_count = 30   # chunks; about 1.9 s of audio at 64 ms per chunk
        self.low_zcr_threshold_count = 20   # consecutive non-voice chunks that end a recording (about 1.3 s)
        self.consecutive_low_zcr_count = 0  # current run of non-voice chunks
        self.voice_activity_history = []    # voice activity history
        self.max_voice_history = 30         # maximum number of history entries

        # Pre-recording parameters (taken from the config, 2.0 s by default)
        self.pre_record_duration = self.config.get('pre_record_duration', 2.0)
        self.pre_record_max_frames = int(self.pre_record_duration * self.RATE / self.CHUNK_SIZE)

        # PyAudio objects, created in the worker process by _setup_audio()
        self.audio = None
        self.input_stream = None

        # Run flag
        self.running = True
|
||||
|
||||
def _get_default_config(self) -> Dict[str, Any]:
|
||||
"""获取默认配置"""
|
||||
return {
|
||||
'zcr_min': 2400, # 适应16kHz采样率的ZCR最小值
|
||||
'zcr_max': 12000, # 适应16kHz采样率的ZCR最大值
|
||||
'min_recording_time': 2.0, # 最小录音时间
|
||||
'max_recording_time': 30.0,
|
||||
'silence_threshold': 3.0,
|
||||
'pre_record_duration': 2.0
|
||||
}
|
||||
|
||||
def run(self):
|
||||
"""输入进程主循环"""
|
||||
print("🎙️ 输入进程启动")
|
||||
self._setup_audio()
|
||||
|
||||
try:
|
||||
while self.running:
|
||||
# 1. 检查主进程命令
|
||||
self._check_commands()
|
||||
|
||||
# 2. 如果允许录音,处理音频
|
||||
if self.recording_enabled:
|
||||
self._process_audio()
|
||||
|
||||
# 3. 短暂休眠,减少CPU占用
|
||||
time.sleep(0.01)
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("🎙️ 输入进程收到中断信号")
|
||||
except Exception as e:
|
||||
print(f"❌ 输入进程错误: {e}")
|
||||
finally:
|
||||
self._cleanup()
|
||||
print("🎙️ 输入进程退出")
|
||||
|
||||
def _setup_audio(self):
|
||||
"""设置音频输入设备"""
|
||||
try:
|
||||
self.audio = pyaudio.PyAudio()
|
||||
self.input_stream = self.audio.open(
|
||||
format=self.FORMAT,
|
||||
channels=self.CHANNELS,
|
||||
rate=self.RATE,
|
||||
input=True,
|
||||
frames_per_buffer=self.CHUNK_SIZE
|
||||
)
|
||||
print("🎙️ 输入进程:音频设备初始化成功")
|
||||
except Exception as e:
|
||||
print(f"❌ 输入进程音频设备初始化失败: {e}")
|
||||
raise
|
||||
|
||||
def _check_commands(self):
|
||||
"""检查主进程控制命令"""
|
||||
try:
|
||||
while True:
|
||||
command = self.command_queue.get_nowait()
|
||||
|
||||
if command.command == 'enable_recording':
|
||||
self.recording_enabled = True
|
||||
print("🎙️ 输入进程:录音功能已启用")
|
||||
|
||||
elif command.command == 'disable_recording':
|
||||
self.recording_enabled = False
|
||||
# 如果正在录音,立即停止并发送数据
|
||||
if self.is_recording:
|
||||
self._stop_recording()
|
||||
print("🎙️ 输入进程:录音功能已禁用")
|
||||
|
||||
elif command.command == 'shutdown':
|
||||
print("🎙️ 输入进程:收到关闭命令")
|
||||
self.running = False
|
||||
return
|
||||
|
||||
except queue.Empty:
|
||||
pass
|
||||
|
||||
def _process_audio(self):
|
||||
"""处理音频数据"""
|
||||
try:
|
||||
data = self.input_stream.read(self.CHUNK_SIZE, exception_on_overflow=False)
|
||||
if len(data) == 0:
|
||||
return
|
||||
|
||||
# 更新预录音缓冲区
|
||||
self._update_pre_record_buffer(data)
|
||||
|
||||
# ZCR语音检测
|
||||
zcr = self._calculate_zcr(data)
|
||||
|
||||
# 语音检测
|
||||
is_voice = self._is_voice_active(zcr)
|
||||
|
||||
if self.is_recording:
|
||||
# 录音模式
|
||||
self.recording_buffer.append(data)
|
||||
|
||||
# 静音检测
|
||||
if is_voice:
|
||||
self.silence_start_time = None
|
||||
self.consecutive_silence_count = 0
|
||||
self.consecutive_low_zcr_count = 0 # 重置低ZCR计数
|
||||
else:
|
||||
self.consecutive_silence_count += 1
|
||||
self.consecutive_low_zcr_count += 1
|
||||
if self.silence_start_time is None:
|
||||
self.silence_start_time = time.time()
|
||||
|
||||
# 检查是否应该停止录音
|
||||
recording_duration = time.time() - self.recording_start_time
|
||||
should_stop = False
|
||||
|
||||
# ZCR静音检测
|
||||
if (self.consecutive_low_zcr_count >= self.low_zcr_threshold_count and
|
||||
recording_duration >= self.config['min_recording_time']):
|
||||
should_stop = True
|
||||
print(f"🎙️ 输入进程:ZCR静音检测触发停止录音")
|
||||
|
||||
# 最大时间检测
|
||||
if recording_duration >= self.config['max_recording_time']:
|
||||
should_stop = True
|
||||
print(f"🎙️ 输入进程:达到最大录音时间")
|
||||
|
||||
if should_stop:
|
||||
self._stop_recording()
|
||||
|
||||
else:
|
||||
# 监听模式
|
||||
if is_voice:
|
||||
# 检测到语音,开始录音
|
||||
self._start_recording()
|
||||
else:
|
||||
# 显示监听状态
|
||||
buffer_usage = len(self.pre_record_buffer) / self.pre_record_max_frames * 100
|
||||
print(f"\r🎙️ 监听中... ZCR: {zcr:.0f} | 语音: {is_voice} | 缓冲: {buffer_usage:.0f}%", end='', flush=True)
|
||||
|
||||
except Exception as e:
|
||||
print(f"🎙️ 输入进程音频处理错误: {e}")
|
||||
|
||||
def _update_pre_record_buffer(self, audio_data: bytes):
|
||||
"""更新预录音缓冲区"""
|
||||
self.pre_record_buffer.append(audio_data)
|
||||
|
||||
# 保持缓冲区大小
|
||||
if len(self.pre_record_buffer) > self.pre_record_max_frames:
|
||||
self.pre_record_buffer.pop(0)
|
||||
|
||||
def _start_recording(self):
|
||||
"""开始录音"""
|
||||
if not self.recording_enabled:
|
||||
return
|
||||
|
||||
self.is_recording = True
|
||||
self.recording_buffer = []
|
||||
self.recording_start_time = time.time()
|
||||
self.silence_start_time = None
|
||||
self.consecutive_silence_count = 0
|
||||
self.consecutive_low_zcr_count = 0
|
||||
|
||||
# 将预录音缓冲区的内容添加到录音中
|
||||
self.recording_buffer.extend(self.pre_record_buffer)
|
||||
self.pre_record_buffer.clear()
|
||||
|
||||
print(f"🎙️ 输入进程:开始录音(包含预录音 {self.config['pre_record_duration']}秒)")
|
||||
|
||||
def _stop_recording(self):
|
||||
"""停止录音并发送数据"""
|
||||
if not self.is_recording:
|
||||
return
|
||||
|
||||
self.is_recording = False
|
||||
|
||||
# 合并录音数据
|
||||
if self.recording_buffer:
|
||||
audio_data = b''.join(self.recording_buffer)
|
||||
duration = len(audio_data) / (self.RATE * 2)
|
||||
|
||||
# 创建音频片段
|
||||
segment = AudioSegment(
|
||||
audio_data=audio_data,
|
||||
start_time=self.recording_start_time,
|
||||
end_time=time.time(),
|
||||
duration=duration,
|
||||
metadata={
|
||||
'sample_rate': self.RATE,
|
||||
'channels': self.CHANNELS,
|
||||
'format': self.FORMAT,
|
||||
'chunk_size': self.CHUNK_SIZE
|
||||
}
|
||||
)
|
||||
|
||||
# 保存录音文件(可选)
|
||||
filename = self._save_recording(audio_data)
|
||||
|
||||
# 发送给主进程
|
||||
self.event_queue.put(ProcessEvent(
|
||||
event_type='recording_complete',
|
||||
data=audio_data,
|
||||
metadata={
|
||||
'duration': duration,
|
||||
'start_time': self.recording_start_time,
|
||||
'filename': filename
|
||||
}
|
||||
))
|
||||
|
||||
print(f"📝 输入进程:录音完成,时长 {duration:.2f} 秒")
|
||||
|
||||
# 清空缓冲区
|
||||
self.recording_buffer = []
|
||||
self.pre_record_buffer = []
|
||||
|
||||
    def _save_recording(self, audio_data: bytes) -> Optional[str]:
        """Save the recording as a WAV file; return the file name, or None on failure."""
        try:
            timestamp = time.strftime("%Y%m%d_%H%M%S")
            filename = f"recording_{timestamp}.wav"

            with wave.open(filename, 'wb') as wf:
                wf.setnchannels(self.CHANNELS)
                wf.setsampwidth(self.audio.get_sample_size(self.FORMAT))
                wf.setframerate(self.RATE)
                wf.writeframes(audio_data)

            print(f"💾 Input process: recording saved to {filename}")
            return filename

        except Exception as e:
            print(f"❌ Input process: failed to save recording: {e}")
            return None
|
||||
|
||||
def _calculate_zcr(self, audio_data: bytes) -> float:
|
||||
"""计算零交叉率"""
|
||||
if len(audio_data) == 0:
|
||||
return 0
|
||||
|
||||
audio_array = np.frombuffer(audio_data, dtype=np.int16)
|
||||
|
||||
# 计算零交叉次数
|
||||
zero_crossings = np.sum(np.diff(np.sign(audio_array)) != 0)
|
||||
|
||||
# 归一化到采样率
|
||||
zcr = zero_crossings / len(audio_array) * self.RATE
|
||||
|
||||
# 更新ZCR历史
|
||||
self.zcr_history.append(zcr)
|
||||
if len(self.zcr_history) > self.max_zcr_history:
|
||||
self.zcr_history.pop(0)
|
||||
|
||||
return zcr
|
||||
|
||||
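    def _is_voice_active(self, zcr: float) -> bool:
        """Return True when the ZCR falls inside the configured speech range."""
        # Same range check as recorder.py, with the bounds read from the config
        # (defaults: 2400 and 12000 zero crossings per second).
        return self.config.get('zcr_min', 2400) < zcr < self.config.get('zcr_max', 12000)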
|
||||
|
||||
    def _cleanup(self):
        """Release the input stream and the PyAudio instance."""
        if self.input_stream:
            try:
                self.input_stream.stop_stream()
                self.input_stream.close()
            except Exception:
                # Best effort: the stream may already be closed
                pass

        if self.audio:
            try:
                self.audio.terminate()
            except Exception:
                pass
|
||||
|
||||
class OutputProcess:
|
||||
"""输出进程 - 专门负责音频播放"""
|
||||
|
||||
def __init__(self, audio_queue: mp.Queue, config: Dict[str, Any] = None):
|
||||
self.audio_queue = audio_queue # 主进程 → 输出进程
|
||||
self.config = config or self._get_default_config()
|
||||
|
||||
# 音频播放参数
|
||||
self.FORMAT = pyaudio.paInt16
|
||||
self.CHANNELS = 1
|
||||
self.RATE = 16000
|
||||
self.CHUNK_SIZE = 512
|
||||
|
||||
# 播放状态
|
||||
self.is_playing = False
|
||||
self.playback_buffer = []
|
||||
self.total_chunks_played = 0
|
||||
self.total_audio_size = 0
|
||||
|
||||
# PyAudio实例
|
||||
self.audio = None
|
||||
self.output_stream = None
|
||||
|
||||
# 运行状态
|
||||
self.running = True
|
||||
|
||||
def _get_default_config(self) -> Dict[str, Any]:
|
||||
"""获取默认配置"""
|
||||
return {
|
||||
'buffer_size': 1000,
|
||||
'show_progress': True,
|
||||
'progress_interval': 100
|
||||
}
|
||||
|
||||
def run(self):
|
||||
"""输出进程主循环"""
|
||||
print("🔊 输出进程启动")
|
||||
self._setup_audio()
|
||||
|
||||
try:
|
||||
while self.running:
|
||||
# 处理音频队列
|
||||
self._process_audio_queue()
|
||||
|
||||
# 播放缓冲的音频
|
||||
self._play_audio()
|
||||
|
||||
# 显示播放进度
|
||||
self._show_progress()
|
||||
|
||||
time.sleep(0.001) # 极短休眠,确保流畅播放
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("🔊 输出进程收到中断信号")
|
||||
except Exception as e:
|
||||
print(f"❌ 输出进程错误: {e}")
|
||||
finally:
|
||||
self._cleanup()
|
||||
print("🔊 输出进程退出")
|
||||
|
||||
def _setup_audio(self):
|
||||
"""设置音频输出设备"""
|
||||
try:
|
||||
self.audio = pyaudio.PyAudio()
|
||||
self.output_stream = self.audio.open(
|
||||
format=self.FORMAT,
|
||||
channels=self.CHANNELS,
|
||||
rate=self.RATE,
|
||||
output=True,
|
||||
frames_per_buffer=self.CHUNK_SIZE
|
||||
)
|
||||
print("🔊 输出进程:音频设备初始化成功")
|
||||
except Exception as e:
|
||||
print(f"❌ 输出进程音频设备初始化失败: {e}")
|
||||
raise
|
||||
|
||||
def _process_audio_queue(self):
|
||||
"""处理来自主进程的音频数据"""
|
||||
try:
|
||||
while True:
|
||||
audio_data = self.audio_queue.get_nowait()
|
||||
|
||||
if audio_data is None:
|
||||
# 结束信号
|
||||
self._finish_playback()
|
||||
break
|
||||
|
||||
if isinstance(audio_data, str) and audio_data.startswith("METADATA:"):
|
||||
# 处理元数据
|
||||
metadata = audio_data[9:] # 移除 "METADATA:" 前缀
|
||||
print(f"📝 输出进程:播放元数据 {metadata}")
|
||||
continue
|
||||
|
||||
# 音频数据放入播放缓冲区
|
||||
self.playback_buffer.append(audio_data)
|
||||
if not self.is_playing:
|
||||
self.is_playing = True
|
||||
print("🔊 输出进程:开始播放音频")
|
||||
|
||||
except queue.Empty:
|
||||
pass
|
||||
|
||||
def _play_audio(self):
|
||||
"""播放音频数据"""
|
||||
if self.playback_buffer and self.output_stream:
|
||||
try:
|
||||
# 取出一块音频数据播放
|
||||
audio_chunk = self.playback_buffer.pop(0)
|
||||
if audio_chunk and len(audio_chunk) > 0:
|
||||
self.output_stream.write(audio_chunk)
|
||||
self.total_chunks_played += 1
|
||||
self.total_audio_size += len(audio_chunk)
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 输出进程播放错误: {e}")
|
||||
self.playback_buffer.clear()
|
||||
|
||||
def _show_progress(self):
|
||||
"""显示播放进度"""
|
||||
if (self.config['show_progress'] and
|
||||
self.total_chunks_played > 0 and
|
||||
self.total_chunks_played % self.config['progress_interval'] == 0):
|
||||
|
||||
progress = f"🔊 播放进度: {self.total_chunks_played} 块 | {self.total_audio_size / 1024:.1f} KB"
|
||||
print(f"\r{progress}", end='', flush=True)
|
||||
|
||||
def _finish_playback(self):
|
||||
"""完成播放"""
|
||||
self.is_playing = False
|
||||
self.playback_buffer.clear()
|
||||
|
||||
if self.total_chunks_played > 0:
|
||||
print(f"\n✅ 输出进程:播放完成,总计 {self.total_chunks_played} 块, {self.total_audio_size / 1024:.1f} KB")
|
||||
|
||||
# 重置统计
|
||||
self.total_chunks_played = 0
|
||||
self.total_audio_size = 0
|
||||
|
||||
# 通知主进程播放完成
|
||||
# 这里可以通过共享内存或另一个队列来实现
|
||||
# 暂时简化处理,由主进程通过队列大小判断
|
||||
|
||||
    def _cleanup(self):
        """Release the output stream and the PyAudio instance."""
        if self.output_stream:
            try:
                self.output_stream.stop_stream()
                self.output_stream.close()
            except Exception:
                # Best effort: the stream may already be closed
                pass

        if self.audio:
            try:
                self.audio.terminate()
            except Exception:
                pass
|
||||
|
||||
if __name__ == "__main__":
|
||||
# 测试代码
|
||||
print("音频进程模块测试")
|
||||
print("这个模块应该在多进程环境中运行")
|
||||
39
config.json
Normal file
@ -0,0 +1,39 @@
|
||||
{
|
||||
"system": {
|
||||
"max_queue_size": 1000,
|
||||
"process_timeout": 30,
|
||||
"heartbeat_interval": 1.0,
|
||||
"log_level": "INFO"
|
||||
},
|
||||
"audio": {
|
||||
"sample_rate": 16000,
|
||||
"channels": 1,
|
||||
"chunk_size": 1024,
|
||||
"format": "paInt16"
|
||||
},
|
||||
"recording": {
|
||||
"min_duration": 3.0,
|
||||
"max_duration": 30.0,
|
||||
"silence_threshold": 3.0,
|
||||
"pre_record_duration": 2.0
|
||||
},
|
||||
"processing": {
|
||||
"enable_asr": true,
|
||||
"enable_llm": true,
|
||||
"enable_tts": true,
|
||||
"character": "libai",
|
||||
"max_tokens": 50
|
||||
},
|
||||
"detection": {
|
||||
"zcr_min": 2400,
|
||||
"zcr_max": 12000,
|
||||
"consecutive_silence_count": 30,
|
||||
"max_zcr_history": 50
|
||||
},
|
||||
"playback": {
|
||||
"buffer_size": 1000,
|
||||
"show_progress": true,
|
||||
"progress_interval": 100,
|
||||
"chunk_size": 512
|
||||
}
|
||||
}
|
||||
774
control_system.py
Normal file
@ -0,0 +1,774 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
多进程音频控制系统
|
||||
实现主控制进程和状态管理
|
||||
"""
|
||||
|
||||
import multiprocessing as mp
|
||||
import queue
|
||||
import time
|
||||
import threading
|
||||
import requests
|
||||
import json
|
||||
import base64
|
||||
import gzip
|
||||
import uuid
|
||||
import asyncio
|
||||
import websockets
|
||||
from typing import Optional, Dict, Any, List
|
||||
from dataclasses import dataclass, asdict
|
||||
from enum import Enum
|
||||
import os
|
||||
import sys
|
||||
|
||||
from audio_processes import (
|
||||
InputProcess, OutputProcess,
|
||||
RecordingState, ControlCommand, ProcessEvent
|
||||
)
|
||||
|
||||
class ControlSystem:
|
||||
"""主控制系统"""
|
||||
|
||||
def __init__(self, config: Dict[str, Any] = None):
|
||||
self.config = config or self._get_default_config()
|
||||
|
||||
# 进程间通信
|
||||
self.input_command_queue = mp.Queue(maxsize=100) # 主进程 → 输入进程
|
||||
self.input_event_queue = mp.Queue(maxsize=100) # 输入进程 → 主进程
|
||||
self.output_audio_queue = mp.Queue(maxsize=1000) # 主进程 → 输出进程
|
||||
|
||||
# 进程
|
||||
self.input_process = None
|
||||
self.output_process = None
|
||||
|
||||
# 状态管理
|
||||
self.state = RecordingState.IDLE
|
||||
self.processing_complete = False
|
||||
self.playback_complete = False
|
||||
|
||||
# 当前处理的数据
|
||||
self.current_audio_data = None
|
||||
self.current_audio_metadata = None
|
||||
|
||||
# API配置
|
||||
self.api_config = self._setup_api_config()
|
||||
|
||||
# 统计信息
|
||||
self.stats = {
|
||||
'total_conversations': 0,
|
||||
'total_recording_time': 0,
|
||||
'successful_processing': 0,
|
||||
'failed_processing': 0
|
||||
}
|
||||
|
||||
# 运行状态
|
||||
self.running = True
|
||||
|
||||
# 检查依赖
|
||||
self._check_dependencies()
|
||||
|
||||
def _get_default_config(self) -> Dict[str, Any]:
|
||||
"""获取默认配置"""
|
||||
return {
|
||||
'system': {
|
||||
'max_queue_size': 1000,
|
||||
'process_timeout': 30,
|
||||
'heartbeat_interval': 1.0
|
||||
},
|
||||
'audio': {
|
||||
'sample_rate': 16000,
|
||||
'channels': 1,
|
||||
'chunk_size': 1024
|
||||
},
|
||||
'recording': {
|
||||
'min_duration': 2.0,
|
||||
'max_duration': 30.0,
|
||||
'silence_threshold': 3.0
|
||||
},
|
||||
'processing': {
|
||||
'enable_asr': True,
|
||||
'enable_llm': True,
|
||||
'enable_tts': True,
|
||||
'character': 'libai'
|
||||
}
|
||||
}
|
||||
|
||||
def _setup_api_config(self) -> Dict[str, Any]:
|
||||
"""设置API配置"""
|
||||
config = {
|
||||
'asr': {
|
||||
'appid': "8718217928",
|
||||
'token': "ynJMX-5ix1FsJvswC9KTNlGUdubcchqc",
|
||||
'cluster': "volcengine_input_common",
|
||||
'ws_url': "wss://openspeech.bytedance.com/api/v2/asr"
|
||||
},
|
||||
'llm': {
|
||||
'api_url': "https://ark.cn-beijing.volces.com/api/v3/chat/completions",
|
||||
'model': "doubao-seed-1-6-flash-250828",
|
||||
'api_key': os.environ.get("ARK_API_KEY", ""),
|
||||
'max_tokens': 50
|
||||
},
|
||||
'tts': {
|
||||
'url': "https://openspeech.bytedance.com/api/v3/tts/unidirectional",
|
||||
'app_id': "8718217928",
|
||||
'access_key': "ynJMX-5ix1FsJvswC9KTNlGUdubcchqc",
|
||||
'resource_id': "volc.service_type.10029",
|
||||
'app_key': "aGjiRDfUWi",
|
||||
'speaker': "zh_female_wanqudashu_moon_bigtts"
|
||||
}
|
||||
}
|
||||
|
||||
# 加载角色配置
|
||||
character_config = self._load_character_config(self.config['processing']['character'])
|
||||
if character_config and "voice" in character_config:
|
||||
config['tts']['speaker'] = character_config["voice"]
|
||||
|
||||
return config
|
||||
|
||||
def _load_character_config(self, character_name: str) -> Optional[Dict[str, Any]]:
|
||||
"""加载角色配置"""
|
||||
characters_dir = os.path.join(os.path.dirname(__file__), "characters")
|
||||
config_file = os.path.join(characters_dir, f"{character_name}.json")
|
||||
|
||||
if not os.path.exists(config_file):
|
||||
print(f"⚠️ 角色配置文件不存在: {config_file}")
|
||||
return None
|
||||
|
||||
try:
|
||||
with open(config_file, 'r', encoding='utf-8') as f:
|
||||
config = json.load(f)
|
||||
|
||||
print(f"✅ 加载角色: {config.get('name', character_name)}")
|
||||
return config
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 加载角色配置失败: {e}")
|
||||
return None
|
||||
|
||||
def _check_dependencies(self):
|
||||
"""检查依赖库"""
|
||||
missing_deps = []
|
||||
|
||||
try:
|
||||
import pyaudio
|
||||
except ImportError:
|
||||
missing_deps.append("pyaudio")
|
||||
|
||||
try:
|
||||
import numpy
|
||||
except ImportError:
|
||||
missing_deps.append("numpy")
|
||||
|
||||
try:
|
||||
import requests
|
||||
except ImportError:
|
||||
missing_deps.append("requests")
|
||||
|
||||
try:
|
||||
import websockets
|
||||
except ImportError:
|
||||
missing_deps.append("websockets")
|
||||
|
||||
if missing_deps:
|
||||
print(f"❌ 缺少依赖库: {', '.join(missing_deps)}")
|
||||
print("请安装: pip install " + " ".join(missing_deps))
|
||||
sys.exit(1)
|
||||
|
||||
# 检查API密钥
|
||||
if not self.api_config['llm']['api_key']:
|
||||
print("⚠️ 未设置 ARK_API_KEY 环境变量,大语言模型功能将被禁用")
|
||||
self.config['processing']['enable_llm'] = False
|
||||
|
||||
def start(self):
|
||||
"""启动系统"""
|
||||
print("🚀 启动多进程音频控制系统")
|
||||
print("=" * 60)
|
||||
|
||||
# 创建并启动输入进程
|
||||
input_config = {
|
||||
'zcr_min': 2400,
|
||||
'zcr_max': 12000,
|
||||
'min_recording_time': self.config['recording']['min_duration'],
|
||||
'max_recording_time': self.config['recording']['max_duration'],
|
||||
'silence_threshold': self.config['recording']['silence_threshold'],
|
||||
'pre_record_duration': 2.0
|
||||
}
|
||||
|
||||
self.input_process = mp.Process(
|
||||
target=InputProcess(
|
||||
self.input_command_queue,
|
||||
self.input_event_queue,
|
||||
input_config
|
||||
).run
|
||||
)
|
||||
|
||||
# 创建并启动输出进程
|
||||
output_config = {
|
||||
'buffer_size': 1000,
|
||||
'show_progress': True,
|
||||
'progress_interval': 100
|
||||
}
|
||||
|
||||
self.output_process = mp.Process(
|
||||
target=OutputProcess(
|
||||
self.output_audio_queue,
|
||||
output_config
|
||||
).run
|
||||
)
|
||||
|
||||
# 启动进程
|
||||
self.input_process.start()
|
||||
self.output_process.start()
|
||||
|
||||
print("✅ 所有进程已启动")
|
||||
print("🎙️ 输入进程:负责录音和语音检测")
|
||||
print("🔊 输出进程:负责音频播放")
|
||||
print("🎯 主控制:负责协调和AI处理")
|
||||
print("=" * 60)
|
||||
|
||||
# 启动主控制循环
|
||||
self._control_loop()
|
||||
|
||||
def _control_loop(self):
|
||||
"""主控制循环"""
|
||||
print("🎯 主控制循环启动")
|
||||
|
||||
try:
|
||||
while self.running:
|
||||
# 根据状态处理不同逻辑
|
||||
if self.state == RecordingState.IDLE:
|
||||
self._handle_idle_state()
|
||||
|
||||
elif self.state == RecordingState.RECORDING:
|
||||
self._handle_recording_state()
|
||||
|
||||
elif self.state == RecordingState.PROCESSING:
|
||||
self._handle_processing_state()
|
||||
|
||||
elif self.state == RecordingState.PLAYING:
|
||||
self._handle_playing_state()
|
||||
|
||||
# 检查进程事件
|
||||
self._check_events()
|
||||
|
||||
# 显示状态
|
||||
self._display_status()
|
||||
|
||||
# 控制循环频率
|
||||
time.sleep(0.1)
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("\n👋 收到退出信号...")
|
||||
self.shutdown()
|
||||
except Exception as e:
|
||||
print(f"❌ 主控制循环错误: {e}")
|
||||
self.shutdown()
|
||||
|
||||
def _handle_idle_state(self):
|
||||
"""处理空闲状态"""
|
||||
if self.state == RecordingState.IDLE:
|
||||
# 启用输入进程录音功能
|
||||
self.input_command_queue.put(ControlCommand('enable_recording'))
|
||||
self.state = RecordingState.RECORDING
|
||||
print("🎯 状态:IDLE → RECORDING")
|
||||
|
||||
def _handle_recording_state(self):
|
||||
"""处理录音状态"""
|
||||
# 等待输入进程发送录音完成事件
|
||||
pass
|
||||
|
||||
def _handle_processing_state(self):
|
||||
"""处理状态"""
|
||||
if not self.processing_complete:
|
||||
self._process_audio_pipeline()
|
||||
|
||||
def _handle_playing_state(self):
|
||||
"""处理播放状态"""
|
||||
# 检查播放是否完成
|
||||
if self.output_audio_queue.qsize() == 0 and not self.playback_complete:
|
||||
# 等待一小段时间确保播放完成
|
||||
time.sleep(0.5)
|
||||
if self.output_audio_queue.qsize() == 0:
|
||||
self.playback_complete = True
|
||||
self.stats['total_conversations'] += 1
|
||||
|
||||
def _check_events(self):
|
||||
"""检查进程事件"""
|
||||
# 检查输入进程事件
|
||||
try:
|
||||
while True:
|
||||
event = self.input_event_queue.get_nowait()
|
||||
|
||||
if event.event_type == 'recording_complete':
|
||||
print("📡 主控制:收到录音完成事件")
|
||||
self._handle_recording_complete(event)
|
||||
|
||||
except queue.Empty:
|
||||
pass
|
||||
|
||||
def _handle_recording_complete(self, event: ProcessEvent):
|
||||
"""处理录音完成事件"""
|
||||
# 禁用输入进程录音功能
|
||||
self.input_command_queue.put(ControlCommand('disable_recording'))
|
||||
|
||||
# 保存录音数据
|
||||
self.current_audio_data = event.data
|
||||
self.current_audio_metadata = event.metadata
|
||||
|
||||
# 更新统计
|
||||
self.stats['total_recording_time'] += event.metadata['duration']
|
||||
|
||||
# 切换到处理状态
|
||||
self.state = RecordingState.PROCESSING
|
||||
self.processing_complete = False
|
||||
self.playback_complete = False
|
||||
|
||||
print(f"🎯 状态:RECORDING → PROCESSING (时长: {event.metadata['duration']:.2f}s)")
|
||||
|
||||
def _process_audio_pipeline(self):
|
||||
"""处理音频流水线:STT + LLM + TTS"""
|
||||
try:
|
||||
print("🤖 开始处理音频流水线")
|
||||
|
||||
# 1. 语音识别 (STT)
|
||||
if self.config['processing']['enable_asr']:
|
||||
text = self._speech_to_text(self.current_audio_data)
|
||||
if not text:
|
||||
print("❌ 语音识别失败")
|
||||
self._handle_processing_failure()
|
||||
return
|
||||
|
||||
print(f"📝 识别结果: {text}")
|
||||
else:
|
||||
text = "语音识别功能已禁用"
|
||||
|
||||
# 2. 大语言模型 (LLM)
|
||||
if self.config['processing']['enable_llm']:
|
||||
response = self._call_llm(text)
|
||||
if not response:
|
||||
print("❌ 大语言模型调用失败")
|
||||
self._handle_processing_failure()
|
||||
return
|
||||
|
||||
print(f"💬 AI回复: {response}")
|
||||
else:
|
||||
response = "大语言模型功能已禁用"
|
||||
|
||||
# 3. 文本转语音 (TTS)
|
||||
if self.config['processing']['enable_tts']:
|
||||
success = self._text_to_speech_streaming(response)
|
||||
if not success:
|
||||
print("❌ 文本转语音失败")
|
||||
self._handle_processing_failure()
|
||||
return
|
||||
else:
|
||||
print("ℹ️ 文本转语音功能已禁用")
|
||||
# 直接发送结束信号
|
||||
self.output_audio_queue.put(None)
|
||||
|
||||
# 标记处理完成
|
||||
self.processing_complete = True
|
||||
self.state = RecordingState.PLAYING
|
||||
self.stats['successful_processing'] += 1
|
||||
|
||||
print("🎯 状态:PROCESSING → PLAYING")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 处理流水线错误: {e}")
|
||||
self._handle_processing_failure()
|
||||
|
||||
def _handle_processing_failure(self):
|
||||
"""处理失败情况"""
|
||||
self.stats['failed_processing'] += 1
|
||||
self.state = RecordingState.IDLE
|
||||
self.processing_complete = True
|
||||
self.playback_complete = True
|
||||
print("🎯 状态:PROCESSING → IDLE (失败)")
|
||||
|
||||
def _speech_to_text(self, audio_data: bytes) -> Optional[str]:
|
||||
"""语音转文字"""
|
||||
try:
|
||||
return asyncio.run(self._recognize_audio_async(audio_data))
|
||||
except Exception as e:
|
||||
print(f"❌ 语音识别异常: {e}")
|
||||
return None
|
||||
|
||||
async def _recognize_audio_async(self, audio_data: bytes) -> Optional[str]:
|
||||
"""异步语音识别"""
|
||||
if not self.config['processing']['enable_asr']:
|
||||
return "语音识别功能已禁用"
|
||||
|
||||
try:
|
||||
import websockets
|
||||
|
||||
# 生成ASR头部
|
||||
def generate_asr_header(message_type=1, message_type_specific_flags=0):
|
||||
PROTOCOL_VERSION = 0b0001
|
||||
DEFAULT_HEADER_SIZE = 0b0001
|
||||
JSON = 0b0001
|
||||
GZIP = 0b0001
|
||||
|
||||
header = bytearray()
|
||||
header.append((PROTOCOL_VERSION << 4) | DEFAULT_HEADER_SIZE)
|
||||
header.append((message_type << 4) | message_type_specific_flags)
|
||||
header.append((JSON << 4) | GZIP)
|
||||
header.append(0x00) # reserved
|
||||
return header
|
||||
|
||||
            # Parse an ASR response frame
            def parse_asr_response(res):
                # Simplified parser: 4-byte header, 4-byte payload size, then the payload
                if len(res) < 8:
                    return {}

                message_type = res[1] >> 4
                compression = res[2] & 0x0F
                payload_size = int.from_bytes(res[4:8], "big", signed=False)
                payload_msg = res[8:8 + payload_size]

                if message_type == 0b1001:  # SERVER_FULL_RESPONSE
                    try:
                        if compression == 0b0001:  # payload is gzip-compressed
                            payload_msg = gzip.decompress(payload_msg)
                        if payload_msg.startswith(b'{'):
                            return json.loads(payload_msg.decode('utf-8'))
                    except Exception:
                        # Malformed payload: fall through and return an empty result
                        pass

                return {}
|
||||
|
||||
# 构建请求参数
|
||||
reqid = str(uuid.uuid4())
|
||||
request_params = {
|
||||
'app': {
|
||||
'appid': self.api_config['asr']['appid'],
|
||||
'cluster': self.api_config['asr']['cluster'],
|
||||
'token': self.api_config['asr']['token'],
|
||||
},
|
||||
'user': {
|
||||
'uid': 'multiprocess_asr'
|
||||
},
|
||||
'request': {
|
||||
'reqid': reqid,
|
||||
'nbest': 1,
|
||||
'workflow': 'audio_in,resample,partition,vad,fe,decode,itn,nlu_punctuate',
|
||||
'show_language': False,
|
||||
'show_utterances': False,
|
||||
'result_type': 'full',
|
||||
"sequence": 1
|
||||
},
|
||||
'audio': {
|
||||
'format': 'wav',
|
||||
'rate': self.config['audio']['sample_rate'],
|
||||
'language': 'zh-CN',
|
||||
'bits': 16,
|
||||
'channel': self.config['audio']['channels'],
|
||||
'codec': 'raw'
|
||||
}
|
||||
}
|
||||
|
||||
# 构建请求
|
||||
payload_bytes = str.encode(json.dumps(request_params))
|
||||
payload_bytes = gzip.compress(payload_bytes)
|
||||
full_client_request = bytearray(generate_asr_header())
|
||||
full_client_request.extend((len(payload_bytes)).to_bytes(4, 'big'))
|
||||
full_client_request.extend(payload_bytes)
|
||||
|
||||
# 设置认证头
|
||||
additional_headers = {'Authorization': 'Bearer; {}'.format(self.api_config['asr']['token'])}
|
||||
|
||||
# 连接WebSocket
|
||||
async with websockets.connect(
|
||||
self.api_config['asr']['ws_url'],
|
||||
additional_headers=additional_headers,
|
||||
max_size=1000000000
|
||||
) as ws:
|
||||
# 发送请求
|
||||
await ws.send(full_client_request)
|
||||
res = await ws.recv()
|
||||
result = parse_asr_response(res)
|
||||
|
||||
# 发送音频数据
|
||||
chunk_size = int(self.config['audio']['channels'] * 2 *
|
||||
self.config['audio']['sample_rate'] * 15000 / 1000)
|
||||
|
||||
for offset in range(0, len(audio_data), chunk_size):
|
||||
chunk = audio_data[offset:offset + chunk_size]
|
||||
last = (offset + chunk_size) >= len(audio_data)
|
||||
|
||||
payload_bytes = gzip.compress(chunk)
|
||||
audio_only_request = bytearray(
|
||||
generate_asr_header(
|
||||
message_type=0b0010,
|
||||
message_type_specific_flags=0b0010 if last else 0
|
||||
)
|
||||
)
|
||||
audio_only_request.extend((len(payload_bytes)).to_bytes(4, 'big'))
|
||||
audio_only_request.extend(payload_bytes)
|
||||
|
||||
await ws.send(audio_only_request)
|
||||
res = await ws.recv()
|
||||
result = parse_asr_response(res)
|
||||
|
||||
# 获取最终结果
|
||||
if 'payload_msg' in result and 'result' in result['payload_msg']:
|
||||
results = result['payload_msg']['result']
|
||||
if results:
|
||||
return results[0].get('text', '识别失败')
|
||||
|
||||
return None
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 语音识别失败: {e}")
|
||||
return None
|
||||
|
||||
def _call_llm(self, text: str) -> Optional[str]:
|
||||
"""调用大语言模型"""
|
||||
if not self.config['processing']['enable_llm']:
|
||||
return "大语言模型功能已禁用"
|
||||
|
||||
try:
|
||||
# 获取角色配置
|
||||
character_config = self._load_character_config(self.config['processing']['character'])
|
||||
if character_config and "system_prompt" in character_config:
|
||||
system_prompt = character_config["system_prompt"]
|
||||
else:
|
||||
system_prompt = "你是一个智能助手,请根据用户的语音输入提供有帮助的回答。保持回答简洁明了。"
|
||||
|
||||
# 构建请求
|
||||
headers = {
|
||||
"Content-Type": "application/json",
|
||||
"Authorization": f"Bearer {self.api_config['llm']['api_key']}"
|
||||
}
|
||||
|
||||
messages = [
|
||||
{"role": "system", "content": system_prompt},
|
||||
{"role": "user", "content": text}
|
||||
]
|
||||
|
||||
data = {
|
||||
"model": self.api_config['llm']['model'],
|
||||
"messages": messages,
|
||||
"max_tokens": self.api_config['llm']['max_tokens'],
|
||||
"stream": False # 非流式,简化实现
|
||||
}
|
||||
|
||||
response = requests.post(
|
||||
self.api_config['llm']['api_url'],
|
||||
headers=headers,
|
||||
json=data,
|
||||
timeout=30
|
||||
)
|
||||
|
||||
if response.status_code == 200:
|
||||
result = response.json()
|
||||
if 'choices' in result and len(result['choices']) > 0:
|
||||
content = result['choices'][0]['message']['content']
|
||||
return content.strip()
|
||||
|
||||
print(f"❌ LLM API调用失败: {response.status_code}")
|
||||
return None
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 大语言模型调用失败: {e}")
|
||||
return None
|
||||
|
||||
    def _text_to_speech_streaming(self, text: str) -> bool:
        """Text to speech (streaming)."""
        if not self.config['processing']['enable_tts']:
            return False

        try:
            print("🎵 Starting text-to-speech")

            # Send metadata to the output process
            self.output_audio_queue.put(f"METADATA:{text[:30]}...")

            # Build the request headers
            headers = {
                "X-Api-App-Id": self.api_config['tts']['app_id'],
                "X-Api-Access-Key": self.api_config['tts']['access_key'],
                "X-Api-Resource-Id": self.api_config['tts']['resource_id'],
                "X-Api-App-Key": self.api_config['tts']['app_key'],
                "Content-Type": "application/json",
                "Connection": "keep-alive"
            }

            # Build the request parameters
            payload = {
                "user": {
                    "uid": "multiprocess_tts"
                },
                "req_params": {
                    "text": text,
                    "speaker": self.api_config['tts']['speaker'],
                    "audio_params": {
                        "format": "pcm",
                        "sample_rate": self.config['audio']['sample_rate'],
                        "enable_timestamp": True
                    },
                    # JSON-encoded string; the stray trailing '"}' was removed so it parses
                    "additions": "{\"explicit_language\":\"zh\",\"disable_markdown_filter\":true, \"enable_timestamp\":true}"
                }
            }

            # Send the request
            session = requests.Session()
            response = None
            try:
                response = session.post(
                    self.api_config['tts']['url'],
                    headers=headers,
                    json=payload,
                    stream=True
                )

                if response.status_code != 200:
                    print(f"❌ TTS request failed: {response.status_code}")
                    return False

                # Process the streaming response
                total_audio_size = 0
                chunk_count = 0

                for chunk in response.iter_lines(decode_unicode=True):
                    if not chunk:
                        continue

                    try:
                        data = json.loads(chunk)

                        if data.get("code", 0) == 0 and "data" in data and data["data"]:
                            chunk_audio = base64.b64decode(data["data"])
                            audio_size = len(chunk_audio)
                            total_audio_size += audio_size
                            chunk_count += 1

                            # Forward the audio to the output process
                            self.output_audio_queue.put(chunk_audio)

                            # Show progress every 10 chunks
                            if chunk_count % 10 == 0:
                                progress = f"📥 TTS generated: {chunk_count} chunks | {total_audio_size / 1024:.1f} KB"
                                print(f"\r{progress}", end='', flush=True)

                        if data.get("code", 0) == 20000000:
                            break

                    except json.JSONDecodeError:
                        continue

                print(f"\n✅ TTS audio complete: {chunk_count} chunks, {total_audio_size / 1024:.1f} KB")

                # Send the end-of-stream signal
                self.output_audio_queue.put(None)

                return chunk_count > 0

            finally:
                # response is still None when session.post() itself raised
                if response is not None:
                    response.close()
                session.close()

        except Exception as e:
            print(f"❌ Text-to-speech failed: {e}")
            return False
|
||||
|
||||
def _display_status(self):
|
||||
"""显示系统状态"""
|
||||
# 每秒显示一次状态
|
||||
if hasattr(self, '_last_status_time'):
|
||||
if time.time() - self._last_status_time < 1.0:
|
||||
return
|
||||
|
||||
self._last_status_time = time.time()
|
||||
|
||||
# 状态显示
|
||||
status_lines = [
|
||||
f"🎯 状态: {self.state.value}",
|
||||
f"📊 统计: 对话{self.stats['total_conversations']} | "
|
||||
f"录音{self.stats['total_recording_time']:.1f}s | "
|
||||
f"成功{self.stats['successful_processing']} | "
|
||||
f"失败{self.stats['failed_processing']}"
|
||||
]
|
||||
|
||||
# 队列状态
|
||||
input_queue_size = self.input_command_queue.qsize()
|
||||
output_queue_size = self.output_audio_queue.qsize()
|
||||
|
||||
if input_queue_size > 0 or output_queue_size > 0:
|
||||
status_lines.append(f"📦 队列: 输入{input_queue_size} | 输出{output_queue_size}")
|
||||
|
||||
# 显示状态
|
||||
status_str = " | ".join(status_lines)
|
||||
print(f"\r{status_str}", end='', flush=True)
|
||||
|
||||
def shutdown(self):
|
||||
"""关闭系统"""
|
||||
print("\n🛑 正在关闭系统...")
|
||||
|
||||
self.running = False
|
||||
|
||||
# 发送关闭命令
|
||||
try:
|
||||
self.input_command_queue.put(ControlCommand('shutdown'))
|
||||
self.output_audio_queue.put(None)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# 等待进程结束
|
||||
if self.input_process:
|
||||
try:
|
||||
self.input_process.join(timeout=5)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
if self.output_process:
|
||||
try:
|
||||
self.output_process.join(timeout=5)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# 显示最终统计
|
||||
print("\n📊 最终统计:")
|
||||
print(f" 总对话次数: {self.stats['total_conversations']}")
|
||||
print(f" 总录音时长: {self.stats['total_recording_time']:.1f} 秒")
|
||||
print(f" 成功处理: {self.stats['successful_processing']}")
|
||||
print(f" 失败处理: {self.stats['failed_processing']}")
|
||||
|
||||
success_rate = (self.stats['successful_processing'] /
|
||||
max(1, self.stats['successful_processing'] + self.stats['failed_processing']) * 100)
|
||||
print(f" 成功率: {success_rate:.1f}%")
|
||||
|
||||
print("👋 系统已关闭")
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
import argparse
|
||||
|
||||
parser = argparse.ArgumentParser(description='多进程音频控制系统')
|
||||
parser.add_argument('--character', '-c', type=str, default='libai',
|
||||
help='选择角色 (默认: libai)')
|
||||
parser.add_argument('--config', type=str,
|
||||
help='配置文件路径')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# 加载配置
|
||||
config = None
|
||||
if args.config:
|
||||
try:
|
||||
with open(args.config, 'r', encoding='utf-8') as f:
|
||||
config = json.load(f)
|
||||
except Exception as e:
|
||||
print(f"⚠️ 配置文件加载失败: {e}")
|
||||
|
||||
# 创建控制系统
|
||||
control_system = ControlSystem(config)
|
||||
|
||||
# 设置角色
|
||||
if args.character:
|
||||
control_system.config['processing']['character'] = args.character
|
||||
|
||||
# 启动系统
|
||||
control_system.start()
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
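The binary framing built by `generate_asr_header` and read back by `parse_asr_response` can be tried in isolation. The sketch below assumes only the layout visible in that code: a 4-byte header, a 4-byte big-endian payload length, then a gzip-compressed payload. `build_frame` and `split_frame` are hypothetical helper names and are not part of this commit.

```python
import gzip
import json

PROTOCOL_VERSION = 0b0001
HEADER_SIZE = 0b0001          # header length, in 4-byte units
JSON_SERIALIZATION = 0b0001
GZIP_COMPRESSION = 0b0001


def build_frame(payload: bytes, message_type: int = 0b0001, flags: int = 0) -> bytes:
    """Hypothetical helper: 4-byte header + 4-byte length + gzip payload."""
    body = gzip.compress(payload)
    header = bytes([
        (PROTOCOL_VERSION << 4) | HEADER_SIZE,
        (message_type << 4) | flags,
        (JSON_SERIALIZATION << 4) | GZIP_COMPRESSION,
        0x00,  # reserved
    ])
    return header + len(body).to_bytes(4, "big") + body


def split_frame(frame: bytes):
    """Hypothetical helper: return (message_type, flags, decompressed payload)."""
    if len(frame) < 8:
        raise ValueError("frame shorter than header + length field")
    message_type = frame[1] >> 4
    flags = frame[1] & 0x0F
    size = int.from_bytes(frame[4:8], "big")
    body = frame[8:8 + size]
    if frame[2] & 0x0F == GZIP_COMPRESSION:
        body = gzip.decompress(body)
    return message_type, flags, body


request = json.dumps({"request": {"reqid": "demo"}}).encode("utf-8")
frame = build_frame(request)
message_type, flags, body = split_frame(frame)
print(message_type, flags, json.loads(body))
```

Keeping the two directions symmetric makes it easy to check a header change locally before sending anything to the service.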
74
install.sh
@ -1,74 +0,0 @@
|
||||
#!/bin/bash
|
||||
|
||||
# 智能语音助手系统安装脚本
|
||||
# 适用于树莓派和Linux系统
|
||||
|
||||
echo "🚀 智能语音助手系统 - 安装脚本"
|
||||
echo "================================"
|
||||
|
||||
# 检查是否为root用户
|
||||
if [ "$EUID" -eq 0 ]; then
|
||||
echo "⚠️  Do not run this script as root"
echo "   Run it as a regular user instead: ./install.sh"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# 更新包管理器
|
||||
echo "📦 更新包管理器..."
|
||||
sudo apt-get update
|
||||
|
||||
# 安装系统依赖
|
||||
echo "🔧 安装系统依赖..."
|
||||
sudo apt-get install -y \
|
||||
python3 \
|
||||
python3-pip \
|
||||
portaudio19-dev \
|
||||
python3-dev \
|
||||
alsa-utils
|
||||
|
||||
# 安装Python依赖
|
||||
echo "🐍 安装Python依赖..."
|
||||
pip3 install --user \
|
||||
websockets \
|
||||
requests \
|
||||
pyaudio \
|
||||
numpy
|
||||
|
||||
# 检查音频播放器
|
||||
echo "🔊 检查音频播放器..."
|
||||
if command -v aplay >/dev/null 2>&1; then
|
||||
echo "✅ aplay 已安装(支持PCM/WAV播放)"
|
||||
else
|
||||
echo "❌ aplay 安装失败"
|
||||
fi
|
||||
|
||||
# 检查Python模块
|
||||
echo "🧪 检查Python模块..."
|
||||
python3 -c "import websockets, requests, pyaudio, numpy" 2>/dev/null
|
||||
if [ $? -eq 0 ]; then
|
||||
echo "✅ 所有Python依赖已安装"
|
||||
else
|
||||
echo "❌ 部分Python依赖安装失败"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "✅ 安装完成!"
|
||||
echo ""
|
||||
echo "📋 使用说明:"
|
||||
echo "1. 设置API密钥(如需使用大语言模型):"
|
||||
echo " export ARK_API_KEY='your_api_key_here'"
|
||||
echo ""
|
||||
echo "2. 运行程序:"
|
||||
echo " python3 recorder.py"
|
||||
echo ""
|
||||
echo "3. 故障排除:"
|
||||
echo " - 如果遇到权限问题,请确保用户在audio组中:"
|
||||
echo " sudo usermod -a -G audio \$USER"
|
||||
echo " - 然后重新登录或重启系统"
|
||||
echo ""
|
||||
echo "🎯 系统功能:"
|
||||
echo "- 🎙️ 智能语音录制"
|
||||
echo "- 🤖 在线语音识别"
|
||||
echo "- 💬 AI智能对话"
|
||||
echo "- 🔊 语音回复合成"
|
||||
echo "- 📁 自动文件管理"
|
||||
305
multiprocess_recorder.py
Normal file
@ -0,0 +1,305 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
多进程音频录音系统
|
||||
基于进程隔离的音频处理架构
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import argparse
|
||||
import json
|
||||
import time
|
||||
from typing import Dict, Any
|
||||
|
||||
def check_dependencies():
|
||||
"""检查系统依赖"""
|
||||
missing_deps = []
|
||||
|
||||
try:
|
||||
import pyaudio
|
||||
except ImportError:
|
||||
missing_deps.append("pyaudio")
|
||||
|
||||
try:
|
||||
import numpy
|
||||
except ImportError:
|
||||
missing_deps.append("numpy")
|
||||
|
||||
try:
|
||||
import requests
|
||||
except ImportError:
|
||||
missing_deps.append("requests")
|
||||
|
||||
try:
|
||||
import websockets
|
||||
except ImportError:
|
||||
missing_deps.append("websockets")
|
||||
|
||||
if missing_deps:
|
||||
print("❌ 缺少以下依赖库:")
|
||||
for dep in missing_deps:
|
||||
print(f" - {dep}")
|
||||
print("\n请运行以下命令安装:")
|
||||
print(f"pip install {' '.join(missing_deps)}")
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
def check_environment():
|
||||
"""检查运行环境"""
|
||||
print("🔍 检查运行环境...")
|
||||
|
||||
# 检查Python版本
|
||||
python_version = sys.version_info
|
||||
if python_version.major < 3 or (python_version.major == 3 and python_version.minor < 7):
|
||||
print(f"❌ Python版本过低: {python_version.major}.{python_version.minor}")
|
||||
print("需要Python 3.7或更高版本")
|
||||
return False
|
||||
|
||||
print(f"✅ Python版本: {python_version.major}.{python_version.minor}.{python_version.micro}")
|
||||
|
||||
# 检查操作系统
|
||||
import platform
|
||||
system = platform.system().lower()
|
||||
print(f"✅ 操作系统: {system}")
|
||||
|
||||
# 检查音频设备
|
||||
try:
|
||||
import pyaudio
|
||||
audio = pyaudio.PyAudio()
|
||||
device_count = audio.get_device_count()
|
||||
print(f"✅ 音频设备数量: {device_count}")
|
||||
|
||||
if device_count == 0:
|
||||
print("❌ 未检测到音频设备")
|
||||
return False
|
||||
|
||||
audio.terminate()
|
||||
except Exception as e:
|
||||
print(f"❌ 音频设备检查失败: {e}")
|
||||
return False
|
||||
|
||||
# 检查网络连接
|
||||
try:
|
||||
import requests
|
||||
response = requests.get("https://www.baidu.com", timeout=5)
|
||||
print("✅ 网络连接正常")
|
||||
except Exception:
|
||||
print("⚠️ 网络连接可能有问题,会影响在线AI功能")
|
||||
|
||||
# 检查API密钥
|
||||
api_key = os.environ.get("ARK_API_KEY")
|
||||
if api_key:
|
||||
print("✅ ARK_API_KEY 已设置")
|
||||
else:
|
||||
print("⚠️ ARK_API_KEY 未设置,大语言模型功能将被禁用")
|
||||
print(" 请运行: export ARK_API_KEY='your_api_key_here'")
|
||||
|
||||
return True
|
||||
|
||||
def list_characters():
|
||||
"""列出可用角色"""
|
||||
characters_dir = os.path.join(os.path.dirname(__file__), "characters")
|
||||
|
||||
if not os.path.exists(characters_dir):
|
||||
print("❌ 角色目录不存在")
|
||||
return
|
||||
|
||||
characters = []
|
||||
for file in os.listdir(characters_dir):
|
||||
if file.endswith('.json'):
|
||||
character_name = file[:-5]
|
||||
config_file = os.path.join(characters_dir, file)
|
||||
|
||||
try:
|
||||
with open(config_file, 'r', encoding='utf-8') as f:
|
||||
config = json.load(f)
|
||||
name = config.get('name', character_name)
|
||||
desc = config.get('description', '无描述')
|
||||
characters.append(f"{character_name}: {name} - {desc}")
|
||||
except (OSError, ValueError, AttributeError):
|
||||
characters.append(f"{character_name}: 配置文件读取失败")
|
||||
|
||||
if characters:
|
||||
print("🎭 可用角色列表:")
|
||||
for char in characters:
|
||||
print(f" - {char}")
|
||||
else:
|
||||
print("❌ 未找到任何角色配置文件")
|
||||
|
||||
def create_sample_config():
|
||||
"""创建示例配置文件"""
|
||||
config = {
|
||||
"system": {
|
||||
"max_queue_size": 1000,
|
||||
"process_timeout": 30,
|
||||
"heartbeat_interval": 1.0,
|
||||
"log_level": "INFO"
|
||||
},
|
||||
"audio": {
|
||||
"sample_rate": 16000,
|
||||
"channels": 1,
|
||||
"chunk_size": 1024,
|
||||
"format": "paInt16"
|
||||
},
|
||||
"recording": {
|
||||
"min_duration": 2.0,
|
||||
"max_duration": 30.0,
|
||||
"silence_threshold": 3.0,
|
||||
"pre_record_duration": 2.0
|
||||
},
|
||||
"processing": {
|
||||
"enable_asr": True,
|
||||
"enable_llm": True,
|
||||
"enable_tts": True,
|
||||
"character": "libai",
|
||||
"max_tokens": 50
|
||||
},
|
||||
"detection": {
|
||||
"zcr_min": 2400,
|
||||
"zcr_max": 12000,
|
||||
"consecutive_silence_count": 30,
|
||||
"max_zcr_history": 30
|
||||
},
|
||||
"playback": {
|
||||
"buffer_size": 1000,
|
||||
"show_progress": True,
|
||||
"progress_interval": 100,
|
||||
"chunk_size": 512
|
||||
}
|
||||
}
|
||||
|
||||
config_file = "config.json"
|
||||
try:
|
||||
with open(config_file, 'w', encoding='utf-8') as f:
|
||||
json.dump(config, f, indent=2, ensure_ascii=False)
|
||||
print(f"✅ 示例配置文件已创建: {config_file}")
|
||||
except Exception as e:
|
||||
print(f"❌ 创建配置文件失败: {e}")
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
parser = argparse.ArgumentParser(
|
||||
description='多进程音频录音系统',
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
使用示例:
|
||||
python multiprocess_recorder.py # 使用默认角色
|
||||
python multiprocess_recorder.py -c zhubajie # 指定角色
|
||||
python multiprocess_recorder.py -l # 列出角色
|
||||
python multiprocess_recorder.py --create-config # 创建配置文件
|
||||
"""
|
||||
)
|
||||
|
||||
parser.add_argument('--character', '-c', type=str, default='libai',
|
||||
help='选择角色 (默认: libai)')
|
||||
parser.add_argument('--list-characters', '-l', action='store_true',
|
||||
help='列出所有可用角色')
|
||||
parser.add_argument('--config', type=str,
|
||||
help='配置文件路径')
|
||||
parser.add_argument('--create-config', action='store_true',
|
||||
help='创建示例配置文件')
|
||||
parser.add_argument('--check-env', action='store_true',
|
||||
help='检查运行环境')
|
||||
parser.add_argument('--verbose', '-v', action='store_true',
|
||||
help='详细输出')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# 显示欢迎信息
|
||||
print("🚀 多进程音频录音系统")
|
||||
print("=" * 60)
|
||||
|
||||
# 检查依赖
|
||||
if not check_dependencies():
|
||||
sys.exit(1)
|
||||
|
||||
# 创建配置文件
|
||||
if args.create_config:
|
||||
create_sample_config()
|
||||
return
|
||||
|
||||
# 检查环境
|
||||
if args.check_env:
|
||||
check_environment()
|
||||
return
|
||||
|
||||
# 列出角色
|
||||
if args.list_characters:
|
||||
list_characters()
|
||||
return
|
||||
|
||||
# 检查characters目录
|
||||
characters_dir = os.path.join(os.path.dirname(__file__), "characters")
|
||||
if not os.path.exists(characters_dir):
|
||||
print(f"⚠️ 角色目录不存在: {characters_dir}")
|
||||
print("请确保characters目录存在并包含角色配置文件")
|
||||
|
||||
# 检查指定角色
|
||||
character_file = os.path.join(characters_dir, f"{args.character}.json")
|
||||
if not os.path.exists(character_file):
|
||||
print(f"⚠️ 角色文件不存在: {character_file}")
|
||||
print("可用角色:")
|
||||
list_characters()
|
||||
return
|
||||
|
||||
print(f"🎭 当前角色: {args.character}")
|
||||
print("🎯 系统特点:")
|
||||
print(" - 多进程架构:输入输出完全隔离")
|
||||
print(" - 零切换延迟:无需音频设备重置")
|
||||
print(" - 实时响应:并行处理录音和播放")
|
||||
print(" - 智能检测:基于ZCR的语音识别")
|
||||
print(" - 流式TTS:实时音频生成和播放")
|
||||
print(" - 角色扮演:支持多种AI角色")
|
||||
print("=" * 60)
|
||||
|
||||
# 显示使用说明
|
||||
print("📖 使用说明:")
|
||||
print(" - 检测到语音自动开始录音")
|
||||
print(" - 持续静音3秒自动结束录音")
|
||||
print(" - 录音完成后自动处理和播放")
|
||||
print(" - 按 Ctrl+C 退出")
|
||||
print("=" * 60)
|
||||
|
||||
# 加载配置
|
||||
config = None
|
||||
if args.config:
|
||||
try:
|
||||
with open(args.config, 'r', encoding='utf-8') as f:
|
||||
config = json.load(f)
|
||||
print(f"📋 加载配置文件: {args.config}")
|
||||
except Exception as e:
|
||||
print(f"⚠️ 配置文件加载失败: {e}")
|
||||
print("使用默认配置")
|
||||
|
||||
try:
|
||||
# 导入控制系统
|
||||
from control_system import ControlSystem
|
||||
|
||||
# 创建控制系统
|
||||
control_system = ControlSystem(config)
|
||||
|
||||
# 设置角色
|
||||
control_system.config['processing']['character'] = args.character
|
||||
|
||||
# 设置日志级别
|
||||
if args.verbose:
|
||||
control_system.config['system']['log_level'] = "DEBUG"
|
||||
|
||||
# 启动系统
|
||||
control_system.start()
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("\n👋 用户中断")
|
||||
except Exception as e:
|
||||
print(f"❌ 系统启动失败: {e}")
|
||||
if args.verbose:
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
finally:
|
||||
print("👋 系统退出")
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
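`main()` above loads an optional JSON file and hands it to `ControlSystem(config)`. How `ControlSystem` combines that file with its defaults is not shown in this part of the commit. The sketch below is one possible approach, assuming a partial user config should override the defaults key by key; `merge_config` is a hypothetical helper, not an existing function.

```python
import copy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return defaults with overrides applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections so sibling keys keep their defaults
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "audio": {"sample_rate": 16000, "channels": 1},
    "recording": {"min_duration": 2.0, "max_duration": 30.0},
}
user_config = {"recording": {"max_duration": 10.0}}

config = merge_config(defaults, user_config)
print(config["recording"])
```

With a merge like this, a config file that sets a single key would not cause a `KeyError` later when code reads `config['processing']['character']` or `config['system']['log_level']`.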
123
quick_test.py
Normal file
@ -0,0 +1,123 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
快速测试脚本
|
||||
用于验证多进程录音系统的基础功能
|
||||
"""
|
||||
|
||||
import time
|
||||
import multiprocessing as mp
|
||||
from audio_processes import InputProcess, OutputProcess
|
||||
|
||||
def test_audio_processes():
|
||||
"""测试音频进程类"""
|
||||
print("🧪 测试音频进程类...")
|
||||
|
||||
# 创建测试队列
|
||||
command_queue = mp.Queue()
|
||||
event_queue = mp.Queue()
|
||||
audio_queue = mp.Queue()
|
||||
|
||||
# 创建进程配置
|
||||
config = {
|
||||
'zcr_min': 3000,
|
||||
'zcr_max': 10000,
|
||||
'min_recording_time': 3.0,
|
||||
'max_recording_time': 10.0, # 缩短测试时间
|
||||
'silence_threshold': 3.0,
|
||||
'pre_record_duration': 2.0,
|
||||
'voice_activation_threshold': 5, # 降低阈值便于测试
|
||||
'calibration_samples': 50, # 减少校准时间
|
||||
'adaptive_threshold': True
|
||||
}
|
||||
|
||||
# 创建输入进程
|
||||
input_process = InputProcess(command_queue, event_queue, config)
|
||||
|
||||
# 创建输出进程
|
||||
output_process = OutputProcess(audio_queue)
|
||||
|
||||
print("✅ 音频进程类创建成功")
|
||||
|
||||
# 测试配置加载
|
||||
print("📋 测试配置:")
|
||||
print(f" ZCR范围: {config['zcr_min']} - {config['zcr_max']}")
|
||||
print(f" 校准样本数: {config['calibration_samples']}")
|
||||
print(f" 语音激活阈值: {config['voice_activation_threshold']}")
|
||||
|
||||
return True
|
||||
|
||||
def test_dependencies():
|
||||
"""测试依赖库"""
|
||||
print("🔍 检查依赖库...")
|
||||
|
||||
dependencies = {
|
||||
'numpy': False,
|
||||
'pyaudio': False,
|
||||
'requests': False,
|
||||
'websockets': False
|
||||
}
|
||||
|
||||
try:
|
||||
import numpy
|
||||
dependencies['numpy'] = True
|
||||
print("✅ numpy")
|
||||
except ImportError:
|
||||
print("❌ numpy")
|
||||
|
||||
try:
|
||||
import pyaudio
|
||||
dependencies['pyaudio'] = True
|
||||
print("✅ pyaudio")
|
||||
except ImportError:
|
||||
print("❌ pyaudio")
|
||||
|
||||
try:
|
||||
import requests
|
||||
dependencies['requests'] = True
|
||||
print("✅ requests")
|
||||
except ImportError:
|
||||
print("❌ requests")
|
||||
|
||||
try:
|
||||
import websockets
|
||||
dependencies['websockets'] = True
|
||||
print("✅ websockets")
|
||||
except ImportError:
|
||||
print("❌ websockets")
|
||||
|
||||
missing = [dep for dep, installed in dependencies.items() if not installed]
|
||||
if missing:
|
||||
print(f"❌ 缺少依赖: {', '.join(missing)}")
|
||||
return False
|
||||
else:
|
||||
print("✅ 所有依赖都已安装")
|
||||
return True
|
||||
|
||||
def main():
|
||||
"""主测试函数"""
|
||||
print("🚀 多进程录音系统快速测试")
|
||||
print("=" * 50)
|
||||
|
||||
# 测试依赖
|
||||
if not test_dependencies():
|
||||
print("❌ 依赖检查失败")
|
||||
return False
|
||||
|
||||
print()
|
||||
|
||||
# 测试音频进程
|
||||
if not test_audio_processes():
|
||||
print("❌ 音频进程测试失败")
|
||||
return False
|
||||
|
||||
print()
|
||||
print("✅ 所有测试通过!")
|
||||
print("💡 现在可以运行主程序:")
|
||||
print(" python multiprocess_recorder.py")
|
||||
|
||||
return True
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Report the test result through the process exit status
raise SystemExit(0 if main() else 1)
|
||||
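The audio queue carries three kinds of items in `_text_to_speech_streaming`: a `METADATA:` string, raw PCM byte chunks, and `None` as the end-of-stream marker. The sketch below shows a consumer for that convention. It uses `queue.Queue` and a thread rather than `multiprocessing` so that it runs anywhere, and `drain_audio_queue` is a hypothetical function, not the actual `OutputProcess` implementation.

```python
import queue
import threading


def drain_audio_queue(audio_queue, played):
    """Hypothetical consumer: collect PCM chunks until the None sentinel arrives."""
    while True:
        item = audio_queue.get()
        if item is None:                      # end-of-stream marker
            break
        if isinstance(item, str) and item.startswith("METADATA:"):
            continue                          # descriptive text, not audio
        played.append(item)                   # a real consumer would write to the device


audio_queue = queue.Queue()
played = []
consumer = threading.Thread(target=drain_audio_queue, args=(audio_queue, played))
consumer.start()

audio_queue.put("METADATA:hello...")
audio_queue.put(b"\x00\x01" * 4)
audio_queue.put(b"\x02\x03" * 4)
audio_queue.put(None)
consumer.join(timeout=5)

print(len(played), sum(len(chunk) for chunk in played))
```

A sentinel keeps the consumer from guessing when a reply has finished, which is why the producer has to send it on every path that has already queued audio.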
96
test_llm.py
@ -1,96 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
测试大语言模型API功能
|
||||
"""
|
||||
|
||||
import os
|
||||
import requests
|
||||
import json
|
||||
|
||||
def test_llm_api():
|
||||
"""测试大语言模型API"""
|
||||
|
||||
# 检查API密钥
|
||||
api_key = os.environ.get("ARK_API_KEY")
|
||||
if not api_key:
|
||||
print("❌ 未设置 ARK_API_KEY 环境变量")
|
||||
return False
|
||||
|
||||
# Do not echo any part of the key to the console
print("✅ API密钥已设置")
|
||||
|
||||
# API配置
|
||||
api_url = "https://ark.cn-beijing.volces.com/api/v3/chat/completions"
|
||||
model = "doubao-1-5-pro-32k-250115"
|
||||
|
||||
# 测试消息
|
||||
test_message = "你好,请简单介绍一下自己"
|
||||
|
||||
try:
|
||||
print("🤖 测试大语言模型API...")
|
||||
|
||||
headers = {
|
||||
"Content-Type": "application/json",
|
||||
"Authorization": f"Bearer {api_key}"
|
||||
}
|
||||
|
||||
data = {
|
||||
"model": model,
|
||||
"messages": [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "你是一个智能助手,请根据用户的语音输入提供有帮助的回答。保持回答简洁明了。"
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": test_message
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
response = requests.post(api_url, headers=headers, json=data, timeout=30)
|
||||
|
||||
print(f"📡 HTTP状态码: {response.status_code}")
|
||||
|
||||
if response.status_code == 200:
|
||||
result = response.json()
|
||||
print("✅ API调用成功")
|
||||
|
||||
if "choices" in result and len(result["choices"]) > 0:
|
||||
llm_response = result["choices"][0]["message"]["content"]
|
||||
print(f"💬 AI回复: {llm_response}")
|
||||
|
||||
# 显示完整响应结构
|
||||
print("\n📋 完整响应结构:")
|
||||
print(json.dumps(result, indent=2, ensure_ascii=False))
|
||||
|
||||
return True
|
||||
else:
|
||||
print("❌ 响应格式错误")
|
||||
print(f"响应内容: {response.text}")
|
||||
return False
|
||||
else:
|
||||
print(f"❌ API调用失败: {response.status_code}")
|
||||
print(f"响应内容: {response.text}")
|
||||
return False
|
||||
|
||||
except requests.exceptions.RequestException as e:
|
||||
print(f"❌ 网络请求失败: {e}")
|
||||
return False
|
||||
except Exception as e:
|
||||
print(f"❌ 测试失败: {e}")
|
||||
return False
|
||||
|
||||
if __name__ == "__main__":
|
||||
print("🧪 测试大语言模型API功能")
|
||||
print("=" * 50)
|
||||
|
||||
success = test_llm_api()
|
||||
|
||||
if success:
|
||||
print("\n✅ 大语言模型功能测试通过!")
|
||||
print("🚀 现在可以运行完整的语音助手系统了")
|
||||
else:
|
||||
print("\n❌ 大语言模型功能测试失败")
|
||||
print("🔧 请检查API密钥和网络连接")
|
||||
@ -1,108 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
测试流式响应解析的脚本
|
||||
"""
|
||||
|
||||
import json
|
||||
import requests
|
||||
import os
|
||||
|
||||
def test_streaming_response():
|
||||
"""测试流式响应解析"""
|
||||
|
||||
# 检查API密钥
|
||||
api_key = os.environ.get("ARK_API_KEY")
|
||||
if not api_key:
|
||||
print("❌ 请设置 ARK_API_KEY 环境变量")
|
||||
return
|
||||
|
||||
headers = {
|
||||
"Content-Type": "application/json",
|
||||
"Authorization": f"Bearer {api_key}"
|
||||
}
|
||||
|
||||
data = {
|
||||
"messages": [
|
||||
{
|
||||
"content": "你是一个智能助手,请回答问题。",
|
||||
"role": "system"
|
||||
},
|
||||
{
|
||||
"content": "你好,请简单介绍一下自己",
|
||||
"role": "user"
|
||||
}
|
||||
],
|
||||
"model": "doubao-1-5-pro-32k-250115",
|
||||
"stream": True
|
||||
}
|
||||
|
||||
print("🚀 开始测试流式响应...")
|
||||
|
||||
try:
|
||||
response = requests.post(
|
||||
"https://ark.cn-beijing.volces.com/api/v3/chat/completions",
|
||||
headers=headers,
|
||||
json=data,
|
||||
stream=True,
|
||||
timeout=30
|
||||
)
|
||||
|
||||
print(f"📊 响应状态: {response.status_code}")
|
||||
|
||||
if response.status_code != 200:
|
||||
print(f"❌ 请求失败: {response.text}")
|
||||
return
|
||||
|
||||
print("🔍 开始解析流式响应...")
|
||||
|
||||
accumulated_text = ""
|
||||
line_count = 0
|
||||
|
||||
for line in response.iter_lines(decode_unicode=True):
|
||||
line_count += 1
|
||||
|
||||
if not line or not line.strip():
|
||||
continue
|
||||
|
||||
# 预处理
|
||||
line = line.strip()
|
||||
|
||||
print(f"\n--- 第{line_count}行 ---")
|
||||
print(f"原始内容: {repr(line)}")
|
||||
|
||||
if line.startswith("data: "):
|
||||
data_str = line[6:] # 移除 "data: " 前缀
|
||||
print(f"处理后: {repr(data_str)}")
|
||||
|
||||
if data_str == "[DONE]":
|
||||
print("✅ 流结束")
|
||||
break
|
||||
|
||||
try:
|
||||
chunk_data = json.loads(data_str)
|
||||
print(f"✅ JSON解析成功: {chunk_data}")
|
||||
|
||||
if "choices" in chunk_data and len(chunk_data["choices"]) > 0:
|
||||
delta = chunk_data["choices"][0].get("delta", {})
|
||||
content = delta.get("content", "")
|
||||
|
||||
if content:
|
||||
accumulated_text += content
|
||||
print(f"💬 累计内容: {accumulated_text}")
|
||||
|
||||
except json.JSONDecodeError as e:
|
||||
print(f"❌ JSON解析失败: {e}")
|
||||
print(f"🔍 问题数据: {repr(data_str)}")
|
||||
except Exception as e:
|
||||
print(f"❌ 其他错误: {e}")
|
||||
|
||||
print(f"\n✅ 测试完成,总共处理了 {line_count} 行")
|
||||
print(f"📝 最终内容: {accumulated_text}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 测试失败: {e}")
|
||||
|
||||
if __name__ == "__main__":
|
||||
test_streaming_response()
|
||||
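The parsing loop in `test_streaming_response` can be exercised without a network call. The sketch below applies the same rules (strip the `data: ` prefix, stop at `[DONE]`, append each `delta.content`) to a list of sample lines. `collect_stream_text` and the sample lines are illustrative assumptions, not captured API output.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Hypothetical helper: join delta contents of 'data: ' lines until [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue
        data_str = line[len("data: "):]
        if data_str == "[DONE]":
            break
        try:
            chunk = json.loads(data_str)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the stream
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content", "")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: not-json",
    "data: [DONE]",
    'data: {"choices": [{"delta": {"content": "ignored"}}]}',
]
text = collect_stream_text(sample)
print(text)
```

Because the function accepts any iterable of strings, the result of `response.iter_lines(decode_unicode=True)` could be passed to it directly.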
194
test_voice_detection.py
Normal file
@ -0,0 +1,194 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
语音检测测试脚本
|
||||
用于测试和调试ZCR语音检测功能
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
import time
|
||||
import pyaudio
|
||||
from audio_processes import InputProcess
|
||||
import multiprocessing as mp
|
||||
import queue
|
||||
|
||||
class VoiceDetectionTester:
|
||||
"""语音检测测试器"""
|
||||
|
||||
def __init__(self):
|
||||
self.FORMAT = pyaudio.paInt16
|
||||
self.CHANNELS = 1
|
||||
self.RATE = 16000
|
||||
self.CHUNK_SIZE = 1024
|
||||
|
||||
# 测试参数
|
||||
self.test_duration = 30 # 测试30秒
|
||||
self.zcr_history = []
|
||||
self.voice_count = 0
|
||||
|
||||
# 音频设备
|
||||
self.audio = None
|
||||
self.stream = None
|
||||
|
||||
def setup_audio(self):
|
||||
"""设置音频设备"""
|
||||
try:
|
||||
self.audio = pyaudio.PyAudio()
|
||||
self.stream = self.audio.open(
|
||||
format=self.FORMAT,
|
||||
channels=self.CHANNELS,
|
||||
rate=self.RATE,
|
||||
input=True,
|
||||
frames_per_buffer=self.CHUNK_SIZE
|
||||
)
|
||||
print("✅ 音频设备初始化成功")
|
||||
return True
|
||||
except Exception as e:
|
||||
print(f"❌ 音频设备初始化失败: {e}")
|
||||
return False
|
||||
|
||||
def calculate_zcr(self, audio_data):
|
||||
"""计算零交叉率"""
|
||||
if len(audio_data) == 0:
|
||||
return 0
|
||||
|
||||
audio_array = np.frombuffer(audio_data, dtype=np.int16)
|
||||
zero_crossings = np.sum(np.diff(np.sign(audio_array)) != 0)
|
||||
zcr = zero_crossings / len(audio_array) * self.RATE
|
||||
return zcr
|
||||
|
||||
def test_detection(self):
|
||||
"""测试语音检测"""
|
||||
print("🎙️ 开始语音检测测试")
|
||||
print("=" * 50)
|
||||
|
||||
# 环境校准阶段
|
||||
print("🔍 第一阶段:环境噪音校准 (10秒)")
|
||||
print("请保持安静,不要说话...")
|
||||
|
||||
calibration_samples = []
|
||||
start_time = time.time()
|
||||
|
||||
try:
|
||||
while time.time() - start_time < 10:
|
||||
data = self.stream.read(self.CHUNK_SIZE, exception_on_overflow=False)
|
||||
if len(data) > 0:
|
||||
zcr = self.calculate_zcr(data)
|
||||
calibration_samples.append(zcr)
|
||||
|
||||
# 显示进度
|
||||
progress = (time.time() - start_time) / 10 * 100
|
||||
print(f"\r校准进度: {progress:.1f}%", end='', flush=True)
|
||||
|
||||
time.sleep(0.01)
|
||||
|
||||
print("\n✅ 环境校准完成")
|
||||
|
||||
# 计算统计数据
|
||||
if calibration_samples:
|
||||
avg_zcr = np.mean(calibration_samples)
|
||||
std_zcr = np.std(calibration_samples)
|
||||
min_zcr = min(calibration_samples)
|
||||
max_zcr = max(calibration_samples)
|
||||
|
||||
print(f"📊 环境噪音统计:")
|
||||
print(f" 平均ZCR: {avg_zcr:.0f}")
|
||||
print(f" 标准差: {std_zcr:.0f}")
|
||||
print(f" 最小值: {min_zcr:.0f}")
|
||||
print(f" 最大值: {max_zcr:.0f}")
|
||||
|
||||
# 建议的检测阈值
|
||||
suggested_min = max(2400, avg_zcr + 2 * std_zcr)
|
||||
suggested_max = min(12000, avg_zcr + 6 * std_zcr)
|
||||
|
||||
print(f"\n🎯 建议的语音检测阈值:")
|
||||
print(f" 最小阈值: {suggested_min:.0f}")
|
||||
print(f" 最大阈值: {suggested_max:.0f}")
|
||||
|
||||
# 测试检测
|
||||
print(f"\n🎙️ 第二阶段:语音检测测试 (20秒)")
|
||||
print("现在请说话,测试语音检测...")
|
||||
|
||||
voice_threshold = suggested_min
|
||||
silence_threshold = suggested_max
|
||||
|
||||
consecutive_voice = 0
|
||||
voice_detected = False
|
||||
|
||||
test_start = time.time()
|
||||
|
||||
while time.time() - test_start < 20:
|
||||
data = self.stream.read(self.CHUNK_SIZE, exception_on_overflow=False)
|
||||
if len(data) > 0:
|
||||
zcr = self.calculate_zcr(data)
|
||||
|
||||
                    # Simple voice detection: the ZCR must fall inside the calibrated band
                    is_voice = voice_threshold < zcr < silence_threshold

                    if is_voice:
                        consecutive_voice += 1
                        if consecutive_voice >= 5 and not voice_detected:
                            voice_detected = True
                            voice_start_time = time.time()
                            self.voice_count += 1
                            print(f"\n🎤 Voice detected #{self.voice_count}! ZCR: {zcr:.0f}")
                    else:
                        consecutive_voice = 0
                        if voice_detected:
                            voice_detected = False
                            # Measure from the moment this segment was first detected
                            print(f" Voice ended, duration: {time.time() - voice_start_time:.1f}s")
|
||||
|
||||
# 实时显示ZCR值
|
||||
status = "🎤" if voice_detected else "🔇"
|
||||
print(f"\r{status} ZCR: {zcr:.0f} | 阈值: {voice_threshold:.0f}-{silence_threshold:.0f} | "
|
||||
f"连续语音: {consecutive_voice}/5", end='', flush=True)
|
||||
|
||||
time.sleep(0.01)
|
||||
|
||||
print(f"\n\n✅ 测试完成!共检测到 {self.voice_count} 次语音")
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("\n🛑 测试被用户中断")
|
||||
except Exception as e:
|
||||
print(f"\n❌ 测试过程中出错: {e}")
|
||||
|
||||
    def cleanup(self):
        """Release audio resources."""
        if self.stream:
            try:
                self.stream.stop_stream()
                self.stream.close()
            except Exception:  # a bare except would also swallow KeyboardInterrupt
                pass

        if self.audio:
            try:
                self.audio.terminate()
            except Exception:
                pass
|
||||
|
||||
def run_test(self):
|
||||
"""运行完整测试"""
|
||||
print("🚀 语音检测测试工具")
|
||||
print("=" * 60)
|
||||
|
||||
if not self.setup_audio():
|
||||
print("❌ 无法初始化音频设备,测试终止")
|
||||
return
|
||||
|
||||
try:
|
||||
self.test_detection()
|
||||
finally:
|
||||
self.cleanup()
|
||||
print("\n👋 测试结束")
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
tester = VoiceDetectionTester()
|
||||
tester.run_test()
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
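The calibration step in the test tool above derives its detection band from ambient-noise ZCR samples as mean plus a multiple of the standard deviation, clamped to fixed bounds. A minimal standalone sketch of that calculation follows. The function name `suggest_thresholds`, its parameters, and the final ordering guard are assumptions added for illustration and are not part of the original script.

```python
import numpy as np


def suggest_thresholds(samples, floor=2400.0, ceiling=12000.0):
    """Return (low, high) ZCR thresholds from ambient-noise samples.

    Mirrors the calibration above: low = max(floor, mean + 2*std),
    high = min(ceiling, mean + 6*std).
    """
    samples = np.asarray(samples, dtype=np.float64)
    if samples.size == 0:
        raise ValueError("need at least one calibration sample")
    mean, std = samples.mean(), samples.std()
    low = max(floor, mean + 2 * std)
    high = min(ceiling, mean + 6 * std)
    # Assumption: in a quiet room `high` can end up below `low`, which would
    # make the band `low < zcr < high` empty; fall back to the fixed bounds.
    if low >= high:
        low, high = floor, ceiling
    return float(low), float(high)


if __name__ == "__main__":
    quiet_room = [900, 1100, 1000, 950, 1050]
    noisy_room = [2500, 3500]
    print("quiet:", suggest_thresholds(quiet_room))
    print("noisy:", suggest_thresholds(noisy_room))
```

The guard matters because a low-noise calibration gives a small mean and standard deviation, so the unclamped upper bound can sit well under the 2400 floor.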
|
||||
840
voice_chat.py
@ -1,840 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
Voice chat system with Doubao AI integration
Energy-based recording + Doubao speech recognition + TTS reply
|
||||
"""
|
||||
|
||||
import sys
|
||||
import os
|
||||
import time
|
||||
import threading
|
||||
import asyncio
|
||||
import subprocess
|
||||
import wave
|
||||
import struct
|
||||
import json
|
||||
import gzip
|
||||
import uuid
|
||||
from typing import Dict, Any, Optional
|
||||
import pyaudio
|
||||
import numpy as np
|
||||
import websockets
|
||||
|
||||
# 豆包协议常量
|
||||
PROTOCOL_VERSION = 0b0001
|
||||
CLIENT_FULL_REQUEST = 0b0001
|
||||
CLIENT_AUDIO_ONLY_REQUEST = 0b0010
|
||||
SERVER_FULL_RESPONSE = 0b1001
|
||||
SERVER_ACK = 0b1011
|
||||
SERVER_ERROR_RESPONSE = 0b1111
|
||||
NO_SEQUENCE = 0b0000
|
||||
MSG_WITH_EVENT = 0b0100
|
||||
NO_SERIALIZATION = 0b0000
|
||||
JSON = 0b0001
|
||||
GZIP = 0b0001
|
||||
|
||||
class DoubaoClient:
|
||||
"""豆包音频处理客户端"""
|
||||
|
||||
    def __init__(self):
        self.base_url = "wss://openspeech.bytedance.com/api/v3/realtime/dialogue"
        # Credentials are read from the environment instead of being committed
        # to the repository. The variable names below are assumptions; use
        # whatever names your deployment defines.
        self.app_id = os.environ.get("DOUBAO_APP_ID", "")
        self.access_key = os.environ.get("DOUBAO_ACCESS_KEY", "")
        self.app_key = os.environ.get("DOUBAO_APP_KEY", "")
        self.resource_id = "volc.speech.dialog"
        self.session_id = str(uuid.uuid4())
        self.ws = None
        self.log_id = ""
|
||||
|
||||
def get_headers(self) -> Dict[str, str]:
|
||||
"""获取请求头"""
|
||||
return {
|
||||
"X-Api-App-ID": self.app_id,
|
||||
"X-Api-Access-Key": self.access_key,
|
||||
"X-Api-Resource-Id": self.resource_id,
|
||||
"X-Api-App-Key": self.app_key,
|
||||
"X-Api-Connect-Id": str(uuid.uuid4()),
|
||||
}
|
||||
|
||||
def generate_header(self, message_type=CLIENT_FULL_REQUEST,
|
||||
message_type_specific_flags=MSG_WITH_EVENT,
|
||||
serial_method=JSON, compression_type=GZIP) -> bytes:
|
||||
"""生成协议头"""
|
||||
header = bytearray()
|
||||
header.append((PROTOCOL_VERSION << 4) | 1) # version + header_size
|
||||
header.append((message_type << 4) | message_type_specific_flags)
|
||||
header.append((serial_method << 4) | compression_type)
|
||||
header.append(0x00) # reserved
|
||||
return bytes(header)
|
||||
|
||||
async def connect(self) -> None:
|
||||
"""建立WebSocket连接"""
|
||||
print(f"🔗 连接豆包服务器...")
|
||||
try:
|
||||
self.ws = await websockets.connect(
|
||||
self.base_url,
|
||||
additional_headers=self.get_headers(),
|
||||
ping_interval=None
|
||||
)
|
||||
|
||||
# 获取log_id
|
||||
if hasattr(self.ws, 'response_headers'):
|
||||
self.log_id = self.ws.response_headers.get("X-Tt-Logid")
|
||||
elif hasattr(self.ws, 'headers'):
|
||||
self.log_id = self.ws.headers.get("X-Tt-Logid")
|
||||
|
||||
print(f"✅ 连接成功, log_id: {self.log_id}")
|
||||
|
||||
# 发送StartConnection请求
|
||||
await self._send_start_connection()
|
||||
|
||||
# 发送StartSession请求
|
||||
await self._send_start_session()
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 连接失败: {e}")
|
||||
raise
|
||||
|
||||
def parse_response(self, response):
|
||||
"""解析响应"""
|
||||
if len(response) < 4:
|
||||
return None
|
||||
|
||||
protocol_version = response[0] >> 4
|
||||
header_size = response[0] & 0x0f
|
||||
message_type = response[1] >> 4
|
||||
flags = response[1] & 0x0f
|
||||
|
||||
payload_start = header_size * 4
|
||||
payload = response[payload_start:]
|
||||
|
||||
result = {
|
||||
'protocol_version': protocol_version,
|
||||
'header_size': header_size,
|
||||
'message_type': message_type,
|
||||
'flags': flags,
|
||||
'payload': payload,
|
||||
'payload_size': len(payload)
|
||||
}
|
||||
|
||||
# 解析payload
|
||||
if len(payload) >= 4:
|
||||
result['event'] = int.from_bytes(payload[:4], 'big')
|
||||
|
||||
if len(payload) >= 8:
|
||||
session_id_len = int.from_bytes(payload[4:8], 'big')
|
||||
if len(payload) >= 8 + session_id_len:
|
||||
result['session_id'] = payload[8:8+session_id_len].decode()
|
||||
|
||||
if len(payload) >= 12 + session_id_len:
|
||||
data_size = int.from_bytes(payload[8+session_id_len:12+session_id_len], 'big')
|
||||
result['data_size'] = data_size
|
||||
result['data'] = payload[12+session_id_len:12+session_id_len+data_size]
|
||||
|
||||
                        # Try to parse the data as JSON
                        try:
                            result['json_data'] = json.loads(result['data'].decode('utf-8'))
                        except (UnicodeDecodeError, json.JSONDecodeError):
                            pass  # binary (audio) payload, not JSON
|
||||
|
||||
return result
|
||||
|
||||
async def _send_start_connection(self) -> None:
|
||||
"""发送StartConnection请求"""
|
||||
request = bytearray(self.generate_header())
|
||||
request.extend(int(1).to_bytes(4, 'big'))
|
||||
|
||||
payload_bytes = b"{}"
|
||||
payload_bytes = gzip.compress(payload_bytes)
|
||||
request.extend(len(payload_bytes).to_bytes(4, 'big'))
|
||||
request.extend(payload_bytes)
|
||||
|
||||
await self.ws.send(request)
|
||||
response = await self.ws.recv()
|
||||
|
||||
async def _send_start_session(self) -> None:
|
||||
"""发送StartSession请求"""
|
||||
session_config = {
|
||||
"asr": {"extra": {"end_smooth_window_ms": 1500}},
|
||||
"tts": {
|
||||
"speaker": "zh_female_vv_jupiter_bigtts",
|
||||
"audio_config": {"channel": 1, "format": "pcm", "sample_rate": 24000}
|
||||
},
|
||||
"dialog": {
|
||||
"bot_name": "豆包",
|
||||
"system_role": "你使用活泼灵动的女声,性格开朗,热爱生活。",
|
||||
"speaking_style": "你的说话风格简洁明了,语速适中,语调自然。",
|
||||
"location": {"city": "北京"},
|
||||
"extra": {
|
||||
"strict_audit": False,
|
||||
"audit_response": "支持客户自定义安全审核回复话术。",
|
||||
"recv_timeout": 30,
|
||||
"input_mod": "audio",
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
request = bytearray(self.generate_header())
|
||||
request.extend(int(100).to_bytes(4, 'big'))
|
||||
request.extend(len(self.session_id).to_bytes(4, 'big'))
|
||||
request.extend(self.session_id.encode())
|
||||
|
||||
payload_bytes = json.dumps(session_config).encode()
|
||||
payload_bytes = gzip.compress(payload_bytes)
|
||||
request.extend(len(payload_bytes).to_bytes(4, 'big'))
|
||||
request.extend(payload_bytes)
|
||||
|
||||
await self.ws.send(request)
|
||||
response = await self.ws.recv()
|
||||
await asyncio.sleep(1.0)
|
||||
|
||||
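    # The return annotation is quoted so the module still imports on Python < 3.9,
    # where the built-in `tuple` cannot be subscripted at runtime
    async def process_audio(self, audio_data: bytes) -> "tuple[str, bytes]":
        """Process audio and return (recognized text, TTS audio)."""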
|
||||
try:
|
||||
# 发送音频数据 - 使用与doubao_simple.py相同的格式
|
||||
task_request = bytearray(
|
||||
self.generate_header(message_type=CLIENT_AUDIO_ONLY_REQUEST,
|
||||
serial_method=NO_SERIALIZATION))
|
||||
task_request.extend(int(200).to_bytes(4, 'big'))
|
||||
task_request.extend(len(self.session_id).to_bytes(4, 'big'))
|
||||
task_request.extend(self.session_id.encode())
|
||||
payload_bytes = gzip.compress(audio_data)
|
||||
task_request.extend(len(payload_bytes).to_bytes(4, 'big'))
|
||||
task_request.extend(payload_bytes)
|
||||
await self.ws.send(task_request)
|
||||
print("📤 音频数据已发送")
|
||||
|
||||
recognized_text = ""
|
||||
tts_audio = b""
|
||||
response_count = 0
|
||||
|
||||
# 接收响应 - 使用与doubao_simple.py相同的解析逻辑
|
||||
audio_chunks = []
|
||||
max_responses = 30
|
||||
|
||||
while response_count < max_responses:
|
||||
try:
|
||||
response = await asyncio.wait_for(self.ws.recv(), timeout=30.0)
|
||||
response_count += 1
|
||||
|
||||
parsed = self.parse_response(response)
|
||||
if not parsed:
|
||||
continue
|
||||
|
||||
print(f"📥 响应 {response_count}: message_type={parsed['message_type']}, event={parsed.get('event', 'N/A')}, size={parsed['payload_size']}")
|
||||
|
||||
                    # Handle the different response types
                    if parsed['message_type'] == SERVER_ACK:  # may carry audio
                        if 'data' in parsed and parsed['data_size'] > 0:
                            audio_chunks.append(parsed['data'])
                            print(f"Collected audio chunk: {parsed['data_size']} bytes")

                    elif parsed['message_type'] == SERVER_FULL_RESPONSE:
|
||||
event = parsed.get('event', 0)
|
||||
|
||||
if event == 450: # ASR开始
|
||||
print("🎤 ASR处理开始")
|
||||
elif event == 451: # ASR结果
|
||||
if 'json_data' in parsed and 'results' in parsed['json_data']:
|
||||
text = parsed['json_data']['results'][0].get('text', '')
|
||||
recognized_text = text
|
||||
print(f"🧠 识别结果: {text}")
|
||||
elif event == 459: # ASR结束
|
||||
print("✅ ASR处理结束")
|
||||
elif event == 350: # TTS开始
|
||||
print("🎵 TTS生成开始")
|
||||
elif event == 359: # TTS结束
|
||||
print("✅ TTS生成结束")
|
||||
break
|
||||
                        elif event == 550:  # TTS audio data
                            if 'data' in parsed and parsed['data_size'] > 0:
                                # The payload is either JSON (audio metadata) or raw audio
                                try:
                                    json.loads(parsed['data'].decode('utf-8'))
                                    print("Received TTS audio metadata")
                                except (UnicodeDecodeError, json.JSONDecodeError):
                                    # Not JSON, so treat it as audio data
                                    audio_chunks.append(parsed['data'])
                                    print(f"Collected TTS audio chunk: {parsed['data_size']} bytes")
|
||||
|
||||
except asyncio.TimeoutError:
|
||||
print(f"⏰ 等待响应 {response_count + 1} 超时")
|
||||
break
|
||||
except websockets.exceptions.ConnectionClosed:
|
||||
print("🔌 连接已关闭")
|
||||
break
|
||||
|
||||
print(f"共收到 {response_count} 个响应,收集到 {len(audio_chunks)} 个音频块")
|
||||
|
||||
# 合并音频数据
|
||||
if audio_chunks:
|
||||
tts_audio = b''.join(audio_chunks)
|
||||
print(f"合并后的音频数据: {len(tts_audio)} 字节")
|
||||
|
||||
            # Convert the TTS audio format (32-bit float -> 16-bit integer)
            if tts_audio:
                # The stream may be GZIP-compressed; fall back to the raw bytes if not
                try:
                    audio_to_write = gzip.decompress(tts_audio)
                    print(f"Decompressed audio data: {len(audio_to_write)} bytes")
                except Exception:
                    print("Audio data is not GZIP-compressed, using the raw data")
                    audio_to_write = tts_audio

                # A 32-bit float sample is 4 bytes, so the length must be a multiple of 4
                if len(audio_to_write) % 4 != 0:
                    print(f"Warning: audio length {len(audio_to_write)} is not a multiple of 4, truncating")
                    audio_to_write = audio_to_write[:len(audio_to_write) // 4 * 4]

                # Vectorized conversion: little-endian float32 -> clamp to [-1.0, 1.0]
                # -> scale -> little-endian int16 (NaN samples become silence)
                samples = np.frombuffer(audio_to_write, dtype='<f4').astype(np.float64)
                samples = np.clip(np.nan_to_num(samples), -1.0, 1.0)
                tts_audio = (samples * 32767).astype('<i2').tobytes()
                print(f"✅ Audio conversion finished: {len(tts_audio)} bytes")
|
||||
|
||||
return recognized_text, tts_audio
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 处理失败: {e}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return "", b""
|
||||
|
||||
async def send_silence_data(self, duration_ms=100) -> None:
|
||||
"""发送静音数据保持连接活跃"""
|
||||
try:
|
||||
# 生成静音音频数据
|
||||
samples = int(16000 * duration_ms / 1000) # 16kHz采样率
|
||||
silence_data = bytes(samples * 2) # 16位PCM
|
||||
|
||||
# 发送静音数据
|
||||
task_request = bytearray(
|
||||
self.generate_header(message_type=CLIENT_AUDIO_ONLY_REQUEST,
|
||||
serial_method=NO_SERIALIZATION))
|
||||
task_request.extend(int(200).to_bytes(4, 'big'))
|
||||
task_request.extend(len(self.session_id).to_bytes(4, 'big'))
|
||||
task_request.extend(self.session_id.encode())
|
||||
payload_bytes = gzip.compress(silence_data)
|
||||
task_request.extend(len(payload_bytes).to_bytes(4, 'big'))
|
||||
task_request.extend(payload_bytes)
|
||||
await self.ws.send(task_request)
|
||||
print("💓 发送心跳数据保持连接")
|
||||
|
||||
# 简单处理响应(不等待完整响应)
|
||||
try:
|
||||
response = await asyncio.wait_for(self.ws.recv(), timeout=5.0)
|
||||
# 只确认收到响应,不处理内容
|
||||
except asyncio.TimeoutError:
|
||||
print("⚠️ 心跳响应超时")
|
||||
except websockets.exceptions.ConnectionClosed:
|
||||
print("❌ 心跳时连接已关闭")
|
||||
raise
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 发送心跳数据失败: {e}")
|
||||
|
||||
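    async def close(self) -> None:
        """Close the connection."""
        if self.ws:
            try:
                await self.ws.close()
            except Exception:
                pass
            # Drop the reference so loops that test `client.ws` (such as the
            # heartbeat thread) can see that the connection is gone
            self.ws = None
            print("🔌 Connection closed")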
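The request frames built by `DoubaoClient` above share one layout: a 4-byte header, a big-endian event number, a length-prefixed session id, and a length-prefixed gzip payload. The sketch below reproduces that layout as read from this file only; it is not taken from the service's official protocol documentation, and `build_frame`, `parse_frame`, and the constant names are hypothetical helpers introduced for illustration.

```python
import gzip
import json
import struct

PROTOCOL_VERSION = 0b0001
HEADER_WORDS = 1            # header size, counted in 4-byte words
CLIENT_FULL_REQUEST = 0b0001
MSG_WITH_EVENT = 0b0100
JSON_SERIALIZATION = 0b0001
GZIP_COMPRESSION = 0b0001


def build_frame(event, session_id, payload_obj):
    """Pack header + event + session id + gzip-compressed JSON payload."""
    header = bytes([
        (PROTOCOL_VERSION << 4) | HEADER_WORDS,
        (CLIENT_FULL_REQUEST << 4) | MSG_WITH_EVENT,
        (JSON_SERIALIZATION << 4) | GZIP_COMPRESSION,
        0x00,  # reserved
    ])
    sid = session_id.encode("utf-8")
    payload = gzip.compress(json.dumps(payload_obj).encode("utf-8"))
    return (header
            + struct.pack(">I", event)
            + struct.pack(">I", len(sid)) + sid
            + struct.pack(">I", len(payload)) + payload)


def parse_frame(frame):
    """Inverse of build_frame; returns the decoded fields as a dict."""
    header_size = (frame[0] & 0x0F) * 4
    message_type = frame[1] >> 4
    body = frame[header_size:]
    event, sid_len = struct.unpack(">II", body[:8])
    session_id = body[8:8 + sid_len].decode("utf-8")
    offset = 8 + sid_len
    (size,) = struct.unpack(">I", body[offset:offset + 4])
    data = body[offset + 4:offset + 4 + size]
    return {
        "message_type": message_type,
        "event": event,
        "session_id": session_id,
        "payload": json.loads(gzip.decompress(data)),
    }


if __name__ == "__main__":
    frame = build_frame(100, "demo-session", {"hello": "world"})
    print(parse_frame(frame))
```

A round trip through both helpers is a cheap way to check offsets before sending anything to a real server.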
|
||||
|
||||
class VoiceChatRecorder:
|
||||
"""语音聊天录音系统"""
|
||||
|
||||
def __init__(self, enable_ai_chat=True):
|
||||
# 音频参数
|
||||
self.FORMAT = pyaudio.paInt16
|
||||
self.CHANNELS = 1
|
||||
self.RATE = 16000
|
||||
self.CHUNK_SIZE = 1024
|
||||
|
||||
# 能量检测参数
|
||||
self.energy_threshold = 500
|
||||
self.silence_threshold = 2.0
|
||||
self.min_recording_time = 1.0
|
||||
self.max_recording_time = 20.0
|
||||
|
||||
# 状态变量
|
||||
self.audio = None
|
||||
self.stream = None
|
||||
self.running = False
|
||||
self.recording = False
|
||||
self.recorded_frames = []
|
||||
self.recording_start_time = None
|
||||
self.last_sound_time = None
|
||||
self.energy_history = []
|
||||
self.zcr_history = []
|
||||
|
||||
# AI聊天功能
|
||||
self.enable_ai_chat = enable_ai_chat
|
||||
self.doubao_client = None
|
||||
self.is_processing_ai = False
|
||||
self.heartbeat_thread = None
|
||||
self.last_heartbeat_time = time.time()
|
||||
self.heartbeat_interval = 10.0 # 每10秒发送一次心跳
|
||||
|
||||
# 预录音缓冲区
|
||||
self.pre_record_buffer = []
|
||||
self.pre_record_max_frames = int(2.0 * self.RATE / self.CHUNK_SIZE)
|
||||
|
||||
# 播放状态
|
||||
self.is_playing = False
|
||||
|
||||
# ZCR检测参数
|
||||
self.consecutive_low_zcr_count = 0
|
||||
self.low_zcr_threshold_count = 15
|
||||
self.voice_activity_history = []
|
||||
|
||||
self._setup_audio()
|
||||
|
||||
    def _setup_audio(self):
        """Set up the audio input device."""
        try:
            # Reuse the existing PyAudio instance: play_audio() calls this method
            # again after every playback and must not leak a new instance each time
            if self.audio is None:
                self.audio = pyaudio.PyAudio()
            self.stream = self.audio.open(
                format=self.FORMAT,
                channels=self.CHANNELS,
                rate=self.RATE,
                input=True,
                frames_per_buffer=self.CHUNK_SIZE
            )
            print("✅ Audio device initialized")
        except Exception as e:
            self.stream = None
            print(f"❌ Failed to initialize audio device: {e}")
|
||||
|
||||
def generate_silence_audio(self, duration_ms=100):
|
||||
"""生成静音音频数据"""
|
||||
# 生成指定时长的静音音频(16位PCM,值为0)
|
||||
samples = int(self.RATE * duration_ms / 1000)
|
||||
silence_data = bytes(samples * 2) # 16位 = 2字节每样本
|
||||
return silence_data
|
||||
|
||||
def calculate_energy(self, audio_data):
|
||||
"""计算音频能量"""
|
||||
if len(audio_data) == 0:
|
||||
return 0
|
||||
|
||||
        # Cast to float before squaring: int16 ** 2 overflows and corrupts the RMS
        audio_array = np.frombuffer(audio_data, dtype=np.int16).astype(np.float64)
        rms = np.sqrt(np.mean(audio_array ** 2))
|
||||
|
||||
if not self.recording:
|
||||
self.energy_history.append(rms)
|
||||
if len(self.energy_history) > 50:
|
||||
self.energy_history.pop(0)
|
||||
|
||||
return rms
|
||||
|
||||
def calculate_zero_crossing_rate(self, audio_data):
|
||||
"""计算零交叉率"""
|
||||
if len(audio_data) == 0:
|
||||
return 0
|
||||
|
||||
audio_array = np.frombuffer(audio_data, dtype=np.int16)
|
||||
zero_crossings = np.sum(np.diff(np.sign(audio_array)) != 0)
|
||||
zcr = zero_crossings / len(audio_array) * self.RATE
|
||||
|
||||
self.zcr_history.append(zcr)
|
||||
if len(self.zcr_history) > 30:
|
||||
self.zcr_history.pop(0)
|
||||
|
||||
return zcr
|
||||
|
||||
def is_voice_active(self, energy, zcr):
|
||||
"""使用ZCR进行语音活动检测"""
|
||||
# 16000Hz采样率下的语音ZCR范围
|
||||
zcr_condition = 2400 < zcr < 12000
|
||||
return zcr_condition
|
||||
|
||||
def save_recording(self, audio_data, filename=None):
|
||||
"""保存录音"""
|
||||
if filename is None:
|
||||
timestamp = time.strftime("%Y%m%d_%H%M%S")
|
||||
filename = f"recording_{timestamp}.wav"
|
||||
|
||||
try:
|
||||
with wave.open(filename, 'wb') as wf:
|
||||
wf.setnchannels(self.CHANNELS)
|
||||
wf.setsampwidth(self.audio.get_sample_size(self.FORMAT))
|
||||
wf.setframerate(self.RATE)
|
||||
wf.writeframes(audio_data)
|
||||
|
||||
print(f"✅ 录音已保存: {filename}")
|
||||
return True, filename
|
||||
except Exception as e:
|
||||
print(f"❌ 保存录音失败: {e}")
|
||||
return False, None
|
||||
|
||||
def play_audio(self, filename):
|
||||
"""播放音频文件"""
|
||||
try:
|
||||
# 停止当前录音
|
||||
if self.recording:
|
||||
self.recording = False
|
||||
self.recorded_frames = []
|
||||
|
||||
# 关闭输入流
|
||||
if self.stream:
|
||||
self.stream.stop_stream()
|
||||
self.stream.close()
|
||||
self.stream = None
|
||||
|
||||
self.is_playing = True
|
||||
time.sleep(0.2)
|
||||
|
||||
# 使用系统播放器
|
||||
print(f"🔊 播放: {filename}")
|
||||
subprocess.run(['aplay', filename], check=True)
|
||||
print("✅ 播放完成")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 播放失败: {e}")
|
||||
finally:
|
||||
self.is_playing = False
|
||||
time.sleep(0.2)
|
||||
self._setup_audio()
|
||||
|
||||
def update_pre_record_buffer(self, audio_data):
|
||||
"""更新预录音缓冲区"""
|
||||
self.pre_record_buffer.append(audio_data)
|
||||
if len(self.pre_record_buffer) > self.pre_record_max_frames:
|
||||
self.pre_record_buffer.pop(0)
|
||||
|
||||
def start_recording(self):
|
||||
"""开始录音"""
|
||||
print("🎙️ 检测到声音,开始录音...")
|
||||
self.recording = True
|
||||
self.recorded_frames = []
|
||||
self.recorded_frames.extend(self.pre_record_buffer)
|
||||
self.pre_record_buffer = []
|
||||
self.recording_start_time = time.time()
|
||||
self.last_sound_time = time.time()
|
||||
self.consecutive_low_zcr_count = 0
|
||||
|
||||
def stop_recording(self):
|
||||
"""停止录音"""
|
||||
if len(self.recorded_frames) > 0:
|
||||
audio_data = b''.join(self.recorded_frames)
|
||||
duration = len(audio_data) / (self.RATE * 2)
|
||||
|
||||
print(f"📝 录音完成,时长: {duration:.2f}秒")
|
||||
|
||||
if self.enable_ai_chat:
|
||||
# AI聊天模式
|
||||
self.process_with_ai(audio_data)
|
||||
else:
|
||||
# 普通录音模式
|
||||
success, filename = self.save_recording(audio_data)
|
||||
if success and filename:
|
||||
print("=" * 50)
|
||||
print("🔊 播放刚才录制的音频...")
|
||||
self.play_audio(filename)
|
||||
print("=" * 50)
|
||||
|
||||
self.recording = False
|
||||
self.recorded_frames = []
|
||||
self.recording_start_time = None
|
||||
self.last_sound_time = None
|
||||
|
||||
def process_with_ai(self, audio_data):
|
||||
"""使用AI处理录音"""
|
||||
if self.is_processing_ai:
|
||||
print("⏳ AI正在处理中,请稍候...")
|
||||
return
|
||||
|
||||
self.is_processing_ai = True
|
||||
|
||||
# 在新线程中处理AI
|
||||
ai_thread = threading.Thread(target=self._ai_processing_thread, args=(audio_data,))
|
||||
ai_thread.daemon = True
|
||||
ai_thread.start()
|
||||
|
||||
def _heartbeat_thread(self):
|
||||
"""心跳线程 - 定期发送静音数据保持连接活跃"""
|
||||
while self.running and self.doubao_client and self.doubao_client.ws:
|
||||
current_time = time.time()
|
||||
if current_time - self.last_heartbeat_time >= self.heartbeat_interval:
|
||||
try:
|
||||
# 异步发送心跳数据
|
||||
loop = asyncio.new_event_loop()
|
||||
asyncio.set_event_loop(loop)
|
||||
try:
|
||||
loop.run_until_complete(self.doubao_client.send_silence_data())
|
||||
self.last_heartbeat_time = current_time
|
||||
except Exception as e:
|
||||
print(f"❌ 心跳失败: {e}")
|
||||
# 如果心跳失败,可能需要重新连接
|
||||
break
|
||||
finally:
|
||||
loop.close()
|
||||
except Exception as e:
|
||||
print(f"❌ 心跳线程异常: {e}")
|
||||
break
|
||||
|
||||
# 睡眠一段时间
|
||||
time.sleep(1.0)
|
||||
|
||||
print("📡 心跳线程结束")
|
||||
|
||||
def _ai_processing_thread(self, audio_data):
|
||||
"""AI处理线程"""
|
||||
try:
|
||||
print("🤖 开始AI处理...")
|
||||
print("🧠 正在进行语音识别...")
|
||||
|
||||
# 异步处理
|
||||
loop = asyncio.new_event_loop()
|
||||
asyncio.set_event_loop(loop)
|
||||
|
||||
try:
|
||||
# 连接豆包
|
||||
self.doubao_client = DoubaoClient()
|
||||
loop.run_until_complete(self.doubao_client.connect())
|
||||
|
||||
# 启动心跳线程
|
||||
self.last_heartbeat_time = time.time()
|
||||
self.heartbeat_thread = threading.Thread(target=self._heartbeat_thread)
|
||||
self.heartbeat_thread.daemon = True
|
||||
self.heartbeat_thread.start()
|
||||
print("💓 心跳线程已启动")
|
||||
|
||||
# 语音识别和TTS回复
|
||||
recognized_text, tts_audio = loop.run_until_complete(
|
||||
self.doubao_client.process_audio(audio_data)
|
||||
)
|
||||
|
||||
if recognized_text:
|
||||
print(f"🗣️ 你说: {recognized_text}")
|
||||
|
||||
if tts_audio:
|
||||
# 保存TTS音频
|
||||
tts_filename = "ai_response.wav"
|
||||
with wave.open(tts_filename, 'wb') as wav_file:
|
||||
wav_file.setnchannels(1)
|
||||
wav_file.setsampwidth(2)
|
||||
wav_file.setframerate(24000)
|
||||
wav_file.writeframes(tts_audio)
|
||||
|
||||
print("🎵 AI回复生成完成")
|
||||
print("=" * 50)
|
||||
print("🔊 播放AI回复...")
|
||||
self.play_audio(tts_filename)
|
||||
print("=" * 50)
|
||||
else:
|
||||
print("❌ 未收到AI回复")
|
||||
|
||||
# 等待一段时间再关闭连接,以便心跳继续工作
|
||||
print("⏳ 等待5秒后关闭连接...")
|
||||
time.sleep(5)
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ AI处理失败: {e}")
|
||||
finally:
|
||||
# 停止心跳线程
|
||||
if self.heartbeat_thread and self.heartbeat_thread.is_alive():
|
||||
print("🛑 停止心跳线程")
|
||||
self.heartbeat_thread = None
|
||||
|
||||
# 关闭连接
|
||||
if self.doubao_client:
|
||||
loop.run_until_complete(self.doubao_client.close())
|
||||
|
||||
loop.close()
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ AI处理线程失败: {e}")
|
||||
finally:
|
||||
self.is_processing_ai = False
|
||||
|
||||
def run(self):
|
||||
"""运行语音聊天系统"""
|
||||
if not self.stream:
|
||||
print("❌ 音频设备未初始化")
|
||||
return
|
||||
|
||||
self.running = True
|
||||
|
||||
if self.enable_ai_chat:
|
||||
print("🤖 语音聊天AI助手")
|
||||
print("=" * 50)
|
||||
print("🎯 功能特点:")
|
||||
print("- 🎙️ 智能语音检测")
|
||||
print("- 🧠 豆包AI语音识别")
|
||||
print("- 🗣️ AI智能回复")
|
||||
print("- 🔊 TTS语音播放")
|
||||
print("- 🔄 实时对话")
|
||||
print("=" * 50)
|
||||
print("📖 使用说明:")
|
||||
print("- 说话自动录音")
|
||||
print("- 静音2秒结束录音")
|
||||
print("- AI自动识别并回复")
|
||||
print("- 按 Ctrl+C 退出")
|
||||
print("=" * 50)
|
||||
else:
|
||||
print("🎙️ 智能录音系统")
|
||||
print("=" * 50)
|
||||
print("📖 使用说明:")
|
||||
print("- 说话自动录音")
|
||||
print("- 静音2秒结束录音")
|
||||
print("- 录音完成后自动播放")
|
||||
print("- 按 Ctrl+C 退出")
|
||||
print("=" * 50)
|
||||
|
||||
try:
|
||||
while self.running:
|
||||
# 如果正在播放AI回复,跳过音频处理
|
||||
if self.is_playing or self.is_processing_ai:
|
||||
status = "🤖 AI处理中..."
|
||||
print(f"\r{status}", end='', flush=True)
|
||||
time.sleep(0.1)
|
||||
continue
|
||||
|
||||
# 读取音频数据
|
||||
data = self.stream.read(self.CHUNK_SIZE, exception_on_overflow=False)
|
||||
|
||||
if len(data) == 0:
|
||||
continue
|
||||
|
||||
# 计算能量和ZCR
|
||||
energy = self.calculate_energy(data)
|
||||
zcr = self.calculate_zero_crossing_rate(data)
|
||||
|
||||
if self.recording:
|
||||
# 录音模式
|
||||
self.recorded_frames.append(data)
|
||||
recording_duration = time.time() - self.recording_start_time
|
||||
|
||||
# 检测语音活动
|
||||
if self.is_voice_active(energy, zcr):
|
||||
self.last_sound_time = time.time()
|
||||
self.consecutive_low_zcr_count = 0
|
||||
else:
|
||||
self.consecutive_low_zcr_count += 1
|
||||
|
||||
# 检查是否应该结束录音
|
||||
should_stop = False
|
||||
|
||||
# ZCR静音检测
|
||||
if self.consecutive_low_zcr_count >= self.low_zcr_threshold_count:
|
||||
should_stop = True
|
||||
|
||||
# 时间静音检测
|
||||
if not should_stop and time.time() - self.last_sound_time > self.silence_threshold:
|
||||
should_stop = True
|
||||
|
||||
                    # Stop on silence, or when the maximum recording time is reached;
                    # `elif` avoids calling stop_recording() twice in one iteration
                    if should_stop and recording_duration >= self.min_recording_time:
                        print("\n🔇 Silence detected, stopping recording")
                        self.stop_recording()
                    elif recording_duration > self.max_recording_time:
                        print("\n⏰ Maximum recording time reached")
                        self.stop_recording()
|
||||
|
||||
# 显示录音状态
|
||||
is_voice = self.is_voice_active(energy, zcr)
|
||||
zcr_count = f"{self.consecutive_low_zcr_count}/{self.low_zcr_threshold_count}"
|
||||
status = f"录音中... {recording_duration:.1f}s | ZCR: {zcr:.0f} | 语音: {is_voice} | 静音计数: {zcr_count}"
|
||||
print(f"\r{status}", end='', flush=True)
|
||||
|
||||
else:
|
||||
# 监听模式
|
||||
self.update_pre_record_buffer(data)
|
||||
|
||||
if self.is_voice_active(energy, zcr):
|
||||
# 检测到声音,开始录音
|
||||
self.start_recording()
|
||||
else:
|
||||
# 显示监听状态
|
||||
is_voice = self.is_voice_active(energy, zcr)
|
||||
buffer_usage = len(self.pre_record_buffer) / self.pre_record_max_frames * 100
|
||||
status = f"监听中... ZCR: {zcr:.0f} | 语音: {is_voice} | 缓冲: {buffer_usage:.0f}%"
|
||||
print(f"\r{status}", end='', flush=True)
|
||||
|
||||
time.sleep(0.01)
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("\n👋 退出")
|
||||
except Exception as e:
|
||||
print(f"❌ 错误: {e}")
|
||||
finally:
|
||||
self.stop()
|
||||
|
||||
def stop(self):
|
||||
"""停止系统"""
|
||||
self.running = False
|
||||
|
||||
# 停止心跳线程
|
||||
if self.heartbeat_thread and self.heartbeat_thread.is_alive():
|
||||
print("🛑 停止心跳线程")
|
||||
self.heartbeat_thread = None
|
||||
|
||||
if self.recording:
|
||||
self.stop_recording()
|
||||
|
||||
if self.stream:
|
||||
self.stream.stop_stream()
|
||||
self.stream.close()
|
||||
|
||||
if self.audio:
|
||||
self.audio.terminate()
|
||||
|
||||
# 关闭AI连接
|
||||
if self.doubao_client and self.doubao_client.ws:
|
||||
try:
|
||||
loop = asyncio.new_event_loop()
|
||||
asyncio.set_event_loop(loop)
|
||||
loop.run_until_complete(self.doubao_client.close())
|
||||
loop.close()
|
||||
            except Exception:
|
||||
pass
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
import argparse
|
||||
|
||||
parser = argparse.ArgumentParser(description='语音聊天AI助手')
|
||||
parser.add_argument('--no-ai', action='store_true', help='禁用AI功能,仅录音')
|
||||
args = parser.parse_args()
|
||||
|
||||
enable_ai = not args.no_ai
|
||||
|
||||
if enable_ai:
|
||||
print("🚀 语音聊天AI助手")
|
||||
else:
|
||||
print("🚀 智能录音系统")
|
||||
|
||||
print("=" * 50)
|
||||
|
||||
# 创建语音聊天系统
|
||||
recorder = VoiceChatRecorder(enable_ai_chat=enable_ai)
|
||||
|
||||
print("✅ 系统初始化成功")
|
||||
print("=" * 50)
|
||||
|
||||
# 开始运行
|
||||
recorder.run()
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
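The heartbeat in `voice_chat.py` above runs on its own thread and drives the WebSocket through a second event loop. One alternative, sketched here under assumptions, keeps the heartbeat as an asyncio task on the same loop as the request, so that only one loop ever touches the connection. `heartbeat`, `demo`, and `fake_send` are hypothetical names; `fake_send` stands in for `DoubaoClient.send_silence_data`.

```python
import asyncio


async def heartbeat(send_silence, interval, stop_event):
    """Call `send_silence()` every `interval` seconds until `stop_event` is set."""
    while not stop_event.is_set():
        try:
            # Wake up early if a stop is requested during the wait
            await asyncio.wait_for(stop_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            # The interval elapsed without a stop request: send one heartbeat
            await send_silence()


async def demo():
    sent = []

    async def fake_send():  # stand-in for DoubaoClient.send_silence_data
        sent.append(len(sent))

    stop_event = asyncio.Event()
    task = asyncio.create_task(heartbeat(fake_send, 0.01, stop_event))
    await asyncio.sleep(0.1)   # pretend the main request is in flight
    stop_event.set()           # request shutdown ...
    await task                 # ... and wait until the heartbeat has exited
    return len(sent)


if __name__ == "__main__":
    print("heartbeats sent:", asyncio.run(demo()))
```

Setting the event and then awaiting the task gives a deterministic shutdown, which the thread-based version lacks because clearing `self.heartbeat_thread` does not stop the running thread.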
|
||||
198
zcr_monitor.py
Normal file
@ -0,0 +1,198 @@
|
||||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
Real-time ZCR monitoring tool
Used to observe actual ZCR values and to test voice detection
|
||||
"""
|
||||
|
||||
import threading
|
||||
import time
|
||||
|
||||
import numpy as np
|
||||
import pyaudio
|
||||
|
||||
|
||||
class ZCRMonitor:
|
||||
"""ZCR实时监控器"""
|
||||
|
||||
def __init__(self):
|
||||
self.FORMAT = pyaudio.paInt16
|
||||
self.CHANNELS = 1
|
||||
self.RATE = 16000
|
||||
self.CHUNK_SIZE = 1024
|
||||
|
||||
# 监控参数
|
||||
self.running = False
|
||||
self.zcr_history = []
|
||||
self.max_history = 100
|
||||
|
||||
# 音频设备
|
||||
self.audio = None
|
||||
self.stream = None
|
||||
|
||||
# 检测阈值(匹配recorder.py的设置)
|
||||
self.zcr_min = 2400
|
||||
self.zcr_max = 12000
|
||||
|
||||
def setup_audio(self):
|
||||
"""设置音频设备"""
|
||||
try:
|
||||
self.audio = pyaudio.PyAudio()
|
||||
self.stream = self.audio.open(
|
||||
format=self.FORMAT,
|
||||
channels=self.CHANNELS,
|
||||
rate=self.RATE,
|
||||
input=True,
|
||||
frames_per_buffer=self.CHUNK_SIZE
|
||||
)
|
||||
return True
|
||||
except Exception as e:
|
||||
print(f"❌ 音频设备初始化失败: {e}")
|
||||
return False
|
||||
|
||||
def calculate_zcr(self, audio_data):
|
||||
"""计算零交叉率"""
|
||||
if len(audio_data) == 0:
|
||||
return 0
|
||||
|
||||
audio_array = np.frombuffer(audio_data, dtype=np.int16)
|
||||
zero_crossings = np.sum(np.diff(np.sign(audio_array)) != 0)
|
||||
zcr = zero_crossings / len(audio_array) * self.RATE
|
||||
return zcr
|
||||
|
||||
def is_voice(self, zcr):
|
||||
"""简单的语音检测"""
|
||||
return self.zcr_min < zcr < self.zcr_max
|
||||
|
||||
def monitor_callback(self, in_data, frame_count, time_info, status):
|
||||
"""音频回调函数"""
|
||||
zcr = self.calculate_zcr(in_data)
|
||||
|
||||
# 更新历史
|
||||
self.zcr_history.append(zcr)
|
||||
if len(self.zcr_history) > self.max_history:
|
||||
self.zcr_history.pop(0)
|
||||
|
||||
# 计算统计信息
|
||||
if len(self.zcr_history) > 10:
|
||||
avg_zcr = np.mean(self.zcr_history[-10:]) # 最近10个值的平均
|
||||
std_zcr = np.std(self.zcr_history[-10:])
|
||||
else:
|
||||
avg_zcr = zcr
|
||||
std_zcr = 0
|
||||
|
||||
# 判断是否为语音
|
||||
voice_detected = self.is_voice(zcr)
|
||||
|
||||
# 实时显示
|
||||
status = "🎤" if voice_detected else "🔇"
|
||||
color = "\033[92m" if voice_detected else "\033[90m" # 绿色或灰色
|
||||
reset = "\033[0m"
|
||||
|
||||
# 显示信息
|
||||
info = (f"{color}{status} ZCR: {zcr:.0f} | "
|
||||
f"阈值: {self.zcr_min}-{self.zcr_max} | "
|
||||
f"平均: {avg_zcr:.0f}±{std_zcr:.0f}{reset}")
|
||||
|
||||
print(f"\r{info}", end='', flush=True)
|
||||
|
||||
return (in_data, pyaudio.paContinue)
|
||||
|
||||
def start_monitoring(self):
|
||||
"""开始监控"""
|
||||
print("🎙️ ZCR实时监控工具")
|
||||
print("=" * 50)
|
||||
print("📊 当前检测阈值:")
|
||||
print(f" ZCR范围: {self.zcr_min} - {self.zcr_max}")
|
||||
print("💡 请说话测试语音检测...")
|
||||
print("🛑 按 Ctrl+C 停止监控")
|
||||
print("=" * 50)
|
||||
|
||||
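        try:
            # setup_audio() has already opened a blocking stream on the device;
            # close it before reopening the device in callback mode
            if self.stream:
                self.stream.stop_stream()
                self.stream.close()
                self.stream = None

            # Use callback mode
            self.stream = self.audio.open(
                format=self.FORMAT,
                channels=self.CHANNELS,
                rate=self.RATE,
                input=True,
                frames_per_buffer=self.CHUNK_SIZE,
                stream_callback=self.monitor_callback
            )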
|
||||
|
||||
self.stream.start_stream()
|
||||
self.running = True
|
||||
|
||||
# 主循环
|
||||
while self.running:
|
||||
time.sleep(0.1)
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("\n🛑 监控停止")
|
||||
finally:
|
||||
self.cleanup()
|
||||
|
||||
def show_statistics(self):
|
||||
"""显示统计信息"""
|
||||
if not self.zcr_history:
|
||||
return
|
||||
|
||||
print("\n📊 ZCR统计信息:")
|
||||
print(f" 样本数量: {len(self.zcr_history)}")
|
||||
print(f" 最小值: {min(self.zcr_history):.0f}")
|
||||
print(f" 最大值: {max(self.zcr_history):.0f}")
|
||||
print(f" 平均值: {np.mean(self.zcr_history):.0f}")
|
||||
print(f" 标准差: {np.std(self.zcr_history):.0f}")
|
||||
|
||||
# 分析语音检测
|
||||
voice_count = sum(1 for zcr in self.zcr_history if self.is_voice(zcr))
|
||||
voice_percentage = voice_count / len(self.zcr_history) * 100
|
||||
print(f" 语音检测: {voice_count}/{len(self.zcr_history)} ({voice_percentage:.1f}%)")
|
||||
|
||||
# 建议新的阈值
|
||||
avg_zcr = np.mean(self.zcr_history)
|
||||
std_zcr = np.std(self.zcr_history)
|
||||
suggested_min = max(800, avg_zcr + std_zcr)
|
||||
suggested_max = min(8000, avg_zcr + 4 * std_zcr)
|
||||
|
||||
print(f"\n🎯 建议的检测阈值:")
|
||||
print(f" 最小值: {suggested_min:.0f}")
|
||||
print(f" 最大值: {suggested_max:.0f}")
|
||||
|
||||
    def cleanup(self):
        """Release audio resources and print the final statistics.

        Safe to call more than once: start_monitoring() and main() both call it.
        """
        self.running = False

        if self.stream:
            try:
                self.stream.stop_stream()
                self.stream.close()
            except Exception:
                pass
            self.stream = None

        if self.audio:
            try:
                self.audio.terminate()
            except Exception:
                pass
            self.audio = None

            # Show the final statistics once, on the call that releases the device
            self.show_statistics()
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
monitor = ZCRMonitor()
|
||||
|
||||
if not monitor.setup_audio():
|
||||
print("❌ 无法初始化音频设备")
|
||||
return
|
||||
|
||||
try:
|
||||
monitor.start_monitoring()
|
||||
except Exception as e:
|
||||
print(f"❌ 监控过程中出错: {e}")
|
||||
finally:
|
||||
monitor.cleanup()
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
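Both `calculate_zcr` in this file and `calculate_zero_crossing_rate` in `voice_chat.py` report zero crossings per second, so a pure tone of frequency f should read close to 2f. The sketch below checks that on synthetic sine waves against the 2400 to 12000 band used above. `zero_crossings_per_second` and `sine_pcm` are hypothetical helper names, and the small phase offset is an assumption chosen so that no sample lands exactly on zero.

```python
import numpy as np


def zero_crossings_per_second(pcm_bytes, rate=16000):
    """Zero-crossing rate of 16-bit mono PCM, scaled to crossings per second."""
    audio = np.frombuffer(pcm_bytes, dtype=np.int16)
    if audio.size == 0:
        return 0.0
    crossings = np.count_nonzero(np.diff(np.sign(audio)))
    return crossings / audio.size * rate


def sine_pcm(freq_hz, rate=16000, n=1024, amplitude=10000):
    """Generate n samples of a sine wave as 16-bit PCM bytes."""
    t = np.arange(n) / rate
    wave = amplitude * np.sin(2 * np.pi * freq_hz * t + 0.1)
    return wave.astype(np.int16).tobytes()


if __name__ == "__main__":
    for freq in (500, 2000):
        zcr = zero_crossings_per_second(sine_pcm(freq))
        in_band = 2400 < zcr < 12000
        print(f"{freq} Hz tone -> ZCR {zcr:.0f}, inside voice band: {in_band}")
```

A detector that relies on ZCR alone will treat any signal in that band as speech, including steady tones and broadband noise, which is one reason to combine it with an energy check.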
|
||||