From adef09c06928fc8c747473ce57546da78e18d428 Mon Sep 17 00:00:00 2001
From: m15-ai
Date: Tue, 13 May 2025 18:14:00 -0500
Subject: [PATCH] Add files via upload

---
 README.md | 333 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 333 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..29f0c99
--- /dev/null
+++ b/README.md
@@ -0,0 +1,333 @@

# Local Voice Assistant (Offline, Real-Time AI)

**Lightweight, low-latency voice assistant running fully offline on a Raspberry Pi or Linux machine.**
Powered by PyAudio, Vosk STT, Piper TTS, and local LLMs via Ollama.

![badge](https://img.shields.io/badge/Offline-Voice%20AI-blue)
![badge](https://img.shields.io/badge/Audio-PyAudio-yellow)
![badge](https://img.shields.io/badge/TTS-Piper-orange)
![badge](https://img.shields.io/badge/LLM-Gemma2%20%7C%20Qwen-success)

---

## 🎯 Features

- 🎙️ **Microphone Input** using PyAudio
- 🔊 **Real-Time Transcription** with [Vosk](https://alphacephei.com/vosk/)
- 🧠 **LLM-Powered Responses** using [Ollama](https://ollama.com) with models like `gemma2:2b` and `qwen2.5:0.5b`
- 🗣️ **Natural Voice Output** via [Piper TTS](https://github.com/rhasspy/piper)
- 🎛️ Optional **Noise & Filter FX** using SoX for realism
- 🔧 ALSA **Volume Control**
- 🧩 Modular Python code ready for customization

---

## 🛠 Requirements

- Raspberry Pi 5 or Linux desktop
- Python 3.9+
- PyAudio, NumPy, requests, soxr, pydub, vosk
- SoX + ALSA utilities
- Ollama with one or more small LLMs (e.g., Gemma or Qwen)
- Piper TTS with ONNX voice models

Install dependencies:

```
pip install pyaudio requests soxr numpy pydub vosk
sudo apt install sox alsa-utils
```

## ⚙️ JSON Configuration

Place a config file named `va_config.json` next to the script:

```
{
  "volume": 8,
  "mic_name": "Plantronics",
  "audio_output_device": "Plantronics",
  "model_name": "gemma2:2b",
  "voice": "en_US-kathleen-low.onnx",
  "enable_audio_processing": false,
  "history_length": 6,
  "system_prompt": "You are a helpful assistant."
}
```

Note: if the configuration file is not found, the defaults within the main Python app are used:

```
# ------------------- CONFIG FILE LOADING -------------------
DEFAULT_CONFIG = {
    "volume": 9,
    "mic_name": "Plantronics",
    "audio_output_device": "Plantronics",
    "model_name": "qwen2.5:0.5b",
    "voice": "en_US-kathleen-low.onnx",
    "enable_audio_processing": False,
    "history_length": 4,
    "system_prompt": "You are a helpful assistant."
}
```

### 🔍 What `history_length` Means

The `history_length` setting controls how many previous exchanges (user + assistant messages) are included when generating each new reply.

- A value of `6` means the model receives the last 6 exchanges, plus the system prompt.
- This allows the assistant to maintain **short-term memory** for more coherent conversations.
- Setting it lower (e.g., `2`) increases speed and memory efficiency.
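For illustration, here is a minimal sketch of how this kind of trimming can work. The names (`build_messages`, `history`) are illustrative, not the actual identifiers in `voice_assistant.py`:

```
# Minimal sketch of history trimming (illustrative names, not the
# actual code in voice_assistant.py).

def build_messages(system_prompt, history, history_length):
    """history is a chronological list of {"role": ..., "content": ...} dicts."""
    # Assumes history_length counts individual messages; if it counts
    # user+assistant pairs, slice with history[-2 * history_length:] instead.
    recent = history[-history_length:]
    return [{"role": "system", "content": system_prompt}] + recent

# Usage: append each user and assistant turn to `history`, then call
# build_messages(...) before every request to the model.
```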
### ✅ `requirements.txt`

```
pyaudio
vosk
soxr
numpy
requests
pydub
```

If you plan to run this on a Raspberry Pi, you may also need:

```
soundfile  # for pydub compatibility on some distros
```

## 🐍 Install with Virtual Environment

```
# 1. Clone the repo

git clone https://github.com/your-username/voice-assistant-local.git
cd voice-assistant-local

# 2. Create and activate a virtual environment

python3 -m venv env
source env/bin/activate

# 3. Install dependencies

pip install -r requirements.txt

# 4. Install SoX and ALSA utilities (if not already installed)

sudo apt install sox alsa-utils

# 5. (Optional) Upgrade packaging tools before building PyAudio

python -m pip install --upgrade pip setuptools wheel
```

> 💡 If you get errors installing PyAudio on Raspberry Pi, try:
>
> ```
> sudo apt install portaudio19-dev
> pip install pyaudio
> ```

## 🆕 🔧 Piper Installation (Binary)

Piper is a standalone text-to-speech engine used by this assistant. It's **not a Python package**, so it must be installed manually.

#### ✅ Install Piper

1. Download the appropriate Piper binary from:
   👉 https://github.com/rhasspy/piper/releases

   For Ubuntu Linux on x86_64, download `piper_linux_x86_64.tar.gz` (on a Raspberry Pi, use the `aarch64` build instead).

2. Extract it:

   ```
   tar -xvzf piper_linux_x86_64.tar.gz
   ```

3. Move the binary into your project directory:

   ```
   mkdir -p bin/piper
   mv piper bin/piper/
   chmod +x bin/piper/piper
   ```

4. ✅ Done! The script will automatically call it from `bin/piper/piper`.

## 📂 Directory Example

```
voice_assistant.py
va_config.json
requirements.txt
bin/
└── piper/
    └── piper        ← (binary)
voices/
├── en_US-kathleen-low.onnx
└── en_US-kathleen-low.onnx.json
```

## 🔌 Finding Your USB Microphone & Speaker

To configure the correct audio devices, use these commands on your Raspberry Pi or Linux terminal:

1. List microphones (input devices):

```
python3 -m pip install pyaudio
python3 -c "import pyaudio; p = pyaudio.PyAudio(); \
[print(i, p.get_device_info_by_index(i)['name']) for i in range(p.get_device_count())]"
```

Look for your microphone's name (e.g., `Plantronics`) and use that as `mic_name`.

2. List speakers (output devices):

```
aplay -l
```

Example output:

```
card 3: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
```

Use this info to set your `audio_output_device` to something like:

```
"audio_output_device": "USB PnP"
```

## 🔧 Ollama Installation (Required)

Ollama is a local model runner for LLMs. You need to install it separately (outside of Python).

#### 💻 Install Ollama

On **Linux (x86 or ARM)**:

```
curl -fsSL https://ollama.com/install.sh | sh
```

Or follow the detailed instructions:
👉 https://ollama.com/download

Then start the daemon:

```
ollama serve
```

#### 📥 Download the Models

After Ollama is installed and running, open a terminal and run:

##### ✅ For Gemma 2B:

```
ollama run gemma2:2b
```

##### For Qwen 0.5B:

```
ollama run qwen2.5:0.5b
```

This downloads and starts each model automatically. You only need to run it once per model.

##### ⚠️ Reminder

> Ollama is **not a Python package** — it is a background service.
> Do **not** add it to `requirements.txt`. Just make sure it's installed and running before launching the assistant.
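For reference, a minimal sketch of a chat request against the local Ollama daemon (default port 11434), using the `requests` package already in `requirements.txt`. The actual script may stream tokens or use `/api/generate` instead; the function name and defaults here are illustrative:

```
# Minimal, non-streaming request to Ollama's /api/chat endpoint.
# Model name and message shape mirror va_config.json; the function
# name is illustrative, not taken from voice_assistant.py.
import requests

def ask_ollama(messages, model="qwen2.5:0.5b"):
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": messages, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

print(ask_ollama([{"role": "user", "content": "Say hello in five words."}]))
```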
## 🎤 Installing Piper Voice Models

To enable speech synthesis, you'll need to download a **voice model (.onnx)** and its matching **config (.json)** file.

#### ✅ Steps:

1. Visit the official Piper voices list:
   📄 https://github.com/rhasspy/piper/blob/master/VOICES.md

2. Choose a voice you like (e.g., `en_US-lessac-medium` or `en_US-amy-low`).

3. Download **both** files for your chosen voice:

   - `voice.onnx`
   - `config.json`

4. If you wish, you can rename the ONNX file and config file using the same base name. For example:

   ```
   amy-low.onnx
   amy-low.json
   ```

5. Place both files in a directory called `voices/` next to your script.
   Example directory structure:

   ```
   voice_assistant.py
   voices/
   ├── amy-low.onnx
   └── amy-low.json
   ```

6. Update your `va_config.json`:

   ```
   "voice": "amy-low.onnx"
   ```

> ⚠️ Make sure both the `.onnx` and `.json` files are present in the `voices/` folder with matching names (excluding the extension).

## 🧪 **Performance Report**

The script prints debug timing for the STT, LLM, and TTS stages of the pipeline. I asked ChatGPT-4 to analyze some of the results I obtained.

**System:** Ubuntu laptop, Intel Core i5
**Model:** `qwen2.5:0.5b` (local via Ollama)
**TTS:** `piper` with `en_US-kathleen-low.onnx`
**Audio:** Plantronics USB headset

------

### 📊 Timing Metrics (avg)

| Stage          | Avg time  | Notes                                    |
| -------------- | --------- | ---------------------------------------- |
| STT Parse      | 4.5 ms    | Vosk transcribes near-instantly          |
| LLM Inference  | ~2,200 ms | Ranges from ~1 s (short queries) to 5 s  |
| TTS Generation | ~1,040 ms | Piper ONNX performs well on CPU          |
| Audio Playback | ~7,250 ms | Reflects actual audio length, not delay  |

### ✅ Observations

- **STT speed is excellent** — under 10 ms consistently.
- **LLM inference is snappy** for a 0.5B model running locally. The best response came in under 1.1 s.
- **TTS is consistent and fast** — the Kathleen-low voice is fully synthesized in ~800–1600 ms.
- **Playback timing matches response length** — no lag, just actual audio time.
- End-to-end round-trip time from speaking to hearing a reply is about **8–10 seconds**, including speech and playback time.

## 💡 Use Cases

- Offline smart assistants
- Wearable or embedded AI demos
- Voice-controlled kiosks
- Character-based roleplay agents

## 📄 License

MIT © 2024 M15.ai
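## 🧵 Appendix: End-to-End Pipeline Sketch

A compressed, illustrative sketch of the whole pipeline (mic → Vosk → Ollama → Piper → speaker). This is not the actual `voice_assistant.py`: the Vosk model directory, fixed 5-second capture window, and lack of history/device selection are all simplifying assumptions. It assumes you have unpacked a Vosk model (e.g., `vosk-model-small-en-us-0.15`) into `model/`.

```
# Illustrative sketch: mic -> Vosk STT -> Ollama LLM -> Piper TTS -> aplay.
# Paths and parameters mirror this README but are assumptions, not the real script.
import json
import subprocess

import pyaudio
import requests
from vosk import KaldiRecognizer, Model

VOSK_MODEL_DIR = "model"            # unpacked Vosk model directory (assumed)
PIPER_BIN = "bin/piper/piper"
VOICE = "voices/en_US-kathleen-low.onnx"

def listen_once(seconds=5, rate=16000):
    """Capture a few seconds of mic audio and return the Vosk transcript."""
    rec = KaldiRecognizer(Model(VOSK_MODEL_DIR), rate)
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=rate,
                     input=True, frames_per_buffer=4000)
    for _ in range(int(rate / 4000 * seconds)):
        rec.AcceptWaveform(stream.read(4000, exception_on_overflow=False))
    stream.close()
    pa.terminate()
    return json.loads(rec.FinalResult())["text"]

def ask_llm(text, model="qwen2.5:0.5b"):
    """Single-turn, non-streaming call to the local Ollama daemon."""
    r = requests.post("http://localhost:11434/api/chat",
                      json={"model": model, "stream": False,
                            "messages": [{"role": "user", "content": text}]})
    r.raise_for_status()
    return r.json()["message"]["content"]

def speak(text):
    """Pipe text through the Piper binary, then play the WAV with aplay."""
    subprocess.run([PIPER_BIN, "--model", VOICE, "--output_file", "reply.wav"],
                   input=text.encode(), check=True)
    subprocess.run(["aplay", "reply.wav"], check=True)

if __name__ == "__main__":
    heard = listen_once()
    print("You said:", heard)
    speak(ask_llm(heard))
```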