This project bridges Python and Unity to create a voice-controlled gaming experience using real-time speech recognition. A Python server uses Vosk Mini for low-latency, offline audio transcription and streams the results to Unity over a WebSocket connection. The system uses a dual-threaded architecture: the Python server isolates audio processing in a background thread, while Unity’s C# client uses a ConcurrentQueue to safely pass voice commands from the WebSocket thread to the main game loop. To keep the command set easy to extend, voice triggers are defined through ScriptableObjects, so developers can map words like “pizza” or “burger” to 3D models without code changes. Aimed at games that need instant voice interaction, the solution prioritizes performance (sub-500 ms latency) and modularity.
This project demonstrates a low-latency speech recognition system for Unity games, combining:
- Python Server: Uses Vosk Mini for offline speech-to-text.
- Unity Client: Handles voice commands via WebSocket.
- Threaded Architecture: Keeps networking separate from game logic.
Technical Breakdown
1. Python Server (Vosk + WebSocket)
- Vosk Mini: Lightweight ASR model for real-time transcription.
- WebSocket: Async server (`websockets` library) on port 8765.
- Dual Threads:
  - Main Thread: Runs the asyncio event loop that manages WebSocket connections.
  - Audio Thread: Processes microphone input without blocking the event loop.
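The audio thread has to turn Vosk output into words the game can act on. Assuming it reads final results from Vosk’s `KaldiRecognizer.Result()`, which returns a JSON string such as `{"text": "i want pizza"}`, a minimal parsing step could look like the sketch below. `extract_words` and the vocabulary filter are hypothetical, not taken from the project, and the code does not import `vosk`, so it runs without a model.

```python
import json


def extract_words(result_json: str, vocabulary: set[str]) -> list[str]:
    """Hypothetical helper: keep only the known command words from a Vosk-style result.

    A final result looks like {"text": "i want pizza"}; silence gives {"text": ""}.
    """
    text = json.loads(result_json).get("text", "")
    # Lower-case, split on whitespace, and drop words the game does not listen for
    return [word for word in text.lower().split() if word in vocabulary]


print(extract_words('{"text" : "i want pizza and a burger"}', {"pizza", "burger"}))
# ['pizza', 'burger']
```

Filtering on the server keeps the WebSocket traffic down to actual commands; the Unity side could equally do the matching itself.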
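```python
# Simplified server snippet (run_recording, the blocking microphone loop, is defined elsewhere)
import asyncio

async def send_audio(websocket):
    loop = asyncio.get_running_loop()

    async def send_word(word):
        await websocket.send(word)  # push one recognized word to Unity

    def sync_callback(word):  # called on the audio thread
        asyncio.run_coroutine_threadsafe(send_word(word), loop)

    await loop.run_in_executor(None, run_recording, sync_callback)  # keeps the event loop free
```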
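The server snippet depends on `run_recording` and a live connection. To see the thread hand-off on its own, here is a self-contained sketch of the same pattern. `fake_recording` and `FakeSocket` are hypothetical stand-ins for the microphone loop and the `websockets` connection, and the extra `words` argument exists only to feed the fake recorder.

```python
import asyncio
import threading
import time


def fake_recording(callback, words):
    """Hypothetical stand-in for run_recording: a blocking loop that reports words."""
    for word in words:
        time.sleep(0.01)  # pretend to wait for the recognizer
        callback(word)


class FakeSocket:
    """Hypothetical stand-in for a websockets connection; records what it sends."""

    def __init__(self):
        self.sent = []
        self.send_thread = None

    async def send(self, message):
        self.send_thread = threading.get_ident()  # thread that executed the send
        self.sent.append(message)


async def send_audio(websocket, words):
    loop = asyncio.get_running_loop()
    pending = []

    async def send_word(word):
        await websocket.send(word)

    def sync_callback(word):
        # Runs on the executor thread: schedule the send on the event-loop thread
        pending.append(asyncio.run_coroutine_threadsafe(send_word(word), loop))

    # The blocking recorder runs in the default thread pool, so the loop stays free
    await loop.run_in_executor(None, fake_recording, sync_callback, words)
    # Wait for sends that were scheduled but have not finished yet
    await asyncio.gather(*(asyncio.wrap_future(f) for f in pending))


async def main():
    fake_ws = FakeSocket()
    await send_audio(fake_ws, ["pizza", "burger"])
    return fake_ws, threading.get_ident()


fake_ws, loop_thread = asyncio.run(main())
print(fake_ws.sent)  # ['pizza', 'burger']
```

The final `gather` is a choice made here so the demo finishes deterministically; a long-running server handler would not need it. The key point is that `send` always executes on the event-loop thread, never on the recording thread.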
2. Unity Client (C# WebSocket)
- Threaded WebSocket: Runs in a background thread via `System.Threading`.
- Concurrent Queue: Safely passes messages to the main thread.
- ScriptableObjects: Configurable voice commands (e.g., “apple” → spawn 3D fruit).
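```csharp
// Unity WebSocket handler (assumes websocket-sharp: using WebSocketSharp;
// and using System.Collections.Concurrent; at the top of the file)
private WebSocket ws;
private readonly ConcurrentQueue<string> receivedWordsQueue = new ConcurrentQueue<string>();

private void RunWebSocket() {
    ws = new WebSocket("ws://localhost:8765");
    // OnMessage fires off Unity's main thread, so the handler only enqueues
    ws.OnMessage += (sender, e) => receivedWordsQueue.Enqueue(e.Data);
    ws.Connect(); // no messages arrive until the connection is opened
}
```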
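The queue only helps if the main game loop (in Unity, usually a MonoBehaviour’s `Update()`) drains it every frame. Since the added examples here are in Python, the following is a Python analogue of that producer/consumer split, not the project’s C# code: `queue.Queue` plays the role of `ConcurrentQueue`, and `websocket_thread` and `drain_frame` are hypothetical names for the network thread and the per-frame drain.

```python
import queue
import threading

received_words = queue.Queue()  # analogue of ConcurrentQueue<string>


def websocket_thread(messages):
    """Hypothetical network thread: only enqueues, never touches game state."""
    for message in messages:
        received_words.put(message)


def drain_frame(handle_word):
    """Called once per frame on the main thread; handles everything queued so far."""
    handled = 0
    while True:
        try:
            word = received_words.get_nowait()  # like ConcurrentQueue.TryDequeue
        except queue.Empty:
            return handled
        handle_word(word)  # safe to touch game state here: we are on the main thread
        handled += 1


spawned = []
producer = threading.Thread(target=websocket_thread, args=(["pizza", "burger"],))
producer.start()
producer.join()  # a game keeps rendering frames instead; joining keeps the demo deterministic
drain_frame(spawned.append)
print(spawned)  # ['pizza', 'burger']
```

Draining with a non-blocking dequeue means a frame never waits on the network: if nothing has arrived, the loop returns immediately.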
Key Features
- Latency < 500 ms: Achieved with the lightweight Vosk Mini model running offline on the local machine.
- Thread Safety:
  - Python: `asyncio.run_coroutine_threadsafe()` hands recognized words from the audio thread to the asyncio event loop.
  - Unity: `ConcurrentQueue` decouples networking from gameplay.
- Scalability: Add commands through Unity’s ScriptableObjects without code changes.
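ScriptableObjects turn the word-to-model mapping into data rather than code. A rough Python analogue of that idea, assuming a hypothetical JSON list of command entries (the field names `word` and `prefab` and the model names are illustrative, not taken from the project): adding a command means adding an entry, while the lookup code stays unchanged.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceCommand:
    """Hypothetical analogue of a ScriptableObject asset: one trigger word and its model."""
    word: str
    prefab: str  # name of the 3D model to spawn


def load_commands(config_json: str) -> dict[str, VoiceCommand]:
    """Build a word -> command lookup from data; new commands need no code changes."""
    commands = {}
    for entry in json.loads(config_json):
        word = entry["word"].lower()
        commands[word] = VoiceCommand(word, entry["prefab"])
    return commands


def dispatch(word: str, commands: dict[str, VoiceCommand]) -> Optional[str]:
    """Return the prefab for a recognized word, or None if it is not a trigger."""
    command = commands.get(word.lower())
    return command.prefab if command else None


config = '[{"word": "pizza", "prefab": "PizzaModel"}, {"word": "Burger", "prefab": "BurgerModel"}]'
commands = load_commands(config)
print(dispatch("burger", commands))  # BurgerModel
print(dispatch("salad", commands))   # None
```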
Use Cases
- Voice-controlled character transformations
- Speech-driven puzzle mechanics
- Accessibility features for motor-impaired players
GitHub Repositories
- Python Server: https://github.com/payam-ranjbar/Speech-Transfer-Socket
- Unity Client: https://github.com/payam-ranjbar/Speech-Mania-Game