
Voice Controlled Virtual Assistant using SpeechRecognition and pyttsx3

Advanced

Offline TTS + online STT, modular intents, reminders, notes, and more

1. Project Overview

What it does

This project builds a voice-controlled desktop assistant that listens for spoken commands, understands a set of useful intents (open websites, search Wikipedia, tell the time, set simple reminders, take notes, perform basic math, and search the web), speaks responses back, and logs its actions. The assistant uses offline TTS (pyttsx3) and online speech recognition (the Google Web Speech API via speech_recognition by default). It is modular, so you can add new skills (for example, a weather summary) easily.

Real-world use cases

  • Desktop productivity assistant (take notes, set reminders)
  • Hands-free interaction for accessibility
  • Prototyping voice features for larger applications

Technical goals

  • Integrate audio I/O (microphone + speaker) safely and robustly.
  • Implement intent parsing and command routing.
  • Run long-running background tasks (reminders) without blocking voice loop.
  • Keep code modular and production-minded for extension.

2. Key Technologies & Libraries

  • Python 3.8+
  • speech_recognition — speech-to-text (uses online Google API by default)
  • pyttsx3 — offline text-to-speech engine
  • wikipedia — fetch short encyclopedic summaries
  • webbrowser — open URLs
  • datetime, time, threading — scheduling and concurrency
  • re, os, json, pathlib, logging — utilities and persistence

Install required packages:

pip install SpeechRecognition pyttsx3 wikipedia

Note about the microphone driver: speech_recognition uses PyAudio for Microphone access. On many systems you must install PyAudio separately (or an alternative backend such as sounddevice). Installing PyAudio may require system packages; on Windows you can use prebuilt wheels, or run pip install pipwin followed by pipwin install pyaudio. If sr.Microphone() already works on your machine, you are all set.
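Before running the assistant, you can do a quick sanity check by asking speech_recognition to list the input devices it can see (this assumes PyAudio is installed):

import speech_recognition as sr

# List the input devices PyAudio exposes. An empty list usually means the
# audio driver or PyAudio itself is missing.
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(f"{index}: {name}")

# Open the default microphone once to confirm access actually works.
with sr.Microphone() as source:
    print("Default microphone opened successfully.")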

3. Learning Outcomes

You will learn:

  • How to capture and process audio input in Python.
  • How to connect speech recognition results to an intent/action pipeline.
  • How to safely use offline TTS and online speech recognition.
  • How to schedule background tasks (reminders) and persist simple data (notes, logs).
  • How to design modular voice skills and error handling for real-world reliability.

4. Step-by-Step Explanation

  1. Environment: create virtualenv, install dependencies above.
  2. Design: define intents (example: time, wikipedia, open, note, reminder, calculate, exit).
  3. TTS setup: initialize pyttsx3 with desired voice rate and volume.
  4. Speech recognition setup: create a speech_recognition.Recognizer() and a Microphone() context, and optionally calibrate for ambient noise.
  5. Command loop: listen, convert speech to text, parse the text with regex/keyword matching, and dispatch to handlers (see the minimal sketch after this list).
  6. Handlers: implement a function for each intent (open a website, read a Wikipedia summary, set a reminder using threading.Timer, save a note to a JSON file, evaluate basic arithmetic with eval restricted to safe tokens).
  7. Background tasks: reminders triggered by timers speak the reminder aloud.
  8. Logging & persistence: save notes and history to disk for audit.
  9. Testing: run and speak example commands; verify saved files and TTS responses.
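The following minimal sketch illustrates steps 3-5 in isolation, stripped of persistence, reminders, and most error handling. handle_time and handle_open here are deliberately simplified stand-ins for the fuller handlers in the verified program of section 5:

import re
import webbrowser
from datetime import datetime

import pyttsx3
import speech_recognition as sr

engine = pyttsx3.init()
recognizer = sr.Recognizer()

def speak(text: str) -> None:
    # pyttsx3 queues text and speaks it when runAndWait() is called.
    engine.say(text)
    engine.runAndWait()

def handle_time(_: str) -> None:
    speak(datetime.now().strftime("It is %I:%M %p."))

def handle_open(command: str) -> None:
    # Simplified: open a web search for whatever follows "open".
    query = command.replace("open", "", 1).strip()
    webbrowser.open(f"https://www.google.com/search?q={query}")
    speak(f"Opening {query}")

# Intent table: the first matching pattern wins.
INTENTS = [
    (re.compile(r"\btime\b"), handle_time),
    (re.compile(r"\bopen\b"), handle_open),
]

with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    while True:
        audio = recognizer.listen(source, phrase_time_limit=8)
        try:
            command = recognizer.recognize_google(audio).lower()
        except sr.UnknownValueError:
            continue  # speech was unintelligible; keep listening
        for pattern, handler in INTENTS:
            if pattern.search(command):
                handler(command)
                break
        else:
            speak("Sorry, I did not understand that.")

The dispatch table pattern (regex → handler) is what keeps the assistant modular: adding a new skill only means writing one handler function and appending one entry to the table.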

5. Full Working and Verified Python Code

Save the file as voice_assistant.py. Read the install notes above before running.
This script uses the Google Web Speech API via speech_recognition (no API key is required for light usage). If you need offline speech recognition, you must integrate a local engine such as VOSK; that is beyond the scope of this section, but the modular design makes it easy to plug in (see section 7).

#!/usr/bin/env python3
"""
voice_assistant.py

A modular voice-controlled assistant using SpeechRecognition (STT) and pyttsx3 (TTS).

Usage:
    python voice_assistant.py

Dependencies:
    pip install SpeechRecognition pyttsx3 wikipedia

Notes:
- For microphone support you may need PyAudio. On Windows, try:
      pip install pipwin
      pipwin install pyaudio
  Or install the appropriate system packages on Linux/macOS.
- If `speech_recognition` raises Microphone errors, ensure your OS sees the mic
  and drivers are installed.
"""

from __future__ import annotations

import re
import os
import json
import time
import math
import logging
import threading
import webbrowser
from pathlib import Path
from datetime import datetime, timedelta
from typing import Optional

import speech_recognition as sr
import pyttsx3
import wikipedia

# ------------------------
# Configuration & Logging
# ------------------------
APP_DIR = Path.home() / ".voice_assistant"
NOTES_FILE = APP_DIR / "notes.json"
LOG_FILE = APP_DIR / "assistant.log"
REMINDERS_FILE = APP_DIR / "reminders.json"

APP_DIR.mkdir(parents=True, exist_ok=True)

logging.basicConfig(
    filename=str(LOG_FILE),
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(message)s",
)

# ------------------------
# Utilities
# ------------------------
# pyttsx3 is not thread-safe; reminders speak from timer threads, so serialize TTS.
_tts_lock = threading.Lock()


def speak(text: str, wait: bool = False) -> None:
    """Speak text using pyttsx3. runAndWait() is required for the queued text to be
    spoken and blocks until speech ends; `wait` is kept for readability at call sites."""
    try:
        with _tts_lock:
            engine.say(text)
            engine.runAndWait()
    except Exception as e:
        logging.exception("TTS failure: %s", e)


def safe_eval(expr: str) -> Optional[float]:
    """
    Evaluate a simple arithmetic expression safely.
    Accept only digits, spaces, and arithmetic operators.
    """
    # Disallow bad characters
    if not re.match(r"^[0-9+\-*/().\s]+$", expr):
        return None
    try:
        # eval in a restricted namespace
        result = eval(expr, {"__builtins__": {}}, {})
        if isinstance(result, (int, float)):
            return float(result)
    except Exception:
        return None
    return None


def save_json(path: Path, obj) -> None:
    try:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(obj, f, ensure_ascii=False, indent=2)
    except Exception:
        logging.exception("Failed to save JSON to %s", path)


def load_json(path: Path, default):
    if path.exists():
        try:
            with open(path, "r", encoding="utf-8") as f:
                return json.load(f)
        except Exception:
            logging.exception("Failed to load JSON from %s", path)
    return default


# ------------------------
# Persistent stores
# ------------------------
notes_store = load_json(NOTES_FILE, [])
reminders_store = load_json(REMINDERS_FILE, [])

# ------------------------
# TTS Engine init
# ------------------------
engine = pyttsx3.init()
engine.setProperty("rate", 160)    # words per minute
engine.setProperty("volume", 0.9)  # 0..1

# Choose a female voice if available (best effort)
voices = engine.getProperty("voices")
for v in voices:
    if "female" in v.name.lower() or "zira" in v.name.lower():
        engine.setProperty("voice", v.id)
        break

# ------------------------
# Speech recognizer init
# ------------------------
recognizer = sr.Recognizer()
mic = None
try:
    mic = sr.Microphone()
except Exception as e:
    logging.warning("Microphone not found / accessible: %s", e)
    mic = None


# ------------------------
# Intent Handlers
# ------------------------
def handle_time(_: str):
    now = datetime.now()
    resp = now.strftime("The time is %I:%M %p on %A, %B %d.")
    speak(resp, wait=True)
    logging.info("Handled time query: %s", resp)


def handle_open(command: str):
    # e.g. "open youtube" or "open google dot com"
    match = re.search(r"(?:open|go to) (.+)", command)
    if not match:
        speak("Sorry, I did not catch the site to open.")
        return
    target = match.group(1).strip()
    # If it looks like a domain, open it directly; else map or search
    if re.search(r"\.\w{2,}$", target):
        url = target if target.startswith(("http://", "https://")) else f"https://{target}"
    else:
        # map common names
        mapping = {
            "youtube": "https://youtube.com",
            "google": "https://google.com",
            "gmail": "https://mail.google.com",
            "github": "https://github.com",
        }
        url = mapping.get(target.lower(), f"https://www.google.com/search?q={target.replace(' ', '+')}")
    webbrowser.open(url)
    speak(f"Opening {target}")
    logging.info("Opened URL: %s for command %s", url, command)


def handle_wikipedia(command: str):
    # e.g. "wikipedia Albert Einstein" or "search wikipedia for ..."
    match = re.search(r"(?:search wikipedia for|wikipedia) (.+)", command)
    # Fall back to treating the whole command as the topic (used by the dispatcher fallback).
    topic = match.group(1).strip() if match else command.strip()
    if not topic:
        speak("What should I search on Wikipedia?")
        return
    try:
        summary = wikipedia.summary(topic, sentences=2, auto_suggest=True, redirect=True)
        speak(summary, wait=True)
        logging.info("Wikipedia summary for %s: %s", topic, summary)
    except wikipedia.exceptions.DisambiguationError as e:
        speak("The topic is ambiguous. Please be more specific.")
        logging.warning("Disambiguation for %s: %s", topic, e.options[:5])
    except Exception as e:
        speak("I couldn't find that on Wikipedia.")
        logging.exception("Wikipedia error for %s: %s", topic, e)


def handle_note(command: str):
    # "take a note buy milk and eggs" or "note buy eggs"
    match = re.search(r"(?:note|take a note|remember to) (.+)", command)
    if not match:
        speak("What would you like me to note?")
        return
    content = match.group(1).strip()
    note = {"text": content, "timestamp": datetime.now().isoformat()}
    notes_store.append(note)
    save_json(NOTES_FILE, notes_store)
    speak("Note saved.")
    logging.info("Saved note: %s", content)


def handle_show_notes(_: str):
    if not notes_store:
        speak("You have no saved notes.")
        return
    speak(f"You have {len(notes_store)} notes. Here are the latest three.")
    for n in notes_store[-3:]:
        t = datetime.fromisoformat(n["timestamp"]).strftime("%b %d at %I:%M %p")
        speak(f"On {t}, you wrote: {n['text']}")
    logging.info("Read notes aloud.")


def handle_calc(command: str):
    # e.g., "calculate 5 plus 7" or "what is 45 / 9"
    # Replace spoken operator words with symbols
    expr = command.lower()
    expr = expr.replace("plus", "+").replace("minus", "-").replace("times", "*").replace(" x ", " * ")
    expr = expr.replace("multiplied by", "*").replace("divided by", "/").replace("over", "/")
    # Extract the numeric expression (must start with a digit)
    m = re.search(r"\d[\d+\-*/().\s]*", expr)
    if not m:
        speak("I could not parse the expression.")
        return
    candidate = m.group(0)
    result = safe_eval(candidate)
    if result is None:
        speak("The expression is invalid or unsupported.")
        logging.warning("Invalid calc expr: %s", candidate)
    else:
        speak(f"The result is {result}")
        logging.info("Calculated %s = %s", candidate, result)


def handle_reminder(command: str):
    # "remind me to buy milk in 10 minutes" or "remind me to call mom at 18:30"
    match_in = re.search(r"remind me to (.+?) in (\d+)\s*(minute|minutes|hour|hours|second|seconds)", command)
    match_at = re.search(r"remind me to (.+?) at (\d{1,2}:\d{2})", command)
    if match_in:
        action = match_in.group(1).strip()
        qty = int(match_in.group(2))
        unit = match_in.group(3)
        seconds = qty * (3600 if unit.startswith("hour") else 60 if unit.startswith("minute") else 1)
        # schedule reminder
        t = threading.Timer(seconds, reminder_alert, args=(action,))
        t.daemon = True
        t.start()
        # persist reminder
        reminders_store.append({
            "action": action, "type": "in", "qty": qty, "unit": unit,
            "created": datetime.now().isoformat(),
        })
        save_json(REMINDERS_FILE, reminders_store)
        speak(f"Okay, I will remind you to {action} in {qty} {unit}.")
        logging.info("Scheduled reminder: %s in %s %s", action, qty, unit)
    elif match_at:
        action = match_at.group(1).strip()
        at_time = match_at.group(2)
        now = datetime.now()
        hh, mm = map(int, at_time.split(":"))
        target = now.replace(hour=hh, minute=mm, second=0, microsecond=0)
        if target < now:
            target += timedelta(days=1)  # next day
        seconds = (target - now).total_seconds()
        t = threading.Timer(seconds, reminder_alert, args=(action,))
        t.daemon = True
        t.start()
        reminders_store.append({
            "action": action, "type": "at", "time": at_time,
            "created": datetime.now().isoformat(),
        })
        save_json(REMINDERS_FILE, reminders_store)
        speak(f"Reminder set at {at_time} to {action}.")
        logging.info("Scheduled reminder at %s for %s", at_time, action)
    else:
        speak("Please tell me when to remind you, for example: remind me to call mom in 10 minutes, "
              "or remind me to call mom at 18:30.")


def reminder_alert(action: str):
    speak(f"Reminder: {action}", wait=True)
    logging.info("Reminder triggered: %s", action)


def handle_search_web(command: str):
    # "search for puppies" ("search wikipedia for ..." is handled earlier in INTENTS)
    match = re.search(r"(?:search for|search)\s*(.*)", command)
    if not match or not match.group(1).strip():
        speak("What should I search for?")
        return
    query = match.group(1).strip()
    url = f"https://www.google.com/search?q={query.replace(' ', '+')}"
    webbrowser.open(url)
    speak(f"Searching the web for {query}")
    logging.info("Performed web search for: %s", query)


def handle_exit(_: str):
    speak("Goodbye. Have a nice day!", wait=True)
    logging.info("Assistant exiting on user command.")
    raise SystemExit(0)


# Intent dispatch table: the first matching pattern wins.
INTENTS = [
    (re.compile(r"\b(time|what time|tell me the time)\b"), handle_time),
    (re.compile(r"\b(open|go to) "), handle_open),
    (re.compile(r"\b(wikipedia|search wikipedia)\b"), handle_wikipedia),
    (re.compile(r"\b(note|take a note|remember to)\b"), handle_note),
    (re.compile(r"\b(show notes|read notes|list notes)\b"), handle_show_notes),
    (re.compile(r"\b(remember|remind me to|set reminder)\b"), handle_reminder),
    (re.compile(r"\b(calculat\w*|what is|what's|compute|evaluate)\b"), handle_calc),
    (re.compile(r"\bsearch\b"), handle_search_web),
    (re.compile(r"\b(exit|quit|goodbye|stop assistant)\b"), handle_exit),
]


# ------------------------
# Main listen loop
# ------------------------
def recognize_speech_from_mic(recognizer: sr.Recognizer, microphone: sr.Microphone,
                              timeout: int = 5, phrase_time_limit: int = 8):
    """
    Capture audio from the microphone and return a (text, error) tuple.
    text is the lowercased transcription or None; error is None on success.
    """
    if microphone is None:
        raise RuntimeError("Microphone not configured or not available.")
    with microphone as source:
        # optional ambient noise calibration
        recognizer.adjust_for_ambient_noise(source, duration=0.8)
        logging.debug("Listening for command...")
        try:
            audio = recognizer.listen(source, timeout=timeout, phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            return None, "timeout"
    try:
        text = recognizer.recognize_google(audio)
        logging.info("Recognition success: %s", text)
        return text.lower(), None
    except sr.UnknownValueError:
        logging.info("Recognition: Unknown value")
        return None, "unintelligible"
    except sr.RequestError as e:
        logging.exception("Recognition request failed: %s", e)
        return None, "api_unavailable"
    except Exception as e:
        logging.exception("Recognition unexpected error: %s", e)
        return None, "error"


def parse_and_dispatch(command: str):
    """
    Given recognized command text, match an intent and call its handler.
    If none matches, fall back to Wikipedia or ask for clarification.
    """
    if not command:
        return
    # direct mapping: keyword-based intents
    for pattern, handler in INTENTS:
        if pattern.search(command):
            try:
                handler(command)
            except SystemExit:
                raise
            except Exception as e:
                logging.exception("Error while handling command '%s': %s", command, e)
                speak("Sorry, I encountered an error while processing your request.")
            return
    # fallback: if the user just said a short topic-like phrase (2-5 words), try Wikipedia
    try:
        if 1 < len(command.split()) <= 5:
            handle_wikipedia(command)
            return
    except Exception:
        pass
    # default: ask to repeat or offer help
    speak("Sorry, I did not understand that. You can say, 'open YouTube', 'search for cats', "
          "'remind me to call mom in 10 minutes', 'take a note', or 'what's the time'.")
    logging.info("No intent matched for: %s", command)


def main_loop():
    speak("Hello, I am your assistant. How can I help you today?")
    # keep listening until an exit command
    while True:
        try:
            text, error = recognize_speech_from_mic(recognizer, mic)
            if error == "timeout":
                continue  # no speech; keep listening
            if error == "unintelligible":
                continue  # optionally give a short prompt, or keep listening silently
            if error == "api_unavailable":
                speak("Speech recognition service is not available. Check your internet connection.")
                time.sleep(2)
                continue
            if text:
                print(f">>> You said: {text}")
                parse_and_dispatch(text)
        except KeyboardInterrupt:
            speak("Shutting down. Goodbye.", wait=True)
            break
        except SystemExit:
            break
        except Exception as e:
            logging.exception("Fatal error in main loop: %s", e)
            speak("I encountered an internal error. Restarting listening.")
            time.sleep(1)
            continue


if __name__ == "__main__":
    try:
        # Check microphone availability once before starting the loop
        if mic is None:
            speak("Microphone not available. Please install and configure microphone drivers and PyAudio.")
            logging.error("Microphone not available at startup.")
        else:
            main_loop()
    except Exception as e:
        logging.exception("Unhandled exception: %s", e)
        speak("An error occurred. See log for details.", wait=True)

6. Sample Output or Results

Example dialogues (what you say → assistant response):

You: "What time is it?" Assistant (speaks): "The time is 09:43 AM on Tuesday, October 28." You: "Take a note buy milk and eggs" Assistant: "Note saved." You: "Remind me to stretch in 10 minutes" Assistant: "Okay, I will remind you to stretch in 10 minutes." (10 minutes later assistant speaks): "Reminder: stretch." You: "Open YouTube" Assistant: "Opening YouTube." (opens browser) You: "Search Wikipedia for neural networks" Assistant: (reads 1–2 sentence summary of neural networks) You: "Calculate 45 divided by 9" Assistant: "The result is 5.0" Saved artifacts: ~/.voice_assistant/notes.json — saved notes with timestamps ~/.voice_assistant/reminders.json — scheduled reminders persisted ~/.voice_assistant/assistant.log — history and exceptions

7. Possible Enhancements

  • Offline Speech Recognition: integrate VOSK or another local model for offline STT (see the sketch after this list).
  • Natural Language Understanding: add an NLU module (Rasa / spaCy based intent/entity extraction) for richer commands.
  • Voice Activation (wake-word): integrate a keyword spotter (Snowboy, Porcupine) to wake on “Hey Assistant”.
  • Conversational Flow: maintain context between follow-up questions.
  • GUI: add a minimal Tkinter/Qt GUI to show transcripts and controls.
  • Integrations: connect to calendars (Google Calendar via OAuth), emails, smart home APIs, or task managers.
  • Robust scheduling: use persistent schedulers (APScheduler) and robust retry/notification mechanisms.
  • Security & Privacy: explicitly warn and add options for data deletion and local-only modes.