
Voice Controlled Virtual Assistant using SpeechRecognition and pyttsx3

Advanced

Offline TTS + online STT, modular intents, reminders, notes, and more

1. Project Overview

What it does

This project builds a voice-controlled desktop assistant that listens for spoken commands, understands a set of useful intents (open websites, search Wikipedia, tell the time, set simple reminders, take notes, perform basic math, and search the web), speaks responses back, and logs its actions. The assistant uses offline TTS (pyttsx3) and online speech recognition (the Google Web Speech API via speech_recognition by default). It is modular, so you can add new skills (for example, a weather summary) easily.

Real-world use cases

  • Desktop productivity assistant (take notes, set reminders)
  • Hands-free interaction for accessibility
  • Prototyping voice features for larger applications

Technical goals

  • Integrate audio I/O (microphone + speaker) safely and robustly.
  • Implement intent parsing and command routing.
  • Run long-running background tasks (reminders) without blocking voice loop.
  • Keep code modular and production-minded for extension.

2. Key Technologies & Libraries

  • Python 3.8+
  • speech_recognition — speech-to-text (uses online Google API by default)
  • pyttsx3 — offline text-to-speech engine
  • wikipedia — fetch short encyclopedic summaries
  • webbrowser — open URLs
  • datetime, time, threading — scheduling and concurrency
  • re, os, json, pathlib, logging — utilities and persistence

Install required packages:

pip install SpeechRecognition pyttsx3 wikipedia

Note about the microphone driver: speech_recognition uses PyAudio for Microphone access. On many systems you must install PyAudio separately (or an alternative backend such as sounddevice). Installing PyAudio may require system packages; on Windows you can use prebuilt wheels, or run pip install pipwin followed by pipwin install pyaudio. If sr.Microphone() already works on your machine, you are all set.
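Before running the assistant, you can do a quick sanity check by asking speech_recognition to list the input devices it can see (this assumes PyAudio is installed):

import speech_recognition as sr

# List the input devices PyAudio exposes. An empty list usually means the
# audio driver or PyAudio itself is missing.
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(f"{index}: {name}")

# Open the default microphone once to confirm access actually works.
with sr.Microphone() as source:
    print("Default microphone opened successfully.")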

3. Learning Outcomes

You will learn:

  • How to capture and process audio input in Python.
  • How to connect speech recognition results to an intent/action pipeline.
  • How to safely use offline TTS and online speech recognition.
  • How to schedule background tasks (reminders) and persist simple data (notes, logs).
  • How to design modular voice skills and error handling for real-world reliability.

4. Step-by-Step Explanation

  1. Environment: create virtualenv, install dependencies above.
  2. Design: define intents (example: time, wikipedia, open, note, reminder, calculate, exit).
  3. TTS setup: initialize pyttsx3 with desired voice rate and volume.
  4. Speech recognition setup: create a speech_recognition.Recognizer() and a Microphone() context, and optionally calibrate for ambient noise.
  5. Command loop: listen, convert speech to text, parse the text with regex/keyword matching, and dispatch to handlers (see the minimal sketch after this list).
  6. Handlers: implement a function for each intent (open a website, read a Wikipedia summary, set a reminder using threading.Timer, save a note to a JSON file, evaluate basic arithmetic with eval restricted to safe tokens).
  7. Background tasks: reminders triggered by timers speak the reminder aloud.
  8. Logging & persistence: save notes and history to disk for audit.
  9. Testing: run and speak example commands; verify saved files and TTS responses.
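The following minimal sketch illustrates steps 3-5 in isolation, stripped of persistence, reminders, and most error handling. handle_time and handle_open here are deliberately simplified stand-ins for the fuller handlers in the verified program of section 5:

import re
import webbrowser
from datetime import datetime

import pyttsx3
import speech_recognition as sr

engine = pyttsx3.init()
recognizer = sr.Recognizer()

def speak(text: str) -> None:
    # pyttsx3 queues text and speaks it when runAndWait() is called.
    engine.say(text)
    engine.runAndWait()

def handle_time(_: str) -> None:
    speak(datetime.now().strftime("It is %I:%M %p."))

def handle_open(command: str) -> None:
    # Simplified: open a web search for whatever follows "open".
    query = command.replace("open", "", 1).strip()
    webbrowser.open(f"https://www.google.com/search?q={query}")
    speak(f"Opening {query}")

# Intent table: the first matching pattern wins.
INTENTS = [
    (re.compile(r"\btime\b"), handle_time),
    (re.compile(r"\bopen\b"), handle_open),
]

with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    while True:
        audio = recognizer.listen(source, phrase_time_limit=8)
        try:
            command = recognizer.recognize_google(audio).lower()
        except sr.UnknownValueError:
            continue  # speech was unintelligible; keep listening
        for pattern, handler in INTENTS:
            if pattern.search(command):
                handler(command)
                break
        else:
            speak("Sorry, I did not understand that.")

The dispatch table pattern (regex → handler) is what keeps the assistant modular: adding a new skill only means writing one handler function and appending one entry to the table.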

5. Full Working and Verified Python Code

Save the file as voice_assistant.py. Read the install notes above before running.
This script uses the Google Web Speech API via speech_recognition (no API key is required for light usage). If you need offline speech recognition, you must integrate a local engine such as VOSK; that is beyond the scope of this section, but the modular design makes it easy to plug in (see section 7).

#!/usr/bin/env python3
"""
voice_assistant.py

A modular voice-controlled assistant using SpeechRecognition (STT) and pyttsx3 (TTS).

Usage:
    python voice_assistant.py

Dependencies:
    pip install SpeechRecognition pyttsx3 wikipedia

Notes:
- For microphone support you may need PyAudio. On Windows, try:
      pip install pipwin
      pipwin install pyaudio
  Or install the appropriate system packages on Linux/macOS.
- If `speech_recognition` raises Microphone errors, ensure your OS sees the mic
  and drivers are installed.
"""

from __future__ import annotations

import re
import os
import json
import time
import math
import logging
import threading
import webbrowser
from pathlib import Path
from datetime import datetime, timedelta
from typing import Optional

import speech_recognition as sr
import pyttsx3
import wikipedia

# ------------------------
# Configuration & Logging
# ------------------------
APP_DIR = Path.home() / ".voice_assistant"
NOTES_FILE = APP_DIR / "notes.json"
LOG_FILE = APP_DIR / "assistant.log"
REMINDERS_FILE = APP_DIR / "reminders.json"

APP_DIR.mkdir(parents=True, exist_ok=True)

logging.basicConfig(
    filename=str(LOG_FILE),
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(message)s",
)

# ------------------------
# Utilities
# ------------------------
# pyttsx3 is not thread-safe; reminders speak from timer threads, so serialize TTS.
_tts_lock = threading.Lock()


def speak(text: str, wait: bool = False) -> None:
    """Speak text using pyttsx3. runAndWait() is required for the queued text to be
    spoken and blocks until speech ends; `wait` is kept for readability at call sites."""
    try:
        with _tts_lock:
            engine.say(text)
            engine.runAndWait()
    except Exception as e:
        logging.exception("TTS failure: %s", e)


def safe_eval(expr: str) -> Optional[float]:
    """
    Evaluate a simple arithmetic expression safely.
    Accept only digits, spaces, and arithmetic operators.
    """
    # Disallow bad characters
    if not re.match(r"^[0-9+\-*/().\s]+$", expr):
        return None
    try:
        # eval in a restricted namespace
        result = eval(expr, {"__builtins__": {}}, {})
        if isinstance(result, (int, float)):
            return float(result)
    except Exception:
        return None
    return None


def save_json(path: Path, obj) -> None:
    try:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(obj, f, ensure_ascii=False, indent=2)
    except Exception:
        logging.exception("Failed to save JSON to %s", path)


def load_json(path: Path, default):
    if path.exists():
        try:
            with open(path, "r", encoding="utf-8") as f:
                return json.load(f)
        except Exception:
            logging.exception("Failed to load JSON from %s", path)
    return default


# ------------------------
# Persistent stores
# ------------------------
notes_store = load_json(NOTES_FILE, [])
reminders_store = load_json(REMINDERS_FILE, [])

# ------------------------
# TTS Engine init
# ------------------------
engine = pyttsx3.init()
engine.setProperty("rate", 160)    # words per minute
engine.setProperty("volume", 0.9)  # 0..1

# Choose a female voice if available (best effort)
voices = engine.getProperty("voices")
for v in voices:
    if "female" in v.name.lower() or "zira" in v.name.lower():
        engine.setProperty("voice", v.id)
        break

# ------------------------
# Speech recognizer init
# ------------------------
recognizer = sr.Recognizer()
mic = None
try:
    mic = sr.Microphone()
except Exception as e:
    logging.warning("Microphone not found / accessible: %s", e)
    mic = None


# ------------------------
# Intent Handlers
# ------------------------
def handle_time(_: str):
    now = datetime.now()
    resp = now.strftime("The time is %I:%M %p on %A, %B %d.")
    speak(resp, wait=True)
    logging.info("Handled time query: %s", resp)


def handle_open(command: str):
    # e.g. "open youtube" or "open google dot com"
    match = re.search(r"(?:open|go to) (.+)", command)
    if not match:
        speak("Sorry, I did not catch the site to open.")
        return
    target = match.group(1).strip()
    # If it looks like a domain, open it directly; else map or search
    if re.search(r"\.\w{2,}$", target):
        url = target if target.startswith(("http://", "https://")) else f"https://{target}"
    else:
        # map common names
        mapping = {
            "youtube": "https://youtube.com",
            "google": "https://google.com",
            "gmail": "https://mail.google.com",
            "github": "https://github.com",
        }
        url = mapping.get(target.lower(), f"https://www.google.com/search?q={target.replace(' ', '+')}")
    webbrowser.open(url)
    speak(f"Opening {target}")
    logging.info("Opened URL: %s for command %s", url, command)


def handle_wikipedia(command: str):
    # e.g. "wikipedia Albert Einstein" or "search wikipedia for ..."
    match = re.search(r"(?:search wikipedia for|wikipedia) (.+)", command)
    # Fall back to treating the whole command as the topic (used by the dispatcher fallback).
    topic = match.group(1).strip() if match else command.strip()
    if not topic:
        speak("What should I search on Wikipedia?")
        return
    try:
        summary = wikipedia.summary(topic, sentences=2, auto_suggest=True, redirect=True)
        speak(summary, wait=True)
        logging.info("Wikipedia summary for %s: %s", topic, summary)
    except wikipedia.exceptions.DisambiguationError as e:
        speak("The topic is ambiguous. Please be more specific.")
        logging.warning("Disambiguation for %s: %s", topic, e.options[:5])
    except Exception as e:
        speak("I couldn't find that on Wikipedia.")
        logging.exception("Wikipedia error for %s: %s", topic, e)


def handle_note(command: str):
    # "take a note buy milk and eggs" or "note buy eggs"
    match = re.search(r"(?:note|take a note|remember to) (.+)", command)
    if not match:
        speak("What would you like me to note?")
        return
    content = match.group(1).strip()
    note = {"text": content, "timestamp": datetime.now().isoformat()}
    notes_store.append(note)
    save_json(NOTES_FILE, notes_store)
    speak("Note saved.")
    logging.info("Saved note: %s", content)


def handle_show_notes(_: str):
    if not notes_store:
        speak("You have no saved notes.")
        return
    speak(f"You have {len(notes_store)} notes. Here are the latest three.")
    for n in notes_store[-3:]:
        t = datetime.fromisoformat(n["timestamp"]).strftime("%b %d at %I:%M %p")
        speak(f"On {t}, you wrote: {n['text']}")
    logging.info("Read notes aloud.")


def handle_calc(command: str):
    # e.g., "calculate 5 plus 7" or "what is 45 / 9"
    # Replace spoken operator words with symbols
    expr = command.lower()
    expr = expr.replace("plus", "+").replace("minus", "-").replace("times", "*").replace(" x ", " * ")
    expr = expr.replace("multiplied by", "*").replace("divided by", "/").replace("over", "/")
    # Extract the numeric expression (must start with a digit)
    m = re.search(r"\d[\d+\-*/().\s]*", expr)
    if not m:
        speak("I could not parse the expression.")
        return
    candidate = m.group(0)
    result = safe_eval(candidate)
    if result is None:
        speak("The expression is invalid or unsupported.")
        logging.warning("Invalid calc expr: %s", candidate)
    else:
        speak(f"The result is {result}")
        logging.info("Calculated %s = %s", candidate, result)


def handle_reminder(command: str):
    # "remind me to buy milk in 10 minutes" or "remind me to call mom at 18:30"
    match_in = re.search(r"remind me to (.+?) in (\d+)\s*(minute|minutes|hour|hours|second|seconds)", command)
    match_at = re.search(r"remind me to (.+?) at (\d{1,2}:\d{2})", command)
    if match_in:
        action = match_in.group(1).strip()
        qty = int(match_in.group(2))
        unit = match_in.group(3)
        seconds = qty * (3600 if unit.startswith("hour") else 60 if unit.startswith("minute") else 1)
        # schedule reminder
        t = threading.Timer(seconds, reminder_alert, args=(action,))
        t.daemon = True
        t.start()
        # persist reminder
        reminders_store.append({
            "action": action, "type": "in", "qty": qty, "unit": unit,
            "created": datetime.now().isoformat(),
        })
        save_json(REMINDERS_FILE, reminders_store)
        speak(f"Okay, I will remind you to {action} in {qty} {unit}.")
        logging.info("Scheduled reminder: %s in %s %s", action, qty, unit)
    elif match_at:
        action = match_at.group(1).strip()
        at_time = match_at.group(2)
        now = datetime.now()
        hh, mm = map(int, at_time.split(":"))
        target = now.replace(hour=hh, minute=mm, second=0, microsecond=0)
        if target < now:
            target += timedelta(days=1)  # next day
        seconds = (target - now).total_seconds()
        t = threading.Timer(seconds, reminder_alert, args=(action,))
        t.daemon = True
        t.start()
        reminders_store.append({
            "action": action, "type": "at", "time": at_time,
            "created": datetime.now().isoformat(),
        })
        save_json(REMINDERS_FILE, reminders_store)
        speak(f"Reminder set at {at_time} to {action}.")
        logging.info("Scheduled reminder at %s for %s", at_time, action)
    else:
        speak("Please tell me when to remind you, for example: remind me to call mom in 10 minutes, "
              "or remind me to call mom at 18:30.")


def reminder_alert(action: str):
    speak(f"Reminder: {action}", wait=True)
    logging.info("Reminder triggered: %s", action)


def handle_search_web(command: str):
    # "search for puppies" ("search wikipedia for ..." is handled earlier in INTENTS)
    match = re.search(r"(?:search for|search)\s*(.*)", command)
    if not match or not match.group(1).strip():
        speak("What should I search for?")
        return
    query = match.group(1).strip()
    url = f"https://www.google.com/search?q={query.replace(' ', '+')}"
    webbrowser.open(url)
    speak(f"Searching the web for {query}")
    logging.info("Performed web search for: %s", query)


def handle_exit(_: str):
    speak("Goodbye. Have a nice day!", wait=True)
    logging.info("Assistant exiting on user command.")
    raise SystemExit(0)


# Intent dispatch table: the first matching pattern wins.
INTENTS = [
    (re.compile(r"\b(time|what time|tell me the time)\b"), handle_time),
    (re.compile(r"\b(open|go to) "), handle_open),
    (re.compile(r"\b(wikipedia|search wikipedia)\b"), handle_wikipedia),
    (re.compile(r"\b(note|take a note|remember to)\b"), handle_note),
    (re.compile(r"\b(show notes|read notes|list notes)\b"), handle_show_notes),
    (re.compile(r"\b(remember|remind me to|set reminder)\b"), handle_reminder),
    (re.compile(r"\b(calculat\w*|what is|what's|compute|evaluate)\b"), handle_calc),
    (re.compile(r"\bsearch\b"), handle_search_web),
    (re.compile(r"\b(exit|quit|goodbye|stop assistant)\b"), handle_exit),
]


# ------------------------
# Main listen loop
# ------------------------
def recognize_speech_from_mic(recognizer: sr.Recognizer, microphone: sr.Microphone,
                              timeout: int = 5, phrase_time_limit: int = 8):
    """
    Capture audio from the microphone and return a (text, error) tuple.
    text is the lowercased transcription or None; error is None on success.
    """
    if microphone is None:
        raise RuntimeError("Microphone not configured or not available.")
    with microphone as source:
        # optional ambient noise calibration
        recognizer.adjust_for_ambient_noise(source, duration=0.8)
        logging.debug("Listening for command...")
        try:
            audio = recognizer.listen(source, timeout=timeout, phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            return None, "timeout"
    try:
        text = recognizer.recognize_google(audio)
        logging.info("Recognition success: %s", text)
        return text.lower(), None
    except sr.UnknownValueError:
        logging.info("Recognition: Unknown value")
        return None, "unintelligible"
    except sr.RequestError as e:
        logging.exception("Recognition request failed: %s", e)
        return None, "api_unavailable"
    except Exception as e:
        logging.exception("Recognition unexpected error: %s", e)
        return None, "error"


def parse_and_dispatch(command: str):
    """
    Given recognized command text, match an intent and call its handler.
    If none matches, fall back to Wikipedia or ask for clarification.
    """
    if not command:
        return
    # direct mapping: keyword-based intents
    for pattern, handler in INTENTS:
        if pattern.search(command):
            try:
                handler(command)
            except SystemExit:
                raise
            except Exception as e:
                logging.exception("Error while handling command '%s': %s", command, e)
                speak("Sorry, I encountered an error while processing your request.")
            return
    # fallback: if the user just said a short topic-like phrase (2-5 words), try Wikipedia
    try:
        if 1 < len(command.split()) <= 5:
            handle_wikipedia(command)
            return
    except Exception:
        pass
    # default: ask to repeat or offer help
    speak("Sorry, I did not understand that. You can say, 'open YouTube', 'search for cats', "
          "'remind me to call mom in 10 minutes', 'take a note', or 'what's the time'.")
    logging.info("No intent matched for: %s", command)


def main_loop():
    speak("Hello, I am your assistant. How can I help you today?")
    # keep listening until an exit command
    while True:
        try:
            text, error = recognize_speech_from_mic(recognizer, mic)
            if error == "timeout":
                continue  # no speech; keep listening
            if error == "unintelligible":
                continue  # optionally give a short prompt, or keep listening silently
            if error == "api_unavailable":
                speak("Speech recognition service is not available. Check your internet connection.")
                time.sleep(2)
                continue
            if text:
                print(f">>> You said: {text}")
                parse_and_dispatch(text)
        except KeyboardInterrupt:
            speak("Shutting down. Goodbye.", wait=True)
            break
        except SystemExit:
            break
        except Exception as e:
            logging.exception("Fatal error in main loop: %s", e)
            speak("I encountered an internal error. Restarting listening.")
            time.sleep(1)
            continue


if __name__ == "__main__":
    try:
        # Check microphone availability once before starting the loop
        if mic is None:
            speak("Microphone not available. Please install and configure microphone drivers and PyAudio.")
            logging.error("Microphone not available at startup.")
        else:
            main_loop()
    except Exception as e:
        logging.exception("Unhandled exception: %s", e)
        speak("An error occurred. See log for details.", wait=True)

6. Sample Output or Results

Example dialogues (what you say → assistant response):

You: "What time is it?" Assistant (speaks): "The time is 09:43 AM on Tuesday, October 28." You: "Take a note buy milk and eggs" Assistant: "Note saved." You: "Remind me to stretch in 10 minutes" Assistant: "Okay, I will remind you to stretch in 10 minutes." (10 minutes later assistant speaks): "Reminder: stretch." You: "Open YouTube" Assistant: "Opening YouTube." (opens browser) You: "Search Wikipedia for neural networks" Assistant: (reads 1–2 sentence summary of neural networks) You: "Calculate 45 divided by 9" Assistant: "The result is 5.0" Saved artifacts: ~/.voice_assistant/notes.json — saved notes with timestamps ~/.voice_assistant/reminders.json — scheduled reminders persisted ~/.voice_assistant/assistant.log — history and exceptions

7. Possible Enhancements

  • Offline Speech Recognition: integrate VOSK or another local model for offline STT (see the sketch after this list).
  • Natural Language Understanding: add an NLU module (Rasa / spaCy based intent/entity extraction) for richer commands.
  • Voice Activation (wake-word): integrate a keyword spotter (Snowboy, Porcupine) to wake on “Hey Assistant”.
  • Conversational Flow: maintain context between follow-up questions.
  • GUI: add a minimal Tkinter/Qt GUI to show transcripts and controls.
  • Integrations: connect to calendars (Google Calendar via OAuth), emails, smart home APIs, or task managers.
  • Robust scheduling: use persistent schedulers (APScheduler) and robust retry/notification mechanisms.
  • Security & Privacy: explicitly warn and add options for data deletion and local-only modes.