This project implements a gesture- and voice-based control system for an IoT Smart Home automation platform. It enables users to control smart devices such as doors, lights, and appliances using intuitive hand gestures and voice commands.
The system integrates real-time hand gesture recognition and voice command processing to provide a seamless user experience for controlling IoT devices. It leverages open-source libraries and MQTT for communication, ensuring scalability and extensibility.
- Hand Gesture Recognition: Utilizes MediaPipe for real-time hand landmark detection.
- Supported Gestures:
  - Thumb Up: Unlocks the front door.
  - Thumb Down: Locks the front door.
  - Open Palm: Turns all switches on.
  - Number One (index finger up): Turns all switches off.
  - Number Two (victory sign): Turns all lights on.
  - Rock On (index and pinky up): Turns all lights off.
- Cooldown Mechanism: 1.5-second delay to prevent repeated gesture triggers.
- Debug Mode: Displays fingertip coordinates, gesture status, and MQTT connection for troubleshooting.
- Visual Feedback: Shows action text (e.g., "UNLOCKING DOOR") and gesture legend on the video feed.
- Voice Command Processing: Records 5-second audio clips (16kHz, mono) and processes them using Rhasspy for speech-to-text and intent recognition.
- Intent Recognition: Converts spoken commands into actionable intents for smart home control.
- Continuous Voice Listening: Loops to detect and process voice commands until interrupted.
- Rhasspy Integration: Uses a local Rhasspy instance (http://localhost:12101) for voice processing.
- User Feedback: Provides console logs for voice command processing.
- MQTT Integration: Enables remote monitoring and control via MQTT.
- Error Handling: Manages audio, speech-to-text, natural language understanding, and MQTT errors with retries and logging.
- Configurable Device Name: Customizable door name (e.g., "Front Door") for MQTT messages.
- Real-Time Operation: Ensures responsive gesture and voice command processing.
- Extensible Design: Modular code allows easy addition of new gestures or voice intents.
- Open-Source Libraries: Built with OpenCV, MediaPipe, SoundDevice, Paho MQTT, and Rhasspy.
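The gesture half of the feature list can be pictured with a short sketch. Everything below is illustrative rather than the project's actual code: the heuristic (a fingertip higher in the frame than its middle joint means "finger up") is a common simplification over MediaPipe's 21-landmark hand model, the function names are our own, and the thumb gestures are omitted because they need an x-axis test.

```python
# Hypothetical sketch of the gesture-classification and cooldown steps.
# Landmarks are (x, y) tuples indexed 0-20 as in MediaPipe Hands; y grows
# downward, so a smaller y means "higher in the frame".
import time

# Fingertip index and the PIP joint below it, per finger (MediaPipe numbering).
FINGERS = {"index": (8, 6), "middle": (12, 10), "ring": (16, 14), "pinky": (20, 18)}

def fingers_up(landmarks):
    """Return the set of extended fingers (tip above its PIP joint)."""
    return {name for name, (tip, pip) in FINGERS.items()
            if landmarks[tip][1] < landmarks[pip][1]}

def classify(landmarks):
    """Map a landmark list to one of the README's gesture names, or None."""
    up = fingers_up(landmarks)
    if up == {"index", "middle", "ring", "pinky"}:
        return "open_palm"      # all switches on
    if up == {"index"}:
        return "number_one"     # all switches off
    if up == {"index", "middle"}:
        return "number_two"     # all lights on
    if up == {"index", "pinky"}:
        return "rock_on"        # all lights off
    return None

class Cooldown:
    """1.5-second gate so a held gesture does not fire repeatedly."""
    def __init__(self, seconds=1.5):
        self.seconds, self.last = seconds, 0.0
    def ready(self):
        now = time.monotonic()
        if now - self.last >= self.seconds:
            self.last = now
            return True
        return False
```

In the main loop, an action would only be triggered when `classify(...)` returns a gesture and `cooldown.ready()` is true.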
- Webcam or camera module
- Microphone
- Raspberry Pi (version 3 or newer recommended)
- Python 3.7 or newer
- MQTT broker (default: mqtt.local)
- Docker and Docker Compose (for Rhasspy setup)
sudo apt update
sudo apt install -y docker.io docker-compose
sudo systemctl start docker
sudo systemctl enable docker
sudo usermod -aG docker $USER
newgrp docker
Alternatively, Docker Desktop is available at https://www.docker.com/products/docker-desktop/
Note: Log out and back in for group changes to take effect.
- Install Python:
- Download and install Python from python.org.
- Ensure "Add Python to PATH" is checked during installation.
- Install Dependencies:
pip install opencv-python mediapipe numpy paho-mqtt
- Install Visual C++ Redistributable (required for MediaPipe):
- Download and install from Microsoft's website.
- Clone or Download the Code:
- Obtain the project repository from the source.
- Install Python and Required Packages:
sudo apt update
sudo apt install -y python3 python3-pip python3-dev
sudo apt install -y libopencv-dev python3-opencv
sudo apt install -y cmake protobuf-compiler
sudo apt install -y v4l-utils
- Install Dependencies:
pip install mediapipe numpy paho-mqtt
Note: If MediaPipe installation fails, try:
pip3 install --upgrade pip
pip3 install mediapipe --no-binary mediapipe
- Clone or Download the Code:
- Obtain the project repository from the source.
- Open a Command Prompt.
- Run the main script:
python gesture_mqtt.py
- To monitor MQTT messages (in a separate Command Prompt):
python mqtt_listener.py
- Open a Terminal.
- Make scripts executable:
chmod +x gesture_mqtt.py mqtt_listener.py
- Run the main script:
python3 gesture_mqtt.py
- To monitor MQTT messages (in a separate Terminal):
python3 mqtt_listener.py
- Start the application.
- Position yourself in front of the webcam.
- Perform supported gestures to trigger actions.
- Press ESC to exit the application.
Messages are published in JSON format:
{
"name": "device_name",
"state": "action"
}
Example:
- Door control:
{"name": "Front Door", "state": "unlock"}
- Switch control:
{"name": "CMD_SWITCH_ALL", "state": "on"}
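For illustration, here is a minimal publisher for these payloads using Paho MQTT. The topic `central_main/control` and the public test broker are taken from the surrounding sections; the helper names are our own, so treat this as a sketch rather than the project's code.

```python
# Minimal MQTT publisher sketch for the JSON command format shown above.
import json

BROKER = "test.mosquitto.org"   # public test broker used by gesture_mosquitto.py
TOPIC = "central_main/control"  # topic that mqtt_listener.py subscribes to

def build_payload(name, state):
    """Serialize a command into the JSON shape shown above."""
    return json.dumps({"name": name, "state": state})

def send(name, state, broker=BROKER):
    """Connect, publish one command message, and disconnect."""
    import paho.mqtt.client as mqtt  # pip install paho-mqtt
    # On paho-mqtt >= 2.0, pass mqtt.CallbackAPIVersion.VERSION1 to Client().
    client = mqtt.Client()
    client.connect(broker, 1883)
    client.publish(TOPIC, build_payload(name, state))
    client.disconnect()
```

For example, `send("Front Door", "unlock")` publishes the door-control message shown above.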
- gesture_mosquitto.py - Includes all commands. Uses the public MQTT broker test.mosquitto.org for testing.
- door_mosquitto.py - Similar to gesture_mosquitto.py, but includes only the commands relevant to door control.
- mqtt_listener.py - Used for testing with the two files above. On Windows, we verified that gesture recognition and MQTT message sending work correctly by running mqtt_listener.py in one Command Prompt and gesture_mosquitto.py or door_mosquitto.py in another.
- door_mqtt.py - Same as door_mosquitto.py, but with the MQTT broker address changed to mqtt.local.
- gesture_control.py - The final script, with the same broker change, for recognizing all gesture-based commands.
MQTT Listener:
- The mqtt_listener.py script connects to the MQTT broker, subscribes to the central_main/control topic, and prints received messages for debugging.
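A minimal sketch of such a listener, assuming the same topic and the public test broker; the callback wiring and helper names are illustrative, not a copy of mqtt_listener.py.

```python
# Sketch of an MQTT listener: subscribe to the control topic and print messages.
import json

TOPIC = "central_main/control"

def parse_message(payload):
    """Decode one received control message into (name, state)."""
    data = json.loads(payload)
    return data["name"], data["state"]

def run_listener(broker="test.mosquitto.org"):
    """Block forever, printing every command published to the control topic."""
    import paho.mqtt.client as mqtt  # pip install paho-mqtt
    # On paho-mqtt >= 2.0, pass mqtt.CallbackAPIVersion.VERSION1 to Client().
    client = mqtt.Client()
    client.on_connect = lambda c, userdata, flags, rc: c.subscribe(TOPIC)
    client.on_message = lambda c, userdata, msg: print(parse_message(msg.payload))
    client.connect(broker, 1883)
    client.loop_forever()
```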
- Run Rhasspy with Docker:
docker run -d \
  --network=host \
  --name rhasspy \
  -v "$HOME/.config/rhasspy/profiles:/profiles" \
  rhasspy/rhasspy \
  --user-profiles /profiles \
  --profile en
- Access the Rhasspy Web UI at http://localhost:12101/ and copy the contents of sentences.ini into the Sentences tab.
- Install Mosquitto (MQTT Broker):
sudo apt update
sudo apt install -y mosquitto mosquitto-clients
sudo systemctl start mosquitto
sudo systemctl enable mosquitto
- Verify Mosquitto is Running:
sudo systemctl status mosquitto
- Install Paho MQTT:
pip install paho-mqtt
- Check Received MQTT Messages:
mosquitto_sub -h test.mosquitto.org -p 1883 -t "rhasspy/intent/recognized" -v
pip install sounddevice numpy scipy
- Open the Rhasspy Web UI.
- Navigate to the Sentences Tab.
- Copy and paste the contents of sentences.ini into the editor to define recognized intents.
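For reference, Rhasspy's sentences.ini groups template sentences under intent headings. The intent names and phrasings below are illustrative examples in that format, not the project's actual sentences.ini:

```ini
[UnlockDoor]
unlock the front door
open the front door

[LockDoor]
lock the front door

[LightsOn]
turn on (all | the) lights
```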
- jarvis.py: Under development; will include wake word functionality.
- voiceControl.py: Processes voice commands without a wake word.
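A sketch of what a no-wake-word loop like voiceControl.py might do: record a 5-second, 16 kHz mono clip and post it to Rhasspy's speech-to-intent HTTP endpoint. The endpoint and audio parameters come from this README; the function names are our own, and the `requests` dependency is an assumption not listed in the install steps.

```python
# Sketch: record 5 s of audio and ask the local Rhasspy instance for an intent.
import io

RHASSPY_URL = "http://localhost:12101/api/speech-to-intent"
RATE, SECONDS = 16000, 5  # 16 kHz mono, 5-second clips (per the README)

def extract_intent(nlu_response):
    """Pull the intent name out of a Rhasspy NLU response (None if unrecognized)."""
    return nlu_response.get("intent", {}).get("name") or None

def record_wav():
    """Capture a 5-second mono clip and return it as WAV bytes."""
    import sounddevice as sd      # pip install sounddevice
    from scipy.io import wavfile  # pip install scipy
    audio = sd.rec(RATE * SECONDS, samplerate=RATE, channels=1, dtype="int16")
    sd.wait()
    buf = io.BytesIO()
    wavfile.write(buf, RATE, audio)
    return buf.getvalue()

def recognize(wav_bytes):
    """POST the clip to Rhasspy and return the recognized intent name."""
    import requests               # assumption: pip install requests
    resp = requests.post(RHASSPY_URL, data=wav_bytes,
                         headers={"Content-Type": "audio/wav"})
    resp.raise_for_status()
    return extract_intent(resp.json())
```

The continuous-listening behavior would then be a loop calling `recognize(record_wav())` until interrupted.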
To confirm Rhasspy receives voice commands via MQTT:
mosquitto_sub -h test.mosquitto.org -p 1883 -t "rhasspy/intent/recognized" -v
- Ensure Docker and Mosquitto are running.
- Configure sentences.ini in the Rhasspy Web UI.
- Run voiceControl.py to process voice commands.
- To run the gesture control system on Ubuntu, first run:
pip install -r ubuntu_requirements.txt
- Unit test scripts are available in the test folder.
- Review and adapt these scripts to verify functionality or customize them for your setup.