Autonomous human-following robot controlled by bare-hand gestures in crowded indoor environments.
This project aims to reduce physical workload for nurses by automating the task of pulling medical carts using a mobile robot.
The robot can identify a specific human target among many people and follow them, while the user controls the robot with natural bare-hand gestures (no devices, gloves, or markers required for control).
✔ Human-following in crowded hallways
✔ Identify the target person via a specific ArUco marker
✔ Automatically re-identify the target if it changes or is lost
✔ Recognize the user’s hand pose using skeleton estimation + depth sensing
✔ Toggle robot driving / stop mode with a single hand gesture
▶ Human following + gesture control (demo video): EBRL_demo.mp4
The robot pipeline (a minimal per-frame sketch follows this list):
- Detects people + ArUco markers
- Identifies the target person (marker-based)
- Tracks the user with DeepSORT
- Estimates body & hand skeleton
- Uses depth to choose the correct hand (filtering interference from others)
- Interprets hand gesture (palm) to toggle robot driving
- Controls motion based on distance from user
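A rough sketch of how these per-frame steps fit together. Every helper and attribute name here (detect_people_and_markers, deepsort, crop_user_hand, state.update_mode, and so on) is a placeholder for illustration, not the project's actual API:

```python
def process_frame(color, depth, state):
    """Hypothetical glue code for one RGB-D frame (all helpers are placeholders)."""
    detections = detect_people_and_markers(color)        # YOLOv4-tiny: people + ArUco markers
    tracks = deepsort.update(detections, color)          # DeepSORT track list

    # Marker-based (re-)identification of the user.
    if state.target_person_id is None:
        state.target_person_id = find_track_with_marker(tracks, detections)

    user = next((t for t in tracks if t.track_id == state.target_person_id), None)
    if user is None:                                      # target lost -> re-identify next frame
        state.target_person_id = None
        return stop_command()

    keypoints = estimate_body_skeleton(color)             # trt_pose
    hand_crop = crop_user_hand(color, depth, keypoints, user.bbox)
    gesture = classify_hand_gesture(hand_crop)            # trt_pose_hand
    state.update_mode(gesture)                            # open palm toggles driving

    if state.driving:
        return follow_command(distance_to_user(depth, user.bbox))
    return stop_command()
```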
- Custom dataset for person + ArUco markers
- Trained YOLOv4-tiny (Darknet → TensorFlow)
- Optimized for real-time inference on Jetson with TensorRT (one possible conversion route is sketched below)
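For the TensorRT step, one possible route is TF-TRT applied to the converted TensorFlow SavedModel on the Jetson. The paths, precision mode, and workspace size below are assumptions; the project may have used a different conversion pipeline:

```python
# TF-TRT sketch (TensorFlow 2.x). Paths and parameters are assumptions.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16,    # FP16 suits Jetson GPUs
    max_workspace_size_bytes=1 << 28,
)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="yolov4_tiny_savedmodel",   # hypothetical input path
    conversion_params=params,
)
converter.convert()
converter.save("yolov4_tiny_trt")                     # hypothetical output path
```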
- Track all people
- If a person’s bounding box contains the target marker → set as user
- Store their track_id as target_person_id (see the sketch below)
- If:
  - the target changes (keyboard event), or
  - the tracker loses the person
  → restart detection and re-identification
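A sketch of the marker-based identification and re-identification logic. It assumes OpenCV's contrib `aruco` module, DeepSORT tracks that expose `track_id` and `to_tlbr()`, and a hypothetical `TARGET_MARKER_ID`; the marker dictionary is also an assumption:

```python
import cv2

ARUCO_DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)  # dictionary is an assumption
ARUCO_PARAMS = cv2.aruco.DetectorParameters_create()
TARGET_MARKER_ID = 7  # hypothetical marker id worn by the user

def update_target(frame_bgr, tracks, target_person_id):
    """Return the track_id of the user, re-identifying when needed."""
    # Re-identification trigger: the tracker lost the person
    # (a keyboard "target changed" event would also reset this to None).
    if target_person_id is not None and all(t.track_id != target_person_id for t in tracks):
        target_person_id = None

    if target_person_id is None:
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        corners, ids, _ = cv2.aruco.detectMarkers(gray, ARUCO_DICT, parameters=ARUCO_PARAMS)
        if ids is not None:
            for marker_corners, marker_id in zip(corners, ids.flatten()):
                if marker_id != TARGET_MARKER_ID:
                    continue
                cx, cy = marker_corners.reshape(-1, 2).mean(axis=0)   # marker centre
                for t in tracks:
                    x1, y1, x2, y2 = t.to_tlbr()
                    if x1 <= cx <= x2 and y1 <= cy <= y2:             # marker inside the person box
                        return t.track_id
    return target_person_id
```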
- Extract body keypoints with trt_pose
- Validate that the detected skeleton belongs to the current user by checking that the neck keypoint lies inside their bounding box
- Use the wrist & elbow keypoints to crop a square region around the hand
- Resize the crop to 224×224 for the hand gesture classifier (trt_pose_hand), as sketched below
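A sketch of the neck check and the wrist-centred crop. The keypoint names ("neck", "right_wrist", "right_elbow"), their availability as pixel coordinates, and the scale factor that sizes the crop from the forearm length are all assumptions:

```python
import numpy as np
import cv2

def validate_and_crop_hand(keypoints, user_bbox, frame_bgr, scale=1.5):
    """Return a 224x224 hand crop, or None if the skeleton is not the user's."""
    neck = keypoints["neck"]
    wrist = keypoints["right_wrist"]
    elbow = keypoints["right_elbow"]

    x1, y1, x2, y2 = user_bbox
    # The skeleton belongs to the user only if its neck lies inside the user's box.
    if not (x1 <= neck[0] <= x2 and y1 <= neck[1] <= y2):
        return None

    # Square crop centred on the wrist, sized from the forearm length.
    half = int(scale * np.linalg.norm(np.asarray(wrist) - np.asarray(elbow)))
    cx, cy = int(wrist[0]), int(wrist[1])
    h, w = frame_bgr.shape[:2]
    xa, ya = max(cx - half, 0), max(cy - half, 0)
    xb, yb = min(cx + half, w), min(cy + half, h)
    crop = frame_bgr[ya:yb, xa:xb]
    if crop.size == 0:
        return None
    return cv2.resize(crop, (224, 224))   # input size of the hand gesture classifier
```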
When another person’s hand overlaps:
- Use RealSense depth input
- Compare depth of detected hands
- Select the nearest hand as the user’s hand
- Implemented with a min-heap priority queue for efficiency (sketched below)
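A sketch of the depth-based selection, assuming the aligned RealSense depth frame is available as a 16-bit numpy array with a known depth scale; the patch size and the use of the median depth are assumptions:

```python
import heapq
import numpy as np

def nearest_hand(depth_image, hand_centres, depth_scale=0.001, patch=5):
    """Return the index of the hand closest to the camera, or None."""
    heap = []
    h, w = depth_image.shape[:2]
    for hand_id, (x, y) in enumerate(hand_centres):
        x, y = int(x), int(y)
        xa, xb = max(x - patch, 0), min(x + patch, w)
        ya, yb = max(y - patch, 0), min(y + patch, h)
        window = depth_image[ya:yb, xa:xb].astype(np.float32) * depth_scale  # metres
        valid = window[window > 0]                     # drop pixels with missing depth
        if valid.size == 0:
            continue
        heapq.heappush(heap, (float(np.median(valid)), hand_id))  # min-heap keyed on depth
    if not heap:
        return None
    _, hand_id = heapq.heappop(heap)                   # nearest hand = the user's hand
    return hand_id
```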
| Gesture | Robot Mode |
|---|---|
| ✋ Palm shown (4s hold) | Toggle driving / stopping |
| Other gestures | Keep current mode |
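A sketch of the hold-to-toggle behaviour in the table above. The gesture label string and the use of `time.monotonic()` are assumptions; only the 4-second-hold logic is taken from the description:

```python
import time

class DriveToggle:
    """Toggle driving when an open palm is held continuously for 4 s."""
    HOLD_SECONDS = 4.0

    def __init__(self):
        self.driving = False
        self._palm_since = None

    def update(self, gesture):
        now = time.monotonic()
        if gesture == "palm":
            if self._palm_since is None:
                self._palm_since = now              # start the hold timer
            elif now - self._palm_since >= self.HOLD_SECONDS:
                self.driving = not self.driving     # toggle driving / stopping
                self._palm_since = None             # a new 4 s hold is needed to toggle again
        else:
            self._palm_since = None                 # other gestures keep the current mode
        return self.driving
```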
- Driving mode: robot follows the user and adjusts speed based on the distance measured by the RGB-D camera (see the velocity sketch below)
- Non-driving mode: robot stops safely
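A sketch of the distance-based speed control in driving mode as a ROS Melodic node publishing `geometry_msgs/Twist`. The topic name, node name, target distance, and gains are assumptions rather than the project's tuned values:

```python
import rospy
from geometry_msgs.msg import Twist

TARGET_DISTANCE_M = 1.0   # assumed following distance
LINEAR_GAIN = 0.6         # assumed proportional gains
ANGULAR_GAIN = 1.2
MAX_SPEED = 0.5

def follow_command(distance_m, bearing_rad, driving):
    """Build a Twist from the user's distance/bearing measured with the RGB-D camera."""
    cmd = Twist()
    if driving and distance_m > 0.0:
        error = distance_m - TARGET_DISTANCE_M
        cmd.linear.x = max(0.0, min(LINEAR_GAIN * error, MAX_SPEED))  # slow down near the user
        cmd.angular.z = ANGULAR_GAIN * bearing_rad                    # keep the user centred
    # Non-driving mode (or invalid depth) leaves a zero Twist: the robot stops safely.
    return cmd

if __name__ == "__main__":
    rospy.init_node("follow_controller")               # hypothetical node name
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    # In the perception loop: pub.publish(follow_command(dist, bearing, driving))
```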
- Ubuntu 18.04
- CUDA 10.2 / cuDNN 7.6.5 / NVIDIA Driver 470
- Anaconda, Python 3.7.11, TensorFlow 2.3.1, OpenCV 4.2
- Jupyter Lab, VSCode
- NVIDIA Jetson Xavier, JetPack 4.4
- CUDA 10.2, cuDNN 8.0.0, ROS Melodic
- Python 3.6.9
- TensorRT optimization for YOLOv4-tiny
✔ Project Manager (PM)
✔ Implemented user identification with DeepSORT
✔ Implemented gesture-based driving control using body + hand skeletons
✔ Data generation, model training & evaluation
✔ Real-time robot integration
- YOLOv4-tiny (Darknet → TensorFlow)
- DeepSORT
- trt_pose (body skeleton): https://github.com/NVIDIA-AI-IOT/trt_pose
- trt_pose_hand (hand skeleton): https://github.com/NVIDIA-AI-IOT/trt_pose_hand
- Expand gesture vocabulary (forward, reverse, turn)
- Multi-user handling with priority switching
- Real-time SLAM + obstacle navigation
