Autonomous human-following robot controlled by bare-hand gestures in crowded indoor environments.
This project aims to reduce physical workload for nurses by automating the task of pulling medical carts using a mobile robot.
The robot can identify a specific human target among many people and follow them, while the user controls the robot with natural bare-hand gestures (no devices, gloves, or markers required for control).
✔ Human-following in crowded hallways
✔ Identify the target person via a specific ArUco marker
✔ Automatically re-identify the target if it changes or is lost
✔ Recognize the user’s hand pose using skeleton estimation + depth sensing
✔ Toggle robot driving / stop mode with a single hand gesture
▶ Human following + gesture control (demo video): EBRL_demo.mp4
The robot pipeline (a minimal per-frame sketch follows this list):
- Detects people + ArUco markers
- Identifies the target person (marker-based)
- Tracks the user with DeepSORT
- Estimates body & hand skeleton
- Uses depth to choose the correct hand (filtering interference from others)
- Interprets hand gesture (palm) to toggle robot driving
- Controls motion based on distance from user
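A rough sketch of how these per-frame steps fit together. Every helper and attribute name here (detect_people_and_markers, deepsort, crop_user_hand, state.update_mode, and so on) is a placeholder for illustration, not the project's actual API:

```python
def process_frame(color, depth, state):
    """Hypothetical glue code for one RGB-D frame (all helpers are placeholders)."""
    detections = detect_people_and_markers(color)        # YOLOv4-tiny: people + ArUco markers
    tracks = deepsort.update(detections, color)          # DeepSORT track list

    # Marker-based (re-)identification of the user.
    if state.target_person_id is None:
        state.target_person_id = find_track_with_marker(tracks, detections)

    user = next((t for t in tracks if t.track_id == state.target_person_id), None)
    if user is None:                                      # target lost -> re-identify next frame
        state.target_person_id = None
        return stop_command()

    keypoints = estimate_body_skeleton(color)             # trt_pose
    hand_crop = crop_user_hand(color, depth, keypoints, user.bbox)
    gesture = classify_hand_gesture(hand_crop)            # trt_pose_hand
    state.update_mode(gesture)                            # open palm toggles driving

    if state.driving:
        return follow_command(distance_to_user(depth, user.bbox))
    return stop_command()
```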
- Custom dataset for person + ArUco markers
- Trained YOLOv4-tiny (Darknet → TensorFlow)
- Optimized for real-time inference on Jetson with TensorRT (one possible conversion route is sketched below)
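For the TensorRT step, one possible route is TF-TRT applied to the converted TensorFlow SavedModel on the Jetson. The paths, precision mode, and workspace size below are assumptions; the project may have used a different conversion pipeline:

```python
# TF-TRT sketch (TensorFlow 2.x). Paths and parameters are assumptions.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16,    # FP16 suits Jetson GPUs
    max_workspace_size_bytes=1 << 28,
)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="yolov4_tiny_savedmodel",   # hypothetical input path
    conversion_params=params,
)
converter.convert()
converter.save("yolov4_tiny_trt")                     # hypothetical output path
```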
- Track all people
- If a person’s bounding box contains the target marker → set as user
- Store their track_id as target_person_id (see the sketch below)
- If:
  - the target changes (keyboard event), or
  - the tracker loses the person
  → restart detection and re-identification
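A sketch of the marker-based identification and re-identification logic. It assumes OpenCV's contrib `aruco` module, DeepSORT tracks that expose `track_id` and `to_tlbr()`, and a hypothetical `TARGET_MARKER_ID`; the marker dictionary is also an assumption:

```python
import cv2

ARUCO_DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)  # dictionary is an assumption
ARUCO_PARAMS = cv2.aruco.DetectorParameters_create()
TARGET_MARKER_ID = 7  # hypothetical marker id worn by the user

def update_target(frame_bgr, tracks, target_person_id):
    """Return the track_id of the user, re-identifying when needed."""
    # Re-identification trigger: the tracker lost the person
    # (a keyboard "target changed" event would also reset this to None).
    if target_person_id is not None and all(t.track_id != target_person_id for t in tracks):
        target_person_id = None

    if target_person_id is None:
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        corners, ids, _ = cv2.aruco.detectMarkers(gray, ARUCO_DICT, parameters=ARUCO_PARAMS)
        if ids is not None:
            for marker_corners, marker_id in zip(corners, ids.flatten()):
                if marker_id != TARGET_MARKER_ID:
                    continue
                cx, cy = marker_corners.reshape(-1, 2).mean(axis=0)   # marker centre
                for t in tracks:
                    x1, y1, x2, y2 = t.to_tlbr()
                    if x1 <= cx <= x2 and y1 <= cy <= y2:             # marker inside the person box
                        return t.track_id
    return target_person_id
```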
- Extract body keypoints with trt_pose
- Validate that the detected skeleton belongs to the current user by checking that the neck keypoint lies inside their bounding box
- Use the wrist & elbow keypoints to crop a square region around the hand
- Resize the crop to 224×224 for the hand gesture classifier (trt_pose_hand), as sketched below
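A sketch of the neck check and the wrist-centred crop. The keypoint names ("neck", "right_wrist", "right_elbow"), their availability as pixel coordinates, and the scale factor that sizes the crop from the forearm length are all assumptions:

```python
import numpy as np
import cv2

def validate_and_crop_hand(keypoints, user_bbox, frame_bgr, scale=1.5):
    """Return a 224x224 hand crop, or None if the skeleton is not the user's."""
    neck = keypoints["neck"]
    wrist = keypoints["right_wrist"]
    elbow = keypoints["right_elbow"]

    x1, y1, x2, y2 = user_bbox
    # The skeleton belongs to the user only if its neck lies inside the user's box.
    if not (x1 <= neck[0] <= x2 and y1 <= neck[1] <= y2):
        return None

    # Square crop centred on the wrist, sized from the forearm length.
    half = int(scale * np.linalg.norm(np.asarray(wrist) - np.asarray(elbow)))
    cx, cy = int(wrist[0]), int(wrist[1])
    h, w = frame_bgr.shape[:2]
    xa, ya = max(cx - half, 0), max(cy - half, 0)
    xb, yb = min(cx + half, w), min(cy + half, h)
    crop = frame_bgr[ya:yb, xa:xb]
    if crop.size == 0:
        return None
    return cv2.resize(crop, (224, 224))   # input size of the hand gesture classifier
```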
When another person’s hand overlaps:
- Use RealSense depth input
- Compare depth of detected hands
- Select the nearest hand as the user’s hand
- Implemented with a min-heap priority queue for efficiency (sketched below)
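A sketch of the depth-based selection, assuming the aligned RealSense depth frame is available as a 16-bit numpy array with a known depth scale; the patch size and the use of the median depth are assumptions:

```python
import heapq
import numpy as np

def nearest_hand(depth_image, hand_centres, depth_scale=0.001, patch=5):
    """Return the index of the hand closest to the camera, or None."""
    heap = []
    h, w = depth_image.shape[:2]
    for hand_id, (x, y) in enumerate(hand_centres):
        x, y = int(x), int(y)
        xa, xb = max(x - patch, 0), min(x + patch, w)
        ya, yb = max(y - patch, 0), min(y + patch, h)
        window = depth_image[ya:yb, xa:xb].astype(np.float32) * depth_scale  # metres
        valid = window[window > 0]                     # drop pixels with missing depth
        if valid.size == 0:
            continue
        heapq.heappush(heap, (float(np.median(valid)), hand_id))  # min-heap keyed on depth
    if not heap:
        return None
    _, hand_id = heapq.heappop(heap)                   # nearest hand = the user's hand
    return hand_id
```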
| Gesture | Robot Mode |
|---|---|
| ✋ Palm shown (4s hold) | Toggle driving / stopping |
| Other gestures | Keep current mode |
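A sketch of the hold-to-toggle behaviour in the table above. The gesture label string and the use of `time.monotonic()` are assumptions; only the 4-second-hold logic is taken from the description:

```python
import time

class DriveToggle:
    """Toggle driving when an open palm is held continuously for 4 s."""
    HOLD_SECONDS = 4.0

    def __init__(self):
        self.driving = False
        self._palm_since = None

    def update(self, gesture):
        now = time.monotonic()
        if gesture == "palm":
            if self._palm_since is None:
                self._palm_since = now              # start the hold timer
            elif now - self._palm_since >= self.HOLD_SECONDS:
                self.driving = not self.driving     # toggle driving / stopping
                self._palm_since = None             # a new 4 s hold is needed to toggle again
        else:
            self._palm_since = None                 # other gestures keep the current mode
        return self.driving
```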
- Driving mode: robot follows the user and adjusts speed based on the distance measured by the RGB-D camera (see the velocity sketch below)
- Non-driving mode: robot stops safely
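A sketch of the distance-based speed control in driving mode as a ROS Melodic node publishing `geometry_msgs/Twist`. The topic name, node name, target distance, and gains are assumptions rather than the project's tuned values:

```python
import rospy
from geometry_msgs.msg import Twist

TARGET_DISTANCE_M = 1.0   # assumed following distance
LINEAR_GAIN = 0.6         # assumed proportional gains
ANGULAR_GAIN = 1.2
MAX_SPEED = 0.5

def follow_command(distance_m, bearing_rad, driving):
    """Build a Twist from the user's distance/bearing measured with the RGB-D camera."""
    cmd = Twist()
    if driving and distance_m > 0.0:
        error = distance_m - TARGET_DISTANCE_M
        cmd.linear.x = max(0.0, min(LINEAR_GAIN * error, MAX_SPEED))  # slow down near the user
        cmd.angular.z = ANGULAR_GAIN * bearing_rad                    # keep the user centred
    # Non-driving mode (or invalid depth) leaves a zero Twist: the robot stops safely.
    return cmd

if __name__ == "__main__":
    rospy.init_node("follow_controller")               # hypothetical node name
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    # In the perception loop: pub.publish(follow_command(dist, bearing, driving))
```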
- Ubuntu 18.04
- CUDA 10.2 / cuDNN 7.6.5 / NVIDIA Driver 470
- Anaconda, Python 3.7.11, TensorFlow 2.3.1, OpenCV 4.2
- Jupyter Lab, VSCode
- NVIDIA Jetson Xavier, JetPack 4.4
- CUDA 10.2, cuDNN 8.0.0, ROS Melodic
- Python 3.6.9
- TensorRT optimization for YOLOv4-tiny
✔ Project Manager (PM)
✔ Implemented user identification with DeepSORT
✔ Implemented gesture-based driving control using body + hand skeletons
✔ Data generation, model training & evaluation
✔ Real-time robot integration
- YOLOv4-tiny (Darknet → TensorFlow)
- DeepSORT
- trt_pose (body skeleton): https://github.com/NVIDIA-AI-IOT/trt_pose
- trt_pose_hand (hand skeleton): https://github.com/NVIDIA-AI-IOT/trt_pose_hand
- Expand gesture vocabulary (forward, reverse, turn)
- Multi-user handling with priority switching
- Real-time SLAM + obstacle navigation
