RealTime-Object-Detection

Autonomous human-following robot controlled by bare-hand gestures in crowded indoor environments.

✅ Overview

This project aims to reduce physical workload for nurses by automating the task of pulling medical carts using a mobile robot.
The robot can identify a specific human target among many people and follow them, while the user controls the robot with natural bare-hand gestures (no devices, gloves, or markers required for control).

✔ Human-following in crowded hallways
✔ Identify the target person by a specific ArUco marker
✔ Automatically re-identify the target if it changes or is lost
✔ Recognize the user’s hand pose using skeleton estimation + depth sensing
✔ Toggle robot driving / stop mode with a single hand gesture


✅ System Demo

Human following + gesture control (demo video)

EBRL_demo.mp4

✅ Architecture

EBRL_vis (system architecture diagram)

The robot:

  1. Detects people + ArUco markers
  2. Identifies the target person (marker-based)
  3. Tracks the user with DeepSORT
  4. Estimates body & hand skeleton
  5. Uses depth to choose the correct hand (filtering interference from others)
  6. Interprets hand gesture (palm) to toggle robot driving
  7. Controls motion based on distance from user
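
A minimal sketch of how these steps could sit together in one per-frame loop is shown below. All helper callables and the state/track formats are hypothetical placeholders, not the project's actual API (the real system runs on ROS Melodic on the Jetson).

```python
# Hypothetical per-frame orchestration of steps 1-7. Every helper passed in
# here is a placeholder for the project's real components (YOLOv4-tiny,
# DeepSORT, trt_pose, trt_pose_hand); none of these names come from the repo.

def run_frame(rgb, depth, state,
              detect, track, marker_owner, body_pose, hand_gesture, follow):
    """Process one RGB-D frame and return a motion command."""
    detections = detect(rgb)                       # 1. people + ArUco markers
    tracks = track(detections)                     # 3. DeepSORT tracks

    # 2. Identify (or re-identify) the target: the track whose box holds the marker.
    if state.get("target_id") is None:
        state["target_id"] = marker_owner(detections, tracks)

    target = next((t for t in tracks if t["id"] == state["target_id"]), None)
    if target is None:                             # target lost -> re-identify next frame
        state["target_id"] = None
        return "stop"

    keypoints = body_pose(rgb, target)             # 4. body & hand skeleton
    gesture = hand_gesture(rgb, depth, keypoints)  # 5-6. depth-filtered hand gesture
    if gesture == "palm_held":                     # palm toggles driving / stopped
        state["driving"] = not state.get("driving", False)

    # 7. follow the user at a distance-dependent speed, or hold position
    return follow(target, depth) if state.get("driving") else "stop"
```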

✅ Algorithm Details

1. Person & Marker Detection

  • Custom dataset for person + ArUco markers
  • Trained YOLOv4-tiny (Darknet → TensorFlow)
  • Optimized for real-time inference on Jetson (TensorRT)
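
For illustration only: the project detects both persons and markers with the trained YOLOv4-tiny model, so the snippet below is a swapped-in alternative showing how ArUco marker IDs and centres could be read with OpenCV's contrib `cv2.aruco` module (OpenCV 4.2-era API). The dictionary choice is an assumption, not taken from the repository.

```python
import cv2

# Illustration only: not the project's YOLO-based marker detection.
# DICT_4X4_50 is an assumed marker dictionary.
ARUCO_DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
ARUCO_PARAMS = cv2.aruco.DetectorParameters_create()

def find_marker_centers(bgr_frame):
    """Return {marker_id: (cx, cy)} for every ArUco marker found in the frame."""
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, ARUCO_DICT, parameters=ARUCO_PARAMS)
    if ids is None:
        return {}
    centers = {}
    for marker_id, quad in zip(ids.flatten(), corners):
        cx, cy = quad.reshape(4, 2).mean(axis=0)
        centers[int(marker_id)] = (float(cx), float(cy))
    return centers
```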

2. User Identification with DeepSORT

  • Track all people
  • If a person’s bounding box contains the target marker → set as user
  • Store their track_id as target_person_id
  • If the target changes (keyboard event) or the tracker loses the person → restart detection and re-identification
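
A minimal sketch of this identification rule, assuming a simple `{track_id: (x1, y1, x2, y2)}` track format (the real DeepSORT output format is not shown in this README):

```python
# Sketch only: boxes are (x1, y1, x2, y2) in pixels, marker_center is (x, y).

def identify_user(tracks, marker_center):
    """Return the track_id whose bounding box contains the target marker, else None."""
    if marker_center is None:
        return None
    mx, my = marker_center
    for track_id, (x1, y1, x2, y2) in tracks.items():
        if x1 <= mx <= x2 and y1 <= my <= y2:
            return track_id
    return None

def update_target(state, tracks, marker_center, target_changed=False):
    """Keep target_person_id; restart identification when the target changes or is lost."""
    if target_changed or state.get("target_person_id") not in tracks:
        state["target_person_id"] = identify_user(tracks, marker_center)
    return state.get("target_person_id")
```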

3. Body Skeleton Localization

  • Extract body keypoints with trt_pose
  • Validate that the detected skeleton belongs to the current user by checking the neck keypoint inside their bounding box
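
A minimal sketch of that validation check, with keypoint and box formats assumed for illustration:

```python
# A trt_pose skeleton is accepted as the user's only if its neck keypoint
# falls inside the user's DeepSORT bounding box (formats assumed here).

def skeleton_belongs_to_user(neck_xy, user_box):
    """neck_xy: (x, y) neck keypoint in pixels; user_box: (x1, y1, x2, y2)."""
    if neck_xy is None or user_box is None:
        return False
    x, y = neck_xy
    x1, y1, x2, y2 = user_box
    return x1 <= x <= x2 and y1 <= y <= y2
```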

4. Hand Region Extraction

  • Use wrist & elbow keypoints to crop a square region
  • Resize to 224×224 for hand gesture classifier (trt_pose_hand)
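
A sketch of the crop step. Only the wrist/elbow inputs and the 224×224 output size come from this README; the exact square placement and the scale factor are assumptions:

```python
import cv2
import numpy as np

# The square's centre (slightly past the wrist along the forearm) and the 1.2
# scale factor are assumptions; only the inputs and output size are from above.

def crop_hand(frame, wrist_xy, elbow_xy, scale=1.2, out_size=224):
    wrist = np.asarray(wrist_xy, dtype=np.float32)
    elbow = np.asarray(elbow_xy, dtype=np.float32)
    forearm = wrist - elbow
    side = max(int(scale * np.linalg.norm(forearm)), 1)

    # Centre the square a little past the wrist, in the elbow -> wrist direction.
    center = wrist + 0.5 * forearm
    x1, y1 = (center - side / 2).astype(int)
    x2, y2 = (center + side / 2).astype(int)

    h, w = frame.shape[:2]
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, w), min(y2, h)
    if x2 <= x1 or y2 <= y1:
        return None
    return cv2.resize(frame[y1:y2, x1:x2], (out_size, out_size))
```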

5. Distinguishing the User’s Hand (Depth-based Filtering)

When another person’s hand overlaps:

  • Use RealSense depth input
  • Compare depth of detected hands
  • Select the nearest hand as the user’s hand
  • Implemented using a min-heap priority queue for efficiency
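
A minimal sketch of that selection step using Python's `heapq` min-heap; the hand-centre format and the median sampling window are assumptions:

```python
import heapq
import numpy as np

# Sketch of the depth-based hand selection: push (depth, index) onto a min-heap
# and pop the closest hand. The 5x5 median window is an assumed detail.

def nearest_hand(depth_image, hand_centers, patch=2):
    """Return the (x, y) hand centre closest to the camera, or None."""
    heap = []
    for idx, (x, y) in enumerate(hand_centers):
        x, y = int(round(x)), int(round(y))
        window = depth_image[max(y - patch, 0):y + patch + 1,
                             max(x - patch, 0):x + patch + 1]
        valid = window[window > 0]      # RealSense reports 0 where depth is missing
        if valid.size == 0:
            continue
        # (depth, idx) tuples keep heap ordering well-defined on equal depths
        heapq.heappush(heap, (float(np.median(valid)), idx))
    if not heap:
        return None
    _, best = heapq.heappop(heap)
    return hand_centers[best]
```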

6. Gesture-Based Motion Control

  • ✋ Palm shown and held for 4 s → toggle between driving and stopped mode
  • Any other gesture → keep the current mode
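
A minimal sketch of the toggle logic; the class and gesture label names are hypothetical, and only the 4-second palm hold rule comes from this README:

```python
import time

class DriveToggle:
    """Toggle driving/stopped once per continuous ~4 s palm hold (hypothetical names)."""
    HOLD_SECONDS = 4.0

    def __init__(self):
        self.driving = False
        self._palm_since = None     # when the current palm hold started
        self._toggled = False       # toggle at most once per hold

    def update(self, gesture, now=None):
        """Feed the latest gesture label; returns the current driving flag."""
        now = time.monotonic() if now is None else now
        if gesture == "palm":
            if self._palm_since is None:
                self._palm_since, self._toggled = now, False
            elif not self._toggled and now - self._palm_since >= self.HOLD_SECONDS:
                self.driving = not self.driving
                self._toggled = True
        else:
            self._palm_since = None  # any other gesture keeps the current mode
        return self.driving
```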

7. Driving Modes

  • Driving mode: robot follows user and adjusts speed based on distance (RGB-D camera)
  • Non-driving mode: robot stops safely
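
A minimal sketch of a distance-proportional speed rule for driving mode; the following distance, gain, and speed limit are made-up values, not the project's tuning:

```python
# Only the idea (speed adjusted from the RGB-D distance to the user, full stop
# in non-driving mode) comes from the README; all numbers below are assumed.

def forward_speed(distance_m, driving,
                  target_m=1.0, gain=0.6, max_speed=0.5):
    """Return a forward velocity command in m/s (0.0 when stopped or too close)."""
    if not driving or distance_m is None:
        return 0.0
    error = distance_m - target_m          # positive when the user is far ahead
    speed = gain * error
    return float(min(max(speed, 0.0), max_speed))
```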

✅ Development Environment

Training / Development Server

  • Ubuntu 18.04
  • CUDA 10.2 / cuDNN 7.6.5 / NVIDIA Driver 470
  • Anaconda, Python 3.7.11, TensorFlow 2.3.1, OpenCV 4.2
  • Jupyter Lab, VSCode

Robot Platform

  • NVIDIA Jetson Xavier, JetPack 4.4
  • CUDA 10.2, cuDNN 8.0.0, ROS Melodic
  • Python 3.6.9
  • TensorRT optimization for YOLOv4-tiny

✅ Technologies

  • Python, TensorFlow, OpenCV
  • TensorRT, CUDA / cuDNN
  • ROS Melodic on NVIDIA Jetson Xavier (JetPack 4.4)
  • Intel RealSense RGB-D camera


✅ Sijeong's Role

✔ Project Manager (PM)
✔ Implemented user identification with DeepSORT
✔ Implemented gesture-based driving control using body + hand skeletons
✔ Data generation, model training & evaluation
✔ Real-time robot integration


✅ Reference Models & Libraries

  • YOLOv4-tiny (Darknet)
  • DeepSORT
  • trt_pose / trt_pose_hand

✅ Future Work

  • Expand gesture vocabulary (forward, reverse, turn)
  • Multi-user handling with priority switching
  • Real-time SLAM + obstacle navigation
