*(Demo video: mogev2-trt.mp4)*
A ROS2 node for MoGeV2 depth estimation using TensorRT for real-time inference. The node subscribes to camera image and camera info topics and directly publishes both a metric depth image and a PointCloud2 point cloud.
- Real-time metric depth estimation using MoGeV2 with TensorRT acceleration
- Point cloud generation from metric depth image
- Debug visualization with colormap options
- Configurable precision (FP16/FP32)
> [!IMPORTANT]
> This repository is open-sourced and maintained by the Institute for Automotive Engineering (ika) at RWTH Aachen University.
> We cover a wide variety of research topics within our Vehicle Intelligence & Automated Driving domain.
> If you would like to learn more about how we can support your automated driving or robotics efforts, feel free to reach out to us!
> 📧 opensource@ika.rwth-aachen.de
- Tested with image `nvcr.io/nvidia/tensorrt:25.08-py3`
  - Ubuntu 24.04, ROS2 Jazzy
  - CUDA 13
  - TensorRT 10.9
- Tested with image `nvcr.io/nvidia/tensorrt:25.03-py3`
  - Ubuntu 24.04, ROS2 Jazzy
  - CUDA 12.8.1
  - TensorRT 10.9
Depending on your driver and CUDA version, you need to select the appropriate base image.
Subscribed topics:
- `~/input/image` (sensor_msgs/Image): Input camera image
- `~/input/camera_info` (sensor_msgs/CameraInfo): Camera calibration info

Published topics:
- `~/output/depth_image` (sensor_msgs/Image): Depth image (32FC1 format)
- `~/output/point_cloud` (sensor_msgs/PointCloud2): Generated point cloud
- `~/output/depth_image_debug` (sensor_msgs/Image): Debug depth visualization (if enabled)
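As an example of consuming these topics, a minimal node that reads the 32FC1 depth image might look like the sketch below (node and topic names are illustrative; the `~/...` names above resolve under the node's namespace):

```cpp
// Example consumer: sample the depth at the image center (names illustrative).
#include <cstring>
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/image.hpp>

class DepthListener : public rclcpp::Node {
public:
  DepthListener() : Node("depth_listener") {
    sub_ = create_subscription<sensor_msgs::msg::Image>(
        "/moge_trt/output/depth_image", rclcpp::SensorDataQoS(),
        [this](sensor_msgs::msg::Image::ConstSharedPtr msg) {
          // 32FC1: one float per pixel; msg->step is the row stride in bytes.
          const size_t row = msg->height / 2, col = msg->width / 2;
          float depth;
          std::memcpy(&depth, &msg->data[row * msg->step + col * sizeof(float)],
                      sizeof(float));
          RCLCPP_INFO(get_logger(), "depth at image center: %.2f m", depth);
        });
  }

private:
  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr sub_;
};

int main(int argc, char** argv) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<DepthListener>());
  rclcpp::shutdown();
  return 0;
}
```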
Parameters:
- `onnx_path` (string): Path to MoGe ONNX model file
- `precision` (string): Inference precision ("fp16" or "fp32")
- `enable_debug` (bool): Enable debug visualization output
- `debug_colormap` (string): Colormap for depth visualization
- `debug_filepath` (string): Directory to save debug images
- `write_colormap` (bool): Save debug images to disk
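For reference, these map onto standard `rclcpp` parameter declarations; a sketch with illustrative defaults (not the node's actual ones):

```cpp
// Sketch of declaring the parameters above with rclcpp (defaults illustrative).
#include <rclcpp/rclcpp.hpp>
#include <string>

class MogeTrtParams : public rclcpp::Node {
public:
  MogeTrtParams() : Node("moge_trt") {
    onnx_path_ = declare_parameter<std::string>("onnx_path", "models/moge.onnx");
    precision_ = declare_parameter<std::string>("precision", "fp16");
    enable_debug_ = declare_parameter<bool>("enable_debug", false);
    debug_colormap_ = declare_parameter<std::string>("debug_colormap", "JET");
    debug_filepath_ = declare_parameter<std::string>("debug_filepath", "/tmp/moge_debug");
    write_colormap_ = declare_parameter<bool>("write_colormap", false);
  }

private:
  std::string onnx_path_, precision_, debug_colormap_, debug_filepath_;
  bool enable_debug_{false}, write_colormap_{false};
};
```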
```bash
# Launch with default settings
ros2 launch moge_trt moge_trt.launch.py

# Launch with remapped topics
ros2 launch moge_trt moge_trt.launch.py \
  input_image_topic:=/your_camera/image_raw \
  input_camera_info_topic:=/your_camera/camera_info \
  output_depth_topic:=/moge/depth \
  output_point_cloud_topic:=/moge/points

# Launch with a custom parameter file
ros2 launch moge_trt moge_trt.launch.py \
  params_file:=src/moge_trt/config/debug.param.yaml
```
- Obtain MoGe ONNX model: Follow the instructions to export the ONNX model here
- Place model file: Put the ONNX file in the `models/` directory
- Update configuration: Modify `config/moge_trt.param.yaml` with the correct model path
Expected model input format:
- Input shape: [1, 3, 291, 518] (batch, channels, height, width)
- Input type: float32
- Value range: [0, 1] (normalized)
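As an illustration of this contract, a preprocessing helper along the following lines would produce a matching input tensor. This is a sketch assuming OpenCV and an RGB-ordered model input, not the node's actual code:

```cpp
// Illustrative preprocessing: resize to the 518x291 model input, scale pixel
// values to [0, 1], and reorder interleaved HWC into planar CHW.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<float> preprocess(const cv::Mat& bgr) {
  constexpr int W = 518, H = 291;
  cv::Mat rgb, resized, f32;
  cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);      // assuming the model expects RGB
  cv::resize(rgb, resized, cv::Size(W, H));       // cv::Size is (width, height)
  resized.convertTo(f32, CV_32FC3, 1.0 / 255.0);  // [0, 255] -> [0, 1]
  std::vector<float> chw(3 * H * W);
  cv::Mat planes[3] = {cv::Mat(H, W, CV_32F, chw.data()),
                       cv::Mat(H, W, CV_32F, chw.data() + H * W),
                       cv::Mat(H, W, CV_32F, chw.data() + 2 * H * W)};
  cv::split(f32, planes);  // channel planes land directly in our CHW buffer
  return chw;              // copy this into the TensorRT input binding
}
```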
Expected model outputs:
- `points`: [1, 291, 518, 3] - 3D points
- `mask`: [1, 291, 518] - validity mask
- `metric_scale`: [1] - scale factor
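For illustration, combining these outputs into a metric depth buffer could look like the following sketch (ours, not the node's code; the per-frame shift alignment described further below is omitted):

```cpp
// Illustrative post-processing: scale the Z component of valid points to
// obtain metric depth; invalid pixels become NaN, matching 32FC1 conventions.
#include <limits>
#include <vector>

std::vector<float> toMetricDepth(const float* points,  // [1, H, W, 3] output
                                 const float* mask,    // [1, H, W] output
                                 float metric_scale,   // [1] output
                                 int H = 291, int W = 518) {
  std::vector<float> depth(H * W, std::numeric_limits<float>::quiet_NaN());
  for (int i = 0; i < H * W; ++i) {
    if (mask[i] > 0.5f) {
      depth[i] = points[i * 3 + 2] * metric_scale;  // Z component, in meters
    }
  }
  return depth;
}
```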
We precompile and provide a Docker image with all dependencies installed and ready to use. You can pull it from Docker Hub:
```bash
docker pull tillbeemelmanns/ros2-moge-trt:latest-dev
```
If you want to play back rosbags and visualize the output in rviz2, make sure to install the required packages:
```bash
apt update && apt install -y \
  ros-jazzy-rviz2 \
  ros-jazzy-rosbag2 \
  ros-jazzy-rosbag2-storage-mcap
```
We recommend using the Docker image in combination with our other tools for Docker and ROS.
- docker-ros automatically builds minimal container images of ROS applications
- docker-run is a CLI tool for simplified interaction with Docker images
```bash
# From your ROS2 workspace
colcon build --packages-select moge_trt --cmake-args -DCMAKE_BUILD_TYPE=Release
source install/setup.bash
```
The node is optimized for real-time performance.
Performance on an RTX 6000:
- VITS: 37 FPS
- VITB: 31 FPS
- VITL: 17 FPS
```
Input Image + Camera Info
            ↓
  Preprocessing (CPU/GPU)
            ↓
 TensorRT Inference (GPU)
            ↓
   Postprocessing (CPU)
            ↓
Depth Image + Point Cloud
```
The MoGe model outputs an affine-invariant point map, which has an unknown scale and shift. When using a camera with known intrinsics, we don't need to recover the focal length, but we still must solve for the Z-axis shift for each frame to align the point cloud correctly.
This is an optimization problem: find the shift that results in the lowest reprojection error (formalized in the sketch after this list). There are two common ways to measure this error:
- L2 norm (leading to a least-squares solution), which is used in the original MoGe repository
- L1 norm (leading to a least absolute deviations solution), which we use here for its simplicity and robustness.
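To make the two formulations concrete, here is one way to write them down. This is our notation and assumes a standard pinhole model with intrinsics $f_x, f_y, c_x, c_y$; the node's internal formulation may differ in detail. For a predicted point $(X_i, Y_i, Z_i)$ observed at pixel $(u_i, v_i)$, the reprojection residuals under a candidate shift $t$ are

$$
r_i^u(t) = f_x \frac{X_i}{Z_i + t} + c_x - u_i, \qquad
r_i^v(t) = f_y \frac{Y_i}{Z_i + t} + c_y - v_i.
$$

The L2 formulation solves $\min_t \sum_i \left( r_i^u(t)^2 + r_i^v(t)^2 \right)$, while the L1 formulation solves $\min_t \sum_i \left( |r_i^u(t)| + |r_i^v(t)| \right)$. Each residual vanishes at a single root, e.g. $r_i^u(t) = 0$ at $t = f_x X_i / (u_i - c_x) - Z_i$, which is what the median-based L1 solution below builds on.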
The L2 norm approach minimizes the sum of the squares of the reprojection errors: min Σ (error)².
- Key Characteristic: It is highly sensitive to outliers. If the model produces a few points with very large errors, the squaring of these errors causes them to dominate the calculation. The resulting `shift` will be heavily skewed to accommodate these bad points, often at the expense of a better fit for the majority of good points.
- Implementation: Typically requires an iterative numerical solver (e.g., Levenberg-Marquardt, as used in `scipy.optimize.least_squares`).
The L1 norm approach minimizes the sum of the absolute values of the reprojection errors: min Σ |error|.
- Key Characteristic: It is robust to outliers. Since the error is not squared, outlier points contribute linearly to the total error and do not dominate the solution. This is analogous to how the median is more robust to extreme values than the mean.
- Implementation: For this specific 1D optimization problem, there is a highly efficient, non-iterative solution: calculating the median of the roots of the individual error terms (sketched in the code below).
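A minimal sketch of the median-of-roots idea, assuming the pinhole formulation above; the function `solveShiftL1` and its interface are illustrative, not the node's actual API:

```cpp
// Illustrative L1 shift recovery: the shift is taken as the median of the
// roots of the individual reprojection residuals.
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

double solveShiftL1(const std::vector<std::array<double, 3>>& points,  // predicted (X, Y, Z)
                    const std::vector<std::array<double, 2>>& pixels,  // observed (u, v)
                    double fx, double fy, double cx, double cy) {
  std::vector<double> roots;
  roots.reserve(2 * points.size());
  for (std::size_t i = 0; i < points.size(); ++i) {
    const double X = points[i][0], Y = points[i][1], Z = points[i][2];
    const double u = pixels[i][0], v = pixels[i][1];
    // Root of the u-residual: fx*X/(Z+t) + cx - u = 0  =>  t = fx*X/(u-cx) - Z
    if (std::abs(u - cx) > 1e-6) roots.push_back(fx * X / (u - cx) - Z);
    // Root of the v-residual: fy*Y/(Z+t) + cy - v = 0  =>  t = fy*Y/(v-cy) - Z
    if (std::abs(v - cy) > 1e-6) roots.push_back(fy * Y / (v - cy) - Z);
  }
  if (roots.empty()) return 0.0;  // degenerate input: no usable residuals
  // Median via nth_element: average O(n), no full sort, no external solver.
  const std::size_t mid = roots.size() / 2;
  std::nth_element(roots.begin(), roots.begin() + mid, roots.end());
  return roots[mid];
}
```

In practice, one would only feed in points flagged valid by the model's `mask` output.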
For this ROS2 node, we use the L1 Norm approach for three reasons:
- Robustness: Deep learning models can produce erroneous outputs. The L1 method provides stable and reliable `shift` estimates even in the presence of such outliers.
- Performance: The direct median calculation is significantly faster and more predictable, making it ideal for real-time applications.
- Simplicity: It avoids adding heavy external dependencies (like Ceres Solver) for a C++ implementation.
- TensorRT engine building fails:
  - Check CUDA/TensorRT compatibility
  - Verify ONNX model format
  - Increase workspace size
- Point cloud appears incorrect:
  - Verify camera_info calibration
  - Check coordinate frame conventions
  - Validate depth value units
- Performance issues:
  - Enable FP16 precision
  - Check GPU memory usage
Enable debug mode to troubleshoot:
```yaml
enable_debug: true
debug_colormap: "JET"
debug_colormap_min_depth: 0.0
debug_colormap_max_depth: 50.0
write_colormap: true
```
This will publish colorized depth images and save them to disk for inspection.
This project is licensed under the MIT License - see the LICENSE file for details.
Thanks to the following repositories for inspiration: