OpenVLA-FR3: Vision-Language-Action Robot Control

A complete ROS2 integration enabling natural language control of the Franka FR3 robotic arm using OpenVLA (7B vision-language-action model) with real-time reactive control via MoveIt Servo.

Overview

This project bridges large vision-language models with real-time robot control, enabling the Franka FR3 to understand natural language instructions and execute manipulation tasks autonomously. The system uses a quantized OpenVLA model for inference on consumer GPUs and MoveIt Servo for reactive Cartesian control.

Key Innovation: A delta action formulation combined with a VLA-Servo bridge pattern enables reactive, collision-aware motion while retaining the benefits of learned visuomotor policies.

System Architecture

            Natural Language Instruction + Camera Image (RGB)
                                    ↓
                      OpenVLA-7B (4-bit quantized)
                                    ↓
                        VLAAction (delta pose + gripper)
                                    ↓
                            VLA-Servo Bridge
                                    ↓
                    TwistStamped (velocity commands)
                                    ↓
                            MoveIt Servo
                                    ↓
                        Joint Trajectory Controller
                                    ↓
                        Gazebo Simulation / Real Hardware

Features

Core Capabilities

Vision-Language-Action Inference: OpenVLA-7B fine-tuned on LIBERO spatial tasks
4-bit Quantization: Runs on consumer GPUs (6GB VRAM) using BitsAndBytes
Real-Time Control: MoveIt Servo for reactive Cartesian velocity control at 30Hz
Delta Action Formulation: Relative pose changes enable reactive, closed-loop control
Collision Awareness: MoveIt planning scene monitoring with automatic collision avoidance
Natural Language Interface: Task specification via plain English instructions

Control Modes

Autonomous VLA Control: Vision-language-conditioned action prediction
Keyboard Teleoperation: Manual control for data collection and debugging
Dual Frame Control: Base frame or end-effector frame reference
Multiple Controllers: Position, velocity, impedance, and trajectory control

Development Tools

Docker Environment: Containerized setup with CUDA 12.1, ROS2 Humble, PyTorch
LIBERO-Compatible: Camera positioning and task environment match LIBERO benchmark
RViz Visualization: Interactive motion planning interface
Gazebo Simulation: Full physics simulation with RGB-D sensors

Architecture Components

Custom ROS2 Packages

1. `franka_openvla`

Main integration package containing:

openvla_node.py: VLA inference with 4-bit quantization
vla_servo_bridge.py: Translates VLA actions to Servo twist commands
keyboard_servo_teleop.py: Manual teleoperation for data collection
fr3.launch.py: Orchestrates entire system (Gazebo, MoveIt, Servo, VLA)
World files and object models (bins, cubes, tables)
Configuration files for Servo and controllers

2. `vla_interfaces`

Custom message definitions:

VLAAction.msg:
  std_msgs/Header header
  geometry_msgs/Vector3 delta_pos    # [dx, dy, dz] in meters
  geometry_msgs/Vector3 delta_rot    # [droll, dpitch, dyaw] in radians
  float32 gripper                     # 0.0 = open, 1.0 = closed
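
To illustrate how a 7-element action vector could map onto these fields, here is a minimal Python sketch. It assumes the vector is ordered [dx, dy, dz, droll, dpitch, dyaw, gripper]; the `DeltaAction` dataclass and `unpack_action` helper are hypothetical stand-ins for illustration, not the package's generated message class.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class DeltaAction:
    """Plain-Python stand-in for VLAAction (hypothetical, for illustration)."""
    delta_pos: Tuple[float, float, float]  # [dx, dy, dz] in meters
    delta_rot: Tuple[float, float, float]  # [droll, dpitch, dyaw] in radians
    gripper: float                         # 0.0 = open, 1.0 = closed


def unpack_action(action: Sequence[float]) -> DeltaAction:
    """Split a 7-element vector into position, rotation, and gripper parts.

    Assumed ordering: [dx, dy, dz, droll, dpitch, dyaw, gripper].
    """
    if len(action) != 7:
        raise ValueError(f"expected 7 values, got {len(action)}")
    values = [float(v) for v in action]
    return DeltaAction(
        delta_pos=(values[0], values[1], values[2]),
        delta_rot=(values[3], values[4], values[5]),
        gripper=values[6],
    )
```

In a ROS2 node, the same three groups would be copied into the `delta_pos`, `delta_rot`, and `gripper` fields of the real message before publishing.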

System Integration

OpenVLA Node

Model: openvla/openvla-7b-finetuned-libero-spatial
Input: RGB images from /rgbd_camera/image (640x480)
Output: Delta actions published to /vla/delta_actions
Parameters:
- instruction: Task description (e.g., "pick the cube and place in red bin")
- unnorm_key: Action unnormalization (default: libero_spatial)
- model_name: HuggingFace model path
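
The `instruction` parameter presumably ends up in the text prompt given to the model. The sketch below builds a prompt using the template from the OpenVLA project's published examples. Whether `openvla_node.py` uses exactly this template is an assumption, and `build_prompt` is a hypothetical helper name.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a task instruction in the prompt template from OpenVLA's examples.

    Assumption: the instruction is stripped and lower-cased before being
    inserted; the node's actual preprocessing is not shown in this README.
    """
    task = instruction.strip().lower()
    if not task:
        raise ValueError("instruction must not be empty")
    return f"In: What action should the robot take to {task}?\nOut:"
```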

VLA-Servo Bridge

Converts: VLAAction → TwistStamped
Safety: Velocity clamping (max linear: 0.3 m/s, max angular: 0.5 rad/s)
Timeout: Stops motion if no VLA action is received within 0.5 s
Keepalive: Maintains Servo connection with periodic zero commands
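
The timeout and keepalive behavior can be sketched as a small watchdog that holds the most recent command and falls back to a zero twist once it goes stale. This is an illustrative sketch only: the `CommandWatchdog` class, its method names, and the injectable clock are assumptions, not the bridge's actual code.

```python
import time
from typing import Callable, Optional, Tuple

Twist = Tuple[float, float, float, float, float, float]  # (vx, vy, vz, wx, wy, wz)
ZERO_TWIST: Twist = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)


class CommandWatchdog:
    """Holds the latest twist and falls back to zero after a timeout."""

    def __init__(self, timeout: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self._clock = clock
        self._last_twist: Twist = ZERO_TWIST
        self._last_stamp: Optional[float] = None

    def update(self, twist: Twist) -> None:
        """Record a fresh command derived from a VLA action."""
        self._last_twist = twist
        self._last_stamp = self._clock()

    def current(self) -> Twist:
        """Return the twist to publish on this control tick."""
        if self._last_stamp is None:
            return ZERO_TWIST  # nothing received yet: keepalive with zeros
        if self._clock() - self._last_stamp > self.timeout:
            return ZERO_TWIST  # command is stale: stop the arm
        return self._last_twist
```

In a ROS2 node, `update()` would be called from the action subscription callback and `current()` from a timer running at the Servo rate, so a stalled inference node results in zero commands rather than a held velocity.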

MoveIt Servo Configuration

Control rate: 30 Hz
Planning group: fr3_arm (7-DOF)
End-effector: fr3_hand_tcp
Collision checking: 10 Hz with singularity avoidance
Joint limits: Enforced with safety margins

Installation

Prerequisites

Ubuntu 22.04
NVIDIA GPU with CUDA support (6GB+ VRAM recommended)
Docker and Docker Compose with NVIDIA runtime

Setup

Clone repository:

git clone <repository-url>
cd openvla-fr3

Build Docker container:

docker-compose build

Launch container:

docker-compose up -d

Build ROS2 workspace (first time only):

docker exec -it vla_unified bash
source /opt/ros/humble/setup.bash
cd /ros2_ws
colcon build --symlink-install
source install/setup.bash

Usage

Launch Full System

Start the complete VLA control stack (Gazebo, MoveIt, Servo, OpenVLA):

docker exec vla_unified bash -c "
  source /opt/ros/humble/setup.bash &&
  source /ros2_ws/install/setup.bash &&
  ros2 launch franka_openvla fr3.launch.py
"

This launches:

Gazebo simulation with FR3 and manipulation environment
ROS2 controllers (joint state, arm, gripper)
MoveIt move_group with OMPL planning
MoveIt Servo for real-time control
RViz for visualization
OpenVLA inference node
VLA-Servo bridge
RGB-D camera with LIBERO-style positioning

Keyboard Teleoperation

Manual control for testing and data collection:

docker exec -it vla_unified bash -c "
  source /opt/ros/humble/setup.bash &&
  source /ros2_ws/install/setup.bash &&
  ros2 run franka_openvla keyboard_servo_teleop
"

Controls:

Linear: W/S (X), A/D (Y), Q/E (Z)
Angular: J/L (yaw), I/K (pitch), U/O (roll)
Frame Toggle: F (base frame ↔ end-effector frame)
Speed: +/- (increase/decrease velocity scaling)
Stop: SPACE
Quit: ESC
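
As a rough sketch of how these bindings could translate into twist commands, the table below maps each key to a unit direction that is then scaled by the current speed. The sign conventions (for example, W as +X) and the default speeds are assumptions; `KEY_BINDINGS` and `key_to_twist` are hypothetical names, not taken from `keyboard_servo_teleop.py`.

```python
from typing import Dict, Tuple

# (vx, vy, vz, wx, wy, wz) direction per key; signs are assumed, not confirmed.
KEY_BINDINGS: Dict[str, Tuple[int, int, int, int, int, int]] = {
    "w": (1, 0, 0, 0, 0, 0), "s": (-1, 0, 0, 0, 0, 0),   # X
    "a": (0, 1, 0, 0, 0, 0), "d": (0, -1, 0, 0, 0, 0),   # Y
    "q": (0, 0, 1, 0, 0, 0), "e": (0, 0, -1, 0, 0, 0),   # Z
    "u": (0, 0, 0, 1, 0, 0), "o": (0, 0, 0, -1, 0, 0),   # roll
    "i": (0, 0, 0, 0, 1, 0), "k": (0, 0, 0, 0, -1, 0),   # pitch
    "j": (0, 0, 0, 0, 0, 1), "l": (0, 0, 0, 0, 0, -1),   # yaw
}


def key_to_twist(key: str, linear_speed: float = 0.1,
                 angular_speed: float = 0.3) -> Tuple[float, ...]:
    """Return (vx, vy, vz, wx, wy, wz); SPACE and unknown keys give zeros."""
    direction = KEY_BINDINGS.get(key.lower(), (0, 0, 0, 0, 0, 0))
    linear = tuple(linear_speed * d for d in direction[:3])
    angular = tuple(angular_speed * d for d in direction[3:])
    return linear + angular
```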

Change VLA Task Instruction

Modify the instruction parameter in fr3.launch.py:

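from launch_ros.actions import Node

Node(
    package='franka_openvla',
    executable='openvla_node',
    parameters=[{
        # Natural language task passed to the OpenVLA node
        'instruction': 'pick up the cube and place it in the red bin',
        # Selects the action unnormalization statistics
        'unnorm_key': 'libero_spatial'
    }]
)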

Or pass as launch argument:

ros2 launch franka_openvla fr3.launch.py instruction:="move the object to the blue bin"
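
For the command-line form to take effect, the launch file has to declare the argument and forward it to the node. The fragment below is a minimal sketch of that wiring using the standard `launch` and `launch_ros` APIs. It shows only the OpenVLA node, and whether `fr3.launch.py` is structured this way is an assumption.

```python
from launch import LaunchDescription
from launch.actions import DeclareLaunchArgument
from launch.substitutions import LaunchConfiguration
from launch_ros.actions import Node


def generate_launch_description():
    # Declare the argument so `instruction:="..."` is accepted on the CLI.
    instruction_arg = DeclareLaunchArgument(
        'instruction',
        default_value='pick up the cube and place it in the red bin',
        description='Natural language task for the OpenVLA node',
    )

    openvla_node = Node(
        package='franka_openvla',
        executable='openvla_node',
        parameters=[{
            # Resolved at launch time from the CLI value or the default.
            'instruction': LaunchConfiguration('instruction'),
            'unnorm_key': 'libero_spatial',
        }],
    )

    return LaunchDescription([instruction_arg, openvla_node])
```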

Simulation Environment

The Gazebo world includes:

Work Table: Main manipulation surface (0.6m from robot base)
Colored Bins: Green, black, red, blue bins for sorting tasks
Target Cube: Manipulable object on table
RGB-D Camera: LIBERO-style agentview
- Position: 1.2m forward, 1.0m height
- Orientation: Looking at robot workspace
- Outputs: RGB image, depth, point cloud, camera info

Configuration

Key Configuration Files

config/servo_params.yaml: MoveIt Servo settings
- Velocity limits (linear: 0.4 m/s, rotational: 0.8 rad/s)
- Control rate (30 Hz)
- Safety factors (0.3x for conservative control)
- Collision checking parameters
config/moveit_controllers.yaml: Controller interfaces
- arm_controller (FollowJointTrajectory)
- gripper_controller (GripperCommand)
config/franka_gazebo_controllers.yaml: Gazebo ros2_control
- Joint state broadcaster
- Position, velocity, and impedance controllers
- Controller gains and limits

Adjusting VLA-Servo Bridge Parameters

Edit franka_openvla/vla_servo_bridge.py:

self.linear_scale = 0.5    # Velocity scaling (0.1-1.0)
self.angular_scale = 0.3   # Angular velocity scaling
self.max_linear = 0.3      # Max linear velocity (m/s)
self.max_angular = 0.5     # Max angular velocity (rad/s)
self.timeout = 0.5         # Command timeout (seconds)
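
To show how such parameters might interact, the sketch below converts a delta pose into a velocity command and clamps it. It assumes velocity = scale * delta / dt and clamps the vector norm rather than each axis, so the direction of motion is preserved. Both choices, along with the `clamp_norm` and `delta_to_twist` names, are assumptions rather than the bridge's confirmed behavior.

```python
import math
from typing import Sequence, Tuple


def clamp_norm(vec: Sequence[float], max_norm: float) -> Tuple[float, ...]:
    """Scale vec down so its Euclidean norm does not exceed max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm:
        return tuple(float(v) for v in vec)
    factor = max_norm / norm
    return tuple(v * factor for v in vec)


def delta_to_twist(delta_pos: Sequence[float], delta_rot: Sequence[float],
                   dt: float, linear_scale: float = 0.5,
                   angular_scale: float = 0.3, max_linear: float = 0.3,
                   max_angular: float = 0.5):
    """Convert a delta pose into clamped linear and angular velocities.

    Assumption: velocity = scale * delta / dt, where dt is the time over
    which the delta is meant to be executed.
    """
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    linear = clamp_norm([linear_scale * d / dt for d in delta_pos], max_linear)
    angular = clamp_norm([angular_scale * d / dt for d in delta_rot], max_angular)
    return linear, angular
```

With these defaults, a 2 cm step along X over 0.1 s yields roughly 0.1 m/s, while a 1 m step over the same interval is clamped to the 0.3 m/s limit.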

Development

Project Structure

openvla-fr3/
├── src/
│   ├── franka_openvla/           # Main integration package
│   │   ├── franka_openvla/
│   │   │   ├── openvla_node.py
│   │   │   ├── vla_servo_bridge.py
│   │   │   └── keyboard_servo_teleop.py
│   │   ├── launch/
│   │   │   └── fr3.launch.py
│   │   ├── config/
│   │   │   ├── servo_params.yaml
│   │   │   └── moveit_controllers.yaml
│   │   ├── worlds/
│   │   │   └── franka_world.sdf
│   │   └── models/                # Object models
│   ├── vla_interfaces/            # Custom messages
│   ├── franka_gazebo/             # Simulation support
│   ├── franka_fr3_moveit_config/  # MoveIt configuration
│   ├── franka_hardware/           # Real robot interface
│   ├── franka_gripper/            # Gripper control
│   └── franka_msgs/               # Franka messages
├── Dockerfile
├── docker-compose.yml
└── README.md

Building from Source

cd /ros2_ws
colcon build --symlink-install --packages-select vla_interfaces franka_openvla
source install/setup.bash

Running Individual Nodes

# OpenVLA inference only
ros2 run franka_openvla openvla_node --ros-args -p instruction:="pick up the cube"

# VLA-Servo bridge only
ros2 run franka_openvla vla_servo_bridge

# Keyboard teleoperation only
ros2 run franka_openvla keyboard_servo_teleop

Technical Details

OpenVLA Model

Architecture: Fused vision encoder (DINOv2 + SigLIP) and projector feeding a Llama 2 7B language model that outputs discretized action tokens
Parameters: 7 billion (4-bit quantized to ~3.5GB memory)
Training: Fine-tuned on LIBERO spatial manipulation tasks
Inference Speed: ~5-10 Hz on consumer GPUs (RTX 3060+)
Action Space: 7-DOF delta actions (3 position + 3 orientation + 1 gripper)
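
The ~3.5GB figure follows from simple arithmetic on the weight storage alone. The helper below is a back-of-the-envelope sketch; `weight_memory_gb` is a hypothetical name, and the estimate ignores activations, quantization metadata, and any layers kept at higher precision.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(7e9, 16)  # 14.0
int4_gb = weight_memory_gb(7e9, 4)   # 3.5
```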

Control Pipeline

Perception: RGB camera image captured at 30 Hz
Inference: OpenVLA predicts delta action from image + instruction
Translation: VLA-Servo bridge converts to twist commands
Planning: MoveIt Servo computes collision-free joint velocities
Execution: Joint trajectory controller sends commands to robot
Feedback: Loop closes with new camera observation

Safety Features

Velocity Clamping: Multi-level limits (VLA bridge + Servo + controller)
Collision Checking: Real-time monitoring with planning scene
Singularity Avoidance: Automatic damping near singularities
Command Timeout: Stops the robot if the VLA node stalls or crashes and no new action arrives within the timeout
Joint Limits: Enforced with configurable safety margins
Emergency Stop: Keyboard SPACE key or ROS service call

kalaiselvan-t/openvla-fr3
