Add voice interactions with Gemini Live and ros-mcp-server to Gemini example. #115
Closed
tracelarue
wants to merge
14
commits into
robotmcp:develop
from
tracelarue:Gemini-Live-with-ros-mcp-server
b90dc09
Added Gemini Live with ros-mcp-server example to Gemini example.
tracelarue 2412663
renamed to gemini_client.py, readme updates
tracelarue fcc86f4
removed mcp_config.json
tracelarue a170655
removed project inside of repository
tracelarue 31fcd4e
added requirements.txt
tracelarue f3c1d6e
removed mcp_handler.py
tracelarue 1f8c825
Updated gemini_client.py
tracelarue 68afc99
Updated readme
tracelarue d3eeb5c
ruff format fixes
tracelarue 0e8bc41
Updated readme to use uv
tracelarue 5b82396
Updated gemini_client.py to pull from mcp_config.json.
tracelarue 0d49d95
Readme updated af testing
tracelarue db536b5
Merge branch 'develop' into Gemini-Live-with-ros-mcp-server
stex2005 753a554
Update .gitignore
stex2005
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
# Gemini Live with ROS MCP Server | ||
|
||
Control ROS robots with natural language voice commands using Google's Gemini Live API. | ||
|
||
**Prerequisites:** See the [installation instructions](../../../docs/installation.md) for detailed setup steps. | |
|
||
**Tested in:** Ubuntu 22.04, Python 3.10, ROS 2 Humble | |
(This example only works on Ubuntu.) | |
|
||
## Quick Setup | ||
|
||
1. **Install ROS MCP Server**: Follow the [installation guide](../../../docs/installation.md) | ||
|
||
2. **Install system dependencies** (required for audio): | ||
```bash | ||
sudo apt-get update | ||
sudo apt-get install python3-dev portaudio19-dev | ||
``` | ||
|
||
3. **Install additional dependencies for Gemini Live**: | ||
|
||
```bash | ||
# Navigate to the ros-mcp-server root directory | ||
cd ros-mcp-server | ||
|
||
# Install the additional dependencies needed for Gemini Live | ||
uv pip install google-genai pyaudio python-dotenv mss exceptiongroup taskgroup | ||
``` | ||
|
||
**Note**: The main ros-mcp-server project already includes most dependencies (mcp, opencv-python, pillow). We only need to add the Gemini-specific packages. | ||
|
||
4. **Get Google API Key**: Visit [Google AI Studio](https://aistudio.google.com) and create an API key | ||
|
||
5. **Create a `.env` file in the `gemini_live` folder**: | ||
```bash | ||
cd examples/2_gemini/gemini_live | ||
``` | ||
|
||
```env | ||
GOOGLE_API_KEY="your_google_api_key_here" | ||
``` | ||
Replace `your_google_api_key_here` with your API key. | |
|
||
6. **Create `mcp_config.json` in the `gemini_live` folder**: | |
Replace `/absolute/path/to/ros-mcp-server` with the absolute path to your ros-mcp-server checkout. | |
```json | ||
{ | ||
"mcpServers": { | ||
"ros-mcp-server": { | ||
"command": "uv", | ||
"args": [ | ||
"--directory", | ||
"/absolute/path/to/ros-mcp-server", | ||
"run", | ||
"server.py" | ||
] | ||
} | ||
} | ||
} | ||
``` | ||
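
To illustrate how a client could turn the `mcp_config.json` above into a launch command, here is a minimal Python sketch. It is not the code in `gemini_client.py`; the helper names `server_command` and `has_placeholder_path` are hypothetical, and only the JSON layout shown above is taken from this README.

```python
import json


def server_command(config, server_name="ros-mcp-server"):
    """Return [command, *args] for one entry under "mcpServers".

    `config` is the parsed content of mcp_config.json (a dict).
    """
    try:
        entry = config["mcpServers"][server_name]
    except KeyError as exc:
        raise KeyError(f"no '{server_name}' entry under 'mcpServers'") from exc
    return [entry["command"], *entry.get("args", [])]


def has_placeholder_path(argv):
    """True if the README's placeholder path was left unedited."""
    return any(arg.startswith("/absolute/path/to/") for arg in argv)


def load_config(path="mcp_config.json"):
    """Read and parse the config file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `server_command(load_config())` from the `gemini_live` folder would yield `["uv", "--directory", "<your path>", "run", "server.py"]`, and `has_placeholder_path` flags a config where the example path was never replaced.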
|
||
## Usage | ||
|
||
**Start Gemini Live:** | ||
```bash | ||
# Navigate to the gemini_live folder | ||
cd ros-mcp-server/examples/2_gemini/gemini_live | ||
|
||
# Run the client (with defaults: no video, audio responses, mic muting enabled) | ||
uv run gemini_client.py | ||
``` | ||
|
||
**Command-line options:** | ||
|
||
**Video modes** (`--video`): | ||
- `--video=none` - Audio only (default) | ||
- `--video=camera` - Include camera feed | ||
- `--video=screen` - Include screen capture | ||
|
||
**Response modes** (`--responses`): | ||
- `--responses=TEXT` - Text responses only | ||
- `--responses=AUDIO` - Audio responses (default) | ||
|
||
**Microphone muting** (`--active-muting`): | ||
- `--active-muting=true` - Mute mic during audio playback (default, prevents echo/feedback). Recommended if not using headphones. | ||
- `--active-muting=false` - Keep mic active during audio playback | ||
|
||
**Example usage:** | ||
```bash | ||
uv run gemini_client.py --video=camera --responses=TEXT --active-muting=false | ||
``` | ||
Type `q` + Enter to quit. | ||
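
The options above could be parsed roughly as follows. This is a sketch using Python's standard `argparse`, assuming only the flag names, choices, and defaults documented in this README; the helper names `str_to_bool` and `build_parser` are hypothetical, and the actual parsing in `gemini_client.py` may differ.

```python
import argparse


def str_to_bool(value):
    """Parse the 'true'/'false' values accepted by --active-muting."""
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Gemini Live client options (sketch)")
    parser.add_argument("--video", choices=["none", "camera", "screen"], default="none")
    parser.add_argument("--responses", choices=["TEXT", "AUDIO"], default="AUDIO")
    parser.add_argument("--active-muting", type=str_to_bool, default=True)
    return parser
```

With this sketch, `build_parser().parse_args(["--video=camera", "--active-muting=false"])` gives `video="camera"`, `responses="AUDIO"`, and `active_muting=False`.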
|
||
## Test with Turtlesim | ||
|
||
**Start rosbridge and turtlesim**: | ||
```bash | ||
# Terminal 1: Launch rosbridge | ||
ros2 launch rosbridge_server rosbridge_websocket_launch.xml | ||
``` | ||
```bash | ||
# Terminal 2: Start turtlesim | ||
ros2 run turtlesim turtlesim_node | ||
``` | ||
|
||
**Try these voice commands:** | ||
- "Connect to the robot on ip _ and port _ " | ||
- "What ROS topics are available?" | ||
- "Move the turtle forward at 1 m/s and 0 rad/s" | ||
- "Rotate the turtle at 3 rad/s" | ||
- "Change the pen color to red" | ||
|
||
|
||
See [Turtlesim Tutorial](../../1_turtlesim/README.md) for more examples. | ||
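
For intuition about what a command like "Move the turtle forward at 1 m/s and 0 rad/s" turns into, the sketch below builds a rosbridge `publish` message carrying a `geometry_msgs/Twist` for turtlesim's `/turtle1/cmd_vel` topic. The function name `twist_publish_message` is hypothetical, and whether ros-mcp-server emits exactly this payload is an assumption; the sketch only shows the general shape of such a message.

```python
import json


def twist_publish_message(linear_x, angular_z, topic="/turtle1/cmd_vel"):
    """Build a rosbridge 'publish' operation carrying a Twist message."""
    return {
        "op": "publish",
        "topic": topic,
        "msg": {
            "linear": {"x": float(linear_x), "y": 0.0, "z": 0.0},
            "angular": {"x": 0.0, "y": 0.0, "z": float(angular_z)},
        },
    }


# JSON text that would be sent over the rosbridge WebSocket connection.
payload = json.dumps(twist_publish_message(1.0, 0.0))
```

The "Rotate the turtle at 3 rad/s" command would correspond to `twist_publish_message(0.0, 3.0)` under the same assumptions.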
|
||
## Troubleshooting | ||
|
||
**Not responding to voice?** | ||
- Check microphone permissions and volume | ||
- Test: `arecord -d 5 test.wav && aplay test.wav` | ||
|
||
**Robot not moving?** | ||
- Verify robot/simulation is running | ||
- Make sure rosbridge is running; if it is not, start it with: `ros2 launch rosbridge_server rosbridge_websocket_launch.xml` | |
- Check that the expected topics are listed: `ros2 topic list` | |
- Ask: "List all available tools" to verify MCP connection | ||
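
A quick way to confirm that something is listening on the rosbridge port is a plain TCP connection test. The sketch below assumes rosbridge runs on the same machine on its default WebSocket port 9090; adjust `host` and `port` if your setup differs. The helper name `port_is_open` is hypothetical, and an open port only shows that a server is listening, not that it is rosbridge.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

For example, `port_is_open("127.0.0.1", 9090)` returning `False` suggests rosbridge is not running or is bound to a different port.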
|
||
**API key errors?** | ||
- Verify `.env` file exists with correct key | ||
- Check key is active in Google AI Studio | ||
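
To check the `.env` file without starting the client, a small standard-library sketch like the one below can help. It is a simplified parser for `KEY="value"` lines and does not replace `python-dotenv`; the helper names `read_env_value` and `api_key_problem` are hypothetical.

```python
def read_env_value(text, key):
    """Return the value assigned to `key` in .env-style text, or None."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip("\"'")
    return None


def api_key_problem(text, key="GOOGLE_API_KEY"):
    """Return a description of the problem, or None if the key looks set."""
    value = read_env_value(text, key)
    if value is None:
        return f"{key} is not set"
    if not value or value == "your_google_api_key_here":
        return f"{key} is empty or still the placeholder"
    return None
```

Passing the contents of `.env` (for example `open(".env").read()`) to `api_key_problem` returns `None` when a non-placeholder key is present. It cannot tell whether the key is valid or active; that still has to be checked in Google AI Studio.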
|
||
**Dependency issues?** | ||
- If you get import errors, make sure you installed the additional dependencies: `uv pip install google-genai pyaudio python-dotenv mss exceptiongroup taskgroup` | ||
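
To see which of the additional packages are missing before running the client, the following sketch checks each one with `importlib`. The mapping from package name to import name reflects the usual import names for these packages; the helper names `module_available` and `missing_packages` are hypothetical.

```python
import importlib.util

# pip package name -> module name used in an import statement
REQUIRED_MODULES = {
    "google-genai": "google.genai",
    "pyaudio": "pyaudio",
    "python-dotenv": "dotenv",
    "mss": "mss",
    "exceptiongroup": "exceptiongroup",
    "taskgroup": "taskgroup",
}


def module_available(module_name):
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package (e.g. 'google') is missing.
        return False


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be found."""
    return [pkg for pkg, mod in required.items() if not module_available(mod)]
```

Run inside the project environment (for example via `uv run python`), `missing_packages()` returns an empty list when all the additional dependencies are installed.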
|
||
## | ||
Contributed by Trace LaRue | ||
traceglarue@gmail.com | ||
[www.traceglarue.com](https://www.traceglarue.com) | ||
|