A minimal browser automation agent using Google's Gemini 2.5 Computer Use Preview model and Playwright for web browser control.
- Visual Browser Control: Uses screenshots to "see" and interact with web pages
- Automated Actions: Supports mouse clicks, keyboard input, scrolling, navigation, and more
- Safety Controls: Built-in confirmation prompts for risky actions
- Human-in-the-Loop: Optional user confirmation for sensitive operations
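The confirmation step described above could look roughly like the sketch below. The names `RISKY_ACTIONS`, `needs_confirmation`, and `confirm_action` are hypothetical, and the choice of which actions count as risky is an assumption; the README does not say how the agent decides.

```python
from typing import Callable

# Hypothetical set of actions treated as risky. The agent's real criteria
# are not stated in this README; adjust to taste.
RISKY_ACTIONS = {"type_text_at", "key_combination", "drag_and_drop"}


def needs_confirmation(action_name: str, model_flagged: bool = False) -> bool:
    """Return True if the action should be confirmed by a human first.

    `model_flagged` stands in for any safety signal attached to the action
    (an assumption; the exact signal is not described here).
    """
    return model_flagged or action_name in RISKY_ACTIONS


def confirm_action(action_name: str, args: dict,
                   ask: Callable[[str], str] = input) -> bool:
    """Ask the user to approve an action; anything but y/yes is a refusal."""
    answer = ask(f"Allow {action_name} with {args}? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Passing the prompt function as `ask` keeps the check easy to test and lets a caller swap in a non-interactive policy.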
Supported actions:
- open_web_browser, navigate, search
- click_at, hover_at, type_text_at
- key_combination, scroll_document, scroll_at
- drag_and_drop, go_back, go_forward
- wait_5_seconds
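One way to route the action names listed above to browser calls is a small dispatch table, sketched below for a few of them. This is an illustration, not the agent's actual code: the argument keys (`x`, `y`, `url`), the viewport size, and the assumption that coordinates arrive on a 0-999 grid and must be scaled to pixels are all unconfirmed by this README. The `page` object is expected to behave like a Playwright sync `Page` (`goto`, `go_back`, `go_forward`, `mouse.click`, `mouse.move`).

```python
from typing import Any, Callable, Dict

# Assumed viewport size in pixels (hypothetical values).
SCREEN_WIDTH, SCREEN_HEIGHT = 1440, 900


def denormalize(value: int, size: int) -> int:
    """Scale a coordinate from an assumed 0-999 grid to a pixel position."""
    return int(value / 1000 * size)


def click_at(page: Any, args: Dict[str, Any]) -> None:
    page.mouse.click(denormalize(args["x"], SCREEN_WIDTH),
                     denormalize(args["y"], SCREEN_HEIGHT))


def hover_at(page: Any, args: Dict[str, Any]) -> None:
    page.mouse.move(denormalize(args["x"], SCREEN_WIDTH),
                    denormalize(args["y"], SCREEN_HEIGHT))


def navigate(page: Any, args: Dict[str, Any]) -> None:
    page.goto(args["url"])


def go_back(page: Any, args: Dict[str, Any]) -> None:
    page.go_back()


def go_forward(page: Any, args: Dict[str, Any]) -> None:
    page.go_forward()


# Maps action names to handlers; the remaining actions would be added
# in the same way.
HANDLERS: Dict[str, Callable[[Any, Dict[str, Any]], None]] = {
    "click_at": click_at,
    "hover_at": hover_at,
    "navigate": navigate,
    "go_back": go_back,
    "go_forward": go_forward,
}


def execute_action(page: Any, name: str, args: Dict[str, Any]) -> None:
    """Run one action, failing loudly on names the table does not know."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unsupported action: {name}")
    handler(page, args)
```

Raising on unknown names, rather than silently ignoring them, makes it obvious when the model requests an action the table has not been extended to cover.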
conda create -n gemcu python=3.11 -y
conda activate gemcu
python -m pip install --upgrade pip
python -m pip install google-genai playwright termcolor
playwright install chromium
# Windows PowerShell
$env:GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
# Linux/Mac
export GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
python agent.py "Find Wikipedia article about Niagara Falls and open History section"
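For orientation, an entry point that accepts the quoted task and the `GEMINI_API_KEY` variable from the commands above might start like this. It is a sketch under assumptions: `parse_task` and `load_api_key` are hypothetical helper names, and the real `agent.py` may be organized differently.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_task(argv: Optional[Sequence[str]] = None) -> str:
    """Read the natural-language task from the command line."""
    parser = argparse.ArgumentParser(description="Run a browser task.")
    parser.add_argument("task", help="Task description, quoted as one argument")
    return parser.parse_args(argv).task


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, or fail early with a readable message."""
    key = env.get("GEMINI_API_KEY", "").strip()
    # Reject the placeholder from the setup commands as well as a missing key.
    if not key or key == "PASTE_YOUR_KEY_HERE":
        raise RuntimeError("GEMINI_API_KEY is not set; export it before running.")
    return key
```

Calling `parse_task()` with no arguments reads `sys.argv`, so `python agent.py "some task"` yields the task string; checking the key before launching the browser avoids starting a session that cannot reach the model.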
- Python 3.11+
- Gemini API key (available from Google AI Studio)
- Chrome/Chromium browser
This agent runs in a controlled browser environment. For production use, consider running in a sandboxed virtual machine or container for additional security.
Based on Google's Gemini Computer Use API.