Description
For the past few months, I have been working on a few different prompts (which became an agent) & an MCP Server that can control a local instance of VS Code.
While doing this, I have hit, over and over again, the difficulty of getting Playwright automation to write to a Monaco editor... which we use all over VS Code, from editors to input boxes (in the Extensions view, Settings search, chat input).
And I'm not alone... There are many others before me who struggled to get Playwright & Monaco to play nice together:
- Monaco and Playwright don't "play right" together
- How to enter text into monaco-editor control using Playwright
- How to insert code in monaco editor using Playwright?
While these folks have come up with workarounds that are... clever... my scenario is worse than theirs, because it is non-deterministic. I rely on an LLM to call the MCP Tool to write in the Monaco editor.
I've tried so so hard to prompt this to work nicely. Look at this section in my agent prompt:
vscode/.github/agents/demonstrate.md
Lines 68 to 96 in bf23ba9
If you are typing into a monaco input and you can't use the standard methods, follow this sequence:
**Monaco editors (used throughout VS Code) DO NOT work with standard Playwright methods like `.click()` on textareas or `.fill()` / `.type()`**
**YOU MUST follow this exact sequence:**
1. **Take a page snapshot** to identify the editor structure in the accessibility tree
2. **Find the parent `code` role element** that wraps the Monaco editor
   - ❌ DO NOT click on `textarea` or `textbox` elements - these are overlaid by Monaco's rendering
   - ✅ DO click on the `code` role element that is the parent container
3. **Click on the `code` element** to focus the editor - this properly delegates focus to Monaco's internal text handling
4. **Verify focus** by checking that the nested textbox element has the `[active]` attribute in a new snapshot
5. **Use `page.keyboard.press()` for EACH character individually** - standard Playwright `type()` or `fill()` methods don't work with Monaco editors since they intercept keyboard events at the page level
**Example:**
```js
// ❌ WRONG - this will fail with timeout
await page.locator('textarea').click();
await page.locator('textarea').fill('text');
// ✅ CORRECT
await page.locator('[role="code"]').click();
await page.keyboard.press('t');
await page.keyboard.press('e');
await page.keyboard.press('x');
await page.keyboard.press('t');
```
**Why this is required:** Monaco editors intercept keyboard events at the page level and use a virtualized rendering system. Clicking textareas directly or using `.fill()` bypasses Monaco's event handling, causing timeouts and failures.
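For illustration, the click-then-press sequence from the quoted prompt could be wrapped in one small helper so the model has a single call to make instead of a multi-step recipe to remember. This is only a sketch: `typeIntoMonaco`, `SPECIAL_KEYS`, and the default `[role="code"]` selector are hypothetical names and assumptions of mine (the selector is taken from the prompt's example), not an existing Playwright or VS Code API. It also assumes every character in the text is a key that `page.keyboard.press()` recognizes, which may not hold for characters outside the US keyboard layout.

```js
// Hypothetical helper, not part of Playwright or VS Code.
// Whitespace characters are sent by key name rather than as raw characters.
const SPECIAL_KEYS = { '\n': 'Enter', '\t': 'Tab', ' ': 'Space' };

/**
 * Focus a Monaco editor through its container and type text key by key.
 * @param page               a Playwright Page (or anything with the same shape)
 * @param text               the text to type
 * @param containerSelector  assumed selector for the editor's parent container
 */
async function typeIntoMonaco(page, text, containerSelector = '[role="code"]') {
  // Click the parent container, not the inner textarea.
  await page.locator(containerSelector).first().click();

  // Press each character individually, as the prompt requires.
  for (const ch of text) {
    await page.keyboard.press(SPECIAL_KEYS[ch] ?? ch);
  }
}
```

A call such as `await typeIntoMonaco(page, 'text')` would then replace the five separate statements in the example above. It does not cover step 4 (verifying focus), and it still depends on the caller picking the right container when several Monaco editors are on screen.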
But the prompt also suffers greatly when the agent session carries a LOT of context (it queries the GitHub MCP a bunch... queries the DOM a bunch... and the context quickly fills up).
When these instructions get forgotten, I see the model mostly trying to type in the textarea inside of the Monaco editor... which frankly, I don't blame it for! That would be the obvious thing to write to if you didn't know how to interact with Monaco properly.
I continue to work around this by either:
- prompting harder
- adding specific mcp tools for different areas of the product
... but sadly that has its limits if the context gets filled, or if a Monaco editor not covered by a specific tool needs to be written to.
My ask is... can we help Playwright out here? Can we somehow make it easier to write to (and maybe read from) Monaco via DOM manipulation and whatever else Playwright does?
To truly have powerful autonomous control of VS Code, I think we have to solve this at the Monaco layer.