Description
Category
UI Components
Scope
Major Feature
Problem
In the spirit of "software 3.0", applications are being built for both human and AI agent consumption. There is currently no way for an AI agent to interact with a Shiny app natively. Today, it has to rely on computer-use or browser-use tools that take screenshots, or on tools like Playwright, to work out which parts of the DOM can be manipulated.
It would be much more efficient to expose an interactive app's data in a form AI can consume natively, such as hidden text in the DOM, or by having the Shiny server expose an MCP server with automatic access to the controls built out in the UI so the agent can extract the data it needs.
Solution
Injected context
A simple approach to making it easier for an AI agent to at least extract information from a Shiny app would be to include hidden DOM elements containing the data currently exposed in the UI. For example, given an app with a 0-10 slider called "sensitivity threshold" and a line chart that updates when the slider moves, the markup could include tagged elements that are kept in sync:
```html
<input type="range">
<shiny-ai-context data="name: sensitivity threshold; type: slider; value: 7"></shiny-ai-context>
<div id="chart">...</div>
<shiny-ai-context data="name: sales; type: line-chart; points: [[0, 1],...]"></shiny-ai-context>
```
We would then document that AI agents scraping a Shiny app for interactivity should focus on these custom hidden tags.
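As a rough sketch of what emitting such a tag could look like, here is a small helper that serializes a component's name, type, and current value into the hypothetical `<shiny-ai-context>` element described above. The tag name and attribute schema are illustrative only; nothing like this exists in Shiny today, and a real implementation would need a stable, documented schema:

```python
import html
import json

def ai_context_tag(name, kind, value):
    """Render a hypothetical <shiny-ai-context> tag describing a UI element.

    The element name and data format are illustrative, not an existing
    Shiny API. JSON is used here so structured values (e.g. chart points)
    survive round-tripping.
    """
    payload = json.dumps({"name": name, "type": kind, "value": value})
    # Escape quotes so the JSON payload can sit inside an HTML attribute.
    return f'<shiny-ai-context data="{html.escape(payload)}"></shiny-ai-context>'

print(ai_context_tag("sensitivity threshold", "slider", 7))
print(ai_context_tag("sales", "line-chart", [[0, 1], [1, 3]]))
```

The server would re-render these tags whenever the corresponding input or output invalidates, so the hidden context always mirrors what the human sees.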
MCP server
I think it would be very interesting to automatically create MCP tools for each rendered component. For an input, you would get one function to read its value and one to change it; for an output, the tool would be read-only.
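To make the per-component idea concrete, here is a toy sketch of auto-deriving read/write tools from component IDs. The tool-registry shape is made up for illustration (it is not the real MCP SDK), and `state` stands in for Shiny's reactive input values:

```python
# Hypothetical sketch: auto-derived, MCP-style tools per component.
# `state` stands in for the app's reactive input values.
state = {"sensitivity_threshold": 7}

def make_input_tools(input_id):
    """Inputs get two tools: one to read the value, one to set it."""
    return {
        f"get_{input_id}": lambda: state[input_id],
        f"set_{input_id}": lambda value: state.__setitem__(input_id, value),
    }

def make_output_tool(output_id, render):
    """Outputs are read-only: a single tool returning the rendered data."""
    return {f"get_{output_id}": render}

tools = {}
tools.update(make_input_tools("sensitivity_threshold"))
tools.update(make_output_tool("sales", lambda: [[0, 1], [1, 3]]))

tools["set_sensitivity_threshold"](9)
print(tools["get_sensitivity_threshold"]())  # 9
```

A real version would register these with an MCP server and attach descriptions derived from the component's label and type, so the agent knows what each tool does.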
This could then be built on to treat a Shiny app as one large functional unit, where the inputs are the parameters and the output components are the returned values. For example:
```
a = slider input [0, 10]
b = dropdown [x, y, z]
c = text input
out1 = line graph
out2 = text

out1, out2 = shiny_app(a, b, c)
```
where shiny_app is a tool exposed through an /mcp route on the Shiny server.
```
human -> ui -> a,b,c -> shiny_app(a,b,c) -> out1, out2 -> ui -> human
(human ->) ai -> a,b,c -> shiny_app(a,b,c) -> out1, out2 -> ai (-> human)
```
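The app-as-a-function idea above can be sketched as a plain callable. Everything here is a stand-in: the body just fabricates plausible outputs, whereas a real `/mcp` route would feed the inputs into the app's reactive graph and collect the rendered outputs:

```python
def shiny_app(a, b, c):
    """Toy stand-in for a whole app exposed as one callable via /mcp.

    a: slider value in [0, 10]; b: one of "x", "y", "z"; c: free text.
    Returns (out1, out2): line-graph points and a text output.
    The body is illustrative only; a real server would run the app's
    reactivity with these inputs and return the rendered results.
    """
    out1 = [[i, i * a] for i in range(3)]        # line-graph data
    out2 = f"option={b}, note={c!r}, slope={a}"  # text output
    return out1, out2

out1, out2 = shiny_app(2, "y", "hello")
print(out1)  # [[0, 0], [1, 2], [2, 4]]
print(out2)
```

The same callable serves both paths in the diagram: the UI invokes it on behalf of a human, and MCP invokes it on behalf of an agent.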
We could then start treating Shiny apps as dynamic functions that serve two audiences: human stakeholders and AI agents, while the publisher only needs to write a single app. The details for each input and output could be derived automatically, or the publisher could provide some basic context about their inputs and outputs so Shiny can configure itself as an executable tool, manipulated through the UI by a human or through MCP by an AI.
Alternatives (Optional)
No response
Example (Optional)
Impact (Optional)
No response
Contribution? (Optional)
Yes, I can implement (or help).