Skip to content

Commit 63d22ef

Browse files
ulivzycjcl868
andauthored
docs: init setting docs (#96)
* docs: init setting docs * Update docs/setting.md Co-authored-by: Charles <jinxin001@bytedance.com> --------- Co-authored-by: Charles <jinxin001@bytedance.com>
1 parent c768010 commit 63d22ef

9 files changed

+369
-0
lines changed

docs/preset.md

+78
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,78 @@
1+
# Preset Management Guide
2+
3+
> [!IMPORTANT]
4+
> Currently, **UI-TARS Desktop** does not directly provide server-side capabilities, so we do not provide a Preset for the open source community. welcome community developers to contribute your presets [here](../examples/presets/).
5+
6+
A **preset** is a collection of [settings](./setting.md) (_Introduced at [#61](https://github.com/bytedance/UI-TARS-desktop/pull/61)_), **UI-TARS Desktop** supports import presets via `files` or `URLs`:
7+
8+
```mermaid
9+
graph TD
10+
A[Import Preset] --> B{Preset Type}
11+
B -->|File| C[YAML File]
12+
B -->|URL| D[URL Endpoint]
13+
C --> E[Manual Updates 🔧]
14+
D --> F[Auto Sync ⚡]
15+
```
16+
17+
<br>
18+
19+
20+
## Preset Types Comparison
21+
22+
| Feature | Local Presets | Remote Presets |
23+
|-----------------------|------------------------|------------------------|
24+
| **Storage** | Device-local | Cloud-hosted |
25+
| **Update Mechanism** | Manual | Automatic |
26+
| **Access Control** | Read/Write | Read-Only |
27+
| **Versioning** | Manual | Git-integrated |
28+
29+
30+
31+
<br>
32+
33+
34+
## Examples
35+
36+
### Import from file
37+
38+
**UI-TARS Desktop** supports importing presets from files. Once the file is parsed successfully, the settings will be automatically updated.
39+
40+
| Function | Snapshot |
41+
| --- | ---|
42+
| Open Setting |<img width="320" alt="image" src="https://github.com/user-attachments/assets/1d2ae27c-9b2e-4896-96a6-04832f850907" /> |
43+
| Import Success | <img width="320" alt="image" src="https://github.com/user-attachments/assets/38f77101-7388-4363-ab27-668180f51aaa" />|
44+
| Exception: Invalid Content | <img width="320" alt="image" src="https://github.com/user-attachments/assets/5ebec2b2-12f6-4d1a-84a7-8202ef651223" /> |
45+
46+
47+
<br>
48+
49+
50+
### Import from URL
51+
52+
**UI-TARS Desktop** also supports importing presets from URLs. If automatic updates are set, presets will be automatically pulled every time the application is started.
53+
54+
| Function | Snapshot |
55+
| --- | ---|
56+
| Open Setting | <img width="320" alt="image" src="https://github.com/user-attachments/assets/d446da0e-3bb4-4ca5-bc95-4f235d979fd0" /> |
57+
| Import Success (Default) | <img width="320" alt="image" src="https://github.com/user-attachments/assets/a6470ed4-80ac-45a1-aaba-39e598d5af0f" /> |
58+
| Import Success (Auto Update) | <img width="320" alt="image" src="https://github.com/user-attachments/assets/b5364d66-6654-401b-969e-f85baeedbda0" />|
59+
60+
61+
<br>
62+
63+
64+
### Preset Example
65+
66+
```yaml
67+
name: UI TARS Desktop Example Preset
68+
language: en
69+
vlmProvider: Hugging Face
70+
vlmBaseUrl: https://your-endpoint.huggingface.cloud/v1
71+
vlmApiKey: your_api_key
72+
vlmModelName: your_model_name
73+
reportStorageBaseUrl: https://your-report-storage-endpoint.com/upload
74+
utioBaseUrl: https://your-utio-endpoint.com/collect
75+
```
76+
77+
See all [example presets](../examples/presets).
78+

docs/setting.md

+283
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,283 @@
1+
# Settings Configuration Guide
2+
3+
## Overview
4+
5+
**UI-TARS Desktop** offers granular control over application behavior through its settings system. This document provides comprehensive guidance on configuration options, preset management, and operational best practices.
6+
7+
<p align="center">
8+
<img src="../images/setting.png" alt="Settings Interface Overview" width="650">
9+
<br>
10+
<em>Main Settings Interface</em>
11+
</p>
12+
13+
14+
<br>
15+
16+
17+
## Configuration Options
18+
19+
### Language
20+
21+
Controls localization settings for VLM.
22+
23+
| Property | Details |
24+
| ----------- | ------------------------------ |
25+
| **Type** | `string` |
26+
| **Options** | `en` (English), `zh` (Chinese) |
27+
| **Default** | `en` |
28+
29+
> [!NOTE]
30+
> Changing the settings will **only** affect the output of VLM, not the language of the desktop app itself. Regarding the i18n of the App itself, welcome to contribute PR.
31+
32+
33+
<br>
34+
35+
36+
### VLM Provider
37+
38+
Selects the backend VLM provider for make GUI action decisions.
39+
40+
| Property | Details |
41+
| ----------- | ---------------------- |
42+
| **Type** | `string` |
43+
| **Options** | `Hugging Face`, `vLLM` |
44+
| **Default** | `Hugging Face` |
45+
46+
> [!NOTE]
47+
> This is an interface reserved for different VLM providers.
48+
49+
50+
<br>
51+
52+
53+
54+
### VLM Base URL
55+
56+
Specify the base url of the VLM that needs to be requested.
57+
58+
| Property | Details |
59+
| ------------ | -------- |
60+
| **Type** | `string` |
61+
| **Required** | `true` |
62+
63+
> [!NOTE]
64+
> VLM Base URL should be OpenAI compatible API endpoints (see [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
65+
66+
67+
<br>
68+
69+
70+
71+
### VLM Model Name
72+
73+
Specify the requested module name.
74+
75+
| Property | Details |
76+
| ------------ | -------- |
77+
| **Type** | `string` |
78+
| **Required** | `true` |
79+
80+
81+
<br>
82+
83+
84+
### Report Storage Base URL
85+
86+
Defines the base URL for uploading report file. By default, when this option is not set, when the user clicks **Export as HTML** (a.k.a. <b>Share</b>), it will automatically trigger the download of the report file:
87+
88+
<p align="center">
89+
<img src="../images/download-report.png" alt="Download report" width="320">
90+
<br>
91+
</p>
92+
93+
Once it's set, when user click **Export as HTML**, report file will firstly be uploaded to the Report Storage Server, which returns a publicly accessible URL for the persistent file.
94+
95+
<p align="center">
96+
<img src="../images/upload-report-success.png" alt="Download report" width="320">
97+
<br>
98+
</p>
99+
100+
#### Report Storage Server Interface
101+
102+
The Report Storage Server should implement the following HTTP API endpoint:
103+
104+
| Property | Details |
105+
| ------------ | ------------------------------------------------------------------------------------------------------------ |
106+
| **Endpoint** | `POST /your-storage-enpoint` |
107+
| **Headers** | Content-Type: `multipart/form-data` <br> <!-- - Authorization: Bearer \<access_token\> (Not Supported) --> |
108+
109+
#### Request Body
110+
111+
The request should be sent as `multipart/form-data` with the following field:
112+
113+
| Field | Type | Required | Description | Constraints |
114+
| ----- | ---- | -------- | ---------------- | ---------------------------------- |
115+
| file | File | Yes | HTML report file | - Format: HTML<br>- Max size: 30MB |
116+
117+
#### Response
118+
119+
**Success Response (200 OK)**
120+
```json
121+
{
122+
"url": "https://example.com/reports/xxx.html"
123+
}
124+
```
125+
126+
The response should return a JSON object containing a publicly accessible URL where the report can be accessed.
127+
128+
> [!NOTE]
129+
> Currently, there is no authentication designed for Report Storage Server. If you have any requirements, please submit an [issue](https://github.com/bytedance/UI-TARS-desktop/issues).
130+
131+
132+
<br>
133+
134+
135+
### UTIO Base URL
136+
137+
**UTIO** (_UI-TARS Insights and Observation_) is a data collection mechanism for insights into **UI-TARS Desktop** (_Introduced at [#60](https://github.com/bytedance/UI-TARS-desktop/pull/60)_). The design of UTIO is also related to sharing. The overall process is as follows:
138+
139+
<p align="center">
140+
<img src="../images/utio-flow.png" alt="UTIO Flow" width="800">
141+
<br>
142+
<em>UTIO Flow</em>
143+
</p>
144+
145+
This option defines the base URL for the **UTIO** server that handles application events and instructions.
146+
147+
148+
#### Server Interface Specification
149+
150+
The UTIO server accepts events through HTTP POST requests and supports three types of events:
151+
152+
| Property | Details |
153+
| ------------ | -------------------------------- |
154+
| **Endpoint** | `POST /your-utio-endpoint` |
155+
| **Headers** | Content-Type: `application/json` |
156+
157+
##### Event Types
158+
159+
The server handles three types of events:
160+
161+
###### **Application Launch**
162+
```typescript
163+
interface AppLaunchedEvent {
164+
type: 'appLaunched';
165+
platform: 'iOS' | 'Android' | 'Web';
166+
osVersion: string;
167+
screenWidth: number;
168+
screenHeight: number;
169+
}
170+
```
171+
172+
###### **Send Instruction**
173+
```typescript
174+
interface SendInstructionEvent {
175+
type: 'sendInstruction';
176+
instruction: string;
177+
}
178+
```
179+
180+
###### **Share Report**
181+
```typescript
182+
interface ShareReportEvent {
183+
type: 'shareReport';
184+
lastScreenshot?: string;
185+
report?: string;
186+
instruction: string;
187+
}
188+
```
189+
190+
##### Request Example
191+
192+
```json
193+
{
194+
"type": "appLaunched",
195+
"platform": "iOS",
196+
"osVersion": "16.0.0",
197+
"screenWidth": 390,
198+
"screenHeight": 844
199+
}
200+
```
201+
202+
##### Response
203+
204+
**Success Response (200 OK)**
205+
```json
206+
{
207+
"success": true
208+
}
209+
```
210+
211+
> [!NOTE]
212+
> All events are processed asynchronously. The server should respond promptly to acknowledge receipt of the event.
213+
214+
215+
##### Server Example
216+
217+
###### Node.js
218+
219+
```js
220+
const express = require('express');
221+
const cors = require('cors');
222+
const app = express();
223+
const port = 3000;
224+
225+
app.use(cors());
226+
app.use(express.json());
227+
228+
app.post('/your-utio-endpoint', (req, res) => {
229+
const event = req.body;
230+
231+
if (!event || !event.type) {
232+
return res.status(400).json({ error: 'Missing event type' });
233+
}
234+
235+
switch (event.type) {
236+
case 'appLaunched':
237+
return handleAppLaunch(event, res);
238+
case 'sendInstruction':
239+
return handleSendInstruction(event, res);
240+
case 'shareReport':
241+
return handleShareReport(event, res);
242+
default:
243+
return res.status(400).json({ error: 'Unsupported event type' });
244+
}
245+
});
246+
247+
app.listen(port, () => {
248+
console.log(`Server listening on port ${port}`);
249+
});
250+
```
251+
252+
###### Python
253+
254+
```python
255+
from flask import Flask, request, jsonify
256+
from flask_cors import CORS
257+
import re
258+
259+
app = Flask(__name__)
260+
CORS(app)
261+
262+
@app.route('/events', methods=['POST'])
263+
def handle_event():
264+
data = request.get_json()
265+
266+
if not data or 'type' not in data:
267+
return jsonify({'error': 'Missing event type'}), 400
268+
269+
event_type = data['type']
270+
271+
if event_type == 'appLaunched':
272+
return handle_app_launch(data)
273+
elif event_type == 'sendInstruction':
274+
return handle_send_instruction(data)
275+
elif event_type == 'shareReport':
276+
return handle_share_report(data)
277+
else:
278+
return jsonify({'error': 'Unsupported event type'}), 400
279+
280+
if __name__ == '__main__':
281+
app.run(port=3000)
282+
```
283+

examples/presets/default.yaml

+8
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
name: UI TARS Desktop Example Preset
2+
language: en
3+
vlmProvider: Hugging Face
4+
vlmBaseUrl: https://your-endpoint.huggingface.cloud/v1
5+
vlmApiKey: your_api_key
6+
vlmModelName: your_model_name
7+
reportStorageBaseUrl: https://your-report-storage-endpoint.com/upload
8+
utioBaseUrl: https://your-utio-endpoint.com/collect

images/download-report.png

46.1 KB
Loading

images/import-preset-from-local.png

103 KB
Loading

images/import-preset-from-remote.png

107 KB
Loading

images/setting.png

-350 KB
Loading

images/upload-report-success.png

165 KB
Loading

images/utio-flow.png

262 KB
Loading

0 commit comments

Comments
 (0)