Commit a2065ad

docs: some enhancements (#230)
1 parent b2e1271 commit a2065ad

6 files changed, +123 -112 lines changed


README.md

Lines changed: 14 additions & 96 deletions
@@ -4,7 +4,7 @@
 > **\[2025-03-16\]** We released a **technical preview** version of a new desktop app - [Agent TARS](./apps/agent-tars/README.md), a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.
 
 <p align="center">
-  <img alt="UI-TARS" width="260" src="./resources/icon.png">
+  <img alt="UI-TARS" width="260" src="./apps/ui-tars/resources/icon.png">
 </p>
 
 # UI-TARS Desktop
@@ -22,26 +22,20 @@ UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Mo
 | &nbsp&nbsp 👓 <a href="https://github.com/web-infra-dev/midscene">Midscene (use in browser)</a>
 </p>
 
-### ⚠️ Important Announcement: GGUF Model Performance
-
-The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.
-
-💡 **Alternative Solution**:
-You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)**(If you have enough GPU resources) instead.
-
-We appreciate your understanding and patience as we work to ensure the best possible experience.
-
-## Updates
-
-- 🚀 01.25: We updated the **[Cloud Deployment](#cloud-deployment)** section in the 中文版: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb) with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.
-
 ## Showcases
 
 | Instruction | Video |
 | :---: | :---: |
 | Get the current weather in SF using the web browser | <video src="https://github.com/user-attachments/assets/5235418c-ac61-4895-831d-68c1c749fc87" height="300" /> |
 | Send a twitter with the content "hello world" | <video src="https://github.com/user-attachments/assets/737ccc11-9124-4464-b4be-3514cbced85c" height="300" /> |
 
+
+## News
+
+- **\[2025-02-20\]** - 📦 Introduced [UI TARS SDK](./docs/sdk.md), a powerful cross-platform toolkit for building GUI automation agents.
+- **\[2025-01-23\]** - 🚀 We updated the **[Cloud Deployment](./docs/deployment.md#cloud-deployment)** section in the 中文版: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb) with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.
+
+
 ## Features
 
 - 🤖 Natural language control powered by Vision-Language Model
@@ -53,95 +47,19 @@ We appreciate your understanding and patience as we work to ensure the best poss
 
 ## Quick Start
 
-### Download
-
-You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.
-
-> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
-> ```bash
-> brew install --cask ui-tars
-> ```
-
-### Install
-
-#### MacOS
-
-1. Drag **UI TARS** application into the **Applications** folder
-  <img src="./images/mac_install.png" width="500px" />
-
-2. Enable the permission of **UI TARS** in MacOS:
-  - System Settings -> Privacy & Security -> **Accessibility**
-  - System Settings -> Privacy & Security -> **Screen Recording**
-  <img src="./images/mac_permission.png" width="500px" />
-
-3. Then open **UI TARS** application, you can see the following interface:
-  <img src="./images/mac_app.png" width="500px" />
-
-
-#### Windows
-
-**Still to run** the application, you can see the following interface:
-
-<img src="./images/windows_install.png" width="400px" />
-
-### Deployment
-
-#### Cloud Deployment
-We recommend using HuggingFace Inference Endpoints for fast deployment.
-We provide two docs for users to refer:
-
-English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)
-
-中文版: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)
-
-#### Local Deployment [vLLM]
-We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
-```bash
-pip install -U transformers
-VLLM_VERSION=0.6.6
-CUDA_VERSION=cu124
-pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
-
-```
-##### Download the Model
-We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):
-
-- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
-- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
-- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
-- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
-- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)
-
-
-##### Start an OpenAI API Service
-Run the command below to start an OpenAI-compatible API service:
-
-```bash
-python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
-```
-
-##### Input your API information
-
-<img src="./images/settings_model.png" width="500px" />
-
-<!-- If you use Ollama, you can use the following settings to start the server:
+See [Quick Start](./docs/quick-start.md).
 
-```yaml
-VLM Provider: ollama
-VLM Base Url: http://localhost:11434/v1
-VLM API Key: api_key
-VLM Model Name: ui-tars
-``` -->
+## Deployment
 
-> **Note**: VLM Base Url is OpenAI compatible API endpoints (see [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
+See [Deployment](./docs/deployment.md).
 
 ## Contributing
 
-[CONTRIBUTING.md](./CONTRIBUTING.md)
+See [CONTRIBUTING.md](./CONTRIBUTING.md).
 
-## SDK(Experimental)
+## SDK (Experimental)
 
-[SDK](./docs/sdk.md)
+See [@ui-tars/sdk](./docs/sdk.md)
 
 ## License
 
apps/agent-tars/README.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
 
 ## Getting Started
 
-<!--TODO-->
+See [Quick Start](./docs/quick-start.md).
 
 ## Contributing
 
apps/agent-tars/quick-start.md renamed to apps/agent-tars/docs/quick-start.md

Lines changed: 14 additions & 14 deletions
@@ -1,65 +1,65 @@
-# Agent Tars Quick Start
+# Getting started with Agent TARS
 
-Hello, welcome to Agent Tars!
+Hello, welcome to Agent TARS!
 
-This guide will walk you through the process of setting up your first Agent Tars project.
+This guide will walk you through the process of setting up your first Agent TARS project.
 
 ## Necessary Configuration
 
 Before you begin, you will need to set some necessary configuration.
 
 You can click the left-bottom button to open the configuration page:
 
-[setting-icon.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/setting-icon.jpeg)
+![setting-icon.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/setting-icon.jpeg)
 
 Then you can set the model config and the search config.
 
 For model config, you can set the model provider and api key:
 
-[model-config.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)
+![model-config.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)
 
 > For Azure OpenAI, you can set more params, including apiVersion, deploymentName and endpoint.
 
 For search config, you can set the search provider and api key:
 
-[search-settings.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)
+![search-settings.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)
 
 ## Start Your First Journey
 
-Now you can start your first journey in Agent Tars!
+Now you can start your first journey in Agent TARS!
 
 You can input your first question in the input box, and then press Enter to send your question. Here is an example:
 
-[first-journey.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/start-journey.jpeg)
+![first-journey.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/start-journey.jpeg)
 
 It's working!
 
 We also support **Human In the Loop**, that means you can interact with the agent in the working process by the input box. If you want to change the direction of current agent work, you can insert your thoughts in the special input box on the top position, and then press Enter to send your thoughts. Here is an example:
 
-[human-in-the-loop.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/human-in-the-loop.jpeg)
+![human-in-the-loop.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/human-in-the-loop.jpeg)
 
 ## Share Your Thead
 
 You can share your thread with others by the share button on the top menu.
 
 There are two modes to share your thread:
 
-- **Local Html**: Agent Tars will bundle your thread into a html file, and you can share it with others.
-- **Remote Server Url**: Agent Tars will generate a url for you to share your thread with others, Agent Tars will upload the html bundle to a remote server.
+- **Local Html**: Agent TARS will bundle your thread into a html file, and you can share it with others.
+- **Remote Server Url**: Agent TARS will generate a url for you to share your thread with others, Agent TARS will upload the html bundle to a remote server.
 
 ### Local Mode
 
 You can click the share button to open the share modal, and then click the **Local Html** button to share your thread.
 
-[local-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/local-share.jpeg)
+![local-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/local-share.jpeg)
 
 ### Remote Mode
 
 For the remote share mode, you need to set the remote server url in the share modal:
 
-[remote-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/local-share.jpeg)
+![remote-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/remote-share.jpeg)
 
-Then Agent Tars will post a request to the remote server to upload the html bundle, and then you can share the url with others. The specific request information is as follows:
+Then Agent TARS will post a request to the remote server to upload the html bundle, and then you can share the url with others. The specific request information is as follows:
 
 - Method: POST
 - Body:

docs/deployment.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+# Deployment
+
+### ⚠️ Important Announcement: GGUF Model Performance
+
+The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.
+
+💡 **Alternative Solution**:
+You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)**(If you have enough GPU resources) instead.
+
+We appreciate your understanding and patience as we work to ensure the best possible experience.
+
+## Cloud Deployment
+
+We recommend using HuggingFace Inference Endpoints for fast deployment.
+We provide two docs for users to refer:
+
+English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)
+
+中文版: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)
+
+## Local Deployment [vLLM]
+We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
+```bash
+pip install -U transformers
+VLLM_VERSION=0.6.6
+CUDA_VERSION=cu124
+pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
+
+```
+### Download the Model
+We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):
+
+- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
+- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
+- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
+- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
+- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)
+
+
+### Start an OpenAI API Service
+Run the command below to start an OpenAI-compatible API service:
+
+```bash
+python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
+```
+
+### Input your API information
+
+<img src="../apps/ui-tars/images/settings_model.png" width="500px" />
+
+<!-- If you use Ollama, you can use the following settings to start the server:
+
+```yaml
+VLM Provider: ollama
+VLM Base Url: http://localhost:11434/v1
+VLM API Key: api_key
+VLM Model Name: ui-tars
+``` -->
+
+> **Note**: VLM Base Url is OpenAI compatible API endpoints (see [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
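Once the service above is running, a quick sanity check is to send a plain chat-completion request to the OpenAI-compatible endpoint. This is a minimal sketch, assuming the vLLM server listens on its default `http://localhost:8000` and was started with `--served-model-name ui-tars` as in the diff above; in real use the user message would also include a screenshot as a base64-encoded image, per the OpenAI vision protocol referenced in the note.

```bash
# Minimal smoke test against the OpenAI-compatible endpoint started above.
# Assumes vLLM's default host/port (http://localhost:8000); adjust if you
# passed --host/--port to vllm.entrypoints.openai.api_server.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ui-tars",
        "messages": [
          {"role": "user", "content": "Describe the next GUI action to open the Settings app."}
        ],
        "max_tokens": 64
      }'
```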

docs/quick-start.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Quick Start
+
+## Download
+
+You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.
+
+> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
+> ```bash
+> brew install --cask ui-tars
+> ```
+
+## Install
+
+### MacOS
+
+1. Drag **UI TARS** application into the **Applications** folder
+  <img src="../apps/ui-tars/images/mac_install.png" width="500px" />
+
+2. Enable the permission of **UI TARS** in MacOS:
+  - System Settings -> Privacy & Security -> **Accessibility**
+  - System Settings -> Privacy & Security -> **Screen Recording**
+  <img src="../apps/ui-tars/images/mac_permission.png" width="500px" />
+
+3. Then open **UI TARS** application, you can see the following interface:
+  <img src="../apps/ui-tars/images/mac_app.png" width="500px" />
+
+
+### Windows
+
+**Still to run** the application, you can see the following interface:
+
+<img src="../apps/ui-tars/images/windows_install.png" width="400px" />
+

docs/sdk.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# @ui-tars/sdk Guide(Beta)
+# @ui-tars/sdk Guide (Experimental)
 
 ## Overview
 