> **\[2025-03-16\]** We released a **technical preview** version of a new desktop app - [Agent TARS](./apps/agent-tars/README.md), a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.
|    👓 <a href="https://github.com/web-infra-dev/midscene">Midscene (use in browser)</a>
</p>
## Showcases
| Instruction | Video |
| :---: | :---: |
| Get the current weather in SF using the web browser | <video src="https://github.com/user-attachments/assets/5235418c-ac61-4895-831d-68c1c749fc87" height="300" /> |
| Send a tweet with the content "hello world" | <video src="https://github.com/user-attachments/assets/737ccc11-9124-4464-b4be-3514cbced85c" height="300" /> |
## News
- **\[2025-02-20\]** - 📦 Introduced the [UI TARS SDK](./docs/sdk.md), a powerful cross-platform toolkit for building GUI automation agents.
- **\[2025-01-23\]** - 🚀 We updated the **[Cloud Deployment](./docs/deployment.md#cloud-deployment)** section in the Chinese version: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb) with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.
## Features
- 🤖 Natural language control powered by Vision-Language Model
## Quick Start
### Download

You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.

> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
>
> ```bash
> brew install --cask ui-tars
> ```

### Install

#### MacOS

1. Drag the **UI TARS** application into the **Applications** folder
See [Quick Start](./docs/quick-start.md).
## Deployment
We also support **Human In the Loop**, which means you can interact with the agent while it is working via the input box. If you want to change the direction of the agent's current work, type your thoughts into the input box at the top and press Enter to send them. Here is an example:

You can share your thread with others via the share button in the top menu.
There are two modes to share your thread:
- **Local Html**: Agent TARS will bundle your thread into an HTML file, which you can share with others.
- **Remote Server Url**: Agent TARS will upload the HTML bundle to a remote server and generate a URL that you can share with others.
### Local Mode
You can click the share button to open the share modal, and then click the **Local Html** button to share your thread.
Agent TARS will then post a request to the remote server to upload the HTML bundle, and you can share the resulting URL with others. The specific request information is as follows:
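(The original request details are not included in this excerpt. Purely as a hypothetical illustration of such an upload, the sketch below posts an HTML bundle as multipart form data; the endpoint, field name, and response shape are all invented for the example.)

```python
# Hypothetical sketch only: the real Agent TARS upload endpoint, field names,
# and response format may differ. Shown just to illustrate a multipart upload.
import requests

SHARE_SERVER_URL = "https://example.com/api/share"  # assumed placeholder, not the real server


def upload_thread_bundle(html_path: str) -> str:
    """Upload a bundled thread HTML file and return the shareable URL."""
    with open(html_path, "rb") as f:
        response = requests.post(
            SHARE_SERVER_URL,
            files={"file": ("thread.html", f, "text/html")},  # field name is assumed
            timeout=30,
        )
    response.raise_for_status()
    # Assumes the server replies with JSON containing a "url" field.
    return response.json()["url"]


if __name__ == "__main__":
    print(upload_thread_bundle("thread.html"))
```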
### ⚠️ Important Announcement: GGUF Model Performance
The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.

💡 **Alternative Solution**:

You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)** (if you have enough GPU resources) instead.
We appreciate your understanding and patience as we work to ensure the best possible experience.
## Cloud Deployment
We recommend using HuggingFace Inference Endpoints for fast deployment.
We provide two docs for users to refer to:
English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)
We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):
<!-- If you use Ollama, you can use the following settings to start the server:

```yaml
VLM Provider: ollama
VLM Base Url: http://localhost:11434/v1
VLM API Key: api_key
VLM Model Name: ui-tars
``` -->
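Whichever deployment you choose, it helps to verify that the endpoint you plan to enter as the VLM Base Url is reachable and serves the expected model before configuring the desktop app. A minimal sketch, assuming an OpenAI-compatible server already running at `http://localhost:8000/v1` (adjust the URL, API key, and model name to your own deployment):

```python
# Minimal reachability check for an OpenAI-compatible endpoint.
# The base URL and expected model name below are assumptions; use your own values.
import requests

BASE_URL = "http://localhost:8000/v1"  # e.g. a local vLLM server; Ollama uses http://localhost:11434/v1
EXPECTED_MODEL = "ui-tars"             # the model name you will enter in the settings

resp = requests.get(
    f"{BASE_URL}/models",
    headers={"Authorization": "Bearer api_key"},  # local servers often ignore the key
    timeout=10,
)
resp.raise_for_status()

# OpenAI-style model listing: {"data": [{"id": "..."}, ...]}
served = [m["id"] for m in resp.json()["data"]]
print("Models served:", served)
print("Expected model available:", EXPECTED_MODEL in served)
```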
> **Note**: The VLM Base Url is an OpenAI-compatible API endpoint (see the [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
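For reference, here is a minimal sketch of calling such an endpoint directly with the `openai` Python client, sending a screenshot as a base64-encoded image in the way the linked protocol document describes. The base URL, API key, and model name are placeholders; substitute the values from your own deployment:

```python
# Sketch of querying an OpenAI-compatible VLM endpoint with a screenshot.
# Base URL, API key, and model name are placeholders for your own deployment.
import base64

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # your VLM Base Url
    api_key="api_key",                    # your VLM API Key
)

# Encode a screenshot as base64 so it can be sent inline as a data URL.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="ui-tars",  # your VLM Model Name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is on the screen."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```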