From b61795cef2bf94953660fc98dded0c6b10f205f5 Mon Sep 17 00:00:00 2001
From: chenhaoli
Date: Wed, 19 Mar 2025 19:28:31 +0800
Subject: [PATCH 1/3] docs(agent-tars): tweak quick start
---
README.md | 12 +++++-----
apps/agent-tars/README.md | 2 +-
apps/agent-tars/{ => docs}/quick-start.md | 28 +++++++++++------------
3 files changed, 21 insertions(+), 21 deletions(-)
rename apps/agent-tars/{ => docs}/quick-start.md (63%)
diff --git a/README.md b/README.md
index c001ee8fe..43fb3b9e8 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
> **\[2025-03-16\]** We released a **technical preview** version of a new desktop app - [Agent TARS](./apps/agent-tars/README.md), a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.
-
+
# UI-TARS Desktop
@@ -67,22 +67,22 @@ You can download the [latest release](https://github.com/bytedance/UI-TARS-deskt
#### MacOS
1. Drag **UI TARS** application into the **Applications** folder
-
+
2. Enable the permission of **UI TARS** in MacOS:
- System Settings -> Privacy & Security -> **Accessibility**
- System Settings -> Privacy & Security -> **Screen Recording**
-
+
3. Then open **UI TARS** application, you can see the following interface:
-
+
#### Windows
**Still to run** the application, you can see the following interface:
-
+
### Deployment
@@ -122,7 +122,7 @@ python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model
##### Input your API information
-
+
+See [Quick Start](./docs/quick-start.md).
## Contributing
diff --git a/apps/agent-tars/quick-start.md b/apps/agent-tars/docs/quick-start.md
similarity index 63%
rename from apps/agent-tars/quick-start.md
rename to apps/agent-tars/docs/quick-start.md
index 0dd6b129d..7e31a5393 100644
--- a/apps/agent-tars/quick-start.md
+++ b/apps/agent-tars/docs/quick-start.md
@@ -1,8 +1,8 @@
-# Agent Tars Quick Start
+# Getting started with Agent TARS
-Hello, welcome to Agent Tars!
+Hello, welcome to Agent TARS!
-This guide will walk you through the process of setting up your first Agent Tars project.
+This guide will walk you through the process of setting up your first Agent TARS project.
## Necessary Configuration
@@ -10,33 +10,33 @@ Before you begin, you will need to set some necessary configuration.
You can click the bottom-left button to open the configuration page:
-[setting-icon.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/setting-icon.jpeg)
+
Then you can set the model config and the search config.
For the model config, you can set the model provider and API key:
-[model-config.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)
+
> For Azure OpenAI, you can set additional parameters, including apiVersion, deploymentName, and endpoint.
For the search config, you can set the search provider and API key:
-[search-settings.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)
+
## Start Your First Journey
-Now you can start your first journey in Agent Tars!
+Now you can start your first journey in Agent TARS!
You can input your first question in the input box, and then press Enter to send it. Here is an example:
-[first-journey.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/start-journey.jpeg)
+
It's working!
We also support **Human In the Loop**, which means you can interact with the agent while it is working via the input box. If you want to change the direction of the agent's current work, enter your thoughts in the dedicated input box at the top and press Enter to send them. Here is an example:
-[human-in-the-loop.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/human-in-the-loop.jpeg)
+
## Share Your Thread
@@ -44,22 +44,22 @@ You can share your thread with others by the share button on the top menu.
There are two modes to share your thread:
-- **Local Html**: Agent Tars will bundle your thread into a html file, and you can share it with others.
-- **Remote Server Url**: Agent Tars will generate a url for you to share your thread with others, Agent Tars will upload the html bundle to a remote server.
+- **Local Html**: Agent TARS will bundle your thread into an HTML file, and you can share it with others.
+- **Remote Server Url**: Agent TARS will upload the HTML bundle to a remote server and generate a URL that you can share with others.
### Local Mode
You can click the share button to open the share modal, and then click the **Local Html** button to share your thread.
-[local-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/local-share.jpeg)
+
### Remote Mode
For the remote share mode, you need to set the remote server URL in the share modal:
-[remote-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/local-share.jpeg)
+
-Then Agent Tars will post a request to the remote server to upload the html bundle, and then you can share the url with others. The specific request information is as follows:
+Then Agent TARS will post a request to the remote server to upload the HTML bundle, and you can share the resulting URL with others. The specific request information is as follows:
- Method: POST
- Body:
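For illustration, such an upload could be exercised with `curl` along these lines (a hypothetical sketch: the actual endpoint path and body fields Agent TARS sends are defined by the request specification above and by your server, and may differ):

```bash
# Hypothetical sketch only: replace the URL with your own remote server endpoint;
# the real request body fields are defined by Agent TARS' share request.
curl -X POST "https://your-share-server.example.com/api/share" \
  -F "file=@agent-tars-thread.html"
```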
From c3dc79b19cd62186d58c4f520615875af4ef87d9 Mon Sep 17 00:00:00 2001
From: chenhaoli
Date: Wed, 19 Mar 2025 19:38:30 +0800
Subject: [PATCH 2/3] docs(ui-tars): tweak README.md
---
README.md | 96 +++------------------------------------------
docs/deployment.md | 60 ++++++++++++++++++++++++++++
docs/quick-start.md | 33 ++++++++++++++++
3 files changed, 99 insertions(+), 90 deletions(-)
create mode 100644 docs/deployment.md
create mode 100644 docs/quick-start.md
diff --git a/README.md b/README.md
index 43fb3b9e8..bfe85a41d 100644
--- a/README.md
+++ b/README.md
@@ -22,14 +22,6 @@ UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Mo
|    👓 Midscene (use in browser)
-### ⚠️ Important Announcement: GGUF Model Performance
-
-The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.
-
-💡 **Alternative Solution**:
-You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)**(If you have enough GPU resources) instead.
-
-We appreciate your understanding and patience as we work to ensure the best possible experience.
## Updates
@@ -53,95 +45,19 @@ We appreciate your understanding and patience as we work to ensure the best poss
## Quick Start
-### Download
-
-You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.
-
-> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
-> ```bash
-> brew install --cask ui-tars
-> ```
-
-### Install
-
-#### MacOS
-
-1. Drag **UI TARS** application into the **Applications** folder
-
-
-2. Enable the permission of **UI TARS** in MacOS:
- - System Settings -> Privacy & Security -> **Accessibility**
- - System Settings -> Privacy & Security -> **Screen Recording**
-
-
-3. Then open **UI TARS** application, you can see the following interface:
-
-
-
-#### Windows
-
-**Still to run** the application, you can see the following interface:
-
-
-
-### Deployment
-
-#### Cloud Deployment
-We recommend using HuggingFace Inference Endpoints for fast deployment.
-We provide two docs for users to refer:
-
-English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)
-
-中文版: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)
-
-#### Local Deployment [vLLM]
-We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
-```bash
-pip install -U transformers
-VLLM_VERSION=0.6.6
-CUDA_VERSION=cu124
-pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
-
-```
-##### Download the Model
-We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):
-
-- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
-- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
-- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
-- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
-- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)
-
-
-##### Start an OpenAI API Service
-Run the command below to start an OpenAI-compatible API service:
-
-```bash
python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
-```
-
-##### Input your API information
-
-
-
-
+## Deployment
-> **Note**: VLM Base Url is OpenAI compatible API endpoints (see [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
+See [Deployment](./docs/deployment.md).
## Contributing
-[CONTRIBUTING.md](./CONTRIBUTING.md)
+See [CONTRIBUTING.md](./CONTRIBUTING.md).
-## SDK(Experimental)
+## SDK (Experimental)
-[SDK](./docs/sdk.md)
+See [UI TARS SDK](./docs/sdk.md)
## License
diff --git a/docs/deployment.md b/docs/deployment.md
new file mode 100644
index 000000000..8da10fabe
--- /dev/null
+++ b/docs/deployment.md
@@ -0,0 +1,60 @@
+# Deployment
+
+### ⚠️ Important Announcement: GGUF Model Performance
+
+The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.
+
+💡 **Alternative Solution**:
+You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)** (if you have enough GPU resources) instead.
+
+We appreciate your understanding and patience as we work to ensure the best possible experience.
+
+## Cloud Deployment
+
+We recommend using HuggingFace Inference Endpoints for fast deployment.
+We provide two documents for reference:
+
+English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)
+
+Chinese version: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)
+
+## Local Deployment [vLLM]
+We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
+```bash
+pip install -U transformers
+VLLM_VERSION=0.6.6
+CUDA_VERSION=cu124
+pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
+
+```
+### Download the Model
+We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):
+
+- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
+- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
+- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
+- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
+- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)
+
+
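+If you want to fetch the weights ahead of time (optional — vLLM can also download a model directly by its Hugging Face repo id), one way is the Hugging Face CLI. A minimal sketch, with the local directory chosen arbitrarily:
+
+```bash
+# Optional: pre-download the 7B-DPO weights to a local directory.
+pip install -U "huggingface_hub[cli]"
+huggingface-cli download bytedance-research/UI-TARS-7B-DPO --local-dir ./UI-TARS-7B-DPO
+```
+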
+### Start an OpenAI API Service
+Run the command below to start an OpenAI-compatible API service:
+
+```bash
+python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
+```
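+
+Once the server is up, you can sanity-check the endpoint before pointing the desktop app at it. A minimal sketch, assuming vLLM is listening on its default port `8000` and you substitute a real base64-encoded screenshot:
+
+```bash
+# Adjust the host, port, prompt, and image to match your setup.
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "ui-tars",
+    "messages": [{
+      "role": "user",
+      "content": [
+        {"type": "text", "text": "Describe what is on this screen."},
+        {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64_SCREENSHOT>"}}
+      ]
+    }]
+  }'
+```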
+
+### Input your API information
+
+
+
+
+
+> **Note**: The VLM Base URL is an OpenAI-compatible API endpoint (see the [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
diff --git a/docs/quick-start.md b/docs/quick-start.md
new file mode 100644
index 000000000..447b783a8
--- /dev/null
+++ b/docs/quick-start.md
@@ -0,0 +1,33 @@
+# Quick Start
+
+## Download
+
+You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.
+
+> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
+> ```bash
+> brew install --cask ui-tars
+> ```
+
+## Install
+
+### MacOS
+
+1. Drag the **UI TARS** application into the **Applications** folder
+
+
+2. Enable the permissions for **UI TARS** in MacOS:
+ - System Settings -> Privacy & Security -> **Accessibility**
+ - System Settings -> Privacy & Security -> **Screen Recording**
+
+
+3. Then open the **UI TARS** application, and you will see the following interface:
+
+
+
+### Windows
+
+Simply run the application, and you will see the following interface:
+
+
+
From e086963a0994cd4db01571eb1b3061417614176c Mon Sep 17 00:00:00 2001
From: chenhaoli
Date: Wed, 19 Mar 2025 19:45:36 +0800
Subject: [PATCH 3/3] chore: tweaks
---
README.md | 14 ++++++++------
docs/sdk.md | 2 +-
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/README.md b/README.md
index bfe85a41d..cf0f5172c 100644
--- a/README.md
+++ b/README.md
@@ -22,11 +22,6 @@ UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Mo
|    👓 Midscene (use in browser)
-
-## Updates
-
-- 🚀 01.25: We updated the **[Cloud Deployment](#cloud-deployment)** section in the 中文版: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb) with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.
-
## Showcases
| Instruction | Video |
@@ -34,6 +29,13 @@ UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Mo
| Get the current weather in SF using the web browser | |
| Send a twitter with the content "hello world" | |
+
+## News
+
+- **\[2025-02-20\]** - 📦 Introduced [UI TARS SDK](./docs/sdk.md), a powerful cross-platform toolkit for building GUI automation agents.
+- **\[2025-01-23\]** - 🚀 We updated the **[Cloud Deployment](./docs/deployment.md#cloud-deployment)** section of the Chinese guide ([GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)) with new information about the ModelScope platform. You can now use the ModelScope platform for deployment.
+
+
## Features
- 🤖 Natural language control powered by Vision-Language Model
@@ -57,7 +59,7 @@ See [CONTRIBUTING.md](./CONTRIBUTING.md).
## SDK (Experimental)
-See [UI TARS SDK](./docs/sdk.md)
+See [@ui-tars/sdk](./docs/sdk.md).
## License
diff --git a/docs/sdk.md b/docs/sdk.md
index 4e55871e9..209a75f46 100644
--- a/docs/sdk.md
+++ b/docs/sdk.md
@@ -1,4 +1,4 @@
-# @ui-tars/sdk Guide(Beta)
+# @ui-tars/sdk Guide (Experimental)
## Overview