From b61795cef2bf94953660fc98dded0c6b10f205f5 Mon Sep 17 00:00:00 2001
From: chenhaoli
Date: Wed, 19 Mar 2025 19:28:31 +0800
Subject: [PATCH 1/3] docs(agent-tars): tweak quick start
---
README.md | 12 +++++-----
apps/agent-tars/README.md | 2 +-
apps/agent-tars/{ => docs}/quick-start.md | 28 +++++++++++------------
3 files changed, 21 insertions(+), 21 deletions(-)
rename apps/agent-tars/{ => docs}/quick-start.md (63%)
diff --git a/README.md b/README.md
index c001ee8fe..43fb3b9e8 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
> **\[2025-03-16\]** We released a **technical preview** version of a new desktop app - [Agent TARS](./apps/agent-tars/README.md), a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.
-
+
# UI-TARS Desktop
@@ -67,22 +67,22 @@ You can download the [latest release](https://github.com/bytedance/UI-TARS-deskt
#### MacOS
1. Drag **UI TARS** application into the **Applications** folder
-
+
2. Enable the permission of **UI TARS** in MacOS:
- System Settings -> Privacy & Security -> **Accessibility**
- System Settings -> Privacy & Security -> **Screen Recording**
-
+
3. Then open **UI TARS** application, you can see the following interface:
-
+
#### Windows
**Still to run** the application, you can see the following interface:
-
+
### Deployment
@@ -122,7 +122,7 @@ python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model
##### Input your API information
-
+
+See [Quick Start](./docs/quick-start.md).
## Contributing
diff --git a/apps/agent-tars/quick-start.md b/apps/agent-tars/docs/quick-start.md
similarity index 63%
rename from apps/agent-tars/quick-start.md
rename to apps/agent-tars/docs/quick-start.md
index 0dd6b129d..7e31a5393 100644
--- a/apps/agent-tars/quick-start.md
+++ b/apps/agent-tars/docs/quick-start.md
@@ -1,8 +1,8 @@
-# Agent Tars Quick Start
+# Getting started with Agent TARS
-Hello, welcome to Agent Tars!
+Hello, welcome to Agent TARS!
-This guide will walk you through the process of setting up your first Agent Tars project.
+This guide will walk you through the process of setting up your first Agent TARS project.
## Necessary Configuration
@@ -10,33 +10,33 @@ Before you begin, you will need to set some necessary configuration.
You can click the bottom-left button to open the configuration page:
-[setting-icon.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/setting-icon.jpeg)
+
Then you can set the model config and the search config.
For the model config, you can set the model provider and API key:
-[model-config.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)
+
> For Azure OpenAI, you can set additional parameters, including apiVersion, deploymentName, and endpoint.
For the search config, you can set the search provider and API key:
-[search-settings.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)
+
## Start Your First Journey
-Now you can start your first journey in Agent Tars!
+Now you can start your first journey in Agent TARS!
You can input your first question in the input box, and then press Enter to send it. Here is an example:
-[first-journey.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/start-journey.jpeg)
+
It's working!
We also support **Human In the Loop**, which means you can interact with the agent while it is working via the input box. If you want to change the direction of the agent's current work, enter your thoughts in the dedicated input box at the top and press Enter to send them. Here is an example:
-[human-in-the-loop.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/human-in-the-loop.jpeg)
+
## Share Your Thread
@@ -44,22 +44,22 @@ You can share your thread with others by the share button on the top menu.
There are two modes to share your thread:
-- **Local Html**: Agent Tars will bundle your thread into a html file, and you can share it with others.
-- **Remote Server Url**: Agent Tars will generate a url for you to share your thread with others, Agent Tars will upload the html bundle to a remote server.
+- **Local Html**: Agent TARS will bundle your thread into an HTML file, and you can share it with others.
+- **Remote Server Url**: Agent TARS will upload the HTML bundle to a remote server and generate a URL that you can share with others.
### Local Mode
You can click the share button to open the share modal, and then click the **Local Html** button to share your thread.
-[local-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/local-share.jpeg)
+
### Remote Mode
For the remote share mode, you need to set the remote server URL in the share modal:
-[remote-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/local-share.jpeg)
+
-Then Agent Tars will post a request to the remote server to upload the html bundle, and then you can share the url with others. The specific request information is as follows:
+Then Agent TARS will post a request to the remote server to upload the HTML bundle, and you can share the resulting URL with others. The specific request information is as follows:
- Method: POST
- Body:
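For illustration, such an upload could be exercised with `curl` along these lines (a hypothetical sketch: the actual endpoint path and body fields Agent TARS sends are defined by the request specification above and by your server, and may differ):

```bash
# Hypothetical sketch only: replace the URL with your own remote server endpoint;
# the real request body fields are defined by Agent TARS' share request.
curl -X POST "https://your-share-server.example.com/api/share" \
  -F "file=@agent-tars-thread.html"
```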
From c3dc79b19cd62186d58c4f520615875af4ef87d9 Mon Sep 17 00:00:00 2001
From: chenhaoli
Date: Wed, 19 Mar 2025 19:38:30 +0800
Subject: [PATCH 2/3] docs(ui-tars): tweak README.md
---
README.md | 96 +++------------------------------------------
docs/deployment.md | 60 ++++++++++++++++++++++++++++
docs/quick-start.md | 33 ++++++++++++++++
3 files changed, 99 insertions(+), 90 deletions(-)
create mode 100644 docs/deployment.md
create mode 100644 docs/quick-start.md
diff --git a/README.md b/README.md
index 43fb3b9e8..bfe85a41d 100644
--- a/README.md
+++ b/README.md
@@ -22,14 +22,6 @@ UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Mo
|    👓 Midscene (use in browser)
-### ⚠️ Important Announcement: GGUF Model Performance
-
-The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.
-
-💡 **Alternative Solution**:
-You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)**(If you have enough GPU resources) instead.
-
-We appreciate your understanding and patience as we work to ensure the best possible experience.
## Updates
@@ -53,95 +45,19 @@ We appreciate your understanding and patience as we work to ensure the best poss
## Quick Start
-### Download
-
-You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.
-
-> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
-> ```bash
-> brew install --cask ui-tars
-> ```
-
-### Install
-
-#### MacOS
-
-1. Drag **UI TARS** application into the **Applications** folder
-
-
-2. Enable the permission of **UI TARS** in MacOS:
- - System Settings -> Privacy & Security -> **Accessibility**
- - System Settings -> Privacy & Security -> **Screen Recording**
-
-
-3. Then open **UI TARS** application, you can see the following interface:
-
-
-
-#### Windows
-
-**Still to run** the application, you can see the following interface:
-
-
-
-### Deployment
-
-#### Cloud Deployment
-We recommend using HuggingFace Inference Endpoints for fast deployment.
-We provide two docs for users to refer:
-
-English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)
-
-中文版: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)
-
-#### Local Deployment [vLLM]
-We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
-```bash
-pip install -U transformers
-VLLM_VERSION=0.6.6
-CUDA_VERSION=cu124
-pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
-
-```
-##### Download the Model
-We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):
-
-- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
-- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
-- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
-- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
-- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)
-
-
-##### Start an OpenAI API Service
-Run the command below to start an OpenAI-compatible API service:
-
-```bash
python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
-```
-
-##### Input your API information
-
-
-
-
+## Deployment
-> **Note**: VLM Base Url is OpenAI compatible API endpoints (see [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
+See [Deployment](./docs/deployment.md).
## Contributing
-[CONTRIBUTING.md](./CONTRIBUTING.md)
+See [CONTRIBUTING.md](./CONTRIBUTING.md).
-## SDK(Experimental)
+## SDK (Experimental)
-[SDK](./docs/sdk.md)
+See [UI TARS SDK](./docs/sdk.md)
## License
diff --git a/docs/deployment.md b/docs/deployment.md
new file mode 100644
index 000000000..8da10fabe
--- /dev/null
+++ b/docs/deployment.md
@@ -0,0 +1,60 @@
+# Deployment
+
+### ⚠️ Important Announcement: GGUF Model Performance
+
+The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.
+
+💡 **Alternative Solution**:
+You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)** (if you have enough GPU resources) instead.
+
+We appreciate your understanding and patience as we work to ensure the best possible experience.
+
+## Cloud Deployment
+
+We recommend using HuggingFace Inference Endpoints for fast deployment.
+We provide two documents for reference:
+
+English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)
+
+Chinese version: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)
+
+## Local Deployment [vLLM]
+We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
+```bash
+pip install -U transformers
+VLLM_VERSION=0.6.6
+CUDA_VERSION=cu124
+pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
+
+```
+### Download the Model
+We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):
+
+- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
+- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
+- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
+- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
+- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)
+
+
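+If you want to fetch the weights ahead of time (optional — vLLM can also download a model directly by its Hugging Face repo id), one way is the Hugging Face CLI. A minimal sketch, with the local directory chosen arbitrarily:
+
+```bash
+# Optional: pre-download the 7B-DPO weights to a local directory.
+pip install -U "huggingface_hub[cli]"
+huggingface-cli download bytedance-research/UI-TARS-7B-DPO --local-dir ./UI-TARS-7B-DPO
+```
+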
+### Start an OpenAI API Service
+Run the command below to start an OpenAI-compatible API service:
+
+```bash
+python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
+```
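+
+Once the server is up, you can sanity-check the endpoint before pointing the desktop app at it. A minimal sketch, assuming vLLM is listening on its default port `8000` and you substitute a real base64-encoded screenshot:
+
+```bash
+# Adjust the host, port, prompt, and image to match your setup.
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "ui-tars",
+    "messages": [{
+      "role": "user",
+      "content": [
+        {"type": "text", "text": "Describe what is on this screen."},
+        {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64_SCREENSHOT>"}}
+      ]
+    }]
+  }'
+```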
+
+### Input your API information
+
+
+
+
+
+> **Note**: The VLM Base URL is an OpenAI-compatible API endpoint (see the [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
diff --git a/docs/quick-start.md b/docs/quick-start.md
new file mode 100644
index 000000000..447b783a8
--- /dev/null
+++ b/docs/quick-start.md
@@ -0,0 +1,33 @@
+# Quick Start
+
+## Download
+
+You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.
+
+> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
+> ```bash
+> brew install --cask ui-tars
+> ```
+
+## Install
+
+### MacOS
+
+1. Drag the **UI TARS** application into the **Applications** folder
+
+
+2. Enable the permissions for **UI TARS** in MacOS:
+ - System Settings -> Privacy & Security -> **Accessibility**
+ - System Settings -> Privacy & Security -> **Screen Recording**
+
+
+3. Then open the **UI TARS** application, and you will see the following interface:
+
+
+
+### Windows
+
+Simply run the application, and you will see the following interface:
+
+
+
From e086963a0994cd4db01571eb1b3061417614176c Mon Sep 17 00:00:00 2001
From: chenhaoli
Date: Wed, 19 Mar 2025 19:45:36 +0800
Subject: [PATCH 3/3] chore: tweaks
---
README.md | 14 ++++++++------
docs/sdk.md | 2 +-
2 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/README.md b/README.md
index bfe85a41d..cf0f5172c 100644
--- a/README.md
+++ b/README.md
@@ -22,11 +22,6 @@ UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Mo
|    👓 Midscene (use in browser)
-
-## Updates
-
-- 🚀 01.25: We updated the **[Cloud Deployment](#cloud-deployment)** section in the 中文版: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb) with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.
-
## Showcases
| Instruction | Video |
@@ -34,6 +29,13 @@ UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Mo
| Get the current weather in SF using the web browser | |
| Send a twitter with the content "hello world" | |
+
+## News
+
+- **\[2025-02-20\]** - 📦 Introduced [UI TARS SDK](./docs/sdk.md), a powerful cross-platform toolkit for building GUI automation agents.
+- **\[2025-01-23\]** - 🚀 We updated the **[Cloud Deployment](./docs/deployment.md#cloud-deployment)** section of the Chinese guide ([GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)) with new information about the ModelScope platform. You can now use the ModelScope platform for deployment.
+
+
## Features
- 🤖 Natural language control powered by Vision-Language Model
@@ -57,7 +59,7 @@ See [CONTRIBUTING.md](./CONTRIBUTING.md).
## SDK (Experimental)
-See [UI TARS SDK](./docs/sdk.md)
+See [@ui-tars/sdk](./docs/sdk.md).
## License
diff --git a/docs/sdk.md b/docs/sdk.md
index 4e55871e9..209a75f46 100644
--- a/docs/sdk.md
+++ b/docs/sdk.md
@@ -1,4 +1,4 @@
-# @ui-tars/sdk Guide(Beta)
+# @ui-tars/sdk Guide (Experimental)
## Overview