> **\[2025-03-16\]** We released a **technical preview** version of a new desktop app - [Agent TARS](./apps/agent-tars/README.md), a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.
|    👓 <a href="https://github.com/web-infra-dev/midscene">Midscene (use in browser)</a>
</p>
## Showcases
| Instruction | Video |
| :---: | :---: |
| Get the current weather in SF using the web browser | <video src="https://github.com/user-attachments/assets/5235418c-ac61-4895-831d-68c1c749fc87" height="300" /> |
| Send a tweet with the content "hello world" | <video src="https://github.com/user-attachments/assets/737ccc11-9124-4464-b4be-3514cbced85c" height="300" /> |
## News
- **\[2025-02-20\]** - 📦 Introduced the [UI TARS SDK](./docs/sdk.md), a powerful cross-platform toolkit for building GUI automation agents.
- **\[2025-01-23\]** - 🚀 We updated the **[Cloud Deployment](./docs/deployment.md#cloud-deployment)** section in the Chinese version: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb) with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.
## Features
- 🤖 Natural language control powered by Vision-Language Model
## Quick Start
### Download

You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.

> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
>
> ```bash
> brew install --cask ui-tars
> ```

### Install

#### MacOS

1. Drag the **UI TARS** application into the **Applications** folder
See [Quick Start](./docs/quick-start.md).
## Deployment
We also support **Human In the Loop**, which means you can interact with the agent while it is working via the input box. If you want to change the direction of the agent's current work, type your thoughts into the input box at the top and press Enter to send them. Here is an example:

You can share your thread with others via the share button in the top menu.
There are two modes to share your thread:
- **Local Html**: Agent TARS will bundle your thread into an HTML file, which you can share with others.
- **Remote Server Url**: Agent TARS will upload the HTML bundle to a remote server and generate a URL that you can share with others.
### Local Mode
You can click the share button to open the share modal, and then click the **Local Html** button to share your thread.
Agent TARS will then post a request to the remote server to upload the HTML bundle, and you can share the resulting URL with others. The specific request information is as follows:
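(The original request details are not included in this excerpt. Purely as a hypothetical illustration of such an upload, the sketch below posts an HTML bundle as multipart form data; the endpoint, field name, and response shape are all invented for the example.)

```python
# Hypothetical sketch only: the real Agent TARS upload endpoint, field names,
# and response format may differ. Shown just to illustrate a multipart upload.
import requests

SHARE_SERVER_URL = "https://example.com/api/share"  # assumed placeholder, not the real server


def upload_thread_bundle(html_path: str) -> str:
    """Upload a bundled thread HTML file and return the shareable URL."""
    with open(html_path, "rb") as f:
        response = requests.post(
            SHARE_SERVER_URL,
            files={"file": ("thread.html", f, "text/html")},  # field name is assumed
            timeout=30,
        )
    response.raise_for_status()
    # Assumes the server replies with JSON containing a "url" field.
    return response.json()["url"]


if __name__ == "__main__":
    print(upload_thread_bundle("thread.html"))
```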
### ⚠️ Important Announcement: GGUF Model Performance
The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.

💡 **Alternative Solution**:

You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)** (if you have enough GPU resources) instead.
We appreciate your understanding and patience as we work to ensure the best possible experience.
## Cloud Deployment
We recommend using HuggingFace Inference Endpoints for fast deployment.
We provide two docs for users to refer to:
English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)
We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):
<!-- If you use Ollama, you can use the following settings to start the server:

```yaml
VLM Provider: ollama
VLM Base Url: http://localhost:11434/v1
VLM API Key: api_key
VLM Model Name: ui-tars
``` -->
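Whichever deployment you choose, it helps to verify that the endpoint you plan to enter as the VLM Base Url is reachable and serves the expected model before configuring the desktop app. A minimal sketch, assuming an OpenAI-compatible server already running at `http://localhost:8000/v1` (adjust the URL, API key, and model name to your own deployment):

```python
# Minimal reachability check for an OpenAI-compatible endpoint.
# The base URL and expected model name below are assumptions; use your own values.
import requests

BASE_URL = "http://localhost:8000/v1"  # e.g. a local vLLM server; Ollama uses http://localhost:11434/v1
EXPECTED_MODEL = "ui-tars"             # the model name you will enter in the settings

resp = requests.get(
    f"{BASE_URL}/models",
    headers={"Authorization": "Bearer api_key"},  # local servers often ignore the key
    timeout=10,
)
resp.raise_for_status()

# OpenAI-style model listing: {"data": [{"id": "..."}, ...]}
served = [m["id"] for m in resp.json()["data"]]
print("Models served:", served)
print("Expected model available:", EXPECTED_MODEL in served)
```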
> **Note**: The VLM Base Url is an OpenAI-compatible API endpoint (see the [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
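For reference, here is a minimal sketch of calling such an endpoint directly with the `openai` Python client, sending a screenshot as a base64-encoded image in the way the linked protocol document describes. The base URL, API key, and model name are placeholders; substitute the values from your own deployment:

```python
# Sketch of querying an OpenAI-compatible VLM endpoint with a screenshot.
# Base URL, API key, and model name are placeholders for your own deployment.
import base64

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # your VLM Base Url
    api_key="api_key",                    # your VLM API Key
)

# Encode a screenshot as base64 so it can be sent inline as a data URL.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="ui-tars",  # your VLM Model Name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is on the screen."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```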