docs: some enhancements #230

Merged 3 commits on Mar 19, 2025

98 changes: 7 additions & 91 deletions README.md
@@ -4,7 +4,7 @@
> **\[2025-03-16\]** We released a **technical preview** version of a new desktop app - [Agent TARS](./apps/agent-tars/README.md), a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.

<p align="center">
<img alt="UI-TARS" width="260" src="./resources/icon.png">
<img alt="UI-TARS" width="260" src="./apps/ui-tars/resources/icon.png">
</p>

# UI-TARS Desktop
@@ -22,14 +22,6 @@ UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Mo
| &nbsp;&nbsp; 👓 <a href="https://github.com/web-infra-dev/midscene">Midscene (use in browser)</a>
</p>

### ⚠️ Important Announcement: GGUF Model Performance

The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.

💡 **Alternative Solution**:
You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)**(If you have enough GPU resources) instead.

We appreciate your understanding and patience as we work to ensure the best possible experience.

## Updates

@@ -53,95 +45,19 @@ We appreciate your understanding and patience as we work to ensure the best poss

## Quick Start

### Download

You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.

> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
> ```bash
> brew install --cask ui-tars
> ```

### Install

#### MacOS

1. Drag **UI TARS** application into the **Applications** folder
<img src="./images/mac_install.png" width="500px" />

2. Enable the permission of **UI TARS** in MacOS:
- System Settings -> Privacy & Security -> **Accessibility**
- System Settings -> Privacy & Security -> **Screen Recording**
<img src="./images/mac_permission.png" width="500px" />

3. Then open **UI TARS** application, you can see the following interface:
<img src="./images/mac_app.png" width="500px" />


#### Windows

**Still to run** the application, you can see the following interface:

<img src="./images/windows_install.png" width="400px" />

### Deployment

#### Cloud Deployment
We recommend using HuggingFace Inference Endpoints for fast deployment.
We provide two docs for users to refer:

English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)

中文版: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)

#### Local Deployment [vLLM]
We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
```bash
pip install -U transformers
VLLM_VERSION=0.6.6
CUDA_VERSION=cu124
pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}

```
##### Download the Model
We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):

- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)


##### Start an OpenAI API Service
Run the command below to start an OpenAI-compatible API service:

```bash
python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
```

##### Input your API information

<img src="./images/settings_model.png" width="500px" />

<!-- If you use Ollama, you can use the following settings to start the server:
See [Quick Start](./docs/quick-start.md).

```yaml
VLM Provider: ollama
VLM Base Url: http://localhost:11434/v1
VLM API Key: api_key
VLM Model Name: ui-tars
``` -->
## Deployment

> **Note**: VLM Base Url is OpenAI compatible API endpoints (see [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
See [Deployment](./docs/deployment.md).

## Contributing

[CONTRIBUTING.md](./CONTRIBUTING.md)
See [CONTRIBUTING.md](./CONTRIBUTING.md).

## SDK(Experimental)
## SDK (Experimental)

[SDK](./docs/sdk.md)
See [UI TARS SDK](./docs/sdk.md).

## License

2 changes: 1 addition & 1 deletion apps/agent-tars/README.md
@@ -28,7 +28,7 @@

## Getting Started

<!--TODO-->
See [Quick Start](./docs/quick-start.md).

## Contributing

@@ -1,65 +1,65 @@
# Agent Tars Quick Start
# Getting Started with Agent TARS

Hello, welcome to Agent Tars!
Hello, welcome to Agent TARS!

This guide will walk you through the process of setting up your first Agent Tars project.
This guide will walk you through the process of setting up your first Agent TARS project.

## Necessary Configuration

Before you begin, you will need to complete some required configuration.

You can click the button at the bottom left to open the configuration page:

[setting-icon.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/setting-icon.jpeg)
![setting-icon.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/setting-icon.jpeg)

Then you can set the model config and the search config.

For the model config, you can set the model provider and API key:

[model-config.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)
![model-config.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)

> For Azure OpenAI, you can set more parameters, including `apiVersion`, `deploymentName`, and `endpoint`.

For the search config, you can set the search provider and API key:

[search-settings.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)
![search-settings.png](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/search-setting.jpeg)

## Start Your First Journey

Now you can start your first journey in Agent Tars!
Now you can start your first journey in Agent TARS!

You can type your first question in the input box, then press Enter to send it. Here is an example:

[first-journey.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/start-journey.jpeg)
![first-journey.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/start-journey.jpeg)

It's working!

We also support **Human In the Loop**, which means you can interact with the agent while it is working via the input box. If you want to change the direction of the agent's current work, enter your thoughts in the dedicated input box at the top and press Enter to send them. Here is an example:

[human-in-the-loop.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/human-in-the-loop.jpeg)
![human-in-the-loop.jpeg](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/human-in-the-loop.jpeg)

## Share Your Thread

You can share your thread with others via the share button in the top menu.

There are two modes to share your thread:

- **Local Html**: Agent Tars will bundle your thread into a html file, and you can share it with others.
- **Remote Server Url**: Agent Tars will generate a url for you to share your thread with others, Agent Tars will upload the html bundle to a remote server.
- **Local Html**: Agent TARS will bundle your thread into an HTML file, and you can share it with others.
- **Remote Server Url**: Agent TARS will upload the HTML bundle to a remote server and generate a URL that you can share with others.

### Local Mode

You can click the share button to open the share modal, and then click the **Local Html** button to share your thread.

[local-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/local-share.jpeg)
![local-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/local-share.jpeg)

### Remote Mode

For the remote share mode, you need to set the remote server URL in the share modal:

[remote-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/local-share.jpeg)
![remote-share](https://lf3-static.bytednsdoc.com/obj/eden-cn/uhbfnupenuhf/agent-tars/remote-share.jpeg)

Then Agent Tars will post a request to the remote server to upload the html bundle, and then you can share the url with others. The specific request information is as follows:
Then Agent TARS will send a POST request to the remote server to upload the HTML bundle, and you can then share the resulting URL with others. The specific request information is as follows:

- Method: POST
- Body:
60 changes: 60 additions & 0 deletions docs/deployment.md
@@ -0,0 +1,60 @@
# Deployment

### ⚠️ Important Announcement: GGUF Model Performance

The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.

💡 **Alternative Solution**:
You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)** (if you have enough GPU resources) instead.

We appreciate your understanding and patience as we work to ensure the best possible experience.

## Cloud Deployment

We recommend using Hugging Face Inference Endpoints for fast deployment.
We provide two docs for reference:

English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)

Chinese version: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)

## Local Deployment [vLLM]
We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
```bash
pip install -U transformers
VLLM_VERSION=0.6.6
CUDA_VERSION=cu124
pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}

```
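
Before starting the server, it can help to confirm that the expected versions landed in your environment. A minimal sanity check (assuming both packages expose a `__version__` attribute, which current releases do):

```bash
# Print the installed versions to confirm vllm>=0.6.1 is in place
python -c "import vllm, transformers; print('vllm', vllm.__version__, '| transformers', transformers.__version__)"
```
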
### Download the Model
We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):

- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)

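To fetch the weights ahead of time rather than at first load, you can use the Hugging Face CLI. This is only a sketch, assuming the `huggingface_hub` CLI is available (`pip install -U huggingface_hub`); the 7B-DPO repository is used as an example, and any repository listed above works the same way:

```bash
# Download the 7B-DPO checkpoint into a local folder; swap in any repo id from the list above
huggingface-cli download bytedance-research/UI-TARS-7B-DPO --local-dir ./UI-TARS-7B-DPO
```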

### Start an OpenAI API Service
Run the command below to start an OpenAI-compatible API service:

```bash
python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
```
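
Once the service is running, you can sanity-check the endpoint with a plain OpenAI-style request before pointing the desktop app at it. This is only a minimal sketch: it assumes the server listens on the default port 8000, and `<BASE64_SCREENSHOT>` is a placeholder you must replace with a real base64-encoded image (see the protocol note at the end of this page):

```bash
# Hypothetical smoke test against the OpenAI-compatible endpoint started above
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ui-tars",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe what you see on this screen."},
          {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64_SCREENSHOT>"}}
        ]
      }
    ]
  }'
```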

### Input your API information

<img src="../apps/ui-tars/images/settings_model.png" width="500px" />

<!-- If you use Ollama, you can use the following settings to start the server:
```yaml
VLM Provider: ollama
VLM Base Url: http://localhost:11434/v1
VLM API Key: api_key
VLM Model Name: ui-tars
``` -->

> **Note**: The VLM Base Url must be an OpenAI-compatible API endpoint (see the [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).
33 changes: 33 additions & 0 deletions docs/quick-start.md
@@ -0,0 +1,33 @@
# Quick Start

## Download

You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.

> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
> ```bash
> brew install --cask ui-tars
> ```
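
If you installed through Homebrew, you can also check and update the app from the command line later. A small sketch, assuming the cask keeps the name `ui-tars`:

```bash
# Confirm the cask is installed, then upgrade to the latest release
brew list --cask ui-tars
brew upgrade --cask ui-tars
```
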
## Install

### macOS

1. Drag the **UI TARS** application into the **Applications** folder
   <img src="../apps/ui-tars/images/mac_install.png" width="500px" />

2. Enable the permissions for **UI TARS** in macOS:
   - System Settings -> Privacy & Security -> **Accessibility**
   - System Settings -> Privacy & Security -> **Screen Recording**
   <img src="../apps/ui-tars/images/mac_permission.png" width="500px" />

3. Then open the **UI TARS** application; you will see the following interface:
   <img src="../apps/ui-tars/images/mac_app.png" width="500px" />

### Windows

Run the application after installation and you will see the following interface:

<img src="../apps/ui-tars/images/windows_install.png" width="400px" />