
Commit 7926910

chore: update README.md and add here banner (#214)
chore: fix readme
1 parent dbbae59 commit 7926910


57 files changed (+187 / -202 lines)

README.md

Lines changed: 125 additions & 42 deletions
@@ -1,77 +1,160 @@
-# Agent TARS

-**Agent TARS** is a multimodal AI agent that revolutionizes GUI interaction. By visually interpreting environments like web pages, Agent TARS empowers GUI agents with enhanced context and capabilities, making it a versatile tool to perform a wide range of tasks including searching, browsing, and synthesizing information. Furthermore, Agent TARS facilitates seamless integration with file systems and command line interface, enabling a cohesive workflow with intuitive GUI capabilities.

-With a redesigned desktop client, Agent TARS enhances its GUI understanding with an advanced agent framework. This synergy enables generic tasks and paves the way for continuous performance optimization of GUI agents like [UI-TARS](https://github.com/bytedance/ui-tars), combined with an agent framework. The framework makes it easier for developers to build and customize GUI agent solutions.
+> [!IMPORTANT]
+> **\[2025-03-16\]** We released a **technical preview** version of a new desktop app - [Agent TARS](./apps/omega/README.md), a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.

-# Showcases
+<p align="center">
+  <img alt="UI-TARS" width="260" src="./resources/icon.png">
+</p>

-- [ ] Add demo
+# UI-TARS Desktop

-# ✨️ Key Features
+UI-TARS Desktop is a GUI Agent application based on [UI-TARS (Vision-Language Model)](https://github.com/bytedance/UI-TARS) that allows you to control your computer using natural language.

-Agent TARS builds upon the foundation of [UI-TARS-desktop](./apps/ui-tars/README.md) and introduces three major enhancements:

-- **🌐 Smarter Browser Control:** Using UI understanding, Agent TARS excels at operating browsers. With an advanced agent framework, it plans and executes complex tasks like operator and deep research, unlocking a wider range of scenarios for GUI agents.
-- **💡 More Tools, More Power:** It combines browser UI skills with features like search, file editing, command-line actions, and tool integration via the Model Context Protocol (MCP). This makes tackling intricate tasks a breeze and helps developers build a vibrant GUI agent ecosystem.
-- **💻️ Shiny New Desktop UI:** Enjoy a revamped PC desktop client (built with Electron) featuring searches and browser displays, chat UI with session management, model configuration and planning steps—making it easier to expand GUI agent applications.
+<p align="center">
+  &nbsp;&nbsp; 📑 <a href="https://arxiv.org/abs/2501.12326">Paper</a> &nbsp;&nbsp;
+  | 🤗 <a href="https://huggingface.co/bytedance-research/UI-TARS-7B-DPO">Hugging Face Models</a>&nbsp;&nbsp;
+  | &nbsp;&nbsp;🫨 <a href="https://discord.gg/pTXwYVjfcs">Discord</a>&nbsp;&nbsp;
+  | &nbsp;&nbsp;🤖 <a href="https://www.modelscope.cn/models/bytedance-research/UI-TARS-7B-DPO">ModelScope</a>&nbsp;&nbsp;
+  <br>
+  🖥️ Desktop Application &nbsp;&nbsp;
+  | &nbsp;&nbsp; 👓 <a href="https://github.com/web-infra-dev/midscene">Midscene (use in browser)</a>
+</p>

-*Note:* The original UI-TARS-desktop client sticks around, and our SDK is now more universal for broader use.
+### ⚠️ Important Announcement: GGUF Model Performance

-## 🌐 Enhanced GUI Agent Tool Integration
+The **GGUF model** has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to **downgrade** it.

-Agent TARS excels at connecting tools related to GUI Agents, creating cohesive task executions:
+💡 **Alternative Solution**:
+You can use **[Cloud Deployment](#cloud-deployment)** or **[Local Deployment [vLLM]](#local-deployment-vllm)** (if you have enough GPU resources) instead.

-- **Search and Browse:** Conduct searches and navigate web pages effortlessly.
-- **Exploration:** Dynamically open links and scroll down pages to explore content while browsing.
-- **Information Synthesis:** Collect and synthesize information into final results.
+We appreciate your understanding and patience as we work to ensure the best possible experience.

-## 🛠️ Engineering Development Made Easy
+## Updates

-Agent TARS offers a robust framework to integrate the multimodal model into projects seamlessly. Its well-structured architecture simplifies building custom workflows, enabling developers to harness multimodal capabilities with ease.
+- 🚀 01.25: We updated the **[Cloud Deployment](#cloud-deployment)** section in the Chinese guide, [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb), with new information about the ModelScope platform. You can now use ModelScope for deployment.

-## 🔎 Functional Expansion and Tool Support
+## Showcases

-Agent TARS provides a comprehensive platform with comprehensive functions and tool support, including:
+| Instruction | Video |
+| :---: | :---: |
+| Get the current weather in SF using the web browser | <video src="https://github.com/user-attachments/assets/5235418c-ac61-4895-831d-68c1c749fc87" height="300" /> |
+| Send a tweet with the content "hello world" | <video src="https://github.com/user-attachments/assets/737ccc11-9124-4464-b4be-3514cbced85c" height="300" /> |

-- **Operator** **with Browser**
-- **Coding &** **Artifact** **Preview**
-- **MCP-Based Tools**
+## Features

-## 📽️ Replay and Sharing
+- 🤖 Natural language control powered by Vision-Language Model
+- 🖥️ Screenshot and visual recognition support
+- 🎯 Precise mouse and keyboard control
+- 💻 Cross-platform support (Windows/MacOS)
+- 🔄 Real-time feedback and status display
+- 🔐 Private and secure - fully local processing

-Share your task execution journeys with Agent TARS:
+## Quick Start

-- **Standardized Data Persistence:** Save and access your data reliably.
-- **Web Publishing:** Publish execution processes to web pages for display and collaboration.
+### Download

-# Getting Started
+You can download the [latest release](https://github.com/bytedance/UI-TARS-desktop/releases/latest) version of UI-TARS Desktop from our releases page.

-**Clone the** **Repository**:
+> **Note**: If you have [Homebrew](https://brew.sh/) installed, you can install UI-TARS Desktop by running the following command:
+> ```bash
+> brew install --cask ui-tars
+> ```

+### Install

+#### MacOS

+1. Drag **UI TARS** application into the **Applications** folder
+   <img src="./images/mac_install.png" width="500px" />

+2. Enable the permissions for **UI TARS** in MacOS:
+   - System Settings -> Privacy & Security -> **Accessibility**
+   - System Settings -> Privacy & Security -> **Screen Recording**
+   <img src="./images/mac_permission.png" width="500px" />

+3. Then open the **UI TARS** application; you will see the following interface:
+   <img src="./images/mac_app.png" width="500px" />


+#### Windows

+Just run the application and you will see the following interface:

+<img src="./images/windows_install.png" width="400px" />

+### Deployment

+#### Cloud Deployment
+We recommend using HuggingFace Inference Endpoints for fast deployment.
+We provide two docs for reference:

+English version: [GUI Model Deployment Guide](https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71)

+Chinese version: [GUI模型部署教程](https://bytedance.sg.larkoffice.com/docx/TCcudYwyIox5vyxiSDLlgIsTgWf#U94rdCxzBoJMLex38NPlHL21gNb)

+#### Local Deployment [vLLM]
+We recommend using vLLM for fast deployment and inference. You need to use `vllm>=0.6.1`.
+```bash
+pip install -U transformers
+VLLM_VERSION=0.6.6
+CUDA_VERSION=cu124
+pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}

```
-git clone https://github.com/bytedance/agent-TARS.git
+##### Download the Model
+We provide three model sizes on Hugging Face: **2B**, **7B**, and **72B**. To achieve the best performance, we recommend using the **7B-DPO** or **72B-DPO** model (based on your hardware configuration):

+- [2B-SFT](https://huggingface.co/bytedance-research/UI-TARS-2B-SFT)
+- [7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)
+- [7B-DPO](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)
+- [72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)
+- [72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)

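Editorial aside (not part of the commit diff): the checkpoints listed above can be pre-downloaded so that the `--model` flag in the next step points at a local path. A minimal sketch, assuming you want the 7B-DPO variant, have the Hugging Face CLI available, and have roughly 15–20 GB of free disk space:

```bash
# Sketch only: pull the 7B-DPO checkpoint listed above into a local directory.
# Assumes the huggingface_hub CLI is installed; adjust the repo id for other sizes.
pip install -U "huggingface_hub[cli]"
huggingface-cli download bytedance-research/UI-TARS-7B-DPO --local-dir ./UI-TARS-7B-DPO
```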
+##### Start an OpenAI API Service
+Run the command below to start an OpenAI-compatible API service:

+```bash
+python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
```
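Editorial aside (not part of the commit diff): once the server above is running, it exposes the standard OpenAI chat-completions route, so a quick smoke test could look like the sketch below. The host and port assume vLLM's defaults (localhost:8000) and the prompt text is only illustrative; adjust both to your deployment.

```bash
# Sketch only: send one chat request to the OpenAI-compatible server started above.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ui-tars",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```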

-## Future Plans
+##### Input your API information

-Agent TARS is more than a tool—it’s a platform for the future of multimodal agents. Upcoming enhancements include:
+<img src="./images/settings_model.png" width="500px" />

-- Ongoing optimization of agent framework-GUI Agent synergy with expanded model compatibility.
-- Expansion to mobile device operations with cross-platform framework.
-- Integration with game environments for AI-driven gameplay.
+<!-- If you use Ollama, you can use the following settings to start the server:

+```yaml
+VLM Provider: ollama
+VLM Base Url: http://localhost:11434/v1
+VLM API Key: api_key
+VLM Model Name: ui-tars
+``` -->

+> **Note**: The VLM Base Url is an OpenAI-compatible API endpoint (see the [OpenAI API protocol document](https://platform.openai.com/docs/guides/vision/uploading-base-64-encoded-images) for more details).

## Contributing

-- [ ] update [contributing.md](./contributing.md)
+[CONTRIBUTING.md](./CONTRIBUTING.md)

-## License
+## SDK (Experimental)

-Agent TARS is licensed under the Apache License 2.0.
+[SDK](./docs/sdk.md)

-# Acknowledgments
+## License

+UI-TARS Desktop is licensed under the Apache License 2.0.

-- A huge thanks to the UI-TARS and UI-TARS-desktop team for their foundational work.
-- Gratitude to all contributors and the open-source community for their support.
+## Citation
+If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:

-**Join us in shaping the future of multimodal AI agents with Agent TARS!**
+```BibTeX
+@article{qin2025ui,
+  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
+  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
+  journal={arXiv preprint arXiv:2501.12326},
+  year={2025}
+}
+```

apps/omega/README.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+<a href="https://github.com/bytedance/agent-tars/releases">
+  <img src="./resources/hero.png">
+</a>

+# Agent TARS

+<p>
+  <a href="https://github.com/bytedance/UI-TARS-desktop/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache 2.0-blue.svg?style=flat-square&logo=apache&colorA=564341&colorB=EDED91" alt="license" /></a>
+  <a href="https://github.com/bytedance/UI-TARS-desktop/graphs/contributors"><img alt="GitHub contributors" src="https://img.shields.io/github/contributors/bytedance/UI-TARS-desktop?style=flat-square&logo=github&colorA=564341&colorB=EDED91"></a>
+</p>

+**Agent TARS** is an open-source GUI agent designed to revolutionize multimodal interaction by visually interpreting web pages and seamlessly integrating with command lines and file systems.

+> [!CAUTION]
+> **DISCLAIMER**: Agent TARS is still in the **Technical Preview** stage and is not yet stable. It is not recommended for use in production.

+## Showcases

+| Instruction | Replay |
+| ----------- | ------ |
+| | |

+## ✨️ Features

+- **🌐 Advanced Browser Operations:** Executes sophisticated tasks like Deep Research and Operator functions through an agent framework, enabling comprehensive planning and execution.
+- **🛠️ Comprehensive Tool Support:** Integrates with search, file editing, command line, and Model Context Protocol (MCP) tools to handle complex workflows.
+- **💻️ Enhanced Desktop App:** A revamped UI with displays for browsers, multimodal elements, session management, model configuration, dialogue flow visualization, and browser/search status tracking.
+- **🔄 Workflow Orchestration:** Seamlessly connects GUI Agent tools—search, browse, explore links, and synthesize information into final outputs.
+- **⚙️ Developer-Friendly Framework:** Simplifies integration with UI-TARS and custom workflow creation for GUI Agent projects.


+## Getting Started

+**Clone the Repository:**

+```
+git clone https://github.com/bytedance/agent-TARS.git
+```

+## Future Plans

+Agent TARS is more than a tool—it’s a platform for the future of multimodal agents. Upcoming enhancements include:

+- Ongoing optimization of agent framework-GUI Agent synergy with expanded model compatibility.
+- Expansion to mobile device operations with a cross-platform framework.
+- Integration with game environments for AI-driven gameplay.

+## Contributing

+- [ ] update [contributing.md](./contributing.md)

+## License

+Agent TARS is licensed under the Apache License 2.0.

+## Acknowledgments

+- A huge thanks to the UI-TARS and UI-TARS-desktop team for their foundational work.
+- Gratitude to all contributors and the open-source community for their support.

+**Join us in shaping the future of multimodal AI agents with Agent TARS!**
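Editorial sketch (not part of the new file): after the clone step above, the usual next move in a monorepo desktop app like this is to install workspace dependencies and start a development build. The use of pnpm and the `dev` script name are assumptions, not stated in this commit; check the repository's package.json for the actual scripts.

```bash
# Sketch only: post-clone setup under the stated assumptions (pnpm workspace, `dev` script).
git clone https://github.com/bytedance/agent-TARS.git
cd agent-TARS
pnpm install   # install workspace dependencies
pnpm run dev   # script name is an assumption; see package.json
```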

apps/omega/resources/hero.png

Binary file added (1010 KB)
