Commit e76e1a8 ([MOT] fix mot doc, #3025), parent 6cfe364: 32 files changed, +1005 -170073 lines.

configs/mot/README.md (+227 lines)
English | [简体中文](README_cn.md)

# MOT (Multi-Object Tracking)

## Table of Contents
- [Introduction](#Introduction)
- [Model Zoo](#Model_Zoo)
- [Dataset Preparation](#Dataset_Preparation)
- [Installation](#Installation)
- [Getting Started](#Getting_Started)
- [Citations](#Citations)
## Introduction

PaddleDetection implements three multi-object tracking methods:

- [DeepSORT](https://arxiv.org/abs/1703.07402) (SORT with a Deep Association Metric) extends the original [SORT](https://arxiv.org/abs/1602.00763) (Simple Online and Realtime Tracking) algorithm to integrate appearance information based on a deep appearance descriptor: a CNN model extracts re-identification features from the image patches cropped out by the detector's bounding boxes. Here we use `JDE` as the detection model to generate boxes and `PCBPyramid` as the ReID model. Loading boxes from saved detection result files is also supported.

- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) is a fast, high-performance multi-object tracker that learns the object detection task and the appearance embedding task simultaneously in a shared neural network.

- [FairMOT](https://arxiv.org/abs/2004.01888) performs detection and re-identification in a single network to improve inference speed. It presents a simple baseline consisting of two homogeneous branches that predict pixel-wise objectness scores and re-ID features; the resulting fairness between the two tasks allows FairMOT to reach high levels of both detection and tracking accuracy.

<div align="center">
  <img src="../../docs/images/mot16_jde.gif" width=500 />
</div>
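The appearance-based association step these trackers share can be sketched minimally: given L2-normalized ReID embeddings for existing tracks and new detections, build a cosine-distance cost matrix and solve a bipartite matching. The sketch below is illustrative only, not PaddleDetection's tracker code; the function name `associate` and the `0.4` distance threshold are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats, det_feats, max_cosine_distance=0.4):
    """Match detections to existing tracks by appearance similarity.

    track_feats, det_feats: L2-normalized ReID embeddings of shape
    (num_tracks, D) and (num_dets, D). Returns (track_idx, det_idx)
    pairs whose cosine distance is below the threshold.
    """
    # Cosine distance matrix: 1 - cosine similarity (features are unit-norm).
    cost = 1.0 - track_feats @ det_feats.T
    # Hungarian algorithm finds the minimum-cost one-to-one assignment.
    row_ind, col_ind = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(row_ind, col_ind)
            if cost[r, c] < max_cosine_distance]
```

Real trackers additionally gate matches with motion information (e.g. a Kalman filter), which this sketch omits.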
## Model Zoo

### JDE on MOT-16 training set

| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :----------------- | :------- | :----: | :----: | :---: | :----: | :---: | :---: | :---: | :---: |
| DarkNet53 | 1088x608 | 73.2 | 69.4 | 1320 | 6613 | 21629 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_1088x608.yml) |
| DarkNet53 | 864x480 | 70.1 | 65.4 | 1341 | 6454 | 25208 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_864x480.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_864x480.yml) |
| DarkNet53 | 576x320 | 63.1 | 64.6 | 1357 | 7083 | 32312 | - |[model](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_576x320.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/jde/jde_darknet53_30e_576x320.yml) |

**Notes:**
JDE was trained for 30 epochs on 8 GPUs with a mini-batch size of 4 per GPU.
### DeepSORT on MOT-16 training set

| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | Detector | ReID | config |
| :---------| :------- | :----: | :----: | :--: | :----: | :---: | :---: |:-----: | :-----: | :-----: |
| DarkNet53 | 1088x608 | 72.2 | 60.5 | 998 | 8054 | 21644 | 5.07 |[JDE](https://paddledet.bj.bcebos.com/models/mot/jde_darknet53_30e_1088x608.pdparams)| [ReID](https://paddledet.bj.bcebos.com/models/mot/deepsort_pcb_pyramid_r101.pdparams)|[config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/deepsort/deepsort_pcb_pyramid_r101.yml) |

**Notes:**
DeepSORT requires no training and is used for evaluation only. Before evaluating DeepSORT, you should first obtain detection results from a detection model (here we use JDE) and arrange them like this:
```
det_results_dir
|——————MOT16-02.txt
|——————MOT16-04.txt
|——————MOT16-05.txt
|——————MOT16-09.txt
|——————MOT16-10.txt
|——————MOT16-11.txt
|——————MOT16-13.txt
```
Each txt file holds the detection results for all frames extracted from one video, and each line describes a bounding box in the following format:
```
[frame_id][identity][bb_left][bb_top][width][height][conf][x][y][z]
```
**Notes:**
- `frame_id` is the frame number of the image
- `identity` is the object id, with default value `-1`
- `bb_left` is the X coordinate of the left edge of the object box
- `bb_top` is the Y coordinate of the top edge of the object box
- `width, height` are the pixel width and height of the box
- `conf` is the object score, with default value `1` (the results have already been filtered by the detection score threshold)
- `x, y, z` are used in 3D and default to `-1` in 2D
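As an illustrative sketch (the function name and return layout are assumptions, not part of PaddleDetection), a file in this format can be read into per-frame boxes like this:

```python
def parse_det_results(txt_path):
    """Parse one MOT-style detection result file into per-frame boxes.

    Each line holds: frame_id, identity, bb_left, bb_top, width, height,
    conf, x, y, z (comma- or whitespace-separated). Returns a dict
    {frame_id: [(bb_left, bb_top, width, height, conf), ...]}.
    """
    results = {}
    with open(txt_path) as f:
        for line in f:
            fields = line.replace(',', ' ').split()
            if len(fields) < 7:  # skip blank or malformed lines
                continue
            frame_id = int(float(fields[0]))
            # fields[1] is identity (-1 for raw detections); skip it here.
            bb_left, bb_top, width, height, conf = map(float, fields[2:7])
            results.setdefault(frame_id, []).append(
                (bb_left, bb_top, width, height, conf))
    return results
```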
### FairMOT Results on MOT-16 train set

| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 1088x608 | 83.3 | 81.9 | 544 | 3822 | 14095 | - | - | - |
| DLA-34 | 1088x608 | 83.7 | 83.3 | 435 | 3829 | 13764 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |

### FairMOT Results on MOT-16 test set

| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34(paper) | 1088x608 | 74.9 | 72.8 | 1074 | - | - | 25.9 | - | - |
| DLA-34 | 1088x608 | 74.8 | 74.4 | 930 | 7038 | 37994 | - | [model](https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml) |

**Notes:**
FairMOT was trained for 30 epochs on 8 GPUs with a mini-batch size of 6 per GPU.
## Dataset Preparation

### MOT Dataset
PaddleDetection uses the same training data as [JDE](https://github.com/Zhongdao/Towards-Realtime-MOT) and [FairMOT](https://github.com/ifzhang/FairMOT). Please refer to [PrepareMOTDataSet](../../docs/tutorials/PrepareMOTDataSet.md) to download and prepare all the training data, including **Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16**. **MOT15 and MOT20** can also be downloaded from the official webpage of the MOT challenge. If you want to use these datasets, please **follow their licenses**.
### Data Format
These relevant datasets share the following structure:
```
Caltech
|——————images
|        └——————00001.jpg
|        |—————— ...
|        └——————0000N.jpg
└——————labels_with_ids
         └——————00001.txt
         |—————— ...
         └——————0000N.txt
MOT17
|——————images
|        └——————train
|        └——————test
└——————labels_with_ids
         └——————train
```
Annotations of these datasets are provided in a unified format. Every image has a corresponding annotation text file. Given an image path, the annotation text path can be generated by replacing the string `images` with `labels_with_ids` and replacing `.jpg` with `.txt`.

In the annotation text, each line describes a bounding box in the following format:
```
[class] [identity] [x_center] [y_center] [width] [height]
```
**Notes:**
- `class` should be `0`; only single-class multi-object tracking is currently supported.
- `identity` is an integer from `0` to `num_identities - 1` (`num_identities` is the total number of object instances in the dataset), or `-1` if the box has no identity annotation.
- `[x_center] [y_center] [width] [height]` are normalized by the image width/height, so they are floating-point numbers between 0 and 1.
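The path rule and the normalization above can be sketched in a few lines of Python; the helper names are illustrative, not part of PaddleDetection:

```python
def label_path(image_path):
    """Derive the annotation path from an image path by swapping
    'images' -> 'labels_with_ids' and '.jpg' -> '.txt'."""
    return image_path.replace('images', 'labels_with_ids').replace('.jpg', '.txt')

def denormalize(line, img_w, img_h):
    """Convert one annotation line to pixel space.

    Returns (class, identity, x_left, y_top, width, height), where the
    normalized center/size values are scaled by the image dimensions.
    """
    cls, identity, xc, yc, w, h = line.split()
    w, h = float(w) * img_w, float(h) * img_h
    x_left = float(xc) * img_w - w / 2
    y_top = float(yc) * img_h - h / 2
    return int(cls), int(identity), x_left, y_top, w, h
```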
### Dataset Directory

First, download `image_lists.zip` with the command below and unzip it in the `dataset/mot` directory:
```
wget https://dataset.bj.bcebos.com/mot/image_lists.zip
```
Then download and unzip each dataset, so that the final directory looks like this:
```
dataset/mot
  |——————image_lists
            |——————caltech.10k.val
            |——————caltech.all
            |——————caltech.train
            |——————caltech.val
            |——————citypersons.train
            |——————citypersons.val
            |——————cuhksysu.train
            |——————cuhksysu.val
            |——————eth.train
            |——————mot15.train
            |——————mot16.train
            |——————mot17.train
            |——————mot20.train
            |——————prw.train
            |——————prw.val
  |——————Caltech
  |——————Cityscapes
  |——————CUHKSYSU
  |——————ETHZ
  |——————MOT15
  |——————MOT16
  |——————MOT17
  |——————MOT20
  |——————PRW
```
## Installation

Install all the related dependencies for MOT:
```
pip install lap sklearn motmetrics openpyxl cython_bbox
```
or
```
pip install -r requirements.txt
```
**Notes:**
To install `cython_bbox` on Windows, please refer to this [tutorial](https://stackoverflow.com/questions/60349980/is-there-a-way-to-install-cython-bbox-for-windows).
## Getting Started

### 1. Training

Train FairMOT on 8 GPUs with the following command:

```bash
python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml
```
### 2. Evaluation

Evaluate the tracking performance of FairMOT on the validation dataset on a single GPU with the following commands:

```bash
# use weights released in the PaddleDetection model zoo
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams

# use a checkpoint saved during training
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final
```
## Citations
```
@article{wang2019towards,
  title={Towards Real-Time Multi-Object Tracking},
  author={Wang, Zhongdao and Zheng, Liang and Liu, Yixuan and Wang, Shengjin},
  journal={arXiv preprint arXiv:1909.12605},
  year={2019}
}

@inproceedings{Wojke2017simple,
  title={Simple Online and Realtime Tracking with a Deep Association Metric},
  author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
  booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
  year={2017},
  pages={3645--3649},
  organization={IEEE},
  doi={10.1109/ICIP.2017.8296962}
}

@inproceedings{Wojke2018deep,
  title={Deep Cosine Metric Learning for Person Re-identification},
  author={Wojke, Nicolai and Bewley, Alex},
  booktitle={2018 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2018},
  pages={748--756},
  organization={IEEE},
  doi={10.1109/WACV.2018.00087}
}
```
