Add PointTrack++

patrick-llgc · patrick-llgc · commit d5fce9056f02 · 2020-07-19T16:24:26.000-07:00
diff --git a/README.md b/README.md
@@ -115,6 +115,7 @@ Geometrically Constrained Keypoints in Real-Time](https://drive.google.com/file/
 - [Tsinghua-Daimler Cyclists: A New Benchmark for Vision-Based Cyclist Detection](http://www.gavrila.net/Publications/iv16_cyclist_benchmark.pdf) [[Notes](paper_notes/tsinghua_daimler_cyclist.md)] <kbd>IV 2016</kbd> [dataset, cyclist detection]
 - [Specialized Cyclist Detection Dataset: Challenging Real-World Computer Vision Dataset for Cyclist Detection Using a Monocular RGB Camera](https://drive.google.com/drive/u/0/folders/1inawrX9NVcchDQZepnBeJY4i9aAI5mg9) [[Notes](paper_notes/specialized_cyclists.md)] <kbd>IV 2019</kbd> [Extension to KITTI]
 - [PointTrack: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation](https://arxiv.org/abs/2007.01550) [[Notes](paper_notes/pointtrack.md)] <kbd>ECCV 2020 oral</kbd> [MOTS]
+- [PointTrack++ for Effective Online Multi-Object Tracking and Segmentation](https://arxiv.org/abs/2007.01549) [[Notes](paper_notes/pointtrack++.md)] <kbd>CVPR 2020 workshop</kbd> [MOTS, CVPR 2020 MOTS Challenge winner, ranks first on KITTI MOTS]
 - [SpatialEmbedding: Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth](https://arxiv.org/abs/1906.11109) [[Notes](paper_notes/spatial_embedding.md)] <kbd>ICCV 2019</kbd> [one-stage, instance segmentation]
 - [BA-Net: Dense Bundle Adjustment Networks](https://arxiv.org/abs/1806.04807) [[Notes](paper_notes/banet.md)] <kbd>ICLR 2019</kbd> [Bundle adjustment]
 - [DeepSFM: Structure From Motion Via Deep Bundle Adjustment](https://arxiv.org/abs/1912.09697) <kbd>ECCV 2020 oral</kbd>
diff --git a/paper_notes/pointtrack++.md b/paper_notes/pointtrack++.md
@@ -0,0 +1,23 @@
+# [PointTrack++ for Effective Online Multi-Object Tracking and Segmentation](https://arxiv.org/abs/2007.01549)
+
+_July 2020_
+
+tl;dr: Follow-up to [PointTrack](pointtrack.md) for MOTS; winner of the CVPR 2020 MOTS Challenge.
+
+#### Overall impression
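+Three main contributions over [PointTrack](pointtrack.md): a semantic segmentation map as the seed map, copy-and-paste data augmentation for crowded scenes, and separate training of the environment and foreground embeddings before aggregating them.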
+
+
+#### Key ideas
+- Use a semantic segmentation map as the seed map of the [PointTrack](pointtrack.md) / [SpatialEmbedding](spatial_embedding.md) framework.
+- Copy-and-paste data augmentation to create crowded scenes. This requires segmentation masks to cut out the pasted objects.
+- Training instance embedding:
+	- [PointTrack](pointtrack.md): each training batch consists of D track IDs, each with three crops at an equal temporal spacing S. Three consecutive frames are deliberately avoided to increase the intra-track-ID discrepancy. S is randomly chosen between 1 and 10.
+	- [PointTrack++](pointtrack++.md) finds that the environment embedding does not converge with S > 2, whereas the foreground 2D point cloud embedding benefits from a large S (~12). The embeddings are therefore trained separately; the individual MLP weights are then frozen, and a new MLP is trained to aggregate them.
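
The copy-and-paste augmentation can be sketched as below, assuming each frame comes with a per-pixel instance-ID map (0 = background). The helper `copy_paste` and its overwrite rule are hypothetical illustrations, not the paper's implementation; the point is that the mask decides which pixels get copied, which is why segmentation masks are needed.

```python
import numpy as np


# Hypothetical helper (not the paper's code): paste one masked instance into another frame.
def copy_paste(dst_img, dst_ids, src_img, src_ids, inst_id, new_id, shift=(0, 0)):
    """Copy the pixels of instance `inst_id` from a source frame into a destination frame.

    dst_img, src_img: (H, W, 3) images of the same size.
    dst_ids, src_ids: (H, W) integer instance-ID maps, 0 = background.
    shift: (dy, dx) translation applied to the pasted object.
    Returns a new image and ID map; the inputs are left unchanged.
    """
    out_img, out_ids = dst_img.copy(), dst_ids.copy()
    h, w = dst_ids.shape
    ys, xs = np.nonzero(src_ids == inst_id)  # the mask selects the object's pixels
    ty, tx = ys + shift[0], xs + shift[1]
    inside = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)  # drop pixels moved off-image
    ys, xs, ty, tx = ys[inside], xs[inside], ty[inside], tx[inside]
    # The pasted object is in front, so it overwrites both colour and instance ID.
    out_img[ty, tx] = src_img[ys, xs]
    out_ids[ty, tx] = new_id
    return out_img, out_ids
```

For MOTS one would presumably paste the same object with the same `new_id` into every frame of a clip so that it forms a consistent synthetic track; that policy is an assumption, not something stated in the notes. Existing instances fully covered by the pasted object disappear from the ID map.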
+
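
A sketch of the triplet sampling behind both training schemes, assuming the object is visible in consecutive frames (so positions in `track_frames` equal frame offsets). The names `sample_triplet` and `sample_batch`, and the exact S ranges for the two PointTrack++ branches (environment: S of 1 or 2; foreground: S = 12), are hypothetical readings of the notes above, not the authors' code or configs.

```python
import random


# Hypothetical helpers (not from the paper) illustrating equal-spacing triplet sampling.
def sample_triplet(track_frames, s_min, s_max, rng):
    """Return three frames of one track, spaced S apart.

    track_frames: frame indices of one track ID, assumed consecutive.
    S is drawn uniformly from [s_min, s_max] (s_min >= 1) and capped so that
    three crops still fit in the track.
    """
    n = len(track_frames)
    if n < 3:
        raise ValueError("a track needs at least 3 frames")
    s = min(rng.randint(s_min, s_max), (n - 1) // 2)
    start = rng.randint(0, n - 1 - 2 * s)
    return [track_frames[start + k * s] for k in range(3)]


def sample_batch(tracks, d, s_min, s_max, seed=0):
    """Pick D track IDs and one equally spaced triplet of frames per ID."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(tracks), d)
    return {tid: sample_triplet(tracks[tid], s_min, s_max, rng) for tid in chosen}


# Spacing settings as read from the notes (assumed, not the official configs).
POINTTRACK = {"s_min": 1, "s_max": 10}
PP_ENVIRONMENT = {"s_min": 1, "s_max": 2}   # S > 2 reportedly does not converge
PP_FOREGROUND = {"s_min": 12, "s_max": 12}  # a large S (~12) reportedly helps

if __name__ == "__main__":
    tracks = {1: list(range(0, 30)), 2: list(range(40, 60)), 3: list(range(70, 75))}
    for name, cfg in [("environment", PP_ENVIRONMENT), ("foreground", PP_FOREGROUND)]:
        print(name, sample_batch(tracks, d=2, **cfg))
```

Under this reading, PointTrack++ would train each branch with its own sampler setting, then freeze both branches' MLPs and train only the aggregation MLP; that second stage is not shown here.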
+#### Technical details
+- Input images are upsampled to twice the original size, which improves performance.
+
+#### Notes
+- Questions and notes on how to improve/revise the current work  
+
diff --git a/paper_notes/pointtrack.md b/paper_notes/pointtrack.md
@@ -23,10 +23,13 @@ This work tackles the newly created track of [MOTS (multiple object tracking and
 - Seed consistency:
 	- Using Optical flow and last seed to encourage consistent seed. --> This additional optical flow map will definitely help [CenterTrack](centertrack.md).
 	- Penalize difference between the warped seed from last frame with optical flow and the seed predicted from current frame. 
-- Points with highest (top 10%) importance can be visualized by their weights, a natural feature from [PointNet](pointnet.md) embedding.
-- Visualization of instance embedding with T-SNE is also quite interesting.
+- Training instance embedding:
+	- [PointTrack](pointtrack.md): each training batch consists of D track IDs, each with three crops at an equal temporal spacing S. Three consecutive frames are deliberately avoided to increase the intra-track-ID discrepancy. S is randomly chosen between 1 and 10.
+	- [PointTrack++](pointtrack++.md) finds that the environment embedding does not converge with S > 2, whereas the foreground 2D point cloud embedding benefits from a large S (~12). The embeddings are therefore trained separately; the individual MLP weights are then frozen, and a new MLP is trained to aggregate them.
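
The seed-consistency term listed above can be sketched as follows, assuming a dense flow field from frame t to frame t-1 stored per pixel as (dx, dy), nearest-neighbour backward warping, and a mean-squared penalty. These choices and the function names `warp_backward` / `seed_consistency_loss` are illustrative assumptions; the paper may warp and weight the loss differently.

```python
import numpy as np


# Hypothetical sketch: flow convention, nearest-neighbour warping and the L2
# penalty are assumptions, not necessarily what PointTrack uses.
def warp_backward(prev_seed, flow):
    """Warp the seed map of frame t-1 into frame t.

    prev_seed: (H, W) seed map predicted for frame t-1.
    flow: (H, W, 2) flow from frame t to frame t-1, stored as (dx, dy).
    Returns the warped map and a mask of pixels whose source lies inside the image.
    """
    h, w = prev_seed.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(prev_seed, dtype=float)
    warped[valid] = prev_seed[src_y[valid], src_x[valid]]
    return warped, valid


def seed_consistency_loss(cur_seed, prev_seed, flow):
    """Mean squared difference between the current seed map and the warped previous one."""
    warped, valid = warp_backward(prev_seed, flow)
    if not valid.any():
        return 0.0
    diff = cur_seed[valid] - warped[valid]
    return float(np.mean(diff ** 2))
```

For an object that moves one pixel to the right, a flow of dx = -1 everywhere makes this loss vanish, while a zero flow leaves a penalty on the two pixels where the seeds disagree.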
 
 #### Technical details
+- The points with the highest importance (top 10%) can be visualized via their weights, a natural byproduct of the [PointNet](pointnet.md)-style embedding.
+- The t-SNE visualization of the instance embeddings is also quite interesting.
 - Ablation study showed that the removal of color in the input leads to the biggest drop.
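
Selecting the top 10% of points by importance weight, as in the visualization mentioned above, amounts to a sort and a slice. The sketch assumes one scalar weight per point of an instance's 2D point cloud; how PointTrack produces those weights is not shown, and `top_fraction` is a hypothetical helper.

```python
import numpy as np


# Hypothetical helper: rank the points of one instance by their importance weight.
def top_fraction(points, weights, frac=0.1):
    """Return the points (and their indices) whose weights fall in the top `frac`.

    points: (N, 2) pixel coordinates of an instance's 2D point cloud.
    weights: (N,) one importance weight per point.
    At least one point is always kept.
    """
    weights = np.asarray(weights)
    k = max(1, int(np.ceil(frac * len(weights))))
    idx = np.argsort(weights)[::-1][:k]  # k largest weights, highest first
    return points[idx], idx
```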
 
 #### Notes