- [Feature-metric Loss for Self-supervised Learning of Depth and Egomotion](https://arxiv.org/abs/2007.10603) [[Notes](paper_notes/feature_metric.md)] <kbd>ECCV 2020</kbd> [feature-metric, local minima, monodepth]
- [Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction](https://arxiv.org/abs/1803.03893) <kbd>CVPR 2018</kbd> [feature-metric, monodepth]
- [Learning monocular depth estimation infusing traditional stereo knowledge](https://arxiv.org/abs/1904.04144) [[Notes](paper_notes/monoresmatch.md)] <kbd>CVPR 2019</kbd> [monodepth, local minima]
- [Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding](https://arxiv.org/abs/1806.10556) <kbd>ECCV 2018</kbd>
- [Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding](https://arxiv.org/abs/1810.06125) <kbd>TPAMI 2018</kbd>
- [Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation](https://arxiv.org/abs/1805.09806) <kbd>CVPR 2019</kbd>
- [Detection in Crowded Scenes: One Proposal, Multiple Predictions](https://arxiv.org/abs/2003.09163) <kbd>CVPR 2020 oral</kbd> [Megvii]
- [SGDepth: Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance](https://arxiv.org/abs/2007.06936) <kbd>ECCV 2020</kbd>
- [Learning Depth from Monocular Videos using Direct Methods](https://arxiv.org/abs/1712.00175)
- [Vid2Depth: Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints](https://arxiv.org/abs/1802.05522) <kbd>CVPR 2018</kbd>
## 2020-07 (19)
- [CenterTrack: Tracking Objects as Points](https://arxiv.org/abs/2004.01177) [[Notes](paper_notes/centertrack.md)] <kbd>ECCV 2020 spotlight</kbd> [camera based 3D MOD, MOT SOTA, CenterNet, video based object detection]
- [CenterPoint: Center-based 3D Object Detection and Tracking](https://arxiv.org/abs/2006.11275) [[Notes](paper_notes/centerpoint.md)] [lidar based 3D MOD, CenterNet]
- [PointTrack++ for Effective Online Multi-Object Tracking and Segmentation](https://arxiv.org/abs/2007.01549) [[Notes](paper_notes/pointtrack++.md)] <kbd>CVPR 2020 workshop</kbd> [CVPR2020 MOTS Challenge Winner. PointTrack++ ranks first on KITTI MOTS]
- [SpatialEmbedding: Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth](https://arxiv.org/abs/1906.11109) [[Notes](paper_notes/spatial_embedding.md)] <kbd>ICCV 2019</kbd> [one-stage, instance segmentation]
- [DeepSFM: Structure From Motion Via Deep Bundle Adjustment](https://arxiv.org/abs/1912.09697) <kbd>ECCV 2020 oral</kbd> [multi-frame monodepth]
- [Consistent Video Depth Estimation](https://arxiv.org/abs/2004.15021) [[Notes](paper_notes/consistent_video_depth.md)] <kbd>SIGGRAPH 2020</kbd> [multi-frame monodepth, online finetune]
- [DeepV2D: Video to Depth with Differentiable Structure from Motion](https://arxiv.org/abs/1812.04605) [[Notes](paper_notes/deepv2d.md)] <kbd>ICLR 2020</kbd> [multi-frame monodepth, Jia Deng]
- [GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose](https://arxiv.org/abs/1803.02276) [[Notes](paper_notes/geonet.md)] <kbd>CVPR 2018</kbd> [residual optical flow, monodepth]
- [GLNet: Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera](https://arxiv.org/abs/1907.05820) [[Notes](paper_notes/glnet.md)] <kbd>ICCV 2019</kbd> [online finetune]
- [MonoUncertainty: On the uncertainty of self-supervised monocular depth estimation](https://arxiv.org/abs/2005.06209) [[Notes](paper_notes/mono_uncertainty.md)] <kbd>CVPR 2020</kbd> [depth uncertainty]
- [Supervising the new with the old: learning SFM from SFM](http://openaccess.thecvf.com/content_ECCV_2018/papers/Maria_Klodt_Supervising_the_new_ECCV_2018_paper.pdf) [[Notes](paper_notes/learn_sfm_from_sfm.md)] <kbd>ECCV 2018</kbd>
- [Neural RGB->D Sensing: Depth and Uncertainty from a Video Camera](https://arxiv.org/abs/1901.02571) <kbd>CVPR 2019</kbd> [multi-frame monodepth]
- [Don't Forget The Past: Recurrent Depth Estimation from Monocular Video](https://arxiv.org/abs/2001.02613) [multi-frame monodepth, RNN]
- [MiDaS: Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer](https://arxiv.org/abs/1907.01341) [monodepth, dynamic object, synthetic dataset]
- [Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation](https://arxiv.org/abs/2006.04371) [monodepth]
- [OmegaNet: Distilled Semantics for Comprehensive Scene Understanding from Videos](https://arxiv.org/abs/2003.14030) <kbd>CVPR 2020</kbd>
- [Monocular Plan View Networks for Autonomous Driving](https://arxiv.org/abs/1905.06937) <kbd>IROS 2019</kbd> [BEV-Net]
- [MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time](https://arxiv.org/abs/2006.16007) <kbd>ICML 2020</kbd> [mono3D]
- [CubifAE-3D: Monocular Camera Space Cubification on Autonomous Vehicles for Auto-Encoder based 3D Object Detection](https://arxiv.org/abs/2006.04080) [mono3D]
- [Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots](https://arxiv.org/abs/1912.12791)
- [Gradient Centralization: A New Optimization Technique for Deep Neural Networks](https://arxiv.org/abs/2004.01461) <kbd>ECCV 2020 oral</kbd>
- [Depth Completion via Deep Basis Fitting](https://arxiv.org/abs/1912.10336) <kbd>WACV 2020</kbd>
- [BTS: From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation](https://arxiv.org/abs/1907.10326) [monodepth, supervised]
- [The Edge of Depth: Explicit Constraints between Segmentation and Depth](https://arxiv.org/abs/2004.00171) <kbd>CVPR 2020</kbd> [monodepth, Xiaoming Liu]
- [On the Continuity of Rotation Representations in Neural Networks](https://arxiv.org/abs/1812.07035) <kbd>CVPR 2019</kbd> [rotational representation]
- [VDO-SLAM: A Visual Dynamic Object-aware SLAM System](https://arxiv.org/abs/2005.11052) <kbd>IJRR 2020</kbd>
- [Part-level Car Parsing and Reconstruction from a Single Street View](https://arxiv.org/abs/1811.10837) [[Notes](paper_notes/apollo_car_parts.md)] [Baidu]
- [RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving](https://arxiv.org/abs/2001.03343) [[Notes](paper_notes/rtm3d.md)] <kbd>ECCV 2020 spotlight</kbd>
- [DORN: Deep Ordinal Regression Network for Monocular Depth Estimation](https://arxiv.org/abs/1806.02446) [[Notes](paper_notes/dorn.md)] <kbd>CVPR 2018</kbd> [monodepth, supervised]
- [D&T: Detect to Track and Track to Detect](https://arxiv.org/abs/1710.03958) [[Notes](paper_notes/detect_track.md)] <kbd>ICCV 2017</kbd> (from Feichtenhofer)
- [CRF-Net: A Deep Learning-based Radar and Camera Sensor Fusion Architecture for Object Detection](https://ieeexplore.ieee.org/abstract/document/8916629/) [[Notes](paper_notes/crf_net.md)] <kbd>SDF 2019</kbd> (radar detection)
- [RVNet: Deep Sensor Fusion of Monocular Camera and Radar for Image-based Obstacle Detection in Challenging Environments](https://www.researchgate.net/profile/Vijay_John3/publication/335833918_RVNet_Deep_Sensor_Fusion_of_Monocular_Camera_and_Radar_for_Image-based_Obstacle_Detection_in_Challenging_Environments/links/5d7f164e92851c87c38b09f1/RVNet-Deep-Sensor-Fusion-of-Monocular-Camera-and-Radar-for-Image-based-Obstacle-Detection-in-Challenging-Environments.pdf) [[Notes](paper_notes/rvnet.md)] <kbd>PSIVT 2019</kbd>
`paper_notes/banet.md`

BA-Net proposed to do **BA on feature maps** to avoid sensitivity to photometric calibration.

Note that there is no PoseNet to predict ego motion. The output of the BA layer is the camera pose sequence and point cloud depths.

The idea of feature metric loss is further extended in [Feature metric monodepth](feature_metric.md) <kbd>ECCV 2020</kbd>.

[DeepV2D](deepv2d.md) is similar to [BA-Net](banet.md).

- [BA-Net](banet.md) tries to solve one joint nonlinear optimization over all variables, and thus needs to decompose the depth prediction with a depth basis to reduce the search space (see the sketch below).
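
A minimal sketch of what such a depth-basis decomposition could look like; the shapes and the plain linear combination are assumptions for illustration, not BA-Net's exact formulation:

```python
import torch

# Hypothetical sizes: the network predicts B basis depth maps for one image.
B, H, W = 16, 96, 320
basis = torch.rand(B, H, W)               # depth basis maps (network output)
w = torch.zeros(B, requires_grad=True)    # low-dimensional code updated by the BA layer

def depth_from_basis(w, basis):
    # The final depth is a combination of the basis maps, so bundle adjustment
    # only searches over B coefficients instead of H * W per-pixel depths.
    return torch.einsum('b,bhw->hw', w, basis)

depth = depth_from_basis(w, basis)        # (H, W) depth map fed into the reprojection error
```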
`paper_notes/depth_hints.md`

_July 2020_

tl;dr: Use depth pseudo-label to guide the self-supervised depth prediction out of local minima.
#### Overall impression
This paper digs into self-supervised learning and provides tons of insights, in a fashion similar to [What Monodepth See](what_monodepth_see.md).

It first showed that the photometric loss function (DSSIM + L1) used in monodepth is prone to local minima.
This paper proposed a way to consume a possibly noisy depth label together with the self-supervised pipeline, which is better than using the supervised signal alone or simply summing the two losses.

Another way to avoid local minima is to use a feature-metric loss instead of the photometric loss, as in [Feature metric monodepth](feature_metric.md), [BA-Net](banet.md) and [Deep Feature Reconstruction](depth_vo_feat.md).

In comparison, [Depth Hints](depth_hints.md) still uses the photometric loss, while [Feature metric monodepth](feature_metric.md) largely avoids the influence of local minima.
#### Key ideas
- When we have a pseudo-label (proxy label), we can use it in the following way (see the sketch below):
- $l_r$ is photometric reprojection loss, $l_s$ is supervised loss
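
A minimal sketch of how the two losses could be combined per pixel, assuming the selection rule of [Depth Hints](depth_hints.md) (only supervise with the hint where the hint reprojects better than the prediction); the tensor names are hypothetical:

```python
import torch

def combined_loss(l_r_pred, l_r_hint, depth_pred, depth_hint):
    # l_r_pred: photometric reprojection loss of the predicted depth, (B, H, W)
    # l_r_hint: photometric reprojection loss of the hint (pseudo-label) depth, (B, H, W)
    # depth_pred, depth_hint: predicted and hint depth maps, (B, H, W)

    # Supervised loss l_s toward the hint, here in log-depth space.
    l_s = (depth_pred.clamp(min=1e-6).log() - depth_hint.clamp(min=1e-6).log()).abs()

    # Trust the hint only where it reprojects better than the current prediction.
    hint_is_better = (l_r_hint < l_r_pred).float()

    # Everywhere else, fall back to the plain self-supervised reprojection loss l_r.
    return (l_r_pred + hint_is_better * l_s).mean()
```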
# [Feature-metric Loss for Self-supervised Learning of Depth and Egomotion](https://arxiv.org/abs/2007.10603)
_August 2020_

tl;dr: Feature-metric loss to avoid local minima in monodepth.
#### Overall impression
Local minima occur in monocular depth estimation because accurate depth and pose are sufficient but not necessary for a small photometric error. This issue has been tackled either by replacing the photometric error with a feature-metric error, or by using cues to guide the optimization out of local minima ([Depth Hints](depth_hints.md) and [MonoResMatch](monoresmatch.md)).

In comparison, [Depth Hints](depth_hints.md) still uses the photometric loss, while the feature-metric loss in this paper largely avoids the influence of local minima.

The idea of a feature-metric loss was perhaps first raised in [BA-Net](banet.md) and [Deep Feature Reconstruction](depth_vo_feat.md). It has the advantage of being less sensitive to photometric calibration (camera exposure, white balance) and provides dense supervision.

However, how to learn this feature map is the key. The paper uses an autoencoder to do this, with two extra loss terms that ensure large but smooth gradients, for faster and more general optimization.

> Small photometric loss does not necessarily guarantee accurate depth and pose, especially for pixels in textureless regions. The depth smoothness loss forces depth propagation from discriminative regions to textureless regions. However, such propagation has limited range and tends to cause over-smoothed results.

> A set of assumptions (for SfM-Learner): the corresponding 3D point is static with Lambertian reflectance and not occluded in both views.
#### Key ideas
- Learn a good feature
  - Use an autoencoder to learn the encoded feature.
  - **Discriminative loss** encourages large gradients in the learned feature map, so that even textureless regions are discriminative.
  - **Convergent loss** encourages the feature gradients to be smooth, and thus ensures a large convergence basin.
  - In summary, the feature has large first-order but small second-order gradients. The discriminative loss and convergent loss combined lead to a smoothly sloped feature map in textureless regions (see the sketch after this list).
- The feature-metric loss is combined with the photometric loss. Not sure how the results change when the feature-metric loss is used alone.
- Online refinement for 20 iterations on one test sample.
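
A minimal sketch of the two regularizers on the autoencoder feature map (large first-order gradients encouraged, second-order gradients penalized); this ignores any weighting or exact scaling the paper may use:

```python
import torch

def spatial_gradients(x):
    # First-order finite differences of a feature map x with shape (B, C, H, W).
    dx = x[..., :, 1:] - x[..., :, :-1]
    dy = x[..., 1:, :] - x[..., :-1, :]
    return dx, dy

def discriminative_loss(feat):
    # Negative first-order gradient magnitude: pushes features to have large
    # gradients so that even textureless regions become discriminative.
    dx, dy = spatial_gradients(feat)
    return -(dx.abs().mean() + dy.abs().mean())

def convergent_loss(feat):
    # Second-order gradient magnitude: keeps the feature surface smoothly
    # sloped, which enlarges the convergence basin of the feature-metric loss.
    dx, dy = spatial_gradients(feat)
    dxx, _ = spatial_gradients(dx)
    _, dyy = spatial_gradients(dy)
    return dxx.abs().mean() + dyy.abs().mean()
```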
#### Technical details
- Both $\partial L/\partial D(p)$ (depth) and $\partial L/\partial G$ (pose) rely on the image gradient $\partial I/\partial p$. For textureless regions, the image gradients are close to zero and thus contribute almost zero gradient to depth and pose. Thus we need to learn a better feature representation $\phi$ such that $\partial \phi/\partial p$ is not zero (see the worked chain rule after this list).
- [DORN](dorn.md) and [BTS]() are still the SOTA for supervised monodepth.
- Depth normalized by depth mean in loss function.
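
To make the gradient argument above concrete, the chain rule can be written out as below (a reconstruction in standard warping notation, with $p' = \pi(G, D(p), p)$ the pixel $p$ reprojected into the source image $I_s$):

$$
\frac{\partial L}{\partial D(p)} = \frac{\partial L}{\partial I_s(p')}\,\frac{\partial I_s(p')}{\partial p'}\,\frac{\partial p'}{\partial D(p)},
\qquad
\frac{\partial L}{\partial G} = \frac{\partial L}{\partial I_s(p')}\,\frac{\partial I_s(p')}{\partial p'}\,\frac{\partial p'}{\partial G}
$$

Both gradients share the image-gradient factor $\partial I_s(p')/\partial p'$, which vanishes in textureless regions; replacing $I_s$ with a learned feature map $\phi$ swaps in $\partial \phi(p')/\partial p'$, which the discriminative loss keeps away from zero.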
#### Notes
- In retrospect, relying on the photometric loss is quite fragile and dangerous. Photometric calibration (required by DSO and SfM-Learner) is perhaps as simple as one layer of a neural network, and we should leave it to the network to learn a good feature to use for depth estimation (see the sketch below).
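
A sketch of what that "one layer" could be, assuming a simple per-frame affine brightness correction (this exact form is an assumption, not from the paper):

```python
import torch
import torch.nn as nn

class AffinePhotometricCorrection(nn.Module):
    # Hypothetical one-layer photometric calibration: I' = a * I + b,
    # learned jointly with depth and pose instead of being calibrated
    # offline as in DSO or SfM-Learner.
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, img):
        return self.a * img + self.b
```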