- [An intriguing failing of convolutional neural networks and the CoordConv solution](https://arxiv.org/abs/1807.03247) [[Notes](paper_notes/coord_conv.md)] <kbd>NIPS 2018</kbd>
## 2019-08 (0)
- [Detection in Crowded Scenes: One Proposal, Multiple Predictions](https://arxiv.org/abs/2003.09163) <kbd>CVPR 2020 oral</kbd> [Megvii]
- [ContFuse: Deep Continuous Fusion for Multi-Sensor 3D Object Detection](http://openaccess.thecvf.com/content_ECCV_2018/papers/Ming_Liang_Deep_Continuous_Fusion_ECCV_2018_paper.pdf) [[Notes](paper_notes/contfuse.md)] <kbd>ECCV 2018</kbd> [Uber ATG, sensor fusion, BEV]
- [Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net](http://openaccess.thecvf.com/content_cvpr_2018/papers/Luo_Fast_and_Furious_CVPR_2018_paper.pdf) [[Notes](paper_notes/faf.md)] <kbd>CVPR 2018 oral</kbd> [lidar only, perception and prediction]
- [Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras](https://arxiv.org/pdf/1904.04998.pdf) [[Notes](paper_notes/mono_depth_video_in_the_wild.md)] <kbd>ICCV 2019</kbd> [monocular depth estimation, intrinsic estimation, SOTA]
- [monodepth: Unsupervised Monocular Depth Estimation with Left-Right Consistency](https://arxiv.org/abs/1609.03677) [[Notes](paper_notes/monodepth.md)] <kbd>CVPR 2017 oral</kbd> (monocular depth estimation, stereo for training)
- [Struct2depth: Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos](https://arxiv.org/abs/1811.06152) [[Notes](paper_notes/struct2depth.md)] <kbd>AAAI 2019</kbd> [monocular depth estimation, estimating movement of dynamic object, infinite depth problem, online finetune]
- [Unsupervised Learning of Geometry with Edge-aware Depth-Normal Consistency](https://arxiv.org/pdf/1711.03665.pdf) [[Notes](paper_notes/edge_aware_depth_normal.md)] <kbd>AAAI 2018</kbd> (monocular depth estimation, static assumption, surface normal)
**paper_notes/deep3dbox.md** (1 addition, 1 deletion)
@@ -19,7 +19,7 @@ A simpler version for 3d proposal generation based on 2d bbox and viewpoint classification
- This is also used in depth/disparity estimation, such as in [TW-SMNet](twsm_net.md).
- **Representation matters**.
- Regress dimension and orientation first.
- - The authors tried regressing dimension and distance at the same time but found it to be highly sensitive to input errors. --> This is understandable as dim and distance are highly correlated in determining the size of the projected 2d bbox. (c.f. [depth in the wild](mono_depth_video_in_the_wild.md) to understand the coupling of estimation parameters. Sometimes an overall supervision signal is given to two tightly coupled parameters and it is not enough to get an accurate estimate for both parameters.)
+ - The authors tried regressing dimension and distance at the same time but found it to be highly sensitive to input errors. --> This is understandable as dim and distance are highly correlated in determining the size of the projected 2d bbox. (c.f. [depth in the wild](learnk.md) to understand the coupling of estimation parameters. Sometimes an overall supervision signal is given to two tightly coupled parameters and it is not enough to get an accurate estimate for both parameters.)
- Orientation of a car can be estimated fairly accurately, given ground truth (from lidar annotation). Angle errors are: 3 degrees for easy case, 6 for moderate and 8 for hard cases.
- The translational vector (center of the 3d bbox) is calculated deterministically by solving linear equations. However, the center of the 3d bbox can also be estimated fairly easily by reprojecting the 2d bbox height and center into the 3d world. See [monoPSR](monopsr.md) for a rough estimate of the 3D position.
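
A minimal numeric sketch of the two points above under a plain pinhole model (the focal length, object height and bbox height are illustrative values, not numbers from the paper): the projected bbox height ties dimension and distance together, which is why regressing both from the same 2D evidence is ill-posed, and why depth falls out once the dimension is regressed first.

```python
# Hedged sketch, not the paper's implementation: pinhole relation h_bbox ~= fy * H / Z.
def depth_from_bbox_height(fy: float, H_obj: float, h_bbox: float) -> float:
    """Recover depth Z from a regressed object height H_obj and the observed 2D bbox height."""
    return fy * H_obj / h_bbox

fy = 721.5      # focal length in pixels (illustrative, KITTI-like)
H_car = 1.53    # regressed physical height in meters (illustrative)
h_bbox = 55.0   # observed 2D bbox height in pixels (illustrative)

Z = depth_from_bbox_height(fy, H_car, h_bbox)   # ~20 m

# The coupling: an object twice as tall at twice the distance projects to the same bbox
# height, so a single 2D supervision signal cannot pin down dimension and distance jointly.
assert abs(depth_from_bbox_height(fy, 2 * H_car, h_bbox) - 2 * Z) < 1e-6
```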
**paper_notes/glnet.md** (1 addition, 1 deletion)
@@ -7,7 +7,7 @@ tl;dr: Combine monodepth with optical flow with geometric and photometric losses
#### Overall impression
The paper proposes two online refinement strategies, one finetuning the model and one finetuning the image. --> cf [Struct2Depth](struct2depth.md) and [Consistent video depth](consistent_video_depth.md).
- It also predicts intrinsics for videos in the wild. --> cf [Depth from Videos in the Wild](mono_depth_video_in_the_wild.md).
+ It also predicts intrinsics for videos in the wild. --> cf [Depth from Videos in the Wild](learnk.md).
The paper has several interesting ideas, but there are some conflicts as well. The main issue is that it uses FlowNet to handle dynamic regions but still enforces epipolar constraints on the optical flow. It also does not handle the depth of dynamic regions well.
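
A toy sketch of the two online refinement modes described above, as stated in the summary (my own illustration with a dummy network and an L1 stand-in for the photometric/geometric losses, not GLNet's code): (a) take a few gradient steps on the network weights for the test clip, or (b) keep the weights fixed and take gradient steps on the image itself.

```python
import torch

def proxy_loss(pred_depth, target):
    # Stand-in for the real photometric/geometric losses built from view synthesis.
    return torch.mean(torch.abs(pred_depth - target))

model = torch.nn.Conv2d(3, 1, 3, padding=1)   # dummy "depth network"
img = torch.rand(1, 3, 32, 32)                # test frame
target = torch.rand(1, 1, 32, 32)             # stand-in self-supervision signal

# (a) Online model refinement: finetune the weights on the test sample.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(5):
    opt.zero_grad()
    proxy_loss(model(img), target).backward()
    opt.step()

# (b) Online image refinement: freeze the weights and optimize the input instead.
for p in model.parameters():
    p.requires_grad_(False)
img_ref = img.clone().requires_grad_(True)
opt_img = torch.optim.Adam([img_ref], lr=1e-2)
for _ in range(5):
    opt_img.zero_grad()
    proxy_loss(model(img_ref), target).backward()
    opt_img.step()
```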
**paper_notes/kp3d.md** (1 addition, 1 deletion)
@@ -5,7 +5,7 @@ _March 2020_
tl;dr: Predict keypoints and depth from videos simultaneously and in an unsupervised fashion.
#### Overall impression
- This paper builds on two streams of unsupervised research on video. The first is depth estimation starting from [sfm Learner](sfm_learner.md), [depth in the wild](mono_depth_video_in_the_wild.md) and [scale-consistent sfm Learner](sc_sfm_learner.md), and the second is self-supervised keypoint learning starting from [superpoint](superpoint.md), [unsuperpoint](unsuperpoint.md) and [unsuperpoint with outlier rejection](kp2d.md).
+ This paper builds on two streams of unsupervised research on video. The first is depth estimation starting from [sfm Learner](sfm_learner.md), [depth in the wild](learnk.md) and [scale-consistent sfm Learner](sc_sfm_learner.md), and the second is self-supervised keypoint learning starting from [superpoint](superpoint.md), [unsuperpoint](unsuperpoint.md) and [unsuperpoint with outlier rejection](kp2d.md).
The two major enablers of this research are [scale-consistent sfm Learner](sc_sfm_learner.md) and [unsuperpoint](unsuperpoint.md).
**paper_notes/learnk.md** (1 addition, 1 deletion)
@@ -7,7 +7,7 @@ tl;dr: Estimate the intrinsics in addition to the extrinsics of the camera from
#### Overall impression
Same authors as [Struct2Depth](struct2depth.md). This work eliminates the assumption that the intrinsics are available, which opens up a whole lot of possibilities to learn from a wide range of videos.
- This network regresses depth, ego-motion, object motion and camera intrinsics from mono videos. --> The idea of regressing intrinsics is similar to [GLNet](glnet.md).
+ This network regresses depth, ego-motion, object motion and camera intrinsics from mono videos. Thus it is named learn-K (K for the intrinsics). --> The idea of regressing intrinsics is similar to [GLNet](glnet.md).
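
A minimal sketch of what "regressing intrinsics" can look like in practice (a hypothetical head on an assumed bottleneck feature, not the paper's architecture): predict fx, fy, cx, cy from the same feature that drives the ego-motion head and assemble the pinhole matrix K.

```python
import torch
import torch.nn as nn

class IntrinsicsHead(nn.Module):
    """Hypothetical learn-K style head: feature vector -> 3x3 pinhole intrinsic matrix."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 4)   # predicts (fx, fy, cx, cy), normalized to (0, 1)

    def forward(self, feat: torch.Tensor, width: int, height: int) -> torch.Tensor:
        fx, fy, cx, cy = torch.sigmoid(self.fc(feat)).unbind(dim=-1)
        K = torch.zeros(feat.shape[0], 3, 3, device=feat.device)
        K[:, 0, 0] = fx * width    # focal lengths scaled back to pixels
        K[:, 1, 1] = fy * height
        K[:, 0, 2] = cx * width    # principal point in pixels
        K[:, 1, 2] = cy * height
        K[:, 2, 2] = 1.0
        return K

# Usage: feed the same bottleneck feature that drives the ego-motion head (shapes assumed).
head = IntrinsicsHead()
K = head(torch.randn(2, 256), width=640, height=192)   # (2, 3, 3)
```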
**paper_notes/monodepth.md** (1 addition, 1 deletion)
@@ -9,7 +9,7 @@ This paper is one pioneering work on monocular depth estimation with self-supervision
When people talk about monocular depth estimation, they mean "monocular at inference". The system can still rely on other supervision at training, either explicit supervision from dense depth map GT or self-supervision via consistency.
- I feel that for self-supervised methods there are tons of tricks and know-how about tuning the model, cf. [google AI's depth in the wild paper](mono_depth_video_in_the_wild.md).
+ I feel that for self-supervised methods there are tons of tricks and know-how about tuning the model, cf. [google AI's depth in the wild paper](learnk.md).
Monodepth requires synchronized and rectified image pairs. It also does not handle occlusion in training. It is superseded by [monodepth2](monodepth2.md), which focuses on depth estimation from monocular video.
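
For reference, a rough sketch of the left-right consistency idea from the title (a simplified nearest-neighbour version with an assumed sign convention, not the paper's differentiable bilinear-sampling implementation): the disparity predicted for the left image should agree with the right-image disparity sampled at the disparity-shifted column.

```python
import numpy as np

def lr_consistency(disp_l: np.ndarray, disp_r: np.ndarray) -> float:
    """disp_l, disp_r: (H, W) disparity maps in pixels for the left and right views."""
    H, W = disp_l.shape
    cols = np.arange(W)[None, :].repeat(H, axis=0)
    rows = np.arange(H)[:, None].repeat(W, axis=1)
    # column in the right image that each left pixel maps to (nearest-neighbour, clamped)
    shifted = np.clip(np.round(cols - disp_l).astype(int), 0, W - 1)
    return float(np.mean(np.abs(disp_l - disp_r[rows, shifted])))

# Toy usage with random maps; in training this term is added to the photometric loss.
print(lr_consistency(np.random.rand(4, 8) * 3, np.random.rand(4, 8) * 3))
```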
**paper_notes/oft.md** (1 addition, 1 deletion)
@@ -22,7 +22,7 @@ The network does not require explicit info about intrinsics, but rather learns t
#### Technical details
- Replace batchnorm with groupnorm.
- - Data augmentation and adjusting intrinsic parameters accordingly (including cx, cy, fx and fy, c.f. [depth in the wild](mono_depth_video_in_the_wild.md) paper).
+ - Data augmentation and adjusting intrinsic parameters accordingly (including cx, cy, fx and fy, c.f. [depth in the wild](learnk.md) paper).
- Sum loss instead of averaging to avoid biasing toward examples with few object instances.
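
A small sketch of what "adjusting intrinsic parameters accordingly" typically means for scale-and-crop augmentation (standard pinhole bookkeeping, my own illustration rather than OFT's code; the K values are made-up, KITTI-like numbers):

```python
import numpy as np

def adjust_intrinsics(K: np.ndarray, scale: float, crop_x0: float, crop_y0: float) -> np.ndarray:
    """Return the intrinsic matrix for an image resized by `scale`, then cropped at (crop_x0, crop_y0)."""
    K_new = K.copy()
    K_new[0, 0] *= scale                      # fx scales with the image
    K_new[1, 1] *= scale                      # fy scales with the image
    K_new[0, 2] = K[0, 2] * scale - crop_x0   # cx scales, then shifts by the crop offset
    K_new[1, 2] = K[1, 2] * scale - crop_y0   # cy scales, then shifts by the crop offset
    return K_new

K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
K_aug = adjust_intrinsics(K, scale=0.8, crop_x0=100.0, crop_y0=20.0)
```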