- [An intriguing failing of convolutional neural networks and the CoordConv solution](https://arxiv.org/abs/1807.03247) [[Notes](paper_notes/coord_conv.md)] <kbd>NIPS 2018</kbd>
## 2019-08 (0)
- [Detection in Crowded Scenes: One Proposal, Multiple Predictions](https://arxiv.org/abs/2003.09163) <kbd>CVPR 2020 oral</kbd> [Megvii]
- [ContFuse: Deep Continuous Fusion for Multi-Sensor 3D Object Detection](http://openaccess.thecvf.com/content_ECCV_2018/papers/Ming_Liang_Deep_Continuous_Fusion_ECCV_2018_paper.pdf) [[Notes](paper_notes/contfuse.md)] <kbd>ECCV 2018</kbd> [Uber ATG, sensor fusion, BEV]
- [Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net](http://openaccess.thecvf.com/content_cvpr_2018/papers/Luo_Fast_and_Furious_CVPR_2018_paper.pdf) [[Notes](paper_notes/faf.md)] <kbd>CVPR 2018 oral</kbd> [lidar only, perception and prediction]
- [Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras](https://arxiv.org/pdf/1904.04998.pdf) [[Notes](paper_notes/mono_depth_video_in_the_wild.md)] <kbd>ICCV 2019</kbd> [monocular depth estimation, intrinsic estimation, SOTA]
- [monodepth: Unsupervised Monocular Depth Estimation with Left-Right Consistency](https://arxiv.org/abs/1609.03677) [[Notes](paper_notes/monodepth.md)] <kbd>CVPR 2017 oral</kbd> (monocular depth estimation, stereo for training)
- [Struct2depth: Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos](https://arxiv.org/abs/1811.06152) [[Notes](paper_notes/struct2depth.md)] <kbd>AAAI 2019</kbd> [monocular depth estimation, estimating movement of dynamic object, infinite depth problem, online finetune]
- [Unsupervised Learning of Geometry with Edge-aware Depth-Normal Consistency](https://arxiv.org/pdf/1711.03665.pdf) [[Notes](paper_notes/edge_aware_depth_normal.md)] <kbd>AAAI 2018</kbd> (monocular depth estimation, static assumption, surface normal)
**paper_notes/deep3dbox.md** (1 addition, 1 deletion)
@@ -19,7 +19,7 @@ A simpler version for 3d proposal generation based on 2d bbox and viewpoint classification
- This is also used in depth/disparity estimation, such as in [TW-SMNet](twsm_net.md).
- **Representation matters**.
- Regress dimension and orientation first.
- - The authors tried regressing dimension and distance at the same time but found it to be highly sensitive to input errors. --> This is understandable as dim and distance are highly correlated in determining the size of the projected 2d bbox. (c.f. [depth in the wild](mono_depth_video_in_the_wild.md) to understand the coupling of estimation parameters. Sometimes an overall supervision signal is given to two tightly coupled parameters and it is not enough to get an accurate estimate for both parameters.)
+ - The authors tried regressing dimension and distance at the same time but found it to be highly sensitive to input errors. --> This is understandable as dim and distance are highly correlated in determining the size of the projected 2d bbox. (c.f. [depth in the wild](learnk.md) to understand the coupling of estimation parameters. Sometimes an overall supervision signal is given to two tightly coupled parameters and it is not enough to get an accurate estimate for both parameters.)
- Orientation of a car can be estimated fairly accurately, given ground truth (from lidar annotation). Angle errors are: 3 degrees for easy case, 6 for moderate and 8 for hard cases.
- The translational vector (center of the 3d bbox) is calculated deterministically by solving linear equations. However, the center of the 3d bbox can also be estimated fairly easily by reprojecting the 2d bbox height and center into the 3d world. See [monoPSR](monopsr.md) for a rough estimate of the 3D position.
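
A minimal numeric sketch of the two points above under a plain pinhole model (the focal length, object height and bbox height are illustrative values, not numbers from the paper): the projected bbox height ties dimension and distance together, which is why regressing both from the same 2D evidence is ill-posed, and why depth falls out once the dimension is regressed first.

```python
# Hedged sketch, not the paper's implementation: pinhole relation h_bbox ~= fy * H / Z.
def depth_from_bbox_height(fy: float, H_obj: float, h_bbox: float) -> float:
    """Recover depth Z from a regressed object height H_obj and the observed 2D bbox height."""
    return fy * H_obj / h_bbox

fy = 721.5      # focal length in pixels (illustrative, KITTI-like)
H_car = 1.53    # regressed physical height in meters (illustrative)
h_bbox = 55.0   # observed 2D bbox height in pixels (illustrative)

Z = depth_from_bbox_height(fy, H_car, h_bbox)   # ~20 m

# The coupling: an object twice as tall at twice the distance projects to the same bbox
# height, so a single 2D supervision signal cannot pin down dimension and distance jointly.
assert abs(depth_from_bbox_height(fy, 2 * H_car, h_bbox) - 2 * Z) < 1e-6
```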
**paper_notes/glnet.md** (1 addition, 1 deletion)
@@ -7,7 +7,7 @@ tl;dr: Combine monodepth with optical flow with geometric and photometric losses
#### Overall impression
The paper proposes two online refinement strategies, one finetuning the model and one finetuning the image. --> cf [Struct2Depth](struct2depth.md) and [Consistent video depth](consistent_video_depth.md).
- It also predicts intrinsics for videos in the wild. --> cf [Depth from Videos in the Wild](mono_depth_video_in_the_wild.md).
+ It also predicts intrinsics for videos in the wild. --> cf [Depth from Videos in the Wild](learnk.md).
The paper has several interesting ideas, but there are some conflicts as well. The main issue is that it uses FlowNet to handle dynamic regions but still enforces epipolar constraints on the optical flow. It also does not handle the depth of dynamic regions well.
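
A toy sketch of the two online refinement modes described above, as stated in the summary (my own illustration with a dummy network and an L1 stand-in for the photometric/geometric losses, not GLNet's code): (a) take a few gradient steps on the network weights for the test clip, or (b) keep the weights fixed and take gradient steps on the image itself.

```python
import torch

def proxy_loss(pred_depth, target):
    # Stand-in for the real photometric/geometric losses built from view synthesis.
    return torch.mean(torch.abs(pred_depth - target))

model = torch.nn.Conv2d(3, 1, 3, padding=1)   # dummy "depth network"
img = torch.rand(1, 3, 32, 32)                # test frame
target = torch.rand(1, 1, 32, 32)             # stand-in self-supervision signal

# (a) Online model refinement: finetune the weights on the test sample.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(5):
    opt.zero_grad()
    proxy_loss(model(img), target).backward()
    opt.step()

# (b) Online image refinement: freeze the weights and optimize the input instead.
for p in model.parameters():
    p.requires_grad_(False)
img_ref = img.clone().requires_grad_(True)
opt_img = torch.optim.Adam([img_ref], lr=1e-2)
for _ in range(5):
    opt_img.zero_grad()
    proxy_loss(model(img_ref), target).backward()
    opt_img.step()
```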
**paper_notes/kp3d.md** (1 addition, 1 deletion)
@@ -5,7 +5,7 @@ _March 2020_
tl;dr: Predict keypoints and depth from videos simultaneously and in an unsupervised fashion.
#### Overall impression
- This paper builds on two streams of unsupervised research on video. The first is depth estimation starting from [sfm Learner](sfm_learner.md), [depth in the wild](mono_depth_video_in_the_wild.md) and [scale-consistent sfm Learner](sc_sfm_learner.md), and the second is self-supervised keypoint learning starting from [superpoint](superpoint.md), [unsuperpoint](unsuperpoint.md) and [unsuperpoint with outlier rejection](kp2d.md).
+ This paper builds on two streams of unsupervised research on video. The first is depth estimation starting from [sfm Learner](sfm_learner.md), [depth in the wild](learnk.md) and [scale-consistent sfm Learner](sc_sfm_learner.md), and the second is self-supervised keypoint learning starting from [superpoint](superpoint.md), [unsuperpoint](unsuperpoint.md) and [unsuperpoint with outlier rejection](kp2d.md).
The two major enablers of this research are [scale-consistent sfm Learner](sc_sfm_learner.md) and [unsuperpoint](unsuperpoint.md).
**paper_notes/learnk.md** (1 addition, 1 deletion)
@@ -7,7 +7,7 @@ tl;dr: Estimate the intrinsics in addition to the extrinsics of the camera from
#### Overall impression
Same authors as [Struct2Depth](struct2depth.md). This work eliminates the assumption that the intrinsics are available, which opens up a whole lot of possibilities to learn from a wide range of videos.
- This network regresses depth, ego-motion, object motion and camera intrinsics from mono videos. --> The idea of regressing intrinsics is similar to [GLNet](glnet.md).
+ This network regresses depth, ego-motion, object motion and camera intrinsics from mono videos. Thus it is named learn-K (K for the intrinsics). --> The idea of regressing intrinsics is similar to [GLNet](glnet.md).
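
A minimal sketch of what "regressing intrinsics" can look like in practice (a hypothetical head on an assumed bottleneck feature, not the paper's architecture): predict fx, fy, cx, cy from the same feature that drives the ego-motion head and assemble the pinhole matrix K.

```python
import torch
import torch.nn as nn

class IntrinsicsHead(nn.Module):
    """Hypothetical learn-K style head: feature vector -> 3x3 pinhole intrinsic matrix."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 4)   # predicts (fx, fy, cx, cy), normalized to (0, 1)

    def forward(self, feat: torch.Tensor, width: int, height: int) -> torch.Tensor:
        fx, fy, cx, cy = torch.sigmoid(self.fc(feat)).unbind(dim=-1)
        K = torch.zeros(feat.shape[0], 3, 3, device=feat.device)
        K[:, 0, 0] = fx * width    # focal lengths scaled back to pixels
        K[:, 1, 1] = fy * height
        K[:, 0, 2] = cx * width    # principal point in pixels
        K[:, 1, 2] = cy * height
        K[:, 2, 2] = 1.0
        return K

# Usage: feed the same bottleneck feature that drives the ego-motion head (shapes assumed).
head = IntrinsicsHead()
K = head(torch.randn(2, 256), width=640, height=192)   # (2, 3, 3)
```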
**paper_notes/monodepth.md** (1 addition, 1 deletion)
@@ -9,7 +9,7 @@ This paper is one pioneering work on monocular depth estimation with self-supervision
When people talk about monocular depth estimation, they mean "monocular at inference". The system can still rely on other supervision at training, either explicit supervision from dense depth map GT or self-supervision via consistency.
- I feel that for self-supervised methods there are tons of tricks and know-how about tuning the model, cf. [google AI's depth in the wild paper](mono_depth_video_in_the_wild.md).
+ I feel that for self-supervised methods there are tons of tricks and know-how about tuning the model, cf. [google AI's depth in the wild paper](learnk.md).
Monodepth requires synchronized and rectified image pairs. It also does not handle occlusion in training. It is superseded by [monodepth2](monodepth2.md), which focuses on depth estimation from monocular video.
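
For reference, a rough sketch of the left-right consistency idea from the title (a simplified nearest-neighbour version with an assumed sign convention, not the paper's differentiable bilinear-sampling implementation): the disparity predicted for the left image should agree with the right-image disparity sampled at the disparity-shifted column.

```python
import numpy as np

def lr_consistency(disp_l: np.ndarray, disp_r: np.ndarray) -> float:
    """disp_l, disp_r: (H, W) disparity maps in pixels for the left and right views."""
    H, W = disp_l.shape
    cols = np.arange(W)[None, :].repeat(H, axis=0)
    rows = np.arange(H)[:, None].repeat(W, axis=1)
    # column in the right image that each left pixel maps to (nearest-neighbour, clamped)
    shifted = np.clip(np.round(cols - disp_l).astype(int), 0, W - 1)
    return float(np.mean(np.abs(disp_l - disp_r[rows, shifted])))

# Toy usage with random maps; in training this term is added to the photometric loss.
print(lr_consistency(np.random.rand(4, 8) * 3, np.random.rand(4, 8) * 3))
```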
**paper_notes/oft.md** (1 addition, 1 deletion)
@@ -22,7 +22,7 @@ The network does not require explicit info about intrinsics, but rather learns t
#### Technical details
- Replace batchnorm with groupnorm.
- - Data augmentation and adjusting intrinsic parameters accordingly (including cx, cy, fx and fy, c.f. [depth in the wild](mono_depth_video_in_the_wild.md) paper).
+ - Data augmentation and adjusting intrinsic parameters accordingly (including cx, cy, fx and fy, c.f. [depth in the wild](learnk.md) paper).
- Sum loss instead of averaging to avoid biasing toward examples with few object instances.
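
A small sketch of what "adjusting intrinsic parameters accordingly" typically means for scale-and-crop augmentation (standard pinhole bookkeeping, my own illustration rather than OFT's code; the K values are made-up, KITTI-like numbers):

```python
import numpy as np

def adjust_intrinsics(K: np.ndarray, scale: float, crop_x0: float, crop_y0: float) -> np.ndarray:
    """Return the intrinsic matrix for an image resized by `scale`, then cropped at (crop_x0, crop_y0)."""
    K_new = K.copy()
    K_new[0, 0] *= scale                      # fx scales with the image
    K_new[1, 1] *= scale                      # fy scales with the image
    K_new[0, 2] = K[0, 2] * scale - crop_x0   # cx scales, then shifts by the crop offset
    K_new[1, 2] = K[1, 2] * scale - crop_y0   # cy scales, then shifts by the crop offset
    return K_new

K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
K_aug = adjust_intrinsics(K, scale=0.8, crop_x0=100.0, crop_y0=20.0)
```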