
Commit 36e2156
committed: Update notes
1 parent afa1c08 commit 36e2156

File tree: 7 files changed, +13 -7 lines


paper_notes/mono3d++.md

Lines changed: 1 addition & 1 deletion

@@ -21,4 +21,4 @@ The paper seems to use 3D depth off the shelf but it was not described in detail
#### Notes
- Where does the label come from?
- The wireframe model is fragile and cannot model under-represented cases.
--
+- von Mises distribution: circular Gaussian distribution
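The "circular Gaussian" note can be made concrete. A minimal sketch assuming SciPy is available; `mu` and `kappa` are illustrative values, not taken from the paper:

```python
import numpy as np
from scipy.stats import vonmises

# von Mises: a "circular Gaussian" over angles theta in [-pi, pi).
# kappa plays the role of 1/variance: large kappa -> tightly peaked at mu.
mu, kappa = 0.5, 4.0
rv = vonmises(kappa, loc=mu)

# The density is periodic: pdf at theta equals pdf at theta - 2*pi.
theta = 1.0
assert np.isclose(rv.pdf(theta), rv.pdf(theta - 2 * np.pi))

# As kappa -> 0 it approaches the uniform distribution on the circle.
flat = vonmises(1e-8, loc=mu)
print(flat.pdf(0.0))  # ~ 1 / (2*pi) ≈ 0.159
```

This periodicity is why it is a natural likelihood for orientation regression, where angles wrap around.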

paper_notes/mono_3dod_2d3d_constraints.md

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
_October 2019_

-tl;dr: Summary of the main idea.
+tl;dr: Summary of the main idea. The review is published at [towardsdatascience](https://towardsdatascience.com/geometric-reasoning-based-cuboid-generation-in-monocular-3d-object-detection-5ee2996270d1?source=friends_link&sk=ebead4b51a3f75476d308997dd88dd75).

### Deep 3D Box
from [Deep3DBox](https://arxiv.org/pdf/1612.00496.pdf) and its [Supplementary material](https://cs.gmu.edu/~amousavi/papers/3D-Deepbox-Supplementary.pdf), and [review in Chinese](https://blog.csdn.net/qq_29462849/article/details/91314777)
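Deep3DBox's core geometric assumption is that the 2D box tightly bounds the projected 3D cuboid. A toy forward-direction sketch of that constraint (intrinsics, dimensions, and pose are made-up values; the paper actually inverts this relation to solve for translation given orientation and dimensions):

```python
import numpy as np

# Tight-fit assumption: the 2D box is exactly the bounding box of the
# projected 3D cuboid corners. Toy axis-aligned example, made-up numbers.
K = np.array([[720.0,   0.0, 640.0],
              [  0.0, 720.0, 360.0],
              [  0.0,   0.0,   1.0]])

# 8 corners of a box with (width, height, length) = (2, 1.5, 4) m
# centered at camera-frame coordinates (x, y, z) = (2, 1, 20).
w, h, l = 2.0, 1.5, 4.0
center = np.array([2.0, 1.0, 20.0])
signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                               for sy in (-1, 1)
                               for sz in (-1, 1)])
corners = center + 0.5 * signs * np.array([w, h, l])

uv = (K @ corners.T).T
uv = uv[:, :2] / uv[:, 2:3]  # perspective divide

# The tight 2D box is the min/max over the projected corners.
u_min, v_min = uv.min(axis=0)
u_max, v_max = uv.max(axis=0)
print(u_min < u_max and v_min < v_max)  # True
```

With a rotated (non-axis-aligned) box, each 2D box edge touches at least one projected 3D corner, which yields the constraints Deep3DBox solves for translation.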

paper_notes/monogrnet.md

Lines changed: 3 additions & 1 deletion

@@ -5,7 +5,9 @@ _August 2019_
tl;dr: Use the same network to estimate instance depth, 2D and 3D bbox.

#### Overall impression
-The authors raise a critical issue with current depth estimation: the evaluation metrics are not focused on instance-level depth estimation. That means all SOTA methods are sub-optimal in terms of estimating instance-level depth. This point echoes mine that the pixel-wise depth map needs finetuning for 3D object detection, as opposed to freezing it as done in [pseudo-lidar end2end](pseudo_lidar_e2e.md).
+The authors raise a critical issue with current depth estimation: the evaluation metrics are not focused on instance-level depth estimation. That means all SOTA methods are sub-optimal in terms of estimating instance-level depth. This point echoes mine that the pixel-wise depth map needs finetuning for 3D object detection, as opposed to freezing it as done in [pseudo-lidar end2end](pseudo_lidar_e2e.md). This point is further elaborated in [ForeSeE](foresee_mono3dod.md), which separates FG and BG depth estimation.
+
+It also mentions that pixel-wise depth is too expensive for mono 3DOD, and instance depth should be enough. --> similar to [TLNet](tlnet.md).

The depth estimation is instance-based sparse depth info. --> There should be a way to marry depth estimation and 3D object detection together. Or is this sparse depth info already enough? This is the first tunable depth estimation in almost all mono 3DOD research. Most just use pretrained depth estimation off the shelf.
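As a rough illustration of why instance-level depth can suffice for 3DOD: under a pinhole model, an object's depth follows from its apparent size alone. A sketch with made-up numbers, not the paper's actual method:

```python
# Pinhole-camera sketch: instance depth from apparent object height,
# z = f * H_real / h_pixels. All values below are illustrative.
def instance_depth(f_pixels: float, real_height_m: float, bbox_height_px: float) -> float:
    """Depth (m) of an object from its 2D bbox height and known real height."""
    return f_pixels * real_height_m / bbox_height_px

# A ~1.5 m tall car whose 2D box is 75 px tall, with focal length 720 px:
z = instance_depth(720.0, 1.5, 75.0)
print(z)  # 14.4 (meters)
```

One such scalar per detection is far cheaper than a dense per-pixel depth map, which is the paper's point.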

paper_notes/monogrnet_russian.md

Lines changed: 3 additions & 3 deletions

@@ -5,9 +5,9 @@ _October 2019_
tl;dr: Regress keypoints in 2D images and use a 3D CAD model to infer depth.

#### Overall impression
-The training is based on 3D CAD models with minimal keypoint annotation. This is valuable as it saves much annotation effort on 2D images, which is inefficient and inaccurate. It only assumes that the intrinsics are given.
+The training is based on 3D CAD models with minimal keypoint annotation. This is valuable as it saves much annotation effort on 2D images, which is inefficient and inaccurate. It also seems to use the semi-automatic way to annotate 2D keypoints as in [deep MANTA](deep_manta.md).

-It is related to [deepMANTA](deep_manta.md) in that it relies on keypoint regression for monocular 3DOD. It is inspired by [MonoGRNet](monogrnet.md) in that it uses simple geometric reasoning for mono 3DOD. The idea of using keypoints to estimate depth can also be found in [GS3D](gs3d.md).
+It is related to [deepMANTA](deep_manta.md) in that it relies on keypoint regression for monocular 3DOD. The idea of using keypoints to estimate depth can also be found in [GS3D](gs3d.md). It is not actually that related to MonoGRNet.

It follows the mono 3DOD tradition of regressing local yaw and dimension offset from image patches and inferring depth from these results.

@@ -20,7 +20,7 @@ It follows the Mono3DOD tradition that regresses local yaw and dimension offset
- CAD and dimension model: CAD cls + dimension offset.
- Scale according to keypoint distance. This idea is valid as the dimensions or keypoint distances do not vary much.
- Regress the local orientation, then convert to global orientation.
-- Reprojection consistency loss: this would need accurate **extrinsic** information. We could get this from localization, but this is better left to sensor fusion to fuse the info. --> But sensor fusion is better for sanity-checking the consistency of different methods. Or DL could use nominal extrinsic information to check the consistency.
+- Reprojection consistency loss to make the results from multiple heads consistent (e.g., 2D-3D tight constraints, 2D keypoint locations).

#### Technical details
- Dense depth estimation may be redundant in the context of 3D object detection. [MonoGRNet](monogrnet.md) only regresses instance-level depth. This paper focuses on salient features (keypoints).
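The local-to-global orientation conversion mentioned above is the standard ray-angle relation; a minimal sketch (function and variable names are mine, not the paper's):

```python
import math

# Global (ego-frame) yaw = local (allocentric/observation) yaw + ray angle.
# The ray angle is the bearing of the object center from the camera,
# atan2(x, z) for an object at lateral offset x and depth z.
def global_yaw(local_yaw: float, x: float, z: float) -> float:
    return local_yaw + math.atan2(x, z)

# An object straight ahead (x = 0): local and global yaw coincide.
print(global_yaw(0.3, 0.0, 10.0))  # 0.3
```

Networks regress the local yaw because it is what the image patch actually encodes: the patch looks identical for any object with the same local yaw, regardless of where it sits in the image.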

paper_notes/review_descriptors.md

Lines changed: 3 additions & 0 deletions

@@ -28,4 +28,7 @@ How to use multiple features per image? Keypoint matching or bag-of-visual-words
![](https://docs.opencv.org/3.4/fast_speedtest.jpg)
- [openCV example](https://docs.opencv.org/3.1.0/df/d0c/tutorial_py_fast.html#gsc.tab=0)

+#### Harris
+- Quite fast (although slower than FAST), more accurate
+
#### SIFT

paper_notes/review_mono_3dod.md

Lines changed: 1 addition & 0 deletions

@@ -4,6 +4,7 @@ _October 2019_
- Update: 10/24/2019, initial creation of table
- Update: 10/28/2019, added centerNet (from UT Austin) and mono 3d tracking (from DeepDrive)
+- Update: 11/25/2019, blog post published to [towardsdatascience](https://towardsdatascience.com/monocular-3d-object-detection-in-autonomous-driving-2476a3c7f57e?source=friends_link&sk=160d236be1881b6ee1b431a943666fdb) and Google [spreadsheet](https://docs.google.com/spreadsheets/d/1X_ViM-W4QbHPbJ2dHouRgkRAyzEnBS6J_9VxPEXvDM4/edit#gid=0).

| name | Time | venue | title | tl;dr | predecessor | backbone | 3d size | 3d shape | keypoint | 3d orientation | distance | 2D to 3D tight optim | required input | drawbacks | tricks and contributions | insights |
|---------------------------------------------|------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------|----------------------------------------------------|------------------------------|---------------------------------------------------------------------------------|------------------------------------------------------------------------------------|---------------------------------------------------------|-------------------------------------------------------------------------|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

paper_notes/tlnet.md

Lines changed: 1 addition & 1 deletion

@@ -5,7 +5,7 @@ _October 2019_
tl;dr: Place 3D anchors inside the frustum subtended by the 2D object detection as the mono baseline. The stereo branches reweight feature maps based on their coherence score.

#### Overall impression
-Pixel-level depth maps are too expensive for 3DOD. Object-level depth should be good enough.
+Pixel-level depth maps are too expensive for 3DOD. Object-level depth should be good enough. --> this is similar to [MonoGRNet](monogrnet.md).

The paper provides a solid mono baseline. --> this can perhaps be improved by using some heuristics such as vehicle size to overcome the dense sampling of 3D anchors.