The paper proposes two approaches for distance estimation: one is based on DORN with a better discretization strategy, and the other is based on breaking down the distance range into two large bins, one for nearby objects and the other for faraway ones.
It is a [CenterNet](centernet.md)-based approach, very similar to [SMOKE](smoke.md) and [KM3D-Net](km3d_net.md).
Overall this paper is a very solid contribution to monocular 3D object detection. Nothing fancy, but concrete experiments and small design tweaks.
A quick summary of [CenterNet](centernet.md) monocular 3D object detection.
- [CenterNet](centernet.md) predicts 2D bbox center and uses it as 3D bbox center.
- [SMOKE](smoke.md) predicts projected 3D bbox center.
- [KM3D-Net](km3d_net.md) and [Center3D](center3d.md) predict 2D bbox center and offset from projected 3D bbox center.
#### Key ideas
- The 2D center and the projected 3D center are different.
- The gap decreases for faraway objects that appear in the central area of the image plane.
- The gap becomes significant for objects that are close to the camera or on the image boundary.
- LID (linear increasing discretization)
- The SID (spacing-increasing discretization) approach used by [DORN](dorn.md) gives unnecessarily dense bins in the nearby range.
- The length of the bins increases linearly in LID (and log-wise in SID).
- [DORN](dorn.md) counts the number of bins with probability > 0.5 as the ordinal label and uses the median value of that bin as the estimated depth in meters.
- LID also uses regression to predict the residual value. --> This is very important to ensure good depth estimation, as shown in the ablation study (see the first sketch after this list).
- DepJoint: piecewise depth prediction (a decoding sketch follows this list).
- Breaking the distance into two bins (either overlapping or back-to-back bins)
- Eigen's exponential transformation of distance: $\Phi(d) = e^{-d}$.
- This has very good accuracy in the close range, but not in the far range.
- Augment the prediction for faraway objects by also predicting $d' = d_{max} - d$. Then during inference, use the weighted combination of the two predictions.
- The bin breakdown is controlled by two hyperparameters. The bins can be overlapping or back-to-back.
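To make the discretization concrete, here is a minimal NumPy sketch of LID bin edges (bin widths growing linearly), DORN-style SID edges for comparison, and the ordinal decoding described above (count the bins with probability > 0.5, take the median of that bin, add the regressed residual). The exact edge formula, the residual parameterization, and the function names are my assumptions for illustration, not the paper's exact equations.

```python
import numpy as np

def lid_edges(d_min, d_max, num_bins):
    # LID: bin widths grow linearly as delta, 2*delta, ..., num_bins*delta,
    # so the k-th edge is d_min + delta * k*(k+1)/2 (assumed parameterization).
    delta = 2.0 * (d_max - d_min) / (num_bins * (num_bins + 1))
    k = np.arange(num_bins + 1)
    return d_min + delta * k * (k + 1) / 2.0

def sid_edges(d_min, d_max, num_bins):
    # DORN's SID: log-spaced edges, i.e. bin widths grow exponentially.
    k = np.arange(num_bins + 1)
    return np.exp(np.log(d_min) + np.log(d_max / d_min) * k / num_bins)

def decode_ordinal_depth(bin_probs, edges, residual=0.0):
    # Ordinal label = number of bins predicted with probability > 0.5;
    # depth = median (midpoint) of that bin, plus the regressed residual.
    k = int((bin_probs > 0.5).sum())
    k = max(1, min(k, len(edges) - 1))
    median = 0.5 * (edges[k - 1] + edges[k])
    return median + residual

edges = lid_edges(d_min=1.0, d_max=80.0, num_bins=80)
print(np.diff(edges)[:3])  # bin widths increase linearly
print(decode_ordinal_depth(np.linspace(1, 0, 80), edges, residual=0.2))
```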
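And a sketch of the DepJoint decoding under the same caveat: the two heads predict Eigen's transform of $d$ and of $d' = d_{max} - d$ respectively, while the value of `D_MAX`, the clipping, and the confidence-weighted fusion shown here are assumptions about how the two predictions are combined.

```python
import numpy as np

D_MAX = 90.0  # assumed maximum distance of the far bin (a tuned hyperparameter)

def decode_depjoint(phi_near, phi_far, w_near, w_far):
    # Near head predicts Eigen's transform phi_near ~ exp(-d): accurate up close.
    # Far head predicts phi_far ~ exp(-(D_MAX - d)): accurate for faraway objects.
    d_near = -np.log(np.clip(phi_near, 1e-6, None))
    d_far = D_MAX + np.log(np.clip(phi_far, 1e-6, None))
    # Weighted combination of the two predictions (weights assumed to be the
    # per-bin confidences, normalized to sum to one).
    w_sum = max(w_near + w_far, 1e-6)
    return (w_near * d_near + w_far * d_far) / w_sum

# For a faraway object the near head saturates (exp(-d) ~ 0), so the fusion
# should put almost all weight on the far head:
print(decode_depjoint(phi_near=1e-6, phi_far=np.exp(-(D_MAX - 70.0)),
                      w_near=0.02, w_far=0.98))   # ~ 68.9 m
```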
#### Technical details
- RA (reference area) solves the issue of lack of supervision for attribute prediction. Not only does the GT center point contribute to the attribute prediction losses, but a dilated support region is used to predict all the attributes. --> This is inspired by the support region in [SS3D](ss3d.md).
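A minimal sketch of how such a reference area could be built as a supervision mask on the output feature map. The square dilation, the radius, and the name `reference_area_mask` are assumptions for illustration; the paper defines the region relative to the GT box.

```python
import numpy as np

def reference_area_mask(feat_h, feat_w, centers, radius=2):
    # Instead of supervising the attribute heads only at the single GT center
    # pixel, mark a dilated square region around each center so that all
    # pixels inside it contribute to the attribute regression losses.
    mask = np.zeros((feat_h, feat_w), dtype=bool)
    for cx, cy in centers:
        x0, x1 = max(cx - radius, 0), min(cx + radius + 1, feat_w)
        y0, y1 = max(cy - radius, 0), min(cy + radius + 1, feat_h)
        mask[y0:y1, x0:x1] = True
    return mask

# e.g. average the attribute loss over mask.sum() positions instead of one pixel
mask = reference_area_mask(96, 320, centers=[(45, 60), (200, 30)])
```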
#### Notes
- Questions and notes on how to improve/revise the current work
`paper_notes/centernet.md` (6 additions & 0 deletions)
[FCOS](fcos.md) regresses distances to the four edges, while [CenterNet](centernet.md) only regresses width and height. The [FCOS](fcos.md) formulation is more general as it can handle amodal bbox cases (the object center may not be the center of the bbox).
A quick summary of [CenterNet](centernet.md) monocular 3D object detection.
- [CenterNet](centernet.md) predicts 2D bbox center and uses it as 3D bbox center.
- [SMOKE](smoke.md) predicts projected 3D bbox center.
- [KM3D-Net](km3d_net.md) and [Center3D](center3d.md) predict 2D bbox center and offset from projected 3D bbox center.
#### Key ideas
- Other properties, such as object size, dimension, 3D extent, orientation, and pose, are regressed directly from image features at the center location.
`paper_notes/smoke.md` (8 additions & 1 deletion)

tl;dr: Mono3D based on [CenterNet](centernet.md) and [monoDIS](monodis.md).
#### Overall impression
The paper is a solid engineering paper as an extension to [CenterNet](centernet.md), similar to [MonoPair](monopair.md). It does not have a lot of new tricks. It is similar to the popular solutions to the [Kaggle mono3D competition](https://www.kaggle.com/c/pku-autonomous-driving).
A quick summary of [CenterNet](centernet.md) monocular 3D object detection.
- [CenterNet](centernet.md) predicts 2D bbox center and uses it as 3D bbox center.
- [SMOKE](smoke.md) predicts projected 3D bbox center.
- [KM3D-Net](km3d_net.md) and [Center3D](center3d.md) predict 2D bbox center and offset from projected 3D bbox center.
#### Key ideas
- SMOKE eliminates 2D object detection altogether. Instead of predicting the 2D bbox center and the 3D/2D center offset, SMOKE predicts the 3D center directly. --> This may cause issues for heavily truncated cars, as the 3D center may not be inside the image.
- Rather than regressing the 7 DoF variables with separate loss functions, SMOKE transforms the variables into the 8-corner representation of 3D boxes and regresses them with **a unified loss function**. This is a nice way to implicitly weigh the loss terms. (cf. [To learn or not to learn](to_learn_or_not.md), which regresses an essential matrix.)
- The **disentangling loss** from [monoDIS](monodis.md) groups the 8 parameters into 3 groups. For each group, use the prediction from that group and the GT from the other groups to lift to 3D and calculate the overall loss. The final loss is an unweighted average of the losses from the different groups (see the sketch after this list).
- Classification
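To make the disentangling loss above concrete, here is a minimal PyTorch sketch. The grouping into location/dimension/orientation and the `lift_to_corners` helper are assumptions for illustration; the point is that each group is evaluated with the GT values of the other groups and the per-group losses are averaged without extra weights.

```python
import torch
import torch.nn.functional as F

def disentangled_corner_loss(pred, gt, lift_to_corners):
    # pred, gt: dicts with 'loc', 'dim', 'rot' tensors for one object.
    # lift_to_corners: maps (loc, dim, rot) to the (8, 3) box corners.
    gt_corners = lift_to_corners(gt['loc'], gt['dim'], gt['rot'])
    losses = []
    for group in ('loc', 'dim', 'rot'):
        # Use the prediction for this group and GT for the other two groups.
        mixed = {k: (pred[k] if k == group else gt[k]) for k in ('loc', 'dim', 'rot')}
        corners = lift_to_corners(mixed['loc'], mixed['dim'], mixed['rot'])
        losses.append(F.l1_loss(corners, gt_corners))
    # Unweighted average of the per-group corner losses.
    return sum(losses) / len(losses)
```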
#### Notes
- [Code on github](https://github.com/lzccccc/SMOKE)
- Need to implement the 2D center prediction and the offset between the 2D and 3D centers to recover heavily truncated 3D bboxes. This method can be extended to other scenarios where the predicted location goes out of an ROI. See [KM3D-Net](km3d_net.md) and [Center3D](center3d.md).
`paper_notes/ss3d.md` (1 addition & 1 deletion)
- The 26 numbers can also be trained to fit 3D IoU, but the 26 numbers need to be fitted to a valid 3D bbox online. This requires some complex manipulation of gradient.
#### Technical details
- All pixels in the support region (central 20% of the bbox) are responsible for detecting the bounding box sizes. Thus NMS is needed to find the local optimum. The 26 numbers (from 26 channels, most likely) associated with the local optimum point are used to predict the 3D box. --> This is also used in [Center3D](center3d.md).
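A sketch of the readout this implies, assuming a CenterNet-style 3x3 max-pool NMS over the support-score map and a 26-channel parameter map; the threshold, tensor shapes, and function name are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def extract_peak_params(score_map, param_map, thresh=0.3, topk=50):
    # score_map: (1, H, W) support-region scores; param_map: (26, H, W).
    # Keep only local maxima of the score map (3x3 max-pool NMS), then
    # gather the 26 box parameters at each surviving peak.
    pooled = F.max_pool2d(score_map.unsqueeze(0), 3, stride=1, padding=1)[0]
    peaks = (score_map == pooled) & (score_map > thresh)
    ys, xs = torch.nonzero(peaks[0], as_tuple=True)
    order = score_map[0, ys, xs].argsort(descending=True)[:topk]
    return param_map[:, ys[order], xs[order]].T   # (num_peaks, 26)

# usage with dummy network outputs
params = extract_peak_params(torch.rand(1, 96, 320), torch.rand(26, 96, 320))
```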
#### Notes
- Questions and notes on how to improve/revise the current work