
Commit 2bc6008

Update notes for MPDM series
1 parent da243bd commit 2bc6008

File tree

6 files changed: +37 / -31 lines changed


README.md

Lines changed: 4 additions & 2 deletions
@@ -34,7 +34,7 @@ I regularly update [my blog in Toward Data Science](https://medium.com/@patrickl
 - [Multimodal Regression](https://towardsdatascience.com/anchors-and-multi-bin-loss-for-multi-modal-target-regression-647ea1974617)
 - [Paper Reading in 2019](https://towardsdatascience.com/the-200-deep-learning-papers-i-read-in-2019-7fb7034f05f7?source=friends_link&sk=7628c5be39f876b2c05e43c13d0b48a3)
 
-## 2024-06 (7)
+## 2024-06 (8)
 - [LINGO-1: Exploring Natural Language for Autonomous Driving](https://wayve.ai/thinking/lingo-natural-language-autonomous-driving/) [[Notes](paper_notes/lingo1.md)] [Wayve, open-loop world model]
 - [LINGO-2: Driving with Natural Language](https://wayve.ai/thinking/lingo-2-driving-with-language/) [[Notes](paper_notes/lingo2.md)] [Wayve, closed-loop world model]
 - [OpenVLA: An Open-Source Vision-Language-Action Model](https://arxiv.org/abs/2406.09246) [open source RT-2]
@@ -49,6 +49,7 @@ I regularly update [my blog in Toward Data Science](https://medium.com/@patrickl
 - [trajdata: A Unified Interface to Multiple Human Trajectory Datasets](https://arxiv.org/abs/2307.13924) <kbd>NeurIPS 2023</kbd> [Marco Pavone, Nvidia]
 - [Optimal Vehicle Trajectory Planning for Static Obstacle Avoidance using Nonlinear Optimization](https://arxiv.org/abs/2307.09466) [Xpeng]
 - [Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles](https://arxiv.org/abs/1910.04586) [[Notes](paper_notes/joint_learned_bptp.md)] <kbd>IROS 2019 Oral</kbd> [Uber ATG, behavioral planning, motion planning]
+- [HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction]
 - [Enhancing End-to-End Autonomous Driving with Latent World Model](https://arxiv.org/abs/2406.08481)
 - [OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments](https://arxiv.org/abs/2312.09243) [Jiwen Lu]
 - [RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision](https://arxiv.org/abs/2309.09502) <kbd>ICRA 2024</kbd>
@@ -246,6 +247,8 @@ I regularly update [my blog in Toward Data Science](https://medium.com/@patrickl
 - [BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection](https://arxiv.org/abs/2206.10092) [[Notes](paper_notes/bevdepth.md)] [BEVNet, NuScenes SOTA, Megvii]
 - [CVT: Cross-view Transformers for real-time Map-view Semantic Segmentation](https://arxiv.org/abs/2205.02833) [[Notes](paper_notes/cvt.md)] <kbd>CVPR 2022 oral</kbd> [UTAustin, Philipp]
 - [Wayformer: Motion Forecasting via Simple & Efficient Attention Networks](https://arxiv.org/abs/2207.05844) [[Notes](paper_notes/wayformer.md)] [Behavior prediction, Waymo]
+- [HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_HiVT_Hierarchical_Vector_Transformer_for_Multi-Agent_Motion_Prediction_CVPR_2022_paper.pdf) <kbd>CVPR 2022</kbd> [Zikang Zhou, agent-centric, motion prediction]
+- [QCNet: Query-Centric Trajectory Prediction](https://openaccess.thecvf.com/content/CVPR2023/papers/Zhou_Query-Centric_Trajectory_Prediction_CVPR_2023_paper.pdf) <kbd>CVPR 2023</kbd> [Zikang Zhou, scene-centric, motion prediction]
 
 ## 2022-06 (3)
 - [BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection](https://arxiv.org/abs/2203.17054) [[Notes](paper_notes/bevdet4d.md)] [BEVNet]
@@ -1519,7 +1522,6 @@ Feature Extraction](https://arxiv.org/abs/2010.02893) [monodepth, semantics, Nav
 - [SAM: Segment Anything](https://arxiv.org/abs/2304.02643) [FAIR]
 - [GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding](https://arxiv.org/abs/2303.11325)
 - [Motion Prediction using Trajectory Sets and Self-Driving Domain Knowledge](https://arxiv.org/abs/2006.04767) [Encode Road requirement to prediction]
-- [Hivt: Hierarchical vector transformer for multi-agent motion prediction](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_HiVT_Hierarchical_Vector_Transformer_for_Multi-Agent_Motion_Prediction_CVPR_2022_paper.pdf) <kbd>CVPR 2022</kbd>
 - [Transformer Feed-Forward Layers Are Key-Value Memories](https://arxiv.org/abs/2012.14913) <kbd>EMNLP 2021</kbd>
 - [BEV-LaneDet: a Simple and Effective 3D Lane Detection Baseline](https://arxiv.org/abs/2210.06006) <kbd>CVPR 2023</kbd> [BEVNet]
 - [Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception](https://arxiv.org/abs/2303.05970) [BEVNet, megvii]

learning_pnc/pnc_notes.md

Lines changed: 7 additions & 5 deletions
@@ -2,7 +2,7 @@
 
 ## Introduction
 - The notes were taken for the [Prediction, Decision and Planning for Autonomous driving](https://www.shenlanxueyuan.com/course/671) MOOC course from Shenlan Xueyuan.
-- The lecturer is [Wenchao Ding](website: https://wenchaoding.github.io/personal/index.html), former engineer at Huawei and not AP at Fudan University.
+- The lecturer is [Wenchao Ding](https://wenchaoding.github.io/personal/index.html), a former engineer at Huawei and now an AP at Fudan University.
 
 # Model-based Prediction
 ## Overview
@@ -463,11 +463,13 @@
 
 - Continuous state with belief MDP
 - Put complexity into state transition, and solve with ML.
-- Normal solution: MPC, with limited lookup ahead (forward simulation).
-- MCTS
+- Normal solution: MPC-like, with limited look-ahead (forward simulation).
+- MCTS with forward simulation
+- The complexity lies in multi-agent interaction rollout and branching out.
+- MPC: receding horizon planning
 - Plan in MDP
-- Assuming the most likely belief is the real state. In the ULT case, assuming the most likely behavior of the other car to be reality, and act accordingly.
-- Cannot actively collect information. This is actually the charm of POMDP’s intelligence. POMDP will lead to some action that actively collects information.
+- Approximate POMDP as MDP, assuming the most likely belief argmax(b) is the real state. In the ULT case, assume the most likely behavior of the other car to be reality, and act accordingly.
+- MDP cannot actively collect information. This is actually the charm of POMDP’s intelligence: POMDP will lead to some actions that actively collect information.
 
 ## EPSILON
 
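A minimal sketch of the two ideas in the notes above, assuming a toy 1-D state: `plan` approximates the POMDP as an MDP by planning against argmax(b), then does MPC-like limited look-ahead via forward simulation. The transition and cost functions are hypothetical, not from the course.

```python
def most_likely_state(belief):
    """Approximate POMDP as MDP: treat the belief's argmax as the true state."""
    return max(belief, key=belief.get)

def limited_lookahead(state, actions, step, cost, horizon):
    """Exhaustive forward simulation up to `horizon`; returns minimal total cost."""
    if horizon == 0:
        return 0.0
    return min(cost(state, a) + limited_lookahead(step(state, a), actions, step, cost, horizon - 1)
               for a in actions)

def plan(belief, actions, step, cost, horizon=3):
    """MPC-style receding horizon: commit only to the best first action."""
    s = most_likely_state(belief)
    return min(actions,
               key=lambda a: cost(s, a) + limited_lookahead(step(s, a), actions, step, cost, horizon - 1))

# Toy example: 1-D position, drive toward the origin.
belief = {2: 0.6, -1: 0.3, 5: 0.1}        # belief over current position
actions = (-1, 0, 1)
step = lambda s, a: s + a                  # deterministic transition
cost = lambda s, a: (s + a) ** 2           # quadratic cost on the next state
print(plan(belief, actions, step, cost))   # → -1
```

Note the limitation the notes point out: because the belief collapses to argmax(b) before planning, this planner can never choose an action purely to gather information.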

paper_notes/eudm.md

Lines changed: 4 additions & 1 deletion
@@ -11,6 +11,9 @@ In order to make POMDP more tractable it is essential to incorporate domain know
 
 In EUDM, ego behavior is allowed to change, allowing more flexible decision making than MPDM. This allows EUDM to make a lane-change decision even before passing the blocking vehicle (accelerate, then lane change).
 
+![](https://pic3.zhimg.com/80/v2-a7778368cbf39f083ef5ad5a2f931a4e_1440w.webp)
+
 EUDM does guided branching in both action (of ego) and intention (of others).
 
 EUDM couples the prediction and planning modules.
@@ -20,7 +23,7 @@ It is further improved by [MARC](marc.md) where it considers risk-aware continge
 #### Key ideas
 - DCP-Tree (domain-specific closed-loop policy tree), ego-centric
 - Guided branching in action space
-- Each trace only contains ONE change of action (more flexible than MPDM but still manageable).
+- Each trace only contains ONE change of action (more flexible than MPDM but still manageable). This is a tree with a pruning mechanism built in. [MPDM](mpdm.md) essentially prunes much more aggressively, as only one type of action is allowed per trace (KKK, RRR, LLL, etc.).
 - Each semantic action is 2s, 4 levels deep, so planning horizon of 8s.
 - CFB (conditional focused branching), for other agents
 - conditioned on ego intention
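The ONE-change rule above prunes the semantic action tree sharply. A toy enumeration, where the action letters (K/L/R) and the depth of 4 follow the note but the code itself is illustrative, not from the paper:

```python
from itertools import product

SEMANTIC_ACTIONS = ("K", "L", "R")   # keep lane, lane change left, lane change right
DEPTH = 4                            # 4 levels x 2 s per semantic action = 8 s horizon

def dcp_traces(actions=SEMANTIC_ACTIONS, depth=DEPTH):
    """Enumerate DCP-Tree-style traces: at most ONE change of action per trace."""
    traces = []
    for seq in product(actions, repeat=depth):
        changes = sum(a != b for a, b in zip(seq, seq[1:]))
        if changes <= 1:
            traces.append("".join(seq))
    return traces

traces = dcp_traces()
print(len(traces))  # → 21 (3 constant traces + 3 starts x 3 switch points x 2 new actions)
```

Compare 21 traces with the 3^4 = 81 of the unpruned tree; the MPDM-style restriction to a single action type per trace would keep only the 3 constant traces.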

paper_notes/marc.md

Lines changed: 19 additions & 10 deletions
@@ -7,18 +7,25 @@ tl;dr: Generating safe and non-conservative behaviors in dense dynamic environme
 #### Overall impression
 This is a continuation of work in [MPDM](mpdm.md) and [EUDM](eudm.md). It introduces dynamic branching based on scene-level divergence, and risk-aware contingency planning based on user-defined risk tolerance.
 
-POMDP provides a theoretically sounds framework to handle dynamic interaction, but it suffers from curse of dimensionality and making it infeasible to solve in realtime.
+POMDP provides a theoretically sound framework to handle dynamic interaction, but it suffers from the curse of dimensionality, making it infeasible to solve in real time.
 
-* [MPDM](mpdm.md) prunes belief trees heavily and decomposes POMDP into a limited number of closed-loop policy evaluations. MPDM has only one ego policy over planning horizon (8s). Mainly BP.
-* EUDM improves by having multiple (2) policy in planning horizon, and performs DCP-Tree and CFB (conditoned focused branching) to use domain specific knowledge to guide branching in both action and intention space. Mainly BP.
-* MARC performs risk-aware contigency planning based on multiple scenarios. And it combines BP and MP.
-* All previous MPDM-like methods consider the optimal policy and single trajectory generation over all scenarios, resulting in lack of gurantee of policy consistency and loss of multimodality info.
+[MPDM](mpdm.md) and [EUDM](eudm.md) are mainly BP models, but [MARC](marc.md) combines BP and MP.
+
+For the policy tree (or policy-conditioned scenario tree) building, we can see how the tree got built with a more and more careful pruning process across successive works.
+
+* [MPDM](mpdm.md) is the pioneering work: it prunes belief trees heavily and decomposes POMDP into a limited number of closed-loop policy evaluations. MPDM has only one ego policy over the planning horizon (8s).
+* [MPDM](mpdm.md) iterates over all ego policies, and uses the single most likely policy of each other agent given road structure and vehicle pose.
+* [MPDM2](mpdm2.md) iterates over all ego policies, and iterates over (a set of) possible policies of other agents predicted by a motion prediction model.
+* [EUDM](eudm.md) iterates over all ego policies, and then iterates over all possible policies of other agents to identify **critical scenarios** (CFB, conditional focused branching). It guides branching in both action and intention space. [EPSILON](epsilon.md) used the same method.
+* [MARC](marc.md) iterates over all ego policies, iterates over a set of predicted policies of other agents, and identifies **key agents** (ignoring other agents even in critical scenarios).
+
+All previous MPDM-like methods consider the optimal policy and single trajectory generation over all scenarios, resulting in a lack of guarantee of policy consistency and loss of multimodality info.
 
 #### Key ideas
-- Planning is hard from uncertainty and interaction (inherently multimodal intentions).
-- For interactive decision making, MDP or POMDP are mathematically rigorous formulations for decision processes in stochastic environments.
-- For static (non-interactive) decision making, the normal trioka of planninig (sampling, searching, optimization) would suffice.
-- *Contigency planning* generates deterministic behavior for mulutiple future scenarios. In other words, it plans a short-term trajectory that ensures safety for all potential scenarios.
+- *Contingency planning* generates deterministic behavior for multiple future scenarios. In other words, it plans a short-term trajectory that ensures safety for all potential scenarios. --> This is very similar to the idea of a *backup plan* in [EPSILON](epsilon.md).
 - Scenario tree construction
 - generating policy-conditioned critical scenario sets via closed-loop forward simulation (similar to CFB in EUDM?).
 - building scenario tree with scene-level divergence assessment. Determine the latest timestamp at which the scenarios diverge, delaying branching time as much as possible.
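The scene-level divergence assessment above can be sketched in a few lines. This is an illustrative toy, not MARC's actual algorithm; the trajectories, the threshold `eps`, and the yield/no-yield example are invented:

```python
import math

def branch_time(scenarios, eps=0.5):
    """Latest time to branch: first step where scenario trajectories diverge by > eps."""
    horizon = min(len(s) for s in scenarios)
    for t in range(horizon):
        pts = [s[t] for s in scenarios]
        spread = max(math.dist(p, q) for p in pts for q in pts)
        if spread > eps:
            return t          # scenarios have diverged; delay branching until here
    return horizon            # scenarios never diverge within the horizon

# Two scenarios for the other car: it yields vs. it does not yield.
yield_traj    = [(0, 0), (1, 0), (2, 0), (2.5, 0), (2.5, 0)]
no_yield_traj = [(0, 0), (1, 0), (2, 0), (3.5, 0), (5.0, 0)]
print(branch_time([yield_traj, no_yield_traj]))  # → 3
```

Delaying the branch point keeps a longer shared trunk, so the contingency plan commits to a single trajectory as long as the scenarios remain indistinguishable.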
@@ -35,7 +42,9 @@ POMDP provides a theoretically sounds framework to handle dynamic interaction, b
 - with better efficiency (avg speed) and riding comfort (max decel/acc).
 
 #### Technical details
-- Summary of technical details, such as important training details, or bugs of previous benchmarks.
+- Planning is hard due to uncertainty and interaction (inherently multimodal intentions).
+- For interactive decision making, MDP or POMDP are mathematically rigorous formulations for decision processes in stochastic environments.
+- For static (non-interactive) decision making, the normal troika of planning (sampling, searching, optimization) would suffice.
 
 #### Notes
 - Questions and notes on how to improve/revise the current work

paper_notes/mpdm.md

Lines changed: 1 addition & 2 deletions
@@ -33,5 +33,4 @@ Despite simple design, MPDM is a pioneering work in decision making, and improve
 - Summary of technical details, such as important training details, or bugs of previous benchmarks.
 
 #### Notes
-- Questions and notes on how to improve/revise the current work
-
+- The white paper from [May Mobility](https://maymobility.com/resources/autonomy-at-scale-white-paper/) explains the idea in plainer language and with examples.

paper_notes/mpdm2.md

Lines changed: 2 additions & 11 deletions
@@ -5,18 +5,9 @@ _June 2024_
 tl;dr: Improvement of MPDM in predicting the intention of other vehicles.
 
 #### Overall impression
-The majority is the same as the previous work [MPDM](mpdm.md).
-
-For the policy tree (or policy-conditioned scenario tree) building, we can see how the tree got built with more and more careful pruning process with improvements from different works.
-
-* [MPDM](mpdm.md) iterates over all ego policies, and uses the most likely one policy given road structure and pose of vehicle.
-* [MPDM2](mpdm2.md) iterates over all ego policies, and iterate over (a set of) possible policies of other agents predicted by a motion prediction model.
-* [EUDM](eudm.md) itrates all ego policies, and then iterate over all possible policies of other agents to identify **critical scenarios** (CFB, conditioned filtered branching). [EPSILON](epsilon.md) used the same method.
-* [MARC](marc.md) iterates all ego policies, iterates over a set of predicted policies of other agents, identifies **key agents** (and ignores other agents even in critical scenarios).
-
-![](https://pic3.zhimg.com/80/v2-a7778368cbf39f083ef5ad5a2f931a4e_1440w.webp)
+The majority is the same as the previous work [MPDM](mpdm.md). There is a follow-up article as well, [MPDM3](https://link.springer.com/article/10.1007/s10514-017-9619-z), which expands [MPDM2](mpdm2.md) with more experiments but the same methodology.
 
+So the main idea of MPDM is already covered in the original short paper [MPDM](mpdm.md).
 
 #### Key ideas
 - Motion prediction of other agents with a classical ML method (maximum likelihood estimation).
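A toy sketch of what maximum-likelihood intention estimation could look like here; the lateral-offset policy templates and the Gaussian observation model are invented for illustration and are not from the paper:

```python
import math

# Expected lateral offsets per step for each candidate policy of another agent.
POLICY_MODELS = {
    "lane_keep":   [0.0, 0.0, 0.0],
    "lane_change": [0.3, 0.8, 1.5],
}

def log_likelihood(observed, expected, sigma=0.3):
    """Independent Gaussian observation model on lateral offset."""
    return sum(-0.5 * ((o - e) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
               for o, e in zip(observed, expected))

def most_likely_policy(observed):
    """MLE: keep the policy that best explains the observed motion."""
    return max(POLICY_MODELS, key=lambda p: log_likelihood(observed, POLICY_MODELS[p]))

print(most_likely_policy([0.25, 0.9, 1.4]))  # → lane_change
```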
