
SceneCOT/scenecot


SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes

SceneCOT: We propose SceneCOT, a Chain-of-Thought (COT) reasoning method for 3D scenes. It decouples a complex reasoning task into simpler, manageable sub-problems and builds the corresponding visual clues using multimodal expert modules. To our knowledge, this is the first successful application of the COT technique to human-like, step-by-step reasoning for 3D scene understanding, and it shows potential for extension to a wider range of 3D scene understanding scenarios.

SceneCOT Framework

SceneCOT achieves strong performance on MSQA and Beacon3D, demonstrating the effectiveness of our reasoning framework. In particular, our method significantly improves performance on counting, the most challenging task in MSQA, and it outperforms previous methods by a large margin on Beacon3D.

🔥 News

  • [2025-6] We released the webpage of SceneCOT.

๐Ÿ“ TODO List

  • Arxiv paper
  • Evaluation code
  • Model weights
  • SceneCOT-185K dataset
  • Training code

BibTeX

If you find our work helpful, please consider citing us:

@article{linghu2025scenecot,
  title={SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes},
  author={Linghu, Xiongkun and Huang, Jiangyong and Zhu, Ziyu and Jia, Baoxiong and Huang, Siyuan},
  journal={arXiv preprint arXiv:2510.16714},
  year={2025}
}

About

A step-by-step reasoning framework for 3D scene understanding
