Add EgoExo Forge and VistaDream examples #11883
Conversation
Hi! Thanks for opening this pull request.
Because this is your first time contributing to this repository, make sure you've read our Contributor Guide and Code of Conduct.
> This is an external example. Check the [repository](https://github.com/rerun-io/vistadream) for more information.
>
> **Requires: Linux** with **NVIDIA GPU** (tested with CUDA 12.9)
```diff
- **Requires: Linux** with **NVIDIA GPU** (tested with CUDA 12.9)
+ **Requires**: Linux with an NVIDIA GPU (tested with CUDA 12.9)
```
> @@ -0,0 +1,35 @@
> <!--[metadata]
> title = "EgoExo Forge"
| title = "EgoExo Forge" | |
| title = "EgoExo Forge" <!-- NOLINT --> |
I think ignoring the lint here is fine, as it's the name of the actual project.
I hope the lint can be skipped like this.
> https://vimeo.com/1134260310?autoplay=1&loop=1&autopause=0&background=1&muted=1&ratio=2386:1634
>
> A comprehensive collection of datasets and tools for egocentric and exocentric human activity understanding, featuring hand-object interactions, manipulation tasks, and multi-view recordings.
```diff
- A comprehensive collection of datasets and tools for egocentric and exocentric human activity understanding, featuring hand-object interactions, manipulation tasks, and multi-view recordings.
+ A collection of datasets and tools for egocentric and exocentric human activity understanding, featuring hand-object interactions, manipulation tasks, and multi-view recordings.
```
> ## Background
>
> EgoExo Forge provides a consistent labeling scheme and data layout for multiple different egocentric and exocentric human datasets, that have different sensor configurations and annotations.
```diff
- EgoExo Forge provides a consistent labeling scheme and data layout for multiple different egocentric and exocentric human datasets, that have different sensor configurations and annotations.
+ EgoExo Forge provides a consistent labeling scheme and data layout across multiple egocentric and exocentric human datasets with varying sensor configurations and annotations.
```
I think this is a bit easier to read.
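For readers wondering what a consistent labeling scheme and data layout buys in practice, here is a minimal hypothetical sketch of logging to shared Rerun entity paths. The paths and stand-in data below are made up for illustration, not the project's actual schema:

```python
import numpy as np
import rerun as rr

rr.init("egoexo_forge_layout_sketch", spawn=True)

# Regardless of which source dataset a sequence comes from, hands and
# cameras are logged under the same entity paths, so one viewer layout
# works for all of them. Paths and data here are illustrative only.
left_hand_keypoints = np.random.rand(21, 3)  # stand-in 21-joint hand pose
rr.log("world/hands/left", rr.Points3D(left_hand_keypoints))
rr.log("world/cameras/ego/image", rr.Image(np.zeros((480, 640, 3), dtype=np.uint8)))
```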
> * [Assembly101](https://assembly-101.github.io/) from Meta. A procedural activity dataset with 4321 multi-view videos of people assembling and disassembling 101 take-apart toy vehicles, featuring rich variations in action ordering, mistakes, and corrections.
> * [HO-Cap](https://irvlutd.github.io/HOCap/) from Nvidia and the University of Texas at Dallas. A dataset for 3D reconstruction and pose tracking of hands and objects in videos, featuring humans interacting with objects for various tasks including pick-and-place actions and handovers.
> * [EgoDex](https://arxiv.org/abs/2505.11709) from Apple. The largest and most diverse dataset of dexterous human manipulation with 829 hours of egocentric video and paired 3D hand tracking, covering 194 different tabletop tasks with everyday household objects.
```diff
- * [Assembly101](https://assembly-101.github.io/) from Meta. A procedural activity dataset with 4321 multi-view videos of people assembling and disassembling 101 take-apart toy vehicles, featuring rich variations in action ordering, mistakes, and corrections.
- * [HO-Cap](https://irvlutd.github.io/HOCap/) from Nvidia and the University of Texas at Dallas. A dataset for 3D reconstruction and pose tracking of hands and objects in videos, featuring humans interacting with objects for various tasks including pick-and-place actions and handovers.
- * [EgoDex](https://arxiv.org/abs/2505.11709) from Apple. The largest and most diverse dataset of dexterous human manipulation with 829 hours of egocentric video and paired 3D hand tracking, covering 194 different tabletop tasks with everyday household objects.
+ * [Assembly101](https://assembly-101.github.io/): A procedural activity dataset with 4321 multi-view videos of people assembling and disassembling 101 take-apart toy vehicles, featuring rich variations in action ordering, mistakes, and corrections.
+ * [HO-Cap](https://irvlutd.github.io/HOCap/): A dataset for 3D reconstruction and pose tracking of hands and objects in videos, featuring humans interacting with objects for various tasks including pick-and-place actions and handovers.
+ * [EgoDex](https://arxiv.org/abs/2505.11709): The largest and most diverse dataset of dexterous human manipulation with 829 hours of egocentric video and paired 3D hand tracking, covering 194 different tabletop tasks with everyday household objects.
```
Do we need to mention the authors in the same line when we already link to the page? And I think a colon before the description is a good idea.
> Make sure you have the [Pixi package manager](https://pixi.sh/latest/#installation) installed and run
>
> ```sh
> git clone https://github.com/rerun-io/egoexo-forge.git
> cd egoexo-forge
> pixi run app
> ```
>
> You can try the example on a HuggingFace space [here](https://pablovela5620-egoexo-forge-viewer.hf.space/).
````diff
- Make sure you have the [Pixi package manager](https://pixi.sh/latest/#installation) installed and run
- ```sh
- git clone https://github.com/rerun-io/egoexo-forge.git
- cd egoexo-forge
- pixi run app
- ```
- You can try the example on a HuggingFace space [here](https://pablovela5620-egoexo-forge-viewer.hf.space/).
+ You can try the example on a HuggingFace space [here](https://pablovela5620-egoexo-forge-viewer.hf.space/).
+ Or locally, make sure you have the [Pixi package manager](https://pixi.sh/latest/#installation) installed and run
+ ```sh
+ git clone https://github.com/rerun-io/egoexo-forge.git
+ cd egoexo-forge
+ pixi run app
````
I think mentioning the HF space first makes sense, because then readers can immediately click and try it out; at the end of the section it's a bit hidden.
> VistaDream addresses the challenge of 3D scene reconstruction from a single image through a novel two-stage pipeline:
>
> 1. **Coarse 3D Scaffold Construction**: Creates a global scene structure by outpainting image boundaries and estimating depth maps.
> 2. **Multi-view Consistency Sampling (MCS)**: Uses iterative diffusion-based RGB-D inpainting with multi-view consistency constraints to generate high-quality novel views.
```diff
- 2. **Multi-view Consistency Sampling (MCS)**: Uses iterative diffusion-based RGB-D inpainting with multi-view consistency constraints to generate high-quality novel views.
+ 2. **Multi-view Consistency Sampling**: Uses iterative diffusion-based RGB-D inpainting with multi-view consistency constraints to generate high-quality novel views.
```
MCS is never referenced again, so no need to introduce it.
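Since the two stages are only described in prose, a toy, runnable sketch of their shape might help readers skim the structure. Every operation below is a numpy stand-in (padding for outpainting, a constant plane for depth estimation, averaging for the consistency constraint), not VistaDream's actual code:

```python
import numpy as np

def build_coarse_scaffold(image: np.ndarray) -> np.ndarray:
    """Stage 1 stand-in: 'outpaint' the borders (zero-padding here) and
    attach an 'estimated' depth map (a constant plane here) -> RGB-D."""
    outpainted = np.pad(image, ((16, 16), (16, 16), (0, 0)))
    depth = np.ones(outpainted.shape[:2] + (1,))
    return np.concatenate([outpainted, depth], axis=-1)

def multi_view_consistency_sampling(rgbd: np.ndarray, steps: int = 5) -> np.ndarray:
    """Stage 2 stand-in: iteratively refine several noisy 'views' while
    pulling them toward a shared consensus (the consistency constraint)."""
    views = [rgbd + np.random.normal(0.0, 0.01, rgbd.shape) for _ in range(4)]
    for _ in range(steps):
        consensus = np.mean(views, axis=0)
        views = [0.5 * v + 0.5 * consensus for v in views]  # inpainting step stand-in
    return np.mean(views, axis=0)

scene = build_coarse_scaffold(np.zeros((64, 64, 3)))
refined = multi_view_consistency_sampling(scene)
print(refined.shape)  # (96, 96, 4): padded RGB plus a depth channel
```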
> **Requires: Linux** with **NVIDIA GPU** (tested with CUDA 12.9)
>
> Make sure you have the [Pixi package manager](https://pixi.sh/latest/#installation) installed and run
Nit: missing colon
```diff
- Make sure you have the [Pixi package manager](https://pixi.sh/latest/#installation) installed and run
+ Make sure you have the [Pixi package manager](https://pixi.sh/latest/#installation) installed and run:
```
> 1. **Coarse 3D Scaffold Construction**: Creates a global scene structure by outpainting image boundaries and estimating depth maps.
> 2. **Multi-view Consistency Sampling (MCS)**: Uses iterative diffusion-based RGB-D inpainting with multi-view consistency constraints to generate high-quality novel views.
>
> The framework integrates multiple state-of-the-art models:
I wouldn't count Rerun as a model, although it's state of the art 🤓 How about just:
```diff
- The framework integrates multiple state-of-the-art models:
+ The framework utilizes:
```
Actually this list carries the same information as the first paragraph of this README. Do we need both?
What
This adds the two external examples EgoExo Forge and VistaDream.
egoexo_forge.mov
vistadream.mov
Checklist
To run all checks from `main`, comment on the PR with `@rerun-bot full-check`.