This repo contains the dataset download and processing code used in
Sekai: A Video Dataset towards World Exploration
Zhen Li, Chuanhao Li, Xiaofeng Mao, Shaoheng Lin, Ming Li, Shitian Zhao, Zhaopan Xu, Xinyue Li, Yukang Feng, Jianwen Sun, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Zhixiang Wang, Yuwei Wu, Tong He, Jiangmiao Pang, Yu Qiao, Yunde Jia, Kaipeng Zhang
Shanghai AI Laboratory, Beijing Institute of Technology
- [2025.07.10] We're thrilled by the community's enthusiasm — Dataset Access Assistance is now updated!
- [2025.06.25] Video download and clip extraction tools for Sekai-Real are now available!
- [2025.06.19] We have released our paper — discussions and feedback are warmly welcome!
TL;DR We present Sekai (せかい, “world” in Japanese), a high-quality egocentric video dataset for immersive world exploration and generation. Sekai includes over 5000 hours of YouTube videos and game footage with rich annotations. It features:
- 📹 Diverse, high-resolution videos (720p)
- 🌍 Coverage of 100+ countries and 750+ cities
- 🚶‍♂️ First-person and 🛸 drone perspectives
- 🕒 Long sequences (≥ 60s) for real-world continuity
- 🏷️ Detailed annotations: location, scene, weather, crowd, captions, and camera trajectories
Sekai supports tasks like video understanding, navigation, and video-audio co-generation.
The Sekai dataset consists of two parts: Sekai-Real, collected from YouTube videos, and Sekai-Game, collected from video game footage. The camera trajectories for both parts are represented using an intrinsic matrix and per-frame extrinsic matrices, all of which are normalized.
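To make the trajectory representation concrete, here is a minimal sketch of how an intrinsic matrix and one per-frame extrinsic matrix could be combined to project a 3D point into a frame. It is an illustration under stated assumptions, not the dataset's documented format: the README does not say how the matrices are normalized or stored, so the sketch assumes the intrinsics were divided by image width and height and that each extrinsic is a 4x4 world-to-camera transform. The function names `denormalize_intrinsics` and `project_point` are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def denormalize_intrinsics(k_norm: Sequence[Sequence[float]],
                           width: int, height: int) -> Matrix:
    """Scale a normalized 3x3 intrinsic matrix to pixel units.

    ASSUMPTION: "normalized" means row 0 was divided by the image width
    and row 1 by the image height. Check the annotation files before
    relying on this convention.
    """
    return [
        [value * width for value in k_norm[0]],
        [value * height for value in k_norm[1]],
        [float(value) for value in k_norm[2]],
    ]


def project_point(k_pixels: Sequence[Sequence[float]],
                  world_to_cam: Sequence[Sequence[float]],
                  point_world: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates (pinhole model).

    ASSUMPTION: `world_to_cam` is a 4x4 world-to-camera extrinsic. If the
    stored matrices are camera-to-world, invert them first.
    """
    homogeneous = (point_world[0], point_world[1], point_world[2], 1.0)
    # Transform the point into the camera frame (first three rows only).
    cam = [sum(row[i] * homogeneous[i] for i in range(4))
           for row in world_to_cam[:3]]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Apply the intrinsics, then divide by depth.
    pix = [sum(row[i] * cam[i] for i in range(3)) for row in k_pixels]
    return pix[0] / pix[2], pix[1] / pix[2]


if __name__ == "__main__":
    # Made-up example values, not taken from the dataset.
    k_norm = [[0.5, 0.0, 0.5], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
    identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    k = denormalize_intrinsics(k_norm, width=1280, height=720)
    print(project_point(k, identity, (1.0, 0.0, 2.0)))  # (960.0, 360.0)
```

With one intrinsic matrix per clip and one extrinsic per frame, the same `k` would be reused while `world_to_cam` changes from frame to frame.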
If you have followed the steps below and still cannot obtain the Sekai(-Real) dataset, please fill out this form. We will review your request and send you the details shortly.
We provide a comprehensive toolchain for downloading original videos and extracting video clips.
Split | Annotation | Camera Trajectories | # Source Videos | # Samples | Video Duration | Storage Space |
---|---|---|---|---|---|---|
Sekai-Real-Walking | Huggingface | Huggingface+ | 6552 | 299173 | 4986h | ~10TB |
Sekai-Real-Walking-HQ* | Huggingface | Huggingface | 3879 | 18208 | 304h | ~600GB |
Sekai-Real-Drone | Huggingface | Huggingface | 69 | 23912 | 65h | ~140GB |
* denotes a subset of the highest-quality videos, sampled with the computational cost of training in mind.
+ denotes that a subset of videos was annotated with camera trajectories. Refer to the paper for more details.
The videos and corresponding camera trajectory files of Sekai-Game are hosted on Hugging Face. Use the links in the table to view and download them.
Split | Annotation | Videos & Camera Trajectories |
---|---|---|
Sekai-Game-Walking | Huggingface | part1 and part2 |
Sekai-Game-Drone | Huggingface | here |
- Tools for Sekai-Real video download and clip extraction.
- Modified MegaSam used in Sekai.
See license.
If you find this project helpful, please consider citing:
@article{li2025sekai,
  title={Sekai: A Video Dataset towards World Exploration},
  author={Zhen Li and Chuanhao Li and Xiaofeng Mao and Shaoheng Lin and Ming Li and Shitian Zhao and Zhaopan Xu and Xinyue Li and Yukang Feng and Jianwen Sun and Zizhen Li and Fanrui Zhang and Jiaxin Ai and Zhixiang Wang and Yuwei Wu and Tong He and Jiangmiao Pang and Yu Qiao and Yunde Jia and Kaipeng Zhang},
  journal={arXiv preprint arXiv:2506.15675},
  year={2025}
}