Vox-MMSD: Voxel-wise Multi-scale and Multi-modal Self-Distillation for Self-supervised Brain Tumor Segmentation
This is the official code for Vox-MMSD: Voxel-wise Multi-scale and Multi-modal Self-Distillation for Self-supervised Brain Tumor Segmentation.
[2025-07] Our work has been accepted by the IEEE Journal of Biomedical and Health Informatics (JBHI).
Our contributions are summarized as follows:
- We propose a self-supervised multi-modal pre-training framework for brain tumor segmentation, which can extract modality-invariant representations from multi-modal MRI scans for improved performance on small downstream segmentation datasets.
- To learn modality-invariant features, we introduce SiaBloMM, which generates siamese blocks as model inputs by applying independent spatial and modality masking during pre-training (see the sketch after this list).
- To enhance voxel-level feature representation for segmentation, we propose ORMS, which generates voxel pairs with perturbed local contexts and modality sets, and combines them with multi-scale self-distillation to effectively learn both global and local contextual features.
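To make the sampling and masking ideas above concrete, below is a minimal, unofficial sketch of how SiaBloMM-style inputs could be produced: two overlapping blocks are cropped from one multi-modal volume, and each view then receives its own random spatial patch mask and modality mask. All function names, block sizes, and masking ratios here are illustrative assumptions, not values from the paper.

```python
import torch

def random_overlapping_crops(volume, block=(96, 96, 96), min_overlap=32):
    """volume: (C, D, H, W) multi-modal MRI; return two overlapping crops."""
    s1, s2 = [], []
    for dim, b in zip(volume.shape[1:], block):
        a = torch.randint(0, dim - b + 1, ()).item()
        lo = max(0, a - (b - min_overlap))        # keep at least
        hi = min(dim - b, a + (b - min_overlap))  # min_overlap voxels shared
        s1.append(a)
        s2.append(torch.randint(lo, hi + 1, ()).item())
    def crop(s):
        return volume[:, s[0]:s[0] + block[0],
                         s[1]:s[1] + block[1],
                         s[2]:s[2] + block[2]]
    return crop(s1), crop(s2)

def mask_view(x, spatial_ratio=0.3, patch=16):
    """Apply independent spatial patch masking and modality masking."""
    C, D, H, W = x.shape
    x = x.clone()
    # spatial masking: zero out a random subset of patch-sized cubes
    gd, gh, gw = D // patch, H // patch, W // patch
    n_mask = int(spatial_ratio * gd * gh * gw)
    for i in torch.randperm(gd * gh * gw)[:n_mask].tolist():
        z, y, w = (i // (gh * gw)) * patch, (i // gw % gh) * patch, (i % gw) * patch
        x[:, z:z + patch, y:y + patch, w:w + patch] = 0
    # modality masking: drop a random subset of modalities, keeping >= 1
    drop = torch.rand(C) < 0.5
    if drop.all():
        drop[torch.randint(0, C, ())] = False
    x[drop] = 0
    return x

# two siamese blocks with independent masks from one 4-modality scan
vol = torch.randn(4, 155, 240, 240)
v1, v2 = random_overlapping_crops(vol)
view1, view2 = mask_view(v1), mask_view(v2)
```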
Download the BraTS-GLI dataset from BraTS 2023, put it in ./BraTS-GLI/source_data/, and run
python ./BraTS-GLI/create_dataset_csv.py
to preprocess the data and generate a .csv file for pre-training. Downstream datasets, e.g., BraTS-PED here, can be preprocessed in the same way.
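As a rough illustration, the CSV-creation step conceptually amounts to collecting the per-modality NIfTI paths of each case into one table. This is only a hypothetical sketch; the actual logic, column names, and output path are defined by ./BraTS-GLI/create_dataset_csv.py, and the modality suffixes below assume BraTS 2023 file naming.

```python
import csv
import os

SRC = "./BraTS-GLI/source_data"
# BraTS 2023 modality suffixes (an assumption about the file naming)
MODALITIES = ["t1n", "t1c", "t2w", "t2f"]

rows = []
for case in sorted(os.listdir(SRC)):
    case_dir = os.path.join(SRC, case)
    if not os.path.isdir(case_dir):
        continue
    # one NIfTI file per modality, e.g. <case>-t1n.nii.gz
    rows.append({"case": case, **{
        m: os.path.join(case_dir, f"{case}-{m}.nii.gz") for m in MODALITIES}})

with open("./BraTS-GLI/pretrain.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["case"] + MODALITIES)
    writer.writeheader()
    writer.writerows(rows)
```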
- Move into the PyMIC-dev directory and install it:
cd PyMIC-dev
pip install -e .
- Move back to the Vox-MMSD directory and run the pre-training command:
cd ..
pymic_train ./BraTS-GLI/config/unet3d_voxmmsd.cfg
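For intuition, the voxel-wise self-distillation objective driving this pre-training stage can be sketched as a DINO-style cross-entropy between matched voxel features from the two views, applied at multiple scales. The temperatures, shapes, and function name below are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def voxel_distill_loss(student_feats, teacher_feats, t_s=0.1, t_t=0.04):
    """feats: (N, K) projections of N matched voxels from the two views."""
    teacher = F.softmax(teacher_feats.detach() / t_t, dim=-1)  # stop-gradient
    log_student = F.log_softmax(student_feats / t_s, dim=-1)
    return -(teacher * log_student).sum(dim=-1).mean()

# usage: features of the same physical voxels taken from the two masked
# views (ORMS pairs); the same loss would be applied at each decoder scale
s = torch.randn(1024, 256, requires_grad=True)
t = torch.randn(1024, 256)
loss = voxel_distill_loss(s, t)
loss.backward()
```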
- Here we use BraTS-PED as the downstream dataset. Use
pymic_train ./BraTS-PED/config/unet3d_baseline.cfg
to train a downstream segmentation model from scratch. Or use
pymic_train ./BraTS-PED/config/unet3d_voxmmsd.cfg
to train a downstream segmentation model initialized with the weights pre-trained by Vox-MMSD.
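Under the hood, fine-tuning with pre-trained weights amounts to initializing the downstream network from the pre-training checkpoint before training. The generic PyTorch sketch below illustrates the idea; the checkpoint path, key name, and stand-in network are assumptions, since the actual loading is presumably configured through the .cfg file.

```python
import torch
import torch.nn as nn

# stand-in for the downstream 3D U-Net (illustration only)
model = nn.Sequential(nn.Conv3d(4, 16, 3, padding=1), nn.Conv3d(16, 4, 1))

# hypothetical checkpoint path; the real one is set in the config file
ckpt = torch.load("voxmmsd_pretrained.pt", map_location="cpu")
state = ckpt.get("model_state_dict", ckpt)  # unwrap if nested (assumption)

# strict=False keeps layers missing from the checkpoint (e.g. a fresh
# segmentation head) at their random initialization
missing, unexpected = model.load_state_dict(state, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```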
Our codebase is built upon PyMIC, and borrows from DINO and Vox2Vec.