Convolutional upsampling of DINOv2 [1] features for weakly supervised segmentation [2]. In short, we train a bisected U-Net to upsample low-resolution features by targeting high-resolution ground truths generated by other methods (e.g., FeatUp [3], LoftUp [4]), which may not scale as nicely in time / memory / generalisability as CNNs. The upsampled features can then be used in Weka-style [5] feature-based / interactive / weakly supervised segmentation. Check out the examples to get started!
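For a feel of the pipeline, here is a minimal sketch of the idea (not the repo's API): DINOv2 patch features are extracted at low resolution, upsampled to pixel resolution, and fed to a Weka-style per-pixel classifier trained on a few labelled pixels. Plain bilinear interpolation stands in for the trained convolutional upsampler, and the image, scribble labels, and random-forest settings are placeholders.

```python
# Sketch only: bilinear interpolation stands in for the trained convolutional upsampler.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.ensemble import RandomForestClassifier

H = W = 224  # multiple of the DINOv2 patch size (14)
img = torch.rand(1, 3, H, W)  # placeholder for a normalised micrograph

vit = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
with torch.no_grad():
    tokens = vit.forward_features(img)["x_norm_patchtokens"]  # (1, 16*16, 384)
lr_feats = tokens.reshape(1, H // 14, W // 14, -1).permute(0, 3, 1, 2)  # (1, 384, 16, 16)

# the trained upsampler replaces this step (its targets come from FeatUp/LoftUp-style features)
hr_feats = F.interpolate(lr_feats, size=(H, W), mode="bilinear", align_corners=False)
X = hr_feats[0].permute(1, 2, 0).reshape(H * W, -1).numpy()  # one feature vector per pixel

# hypothetical sparse user scribbles: 1 / 2 = two phases, 0 = unlabelled
labels = np.zeros(H * W, dtype=np.int64)
labels[:200], labels[-200:] = 1, 2
mask = labels > 0

# Weka-style: fit a pixel classifier on the labelled pixels, predict everywhere
clf = RandomForestClassifier(n_estimators=50).fit(X[mask], labels[mask])
segmentation = clf.predict(X).reshape(H, W)
```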
Note: you'll need `nvcc` installed to build flash-attn; see `install/INSTALL_NVCC.md`.
Either
git clone https://github.com/tldr-group/vulture
conda env create -f install/conda.yml
conda activate vulture
pip install . --no-deps
# Force MAX_JOBS to stop flash-attn hogging all the cores; --no-build-isolation so that it can find CUDA & nvcc
MAX_JOBS=4 pip install --no-build-isolation flash-attn
or
git clone https://github.com/tldr-group/vulture
python -m venv .venv
source .venv/bin/activate
pip install .
MAX_JOBS=4 pip install --no-build-isolation flash-attn
python apply.py
or
git clone https://github.com/tldr-group/vulture
curl -LsSf https://astral.sh/uv/install.sh | sh # install uv
uv sync
# update .env if you need to change CUDA_HOME / LD_LIBRARY_PATH later
uv run --env-file install/.env -- pip install --no-build-isolation flash-attn
uv run apply.py
The conda path comes with all the 'paper' dependencies (needed to reproduce the figures). If you want those with pip/uv, run
pip install '.[paper]'
# OR
uv sync --extra paper
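Whichever route you pick, a quick sanity check (plain Python, not part of the repo) confirms that PyTorch sees your GPU and that flash-attn actually built against your CUDA toolkit:

```python
import torch

print(torch.__version__, "| CUDA", torch.version.cuda, "| GPU available:", torch.cuda.is_available())

import flash_attn  # an ImportError here usually means the build step above failed

print("flash-attn", flash_attn.__version__)
```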
In an 'Anaconda PowerShell Prompt' (search for it in the Start menu):
conda env create -f install\conda.yml
conda activate vulture
pip install -e . --no-deps
or (recommended)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
uv sync
Note: flash-attn doesn't build out of the box on Windows and requires extra steps.
You need to perform the following steps:
- Clone the repo
- Copy the trained models from this folder into a new `trained_models` folder (note: you'll need to download the checkpoints first)
- Activate your environment (conda/pip/uv) that has `vulture` installed
- Run the GUI
git clone https://github.com/tldr-group/interactive-seg-gui
mkdir interactive-seg-gui/trained_models
cp -r trained_models/* interactive-seg-gui/trained_models/ # copy the checkpoints into the GUI's trained_models folder
cd interactive-seg-gui
# activate your venv with vulture installed in it either via conda or .venv and run
python main.py
# OR using uv:
uv run --project ../vulture/ main.py
git clone https://github.com/tldr-group/interactive-seg-gui
mkdir -p interactive-seg-gui\trained_models
Copy-Item trained_models\* interactive-seg-gui\trained_models\ -Recurse
Set-Location interactive-seg-gui
python main.py
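If the GUI complains about missing models, check that the copy step put the checkpoints where it expects them (a trivial check, not part of the GUI):

```python
from pathlib import Path

# list the .pth files the GUI should pick up (run from inside interactive-seg-gui)
ckpts = sorted(Path("trained_models").glob("*.pth"))
print(ckpts if ckpts else "no checkpoints found -- see the checkpoints section below")
```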
Checkpoints are available from Hugging Face; either download them into the `trained_models/` directory or run
chmod +x install/download_chkpoints.sh
./install/download_chkpoints.sh
Windows:
.\install\download_chkpoints.ps1
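If you'd rather fetch a checkpoint from Python, huggingface_hub can do the same job. The `repo_id` below is a placeholder (use whichever repository `install/download_chkpoints.sh` points at); only the filename comes from this repo's layout.

```python
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="tldr-group/vulture",  # placeholder -- check install/download_chkpoints.sh for the real id
    filename="fit_reg_f32.pth",
    local_dir="trained_models",
)
```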
examples/ # example notebooks for usage
│ └─ ...
paper_figures/ # notebooks to generate the paper figures
│ ├─ fig_data/ # data needed for the notebooks, downloaded from Zenodo
│ └─ ...
trained_models/ # model checkpoints (weights and model configs inside)
│ ├─ fit_reg_f32.pth # downloaded with `install/download_chkpoints.sh`
│ └─ ...
install/
│ ├─ conda.yml # conda env file
│ └─ download_chkpoints.sh # get checkpoints from Hugging Face
vulture/
├─ comparisons/ # wrapper code for other upsamplers / segmentation models
│ └─ ...
├─ datasets/
│ └─ lr_hr_embedding_dataset.py
├─ models/
│ ├─ configs/ # JSONs for training run parameters
│ ├─ external/ # external models used
│ │ ├─ autoencoder.py # compresses low-res DINOv2 features
│ │ ├─ online_denoiser.py # 'denoises' low-res ViT features - from [6]
│ │ └─ vit_wrapper.py # wrapper around DINOv2 for low-res features
│ ├─ layers.py # (u-net) layer components for our down-/upsampler
│ └─ model.py # down-/upsampler architecture
├─ train/ # training script
│ └─ train_upsampler.py
├─ main.py # E2E 'CompleteUpsampler' class + helper functions
├─ feature_prep.py # FeatUp style feature preprocessing (PCA)
└─ utils.py # plotting etc
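Each `.pth` bundles the weights together with its model config, so you can inspect a downloaded checkpoint before wiring it into the E2E `CompleteUpsampler`. A minimal sketch; the exact contents of the saved dict are an assumption, not a documented contract:

```python
import torch

# load on CPU purely to inspect the contents; weights_only=False because the
# checkpoint is assumed to store the model config alongside the tensors
ckpt = torch.load("trained_models/fit_reg_f32.pth", map_location="cpu", weights_only=False)
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```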
@article{docherty2025maybedontneedunet,
title={Maybe you don't need a U-Net: convolutional feature upsampling for materials micrograph segmentation},
author={Ronan Docherty and Antonis Vamvakeros and Samuel J. Cooper},
year={2025},
journal={arXiv preprint arXiv:2508.21529},
eprint={2508.21529},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2508.21529},
}
- [1] M. Oquab et al., "DINOv2: Learning Robust Visual Features without Supervision" (2023), ICLR, https://arxiv.org/abs/2304.07193
- [2] R. Docherty et al., "Upsampling DINOv2 features for unsupervised vision tasks and weakly supervised materials segmentation" (2024), NeurIPS AI4Mat workshop, https://arxiv.org/abs/2410.19836
- [3] S. Fu et al., "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" (2024), ICLR, https://arxiv.org/abs/2403.10516
- [4] H. Huang et al., "LoftUp: A Coordinate-Based Feature Upsampler for Vision Foundation Models" (2025), ICCV, https://arxiv.org/abs/2504.14032
- [5] I. Arganda-Carreras et al., "Trainable Weka Segmentation: a machine learning tool for microscopy pixel classification" (2017), Bioinformatics, https://academic.oup.com/bioinformatics/article/33/15/2424/3092362
- [6] J. Yang et al., "Denoising Vision Transformers" (2024), ECCV, https://arxiv.org/abs/2401.02957