NaViT not really resolution agnostic? 🤔 #342

Sann5 · 2025-03-18T15:28:21Z


Hot take here 🔥. NaViT may handle images with varied aspect ratios, but it does not handle arbitrary resolutions: for that, inter/extrapolation of the positional embeddings is still needed. The fractional factorized positional embeddings (height and width) are initialized as learnable 1-D tables of fixed length, so if either dimension of the input image (measured in patches) exceeds that length, there will be an indexing error. Maybe I'm wrong, but this is how it looks to me from some publicly available implementations 🤷🏻‍♂️. Would love some input on this, it's driving me crazy 🤯.
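A minimal pure-Python sketch of the failure mode described above. It assumes an implementation that stores one fixed-length learnable table per axis and looks rows up by integer patch index. The table sizes and the helpers `factorized_pos_embed` and `resize_table` are hypothetical. Linearly interpolating a table to a new length is shown as one common workaround, not as what NaViT itself does.

```python
import random

# Hypothetical sizes, for illustration only.
MAX_PATCHES_PER_SIDE = 4  # fixed table length chosen at init time
DIM = 3                   # embedding width

random.seed(0)
# Factorized positional embeddings: one table per axis, one row per patch index.
pos_embed_h = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(MAX_PATCHES_PER_SIDE)]
pos_embed_w = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(MAX_PATCHES_PER_SIDE)]


def factorized_pos_embed(grid_h, grid_w, table_h, table_w):
    """Sum the row and column embeddings for every patch of a grid_h x grid_w grid."""
    out = []
    for i in range(grid_h):
        for j in range(grid_w):
            # Integer lookup: raises IndexError once i >= len(table_h) or j >= len(table_w).
            out.append([a + b for a, b in zip(table_h[i], table_w[j])])
    return out


def resize_table(table, new_len):
    """Linearly interpolate a per-axis table (one row per position) to new_len rows."""
    old_len = len(table)
    if new_len == 1 or old_len == 1:
        return [list(table[0]) for _ in range(new_len)]
    out = []
    for k in range(new_len):
        # Map new position k onto [0, old_len - 1] so both endpoint rows are kept exactly.
        x = k * (old_len - 1) / (new_len - 1)
        lo = int(x)
        hi = min(lo + 1, old_len - 1)
        t = x - lo
        out.append([(1 - t) * a + t * b for a, b in zip(table[lo], table[hi])])
    return out


# An image that is 6 patches tall and 3 patches wide overflows the 4-row height table.
try:
    factorized_pos_embed(6, 3, pos_embed_h, pos_embed_w)
except IndexError:
    print("IndexError: 6 patch rows but only 4 height embeddings")

# Interpolating the height table to 6 rows makes the same lookup succeed.
emb = factorized_pos_embed(6, 3, resize_table(pos_embed_h, 6), pos_embed_w)
print(len(emb), len(emb[0]))  # 18 3
```

Interpolation only stretches what was learned for the original grid size, so embeddings for the new positions are approximations rather than trained values.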

Answered by dempsey-ryan


dempsey-ryan · 2025-03-18T15:47:41Z


Agreed, it is not "truly" flexible to arbitrary image sizes. Image height and width still need to be multiples of the patch size. My strategy for this is zero-padding to multiples of the patch size, which seems to be a reasonable workaround, but if done on the fly (i.e. inside the torch `Dataset.__getitem__`) it can add some overhead.
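A minimal pure-Python sketch of the zero-padding workaround, assuming a single-channel image stored as a list of equal-length rows and padding applied only on the bottom and right edges. The helper names `pad_amounts` and `zero_pad_to_patch_multiple` are hypothetical; padding on other edges works equally well.

```python
def pad_amounts(height, width, patch_size):
    """Return (pad_h, pad_w): zeros to add at the bottom and right so both sides
    reach the next multiple of patch_size (0 if already a multiple)."""
    return -height % patch_size, -width % patch_size


def zero_pad_to_patch_multiple(image, patch_size):
    """Zero-pad a single-channel image (list of equal-length rows) on the bottom and right."""
    height, width = len(image), len(image[0])
    pad_h, pad_w = pad_amounts(height, width, patch_size)
    padded = [list(row) + [0] * pad_w for row in image]      # extend each row on the right
    padded += [[0] * (width + pad_w) for _ in range(pad_h)]  # append all-zero rows at the bottom
    return padded


# A 10 x 21 image with 8 x 8 patches is padded to 16 x 24, i.e. a 2 x 3 patch grid.
image = [[1] * 21 for _ in range(10)]
padded = zero_pad_to_patch_multiple(image, patch_size=8)
print(len(padded), len(padded[0]))            # 16 24
print(len(padded) // 8, len(padded[0]) // 8)  # 2 3
```

For a PyTorch tensor of shape (C, H, W), `torch.nn.functional.pad(img, (0, pad_w, 0, pad_h))` applies the same right/bottom zero padding, since the pad tuple lists the last dimension (width) first.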
