
Stabilising video depth #6


Open
calledit opened this issue Apr 9, 2025 · 6 comments

Comments

@calledit

calledit commented Apr 9, 2025

Hi! To start, really great work with UniK3D!

I am trying to generate depth video and it works great, but I have issues with frame-to-frame stability. There is quite a lot of variation (flicker) between frames even when the real depth is more or less static. The issue is also visible in the UniK3D README banner.

Would it be possible to add some way to stabilise the output in the next release of the model?

One way could be to implement the option of having a low-resolution depth map and a mask as an extra input on top of the RGB input; 32x32 resolution would probably be plenty, and even 16x16 might be enough.

For video you could then take the depth output of the last frame and feed it in as a prompt together with a mask for dynamic areas (a mask for the parts of the input depth map that should be ignored due to unknown movements).

If the camera rotates you could then use basic external camera tracking to cancel out the rotation in the depth prompt and mask, so the model would not need to care about that type of stuff.

This would also work great for a use case where you know the distance to a single thing in the frame. You could mask everything except that thing, specify its depth, and the network would be able to work off that known depth. Say you knew the middle pixel was 25.4 meters away; such a feature would let you help the model out.
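
(A minimal sketch of the prompt format described above, assuming PyTorch tensors; the function, shapes and channel layout are hypothetical and not UniK3D's actual interface.)

```python
import torch
import torch.nn.functional as F

def build_prompted_input(rgb, prev_depth=None, valid_mask=None, prompt_res=32):
    """rgb: (B, 3, H, W); prev_depth and valid_mask: (B, 1, H, W) or None."""
    B, _, H, W = rgb.shape
    if prev_depth is None:
        # No prompt available (e.g. the first frame): feed zeros and an all-zero mask.
        depth_prompt = rgb.new_zeros(B, 1, H, W)
        mask_prompt = rgb.new_zeros(B, 1, H, W)
    else:
        # Low-resolution prompt: 32x32 (or even 16x16) to anchor the scale.
        d_small = F.interpolate(prev_depth, size=(prompt_res, prompt_res),
                                mode="bilinear", align_corners=False)
        m_small = F.interpolate(valid_mask.float(), size=(prompt_res, prompt_res),
                                mode="nearest")
        depth_prompt = F.interpolate(d_small, size=(H, W), mode="bilinear",
                                     align_corners=False)
        mask_prompt = F.interpolate(m_small, size=(H, W), mode="nearest")
    # 5-channel input: RGB + depth prompt + mask. The network (or at least the
    # decoder) would need retraining to accept the extra channels.
    return torch.cat([rgb, depth_prompt, mask_prompt], dim=1)
```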

Does this sound like a reasonable idea? I understand this will require retraining the network (or at least the decoder), unless one wants to add the depth input into the encoder too.

@lpiccinelli-eth
Owner

Hey, thanks for your explanation and suggestion!
What you suggest makes total sense, and we have an ongoing project targeting exactly this use case, e.g. with previous depth and/or flow as additional information to stabilise depth 😅, and it actually stabilises depth really well.
I will try to release it soon and add the news once it is released!

@Dr0mp

Dr0mp commented Apr 28, 2025

Wouldn't an average between the past "frame" and the current "frame" have a stabilising effect on the points' positions, maybe with 1 to 4 buffered frames?

@calledit
Author

calledit commented Apr 28, 2025

Wouldn't an average between the past "frame" and the current "frame" have a stabilising effect on the points' positions, maybe with 1 to 4 buffered frames?

Sure it would, but it would only be usable if your scene is completely static,
or if you know which parts of the frame are static and mask everything that is not static. But that would still leave the dynamic parts of the image "unstable".

The point of using the model for this is that it will be able to stabilise the dynamic parts of the frame using "knowledge" about distances in the static parts of the frame.

Static meaning "background" or specifically things that don't move in the video.

Dynamic meaning stuff that does move in the video, like people, cars, animals and so on.
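
(For reference, a minimal sketch of the buffered-averaging idea with a static-region mask, assuming per-frame depth maps as NumPy arrays; the class and its interface are hypothetical, not part of UniK3D or the toolbox.)

```python
from collections import deque
import numpy as np

class DepthSmoother:
    """Average the last few depth frames, but only where the scene is static."""

    def __init__(self, n_frames=4):
        self.buffer = deque(maxlen=n_frames)  # 1 to 4 buffered frames

    def __call__(self, depth, static_mask):
        """depth: (H, W) float array; static_mask: (H, W) bool, True where static."""
        self.buffer.append(depth)
        averaged = np.mean(np.stack(self.buffer), axis=0)
        # Static regions get the temporal average; dynamic regions keep the
        # current frame, so they stay responsive but remain "unstable".
        return np.where(static_mask, averaged, depth)
```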

@Dr0mp

Dr0mp commented Apr 28, 2025

This would assume a static camera?

Oh, I think I understand your approach: mask the dynamic parts (based on a noise threshold to avoid false positives) and add them over the static, so the static remains fixed while the dynamic is continuously updated. Did I get that right?

And for pivoting the camera we would need some external tracking data (either generated or from camera sensors)?

While my approach would just settle down the jiggling overall, I think these could work together.
I am curious, because I work more with visual coding than Python, and these techniques, as long as they are coded in pixels, can be done on the fly in TouchDesigner at a prototyping level.

@calledit
Author

Oh, I think I understand your approach: mask the dynamic parts (based on a noise threshold to avoid false positives) and add them over the static, so the static remains fixed while the dynamic is continuously updated. Did I get that right?

Kind of like that. You would need to track the camera movement and account for its movement by subtracting it from the static depth. But tracking the camera is easy and can be done pretty much flawlessly these days, using some AI-powered tracker like mega-sam.
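
(A rough illustration of that rotation compensation, not code from the thread: assuming you have the relative rotation R from an external tracker, the camera intrinsics K, and the previous frame's per-ray distance map, a pure rotation leaves each ray's distance unchanged, so the prompt can be warped with the homography K R K^-1. The function and conventions below are assumptions for illustration.)

```python
import cv2
import numpy as np

def warp_distance_prompt(prev_distance, K, R):
    """prev_distance: (H, W) per-ray distance map from the previous frame.
    K: (3, 3) camera intrinsics.
    R: (3, 3) rotation taking previous-camera coordinates to current-camera coordinates."""
    h, w = prev_distance.shape
    # Pixels move under pure rotation according to the homography K @ R @ K^-1;
    # the distance values themselves are unchanged, so we only re-sample them.
    H = (K @ R @ np.linalg.inv(K)).astype(np.float64)
    warped = cv2.warpPerspective(prev_distance.astype(np.float32), H, (w, h),
                                 flags=cv2.INTER_LINEAR, borderValue=0.0)
    # Pixels that rotated out of view have no prompt; mark them invalid.
    valid = cv2.warpPerspective(np.ones((h, w), np.float32), H, (w, h),
                                flags=cv2.INTER_NEAREST) > 0.5
    return warped, valid
```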

But I don't think trying this is worth it without underlying ML model support; it will never look good enough to be usable. It would be better to use one of the less accurate video models that already exist. Even if they are "less accurate", their truly stable output will still beat any post-processing you can apply to the frames to stabilise them.

@calledit
Author

If the camera rotates you could then use basic external camera tracking to cancel out the rotation in the depth prompt and mask, so the model would not need to care about that type of stuff.

Thought I could add some info on that.

To cancel out basic camera rotation you could use the tracking tools in metric_depth_video_toolbox that are based on cotracker3 to do this:

python track_points_in_video.py --color_video ~/input.mp4 --nr_iterations 4 --steps_bewtwen_track_init 30

The result is a tracking file called ~/input.mp4_tracking.json which contains a list of tracked points for the video.
To cancel out camera rotation you simply need to move each frame in the opposite direction of the average point movement for that frame. If the scene contains lots of movement you might want to mask that out (for many scenes that probably won't be needed, but should it be needed the metric DVT contains tools for that too).
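
(A rough sketch of computing that per-frame correction from the tracking file. The JSON schema is assumed here, namely a list of tracks where each track is a list of [frame, x, y] entries; check the toolbox's actual output format before relying on this.)

```python
import json
import numpy as np

def per_frame_correction(tracking_file):
    """Return, per frame index, the shift that cancels the average point motion."""
    with open(tracking_file) as f:
        tracks = json.load(f)  # assumed: list of tracks of [frame, x, y] entries
    motion = {}
    for track in tracks:
        pts = {int(frame): np.array([x, y], dtype=float) for frame, x, y in track}
        for frame in pts:
            if frame - 1 in pts:
                motion.setdefault(frame, []).append(pts[frame] - pts[frame - 1])
    # Moving each frame opposite to the mean tracked-point motion cancels the
    # apparent rotation (assuming the scene is mostly static or masked).
    return {f: -np.mean(d, axis=0) for f, d in sorted(motion.items())}
```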
