Stabilising video depth #6
Comments
Hey, thanks for your explanation and suggestion!
Wouldn't an average between the past "frame" and the current "frame" have a stabilising effect on the points' positions, maybe using 1 to 4 buffered frames?
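Roughly what I mean, as a minimal sketch (class name and buffer size are just for illustration):

```python
import numpy as np
from collections import deque

# Just a sketch of the buffered-frame idea (names are made up):
# keep the last N depth maps and return their per-pixel mean.
class DepthAverager:
    def __init__(self, buffer_size: int = 4):
        self.buffer = deque(maxlen=buffer_size)  # holds the last 1-4 depth maps

    def push(self, depth: np.ndarray) -> np.ndarray:
        self.buffer.append(depth)
        # Per-pixel mean over the buffered frames smooths frame-to-frame flicker
        return np.mean(np.stack(self.buffer, axis=0), axis=0)
```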
Sure it would, but it would only be usable if your scene is completely static. The point of using the model for this is that it will be able to stabilise the dynamic parts of the frame using "knowledge" about distances in the static parts of the frame. Static meaning "background", or specifically things that don't move in the video; dynamic meaning stuff that does move in the video, like people, cars, animals and so on.
This would assume a static camera? Oh, I think I understand your approach: mask the dynamic parts (based on a noise threshold to avoid false positives) and composite them over the static ones, so the static depth remains fixed while the dynamic depth is continuously updated. Did I get that right? And for pivoting the camera we would need some external tracking data (either generated or from camera sensors)? While my approach would just settle down the jiggling overall, I think the two could work together.
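Something along these lines, as a rough sketch (threshold value and names purely illustrative):

```python
import numpy as np

# Sketch of the static/dynamic split: pixels whose depth deviates from the
# static estimate by more than a noise threshold are treated as dynamic and
# taken from the current frame; everything else keeps the fixed static depth.
def composite_depth(static_depth: np.ndarray,
                    current_depth: np.ndarray,
                    noise_threshold: float = 0.05):
    dynamic_mask = np.abs(current_depth - static_depth) > noise_threshold
    fused = np.where(dynamic_mask, current_depth, static_depth)
    return fused, dynamic_mask
```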
Kind of like that; you would need to track the camera movement and account for it by subtracting it from the static depth. Tracking the camera is easy and can be done pretty much flawlessly these days using some AI-powered tracker like mega-sam. But I don't think trying this is worth it without underlying ML model support; it will never look good enough to be usable. It would be better to use one of the less accurate video models that already exist. Even if they are "less accurate", their truly stable output will still beat any post-processing you can apply to the frames to stabilise them.
Thought I could add some info on that. To cancel out basic camera rotation you could use the tracking tools in metric_depth_video_toolbox that are based on cotracker3:

```
python track_points_in_video.py --color_video ~/input.mp4 --nr_iterations 4 --steps_bewtwen_track_init 30
```

The result is a tracking file called ~/input.mp4_tracking.json which contains a list of tracked points for the video.
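From there, one could for instance match tracked points between consecutive frames and fit a homography with OpenCV to approximate the camera rotation. Note this is only a sketch: the JSON layout assumed here (a list of per-frame [point_id, x, y] entries) is a guess, so check metric_depth_video_toolbox for the actual format.

```python
import json
import numpy as np
import cv2

# Sketch only: the assumed tracking-file layout (a list of per-frame
# [point_id, x, y] entries) is a guess; check the toolbox for the real format.
with open("input.mp4_tracking.json") as f:
    tracks = json.load(f)

def frame_to_frame_homography(frame_a, frame_b):
    # Match points by id between two consecutive frames and fit a homography,
    # which approximates the camera rotation for distant/static points.
    a = {p[0]: (p[1], p[2]) for p in frame_a}
    b = {p[0]: (p[1], p[2]) for p in frame_b}
    common = sorted(set(a) & set(b))
    src = np.float32([a[i] for i in common])
    dst = np.float32([b[i] for i in common])
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    return H

H = frame_to_frame_homography(tracks[0], tracks[1])
```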
Hi! To start, really great work with UniK3D!
I am trying to generate depth video here and it works great, but I have issues with frame-to-frame stability. There is quite a lot of variation (flicker) between frames even when the real depth is more or less static. The issue is also visible in the UniK3D README banner.
Would it be possible to add some way to stabilise the output in the next release of the model?
One way could be to implement the option of having a low-resolution depth map and a mask as an extra input on top of the RGB input; 32x32 resolution would probably be plenty, and even 16x16 would probably be enough.
For video you could then take the depth output of the last frame and feed it in as a prompt together with a mask for dynamic areas (a mask for parts of the input depth map that should be ignored due to unknown movements).
If the camera rotates you can then use external basic camera tracking to cancel out the rotation in the depth prompt and mask, so the model would not need to care about that type of thing.
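To make the request concrete, this is roughly the per-frame prompt I have in mind. The prompt size and masking convention are just illustrative, and a model API that accepts such a prompt does not exist yet; it is what is being asked for:

```python
import numpy as np
import cv2

# Illustrative only: downsample the previous frame's depth output to a small
# prompt (e.g. 32x32) and pair it with a mask that zeroes out dynamic/unknown
# regions. A UniK3D API that accepts such a prompt does not exist; it is the
# feature being requested here.
def build_depth_prompt(prev_depth: np.ndarray,
                       dynamic_mask: np.ndarray,
                       prompt_size: int = 32):
    prompt = cv2.resize(prev_depth, (prompt_size, prompt_size),
                        interpolation=cv2.INTER_AREA)
    # mask = 1 means "trust this depth", 0 means "ignore (dynamic/unknown)"
    keep = (~dynamic_mask).astype(np.float32)
    mask = cv2.resize(keep, (prompt_size, prompt_size),
                      interpolation=cv2.INTER_AREA)
    return prompt, (mask > 0.5).astype(np.float32)
```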
This would also work great for a use case where you know the distance to a single thing in the frame. Then you can simply mask everything except that thing and specify its depth, and the network would be able to work off that known depth. Say one knew the middle pixel was 25.4 meters away; such a feature would allow one to help the model out.
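For that single known-distance case, the prompt would degenerate to something like this (again purely illustrative):

```python
import numpy as np

# Purely illustrative: one known depth value at the centre of a 32x32 prompt,
# everything else masked out.
prompt = np.zeros((32, 32), dtype=np.float32)
mask = np.zeros((32, 32), dtype=np.float32)
prompt[16, 16] = 25.4  # known distance to the middle pixel, in meters
mask[16, 16] = 1.0     # only this value should be trusted
```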
Does this sound like a reasonable idea? I understand this will require retraining the network (or at least the decoder), unless one wants to add the depth input into the encoder too.