fix(previews): avoid large file downloads for remote movie storage #52079

Contributor

@printminion-co printminion-co commented Apr 9, 2025

Summary

  • On an NC instance configured with remote storage such as S3:
    • A user uploads a 3.7 GB video file.
    • While the file uploads, it is also stored to S3.
    • After the upload is complete, when the user navigates to the folder containing the newly uploaded file, the thumbnail generation logic starts.
  • The current logic downloads the entire 3.7 GB file from S3 to create the thumbnail (as observed in the network traffic screenshot).

[Screenshot: Selection_20250408-002]

Error produced by ffmpeg

Movie preview generation failed Output: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 13.2.1 (Alpine 13.2.1_git20240309) 20240309
  configuration: --prefix=/usr --disable-librtmp --disable-lzma --disable-static --disable-stripping --enable-avfilter --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libmp3lame --enable-libopenmpt --enable-libopus --enable-libplacebo --enable-libpulse --enable-librav1e --enable-librist --enable-libsoxr --enable-libsrt --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-lto=auto --enable-lv2 --enable-openssl --enable-pic --enable-postproc --enable-pthreads --enable-shared --enable-vaapi --enable-vdpau --enable-version3 --enable-vulkan --optflags=-O3 --enable-libjxl --enable-libsvtav1 --enable-libvpl
  libavutil      58. 29.100 / 58. 29.100
  libavcodec     60. 31.102 / 60. 31.102
  libavformat    60. 16.100 / 60. 16.100
  libavdevice    60.  3.100 / 60.  3.100
  libavfilter     9. 12.100 /  9. 12.100
  libswscale      7.  5.100 /  7.  5.100
  libswresample   4. 12.100 /  4. 12.100
  libpostproc    57.  3.100 / 57.  3.100
  [mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f53f2e9c600] moov atom not found
[in#0 @ 0x7f53f2f928c0] Error opening input: Invalid data found when processing input
Error opening input file /tmp/oc_tmp_POfcbD.
Error opening input files: Invalid data found when processing input

Proposed Solution

Prevent downloading entire movie files from remote storage (e.g., S3) when the 'moov atom' is located at the end of the file.

Test

Can be tested (locally with Minio) with video files larger than 5 MB that have the "moov atom" at the end of the file.

Download a sample video whose "moov atom" is not at the beginning of the file:
wget https://www.sample-videos.com/video321/mp4/720/big_buck_bunny_720p_10mb.mp4

  • Upload the file before the proposed change. Observe: a thumbnail exists. This happens because, after the "Movie preview generation failed Output" error is logged, the whole file is downloaded from S3.
  • Upload the file with the proposed change.
    • Observe: no thumbnail is generated.
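
To check where the moov atom sits in a test file before uploading it, a minimal sketch along these lines may help. It walks the top-level boxes of an MP4/MOV file (a 4-byte big-endian size followed by a 4-byte type) and reports whether `moov` appears before `mdat`. The names `iter_top_level_boxes`, `moov_before_mdat`, and `make_box` are hypothetical helpers written for this illustration; they are not part of Nextcloud or ffmpeg.

```python
import io
import struct
from typing import BinaryIO, Iterator, Optional, Tuple


def iter_top_level_boxes(stream: BinaryIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level MP4/MOV box."""
    offset = 0
    while True:
        stream.seek(offset)
        header = stream.read(8)
        if len(header) < 8:
            return  # end of file (or truncated header)
        size, box_type = struct.unpack(">I4s", header)
        min_size = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the 8-byte header.
            ext = stream.read(8)
            if len(ext) < 8:
                return
            size = struct.unpack(">Q", ext)[0]
            min_size = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = stream.seek(0, io.SEEK_END) - offset
        if size < min_size:
            return  # malformed box, stop rather than loop forever
        yield box_type.decode("latin-1"), offset, size
        offset += size


def moov_before_mdat(stream: BinaryIO) -> Optional[bool]:
    """True if moov precedes mdat, False if mdat comes first, None if neither is found."""
    for box_type, _offset, _size in iter_top_level_boxes(stream):
        if box_type == "moov":
            return True
        if box_type == "mdat":
            return False
    return None


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a synthetic box for demonstration purposes only."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


if __name__ == "__main__":
    # Synthetic layout with the moov atom at the end, as in the test file above.
    moov_last = make_box(b"ftyp", b"isom") + make_box(b"mdat", b"y" * 20) + make_box(b"moov", b"x" * 10)
    print(moov_before_mdat(io.BytesIO(moov_last)))  # False
```

For a real file, passing `open(path, "rb")` instead of the `io.BytesIO` object should work the same way, since only `seek` and `read` are used.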

Contributor

@come-nc come-nc left a comment


Would it be possible to get only the end of the file?

Contributor

@artonge artonge left a comment


@printminion-co I was not able to reproduce the full download locally. Is there anything specific about your setup? Maybe it is the S3 bucket?

Could it be that your S3 bucket does not support partially seeking the file? Not sure exactly what would be missing. Maybe @juliusknorr or @icewind1991 know better?

Also, I assume that in your testing, the full download was triggered by the null option, meaning that the 5242880 was failing. Do you have any error message related to that?

My goal would be to try to fix the partial download instead of dropping the functionality.

@juliusknorr
Member

Also, I assume that in your testing, the full download was triggered by the null option, meaning that the 5242880 was failing. Do you have any error message related to that?

I guess that is what @printminion-co is describing: with this specific video file, the relevant bits are at the end, so it always requires the full read.

I'm not sure whether it may also depend on other factors, like the ffmpeg version used, whether a preview can be generated without the moov atom.

From my POV it would be acceptable to not have previews in those cases, though from a user perspective it may be unexpected to see some previews of videos, but not all.
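
The fallback being discussed can be sketched roughly as follows. This is an illustration of the behaviour as described in this thread (try the first 5242880 bytes, then fall back to the whole file, and skip that fallback for remote storage), not the actual provider code. The names `read_size_attempts`, `generate_preview`, and the `try_thumbnail` callback are assumptions made for the example.

```python
from typing import Callable, List, Optional

FIRST_CHUNK_BYTES = 5242880  # the partial-read size mentioned above (5 MiB)


def read_size_attempts(is_local_storage: bool) -> List[Optional[int]]:
    """Return the read sizes to try, in order. None stands for 'read the whole file'."""
    if is_local_storage:
        return [FIRST_CHUNK_BYTES, None]
    # Assumption for this sketch: on remote storage the full-file fallback is skipped.
    return [FIRST_CHUNK_BYTES]


def generate_preview(
    is_local_storage: bool,
    try_thumbnail: Callable[[Optional[int]], Optional[bytes]],
) -> Optional[bytes]:
    """Call try_thumbnail with each read size until one attempt yields a thumbnail."""
    for max_bytes in read_size_attempts(is_local_storage):
        thumbnail = try_thumbnail(max_bytes)
        if thumbnail is not None:
            return thumbnail
    return None  # no preview, e.g. moov atom beyond the first chunk on remote storage


if __name__ == "__main__":
    # Simulate a file whose moov atom is at the end: only a full read succeeds.
    def needs_full_read(max_bytes: Optional[int]) -> Optional[bytes]:
        return b"thumbnail" if max_bytes is None else None

    print(generate_preview(True, needs_full_read))   # b'thumbnail'
    print(generate_preview(False, needs_full_read))  # None
```

The trade-off is the one raised above: remote files whose moov atom lies beyond the first chunk end up without a preview, in exchange for never pulling the whole file.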

@skjnldsv
Member

Ha, we had a ticket on that topic literally a few days ago.
yeah, that's the issue with videos, not much we can do tbh.
We could stop after X bytes, that would work sure.

Some files are also more error-prone: MOV files are terribly optimized for streaming for that reason, as the moov atom section is almost never near the start of the file 😢

Contributor

@artonge artonge left a comment


Just to make it easier to read.

@printminion-co printminion-co force-pushed the fix/s3_traffic_on_video_thumbnails branch 2 times, most recently from 9fdd39c to 42b86e1 Compare April 22, 2025 11:31
@printminion-co printminion-co requested a review from artonge April 22, 2025 11:36
@printminion-co printminion-co force-pushed the fix/s3_traffic_on_video_thumbnails branch from 42b86e1 to 68b8d8b Compare April 22, 2025 14:59
@printminion-co printminion-co requested a review from artonge April 22, 2025 15:00
@printminion-co
Contributor Author

@printminion-co I was not able to reproduce the full download locally. Is there anything specific about your setup? Maybe it is the S3 bucket?

Could it be that your S3 bucket does not support partially seeking the file? Not sure exactly what would be missing. Maybe @juliusknorr or @icewind1991 know better?

Also, I assume that in your testing, the full download was triggered by the null option, meaning that the 5242880 was failing. Do you have any error message related to that?

My goal would be to try to fix the partial download instead of dropping the functionality.

The functionality will be dropped for files larger than 5 MB on which the ffmpeg thumbnail generator hits the following error (added to the description).

Movie preview generation failed Output: ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 13.2.1 (Alpine 13.2.1_git20240309) 20240309
  configuration: --prefix=/usr --disable-librtmp --disable-lzma --disable-static --disable-stripping --enable-avfilter --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libmp3lame --enable-libopenmpt --enable-libopus --enable-libplacebo --enable-libpulse --enable-librav1e --enable-librist --enable-libsoxr --enable-libsrt --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-lto=auto --enable-lv2 --enable-openssl --enable-pic --enable-postproc --enable-pthreads --enable-shared --enable-vaapi --enable-vdpau --enable-version3 --enable-vulkan --optflags=-O3 --enable-libjxl --enable-libsvtav1 --enable-libvpl
  libavutil      58. 29.100 / 58. 29.100
  libavcodec     60. 31.102 / 60. 31.102
  libavformat    60. 16.100 / 60. 16.100
  libavdevice    60.  3.100 / 60.  3.100
  libavfilter     9. 12.100 /  9. 12.100
  libswscale      7.  5.100 /  7.  5.100
  libswresample   4. 12.100 /  4. 12.100
  libpostproc    57.  3.100 / 57.  3.100
  [mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f53f2e9c600] moov atom not found
[in#0 @ 0x7f53f2f928c0] Error opening input: Invalid data found when processing input
Error opening input file /tmp/oc_tmp_POfcbD.
Error opening input files: Invalid data found when processing input

NC admins with an S3 backend could probably later define, via config, their own upper re-download limit that they are OK with (e.g. 50 MB).

Prevent downloading entire movie files from remote storage (e.g., S3)
when the 'moov atom' is located at the end of the file.

Signed-off-by: Misha M.-Kupriyanov <kupriyanov@strato.de>
@printminion-co printminion-co force-pushed the fix/s3_traffic_on_video_thumbnails branch from 68b8d8b to 4a924bf Compare April 23, 2025 08:04
Contributor

Hello there,
Thank you so much for taking the time and effort to create a pull request to our Nextcloud project.

We hope that the review process is going smooth and is helpful for you. We want to ensure your pull request is reviewed to your satisfaction. If you have a moment, our community management team would very much appreciate your feedback on your experience with this PR review process.

Your feedback is valuable to us as we continuously strive to improve our community developer experience. Please take a moment to complete our short survey by clicking on the following link: https://cloud.nextcloud.com/apps/forms/s/i9Ago4EQRZ7TWxjfmeEpPkf6

Thank you for contributing to Nextcloud and we hope to hear from you soon!

(If you believe you should not receive this message, you can add yourself to the blocklist.)

@AndyScherzinger AndyScherzinger added this to the Nextcloud 32 milestone Apr 24, 2025
@AndyScherzinger AndyScherzinger merged commit 34949e4 into nextcloud:master Apr 24, 2025
182 of 198 checks passed
@artonge
Contributor

artonge commented Apr 24, 2025

/backport to stable31

@invario
Contributor

invario commented Apr 26, 2025

Apologies for being late to the discussion, but to my untrained eye, it seems this would affect all non-local shares, including SMB? If that is the case, that is unfortunate, as I have a library of videos larger than 5MB that is mounted as external storage in my NC instance. Previews have already been pre-generated for all of them so I won't notice any difference personally, but although these files are technically stored "remote", they are actually on the same server (for reasons I won't get into) so the connection speed is very high. Even if these were mounted remote but on a LAN, larger files than 5MB could still be accessed fairly quickly.

I've been racking my brain all night to see if there's a better way to deal with the problem of having to download the entire file but thus far my research is indicating it's not possible.

ffprobe/ffmpeg supposedly have methods of accessing S3 that allow seeking to the end to fetch the moov atom without requesting the entire file, but NC's temp file mechanism sits in the middle and would not allow this to work.

For testing purposes, I tried brute-force trimming the front off a video file, leaving only the last 500 KB or so (which should be plenty to cover the moov atom), but this results in ffmpeg/ffprobe failing with a "moov atom not found" error.
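
The ranged reads discussed in this thread correspond to standard HTTP `Range` request headers, which S3-compatible object stores generally support on GET requests. The sketch below only builds the header values for "first N bytes" and "last N bytes". The helper names are hypothetical and nothing here is taken from Nextcloud's storage code.

```python
from typing import Dict


def head_range(num_bytes: int) -> Dict[str, str]:
    """Header requesting the first num_bytes bytes (byte positions are inclusive)."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    return {"Range": f"bytes=0-{num_bytes - 1}"}


def tail_range(num_bytes: int) -> Dict[str, str]:
    """Header requesting the last num_bytes bytes (a suffix byte range)."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    return {"Range": f"bytes=-{num_bytes}"}


if __name__ == "__main__":
    print(head_range(5242880))  # {'Range': 'bytes=0-5242879'}
    print(tail_range(500_000))  # {'Range': 'bytes=-500000'}
```

Fetching the tail alone is unlikely to be enough on its own. A file consisting only of the last bytes probably no longer begins on a box boundary, so a demuxer reading from offset 0 would not locate the moov atom, which would be consistent with the failure described above.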

@invario
Contributor

invario commented Apr 26, 2025

For what it's worth, I came across the following two threads during my research:
https://stackoverflow.com/questions/12620631/ffmpegphp-get-thumbnail-from-external-url

And:
PHP-FFMpeg/PHP-FFMpeg#167

Is there a reason why NC preview generation is written to call the ffmpeg binary directly as opposed to using PHP-FFMpeg?

Not that this would solve the issue... But supposedly passing a signed URI directly would work

@kesselb
Contributor

kesselb commented May 2, 2025

opposed to using PHP-FFMpeg?

PHP-FFMpeg also calls the binary directly (through binarydriver and symfony process)

@invario
Contributor

invario commented Jun 19, 2025

Still digging around and thinking about how to implement this but buried in common.php is this:

	/**
	 * A custom storage implementation can return an url for direct download of a give file.
	 *
	 * For now the returned array can hold the parameter url - in future more attributes might follow.
	 */
	public function getDirectDownload(string $path): array|false {
		return [];
	}

...which would be exactly what's needed to pass directly to ffmpeg for an S3 bucket so that it could immediately seek to the end of the file to retrieve the moov atom. Unfortunately, it's just a placeholder and not implemented yet.
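
As a rough sketch of that idea: if a storage backend did return a direct URL, the preview code could hand it to ffmpeg as the input instead of a temp file. Everything below is an assumption for illustration. The dict with a `url` key mirrors the docblock above, `pick_ffmpeg_input` and `ffmpeg_frame_command` are hypothetical names, the example URL is made up, and whether a given ffmpeg build can read HTTPS inputs depends on how it was compiled. The command is only built here, not executed.

```python
from typing import Dict, List, Union


def pick_ffmpeg_input(direct_download: Union[Dict[str, str], bool], temp_path: str) -> str:
    """Prefer a direct download URL when the storage provides one, else use the temp file."""
    if isinstance(direct_download, dict) and direct_download.get("url"):
        return direct_download["url"]
    return temp_path


def ffmpeg_frame_command(source: str, output_path: str, second: int = 1,
                         binary: str = "ffmpeg") -> List[str]:
    """Build an ffmpeg argument list that grabs one frame as a JPEG."""
    return [
        binary,
        "-y",                # overwrite the output file if it exists
        "-ss", str(second),  # seek before decoding (placed before -i: input seeking)
        "-i", source,        # local path or URL
        "-an",               # ignore audio
        "-frames:v", "1",    # stop after one video frame
        "-f", "mjpeg",       # write the frame as a single JPEG image
        output_path,
    ]


if __name__ == "__main__":
    # Hypothetical presigned URL, for illustration only.
    source = pick_ffmpeg_input({"url": "https://s3.example.invalid/bucket/video.mp4?sig=abc"},
                               "/tmp/fallback.mp4")
    print(ffmpeg_frame_command(source, "/tmp/preview.jpg"))
```

With a URL input, ffmpeg can issue its own ranged requests and seek to wherever the moov atom is, which is the behaviour the temp-file approach prevents.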

@invario
Contributor

invario commented Jun 21, 2025

@printminion-co I would love it if you could give my PR a test and let me know how it works for you.

@printminion-co
Contributor Author

@printminion-co I would love it if you could give my PR a test and let me know how it works for you.

You probably mean #53634

@invario
Contributor

invario commented Jun 24, 2025

Yes, that's the one.
