enhance: improve media type recognition with HEAD or magic bytes #2599

Soxasora · 2025-10-11T17:53:06Z

Description

fixes #2341
fixes #1433
can address #850 for images

Currently we check whether a link is a media file by downloading it in full, and then we download it again when we want to render it.
This PR improves media type recognition for links by sending a HEAD request or, as a fallback, reading the first (magic) bytes with magic-bytes.js.
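
A minimal sketch of the HEAD-then-magic-bytes order described above, not the code in this PR. The names `sniffMime`, `classify` and `checkMedia` are hypothetical, and the small signature table only stands in for magic-bytes.js so the example is self-contained.

```javascript
// Hypothetical sketch: classify a link as image/video, cheapest check first.
// This table is a stand-in for magic-bytes.js and covers three image formats only.
const SIGNATURES = [
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47] },
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] }
]

// Return the MIME type whose signature matches the start of `bytes`, or null.
function sniffMime (bytes) {
  const hit = SIGNATURES.find(sig => sig.bytes.every((b, i) => bytes[i] === b))
  return hit ? hit.mime : null
}

// Normalize a Content-Type value and derive the two flags callers care about.
function classify (mime) {
  const type = (mime || '').split(';')[0].trim().toLowerCase()
  return {
    mime: type || null,
    isImage: type.startsWith('image/'),
    isVideo: type.startsWith('video/')
  }
}

async function checkMedia (url, { maxBytes = 4096, fetchImpl = fetch } = {}) {
  // 1. HEAD: cheap, but only as trustworthy as the server's Content-Type.
  try {
    const head = await fetchImpl(url, { method: 'HEAD' })
    const result = classify(head.headers.get('content-type'))
    if (head.ok && (result.isImage || result.isVideo)) return result
  } catch (err) {
    // fall through to the magic bytes check
  }
  // 2. Fallback: ask for the first bytes only and inspect them.
  const res = await fetchImpl(url, { headers: { Range: `bytes=0-${maxBytes - 1}` } })
  const bytes = new Uint8Array(await res.arrayBuffer())
  return classify(sniffMime(bytes))
}
```

A server may ignore the `Range` header and send the whole body, so a real implementation would also need to cut the read off after `maxBytes`.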

The client and the imgproxy worker can query an endpoint placed in the capture microservice (which avoids CORS) to find out whether they're dealing with an image or a video.

Also checks if a link has HTTP Basic Auth.
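
The description does not say how Basic Auth is detected, so the following is only a sketch of two plausible checks, both with hypothetical names: credentials embedded in the URL, and a `401` response carrying a `Basic` challenge. Note that `fetch()` rejects URLs with embedded credentials with a `TypeError`, which is one reason to detect them up front.

```javascript
// Hypothetical helpers; neither is taken from the PR.

// True when the link embeds credentials, e.g. https://user:pass@host/file.png
function urlHasCredentials (link) {
  try {
    const { username, password } = new URL(link)
    return username !== '' || password !== ''
  } catch {
    return false // not a parseable absolute URL
  }
}

// True when a response status and WWW-Authenticate header ask for Basic credentials.
function requiresBasicAuth (status, wwwAuthenticate) {
  return status === 401 && /(^|,)\s*basic\b/i.test(wwwAuthenticate || '')
}
```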

Screenshots

Loading an image/avif file from the browser to check and render media vs checking the magic bytes
Proof of concept

Screen.Recording.2025-10-11.at.17.19.03.mp4

Additional Context

The endpoint lives in the capture microservice, so the compose profile must include "capture".
We could add an extra fallback for when the capture instances go offline, if that ever happens.

We can get rid of the HEAD fetch if there's a chance it returns false information; it's just a cheap way to get the Content-Type.

Can address #850 for images

The first magic bytes of an image can also contain information about its dimensions, and we could use that to avoid render jumps before imgproxy takes over. This is not implemented in this PR.
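
To illustrate the idea (again, not part of this PR), here is a minimal sketch for PNG only, where the dimensions sit at fixed offsets in the IHDR chunk. The function name `pngDimensions` is hypothetical and other formats would need their own parsing.

```javascript
// Hypothetical sketch: read width/height from the PNG IHDR chunk.
// Layout: 8-byte signature, 4-byte chunk length, 4-byte type "IHDR",
// then width and height as big-endian 32-bit integers (offsets 16 and 20).
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]

// `bytes` is expected to be a Uint8Array holding the start of the file.
function pngDimensions (bytes) {
  if (bytes.length < 24) return null
  if (!PNG_SIGNATURE.every((b, i) => bytes[i] === b)) return null
  const type = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15])
  if (type !== 'IHDR') return null
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  return { width: view.getUint32(16), height: view.getUint32(20) }
}
```

Since only the first 24 bytes are needed, the same small read used for type detection would be enough.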

This job could also be done in-house, but we would have to deal with lots of magic numbers that way, so a popular and well-maintained library seemed like a better idea.

This doesn't get rid of the heuristics in the imgproxy worker, since we know for sure that they work, but they are definitely redundant now.

Checklist

Are your changes backward compatible? Please answer below:

For example, a change is not backward compatible if you removed a GraphQL field or dropped a database column.
Yes

On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
7, pretty good actually

For frontend changes: Tested on mobile, light and dark mode? Please answer below:
n/a

Did you introduce any new environment variables? If so, call them out explicitly here:
The following env vars have been introduced

MEDIA_CHECK_ROUTE=media
-- route for the capture micro service
MEDIA_CHECK_URL_DOCKER=http://capture:5678/media
-- url for imgproxy, communication between containers
NEXT_PUBLIC_MEDIA_CHECK_URL=http://localhost:5678/media
-- url for client-side fetches, e.g. media-or-link.js

The last one has been introduced in .env.production too but I don't think that file is even used

Did you use AI for this? If so, how much did it assist you?
The readFirstBytes function is partially vibed: some things about reading a small chunk with a Reader weren't clear to me at the time, so it helped.
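
For context, a minimal sketch of what reading a small chunk with a stream reader can look like. It reuses the name `readFirstBytes` for readability, but the body is an assumption and not the function from this PR.

```javascript
// Hypothetical sketch: read at most `limit` bytes from a ReadableStream body,
// then cancel the stream so the rest of the file is never downloaded.
async function readFirstBytes (body, limit) {
  const reader = body.getReader()
  const out = new Uint8Array(limit)
  let filled = 0
  try {
    while (filled < limit) {
      const { done, value } = await reader.read()
      if (done) break
      // A chunk can be larger than what is still needed, so copy only a slice.
      const take = Math.min(value.length, limit - filled)
      out.set(value.subarray(0, take), filled)
      filled += take
    }
  } finally {
    // Stop the download; ignore errors from an already finished stream.
    await reader.cancel().catch(() => {})
  }
  return out.subarray(0, filled)
}
```

It would be called with a fetch response body, e.g. `readFirstBytes(res.body, 4096)`.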

Note

Adds a capture service endpoint to detect image/video via HEAD or magic-bytes and wires it into the frontend and imgproxy worker with new env vars.

Capture service:
- Add media-check endpoint (capture/media-check.js) using HEAD and magic bytes (magic-bytes.js) with timeout/byte limits and basic-auth handling.
- Wire route in capture/index.js at /${MEDIA_CHECK_ROUTE}/:url.
- Add dependency magic-bytes.js.
Worker (worker/imgproxy.js):
- Replace ad-hoc HEAD/GET detection with call to MEDIA_CHECK_URL endpoint; cache result.
Frontend (components/media-or-link.js):
- Replace video/img probe hack with fetch to PUBLIC_MEDIA_CHECK_URL to set isImage/isVideo.
Config/Env:
- New envs: MEDIA_CHECK_ROUTE, MEDIA_CHECK_URL_DOCKER, NEXT_PUBLIC_MEDIA_CHECK_URL (added to .env.development and .env.production).
- Expose process.env.NEXT_PUBLIC_MEDIA_CHECK_URL via next.config.js DefinePlugin; export PUBLIC_MEDIA_CHECK_URL in lib/constants.js.
Docs:
- Update README.md example to include capture in COMPOSE_PROFILES.

Written by Cursor Bugbot for commit 8c3e6d1. This will update automatically on new commits.

socket-security · 2025-10-11T17:53:40Z

Review the following changes in direct dependencies.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	magic-bytes.js@1.12.1

cursor · 2025-10-12T16:38:08Z

worker/imgproxy.js

 const IMGPROXY_URL = process.env.IMGPROXY_URL_DOCKER || process.env.NEXT_PUBLIC_IMGPROXY_URL
 const IMGPROXY_SALT = process.env.IMGPROXY_SALT
 const IMGPROXY_KEY = process.env.IMGPROXY_KEY
+const MEDIA_CHECK_URL = process.env.MEDIA_CHECK_URL_DOCKER || process.env.NEXT_PUBLIC_MEDIA_CHECK_URL


Bug: Media Type URL Fetch Fails Without Env Vars

The new media type checking mechanism relies on environment variables for its URL. If these are unset, fetch calls are made to invalid undefined/... URLs, causing TypeError or fetch failures. This prevents media from being correctly identified, regressing from the previous self-contained implementation.

Additional Locations (2)

components/media-or-link.js#L137-L138

worker/imgproxy.js#L148-L149

Well... yeah, everything would show up as links.
I was wondering whether a fallback to the previous system might be acceptable, considering the endpoint has been relocated to another service.
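
A minimal sketch of the guard discussed here, assuming the same env var precedence as in `worker/imgproxy.js`. The function name `mediaCheckUrl` and the URL-encoding of the link are assumptions, not taken from the PR.

```javascript
// Hypothetical sketch: build the media-check URL, or return null when no
// base URL is configured so the caller can fall back to the previous detection.
function mediaCheckUrl (link, env = process.env) {
  const base = env.MEDIA_CHECK_URL_DOCKER || env.NEXT_PUBLIC_MEDIA_CHECK_URL
  if (!base) return null // avoids fetching "undefined/..."
  // Assumption: the link is passed URL-encoded as the :url path parameter.
  return `${base.replace(/\/+$/, '')}/${encodeURIComponent(link)}`
}
```

A caller would then do something like `const url = mediaCheckUrl(link)` and only call the endpoint when `url` is not null.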

Soxasora added 2 commits October 11, 2025 19:20

enhance: improve media type recognition by fetching HEAD or reading its first (magic) bytes

60001dd

add origin protection, handle links behind basic auth

3d4e0ef

rollback export from createImgproxyPath

41f380f

Soxasora marked this pull request as ready for review October 11, 2025 18:15

This comment was marked as outdated.

Soxasora added 2 commits October 11, 2025 20:21

fix api return statements, protect url swap

c05e813

light cleanup

f6267a4

This comment was marked as outdated.

fix wrong fetch url light cleanup

fd80b8a

This comment was marked as outdated.

Soxasora marked this pull request as draft October 12, 2025 09:25

amend this

012af00

Soxasora mentioned this pull request Oct 12, 2025

support audio embeds #2533

Open

Soxasora added 2 commits October 12, 2025 18:14

do media checks with the capture microservice

d23456b

fix media check import

d6ea440

Soxasora marked this pull request as ready for review October 12, 2025 16:24

This comment was marked as outdated.

we don't need to abort on imgproxy

8c3e6d1

cursor bot reviewed Oct 12, 2025
