Skip to content

Add permitted display characters check for post titles #5692

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

SleeplessOne1917
Copy link
Member

Fixes #5691

@@ -152,7 +152,8 @@ pub fn is_valid_matrix_id(matrix_id: &str) -> LemmyResult<()> {

pub fn is_valid_post_title(title: &str) -> LemmyResult<()> {
let length = title.trim().chars().count();
let check = (3..=200).contains(&length) && !has_newline(title);
let check =
(3..=200).contains(&length) && !has_newline(title) && has_3_permitted_display_chars(title);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curiously, has_3_premitted_display_chars still allows users to have invisible characters, but enforces that there needs to be at least 3 visible characters. It's currently used for validating display names.

Copy link
Contributor

@hackerncoder hackerncoder May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trans flag is two unicode emojies (4 chars) in a u200d trench coat1, so if it completely disallowed invis chars like 200d, you would be unable to have that. There may be other stuff like this.

1 if it doesn't link to the technical information page, I recommend clicking on it.

Copy link
Contributor

@hackerncoder hackerncoder May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I just realised that uFE0F is actually an invis char, and it isn't part of the FORBIDDEN_DISPLAY_CHARS const. I checked the website that the FORBIDDEN_DISPLAY_CHARS was taken from and realised that it is significantly longer than FORBIDDEN_DISPLAY_CHARS. Someone may wish to copy the other chars into FORBIDDEN_DISPLAY_CHARS? As it stands it is possible to circumvent has_3_permitted_display_chars.

Copy link
Member

@Nutomic Nutomic May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #3888 for context about has_3_premitted_display_chars and also see the docs for str.chars(). It seems we would need something like unicode-segmentation to iterate over grapheme clusters and filter invisible ones, so that combined symbols like flags are not affected. Ideally there would be a library to check for this, but the only one I found (invisible_unicode) seems to have the same problem and is not actively maintained. Might be worth creating a new library for this.

Copy link
Member

@dessalines dessalines May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the json list of characters if someone wants to do this. Would be good to just automate a pull from that repo or use it as a submodule: https://github.com/flopp/invisible-characters/blob/main/characters.json .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Unicode U+200E, U+200F, and potentially other invisible characters are allowed as post titles
4 participants