-
-
Notifications
You must be signed in to change notification settings - Fork 924
Add permitted display characters check for post titles #5692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -152,7 +152,8 @@ pub fn is_valid_matrix_id(matrix_id: &str) -> LemmyResult<()> { | |||
|
|||
pub fn is_valid_post_title(title: &str) -> LemmyResult<()> { | |||
let length = title.trim().chars().count(); | |||
let check = (3..=200).contains(&length) && !has_newline(title); | |||
let check = | |||
(3..=200).contains(&length) && !has_newline(title) && has_3_permitted_display_chars(title); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curiously, has_3_premitted_display_chars
still allows users to have invisible characters, but enforces that there needs to be at least 3 visible characters. It's currently used for validating display names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trans flag is two unicode emojies (4 chars) in a u200d trench coat1, so if it completely disallowed invis chars like 200d, you would be unable to have that. There may be other stuff like this.
1 if it doesn't link to the technical information page, I recommend clicking on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I just realised that uFE0F is actually an invis char, and it isn't part of the FORBIDDEN_DISPLAY_CHARS
const. I checked the website that the FORBIDDEN_DISPLAY_CHARS
was taken from and realised that it is significantly longer than FORBIDDEN_DISPLAY_CHARS
. Someone may wish to copy the other chars into FORBIDDEN_DISPLAY_CHARS
? As it stands it is possible to circumvent has_3_permitted_display_chars
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #3888 for context about has_3_premitted_display_chars and also see the docs for str.chars(). It seems we would need something like unicode-segmentation to iterate over grapheme clusters and filter invisible ones, so that combined symbols like flags are not affected. Ideally there would be a library to check for this, but the only one I found (invisible_unicode) seems to have the same problem and is not actively maintained. Might be worth creating a new library for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the json list of characters if someone wants to do this. Would be good to just automate a pull from that repo or use it as a submodule: https://github.com/flopp/invisible-characters/blob/main/characters.json .
Fixes #5691