-
-
Notifications
You must be signed in to change notification settings - Fork 734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MarkdownSanitizer sanitizes URLs #2378
Comments
I'm willing to take a shot at this issue, but I'd like to discuss the correct way to approach it first. The current entities in MarkdownSanitizer are represented by simple tokens or by regex. Parsing URLs with regex is an unpleasant problem, and I would probably have to roll my own regex since I'm unsure of licensing (and correctness) implications if I grab one from somewhere else on the internet. Java does have a URL parser, but its implementation seems like a bad fit for this use case, since I'd have to construct a new URL and potentially catch an exception (MalformedURLException) on every possible text sequence. I could explore a hybrid approach, with a regex to detect possible leading URLs in a sequence (i.e. in the text sequence I'm open to any other ideas. |
I am currently giving this issue a shot. This is somewhat complicated as previously suggested, but as long as the programme detects and URL, replaces it with a placeholder and changes it back after the algorithm, it should work correctly. I would be able to make a pull request tomorrow. |
For the final assertion in that test method, I think you may have made a mistake: the expect result should be the URL with only one layer of "~" wrapped around the word "test" in the url, not two. |
-Suggested by EclipsedSolari in issue discord-jda#2378 -The algorithm implemented searches and replaces all URLs by a placeholder, and replaces them back after the majority of the compute method finishes. -Also added a test method in the MarkdownTest class.
General Troubleshooting
Expected Behaviour
Discord treats URLs (almost) as literals, even URLs with Markdown characters in them.
Consider the following Google search:
https://www.google.com/search?q=__test__
Although the string
__test__
is Markdown for underlined text, because it's part of a URL, Discord leaves it as-is.JDA's MarkdownSanitizer does not make the distinction between URLs and normal text, so JDA escapes this URL:
https://www.google.com/search?q=\_\_test\_\_
When trying to send this URL as a Discord message, Discord replaces the backslashes with forward slashes but doesn't apply Markdown rules, including escaping, so the end result is the broken URL
https://www.google.com/search?q=/_/_test/_/_
.I expected JDA's Markdown handler to leave URLs as-is.
Code Example for Reproduction Steps
Code for JDABuilder or DefaultShardManagerBuilder used
Exception or Error
The text was updated successfully, but these errors were encountered: