MarkdownSanitizer sanitizes URLs #2378

EclipsedSolari · 2023-01-11T07:13:42Z

General Troubleshooting

I have checked for similar issues on the Issue-tracker.
I have updated to the latest JDA version
I have checked the branches or the maintainers' PRs for upcoming bug fixes.

Expected Behaviour

Discord treats URLs (almost) as literals, even URLs with Markdown characters in them.

Consider the following Google search:
https://www.google.com/search?q=__test__

Although the string __test__ is Markdown for underlined text, because it's part of a URL, Discord leaves it as-is.

JDA's MarkdownSanitizer does not make the distinction between URLs and normal text, so JDA escapes this URL:
https://www.google.com/search?q=\_\_test\_\_

When trying to send this URL as a Discord message, Discord replaces the backslashes with forward slashes but doesn't apply Markdown rules, including escaping, so the end result is the broken URL https://www.google.com/search?q=/_/_test/_/_.

I expected JDA's Markdown handler to leave URLs as-is.

Code Example for Reproduction Steps

//add this to the EscapeMarkdownTest class in src/test/java/MarkdownTest.java
//all of the assertions except the first fail
@Test
public void testUrl()
{
	Assertions.assertEquals("https://www.google.com/", markdown.compute("https://www.google.com/"));
	Assertions.assertEquals("https://www.google.com/search?q=_test_", markdown.compute("https://www.google.com/search?q=_test_"));
	Assertions.assertEquals("https://www.google.com/search?q=__test__", markdown.compute("https://www.google.com/search?q=__test__"));
	Assertions.assertEquals("https://www.google.com/search?q=*test*", markdown.compute("https://www.google.com/search?q=*test*"));
	Assertions.assertEquals("https://www.google.com/search?q=**test**", markdown.compute("https://www.google.com/search?q=**test**"));
	Assertions.assertEquals("https://www.google.com/search?q=***test***", markdown.compute("https://www.google.com/search?q=***test***"));
	Assertions.assertEquals("https://www.google.com/search?q=~~test~~", markdown.compute("https://www.google.com/search?q=~test~"));
}

Code for JDABuilder or DefaultShardManagerBuilder used

N/A

Exception or Error

N/A

The text was updated successfully, but these errors were encountered:

EclipsedSolari · 2023-01-11T07:22:53Z

I'm willing to take a shot at this issue, but I'd like to discuss the correct way to approach it first.

The current entities in MarkdownSanitizer are represented by simple tokens or by regex. Parsing URLs with regex is an unpleasant problem, and I would probably have to roll my own regex since I'm unsure of licensing (and correctness) implications if I grab one from somewhere else on the internet.

Java does have a URL parser, but its implementation seems like a bad fit for this use case, since I'd have to construct a new URL and potentially catch an exception (MalformedURLException) on every possible text sequence.

I could explore a hybrid approach, with a regex to detect possible leading URLs in a sequence (i.e. in the text sequence https://www.google.com?q=~~test~~ some other text here, each space indicates the end of a potential URL) and feed those potential URLs to java.net.URL for validation.

I'm open to any other ideas.

LaFriska · 2023-07-15T15:24:15Z

I am currently giving this issue a shot. This is somewhat complicated as previously suggested, but as long as the programme detects and URL, replaces it with a placeholder and changes it back after the algorithm, it should work correctly.

I would be able to make a pull request tomorrow.

LaFriska · 2023-07-16T02:24:40Z

For the final assertion in that test method, I think you may have made a mistake: the expect result should be the URL with only one layer of "~" wrapped around the word "test" in the url, not two.

-Suggested by EclipsedSolari in issue discord-jda#2378 -The algorithm implemented searches and replaces all URLs by a placeholder, and replaces them back after the majority of the compute method finishes. -Also added a test method in the MarkdownTest class.

LaFriska mentioned this issue Jul 16, 2023

Fixed MarkdownSanitizer altering URL symbols #2501

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MarkdownSanitizer sanitizes URLs #2378

MarkdownSanitizer sanitizes URLs #2378

EclipsedSolari commented Jan 11, 2023

EclipsedSolari commented Jan 11, 2023

LaFriska commented Jul 15, 2023

LaFriska commented Jul 16, 2023

MarkdownSanitizer sanitizes URLs #2378

MarkdownSanitizer sanitizes URLs #2378

Comments

EclipsedSolari commented Jan 11, 2023

General Troubleshooting

Expected Behaviour

Code Example for Reproduction Steps

Code for JDABuilder or DefaultShardManagerBuilder used

Exception or Error

EclipsedSolari commented Jan 11, 2023

LaFriska commented Jul 15, 2023

LaFriska commented Jul 16, 2023