Erroneous rogue element detection? #84
Comments
It does appear that Asian characters are allowed in URLs. So far we have not whitelisted these types of characters in adblock-lean, which is why this rogue element detection occurs. Alphanumeric characters are allowed via the [:alnum:] character class.
I thought that international characters, while showing up as such in the browser, are in fact transparently encoded into ASCII characters when DNS resolution is performed. This Wikipedia article seems to confirm that: This means that internationalized domain names can be adblocked as-is, in their ASCII transcription. Technically we could implement support for additional alphabets, but there are many of them, so this may get complicated and have a performance impact. For this reason, I think the correct solution is to use lists that have the internationalized domain names encoded with Punycode. I'd suggest that @jul059 file a bug report with whoever maintains the list which has this entry. Edit: if using a different list or fixing this list is not possible, I suppose we could make rogue element detection optional. Since the rogue element check safeguards against both corrupted and malicious lists, the user would then need to make sure that the lists they are using are safe and sound.
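To illustrate the ASCII transcription mentioned above, here is a minimal Python sketch. It uses the standard library's `idna` codec, which implements the older IDNA 2003 rules, so the result for some characters may differ from what a browser or another tool produces. The helper names `to_ascii` and `to_unicode` are made up for this example and are not part of adblock-lean.

```python
# Sketch only: shows how an internationalized domain name maps to its
# ASCII (Punycode) form and back, using Python's built-in "idna" codec.
# The function names are hypothetical and not taken from adblock-lean.

def to_ascii(domain: str) -> str:
    """Return the ASCII-compatible encoding of a domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """Decode an ASCII-compatible domain name back to its Unicode form."""
    return ascii_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    name = "華信金融.tw"
    encoded = to_ascii(name)
    # The non-ASCII label is replaced by a label starting with "xn--".
    print(encoded)
    # Decoding the ASCII form gives back the original name.
    print(to_unicode(encoded) == name)
```

A blocklist that ships the encoded form contains only ASCII characters, so an ASCII-only check would not need to change to handle it.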
Upon further research, perhaps we could consider changing the regex to support all Unicode characters. Apparently this can be done by replacing (edit: it's [0-9], not [0_9] as incorrectly represented by GitHub)
If this does work but has a performance impact, we could make this an optional feature. The question of whether this is actually needed still stands.
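As a rough illustration of the difference between an ASCII-only check and a Unicode-aware one, here is a Python sketch. It is not the sed expression used by adblock-lean. Both patterns and both function names are assumptions made up for this example, and entries are assumed to be lowercased already.

```python
import re

# Hypothetical patterns, for illustration only (not adblock-lean's regex).
# ASCII-only: lowercase letters, digits, dots and hyphens.
ASCII_ENTRY = re.compile(r"[a-z0-9.-]+")
# Unicode-aware: any Unicode letter or digit ([^\W_] is \w minus "_"),
# plus dots and hyphens.
UNICODE_ENTRY = re.compile(r"(?:[^\W_]|[.-])+")


def passes_ascii_check(entry: str) -> bool:
    """True if the whole entry consists of ASCII domain characters."""
    return ASCII_ENTRY.fullmatch(entry) is not None


def passes_unicode_check(entry: str) -> bool:
    """True if the whole entry consists of Unicode letters/digits, '.' or '-'."""
    return UNICODE_ENTRY.fullmatch(entry) is not None


if __name__ == "__main__":
    for entry in ("example.com", "華信金融.tw", "bad;entry"):
        print(entry, passes_ascii_check(entry), passes_unicode_check(entry))
```

The Unicode-aware pattern still rejects entries containing characters such as `;` or spaces, so it keeps part of the safeguard while accepting non-ASCII letters.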
I read the same thing. The blocklist maintainer could possibly switch to using ASCII characters.
I'll give this a performance/sed test later on tonight. Let's see how that goes and decide from there.
So I see different script behavior than the OP. Running the latest adblock-lean main (no modifications), my test file with the entry 華信金融.tw passes the adblock-lean rogue element check with no issues. However, dnsmasq does not like these characters anyway. So that should pretty much settle the decision, as Asian characters cause this error:
One other thing we could do for this use case is add an option enabling pre-processing of certain lists with the idn2 utility. idn2 can be used to translate Unicode to Punycode. This requires packages So something like
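The pre-processing idea can be sketched in Python as follows. Note that this swaps in the standard library's `idna` codec (IDNA 2003) in place of the idn2 utility proposed above, so results may differ for some names, and it assumes a plain list with one domain per line. The function name `convert_list` is hypothetical.

```python
# Sketch only: convert every entry of a plain domain list to its ASCII
# (Punycode) form before the list is checked and handed to dnsmasq.
# Uses Python's built-in "idna" codec, NOT the idn2 utility.

def convert_list(lines):
    """Return (converted, rejected) for an iterable of list lines.

    Blank lines and lines starting with '#' are skipped. Entries that
    cannot be encoded (for example, a label longer than 63 characters)
    are collected in `rejected` instead of aborting the whole run.
    """
    converted, rejected = [], []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        try:
            converted.append(entry.encode("idna").decode("ascii"))
        except UnicodeError:
            rejected.append(entry)
    return converted, rejected


if __name__ == "__main__":
    sample = ["example.com\n", "華信金融.tw\n", "# comment\n", "\n"]
    ok, bad = convert_list(sample)
    print(ok)
    print(bad)
```

This step only converts names. ASCII entries are passed through largely unchanged, so a rogue element check would still need to run on the converted output.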
I think converting URLs to Punycode should be up to the list maintainer and not built into adblock-lean. The OP could ask that particular blocklist's maintainer to do so. Something else in that list is
The question is whether including internationalized domain names (IDNs) in blocklists is a normal practice. From a little research, it looks like at least Pi-hole does accept IDNs. I can't tell how widespread the use of these lists is, but if it doesn't cost us much effort, we could just as well support them in adblock-lean (with the help of idn2).
I don't think it's in widespread use; this is the first I've seen after all these years. Can you make any sense of this address in that same list? I'm trying to understand whether the maintainer needs to make some updates, since there are really only a few IDNs in that list, and a couple that look like this:
Does this issue still need to be worked on?
@lynxthecat I think we haven't reached a conclusion on this one. I think it's a matter of policy. In other words, you are the boss, you decide :)
Doesn't this answer it for us? Ultimately, if dnsmasq is going to reject these entries, we're limited by that?
This definitely does mean that those lists cannot be used as-is. See the further discussion of implementing a conversion mechanism, which is probably what some other adblockers that accept these lists do.
Adblock-lean reports a rogue element in the following blocklist:
It looks to me like an erroneous detection.