Unescape "&#42;" to "*" in mw.uri.anchorEncode() #276

Merged: 1 commit into tatuylonen:main on Apr 24, 2024

@xxyzz xxyzz commented Apr 24, 2024

I still don't know how MediaWiki implements anchorEncode, but testing in Wiktionary's Lua debug console shows that this function unescapes "&#42;" to "*".

This change fixes the Lua error "The specified language Proto-Turkic is unattested, while the given word is not marked with '*' to indicate that it is reconstructed." on "Reconstruction" pages.

MediaWiki code:
https://github.com/wikimedia/mediawiki-extensions-Scribunto/blob/755b549fe66628a2891e9a61a9abade238dd0e9b/includes/Engines/LuaCommon/UriLibrary.php#L29-L33
https://github.com/wikimedia/mediawiki/blob/6592072169f1c25d43723e0956c701855aa4c6ab/includes/parser/CoreParserFunctions.php#L1058-L1062

Lua error: https://kaikki.org/dictionary/All%20languages%20combined/errors/details-The-specified-language-Proto-Turkic-is-yS4Aqcfj.html#LUA-error-in--invoke--links-templates----l_term_t----notself-1---parent---Template-l-self----1---trk-pro---2------usgay-----

More details:

Lua code: https://en.wiktionary.org/wiki/Module:links#L-365--L-375

-- Find embedded links and ensure they link to the correct section.
local function process_embedded_links(text, data, plain)
    -- Process the non-linked text.
    text = data.lang:makeDisplayText(text, data.sc[1], true)  -- "*" is escaped to "&#42;" here

    -- If the text begins with * and another character, then act as if each link begins with *. However, don't do this if the * is contained within a link at the start. E.g. `|*[[foo]]` would set all_reconstructed to true, while `|[[*foo]]` would not.
    local all_reconstructed = false
    if not plain then
        -- anchorEncode removes links etc.
        if anchorEncode(text):sub(1, 1) == "*" then   -- anchorEncode converts "&#42;" back to "*"
            all_reconstructed = true
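
The behaviour described above can be illustrated with a minimal Python sketch. It is not MediaWiki's algorithm and not the code merged in this PR: `anchor_encode_sketch` and `looks_reconstructed` are hypothetical names, the space-to-underscore step is an assumed simplification, and `"&#42;foo"` is an example input. The only point is that the first-character check succeeds once the HTML entity has been unescaped.

```python
import html


def anchor_encode_sketch(text: str) -> str:
    """Hypothetical stand-in for mw.uri.anchorEncode(); NOT MediaWiki's real algorithm."""
    # Decode HTML character references, so "&#42;" becomes "*" again.
    decoded = html.unescape(text)
    # Assumed simplification: anchors use underscores instead of spaces.
    return decoded.strip().replace(" ", "_")


def looks_reconstructed(display_text: str) -> bool:
    """Mirror the Lua check `anchorEncode(text):sub(1, 1) == "*"`."""
    return anchor_encode_sketch(display_text)[:1] == "*"


# Example input: display text whose leading "*" was escaped to "&#42;".
escaped = "&#42;foo"
print(escaped[:1] == "*")            # check on the raw, still-escaped text
print(looks_reconstructed(escaped))  # check after unescaping
```

Without the unescape step the first character of the escaped text is "&", so the Lua module would not set `all_reconstructed` and would raise the "not marked with '*'" error for reconstructed-language terms.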

@xxyzz xxyzz merged commit 1330408 into tatuylonen:main Apr 24, 2024
5 checks passed
@xxyzz xxyzz deleted the anchorEncode branch April 24, 2024 07:31