mw.uls.getFrequentLanguageList blindly appends whatever $.uls.data.getLanguagesInTerritory( countryCode ) returns to the list of "common languages" for a territory.
If we look deeper into why the languages suggested for Italy are so wrong (https://bugzilla.wikimedia.org/62346), in addition to the issues already reported to CLDR, there is the issue that we're not applying any threshold.
For instance, CLDR tells us that hr is spoken by 0.0057 % of the population, which is probably correct, but hr nevertheless manages to get into the list of "common" languages, which is absurd. I know that if the data were better, picking the top 7-9 languages (as the compact links feature does) would hide this issue, but it would still make sense to cut the long tail, be it at a threshold of 1, 0.1 or 0.01 % of the population.
The implementation doesn't matter. Some alternatives to cutting the tail in getLanguagesInTerritory:
the output could contain some data (like the population data in CLDR) so that mw.uls.getFrequentLanguageList can do a filtering on its own, or
it could be a new jquery.uls function, wrapping getLanguagesInTerritory, which cuts the tail and would be used by mw.uls.getFrequentLanguageList.
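
To make the second alternative concrete, here is a minimal sketch of a tail-cutting wrapper. It assumes a hypothetical input shape in which each language entry carries its population share in percent; the function name `cutLongTail`, the `{ code, percent }` shape and the sample values for `it` and `sc` are illustrative assumptions, not the actual jquery.uls data or API. Only the 0.0057 % figure for `hr` comes from the report above.

```javascript
// Hypothetical input: one entry per language in a territory, with the share
// of the population (in percent) as CLDR would report it.
// The values for 'it' and 'sc' are made up for illustration only.
var italySample = [
	{ code: 'it', percent: 95 },
	{ code: 'hr', percent: 0.0057 },
	{ code: 'sc', percent: 1.7 }
];

/**
 * Return the language codes whose population share is at least minPercent,
 * most widely spoken first.
 *
 * @param {Object[]} languages Entries of the form { code, percent } (assumed shape)
 * @param {number} minPercent Threshold in percent of the population
 * @return {string[]} Language codes above the threshold
 */
function cutLongTail( languages, minPercent ) {
	return languages
		.filter( function ( entry ) {
			return entry.percent >= minPercent;
		} )
		.sort( function ( a, b ) {
			return b.percent - a.percent;
		} )
		.map( function ( entry ) {
			return entry.code;
		} );
}

// With a 0.01 % threshold, hr (0.0057 %) no longer counts as "common".
var common = cutLongTail( italySample, 0.01 );
console.log( common );
```

Whether the threshold lives in such a wrapper or in mw.uls.getFrequentLanguageList itself is the open design question; the filter is the same either way, provided the population data is exposed.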