getLanguagesInTerritory should apply a threshold or allow consumers to do so #134

Open
nemobis opened this issue Mar 14, 2014 · 0 comments


nemobis commented Mar 14, 2014

mw.uls.getFrequentLanguageList blindly appends whatever $.uls.data.getLanguagesInTerritory( countryCode ) returns to the list of "common languages" for a territory.
If we look deeper into why the languages suggested for Italy are so wrong (https://bugzilla.wikimedia.org/62346), in addition to the issues already reported to CLDR there is the problem that we're not applying any threshold.

For instance, CLDR tells us that hr is spoken by 0.0057 % of the population, which is probably correct, but hr nevertheless manages to get into the list of "common" languages, which is absurd. I know that if the data were better, picking the top 7-9 languages (as the compact links feature does) would hide this issue, but it would still make sense to cut the long tail, whether the threshold is 1, 0.1 or 0.01 % of the population.

The implementation doesn't matter. Some alternatives to cutting the tail in getLanguagesInTerritory:

  1. the output could contain some data (like the population data in CLDR) so that mw.uls.getFrequentLanguageList can do the filtering on its own, or
  2. it could be a new jquery.uls function, wrapping getLanguagesInTerritory, which cuts the tail and would be used by mw.uls.getFrequentLanguageList .
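
To make the second alternative concrete, here is a minimal sketch of what such a wrapper might look like. Everything in it is an assumption for illustration: the function name getCommonLanguagesInTerritory, the territoryLanguagePercent data shape, and every percentage except the hr figure quoted above are made up, and it does not call the real jquery.uls API.

```javascript
// Hypothetical data shape: territory -> language code -> percentage of the
// territory's population. Only the hr figure (0.0057) comes from this issue;
// the other numbers are placeholders, not real CLDR values.
const territoryLanguagePercent = {
  IT: { it: 95, sc: 1.5, hr: 0.0057 }
};

// Hypothetical wrapper: returns the language codes for a territory whose
// share of the population is at least minPercent, most widely spoken first.
// Unknown territories yield an empty list.
function getCommonLanguagesInTerritory(territory, minPercent) {
  const languages = territoryLanguagePercent[territory] || {};
  return Object.keys(languages)
    .filter(function (code) {
      return languages[code] >= minPercent;
    })
    .sort(function (a, b) {
      return languages[b] - languages[a];
    });
}
```

With a threshold of 0.01, hr would drop out of the list for IT while the other two placeholder entries stay. The first alternative would amount to exposing something like the inner percentage map and leaving the filter step to mw.uls.getFrequentLanguageList.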