
Find more relevant labels #37

Open
Berkmann18 opened this issue May 25, 2020 · 1 comment

Comments

@Berkmann18
Member

Berkmann18 commented May 25, 2020

At the moment, the dataset looks like:
[Diagram: dataset category distribution]
This is not good! And that's what we have *after* down-sampling the null labels (i.e. labels that can't be classified into one of the categories in https://allcontributors.org/docs/en/emoji-key), which still make up ≈ 16.61% of the whole dataset (ideally, they'd make up less than business, ..., userTesting combined).
Down-sampling the null labels further would be an option; however, most of the ones left seem (fairly) widely used.
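For anyone wondering what further down-sampling would involve, here's a minimal Python sketch. It assumes the dataset can be loaded as a list of `(label, category)` pairs with `None` marking the null category; `downsample_null` and its parameters are hypothetical and not part of this repo's code.

```python
import random
from typing import Optional

# Hypothetical shape of the dataset: (label name, category), with None for the null category.
Entry = tuple[str, Optional[str]]


def downsample_null(entries: list[Entry], max_null_share: float, seed: int = 0) -> list[Entry]:
    """Randomly drop null-category entries until they are at most max_null_share of the result.

    All non-null entries are kept; the returned list has them first, then the kept nulls.
    """
    nulls = [e for e in entries if e[1] is None]
    others = [e for e in entries if e[1] is not None]
    keep = len(nulls)
    # Shrink the number of kept nulls until keep / (keep + len(others)) is within the limit.
    while keep > 0 and keep / (keep + len(others)) > max_null_share:
        keep -= 1
    rng = random.Random(seed)  # seeded so the same labels are dropped on every run
    return others + rng.sample(nulls, keep)
```

It drops null labels at random, so the trade-off mentioned above still applies: a widely used null label is as likely to be dropped as an obscure one.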

So the remaining option is to level up the other categories by adding more labels for them, especially ones that can be found in GitHub/GitLab/Bitbucket repos.

@Berkmann18
Member Author

Berkmann18 commented Apr 18, 2021

Note: the diagram above has been updated, and we're now up to 613 labels (~15.01% of which are in the null category).
If you have any labels that would help level up the actual categories, that would be grand.

What I'm hoping to achieve here is for each category to represent at least 9.58% of the dataset, and ideally for the distribution to be more even, at ~19.15% each (while keeping the labels realistic or, better yet, actually used in some repos).
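To make the 9.58% minimum concrete, here's a minimal Python sketch that computes each category's share and, for every real category below the threshold, how many labels it would still need. It assumes the dataset is available as a flat list of category names with `None` for the null category; the function names are hypothetical, not part of this repo. For a category with c labels out of N, adding x labels to it gives a share of (c + x) / (N + x), so the answer is the smallest integer x with x ≥ (t·N − c) / (1 − t); the sketch simply counts up to that point to sidestep floating-point rounding at the boundary.

```python
from collections import Counter
from typing import Optional


def category_shares(categories: list[Optional[str]]) -> dict[Optional[str], float]:
    """Fraction of the dataset in each category (None stands for the null category)."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def labels_needed(count: int, total: int, target: float) -> int:
    """Smallest number of labels to add to one category so it reaches the `target` share.

    Assumes all new labels go to this category and nothing else changes,
    i.e. the smallest x with (count + x) / (total + x) >= target.
    """
    if not 0 <= target < 1:
        raise ValueError("target must be in [0, 1)")
    added = 0
    while (count + added) / (total + added) < target:
        added += 1
    return added


def shortfalls(categories: list[Optional[str]], min_share: float) -> dict[str, int]:
    """For every non-null category below min_share, how many labels it still needs."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {
        cat: labels_needed(n, total, min_share)
        for cat, n in counts.items()
        if cat is not None and n / total < min_share
    }
```

Each shortfall is computed as if only that category grew; topping up several categories at once also grows the total, so the per-category numbers are a lower bound. A category with no labels at all never shows up in the `Counter`, so it wouldn't be reported here.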
