Issue with Predict() when number of unique predicted labels is less than number of possible labels #341

Open
ebridge2 opened this issue Dec 5, 2019 · 0 comments

ebridge2 commented Dec 5, 2019

Describe the bug
When calling predict, I obtain the error:

 Error in factor(predictions, labels = labels) : 
  invalid 'labels'; length n should be 1 or k

where n > k every time (n and k stand for whatever integers appear in the message; they always satisfy this constraint).

To Reproduce

I believe the error occurs when the predictions contain no instances of some class that exists in the training data: the way the predictions are converted to a factor then fails. A minimal reproducible example demonstrating the flaw in how the predictions are assigned class labels:

x <- rep(letters[1:5], 3)  # x has only 5 unique elements
factor(x, labels=LETTERS[1:10])  # note that there are more labels than unique elements of x

Error in factor(x, labels = LETTERS[1:10]) : 
  invalid 'labels'; length 10 should be 1 or 5
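
For context on why the two lengths disagree: when `levels` is not supplied, `factor()` derives the levels from the unique values of its input, and `labels` must then have length 1 or the same length as those derived levels. A small sketch of that behavior (the variable names `inferred` and `msg` are illustrative only):

```r
x <- rep(letters[1:5], 3)

# With no `levels` argument, factor() infers the levels from the
# sorted unique values of x, so only 5 levels exist here.
inferred <- levels(factor(x))
print(length(inferred))

# Supplying 10 labels for 5 inferred levels raises the error;
# capture its message instead of stopping the script.
msg <- tryCatch(
  {
    factor(x, labels = LETTERS[1:10])
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```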

I noticed this bug with a training set in which a single class was extremely sparsely represented (30 samples of 10,000). My guess is that this class is simply never predicted, and hence the error is thrown.
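
A sketch of how this situation could arise, using the `factor(predictions, labels = labels)` call shown in the error message. The class names and the prediction vector below are made up for illustration, and `all_classes` and `missing_classes` are hypothetical names, not taken from the package:

```r
# Hypothetical classes seen during training; "c" plays the rare class.
all_classes <- c("a", "b", "c")

# Hypothetical predictions in which the rare class never appears.
predictions <- rep(c("a", "b"), times = c(12, 8))

# Classes present in training but absent from the predictions.
missing_classes <- setdiff(all_classes, unique(predictions))
print(missing_classes)

# Same pattern as the failing call: 3 labels for 2 observed values.
result <- tryCatch(
  factor(predictions, labels = all_classes),
  error = function(e) "error"
)
print(result)
```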

Expected behavior
The predictions are returned.

Desktop (please complete the following information):

  • OS: Ubuntu 18.04
  • Language: R
  • Version 2.0.4

Additional context
It would appear this issue can be fixed by simply passing the full set of possible classes via levels instead of labels:


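x <- rep(letters[1:5], 3)  # x has only 5 unique elements
# levels must match the values that occur in x (otherwise every element becomes NA);
# labels can then rename them. Note that there are more levels than unique elements of x.
factor(x, levels=letters[1:10], labels=LETTERS[1:10])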
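
As a sanity check on the `levels`-based approach, the sketch below fixes the level set up front so that unobserved classes remain as empty levels. One caveat worth noting: `levels` must match the values actually present in `x`, since values not found in `levels` become `NA`; `labels` can then be used to rename them. The names `f` and `g` are illustrative only.

```r
x <- rep(letters[1:5], 3)

# Declare all 10 possible classes as levels and rename them via labels.
f <- factor(x, levels = letters[1:10], labels = LETTERS[1:10])
print(nlevels(f))   # all 10 classes are kept as levels
print(table(f))     # unobserved classes show a count of 0
print(anyNA(f))     # no element was lost in the conversion

# If the levels do not match the values in x (uppercase vs lowercase),
# no error is raised but every element silently becomes NA.
g <- factor(x, levels = LETTERS[1:10])
print(all(is.na(g)))
```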

ebridge2 added the bug label Dec 5, 2019
ebridge2 changed the title from "Issue with Predict()" to "Issue with Predict() when number of unique predicted labels is less than number of possible labels" Dec 5, 2019