Error when plotting a C5.0 tree with factors which have spaces in levels #10

Open
rakeshnbabu opened this issue Aug 4, 2017 · 5 comments

Comments

@rakeshnbabu

Hello, the code below reproduces a bug in how factor levels containing spaces are handled when plotting C5.0 trees. This is in R version 3.4.1 (2017-06-30) x86_64-w64-mingw32 on Windows 7:

library(C50)
data(mtcars)
# Convert cyl and gear to factors
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$gear <- as.factor(mtcars$gear)
# Rename the gear levels so that each label contains a space
levels(mtcars$gear) <- c("3 speed", "4 speed", "5 speed")

myTree <- C5.0(cyl ~ gear, data=mtcars)
plot(myTree)

Error in partysplit(varid = as.integer(i), index = index, info = k, prob = NULL) : 
  minimum of ‘index’ is not equal to 1

The error itself is caused by NA values being passed in the index vector. The root cause is probably that the factor levels are being split on spaces, but I'm unable to trace exactly where. On line 212 of as.party.C5.0.R, the for loop that generates the index value produces NAs because the factor levels stored in a1s do not match the factor levels in xlev.
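As an illustration of the suspected mechanism, here is a minimal base R sketch. It is not the package's actual code: `a1s_split` is a hypothetical stand-in for what `a1s` might contain if the level labels had been split on spaces, and `match()` is used only as an assumed way of looking labels up in `xlev`.

```r
# Levels as stored on the factor (what the report calls xlev).
xlev <- c("3 speed", "4 speed", "5 speed")

# Hypothetical stand-in for a1s: the same labels after being split on spaces.
a1s_split <- unlist(strsplit(xlev, " ", fixed = TRUE))

# None of the fragments ("3", "speed", ...) is a level, so every lookup is NA.
index_bad <- match(a1s_split, xlev)

# With the labels left intact, the lookup yields valid positions.
index_ok <- match(xlev, xlev)

print(index_bad)
print(index_ok)
```

Under these assumptions `index_bad` contains only NAs, which is the kind of index vector that partysplit rejects, while `index_ok` is 1, 2, 3.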

@rakeshnbabu
Author

There is a similar issue that raises the same error but has a different root cause, which can be traced to the model.frame.C5.0 function. On line 29 of the file as.party.C5.0.R, drop.unused.levels is set to TRUE. In my production code, my decision tree ends up referring to levels that have been dropped from the model frame. This also causes NAs to be passed to partysplit. I have not opened a separate report for this because I am unable to generate a trivial data set that reproduces it. Do you recall why that flag is not set to FALSE?
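The level-dropping effect described above can be shown with base R alone. This is only a sketch on a made-up data frame (`d`, `y` and `g` are hypothetical names), using `stats::model.frame()` rather than the package's own method, and `match()` as an assumed stand-in for the level lookup.

```r
# Toy data (hypothetical): the level "c" is declared but never observed.
d <- data.frame(
  y = c(1, 2, 3, 4),
  g = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
)

mf_drop <- model.frame(y ~ g, data = d, drop.unused.levels = TRUE)
mf_keep <- model.frame(y ~ g, data = d, drop.unused.levels = FALSE)

print(levels(mf_drop$g))  # "c" has been dropped
print(levels(mf_keep$g))  # "c" is retained

# A split that refers to "c" cannot be located once the level is dropped.
print(match("c", levels(mf_drop$g)))
print(match("c", levels(mf_keep$g)))
```

If a tree refers to a level that is missing from the model frame, a lookup like the first `match()` call returns NA, which is consistent with the error reported above.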

@topepo
Owner

topepo commented Feb 15, 2018

This should be fixed in the GitHub version (0.1.1.9000), if you would like to test it.

@topepo
Owner

topepo commented Feb 15, 2018

I've also changed the flag to

mf$drop.unused.levels <- FALSE

for testing

@kohleth

kohleth commented Apr 23, 2018

Hi, the plotting is good, except it reverses the order of my levels in the plot.

library(C50)
# Binary response: "Y" for setosa, "N" otherwise
iris$Y <- factor(ifelse(iris$Species == "setosa", "Y", "N"))
levels(iris$Y)
model <- C5.0(Y ~ Sepal.Length, data = iris, rules = FALSE)
plot(model)

Stepping through the code, it seems that in partykit:::plot.party, when the function terminal_node is defined, the argument reverse is set to TRUE.

@topepo
Owner

topepo commented May 21, 2018

That appears to be how partykit does things. Try:

library(partykit)
# Uses iris$Y as defined in the previous comment
mod <- ctree(Y ~ Sepal.Length, data = iris)
plot(mod)
