Fails to run superClass when one of the raster layer is factor #108

Open
bappa10085 opened this issue Feb 16, 2024 · 3 comments

@bappa10085

bappa10085 commented Feb 16, 2024

I am trying to run superClass on a raster stack that contains both numeric and categorical layers, but it fails with the following error:

Error: variable 'Class' was fitted with type "numeric" but type "factor" was supplied

If I use the caret package directly to build the model, it runs fine; by default, caret converts factor variables using one-hot encoding. Here is a minimal, reproducible example:

library(terra)
library(RStoolbox)

f <- system.file("ex/elev.tif", package="terra")
elevation <- rast(f)
slope <- terrain(elevation, "slope")
aspect <- terrain(elevation, "aspect")
plot(elevation)

#Make the aspect categorical 
m_aspect <- c(-Inf, -1, 1,  
              -1, 22.5, 2,
              22.5, 67.5, 3,
              67.5, 112.5, 4,  
              112.5, 157.5, 5,
              157.5, 202.5, 6,
              202.5, 247.5, 7,  
              247.5, 292.5, 8,
              292.5, 337.5, 9,
              337.5, Inf, 2)

rclmat_aspect <- matrix(m_aspect, ncol=3, byrow=TRUE)

rc_aspect <- classify(aspect, rclmat_aspect, include.lowest=TRUE)
plot(rc_aspect)

aspect_classes <- data.frame(Value = 1:9,
                             Class = c("Flat", "N","NE","E","SE","S","SW","W",
                                       "NW"))

levels(rc_aspect) <- aspect_classes

logo <- c(elevation, slope, rc_aspect)

p <- read.table(text = "Longitude Latitude
49.60 6.00
49.65 6.10
49.70 6.15
49.75 6.20
49.80 6.25
49.85 6.27
49.87 5.80
49.90 5.83
50.00 5.85
50.05 5.90", header = T)

a <- p + 0.01  

pb <- c(rep("Yes", nrow(p)), rep("No", nrow(a)))
pts <- cbind(pb, rbind(p, a))

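# Note: the column names in `p` are swapped (the "Latitude" column holds the
# x/longitude values), so this order yields x = longitude, y = latitude
sp.pts <- terra::vect(pts, geom=c("Latitude", "Longitude"), crs=crs(elevation))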

v <- terra::extract(logo, sp.pts, xy = T, ID = F, bind=T)

## Fit classifier (splitting training into 70% training data, 30% validation data)
rf_mod <- superClass(logo, trainData = sf::st_as_sf(v), 
                     responseCol = "pb", 
                     model = "rf", tuneLength = 1, trainPartition = 0.7, 
                     predict = T,
                     predType = "prob", #for class probabilities
                     mode = "classification",
                     kfold = 3, na.rm=TRUE)
@KonstiDE
Collaborator

KonstiDE commented Feb 19, 2024

@bappa10085 You are correct: caret converts it, but terra does not. I call terra::predict before feeding anything into caret, and that seems to have problems with factors. I will investigate that, thanks for reporting. For now, convert the values of the "Class" layer of your SpatRaster to numeric before it then gets converted by caret again to factors, as follows:

logo$Class <- as.numeric(logo$Class)
rf_mod <- superClass(logo,
                     trainData = sf::st_as_sf(v),
                     responseCol = "pb",
                     model = "rf",
                     tuneLength = 1,
                     trainPartition = 0.7,
                     predict = T,
                     predType = "prob", #for class probabilities
                     mode = "classification",
                     kfold = 3, na.rm=TRUE)

plot(rf_mod$map)
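
To illustrate what the numeric conversion means for a model, here is a minimal base R sketch on made-up toy data (not the raster from this issue). It assumes a formula-style design matrix built with `model.matrix`, which is only a stand-in for whatever encoding caret or superClass applies internally. The point is that a factor is expanded into indicator columns, whereas a variable converted with `as.numeric` stays a single column of integer codes.

```r
# Toy aspect classes; the values and names are illustrative assumptions
aspect <- factor(c("N", "E", "S", "N", "E"), levels = c("N", "E", "S"))

# Kept as a factor: the design matrix gets one indicator column per
# non-reference level (default treatment contrasts)
mm_factor <- model.matrix(~ aspect)
colnames(mm_factor)

# Converted to numeric: the factor is replaced by its integer level codes,
# and the design matrix contains a single continuous column
aspect_num <- as.numeric(aspect)
mm_numeric <- model.matrix(~ aspect_num)
colnames(mm_numeric)
```

Whether a numeric "Class" layer is turned back into a factor inside superClass can be checked on the fitted object, for example with `class(rf_mod$model$trainingData$Class)`.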

@fstrech

fstrech commented Aug 5, 2024

Does caret know how to convert logo$Class back into a factor variable? The training data stored in the fitted model suggests it treated Class as a purely numeric variable (only 6 of the 9 levels of Class are represented in the training data):

rf_mod$model$trainingData
   .outcome elevation     slope Class
1       Yes       352 0.9963959     4
2       Yes       277 5.0016905     8
3       Yes       318 0.9194205     4
4       Yes       380 1.4358448     5
5       Yes       336 4.8786019     9
6       Yes       428 1.0746833     4
7       Yes       490 0.9363304     4
8        No       325 0.7089894     3
9        No       253 3.1731447     4
10       No       396 2.3262434     7
11       No       358 1.2676616     4
12       No       354 1.1709739     4
13       No       424 2.2205044     3
14       No       463 1.1765749     9

@KonstiDE
Collaborator

KonstiDE commented Aug 8, 2024

If I understand you correctly, it is just a matter of the training data, not of the conversion.

Executing

rf_mod <- RStoolbox::superClass(logo, trainData = sf::st_as_sf(v),
                     responseCol = "pb",
                     model = "rf", tuneLength = 1, trainPartition = 0.7,
                     predict = T,
                     predType = "prob", #for class probabilities
                     mode = "classification",
                     kfold = 3, na.rm=TRUE)
length(unique(rf_mod$model$trainingData$Class))

returned 6, 7, or 8 for me across roughly 10 test runs. It may be that the random training split left some classes exclusively in either the validation or the training set, or too sparsely represented to be predicted...
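
The effect of the random split can be sketched in base R without any raster data. The points, the class codes, and the helper `split_once` below are made up for illustration, and drawing 70% within each response class is only an assumption about how `trainPartition` behaves. The sketch counts how many distinct Class values end up in the training subset over repeated splits.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of this sketch

n_per_class <- 10
pts <- data.frame(
  pb    = rep(c("Yes", "No"), each = n_per_class),
  Class = sample(1:9, 2 * n_per_class, replace = TRUE)  # made-up aspect codes
)

# Hypothetical helper: draw a fraction p of the rows within each response
# class and return the number of distinct Class values in that training subset
split_once <- function(d, p = 0.7) {
  idx <- unlist(lapply(split(seq_len(nrow(d)), d$pb),
                       function(i) sample(i, size = round(p * length(i)))))
  length(unique(d$Class[idx]))
}

# Distinct Class values in the training subset over 10 random splits
n_levels <- replicate(10, split_once(pts))
n_levels

# Distinct Class values across all points, for comparison
length(unique(pts$Class))
```

With only 20 points spread over up to 9 classes, a count that varies between runs and stays below the full number of levels is expected.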
