Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load kknn pkg on startup #4453

Closed
wants to merge 1 commit into from

Conversation

JorisGoosen
Copy link
Contributor

- Fixes jasp-stats/INTERNAL-jasp#1227 (K-nn and regression broken)
@JorisGoosen
Copy link
Contributor Author

I can attest that this at least solves the problem.

@vandenman
Copy link
Contributor

I think we can fix this inside jaspMachineLearning with something like

contr.dummy <- kknn::contr.dummy

all we need to ensure is that contr.dummy can be found by get. This is possible as long as we define contr.dummy one level above the function where e.g., kknn::train.kknn is called.

@JorisGoosen
Copy link
Contributor Author

I think we can fix this inside jaspMachineLearning with something like

contr.dummy <- kknn::contr.dummy

all we need to ensure is that contr.dummy can be found by get. This is possible as long as we define contr.dummy one level above the function where e.g., kknn::train.kknn is called.

I think it might be more futureproof to just call library(kknn) but does that have bad side-effects?

@vandenman
Copy link
Contributor

vandenman commented Jan 25, 2021

I looked into it a bit and it turns out it's a bit more work than what I said above. It is doable to fix this within R though.

The downside of library(kknn) is that the following names are loaded (and reserved):

[1] "contr.ordinal"      "kknn"               "kknn.dist2"         "fast.table"        
 [5] "plot.train.kknn"    "contr.metric"       "train.kknn"         "contr.int"         
 [9] "optKernel"          "specClust"          "print.train.kknn"   "plot.specClust"    
[13] "summary.train.kknn" "Laplacian"          "AUC"                "predict.train.kknn"
[17] "print.kknn"         "mydist"             "cv.kknn"            "summary.kknn"      
[21] "simulation"         "getClosest"         "contr.dummy"        "kknn.dist"         
[25] "predict.kknn"       "prepare.Discrete"

if we need another call to library(pkg) then that may introduce clashes. Adding the library call is the easiest way to fix this though.

@AlexanderLyNL
Copy link
Contributor

I think we can fix this inside jaspMachineLearning with something like

contr.dummy <- kknn::contr.dummy

all we need to ensure is that contr.dummy can be found by get. This is possible as long as we define contr.dummy one level above the function where e.g., kknn::train.kknn is called.

I think it might be more futureproof to just call library(kknn) but does that have bad side-effects?

It depends on how you set it up I guess, but when we started, it was set up without namespace protection which is why I using library, dettach etc.

@JorisGoosen
Copy link
Contributor Author

Hmm, well...

I think it would be best if kknn just worked properly without us needing to use library() on it, but I assume that would require a fix in the pkg, no?

I think putting the library(kknn) in JASP is ok, if we also try to fix the problem at it's root and open a PR there. That way we could remove the library call here, in time.

Would that work?
As in, I assume getting those functions working in kknn would just require prefacing them with their namespace? Or Is this a more complicated problem than I think it is?

@vandenman
Copy link
Contributor

I think putting the library(kknn) in JASP is ok, if we also try to fix the problem at it's root and open a PR there. That way we could remove the library call here, in time.

There is already an open issue at the GitHub repo. It should be relatively straightforward to fix this in the package itself, I'll see if I can make a PR to the repo this week.

As in, I assume getting those functions working in kknn would just require prefacing them with their namespace? Or Is this a more complicated problem than I think it is?

It's a bit more complicated. The core of this problem boils down to a poor design choice by R. R has this procedure to create a "model matrix". Basically it transforms this, let's call it X

continuous category
0.5 0
0.2 1
0.1 2

into something like this:

continuous category_dummy_1 category_dummy_2 category_dummy_3
0.5 1 0 0
0.2 0 1 0
0.1 0 0 1

the second representation is useful for a lot of linear algebra. There are multiple ways to do this transformation though, for instance, most algorithms use a built in conversion that lumps category_dummy_1 together with the intercept. The way this is done also differs for unordered categorical variables (character & factor) vs ordered categorical variables (ordered factors).

These different ways are called "contrasts" and you can set these by options(contrasts = list(ordered = "string", unordered = "string")). They have to be set that way and the string is the name of a function... (direct functions return an error!). Next, the function model.matrix looks at these contrasts and then assigns as an attribute to X$category, which is the name of the contrast function for the type of category. Next, some C code is called (see here). This C code then has to find this R function somehow! For that purpose, it uses get. But get doesn't know where to look. I think it just loops over whatever search() returns. But since kknn is loaded via a namespace, it's not on the search(). Thus we get an error.

Also, get throws an error for kknn::contr.dummy. That's probably because you also could do `kknn::contr.dummy` <- ... and use kknn::contr.dummy as an identifier rather than a namespace lookup (don't ask me why this is allowed).

A solution (the only reasonable one I came up with actually) is to manually specify a contrast matrix as an attribute, i.e., we do attr(X$category, "contrasts") <- ... ourselves. model.matrix takes extra care not to overwrite that attribute if it's already set and the C-code uses its value. We could do this within JASP, but it would be much better to do this within the kknn package.

@JorisGoosen
Copy link
Contributor Author

So we can wait and see if you manage to fix kknn and otherwise we can always merge this to be able to release JASP with some working ML.

@koenderks
Copy link
Contributor

Mind yourself though, the latest release of the kknn package is from 2016-03-26.

@JorisGoosen
Copy link
Contributor Author

It does seem to be under semi-active development if I peek at https://github.com/KlausVigo/kknn so at least the maintainer is alive.

@vandenman
Copy link
Contributor

Okay so I've opened a PR at https://github.com/KlausVigo/kknn that should fix the problem. And now we wait...

@JorisGoosen
Copy link
Contributor Author

Depends on KlausVigo/kknn#24

@JorisGoosen
Copy link
Contributor Author

So... It turns out this already sneaked into development somehow. I probably pushed by accident.

Anyhow, this PR can be closed because it really doesn't serbve much of a purpose anymore: b630d38

Once @vandenman hears back from Klaus we can always remove it.

@JorisGoosen JorisGoosen closed this Feb 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants