Commit

remove BigKnn and fix mentions of ShinyAppBuilder
egillax committed Dec 17, 2024
Parent: f1b3e56 · Commit: 667ad20
Showing 8 changed files with 21 additions and 244 deletions.
1 change: 0 additions & 1 deletion NAMESPACE
@@ -102,7 +102,6 @@ export(setCoxModel)
export(setDecisionTree)
export(setGradientBoostingMachine)
export(setIterativeHardThresholding)
-export(setKNN)
export(setLassoLogisticRegression)
export(setLightGBM)
export(setMLP)
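For orientation (not part of the diff): with setKNN removed, an analysis configuration has to use one of the set* learners still exported above. A minimal sketch, assuming this is the PatientLevelPrediction package and relying on default arguments:

# choose a model setting from the remaining exported learners;
# setKNN() is no longer available after this commit
modelSettings <- PatientLevelPrediction::setLassoLogisticRegression()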
202 changes: 0 additions & 202 deletions R/KNN.R

This file was deleted.

4 changes: 2 additions & 2 deletions R/ViewShinyPlp.R
@@ -126,7 +126,7 @@ viewDatabaseResultPlp <- function(
# one shiny app

viewPlps <- function(databaseSettings){
-rlang::check_installed("ShinyAppBuilder")
+rlang::check_installed("OhdsiShinyAppBuilder")
rlang::check_installed("ResultModelManager")
connectionDetails <- do.call(
DatabaseConnector::createConnectionDetails,
@@ -135,7 +135,7 @@ viewPlps <- function(databaseSettings){
connection <- ResultModelManager::ConnectionHandler$new(connectionDetails)
databaseSettings$connectionDetailSettings <- NULL

-shinyAppVersion <- strsplit(x = as.character(utils::packageVersion('ShinyAppBuilder')), split = '\\.')[[1]]
+shinyAppVersion <- strsplit(x = as.character(utils::packageVersion('OhdsiShinyAppBuilder')), split = '\\.')[[1]]

if((shinyAppVersion[1] <= 1 & shinyAppVersion[2] < 2)){
# Old code to be backwards compatable
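For context (illustrative, not part of the commit): the renamed dependency can be checked and its version split into components with the same pattern used above.

# verify the renamed dependency resolves, then inspect its version parts
if (requireNamespace("OhdsiShinyAppBuilder", quietly = TRUE)) {
  shinyAppVersion <- strsplit(
    x = as.character(utils::packageVersion("OhdsiShinyAppBuilder")),
    split = "\\."
  )[[1]]
  # e.g. c("1", "2", "0") - major, minor, patch as character strings
  message("OhdsiShinyAppBuilder version: ", paste(shinyAppVersion, collapse = "."))
}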
18 changes: 10 additions & 8 deletions man/createLearningCurve.Rd

Some generated files are not rendered by default.

12 changes: 7 additions & 5 deletions man/plotLearningCurve.Rd

Some generated files are not rendered by default.

23 changes: 0 additions & 23 deletions man/setKNN.Rd

This file was deleted.

4 changes: 2 additions & 2 deletions man/setLightGBM.Rd

Some generated files are not rendered by default.

1 change: 0 additions & 1 deletion vignettes/BuildingPredictiveModels.Rmd
@@ -114,7 +114,6 @@ tabl <- "
| Regularized Logistic Regression | Lasso logistic regression belongs to the family of generalized linear models, where a linear combination of the variables is learned and finally a logistic function maps the linear combination to a value between 0 and 1. The lasso regularization adds a cost based on model complexity to the objective function when training the model. This cost is the sum of the absolute values of the linear combination of the coefficients. The model automatically performs feature selection by minimizing this cost. We use the Cyclic coordinate descent for logistic, Poisson and survival analysis (Cyclops) package to perform large-scale regularized logistic regression: https://github.com/OHDSI/Cyclops | var (starting variance), seed |
| Gradient boosting machines | Gradient boosting machines are a boosting ensemble technique and in our framework they combine multiple decision trees. Boosting works by iteratively adding decision trees but adds more weight to the data-points that are misclassified by prior decision trees in the cost function when training the next tree. We use Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework implemented in the xgboost R package available from CRAN. | ntree (number of trees), max depth (max levels in tree), min rows (minimum data points in node), learning rate, balance (balance class labels), seed |
| Random forest | Random forest is a bagging ensemble technique that combines multiple decision trees. The idea behind bagging is to reduce the likelihood of overfitting by using weak classifiers, but combining multiple diverse weak classifiers into a strong classifier. Random forest accomplishes this by training multiple decision trees but only using a subset of the variables in each tree, and the subset of variables differs between trees. Our package uses the scikit-learn implementation of Random Forest in Python. | mtry (number of features in each tree), ntree (number of trees), maxDepth (max levels in tree), minRows (minimum data points in node), balance (balance class labels), seed |
-| K-nearest neighbors | K-nearest neighbors (KNN) is an algorithm that uses a specified distance metric to find the K labelled data-points closest to a new unlabelled data-point. The prediction for the new data-point is then the most prevalent class of the K nearest labelled data-points. There is a sharing limitation of KNN, as the model requires labelled data to perform the prediction on new data, and it is often not possible to share this data across data sites. We included the BigKnn classifier developed in OHDSI, which is a large-scale k-nearest neighbor classifier using the Lucene search engine: https://github.com/OHDSI/BigKnn | k (number of neighbours), weighted (weight by inverse frequency) |
| Naive Bayes | The Naive Bayes algorithm applies the Bayes theorem with the 'naive' assumption of conditional independence between every pair of features given the value of the class variable. Based on the likelihood the data belongs to a class and the prior distribution of the class, a posterior distribution is obtained. | none |
| AdaBoost | AdaBoost is a boosting ensemble technique. Boosting works by iteratively adding classifiers but adds more weight to the data-points that are misclassified by prior classifiers in the cost function when training the next classifier. We use the sklearn 'AdaboostClassifier' implementation in Python. | nEstimators (the maximum number of estimators at which boosting is terminated), learningRate (learning rate shrinks the contribution of each classifier by learning_rate. There is a trade-off between learningRate and nEstimators) |
| Decision Tree | A decision tree is a classifier that partitions the variable space using individual tests selected using a greedy approach. It aims to find partitions that have the highest information gain to separate the classes. The decision tree can easily overfit by enabling a large number of partitions (tree depth) and often needs some regularization (e.g., pruning or specifying hyper-parameters that limit the complexity of the model). We use the sklearn 'DecisionTreeClassifier' implementation in Python. | maxDepth (the maximum depth of the tree), minSamplesSplit, minSamplesLeaf, minImpuritySplit (threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.), seed, classWeight ('Balance' or 'None') |
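The learners in the table above are configured through the corresponding set* functions exported in NAMESPACE. A hedged sketch for the gradient boosting row (argument names approximate the hyper-parameters listed in the table; check ?setGradientBoostingMachine for the exact signature):

# illustrative hyper-parameter grid for the gradient boosting learner;
# vectors of values are searched during hyper-parameter tuning
gbmSettings <- PatientLevelPrediction::setGradientBoostingMachine(
  ntrees = c(100, 300),     # number of trees
  maxDepth = c(4, 6),       # max levels in tree
  learnRate = c(0.05, 0.1), # learning rate
  seed = 42
)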
