diff --git a/_pkgdown.yml b/_pkgdown.yml
index de4bb9f36..763c23504 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -23,6 +23,7 @@ navbar:
     - benchmarks
     - predictors
     - bestpractice
+    - clinicalmodels
     - news
     right: [hades, github]
   components:
@@ -41,6 +42,9 @@ navbar:
     bestpractice:
       text: Best Practices
       href: articles/BestPractices.html
+    clinicalmodels:
+      text: Clinical Models
+      href: articles/ClinicalModels.html
     benchmarks:
       text: Benchmarks
       href: articles/BenchmarkTasks.html
diff --git a/docs/404.html b/docs/404.html
new file mode 100644
index 000000000..7dbf2ed28
--- /dev/null
+++ b/docs/404.html
@@ -0,0 +1,184 @@
+Page not found (404) • PatientLevelPrediction
+
+Content not found. Please use links in the navbar.
+
diff --git a/docs/articles/AddingCustomFeatureEngineering.html b/docs/articles/AddingCustomFeatureEngineering.html
new file mode 100644
index 000000000..b7261a0cc
--- /dev/null
+++ b/docs/articles/AddingCustomFeatureEngineering.html
@@ -0,0 +1,396 @@
+Adding Custom Feature Engineering Functions • PatientLevelPrediction
+
+ + + + +
+
+ + + + + +
+

Introduction +

+

This vignette describes how you can add your own custom function for +feature engineering in the Observational Health Data Sciences and +Informatics (OHDSI) PatientLevelPrediction +package. This vignette assumes you have read and are comfortable with +building single patient level prediction models as described in the BuildingPredictiveModels +vignette.

+

We invite you to share your new feature engineering functions +with the OHDSI community through our GitHub +repository.

+
+
+

Feature Engineering Function Code Structure +

+

To make a custom feature engineering function that can be used within +PatientLevelPrediction you need to write two functions: the +‘create’ function and the ‘implement’ function.

+

The ‘create’ function, e.g., +create<FeatureEngineeringFunctionName>, takes the parameters of +the feature engineering ‘implement’ function as input, checks these are +valid and outputs these as a list of class ‘featureEngineeringSettings’ +with the ‘fun’ attribute specifying the ‘implement’ function to +call.

+

The ‘implement’ function, e.g., +implement<FeatureEngineeringFunctionName>, must take as input:

+
    +
  • +

    trainData - a list containing:

    +
      +
    • covariateData: the +plpData$covariateData restricted to the training +patients

    • +
    • labels: a data frame that contains +rowId (patient identifier) and outcomeCount +(the class labels)

    • +
    • folds: a data.frame that contains rowId +(patient identifier) and index (the cross validation +fold)

    • +
    +
  • +
  • featureEngineeringSettings - the output of your +create<FeatureEngineeringFunctionName>

  • +
+

The ‘implement’ function can then do any manipulation of the +trainData (adding new features or removing features) but +must output a trainData object containing the new +covariateData, labels and folds +for the training data patients.

+
+
+

Example +

+

Let’s consider the situation where we wish to create an age spline +feature. To make this custom feature engineering function we need to +write the ‘create’ and ‘implement’ R functions.

+
+

Create function +

+

Our age spline feature function will create a new feature using the +plpData$cohorts$ageYear column. We will implement a +restricted cubic spline that requires specifying the number of knots. +Therefore, the inputs for this are: knots - an +integer/double specifying the number of knots.

+
+createAgeSpline <- function(
+                     knots = 5
+                     ){
+  
+  # create list of inputs to implement function
+  featureEngineeringSettings <- list(
+    knots = knots
+    )
+  
+  # specify the function that will implement the feature engineering
+  attr(featureEngineeringSettings, "fun") <- "implementAgeSplines"
+
+  # make sure the object returned is of class "featureEngineeringSettings"
+  class(featureEngineeringSettings) <- "featureEngineeringSettings"
+  return(featureEngineeringSettings)
+  
+}
+
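As a quick sanity check, the settings object can be inspected before it is passed on. The sketch below repeats the definition of createAgeSpline() from above so that it runs on its own, and uses only base R; the variable name `settings` is illustrative and not part of the package.

```r
# Definition repeated from above so this sketch is self-contained
createAgeSpline <- function(knots = 5) {
  featureEngineeringSettings <- list(knots = knots)
  attr(featureEngineeringSettings, "fun") <- "implementAgeSplines"
  class(featureEngineeringSettings) <- "featureEngineeringSettings"
  return(featureEngineeringSettings)
}

# 'settings' is an illustrative variable name
settings <- createAgeSpline(knots = 4)

settings$knots         # the number of knots that was requested
attr(settings, "fun")  # name of the 'implement' function to call
class(settings)        # the class that marks this as feature engineering settings
```

Checking the ‘fun’ attribute and the class at this point is a cheap way to catch a misspelled ‘implement’ function name before any data are processed.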

We now need to create the ‘implement’ function +implementAgeSplines()

+
+
+

Implement function +

+

All ‘implement’ functions must take as input the +trainData and the featureEngineeringSettings +(this is the output of the ‘create’ function). They must return a +trainData object containing the new +covariateData, labels and +folds.

+

In our example, the createAgeSpline() will return a list +with ‘knots’. The featureEngineeringSettings therefore +contains this.

+
+implementAgeSplines <- function(trainData, featureEngineeringSettings, model=NULL) {
+  # if there is a model, this function was called through applyFeatureengineering,
+  # so it should apply the model fitted on the training data to the test data
+  if (is.null(model)) {
+    knots <- featureEngineeringSettings$knots
+    ageData <- trainData$labels
+    y <- ageData$outcomeCount
+    X <- ageData$ageYear
+    model <- mgcv::gam(
+      y ~ s(X, bs='cr', k=knots, m=2)
+    )
+    newData <- data.frame(
+      rowId = ageData$rowId,
+      covariateId = 2002,
+      covariateValue = model$fitted.values
+    )
+  }
+  else {
+    ageData <- trainData$labels
+    X <- trainData$labels$ageYear
+    y <- ageData$outcomeCount
+    newData <- data.frame(y=y, X=X)
+    yHat <- predict(model, newData)
+    newData <- data.frame(
+      rowId = trainData$labels$rowId,
+      covariateId = 2002,
+      covariateValue = yHat
+    )
+  }
+  
+  # remove existing age if in covariates 
+  trainData$covariateData$covariates <- trainData$covariateData$covariates |> 
+    dplyr::filter(!covariateId %in% c(1002))
+  
+  # update covRef
+  Andromeda::appendToTable(trainData$covariateData$covariateRef, 
+                           data.frame(covariateId=2002,
+                                      covariateName='Cubic restricted age splines',
+                                      analysisId=2,
+                                      conceptId=2002))
+  
+  # update covariates
+  Andromeda::appendToTable(trainData$covariateData$covariates, newData)
+  
+  featureEngineering <- list(
+    funct = 'implementAgeSplines',
+    settings = list(
+      featureEngineeringSettings = featureEngineeringSettings,
+      model = model
+    )
+  )
+  
+  attr(trainData$covariateData, 'metaData')$featureEngineering = listAppend(
+    attr(trainData$covariateData, 'metaData')$featureEngineering,
+    featureEngineering
+  )
+  
+  return(trainData)
+}
+
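The function above stores its own name, its settings and the fitted model so that the same transformation can later be replayed on new data. The following base R sketch illustrates that idea with a toy function; `implementToyShift` and the `ageCentered` column are hypothetical, and the metadata is stored directly on the list rather than on the covariateData, so this is a simplification and not the package's actual mechanism.

```r
# Hypothetical toy 'implement' function: centres age on the training mean
implementToyShift <- function(trainData, featureEngineeringSettings, model = NULL) {
  if (is.null(model)) {
    # no model supplied: learn the shift from the (training) data
    model <- list(shift = mean(trainData$labels$ageYear))
  }
  trainData$labels$ageCentered <- trainData$labels$ageYear - model$shift

  # record the function name, settings and model so the step can be replayed
  attr(trainData, "featureEngineering") <- list(
    funct = "implementToyShift",
    settings = list(
      featureEngineeringSettings = featureEngineeringSettings,
      model = model
    )
  )
  return(trainData)
}

train <- list(labels = data.frame(rowId = 1:3, ageYear = c(40, 50, 60)))
train <- implementToyShift(train, featureEngineeringSettings = list())

# replay on new data using the stored function name and settings
fe <- attr(train, "featureEngineering")
test <- list(labels = data.frame(rowId = 4:5, ageYear = c(45, 70)))
test <- do.call(fe$funct, c(list(trainData = test), fe$settings))

test$labels$ageCentered # centred with the training mean, not the test mean
```

Passing the stored model back in is what keeps the test data from influencing the transformation.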
+
+
+

Acknowledgments +

+

Considerable work has been dedicated to provide the +PatientLevelPrediction package.

+
+citation("PatientLevelPrediction")
+
## 
+## To cite PatientLevelPrediction in publications use:
+## 
+##   Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek P (2018). "Design
+##   and implementation of a standardized framework to generate and
+##   evaluate patient-level prediction models using observational
+##   healthcare data." _Journal of the American Medical Informatics
+##   Association_, *25*(8), 969-975.
+##   <https://doi.org/10.1093/jamia/ocy032>.
+## 
+## A BibTeX entry for LaTeX users is
+## 
+##   @Article{,
+##     author = {J. M. Reps and M. J. Schuemie and M. A. Suchard and P. B. Ryan and P. Rijnbeek},
+##     title = {Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data},
+##     journal = {Journal of the American Medical Informatics Association},
+##     volume = {25},
+##     number = {8},
+##     pages = {969-975},
+##     year = {2018},
+##     url = {https://doi.org/10.1093/jamia/ocy032},
+##   }
+

Please reference this paper if you use the PLP Package in +your work:

+

Reps JM, Schuemie +MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a +standardized framework to generate and evaluate patient-level prediction +models using observational healthcare data. J Am Med Inform Assoc. +2018;25(8):969-975.

+

This work is supported in part through the National Science +Foundation grant IIS 1251151.

+
+
+ + + +
+ + + + +
diff --git a/docs/articles/AddingCustomModels.html b/docs/articles/AddingCustomModels.html
new file mode 100644
index 000000000..3cafc9a9e
--- /dev/null
+++ b/docs/articles/AddingCustomModels.html
@@ -0,0 +1,780 @@
+Adding Custom Patient-Level Prediction Algorithms • PatientLevelPrediction
+
+ + + + +
+
+ + + + + +
+

Introduction +

+

This vignette describes how you can add your own custom algorithms in +the Observational Health Data Sciences and Informatics (OHDSI) PatientLevelPrediction +package. This allows you to fully leverage the OHDSI +PatientLevelPrediction framework for model development and validation. +This vignette assumes you have read and are comfortable with building +single patient level prediction models as described in the BuildingPredictiveModels +vignette.

+

We invite you to share your new algorithms with the OHDSI +community through our GitHub +repository.

+
+
+

Algorithm Code Structure +

+

Each algorithm in the package should be implemented in its own +<Name>.R file, e.g. KNN.R, containing a set<Name> function, +a fit<Name> function and a predict<Name> function. +Occasionally the fit and prediction functions may be reused (if using an +R classifier see RClassifier.R or if using a scikit-learn classifier see +SklearnClassifier.R). We will now describe each of these functions in +more detail below.

+
+

Set +

+

The set<Name> function takes as input the different +hyper-parameter values to use in a grid search when training. The output of +the function needs to be a list of class modelSettings +containing:

+
    +
  • param - all the combinations of the hyper-parameter values +input
  • +
  • fitFunction - a string specifying what function to call to fit the +model
  • +
+

The param object can have a settings attribute containing any extra +settings. For example, to specify the model name and the seed used for +reproducibility:

+
+attr(param, 'settings') <- list(
+  seed = 12,
+  modelName = "Special classifier"
+  )
+

For example, if you were adding a model called madeUp that has two +hyper-parameters then the set function should be:

+
+setMadeUp <- function(a=c(1,4,10), b=2, seed=NULL){
+  # add input checks here...
+  
+  param <- split(
+    expand.grid(
+      a=a, 
+      b=b
+    ),
+    1:(length(a)*length(b))
+    )
+  
+  attr(param, 'settings') <- list(
+    modelName = "Made Up",
+    requiresDenseMatrix = TRUE,
+    seed = seed
+    )
+  
+  # now create list of all combinations:
+  result <- list(
+    fitFunction = 'fitMadeUp', # this will be called to train the made up model
+    param = param
+  )
+  class(result) <- 'modelSettings' 
+  
+  return(result)
+}
+
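To see what the param object built by split() and expand.grid() looks like, the grid construction can be run on its own. This is a minimal base R sketch using the default values of a and b from setMadeUp(); the variable name `grid` is illustrative.

```r
a <- c(1, 4, 10)
b <- 2

# every combination of the hyper-parameter values, one row per combination
grid <- expand.grid(a = a, b = b)

# split into a list with one one-row data.frame per combination
param <- split(grid, 1:(length(a) * length(b)))

length(param)  # number of hyper-parameter combinations
param[[2]]     # the second combination as a one-row data.frame
param[[2]]$a   # value of a in the second combination
```

Each element of param can then be passed to the fit function as a single hyper-parameter setting, e.g. param[[i]]$a and param[[i]]$b.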
+
+

Fit +

+

This function should train your custom model for each parameter +entry, pick the best parameters and train a final model for that +setting.

+

The fit<Model> should have as inputs:

+
    +
  • trainData - a list containing the covariateData, labels and folds +for the training population
  • +
  • param - the hyper-parameters as a list of all combinations
  • +
  • search - the type of hyper-parameter search
  • +
  • analysisId - an identifier for the analysis
  • +
+

The fit function should return a list of class plpModel +with the following objects:

+
    +
  • model - a trained model (or location of the model if it is not an R +object)
  • +
  • prediction - a data.frame object with the trainData$labels plus an +extra column with the name ‘value’ corresponding to the predicted risk +of having the outcome during the time-at-risk.
  • +
  • preprocessing - the settings required to preprocess the data when +applying the model +
      +
    • featureEngineering - the feature engineering settings e.g., +attr(trainData$covariateData, "metaData")$featureEngineering,
    • +
    • tidyCovariates - the preprocessing settings e.g., +attr(trainData$covariateData, "metaData")$tidyCovariateDataSettings,
    • +
    • requireDenseMatrix - does the model require a dense matrix? e.g., +attr(param, 'settings')$requiresDenseMatrix,
    • +
    +
  • +
  • modelDesign - a list containing: +
      +
    • targetId - the id of the target cohort
    • +
    • outcomeId - the id of the outcome cohort
    • +
    • plpDataSettings - the plpData settings e.g., attr(trainData, +"metaData")$plpDataSettings
    • +
    • covariateSettings - the covariate settings e.g., attr(trainData, +"metaData")$covariateSettings
    • +
    • populationSettings - the population settings e.g., attr(trainData, +"metaData")$populationSettings,
    • +
    • featureEngineeringSettings - the feature engineering settings e.g., +attr(trainData$covariateData, "metaData")$featureEngineeringSettings,
    • +
    • preprocessSettings - the preprocessing settings e.g., +attr(trainData$covariateData, "metaData")$preprocessSettings,
    • +
    • modelSettings - a list containing: model (model name), param (the +hyper-parameter search list), finalModelParameters (the final model +hyper-parameters), extraSettings (any extra settings)
    • +
    • splitSettings - the split settings e.g., attr(trainData, +"metaData")$splitSettings,
    • +
    • sampleSettings - the sample settings e.g., attr(trainData, +"metaData")$sampleSettings
    • +
    +
  • +
  • trainDetails - a list containing: +
      +
    • analysisId - the identifier for the analysis
    • +
    • developmentDatabase - the database used to develop the model
    • +
    • attrition - the attrition
    • +
    • trainingTime - how long it took to train the model
    • +
    • trainingDate - date of model training
    • +
    • hyperParamSearch - the hyper-parameter search used to train the +model
    • +
    • any other objects specific to training
    • +
    +
  • +
  • covariateImportance - a data.frame containing the columns +‘covariateId’, ‘covariateValue’ (the variable importance) and ‘columnId’ +(the column number that the variable needs to be mapped to when +implementing the model)
  • +
+

In addition, the plpModel requires two attributes:

+
    +
  • predictionFunction - the name of the function used to make +predictions
  • +
  • modelType - whether the model is ‘binary’ or ‘survival’
  • +
+

For example, +attr(result, 'predictionFunction') <- 'madeupPrediction' +means that when the model is applied to new data, the ‘madeupPrediction’ +function is called to make predictions. If this function doesn’t exist, +applying the model will fail. The other attribute is the modelType, set with +attr(result, 'modelType') <- 'binary'. It is needed +when evaluating the model to ensure the correct evaluation is applied. +Currently the evaluation supports the ‘binary’ and ‘survival’ modelType.

+

Note: If a new modelType is desired, then the evaluation code within +PatientLevelPrediction must be updated to specify how the new type is +evaluated. This requires making edits to PatientLevelPrediction and then +making a pull request to the PatientLevelPrediction GitHub repository. The +evaluation cannot have one-off customization because the evaluation must +be standardized to enable comparison across similar models.

+

A full example of a custom ‘binary’ classifier fit function is:

+
+fitMadeUp <- function(trainData, modelSettings, search, analysisId){
+  
+  param <- modelSettings$param
+  
+  # **************** code to train the model here
+  # trainedModel <- this code should apply each hyper-parameter combination   
+  # (param[[i]]) using the specified search (e.g., cross validation)
+  #                 then pick out the best hyper-parameter setting
+  #                 and finally fit a model on the whole train data using the 
+  #                 optimal hyper-parameter settings
+  # ****************
+  
+  # **************** code to apply the model to trainData
+  # prediction <- code to apply trainedModel to trainData
+  # ****************
+  
+  # **************** code to get variable importance (if possible)
+  # varImp <- code to get importance of each variable in trainedModel
+  # ****************
+  
+  
+  # construct the standard output for a model:
+  result <- list(model = trainedModel,
+                 prediction = prediction, # the train and maybe the cross validation predictions for the trainData
+                 preprocessing = list(
+                   featureEngineering = attr(trainData$covariateData, "metaData")$featureEngineering,
+                   tidyCovariates = attr(trainData$covariateData, "metaData")$tidyCovariateDataSettings,
+                   requireDenseMatrix = attr(param, 'settings')$requiresDenseMatrix
+                 ),
+    modelDesign = list(
+      outcomeId = attr(trainData, "metaData")$outcomeId,
+      targetId = attr(trainData, "metaData")$targetId,
+      plpDataSettings = attr(trainData, "metaData")$plpDataSettings,
+      covariateSettings = attr(trainData, "metaData")$covariateSettings,
+      populationSettings = attr(trainData, "metaData")$populationSettings,
+      featureEngineeringSettings = attr(trainData$covariateData, "metaData")$featureEngineeringSettings,
+      preprocessSettings = attr(trainData$covariateData, "metaData")$preprocessSettings, 
+      modelSettings = list(
+        model = attr(param, 'settings')$modelName, # the model name
+        param = param,
+        finalModelParameters = param[[bestInd]], # best hyper-parameters
+        extraSettings = attr(param, 'settings')
+      ),
+      splitSettings = attr(trainData, "metaData")$splitSettings,
+      sampleSettings = attr(trainData, "metaData")$sampleSettings
+    ),
+    
+    trainDetails = list(
+      analysisId = analysisId,
+      developmentDatabase = attr(trainData, "metaData")$cdmDatabaseSchema,
+      attrition = attr(trainData, "metaData")$attrition, 
+      trainingTime = timeToTrain, # how long it took to train the model
+      trainingDate = Sys.Date(),
+      hyperParamSearch = hyperSummary # the hyper-parameters and performance data.frame
+    ),
+    covariateImportance = merge(trainData$covariateData$covariateRef, varImp, by='covariateId') # add variable importance to covariateRef if possible
+  )
+  class(result) <- 'plpModel'
+  attr(result, 'predictionFunction') <- 'madeupPrediction'
+  attr(result, 'modelType') <- 'binary'
+  return(result)
+    
+}
+

You could make the fitMadeUp function cleaner by adding helper +functions in the MadeUp.R file that are called by the fit function (for +example, a function to run cross validation). It is important to ensure +there is a valid prediction function (the one specified by +attr(result, 'predictionFunction') <- 'madeupPrediction' +is madeupPrediction()) as specified below.

+
+
+

Predict +

+

The prediction function takes as input the plpModel returned by fit, +new data and a corresponding cohort. It returns a data.frame with the +same columns as cohort but with an additional column:

+
    +
  • value - the predicted risk from the plpModel for each patient in the +cohort
  • +
+

For example:

+
+madeupPrediction <- function(plpModel, data, cohort){ 
+
+  # ************* code to do prediction for each rowId in cohort
+  # predictionValues <- code to do prediction here returning the predicted risk
+  #               (value) for each rowId in cohort 
+  #**************
+  
+  prediction <- merge(cohort, predictionValues, by='rowId')
+  attr(prediction, "metaData") <- list(modelType = attr(plpModel, 'modelType')) 
+  return(prediction)
+  
+}
+
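The merge step in the template can be tried out with toy data. The sketch below is a base R illustration only; the `cohort` and `predictionValues` data frames are made-up examples, not output of the package.

```r
# toy cohort: one row per patient
cohort <- data.frame(
  rowId = c(1, 2, 3),
  outcomeCount = c(0, 1, 0)
)

# toy predicted risks, deliberately in a different row order
predictionValues <- data.frame(
  rowId = c(3, 1, 2),
  value = c(0.10, 0.25, 0.80)
)

# merge() matches on rowId, so the row order of the two inputs does not matter
prediction <- merge(cohort, predictionValues, by = "rowId")
attr(prediction, "metaData") <- list(modelType = "binary")

prediction
```

Merging by rowId, rather than binding columns, guards against the predicted values being returned in a different order than the cohort.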
+
+
+

Algorithm Example +

+

A complete algorithm example is given below. We also +highly recommend that you have a look at the available algorithms in the +package (see GradientBoostingMachine.R for the set function, +RClassifier.R for the fit and prediction function for R +classifiers).

+
+

Set +

+
+setMadeUp <- function(a=c(1,4,6), b=2, seed=NULL){
+  # add input checks here...
+  
+  if(is.null(seed)){
+    seed <- sample(100000,1)
+  }
+  
+  param <- split(
+    expand.grid(
+      a=a, 
+      b=b
+    ),
+    1:(length(a)*length(b))
+    )
+  
+  attr(param, 'settings') <- list(
+    modelName = "Made Up",
+    requiresDenseMatrix = TRUE,
+    seed = seed
+    )
+  
+  # now create list of all combinations:
+  result <- list(
+    fitFunction = 'fitMadeUp', # this will be called to train the made up model
+    param = param
+  )
+  class(result) <- 'modelSettings' 
+  
+  return(result)
+}
+
+
+

Fit +

+
fitMadeUp <- function(trainData, modelSettings, search, analysisId){
+
+  # record the start time so the training time can be reported
+  start <- Sys.time()
+
+  # set the seed for reproducibility
+  param <- modelSettings$param
+  set.seed(attr(param, 'settings')$seed)
+  
+  # add folds to labels:
+  trainData$labels <- merge(trainData$labels, trainData$folds, by= 'rowId')
+  # convert data into sparse R Matrix:
+  mappedData <- toSparseM(trainData,map=NULL)
+  matrixData <- mappedData$dataMatrix
+  labels <- mappedData$labels
+  covariateRef <- mappedData$covariateRef
+
+  #============= STEP 1 ======================================
+  # pick the best hyper-params and then do final training on all data...
+  writeLines('Cross validation')
+  param.sel <- lapply(
+    param, 
+    function(x){
+      do.call(
+        made_up_model, 
+        list(
+          param = x, 
+          final = F, 
+          data = matrixData, 
+          labels = labels
+          )  
+      )
+      }
+    )
+  hyperSummary <- do.call(rbind, lapply(param.sel, function(x) x$hyperSum))
+  hyperSummary <- as.data.frame(hyperSummary)
+  # keep the AUCs in their own vector so param.sel is not overwritten
+  aucs <- unlist(lapply(param.sel, function(x) x$auc))
+  hyperSummary$auc <- aucs
+  bestInd <- which.max(aucs)
+  
+  # get cross validation prediction for the best hyper-parameters
+  prediction <- param.sel[[bestInd]]$prediction
+  prediction$evaluationType <- 'CV'
+  
+  writeLines('final train')
+  finalResult <- do.call(
+    made_up_model, 
+    list(
+      param = param[[bestInd]], 
+      final = T, 
+      data = matrixData, 
+      labels = labels
+      )  
+    )
+  
+  trainedModel <- finalResult$model
+  
+  # prediction risk on training data:
+  finalResult$prediction$evaluationType <- 'Train'
+  
+  # get CV and train prediction
+  prediction <- rbind(prediction, finalResult$prediction)
+  
+  varImp <- covariateRef %>% dplyr::collect()
+  # no feature importance available
+  varImp$covariateValue <- 0 
+  
+  timeToTrain <- Sys.time() - start
+
+  # construct the standard output for a model:
+  result <- list(model = trainedModel,
+                 prediction = prediction, 
+    preprocessing = list(
+      featureEngineering = attr(trainData$covariateData, "metaData")$featureEngineering,
+      tidyCovariates = attr(trainData$covariateData, "metaData")$tidyCovariateDataSettings,
+      requireDenseMatrix = attr(param, 'settings')$requiresDenseMatrix
+    ),
+    modelDesign = list(
+      outcomeId = attr(trainData, "metaData")$outcomeId,
+      targetId = attr(trainData, "metaData")$targetId,
+      plpDataSettings = attr(trainData, "metaData")$plpDataSettings,
+      covariateSettings = attr(trainData, "metaData")$covariateSettings,
+      populationSettings = attr(trainData, "metaData")$populationSettings,
+      featureEngineeringSettings = attr(trainData$covariateData, "metaData")$featureEngineeringSettings,
+      preprocessSettings = attr(trainData$covariateData, "metaData")$preprocessSettings, 
+      modelSettings = list(
+        model = attr(param, 'settings')$modelName, # the model name
+        param = param,
+        finalModelParameters = param[[bestInd]], # best hyper-parameters
+        extraSettings = attr(param, 'settings')
+      ),
+      splitSettings = attr(trainData, "metaData")$splitSettings,
+      sampleSettings = attr(trainData, "metaData")$sampleSettings
+    ),
+    
+    trainDetails = list(
+      analysisId = analysisId,
+      developmentDatabase = attr(trainData, "metaData")$cdmDatabaseSchema,
+      attrition = attr(trainData, "metaData")$attrition, 
+      trainingTime = timeToTrain, # how long it took to train the model
+      trainingDate = Sys.Date(),
+      hyperParamSearch = hyperSummary # the hyper-parameters and performance data.frame
+    ),
+    covariateImportance = varImp # no variable importance available, so every value is 0
+  )
+  class(result) <- 'plpModel'
+  attr(result, 'predictionFunction') <- 'madeupPrediction'
+  attr(result, 'modelType') <- 'binary'
+  return(result)
+    
+}
+
+
+

Helpers +

+

The fit function calls a helper function, made_up_model. +This is the function that trains a model given the data, labels +and hyper-parameters.

+
+made_up_model <- function(param, data, final=F, labels){
+  
+  if(final==F){
+    # add value column to store all predictions
+    labels$value <- rep(0, nrow(labels))
+    attr(labels, "metaData") <- list(modelType = "binary")
+    
+    foldPerm <- c() # this holds CV aucs
+    for(index in 1:max(labels$index)){
+      model <- madeup::model(
+        x = data[labels$index!=index,], # remove left out fold
+        y = labels$outcomeCount[labels$index!=index],
+        a = param$a, 
+        b = param$b
+      )
+      
+      # predict on left out fold
+      pred <- stats::predict(model, data[labels$index==index,])
+      labels$value[labels$index==index] <- pred
+      
+      # calculate auc on held out fold  
+      aucVal <- computeAuc(labels[labels$index==index,])
+      foldPerm<- c(foldPerm,aucVal)    
+    }
+    auc <- computeAuc(labels) # overall AUC
+
+  } else {
+    model <- madeup::model(
+      x = data, 
+      y = labels$outcomeCount,
+      a = param$a,
+      b = param$b
+      )
+    
+    pred <- stats::predict(model, data)
+    labels$value <- pred
+    attr(labels, "metaData") <- list(modelType = "binary") 
+    auc <- computeAuc(labels)
+    foldPerm <- auc
+  }
+  
+  result <- list(
+    model = model,
+    auc = auc,
+    prediction = labels,
+    hyperSum = c(a = param$a, b = param$b, fold_auc = foldPerm)
+  )
+  
+  return(result)
+}
+
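The helper relies on an AUC function applied per fold and overall. To make the fold logic concrete without the package, the sketch below uses a small rank-based AUC as a stand-in; `rankAuc` is a hypothetical helper written for this illustration and is not the package's computeAuc, and the labels are toy data.

```r
# Hypothetical stand-in for an AUC helper: rank-based (Mann-Whitney) AUC
rankAuc <- function(outcome, value) {
  nPos <- sum(outcome == 1)
  nNeg <- sum(outcome == 0)
  r <- rank(value) # ties receive average ranks
  (sum(r[outcome == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# toy labels: 'index' is the cross validation fold, 'value' the held-out prediction
labels <- data.frame(
  rowId = 1:8,
  index = rep(1:2, each = 4),
  outcomeCount = c(0, 0, 1, 1, 0, 1, 0, 1),
  value = c(0.1, 0.2, 0.8, 0.9, 0.1, 0.2, 0.8, 0.9)
)

# AUC within each held-out fold
foldAuc <- sapply(sort(unique(labels$index)), function(i) {
  fold <- labels[labels$index == i, ]
  rankAuc(fold$outcomeCount, fold$value)
})

# AUC over all held-out predictions pooled together
overallAuc <- rankAuc(labels$outcomeCount, labels$value)

foldAuc
overallAuc
```

The pooled AUC is generally not the mean of the per-fold AUCs, which is why the helper reports both.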
+
+

Predict +

+

The final step is to create a predict function for the model. In the +example above the prediction function was set with +attr(result, 'predictionFunction') <- 'madeupPrediction', +so a madeupPrediction function is +required when applying the model. The predict function needs to take as +input the plpModel returned by the fit function, new data to apply the +model on and the cohort specifying the patients of interest to make the +prediction for.

+
+madeupPrediction <- function(plpModel, data , cohort){ 
+  
+  if(inherits(data, 'plpData')){
+    # convert
+    matrixObjects <- toSparseM(
+      plpData = data, 
+      cohort = cohort,
+      map = plpModel$covariateImportance %>% 
+        dplyr::select("columnId", "covariateId")
+    )
+    
+    newData <- matrixObjects$dataMatrix
+    cohort <- matrixObjects$labels
+    
+  }else{
+    newData <- data
+  }
+  
+  if(inherits(plpModel, 'plpModel')){
+    model <- plpModel$model
+  } else{
+    model <- plpModel
+  }
+  
+  # predict on the converted matrix, not on the original input
+  cohort$value <- stats::predict(model, newData)
+  
+  # fix the rowIds to be the old ones
+  # now use the originalRowId and remove the matrix rowId
+  cohort <- cohort %>% 
+    dplyr::select(-"rowId") %>%
+    dplyr::rename(rowId = "originalRowId")
+  
+  attr(cohort, "metaData") <- list(modelType = attr(plpModel, 'modelType')) 
+  return(cohort)
+  
+}
+

As the madeup model uses the standard R prediction, it has the same +prediction function as xgboost. We could therefore have skipped writing a new +prediction function and instead set the predictionFunction of the +result returned by fitMadeUp with +attr(result, 'predictionFunction') <- 'predictXgboost'.

+
+
+
+

Acknowledgments +

+

Considerable work has been dedicated to provide the +PatientLevelPrediction package.

+
+citation("PatientLevelPrediction")
+
## 
+## To cite PatientLevelPrediction in publications use:
+## 
+##   Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek P (2018). "Design
+##   and implementation of a standardized framework to generate and
+##   evaluate patient-level prediction models using observational
+##   healthcare data." _Journal of the American Medical Informatics
+##   Association_, *25*(8), 969-975.
+##   <https://doi.org/10.1093/jamia/ocy032>.
+## 
+## A BibTeX entry for LaTeX users is
+## 
+##   @Article{,
+##     author = {J. M. Reps and M. J. Schuemie and M. A. Suchard and P. B. Ryan and P. Rijnbeek},
+##     title = {Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data},
+##     journal = {Journal of the American Medical Informatics Association},
+##     volume = {25},
+##     number = {8},
+##     pages = {969-975},
+##     year = {2018},
+##     url = {https://doi.org/10.1093/jamia/ocy032},
+##   }
+

Please reference this paper if you use the PLP Package in +your work:

+

Reps JM, Schuemie +MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a +standardized framework to generate and evaluate patient-level prediction +models using observational healthcare data. J Am Med Inform Assoc. +2018;25(8):969-975.

+

This work is supported in part through the National Science +Foundation grant IIS 1251151.

+
+
+ + + +
+ + + + +
diff --git a/docs/articles/AddingCustomSamples.html b/docs/articles/AddingCustomSamples.html
new file mode 100644
index 000000000..84ccdde27
--- /dev/null
+++ b/docs/articles/AddingCustomSamples.html
@@ -0,0 +1,367 @@
+Adding Custom Sampling Functions • PatientLevelPrediction
+
+ + + + +
+
+ + + + + +
+

Introduction +

+

This vignette describes how you can add your own custom function for +sampling the target population in the Observational Health Data Sciences +and Informatics (OHDSI) PatientLevelPrediction +package. This vignette assumes you have read and are comfortable with +building single patient level prediction models as described in the BuildingPredictiveModels +vignette.

+

We invite you to share your new sample functions with the +OHDSI community through our GitHub +repository.

+
+
+

Sample Function Code Structure +

+

To make a sampling function that can be used within +PatientLevelPrediction you need to write two functions: the +‘create’ function and the ‘implement’ function.

+

The ‘create’ function, e.g., create<SampleFunctionName>, takes +the parameters of the sample ‘implement’ function as input, checks these +are valid and outputs these as a list of class ‘sampleSettings’ with the +‘fun’ attribute specifying the ‘implement’ function to call.

+

The ‘implement’ function, e.g., implement<SampleFunctionName>, +must take as input:

  • trainData - a list containing:
    • covariateData: the plpData$covariateData restricted to the training patients
    • labels: a data frame that contains rowId (patient identifier) and outcomeCount (the class labels)
    • folds: a data.frame that contains rowId (patient identifier) and index (the cross validation fold)
  • sampleSettings - the output of your create<SampleFunctionName>

+

The ‘implement’ function can then do any manipulation of the +trainData (such as undersampling or oversampling) but must output a +trainData object containing the covariateData, labels and folds for the +new training data sample.
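
As an illustration of the kind of manipulation meant here, the following base R sketch undersamples the non-outcome patients so that both classes are the same size. It works on plain toy data frames and is an assumption-laden simplification: in a real ‘implement’ function the covariateData would also have to be restricted to the kept rowIds, and all variable names below are illustrative.

```r
set.seed(42)

# toy labels and folds for ten patients, two of whom have the outcome
labels <- data.frame(rowId = 1:10, outcomeCount = c(1, 1, rep(0, 8)))
folds <- data.frame(rowId = 1:10, index = rep(1:2, times = 5))

outcomeIds <- labels$rowId[labels$outcomeCount > 0]
nonOutcomeIds <- labels$rowId[labels$outcomeCount == 0]

# keep every outcome patient plus an equally sized random set of non-outcome patients
keepIds <- c(outcomeIds, sample(nonOutcomeIds, length(outcomeIds)))

# labels and folds must be restricted to the same patients
sampledLabels <- labels[labels$rowId %in% keepIds, ]
sampledFolds <- folds[folds$rowId %in% keepIds, ]

table(sampledLabels$outcomeCount)
```

Whatever sampling scheme is used, the labels, folds and covariateData returned must all refer to the same set of rowIds.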

+
+
+

Example +

+

Let’s consider the situation where we wish to take a random sample of +the training data population. To make this custom sampling function we +need to write the ‘create’ and ‘implement’ R functions.

+
+

Create function +

+

Our random sampling function will randomly sample n +patients from the trainData. Therefore, the inputs for this are:

  • n - an integer/double specifying the number of patients to sample
  • sampleSeed - an integer/double specifying the seed for reproducibility

+
+createRandomSampleSettings <- function(
+                     n = 10000,
+                     sampleSeed = sample(10000,1)
+                     ){
+  
+  # add input checks
+  checkIsClass(n, c('numeric','integer'))
+  checkHigher(n,0)
+  checkIsClass(sampleSeed, c('numeric','integer'))
+  
+  # create list of inputs to implement function
+  sampleSettings <- list(
+    n = n,
+    sampleSeed  = sampleSeed 
+    )
+  
+  # specify the function that will implement the sampling
+  attr(sampleSettings, "fun") <- "implementRandomSampleSettings"
+
+  # make sure the object returned is of class "sampleSettings"
+  class(sampleSettings) <- "sampleSettings"
+  return(sampleSettings)
+  
+}
+

We now need to create the ‘implement’ function +implementRandomSampleSettings()

+
+
+

Implement function +

+

All ‘implement’ functions must take as input the trainData and the +sampleSettings (this is the output of the ‘create’ function). They must +return a trainData object containing the covariateData, labels and +folds.

+

In our example, the createRandomSampleSettings() will +return a list with ‘n’ and ‘sampleSeed’. The sampleSettings therefore +contains these.

+
+implementRandomSampleSettings <- function(trainData, sampleSettings){
+
+  n <- sampleSettings$n
+  sampleSeed <- sampleSettings$sampleSeed
+  
+  if(n > nrow(trainData$labels)){
+    stop('Sample n bigger than training population')
+  }
+  
+  # set the seed for the randomization
+  set.seed(sampleSeed)
+  
+  # now implement the code to do your desired sampling
+  
+  sampleRowIds <- sample(trainData$labels$rowId, n)
+  
+  sampleTrainData <- list()
+  
+  sampleTrainData$labels <- trainData$labels %>% 
+    dplyr::filter(.data$rowId %in% sampleRowIds) %>% 
+    dplyr::collect()
+  
+  sampleTrainData$folds <- trainData$folds %>% 
+    dplyr::filter(.data$rowId %in% sampleRowIds) %>% 
+    dplyr::collect()
+  
+  sampleTrainData$covariateData <- Andromeda::andromeda()
+  sampleTrainData$covariateData$covariateRef <-trainData$covariateData$covariateRef
+  sampleTrainData$covariateData$covariates <- trainData$covariateData$covariates %>% dplyr::filter(.data$rowId %in% sampleRowIds)
+  
+  #update metaData$populationSize 
+  metaData <- attr(trainData$covariateData, 'metaData')
+  metaData$populationSize = n
+  attr(sampleTrainData$covariateData, 'metaData') <- metaData
+  
+  # make the covariateData the correct class
+  class(sampleTrainData$covariateData) <- 'CovariateData'
+  
+  # return the updated trainData
+  return(sampleTrainData)
+}
+
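The row-selection step can be tried on plain data frames before touching real covariate data. The sketch below uses toy `labels` and `folds` tables and base R subsetting instead of dplyr and Andromeda, so it only illustrates the logic of drawing `n` rowIds with a fixed seed and keeping labels and folds in sync.

```r
# Toy stand-ins for trainData$labels and trainData$folds
labels <- data.frame(rowId = 1:10, outcomeCount = rep(c(0, 1), 5))
folds <- data.frame(rowId = 1:10, index = rep(1:2, each = 5))

n <- 4
set.seed(42) # fixed seed so the same patients are drawn on every run

# draw n patients and restrict both tables to them
sampleRowIds <- sample(labels$rowId, n)
sampledLabels <- labels[labels$rowId %in% sampleRowIds, ]
sampledFolds <- folds[folds$rowId %in% sampleRowIds, ]

# the two tables must describe exactly the same patients
stopifnot(nrow(sampledLabels) == n)
stopifnot(setequal(sampledLabels$rowId, sampledFolds$rowId))
```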
+
+
+

Acknowledgments +

+

Considerable work has been dedicated to provide the +PatientLevelPrediction package.

+
+citation("PatientLevelPrediction")
+
## 
+## To cite PatientLevelPrediction in publications use:
+## 
+##   Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek P (2018). "Design
+##   and implementation of a standardized framework to generate and
+##   evaluate patient-level prediction models using observational
+##   healthcare data." _Journal of the American Medical Informatics
+##   Association_, *25*(8), 969-975.
+##   <https://doi.org/10.1093/jamia/ocy032>.
+## 
+## A BibTeX entry for LaTeX users is
+## 
+##   @Article{,
+##     author = {J. M. Reps and M. J. Schuemie and M. A. Suchard and P. B. Ryan and P. Rijnbeek},
+##     title = {Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data},
+##     journal = {Journal of the American Medical Informatics Association},
+##     volume = {25},
+##     number = {8},
+##     pages = {969-975},
+##     year = {2018},
+##     url = {https://doi.org/10.1093/jamia/ocy032},
+##   }
+

Please reference this paper if you use the PLP Package in +your work:

+

Reps JM, Schuemie +MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a +standardized framework to generate and evaluate patient-level prediction +models using observational healthcare data. J Am Med Inform Assoc. +2018;25(8):969-975.

+

This work is supported in part through the National Science +Foundation grant IIS 1251151.

+
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/AddingCustomSplitting.html b/docs/articles/AddingCustomSplitting.html new file mode 100644 index 000000000..6f56760db --- /dev/null +++ b/docs/articles/AddingCustomSplitting.html @@ -0,0 +1,333 @@ + + + + + + + +Adding Custom Data Splitting Functions • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+

Introduction +

+

This vignette describes how you can add your own custom function for +splitting the labelled data into training data and validation data in +the Observational Health Data Sciences and Informatics (OHDSI) PatientLevelPrediction +package. This vignette assumes you have read and are comfortable with +building single patient level prediction models as described in the BuildingPredictiveModels +vignette.

+

We invite you to share your new data splitting functions with +the OHDSI community through our GitHub +repository.

+
+
+

Data Splitting Function Code Structure +

+

To make a custom data splitting function that can be used within +PatientLevelPrediction you need to write two different functions. The +‘create’ function and the ‘implement’ function.

+

The ‘create’ function, e.g., create<DataSplittingFunction>, +takes the parameters of the data splitting ‘implement’ function as +input, checks these are valid and outputs these as a list of class +‘splitSettings’ with the ‘fun’ attribute specifying the ‘implement’ +function to call.

+

The ‘implement’ function, e.g., +implement<DataSplittingFunction>, must take as input: * +population: a data frame that contains rowId (patient identifier), +ageYear, gender and outcomeCount (the class labels) * splitSettings - +the output of your create<DataSplittingFunction>

+

The ‘implement’ function then needs to implement code to assign each +rowId in the population to a splitId (<0 means in the test data, 0 +means not used and >0 means in the training data with the value +defining the cross validation fold).
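A quick way to sanity-check a split is to tabulate how many patients fall into each set. The helper below, `summariseSplit`, is a hypothetical name and not part of the package. It assumes the convention used by the example implementation in this vignette: a negative index marks the test set, 0 marks unused patients, and a positive index is the cross validation fold in the training set.

```r
# Hypothetical helper: count patients per set for a split data frame
# with columns rowId and index.
summariseSplit <- function(splitIds) {
  stopifnot(all(c("rowId", "index") %in% names(splitIds)))
  # each patient should be assigned exactly once
  stopifnot(!anyDuplicated(splitIds$rowId))

  set <- ifelse(
    splitIds$index < 0, "test",
    ifelse(splitIds$index == 0, "unused", "train")
  )
  table(factor(set, levels = c("test", "unused", "train")))
}

toySplit <- data.frame(rowId = 1:6, index = c(-1, -1, 0, 1, 2, 3))
counts <- summariseSplit(toySplit)
counts
```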

+
+
+

Example +

+

Let’s consider the situation where we wish to create a split where +females are used to train a model but males are used to evaluate the +model.

+
+

Create function +

+

Our gender split function requires a single parameter, the number of +folds used in cross validation. Therefore create a function with a +single nfold input that returns a list of class ‘splitSettings’ with the +‘fun’ attribute specifying the ‘implement’ function we will use.

+
+createGenderSplit <- function(nfold)
+  {
+  
+  # create list of inputs to implement function
+  splitSettings <- list(nfold = nfold)
+  
+  # specify the function that will implement the splitting
+  attr(splitSettings, "fun") <- "implementGenderSplit"
+
+  # make sure the object returned is of class "splitSettings"
+  class(splitSettings) <- "splitSettings"
+  return(splitSettings)
+  
+}
+

We now need to create the ‘implement’ function +implementGenderSplit()

+
+
+

Implement function +

+

All ‘implement’ functions for data splitting must take as input the +population and the splitSettings (this is the output of the ‘create’ +function). They must return a data.frame containing columns: rowId and +index.

+

The index is used to determine whether the patient (identified by the +rowId) is in the test set (index = -1) or train set (index > 0). For +patients in the train set, the value corresponds to the cross validation fold. +For example, if rowId 2 is assigned index 5, then it means the patient +with the rowId 2 is used to train the model and is in fold 5.

+
+implementGenderSplit <- function(population, splitSettings){
+
+  # find the people who are male:
+  males <- population$rowId[population$gender == 8507]
+  females <- population$rowId[population$gender == 8532]
+  
+  splitIds <- data.frame(
+    rowId = c(males, females),
+    index = c(
+      rep(-1, length(males)),
+      sample(1:splitSettings$nfold, length(females), replace = T)
+    )
+  )
+  
+  # return the rowId to index assignments
+  return(splitIds)
+}
+
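To see what the gender split produces, it can be run on a small made-up population. The sketch below repeats the two functions so that it runs on its own; the toy `population` data frame and the direct `do.call()` dispatch on the ‘fun’ attribute are assumptions for illustration only.

```r
createGenderSplit <- function(nfold) {
  splitSettings <- list(nfold = nfold)
  attr(splitSettings, "fun") <- "implementGenderSplit"
  class(splitSettings) <- "splitSettings"
  splitSettings
}

implementGenderSplit <- function(population, splitSettings) {
  # gender concept ids as used in the example: 8507 = male, 8532 = female
  males <- population$rowId[population$gender == 8507]
  females <- population$rowId[population$gender == 8532]
  data.frame(
    rowId = c(males, females),
    index = c(
      rep(-1, length(males)),
      sample(1:splitSettings$nfold, length(females), replace = TRUE)
    )
  )
}

# Made-up population with the columns the 'implement' function expects
population <- data.frame(
  rowId = 1:8,
  ageYear = c(34, 51, 47, 62, 29, 75, 58, 40),
  gender = c(8507, 8532, 8532, 8507, 8532, 8532, 8507, 8532),
  outcomeCount = c(0, 1, 0, 0, 1, 0, 0, 0)
)

set.seed(1)
settings <- createGenderSplit(nfold = 3)
splitIds <- do.call(attr(settings, "fun"), list(population, settings))
merged <- merge(population, splitIds, by = "rowId")
merged[, c("rowId", "gender", "index")]
```

Note that a patient whose gender concept id is neither 8507 nor 8532 would not appear in the returned data frame at all, so a production version may want to assign such patients an index of 0 explicitly.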
+
+
+

Acknowledgments +

+

Considerable work has been dedicated to provide the +PatientLevelPrediction package.

+
+citation("PatientLevelPrediction")
+
## 
+## To cite PatientLevelPrediction in publications use:
+## 
+##   Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek P (2018). "Design
+##   and implementation of a standardized framework to generate and
+##   evaluate patient-level prediction models using observational
+##   healthcare data." _Journal of the American Medical Informatics
+##   Association_, *25*(8), 969-975.
+##   <https://doi.org/10.1093/jamia/ocy032>.
+## 
+## A BibTeX entry for LaTeX users is
+## 
+##   @Article{,
+##     author = {J. M. Reps and M. J. Schuemie and M. A. Suchard and P. B. Ryan and P. Rijnbeek},
+##     title = {Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data},
+##     journal = {Journal of the American Medical Informatics Association},
+##     volume = {25},
+##     number = {8},
+##     pages = {969-975},
+##     year = {2018},
+##     url = {https://doi.org/10.1093/jamia/ocy032},
+##   }
+

Please reference this paper if you use the PLP Package in +your work:

+

Reps JM, Schuemie +MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a +standardized framework to generate and evaluate patient-level prediction +models using observational healthcare data. J Am Med Inform Assoc. +2018;25(8):969-975.

+

This work is supported in part through the National Science +Foundation grant IIS 1251151.

+
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/BenchmarkTasks.html b/docs/articles/BenchmarkTasks.html new file mode 100644 index 000000000..423651309 --- /dev/null +++ b/docs/articles/BenchmarkTasks.html @@ -0,0 +1,343 @@ + + + + + + + +Benchmark Tasks • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+

Benchmark Tasks For Large-Scale Empirical Analyses +

+

Here we provide a set of diverse prediction tasks that can be used +when evaluating the impact of the model design choice when developing +models using observational data.

+ ++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Target Cohort (index) | Outcome | Time-at-risk | Link
Patients with an outpatient visit in 2017 with no prior cancer +(first visit in 2017) | Lung cancer | 1 day - 3 years after index
Patients newly diagnosed with major depressive disorder (date of +first record) | Bipolar | 1 day - 365 days after index
Patients with an outpatient visit in 2019 | Dementia | 1 day - 3 years after index
Patients with an outpatient visit and a positive COVID test | Hospitalization with pneumonia | 1 day - 30 days after index
Patients with an outpatient visit and a positive COVID test | Hospitalization with pneumonia that required intensive services +(ventilation, intubation, tracheotomy, or extracorporeal membrane +oxygenation) or death | 1 day - 30 days after index
Patients with an outpatient visit and a positive COVID test | Death | 1 day - 30 days after index
Patients with T2DM who were treated with metformin and who became +new adult users of one of sulfonylureas, thiazolidinediones, dipeptidyl +peptidase-4 inhibitors, glucagon-like peptide-1 receptor agonists, or +sodium-glucose co-transporter-2 inhibitors (date of secondary drug). +Patients with HF or patients treated with insulin on or prior to the +index date were excluded from the analysis. Patients were required to +have been enrolled for at least 365 days before cohort entry. | Heart Failure | 1 to 365 days
Patients newly diagnosed with atrial fibrillation (date of initial +afib record) | Ischemic stroke | 1 to 365 days
Patients undergoing elective major non-cardiac surgery (date of +surgery). Patients were required to have been enrolled for at least 365 +days before cohort entry. | Earliest of AMI, cardiac arrest or death (MACE) | 0 to 30 days
Patients starting intravitreal Anti-VEGF (date of +administration) | Kidney Failure | 1 to 365 days
Pregnant women (start of pregnancy) | Preeclampsia | During pregnancy
Pregnant women (start of pregnancy) | Still birth | During pregnancy
Patients with COPD (first record) | Cardiovascular event and death | 1-30 days and 1-90 days
Patients starting menopause (first record) | Depression | 1 day - 3 years
Patients with anemia (date of first anemia record) | Colorectal cancer | 1 day - 1 year
Patients with quadriplegia (date of first quadriplegia record) | Death | 1 day - 1 year
Patient undergoing
+
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/BestPractices.html b/docs/articles/BestPractices.html new file mode 100644 index 000000000..49fc4e02f --- /dev/null +++ b/docs/articles/BestPractices.html @@ -0,0 +1,453 @@ + + + + + + + +Best Practice Research • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+

Best practice publications using the OHDSI PatientLevelPrediction +framework +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+Topic + +Research Summary + +Link +
+Problem Specification + +When is prediction suitable in observational data? + +Guidelines needed +
+Data Creation + +Comparison of cohort vs case-control design + +Journal +of Big Data +
+Data Creation + +Addressing loss to follow-up (right censoring) + +BMC +Medical Informatics and Decision Making +
+Data Creation + +Investigating how to address left censoring in features construction + +BMC +Medical Research Methodology +
+Data Creation + +Impact of over/under-sampling + + +Journal of big data +
+Data Creation + +Impact of phenotypes + +Study Done - Paper submitted +
+Model development + +How much data do we need for prediction - Learning curves at scale + +International +Journal of Medical Informatics +
+Model development + +What impact does test/train/validation design have on model performance + +BMJ Open +
+Model development + +What is the impact of the classifier + +JAMIA +
+Model development + +Can we find hyper-parameter combinations per classifier that +consistently lead to good performing models when using claims/EHR data? + +Study needs to be done +
+Model development + +Can we use ensembles to combine different algorithm models within a +database to improve models transportability? + + Caring is +Sharing–Exploiting the Value in Data for Health and Innovation +
+Model development + +Can we use ensembles to combine models developed using different +databases to improve models transportability? + + +BMC Medical Informatics and Decision Making +
+Model development + +Impact of regularization method + + +JAMIA +
+Evaluation + +Why prediction is not suitable for risk factor identification + + Machine +Learning for Healthcare Conference +
+Evaluation + +Iterative pairwise external validation to put validation into context + + +Drug Safety +
+Evaluation + +A novel method to estimate external validation using aggregate +statistics + + Study under review +
+Evaluation + +How should we present model performance? (e.g., new visualizations) + +JAMIA +Open +
+Evaluation + +How to interpret external validation performance (can we figure out why +the performance drops or stays consistent)? + +Study needs to be done +
+Evaluation + +Recalibration methods + +Study needs to be done +
+Evaluation + +Is there a way to automatically simplify models? + +Study +protocol under development +
+
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/BuildingMultiplePredictiveModels.html b/docs/articles/BuildingMultiplePredictiveModels.html new file mode 100644 index 000000000..45d404d03 --- /dev/null +++ b/docs/articles/BuildingMultiplePredictiveModels.html @@ -0,0 +1,567 @@ + + + + + + + +Automatically Build Multiple Patient-Level Predictive Models • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+

Introduction +

+

In our paper, +we propose a standardised framework for patient-level prediction that +utilizes the OMOP CDM and standardized vocabularies, and describe the +open-source software that we developed implementing the framework’s +pipeline. The framework is the first to enforce existing best practice +guidelines and will enable open dissemination of models that can be +extensively validated across the network of OHDSI collaborators.

+

One of our best practices is that we see the selection of models and all +study settings as an empirical question, i.e. we should use a data-driven +approach in which we try many settings. This vignette describes how you +can use the Observational Health Data Sciences and Informatics (OHDSI) PatientLevelPrediction +package to automatically build multiple patient-level predictive models, +e.g. different population settings, covariate settings, and +model settings. This vignette assumes you have read and are comfortable +with building single patient level prediction models as described in the +BuildingPredictiveModels +vignette.

+

Note that it is also possible to generate a Study Package directly in +Atlas that allows for multiple patient-level prediction analyses, but this is +out of scope for this vignette.

+
+
+

Creating a model design +

+

The first step is to specify each model you wish to develop by using +the createModelDesign function. This function requires the +following:

+ + ++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
The inputs for the model design
Input | Description
targetId | The id for the target cohort
outcomeId | The id for the outcome
restrictPlpDataSettings | The settings used to restrict the target population, +created with createRestrictPlpDataSettings()
populationSettings | The settings used to restrict the target population and +create the outcome labels, created with +createStudyPopulationSettings()
covariateSettings | The settings used to define the covariates, created +with FeatureExtraction::createDefaultCovariateSettings()
sampleSettings | The settings used to define any under/over sampling, +created with createSampleSettings()
featureEngineeringSettings | The settings used to define any feature engineering, +created with createFeatureEngineeringSettings()
preprocessSettings | The settings used to define any preprocessing, created +with createPreprocessSettings()
modelSettings | The settings used to define the model fitting settings, +such as setLassoLogisticRegression()
+
+

Model design example 1 +

+

For example, suppose we want to predict the outcome (id 2) occurring for +the first time within 180 days of the target population index date +(id 1). We are only interested in index dates between 2018 and 2020. +Finally, we only want to use gender, age in 5-year buckets and +conditions as features. The model can be specified by:

+
+# Model 1 is only using data between 2018-2020:
+restrictPlpDataSettings <- createRestrictPlpDataSettings(
+  studyStartDate = '20180101', 
+  studyEndDate = '20191231'
+  )
+
+# predict outcome within 1 to 180 days after index
+# remove people with outcome prior and with < 365 days observation
+populationSettings <- createStudyPopulationSettings(
+  binary = T, 
+  firstExposureOnly = T, 
+  washoutPeriod = 365, 
+  removeSubjectsWithPriorOutcome = T,
+  priorOutcomeLookback = 9999,
+  requireTimeAtRisk = F, 
+  riskWindowStart = 1, 
+  riskWindowEnd = 180
+)
+
+# use age/gender in groups and condition groups as features
+covariateSettings <- FeatureExtraction::createCovariateSettings(
+  useDemographicsGender = T, 
+  useDemographicsAgeGroup = T, 
+  useConditionGroupEraAnyTimePrior = T
+)
+
+modelDesign1 <- createModelDesign(
+  targetId = 1, 
+  outcomeId = 2, 
+  restrictPlpDataSettings = restrictPlpDataSettings, 
+  populationSettings = populationSettings, 
+  covariateSettings = covariateSettings, 
+  featureEngineeringSettings = createFeatureEngineeringSettings(),
+  sampleSettings = createSampleSettings(), 
+  splitSettings = createDefaultSplitSetting(), 
+  preprocessSettings = createPreprocessSettings(), 
+  modelSettings = setLassoLogisticRegression()
+  )
+
+
+

Model design example 2 +

+

For the second example, we want to predict the outcome (id 2) +occurring for the first time within 730 days of the target population +index date (id 1). We want to train a random forest classifier. Finally, +we want to use gender, age in 5-year buckets, drug ingredients (and +groups) and conditions as features. The model can be specified by:

+
+# Model 2 has no restrictions when extracting data
+restrictPlpDataSettings <- createRestrictPlpDataSettings(
+  )
+
+# predict outcome within 1 to 730 days after index
+# remove people with outcome prior and with < 365 days observation
+populationSettings <- createStudyPopulationSettings(
+  binary = T, 
+  firstExposureOnly = T, 
+  washoutPeriod = 365, 
+  removeSubjectsWithPriorOutcome = T,
+  priorOutcomeLookback = 9999,
+  requireTimeAtRisk = F, 
+  riskWindowStart = 1, 
+  riskWindowEnd = 730
+)
+
+# use age/gender in groups and condition/drug groups as features
+covariateSettings <- FeatureExtraction::createCovariateSettings(
+  useDemographicsGender = T, 
+  useDemographicsAgeGroup = T, 
+  useConditionGroupEraAnyTimePrior = T, 
+  useDrugGroupEraAnyTimePrior = T 
+)
+
+modelDesign2 <- createModelDesign(
+  targetId = 1, 
+  outcomeId = 2, 
+  restrictPlpDataSettings = restrictPlpDataSettings, 
+  populationSettings = populationSettings, 
+  covariateSettings = covariateSettings, 
+  featureEngineeringSettings = createRandomForestFeatureSelection(ntrees = 500, maxDepth = 7),
+  sampleSettings = createSampleSettings(), 
+  splitSettings = createDefaultSplitSetting(), 
+  preprocessSettings = createPreprocessSettings(), 
+  modelSettings = setRandomForest()
+  )
+
+
+

Model design example 3 +

+

For the third example, we want to predict the outcome (id 5) occurring +during the cohort exposure of the target population (id 1). We want +to train a gradient boosting machine. Finally, we want to use gender, +age in 5-year buckets and indications of measurements taken as +features. The model can be specified by:

+
+# Model 3 has no restrictions when extracting data
+restrictPlpDataSettings <- createRestrictPlpDataSettings(
+  )
+
+# predict outcome during target cohort start/end 
+# remove people with  < 365 days observation
+populationSettings <- createStudyPopulationSettings(
+  binary = T, 
+  firstExposureOnly = T, 
+  washoutPeriod = 365, 
+  removeSubjectsWithPriorOutcome = F,
+  requireTimeAtRisk = F, 
+  riskWindowStart = 0,
+  startAnchor =  'cohort start',
+  riskWindowEnd = 0, 
+  endAnchor = 'cohort end'
+)
+
+# use age/gender in groups and measurement indicators as features
+covariateSettings <- FeatureExtraction::createCovariateSettings(
+  useDemographicsGender = T, 
+  useDemographicsAgeGroup = T, 
+  useMeasurementAnyTimePrior = T,
+  endDays = -1
+)
+
+modelDesign3 <- createModelDesign(
+  targetId = 1, 
+  outcomeId = 5, 
+  restrictPlpDataSettings = restrictPlpDataSettings, 
+  populationSettings = populationSettings, 
+  covariateSettings = covariateSettings, 
+  featureEngineeringSettings = createFeatureEngineeringSettings(),
+  sampleSettings = createSampleSettings(), 
+  splitSettings = createDefaultSplitSetting(), 
+  preprocessSettings = createPreprocessSettings(), 
+  modelSettings = setGradientBoostingMachine()
+  )
+
+
+
+

Running multiple models +

+

As we will be downloading large amounts of data in the multiple PLP analysis, +it is useful to set the Andromeda temp folder to a directory with write +access and plenty of space. +options(andromedaTempFolder = "c:/andromedaTemp")
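Before setting the option it can help to make sure the folder exists and is writable. The sketch below uses only base R; the folder under `tempdir()` is a placeholder chosen so the example runs anywhere, and in practice you would point it at a disk with enough free space.

```r
# Placeholder location; replace with a directory on a large disk
andromedaTempFolder <- file.path(tempdir(), "andromedaTemp")

# create the folder if it does not exist yet
if (!dir.exists(andromedaTempFolder)) {
  dir.create(andromedaTempFolder, recursive = TRUE)
}

# file.access() returns 0 when the requested access (mode 2 = write) is allowed
stopifnot(file.access(andromedaTempFolder, mode = 2) == 0)

options(andromedaTempFolder = andromedaTempFolder)
getOption("andromedaTempFolder")
```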

+

Running the study requires setting up a connectionDetails object:

+
+dbms <- "your dbms"
+user <- "your username"
+pw <- "your password"
+server <- "your server"
+port <- "your port"
+
+connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = dbms,
+                                                                server = server,
+                                                                user = user,
+                                                                password = pw,
+                                                                port = port)
+

Next you need to specify the cdmDatabaseSchema where your CDM +database is found and the workDatabaseSchema where your target population +and outcome cohorts are. You also need to specify a label for the database +name: a string with a shareable name of the database (this will be shown +to OHDSI researchers if the results get transported).

+
cdmDatabaseSchema <- "your cdmDatabaseSchema"
+workDatabaseSchema <- "your workDatabaseSchema"
+cdmDatabaseName <- "your cdmDatabaseName"
+cohortTable <- "your cohort table"
+
+databaseDetails <- createDatabaseDetails(
+  connectionDetails = connectionDetails, 
+  cdmDatabaseSchema = cdmDatabaseSchema, 
+  cdmDatabaseName = cdmDatabaseName , 
+  cohortDatabaseSchema = workDatabaseSchema, 
+  cohortTable = cohortTable, 
+  outcomeDatabaseSchema = workDatabaseSchema, 
+  outcomeTable = cohortTable,
+  cdmVersion = 5
+    )
+

Now you can run the multiple patient-level prediction analysis:

+
+results <- runMultiplePlp(
+  databaseDetails = databaseDetails, 
+  modelDesignList = list(
+    modelDesign1, 
+    modelDesign2, 
+    modelDesign3
+    ), 
+  onlyFetchData = F, 
+  logSettings = createLogSettings(), 
+  saveDirectory =  "./PlpMultiOutput"
+  )
+

This will then save all the plpData objects from the study into +“./PlpMultiOutput/plpData_T1_L” and the results into +“./PlpMultiOutput/Analysis_”. The csv file named settings.csv found +in “./PlpMultiOutput” has a row for each prediction model developed and +points to the plpData and settings used for the model development. It +also has descriptions of the cohorts if these are input by the user.
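Once a run has finished, the settings file can be inspected with base R. The helper `readPlpSettings` below is a hypothetical convenience function, not part of the package, and it makes no assumption about which columns the file contains. It simply returns `NULL` if the file is not there.

```r
# Hypothetical helper: read settings.csv from a multiple-PLP output folder
readPlpSettings <- function(saveDirectory) {
  settingsFile <- file.path(saveDirectory, "settings.csv")
  if (!file.exists(settingsFile)) {
    message("No settings.csv found in ", saveDirectory)
    return(NULL)
  }
  utils::read.csv(settingsFile, stringsAsFactors = FALSE)
}

settings <- readPlpSettings("./PlpMultiOutput")
if (!is.null(settings)) {
  print(names(settings)) # available columns
  print(nrow(settings))  # one row per prediction model developed
}
```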

+

Note that if for some reason the run is interrupted, e.g. because of +an error, a new call to runMultiplePlp will continue and +not restart until you remove the output folder.

+
+
+

Validating multiple models +

+

If you have access to multiple databases on the same server in +different schemas you could evaluate across these using this call:

+
+validationDatabaseDetails <- createDatabaseDetails(
+  connectionDetails = connectionDetails, 
+  cdmDatabaseSchema = 'new cdm schema', 
+  cdmDatabaseName = 'validation database', 
+  cohortDatabaseSchema = workDatabaseSchema, 
+  cohortTable = cohortTable, 
+  outcomeDatabaseSchema = workDatabaseSchema, 
+  outcomeTable = cohortTable, 
+  cdmVersion = 5
+  )
+
+val <- validateMultiplePlp(
+  analysesLocation = "./PlpMultiOutput",
+  valdiationDatabaseDetails = validationDatabaseDetails,
+  validationRestrictPlpDataSettings = createRestrictPlpDataSettings(),
+  recalibrate = NULL,
+  saveDirectory = "./PlpMultiOutput/Validation"
+  )
+

This then saves the external validation results in the +Validation folder of the main study (the saveDirectory you +used in runMultiplePlp).

+
+
+

Viewing the results +

+

To view the results for the multiple prediction analysis:

+
+viewMultiplePlp(analysesLocation="./PlpMultiOutput")
+

If the validation directory in “./PlpMultiOutput” has a sqlite +results database, the external validation will also be displayed.

+
+
+

Acknowledgments +

+

Considerable work has been dedicated to provide the +PatientLevelPrediction package.

+
+citation("PatientLevelPrediction")
+
## 
+## To cite PatientLevelPrediction in publications use:
+## 
+##   Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek P (2018). "Design
+##   and implementation of a standardized framework to generate and
+##   evaluate patient-level prediction models using observational
+##   healthcare data." _Journal of the American Medical Informatics
+##   Association_, *25*(8), 969-975.
+##   <https://doi.org/10.1093/jamia/ocy032>.
+## 
+## A BibTeX entry for LaTeX users is
+## 
+##   @Article{,
+##     author = {J. M. Reps and M. J. Schuemie and M. A. Suchard and P. B. Ryan and P. Rijnbeek},
+##     title = {Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data},
+##     journal = {Journal of the American Medical Informatics Association},
+##     volume = {25},
+##     number = {8},
+##     pages = {969-975},
+##     year = {2018},
+##     url = {https://doi.org/10.1093/jamia/ocy032},
+##   }
+

Please reference this paper if you use the PLP Package in +your work:

+

Reps JM, Schuemie +MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a +standardized framework to generate and evaluate patient-level prediction +models using observational healthcare data. J Am Med Inform Assoc. +2018;25(8):969-975.

+
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/BuildingPredictiveModels.html b/docs/articles/BuildingPredictiveModels.html new file mode 100644 index 000000000..4e13c99f2 --- /dev/null +++ b/docs/articles/BuildingPredictiveModels.html @@ -0,0 +1,2296 @@ + + + + + + + +Building patient-level predictive models • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+

Introduction +

+

Observational healthcare data, such as administrative claims and +electronic health records, are increasingly used for clinical +characterization of disease progression, quality improvement, and +population-level effect estimation for medical product safety +surveillance and comparative effectiveness. Advances in machine learning +for large dataset analysis have led to increased interest in applying +patient-level prediction on this type of data. Patient-level prediction +offers the potential for medical practice to move beyond average +treatment effects and to consider personalized risks as part of clinical +decision-making. However, many published efforts in +patient-level-prediction do not follow the model development guidelines, +fail to perform extensive external validation, or provide insufficient +model details that limits the ability of independent researchers to +reproduce the models and perform external validation. This makes it hard +to fairly evaluate the predictive performance of the models and reduces +the likelihood of the model being used appropriately in clinical +practice. To improve standards, several papers have been written +detailing guidelines for best practices in developing and reporting +prediction models.

+

The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement +provides clear recommendations for reporting prediction model +development and validation and addresses some of the concerns related to +transparency. However, data structure heterogeneity and inconsistent +terminologies still make collaboration and model sharing difficult as +different researchers are often required to write new code to extract +the data from their databases and may define variables differently.

+

In our paper, +we propose a standardised framework for patient-level prediction that +utilizes the OMOP Common Data Model (CDM) and standardized vocabularies, +and describe the open-source software that we developed implementing the +framework’s pipeline. The framework is the first to support existing +best practice guidelines and will enable open dissemination of models +that can be extensively validated across the network of OHDSI +collaborators.

+

Figure 1 illustrates the prediction problem we address. Among a +population at risk, we aim to predict which patients at a defined moment +in time (t = 0) will experience some outcome during a time-at-risk. +Prediction is done using only information about the patients in an +observation window prior to that moment in time.

+
The prediction problem
+
+

As shown in Figure 2, to define a prediction problem we have to +define t=0 by a Target Cohort (T), the outcome we would like to predict by an +outcome cohort (O), and the time-at-risk (TAR). Furthermore, we have to +make design choices for the model we would like to develop, and determine the +observational datasets to perform internal and external validation. This +conceptual framework works for all types of prediction problems, for +example those presented in Figure 3.

+
Design choices
+
+
Examples of prediction problems
+
+

This vignette describes how you can use the +PatientLevelPrediction package to build patient-level +predictive models. The package enables data extraction, model building, +and model evaluation using data from databases that are translated into +the OMOP CDM. In this vignette we assume you have installed the package +correctly using the InstallationGuide.

+
+
+

Study specification +

+

We have to clearly specify our study upfront to be able to implement +it. This means we need to define the prediction problem we like to +address, in which population we will build the model, which model we +will build and how we will evaluate its performance. To guide you +through this process we will use a “Disease onset and progression” +prediction type as an example.

+
+

Problem definition 1: Stroke in atrial fibrillation patients +

+

Atrial fibrillation is a disease characterized by an irregular heart +rate that can cause poor blood flow. Patients with atrial fibrillation +are at increased risk of ischemic stroke. Anticoagulation is a +recommended prophylaxis treatment strategy for patients at high risk of +stroke, though the underuse of anticoagulants and persistent severity of +ischemic stroke represent a substantial unmet medical need. Various +strategies have been developed to predict risk of ischemic stroke in +patients with atrial fibrillation. CHADS2 (Gage JAMA 2001) was developed +as a risk score based on history of congestive heart failure, +hypertension, age>=75, diabetes and stroke. CHADS2 was initially +derived using Medicare claims data, where it achieved good +discrimination (AUC=0.82). However, subsequent external validation +studies revealed that CHADS2 had substantially lower predictive accuracy +(Keogh Thromb Haemost 2011). Subsequent stroke risk calculators have +been developed and evaluated, including the CHA2DS2-VASc extension of CHADS2. The +management of atrial fibrillation has evolved substantially over the +last decade, for various reasons that include the introduction of novel +oral anticoagulants. With these innovations has come a renewed interest +in greater precision medicine for stroke prevention.
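
To make the scoring idea concrete, the sketch below computes a CHADS2 total for a few made-up patients. It assumes the commonly tabulated point values (1 point each for congestive heart failure, hypertension, age >= 75 and diabetes, 2 points for prior stroke); the function name `chads2Score` and the example data are hypothetical and not part of PatientLevelPrediction.

```r
# Hypothetical helper (not part of PatientLevelPrediction).
# Assumed point values: 1 each for CHF, hypertension, age >= 75 and diabetes,
# and 2 for a prior stroke.
chads2Score <- function(chf, hypertension, age, diabetes, priorStroke) {
  as.integer(chf) + as.integer(hypertension) + as.integer(age >= 75) +
    as.integer(diabetes) + 2L * as.integer(priorStroke)
}

# Made-up patients for illustration only
patients <- data.frame(
  chf = c(FALSE, TRUE, TRUE),
  hypertension = c(TRUE, TRUE, TRUE),
  age = c(62, 80, 77),
  diabetes = c(FALSE, FALSE, TRUE),
  priorStroke = c(FALSE, FALSE, TRUE)
)
patients$chads2 <- with(
  patients,
  chads2Score(chf, hypertension, age, diabetes, priorStroke)
)
print(patients)
```

A fixed point score like this uses only five predictors, which is one reason a data-driven model with many more covariates may be worth evaluating.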

+

We will apply the PatientLevelPrediction package to observational +healthcare data to address the following patient-level prediction +question:

+

Amongst patients who are newly diagnosed with Atrial Fibrillation, +which patients will go on to have Ischemic Stroke within 1 year?

+

We will define ‘patients who are newly diagnosed with Atrial +Fibrillation’ as the first condition record of cardiac arrhythmia, which +is followed by another cardiac arrhythmia condition record, at least two +drug records for a drug used to treat arrhythmias, or a procedure to +treat arrhythmias. We will define ‘Ischemic stroke events’ as ischemic +stroke condition records during an inpatient or ER visit; successive +records with > 180 day gap are considered independent episodes.
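
The rule that successive records more than 180 days apart count as independent episodes can be sketched in base R as follows. This is only an illustration on made-up records; it assumes the gap is measured between consecutive record dates of the same person, and the column names are hypothetical.

```r
# Made-up stroke records for two hypothetical persons
strokeRecords <- data.frame(
  personId = c(1, 1, 1, 2),
  conditionStartDate = as.Date(
    c("2010-01-01", "2010-03-01", "2011-01-15", "2012-06-30")
  )
)
strokeRecords <- strokeRecords[
  order(strokeRecords$personId, strokeRecords$conditionStartDate),
]

# Days since the previous record of the same person (NA for the first record)
gapDays <- ave(
  as.numeric(strokeRecords$conditionStartDate),
  strokeRecords$personId,
  FUN = function(x) c(NA, diff(x))
)

# A record opens a new episode if it is the person's first record
# or if it follows the previous record by more than 180 days
strokeRecords$newEpisode <- is.na(gapDays) | gapDays > 180
episodes <- strokeRecords[
  strokeRecords$newEpisode,
  c("personId", "conditionStartDate")
]
print(episodes)
```

In this toy data person 1 contributes two episodes (the March 2010 record is absorbed into the first one) and person 2 contributes one.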

+
+
+

Problem definition 2: Angioedema in ACE inhibitor users +

+

Angiotensin converting enzyme inhibitors (ACE inhibitors) are +medications used by patients with hypertension that widen the blood +vessels and therefore increase the amount of blood pumped by the heart +and decrease blood pressure. ACE inhibitors reduce a patient’s risk of +cardiovascular disease but can lead to drug-induced angioedema.

+

We will apply the PatientLevelPrediction package to observational +healthcare data to address the following patient-level prediction +question:

+

Amongst patients who are newly dispensed an ACE inhibitor, which +patients will go on to have angioedema within 1 year?

+

We will define ‘patients who are newly dispensed an ACE inhibitor’ as +the first drug record of any ACE inhibitor, […]which is followed by +another cardiac arrhythmia condition record, at least two drug records +for a drug used to treat arrhythmias, or a procedure to treat +arrhythmias. We will define ‘angioedema’ as an angioedema condition +record.

+
+
+

Study population definition +

+

The final study population in which we will develop our model is +often a subset of the Target population, because we may, for example, apply +criteria that are dependent on T and O or want to do sensitivity +analyses with subpopulations of T. For this we have to answer the +following questions:

+
    +
  • What is the minimum amount of observation time we require +before the start of the target cohort? This choice could depend on +the available patient time in your training data, but also on the time +you expect to be available in the data sources you want to apply the +model on in the future. The longer the minimum observation time, the +more baseline history time is available for each person to use for +feature extraction, but the fewer patients will qualify for analysis. +Moreover, there could be clinical reasons to choose a shorter or longer +lookback period. For our example, we will use a 1095-day prior history as +lookback period (washout period).

  • +
  • Can patients enter the target cohort multiple times? In +the target cohort definition, a person may qualify for the cohort +multiple times during different spans of time, for example if they had +different episodes of a disease or separate periods of exposure to a +medical product. The cohort definition does not necessarily apply a +restriction to only let the patients enter once, but in the context of a +particular patient-level prediction problem, a user may want to restrict +the cohort to the first qualifying episode. In our example, a person +can only enter the target cohort once since our criterion is based on +the first occurrence of atrial fibrillation.

  • +
  • Do we allow persons to enter the cohort if they experienced +the outcome before? Do we allow persons to enter the target cohort +if they experienced the outcome before qualifying for the target cohort? +Depending on the particular patient-level prediction problem, there may +be a desire to predict ‘incident’ first occurrence of an outcome, in +which case patients who have previously experienced the outcome are not +‘at-risk’ for having a first occurrence and therefore should be excluded +from the target cohort. In other circumstances, there may be a desire to +predict ‘prevalent’ episodes, whereby patients with prior outcomes can +be included in the analysis and the prior outcome itself can be a +predictor of future outcomes. For our prediction example, the answer to +this question is ‘Yes, allow persons with prior outcomes’ because we +know from the CHADS2 score that prior strokes are very predictive of +future strokes. If this answer had been ‘No’ we would also have to +decide how far back we would look for previous occurrences of the +outcome.

  • +
  • How do we define the period in which we will predict our +outcome relative to the target cohort start? We actually have to +make two decisions to answer that question. First, does the time-at-risk +window start at the date of the start of the target cohort or later? +Arguments to make it start later could be that you want to avoid +outcomes that were entered late in the record that actually occurred +before the start of the target cohort or you want to leave a gap where +interventions to prevent the outcome could theoretically be implemented. +Second, you need to define the time-at-risk by setting the risk window +end, as some specification of days offset relative to the target cohort +start or end dates. For our problem we will predict in a ‘time-at-risk’ +window starting 1 day after the start of the target cohort up to 365 +days later (to look for 1-year risk following atrial fibrillation +diagnosis).

  • +
  • Do we require a minimum amount of time-at-risk? We have +to decide if we want to include patients that did not experience the +outcome but did leave the database earlier than the end of our +time-at-risk period. These patients may experience the outcome when we +do not observe them. For our prediction problem we decide to answer this +question with ‘Yes, require a minimum time-at-risk’ for that reason. +Furthermore, we have to decide if this constraint also applies to +persons who experienced the outcome or we will include all persons with +the outcome irrespective of their total time at risk. For example, if +the outcome is death, then persons with the outcome are likely censored +before the full time-at-risk period is complete.

  • +
+
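
The time-at-risk choice above (start 1 day after cohort start, end 365 days after cohort start) can be illustrated with a small base R sketch. The data frames and column names are made up for illustration and are not how the package stores its data internally.

```r
# Made-up target cohort entries and outcome dates
target <- data.frame(
  subjectId = c(1, 2, 3),
  cohortStartDate = as.Date(c("2015-01-01", "2015-06-01", "2016-03-15"))
)
outcome <- data.frame(
  subjectId = c(1, 2),
  outcomeDate = as.Date(c("2015-07-01", "2017-01-01"))
)

# Risk window offsets in days relative to the cohort start date
riskWindowStart <- 1
riskWindowEnd <- 365
target$tarStart <- target$cohortStartDate + riskWindowStart
target$tarEnd <- target$cohortStartDate + riskWindowEnd

# Label a subject as a case only if the outcome falls inside the window
merged <- merge(target, outcome, by = "subjectId", all.x = TRUE)
merged$outcomeInTar <- !is.na(merged$outcomeDate) &
  merged$outcomeDate >= merged$tarStart &
  merged$outcomeDate <= merged$tarEnd
print(merged[, c("subjectId", "tarStart", "tarEnd", "outcomeInTar")])
```

Subject 2 has an outcome, but it occurs after the window closes, so it does not count as a case for the 1-year prediction problem.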
+
+

Model development settings +

+

To develop the model we have to decide which algorithm(s) we like to +train. We see the selection of the best algorithm for a certain +prediction problem as an empirical question, i.e. you need to let the +data speak for itself and try different approaches to find the best one. +There is no algorithm that will work best for all problems (no free +lunch). In our package we therefore aim to implement many algorithms. +Furthermore, we made the system modular so you can add your own custom +algorithms as described in more detail in the AddingCustomModels +vignette.

+

Our package currently contains the following algorithms to choose +from:

+ +++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Algorithm | Description | Hyper-parameters
Regularized Logistic Regression | Lasso logistic regression belongs to the family of generalized +linear models, where a linear combination of the variables is learned +and finally a logistic function maps the linear combination to a value +between 0 and 1. The lasso regularization adds a cost based on model +complexity to the objective function when training the model. This cost +is the sum of the absolute values of the +coefficients. The model automatically performs feature selection by +minimizing this cost. We use the Cyclic coordinate descent for logistic, +Poisson and survival analysis (Cyclops) package to perform large-scale +regularized logistic regression: https://github.com/OHDSI/Cyclops | var (starting variance), seed
Gradient boosting machines | Gradient boosting machines is a boosting ensemble technique and in +our framework it combines multiple decision trees. Boosting works by +iteratively adding decision trees but adds more weight to the +data-points that are misclassified by prior decision trees in the cost +function when training the next tree. We use Extreme Gradient Boosting, +which is an efficient implementation of the gradient boosting framework +implemented in the xgboost R package available from CRAN. | ntree (number of trees), max depth (max levels in tree), min rows +(minimum data points in node), learning rate, balance (balance class +labels), seed
Random forest | Random forest is a bagging ensemble technique that combines multiple +decision trees. The idea behind bagging is to reduce the likelihood of +overfitting, by using weak classifiers, but combining multiple diverse +weak classifiers into a strong classifier. Random forest accomplishes +this by training multiple decision trees but only using a subset of the +variables in each tree and the subset of variables differs between trees. +Our package uses the scikit-learn implementation of Random Forest in +Python. | mtry (number of features in each tree), ntree (number of trees), +maxDepth (max levels in tree), minRows (minimum data points in +node), balance (balance class labels), seed
K-nearest neighbors | K-nearest neighbors (KNN) is an algorithm that uses some metric to +find the K closest labelled data-points, given the specified metric, to +a new unlabelled data-point. The prediction of the new data-points is +then the most prevalent class of the K-nearest labelled data-points. +There is a sharing limitation of KNN, as the model requires labelled +data to perform the prediction on new data, and it is often not possible +to share this data across data sites. We included the BigKnn classifier +developed in OHDSI which is a large scale k-nearest neighbor classifier +using the Lucene search engine: https://github.com/OHDSI/BigKnn | k (number of neighbours), weighted (weight by inverse frequency)
Naive Bayes | The Naive Bayes algorithm applies the Bayes theorem with the ‘naive’ +assumption of conditional independence between every pair of features +given the value of the class variable. Based on the likelihood the data +belongs to a class and the prior distribution of the class, a posterior +distribution is obtained. | none
AdaBoost | AdaBoost is a boosting ensemble technique. Boosting works by +iteratively adding classifiers but adds more weight to the data-points +that are misclassified by prior classifiers in the cost function when +training the next classifier. We use the sklearn ‘AdaBoostClassifier’ +implementation in Python. | nEstimators (the maximum number of estimators at which boosting is +terminated), learningRate (learning rate shrinks the contribution of +each classifier by learning_rate. There is a trade-off between +learningRate and nEstimators)
Decision Tree | A decision tree is a classifier that partitions the variable space +using individual tests selected using a greedy approach. It aims to find +partitions that have the highest information gain to separate the +classes. The decision tree can easily overfit by enabling a large number +of partitions (tree depth) and often needs some regularization (e.g., +pruning or specifying hyper-parameters that limit the complexity of the +model). We use the sklearn ‘DecisionTreeClassifier’ implementation in +Python. | maxDepth (the maximum depth of the tree), +minSamplesSplit, minSamplesLeaf, minImpuritySplit (threshold for early +stopping in tree growth. A node will split if its impurity is above the +threshold, otherwise it is a leaf.), seed, classWeight (‘Balance’ or +‘None’)
Multilayer Perceptron | Neural networks contain multiple layers that weight their inputs +using a non-linear function. The first layer is the input layer, the +last layer is the output layer and the layers in between are the hidden layers. Neural +networks are generally trained using feed forward back-propagation. This +is when you go through the network with a data-point and calculate the +error between the true label and predicted label, then go backwards +through the network and update the linear function weights based on the +error. This can also be performed as a batch, where multiple data-points +are fed through the network before the weights are updated. | size (the number of hidden nodes), alpha (the l2 regularisation), +seed
Deep Learning (now in separate DeepPatientLevelPrediction R +package) | Deep learning such as deep nets, convolutional neural networks or +recurrent neural networks are similar to a neural network but have +multiple hidden layers that aim to learn latent representations useful +for prediction. In the separate BuildingDeepLearningModels vignette we +describe these models and hyper-parameters in more detail. | see OHDSI/DeepPatientLevelPrediction
+
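
As a rough illustration of the lasso cost described in the table, the sketch below evaluates a penalized negative log-likelihood in base R. It is not how Cyclops fits the model: Cyclops uses cyclic coordinate descent and exposes a prior variance rather than the generic penalty weight `lambda` assumed here, and the function name `lassoObjective` is hypothetical.

```r
# Hypothetical helper: logistic negative log-likelihood plus an L1 penalty.
# 'lambda' is a generic penalty weight assumed for illustration only.
lassoObjective <- function(beta, intercept, X, y, lambda) {
  linearPredictor <- intercept + as.vector(X %*% beta)
  p <- 1 / (1 + exp(-linearPredictor))  # logistic function maps to (0, 1)
  negLogLik <- -sum(y * log(p) + (1 - y) * log(1 - p))
  negLogLik + lambda * sum(abs(beta))   # cost grows with |coefficients|
}

# Made-up data: 4 patients, 2 binary covariates
X <- matrix(c(1, 0, 1, 1, 0, 1, 0, 0), ncol = 2)
y <- c(1, 0, 1, 0)

unpenalized <- lassoObjective(c(1, -2), 0, X, y, lambda = 0)
penalized <- lassoObjective(c(1, -2), 0, X, y, lambda = 2)
print(penalized - unpenalized)
```

The difference between the two values is exactly the penalty term, `lambda * (|1| + |-2|)`. Because the penalty grows with every non-zero coefficient, minimizing this cost tends to push uninformative coefficients to zero.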

Furthermore, we have to decide on the covariates +that we will use to train our model. This choice can be driven by domain +knowledge or available computational resources. In our example, we would like +to add Gender, Age, Conditions, Drug Groups, and Visit Count. We +also have to specify in which time windows we will look, and we decide to +look in the year before and any time prior.

+

Finally, we have to define how we will train and test our model on +our data, i.e. how we perform internal validation. For +this we have to decide how we divide our dataset into a training and +testing dataset and how we randomly assign patients to these two sets. +Depending on the size of the dataset we can decide how much data we +would like to use for training; typically this is a 75%/25% split. If you +have very large datasets you can use more data for training. To randomly +assign patients to the training and testing set, there are two commonly +used approaches:

+
    +
  1. split by person. In this case a random seed is used to assign each +patient to one of the two sets.
  2. split by time. In this case a time point is used to split the +persons, e.g. 75% of the data is before and 25% is after this date. The +advantage of this is that you take into consideration that the health +care system has changed over time.
+
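
The two splitting approaches can be sketched in base R as follows. This is an illustration on simulated data only; the package performs the split through its own split settings, and the column names here are made up.

```r
set.seed(42)  # the seed makes the random assignment reproducible

# Simulated population: one row per person with a cohort start date
population <- data.frame(
  subjectId = 1:1000,
  cohortStartDate = as.Date("2010-01-01") +
    sample(0:3650, 1000, replace = TRUE)
)
testFraction <- 0.25

# 1. Split by person: randomly pick 25% of the persons for the test set
testIds <- sample(
  population$subjectId,
  size = round(testFraction * nrow(population))
)
population$splitByPerson <- ifelse(
  population$subjectId %in% testIds, "test", "train"
)

# 2. Split by time: persons entering after the split date go to the test set
sortedDates <- sort(population$cohortStartDate)
splitDate <- sortedDates[ceiling((1 - testFraction) * nrow(population))]
population$splitByTime <- ifelse(
  population$cohortStartDate > splitDate, "test", "train"
)

print(table(population$splitByPerson))
print(table(population$splitByTime))
```

With the time-based split the test set contains only the most recent entries, so the evaluation also reflects how well the model holds up as practice changes over time.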

We have now completely defined our studies and can implement them:

+ +
+
+
+

Example 1: Stroke in atrial fibrillation patients +

+
+

Study Specification +

+

For our first prediction model we decide to start with a Regularized +Logistic Regression and will use the default parameters. We will do a +75%-25% split by person.

+ ++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Definition | Value
Problem Definition
Target Cohort (T) | ‘Patients who are newly diagnosed with Atrial Fibrillation’ defined +as the first condition record of cardiac arrhythmia, which is followed +by another cardiac arrhythmia condition record, at least two drug +records for a drug used to treat arrhythmias, or a procedure to treat +arrhythmias.
Outcome Cohort (O) | ‘Ischemic stroke events’ defined as ischemic stroke condition +records during an inpatient or ER visit; successive records with > +180 day gap are considered independent episodes.
Time-at-risk (TAR) | 1 day till 365 days from cohort start
Population Definition
Washout Period | 1095 days
Enter the target cohort multiple times? | No
Allow prior outcomes? | Yes
Start of time-at-risk | 1 day
End of time-at-risk | 365 days
Require a minimum amount of time-at-risk? | Yes (364 days)
Model Development
Algorithm | Regularized Logistic Regression
Hyper-parameters | variance = 0.01 (Default)
Covariates | Gender, Age, Conditions (ever before, <365), Drugs Groups (ever +before, <365), and Visit Count
Data split | 75% train, 25% test. Randomly assigned by person
+

According to the best practices we need to make a protocol that +completely specifies how we plan to execute our study. This protocol +will be assessed by the governance boards of the participating data +sources in your network study. For this a template could be used but we +prefer to automate this process as much as possible by adding +functionality to automatically generate a study protocol from a study +specification. We will discuss this in more detail later.

+
+
+

Study implementation +

+

Now that we have completely designed our study, we have to implement +it. We have to generate the target and outcome cohorts and we need to +develop the R code to run against our CDM that will execute the full +study.

+
+

Cohort instantiation +

+

For our study we need to know when a person enters the target and +outcome cohorts. This is stored in a table on the server that contains +the cohort start date and cohort end date for all subjects for a +specific cohort definition. This cohort table has a very simple +structure as shown below:

+
    +
  • +cohort_definition_id, a unique identifier for +distinguishing between different types of cohorts, e.g. cohorts of +interest and outcome cohorts.
  • +
  • +subject_id, a unique identifier corresponding to the +person_id in the CDM.
  • +
  • +cohort_start_date, the date the subject enters the +cohort.
  • +
  • +cohort_end_date, the date the subject leaves the +cohort.
  • +
+
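
As a small illustration of this structure, the base R sketch below builds a made-up cohort table with the four columns listed above. On a real server the table lives in the database; the rows here are invented.

```r
# Made-up rows: cohort 1 is the target cohort, cohort 2 the outcome cohort
cohort <- data.frame(
  cohort_definition_id = c(1L, 1L, 2L),
  subject_id = c(101, 102, 101),
  cohort_start_date = as.Date(c("2012-03-01", "2013-07-15", "2012-09-20")),
  cohort_end_date = as.Date(c("2015-12-31", "2016-06-30", "2012-09-27"))
)

# Number of entries per cohort definition
entriesPerCohort <- table(cohort$cohort_definition_id)
print(entriesPerCohort)
```

Note that the same subject (101) can appear in both the target and the outcome cohort; the two are told apart only by `cohort_definition_id`.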

How do we fill this table according to our cohort definitions? There +are two options for this:

+
    +
  1. use the interactive cohort builder tool in ATLAS, which can be used to create +cohorts based on inclusion criteria and will automatically populate this +cohort table.

  2. write your own custom SQL statements to fill the cohort +table.
+

Both methods are described below for our example prediction +problem.

+
+
+

ATLAS cohort builder +

+
Target Cohort Atrial Fibrillation
+
+

ATLAS allows you to define cohorts interactively by specifying cohort +entry and cohort exit criteria. Cohort entry criteria involve selecting +one or more initial events, which determine the start date for cohort +entry, and optionally specifying additional inclusion criteria which +filter to the qualifying events. Cohort exit criteria are applied to +each cohort entry record to determine the end date when the person’s +episode no longer qualifies for the cohort. For the outcome cohort the +end date is less relevant. As an example, Figure 4 shows how we created +the Atrial Fibrillation cohort and Figure 5 shows how we created the +stroke cohort in ATLAS.

+
Outcome Cohort Stroke
+
+

The T and O cohorts can be found here:

+ +

An in-depth explanation of cohort creation in ATLAS is out of scope for +this vignette but can be found on the OHDSI wiki pages (link).

+

Note that when a cohort is created in ATLAS the cohortid is needed to +extract the data in R. The cohortid can be found at the top of the ATLAS +screen, e.g. 1769447 in Figure 4.

+
+
+

Custom cohorts +

+

It is also possible to create cohorts without the use of ATLAS. Using +custom cohort code (SQL) you can make more advanced cohorts if +needed.

+

For our example study, we need to create a table to hold the cohort +data and we need to create SQL code to instantiate this table for both +the AF and Stroke cohorts. Therefore, we create a file called +AfStrokeCohorts.sql with the following contents:

+
/***********************************
+File AfStrokeCohorts.sql 
+***********************************/
+/*
+Create a table to store the persons in the T and O cohorts
+*/
+
+IF OBJECT_ID('@resultsDatabaseSchema.AFibStrokeCohort', 'U') IS NOT NULL 
+DROP TABLE @resultsDatabaseSchema.AFibStrokeCohort;
+
+CREATE TABLE @resultsDatabaseSchema.AFibStrokeCohort 
+( 
+cohort_definition_id INT, 
+subject_id BIGINT,
+cohort_start_date DATE, 
+cohort_end_date DATE
+);
+
+
+/*
+T cohort:  [PatientLevelPrediction vignette]:  T : patients who are newly 
+diagnosed with Atrial fibrillation
+- persons with a condition occurrence record of 'Atrial fibrillation' or 
+any descendants, indexed at the first diagnosis
+- who have >1095 days of prior observation before their first diagnosis
+- and have no warfarin exposure any time prior to first AFib diagnosis
+*/
+INSERT INTO @resultsDatabaseSchema.AFibStrokeCohort (cohort_definition_id, 
+subject_id, 
+cohort_start_date, 
+cohort_end_date)
+SELECT 1 AS cohort_definition_id,
+AFib.person_id AS subject_id,
+AFib.condition_start_date AS cohort_start_date,
+observation_period.observation_period_end_date AS cohort_end_date
+FROM
+(
+  SELECT person_id, min(condition_start_date) as condition_start_date
+  FROM @cdmDatabaseSchema.condition_occurrence
+  WHERE condition_concept_id IN (SELECT descendant_concept_id FROM 
+  @cdmDatabaseSchema.concept_ancestor WHERE ancestor_concept_id IN 
+  (313217 /*atrial fibrillation*/))
+  GROUP BY person_id
+) AFib
+  INNER JOIN @cdmDatabaseSchema.observation_period
+  ON AFib.person_id = observation_period.person_id
+  AND AFib.condition_start_date >= dateadd(dd,1095, 
+  observation_period.observation_period_start_date)
+  AND AFib.condition_start_date <= observation_period.observation_period_end_date
+  LEFT JOIN
+  (
+  SELECT person_id, min(drug_exposure_start_date) as drug_exposure_start_date
+  FROM @cdmDatabaseSchema.drug_exposure
+  WHERE drug_concept_id IN (SELECT descendant_concept_id FROM 
+  @cdmDatabaseSchema.concept_ancestor WHERE ancestor_concept_id IN 
+  (1310149 /*warfarin*/))
+  GROUP BY person_id
+  ) warfarin
+  ON Afib.person_id = warfarin.person_id
+  AND Afib.condition_start_date > warfarin.drug_exposure_start_date
+  WHERE warfarin.person_id IS NULL
+  ;
+  
+  /*
+  O cohort:  [PatientLevelPrediction vignette]:  O: Ischemic stroke events
+  - inpatient visits that include a condition occurrence record for 
+  'cerebral infarction' and descendants, 'cerebral thrombosis', 
+  'cerebral embolism', 'cerebral artery occlusion' 
+  */
+  INSERT INTO @resultsDatabaseSchema.AFibStrokeCohort (cohort_definition_id, 
+  subject_id, 
+  cohort_start_date, 
+  cohort_end_date)
+  SELECT 2 AS cohort_definition_id,
+  visit_occurrence.person_id AS subject_id,
+  visit_occurrence.visit_start_date AS cohort_start_date,
+  visit_occurrence.visit_end_date AS cohort_end_date
+  FROM  
+  (
+  SELECT person_id, condition_start_date
+  FROM @cdmDatabaseSchema.condition_occurrence
+  WHERE condition_concept_id IN (SELECT DISTINCT descendant_concept_id FROM 
+  @cdmDatabaseSchema.concept_ancestor WHERE ancestor_concept_id IN 
+  (443454 /*cerebral infarction*/) OR descendant_concept_id IN 
+  (441874 /*cerebral thrombosis*/, 375557 /*cerebral embolism*/, 
+  372924 /*cerebral artery occlusion*/))
+  ) stroke
+  INNER JOIN @cdmDatabaseSchema.visit_occurrence
+  ON stroke.person_id = visit_occurrence.person_id
+  AND stroke.condition_start_date >= visit_occurrence.visit_start_date
+  AND stroke.condition_start_date <= visit_occurrence.visit_end_date
+  AND visit_occurrence.visit_concept_id IN (9201, 262 /*'Inpatient Visit'  or 
+  'Emergency Room and Inpatient Visit'*/)
+  GROUP BY visit_occurrence.person_id, visit_occurrence.visit_start_date, 
+  visit_occurrence.visit_end_date
+  ;
+  
+

This is parameterized SQL which can be used by the SqlRender +package. We use parameterized SQL so we do not have to pre-specify the +names of the CDM and result schemas. That way, if we want to run the SQL +on a different schema, we only need to change the parameter values; we +do not have to change the SQL code. By also making use of translation +functionality in SqlRender, we can make sure the SQL code +can be run in many different environments.
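
The idea of parameterized SQL can be sketched with plain string substitution in base R. The helper `renderParameters` below is hypothetical and only mimics the concept; it is not SqlRender's implementation, which also handles defaults, conditional blocks and translation between SQL dialects.

```r
# Hypothetical stand-in for the concept of rendering parameterized SQL.
# Each named argument replaces the matching '@name' placeholder.
# Naive by design: parameter names that are prefixes of each other would clash.
renderParameters <- function(sql, ...) {
  parameters <- list(...)
  for (name in names(parameters)) {
    sql <- gsub(paste0("@", name), parameters[[name]], sql, fixed = TRUE)
  }
  sql
}

template <- "SELECT COUNT(*) FROM @cdmDatabaseSchema.person;"
rendered <- renderParameters(template, cdmDatabaseSchema = "my_cdm_data")
print(rendered)
```

The same template can be rendered against another schema by changing only the argument value, which is the property the paragraph above relies on.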

+

To execute this SQL against our CDM we first need to tell R how to +connect to the server. PatientLevelPrediction uses the DatabaseConnector +package, which provides a function called +createConnectionDetails. Type +?createConnectionDetails for the specific settings required +for the various database management systems (DBMS). For example, one +might connect to a PostgreSQL database using this code:

+
+  connectionDetails <- createConnectionDetails(dbms = "postgresql", 
+  server = "localhost/ohdsi", 
+  user = "joe", 
+  password = "supersecret")
+  
+  cdmDatabaseSchema <- "my_cdm_data"
+  cohortsDatabaseSchema <- "my_results"
+  cdmVersion <- "5"
+

The last three lines define the cdmDatabaseSchema and +cohortsDatabaseSchema variables, as well as the CDM +version. We will use these later to tell R where the data in CDM format +live, where we want to create the cohorts of interest, and what version +CDM is used. Note that for Microsoft SQL Server, database schemas need to +specify both the database and the schema, so for example +cdmDatabaseSchema <- "my_cdm_data.dbo".

+
+  library(SqlRender)
+  library(DatabaseConnector)
+  sql <- readSql("AfStrokeCohorts.sql")
+  # fill in the two schema parameters used in AfStrokeCohorts.sql
+  sql <- render(sql,
+  cdmDatabaseSchema = cdmDatabaseSchema,
+  resultsDatabaseSchema = cohortsDatabaseSchema)
+  sql <- translate(sql, targetDialect = connectionDetails$dbms)
+  
+  connection <- connect(connectionDetails)
+  executeSql(connection, sql)
+

In this code, we first read the SQL from the file into memory. In the +next line, we replace the two parameter names with the actual values. We +then translate the SQL into the dialect appropriate for the DBMS we +already specified in the connectionDetails. Next, we +connect to the server, and submit the rendered and translated SQL.

+

If all went well, we now have a table with the events of interest. We +can see how many events per type:

+
+  sql <- paste("SELECT cohort_definition_id, COUNT(*) AS count",
+  "FROM @cohortsDatabaseSchema.AFibStrokeCohort",
+  "GROUP BY cohort_definition_id")
+  sql <- render(sql, cohortsDatabaseSchema = cohortsDatabaseSchema)
+  sql <- translate(sql, targetDialect = connectionDetails$dbms)
+  
+  querySql(connection, sql)
+
##   cohort_definition_id  count
+## 1                    1 527616
+## 2                    2 221555
+
+
+

Study script creation +

+

In this section we assume that our cohorts have been created either +by using ATLAS or a custom SQL script. We will first explain how to +create an R script yourself that will execute our study as we have +defined earlier.

+
+
+

Data extraction +

+

Now we can tell PatientLevelPrediction to extract all +necessary data for our analysis. This is done using the FeatureExtraction package. +In short, the FeatureExtraction package allows you to specify which +features (covariates) need to be extracted, e.g. all conditions and drug +exposures. It also supports the creation of custom covariates. For more +detailed information on the FeatureExtraction package see its vignettes. For our +example study we decided to use these settings:

+
+  covariateSettings <- createCovariateSettings(useDemographicsGender = TRUE,
+  useDemographicsAge = TRUE,
+  useConditionGroupEraLongTerm = TRUE,
+  useConditionGroupEraAnyTimePrior = TRUE,
+  useDrugGroupEraLongTerm = TRUE,
+  useDrugGroupEraAnyTimePrior = TRUE,
+  useVisitConceptCountLongTerm = TRUE,
+  longTermStartDays = -365,
+  endDays = -1)
+

The final step for extracting the data is to run the +getPlpData function and input the connection details, the +database schema where the cohorts are stored, the cohort definition ids +for the cohort and outcome, and the washoutPeriod which is the minimum +number of days prior to cohort index date that the person must have been +observed to be included into the data, and finally input the previously +constructed covariate settings.

+
+# cohortsDatabaseSchema and cdmVersion were defined together with connectionDetails
+databaseDetails <- createDatabaseDetails(
+  connectionDetails = connectionDetails,
+  cdmDatabaseSchema = cdmDatabaseSchema,
+  cdmDatabaseName = '',
+  cohortDatabaseSchema = cohortsDatabaseSchema,
+  cohortTable = 'AFibStrokeCohort',
+  cohortId = 1,
+  outcomeDatabaseSchema = cohortsDatabaseSchema,
+  outcomeTable = 'AFibStrokeCohort',
+  outcomeIds = 2,
+  cdmVersion = cdmVersion
+  )
+
+# here you can define whether you want to sample the target cohort and add any
+# restrictions based on minimum prior observation, index date restrictions
+# or restricting to first index date (if people can be in target cohort multiple times)
+restrictPlpDataSettings <- createRestrictPlpDataSettings(sampleSize = 10000)
+
+  plpData <- getPlpData(
+    databaseDetails = databaseDetails, 
+    covariateSettings = covariateSettings,
+    restrictPlpDataSettings = restrictPlpDataSettings
+  )
+

Note that if the cohorts are created in ATLAS, the corresponding +cohort database schema needs to be selected. There are many additional +parameters for the createRestrictPlpDataSettings function +which are all documented in the PatientLevelPrediction +manual. The resulting plpData object uses the package +Andromeda (which uses SQLite) to store +information in a way that ensures R does not run out of memory, even +when the data are large.

+

Creating the plpData object can take considerable +computing time, and it is probably a good idea to save it for future +sessions. Because plpData uses Andromeda, we +cannot use R’s regular save function. Instead, we’ll have to use the +savePlpData() function:

+
+savePlpData(plpData, "stroke_in_af_data")
+

We can use the loadPlpData() function to load the data +in a future session.

+
+
+

Additional inclusion criteria +

+

To completely define the prediction problem the final study +population is obtained by applying additional constraints on the two +earlier defined cohorts, e.g., a minimum time at risk can be enforced +(requireTimeAtRisk, minTimeAtRisk) and we can specify if +this also applies to patients with the outcome +(includeAllOutcomes). Here we also specify the start and +end of the risk window relative to target cohort start. For example, if +we would like the risk window to start 30 days after the at-risk cohort start +and end a year later we can set riskWindowStart = 30 and +riskWindowEnd = 365. In some cases the risk window needs to +start at the cohort end date. This can be achieved by setting +startAnchor = 'cohort end', which anchors the start of the risk window +to the cohort end date.
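
The interplay of requireTimeAtRisk, minTimeAtRisk and includeAllOutcomes can be sketched in base R as below. This reflects one reading of the description above, applied to made-up data with hypothetical column names; it is not the package's internal code.

```r
# Made-up population: observed days inside the risk window and outcome counts
population <- data.frame(
  subjectId = 1:4,
  daysAtRisk = c(364, 120, 200, 364),
  outcomeCount = c(0, 0, 1, 1)
)
minTimeAtRisk <- 364
requireTimeAtRisk <- TRUE
includeAllOutcomes <- TRUE

keep <- rep(TRUE, nrow(population))
if (requireTimeAtRisk) {
  # Drop persons who were not observed for the minimum time-at-risk ...
  keep <- population$daysAtRisk >= minTimeAtRisk
  if (includeAllOutcomes) {
    # ... unless they had the outcome and all outcomes are to be kept
    keep <- keep | population$outcomeCount > 0
  }
}
studyPopulation <- population[keep, ]
print(studyPopulation)
```

Subject 2 is removed (short follow-up, no outcome), while subject 3 is kept despite the short follow-up because the outcome was observed.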

+

In Appendix 1, we demonstrate the effect of these settings on the +subset of the persons in the target cohort that end up in the final +study population.

+

In the example below all the settings we defined for our study are +imposed:

+
+  populationSettings <- createStudyPopulationSettings(
+  washoutPeriod = 1095,
+  firstExposureOnly = FALSE,
+  removeSubjectsWithPriorOutcome = FALSE,
+  priorOutcomeLookback = 1,
+  riskWindowStart = 1,
+  riskWindowEnd = 365,
+  startAnchor =  'cohort start',
+  endAnchor =  'cohort start',
+  minTimeAtRisk = 364,
+  requireTimeAtRisk = TRUE,
+  includeAllOutcomes = TRUE
+  )
+
+
+

Splitting the data into training/validation/testing datasets

+

When developing a prediction model using supervised learning (when +you have features paired with labels for a set of patients), the first +step is to design the development/internal validation process. This +requires specifying how to select the model hyper-parameters, how to +learn the model parameters and how to fairly evaluate the model. In +general, the validation set is used to pick hyper-parameters, the +training set is used to learn the model parameters and the test set is +used to perform fair internal validation. However, cross-validation can +be implemented to pick the hyper-parameters on the training data (so a +validation data set is not required). Cross validation can also be used +to estimate internal validation (so a testing data set is not +required).

+

In small data the best approach for internal validation has been shown to be bootstrapping. However, in big data (many patients and many features) bootstrapping is generally not feasible. In big data our research has shown that what matters is having some form of fair evaluation (use a test set or cross validation). For full details see our BMJ Open paper.

+

In the PatientLevelPrediction package, the splitSettings define how +the plpData are partitioned into training/validation/testing data. Cross +validation is always done, but using a test set is optional (when the +data are small, it may be optimal to not use a test set). For the +splitSettings we can use the type (stratified/time/subject) and +testFraction parameters to split the data in a 75%-25% split and run the +patient-level prediction pipeline:

+
+  splitSettings <- createDefaultSplitSetting(
+    trainFraction = 0.75,
+    testFraction = 0.25,
+    type = 'stratified',
+    nfold = 2, 
+    splitSeed = 1234
+    )
+

Note: it is possible to add a custom method to specify how the +plpData are partitioned into training/validation/testing data, see vignette +for custom splitting.
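The idea behind a stratified split can be sketched in a few lines of base R. The sketch below assigns a test set and cross-validation folds separately within each outcome class so that the outcome rate is preserved. The index convention (-1 for test, 1 to nfold for training folds) and the toy outcome vector are assumptions for illustration, not a description of the package internals.

```r
set.seed(1234)
outcome <- c(rep(1, 20), rep(0, 180))  # hypothetical labels, 10% outcome rate
testFraction <- 0.25
nfold <- 2

index <- integer(length(outcome))  # -1 = test set, 1..nfold = training fold
for (label in unique(outcome)) {
  rows <- which(outcome == label)
  rows <- rows[sample.int(length(rows))]  # shuffle within the stratum
  nTest <- round(length(rows) * testFraction)
  isTest <- seq_along(rows) <= nTest
  index[rows[isTest]] <- -1
  trainRows <- rows[!isTest]
  # Deal the remaining rows over the folds in turn.
  index[trainRows] <- rep_len(seq_len(nfold), length(trainRows))
}

print(table(index, outcome))
```

Because the assignment is done per class, the test set and every fold keep roughly the same outcome rate as the full data.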

+
+
+

Preprocessing the training data +

+

There are numerous data processing settings that a user must specify when developing a prediction model. These are:
  • Whether to under-sample or over-sample the training data (this may be useful when there is class imbalance, e.g., the outcome is very rare or very common)
  • Whether to perform feature engineering or feature selection (e.g., create latent variables that are not observed in the data or reduce the dimensionality of the data)
  • Whether to remove redundant features and normalize the data (this is required for some models)

+

The default sample settings do nothing; they simply return the trainData unchanged, see below:

+
+  sampleSettings <- createSampleSettings()
+

However, the current package contains methods of under-sampling the +non-outcome patients. To perform undersampling, the type +input should be ‘underSample’ and +numberOutcomestoNonOutcomes must be specified (an integer +specifying the number of non-outcomes per outcome). It is possible to +add any custom function for over/under sampling, see vignette +for custom sampling.
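As an illustration of what under-sampling does, the base R sketch below keeps every outcome patient and a random subset of non-outcome patients at a fixed ratio. The toy outcome vector is hypothetical and the variable name simply mirrors the numberOutcomestoNonOutcomes input; the package's own sampling code may differ.

```r
set.seed(1)
outcome <- c(rep(1, 10), rep(0, 990))  # hypothetical rare outcome (1%)
numberOutcomestoNonOutcomes <- 2       # non-outcomes to keep per outcome

outcomeRows <- which(outcome == 1)
nonOutcomeRows <- which(outcome == 0)

# Never ask for more non-outcomes than are available.
nKeep <- min(length(nonOutcomeRows),
             numberOutcomestoNonOutcomes * length(outcomeRows))
keptNonOutcomes <- nonOutcomeRows[sample.int(length(nonOutcomeRows), nKeep)]

sampledRows <- sort(c(outcomeRows, keptNonOutcomes))
print(table(outcome[sampledRows]))
```

After sampling, the training data contain 10 outcomes and 20 non-outcomes, so the outcome rate rises from 1% to about 33%. Predicted risks from a model trained on such data are typically inflated and may need recalibration.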

+

It is possible to specify a combination of feature engineering +functions that take as input the trainData and output a new trainData +with different features. The default feature engineering setting does +nothing:

+
+  featureEngineeringSettings <- createFeatureEngineeringSettings()
+

However, it is possible to add custom feature engineering functions +into the pipeline, see vignette +for custom feature engineering.

+

Finally, the preprocessing setting is required. For this setting the user can define minFraction, which removes any feature that is observed in the training data for less than the specified fraction of the patients. So, if minFraction = 0.01 then any feature that is seen in less than 1 percent of the target population is removed. The input normalize specifies whether the features are scaled between 0 and 1; this is required for certain models (e.g., LASSO logistic regression). The input removeRedundancy specifies whether features that are observed in all of the target population are removed.

+
+  preprocessSettings <- createPreprocessSettings(
+    minFraction = 0.01, 
+    normalize = TRUE, 
+    removeRedundancy = TRUE
+      )
+
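The effect of minFraction and normalize can be illustrated with a small base R sketch on a toy feature matrix. The matrix and its column names are hypothetical, and scaling by the column maximum is only one simple way to map non-negative features into the range 0 to 1; redundancy removal is not shown.

```r
n <- 200
# Hypothetical feature matrix: rows are patients, columns are features.
x <- cbind(
  age = seq(20, 80, length.out = n),
  commonDrug = rep(c(1, 0), length.out = n),
  rareCondition = c(1, rep(0, n - 1))  # observed in 1 of 200 patients
)

minFraction <- 0.01
# Fraction of patients in whom each feature is observed (non-zero).
fractionObserved <- colMeans(x != 0)
x <- x[, fractionObserved >= minFraction, drop = FALSE]

# Scale each remaining column by its maximum so values lie in [0, 1].
x <- sweep(x, 2, apply(x, 2, max), "/")

print(colnames(x))
print(range(x))
```

Here rareCondition is observed in 0.5% of patients, which is below the 1% threshold, so it is dropped before normalization.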
+
+

Model Development +

+

In the set function of an algorithm the user can specify a list of +eligible values for each hyper-parameter. All possible combinations of +the hyper-parameters are included in a so-called grid search using +cross-validation on the training set. If a user does not specify any +value then the default value is used instead.

+

For example, if we use the following settings for the gradientBoostingMachine: ntrees=c(100,200), maxDepth=4, the grid search will apply the gradient boosting machine algorithm with ntrees=100 and maxDepth=4 plus the default settings for other hyper-parameters, and with ntrees=200 and maxDepth=4 plus the default settings for other hyper-parameters. The hyper-parameters that lead to the best cross-validation performance will then be chosen for the final model. For our problem we choose to build a logistic regression model with the default hyper-parameters.
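The grid described above can be enumerated with base R's expand.grid, which is a convenient way to see how many models a grid search will fit. The cross-validation scores in this sketch are made-up placeholder numbers used only to show the selection step; they are not results from any model.

```r
# Every combination of the candidate hyper-parameter values.
grid <- expand.grid(ntrees = c(100, 200), maxDepth = 4)
print(grid)

# Placeholder cross-validation AUCs, one per combination (illustrative only).
grid$cvAuc <- c(0.70, 0.72)

# The combination with the best cross-validation performance is selected.
best <- grid[which.max(grid$cvAuc), c("ntrees", "maxDepth")]
print(best)
```

The number of rows in the grid grows multiplicatively with each hyper-parameter that is given several candidate values, and each row is fitted once per cross-validation fold.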

+lrModel <- setLassoLogisticRegression()

The runPlp function requires the plpData, the outcomeId specifying the outcome being predicted, and the settings populationSettings, splitSettings, sampleSettings, featureEngineeringSettings, preprocessSettings and modelSettings to train and evaluate the model.

+
+  lrResults <- runPlp(
+    plpData = plpData,
+    outcomeId = 2, 
+    analysisId = 'singleDemo',
+    analysisName = 'Demonstration of runPlp for training single PLP models',
+    populationSettings = populationSettings, 
+    splitSettings = splitSettings,
+    sampleSettings = sampleSettings, 
+    featureEngineeringSettings = featureEngineeringSettings, 
+    preprocessSettings = preprocessSettings,
+    modelSettings = lrModel,
+    logSettings = createLogSettings(), 
+    executeSettings = createExecuteSettings(
+      runSplitData = T, 
+      runSampleData = T, 
+      runfeatureEngineering = T, 
+      runPreprocessData = T, 
+      runModelDevelopment = T, 
+      runCovariateSummary = T
+    ), 
+    saveDirectory = file.path(getwd(), 'singlePlp')
+    )
+

Under the hood the package will now use the Cyclops package to +fit a large-scale regularized regression using 75% of the data and will +evaluate the model on the remaining 25%. A results data structure is +returned containing information about the model, its performance +etc.

+

You can save the model using:

+
+savePlpModel(lrResults$model, dirPath = file.path(getwd(), "model"))
+

You can load the model using:

+
+plpModel <- loadPlpModel(file.path(getwd(), "model"))
+

You can also save the full results structure using:

+
+savePlpResult(lrResults, location = file.path(getwd(), "lr"))
+

To load the full results structure use:

+
+lrResults <- loadPlpResult(file.path(getwd(), "lr"))
+
+
+
+
+
+

Example 2: Angioedema in ACE inhibitor users +

+
+

Study Specification +

| Definition | Value |
|---|---|
| **Problem Definition** | |
| Target Cohort (T) | ‘Patients who are newly dispensed an ACE inhibitor’ defined as the first drug record of any ACE inhibitor |
| Outcome Cohort (O) | ‘Angioedema’ defined as an angioedema condition record during an inpatient or ER visit |
| Time-at-risk (TAR) | 1 day till 365 days from cohort start |
| **Population Definition** | |
| Washout Period | 365 |
| Enter the target cohort multiple times? | No |
| Allow prior outcomes? | No |
| Start of time-at-risk | 1 day |
| End of time-at-risk | 365 days |
| Require a minimum amount of time-at-risk? | Yes (364 days) |
| **Model Development** | |
| Algorithm | Gradient Boosting Machine |
| Hyper-parameters | ntree: 5000, max depth: 4 or 7 or 10 and learning rate: 0.001 or 0.01 or 0.1 or 0.9 |
| Covariates | Gender, Age, Conditions (ever before, <365), Drugs Groups (ever before, <365), and Visit Count |
| Data split | 75% train, 25% test. Randomly assigned by person |

According to best practices, we need to write a protocol that completely specifies how we plan to execute our study. This protocol will be assessed by the governance boards of the participating data sources in your network study. A template could be used for this, but we prefer to automate the process as much as possible by adding functionality to automatically generate a study protocol from a study specification. We will discuss this in more detail later.

+
+
+

Study implementation +

+

Now that we have completely designed our study, we have to implement it. We have to generate the target and outcome cohorts and we need to develop the R code to run against our CDM that will execute the full study.

+
+

Cohort instantiation +

+

For our study we need to know when a person enters the target and +outcome cohorts. This is stored in a table on the server that contains +the cohort start date and cohort end date for all subjects for a +specific cohort definition. This cohort table has a very simple +structure as shown below:

+
    +
  • +cohort_definition_id, a unique identifier for +distinguishing between different types of cohorts, e.g. cohorts of +interest and outcome cohorts.
  • +
  • +subject_id, a unique identifier corresponding to the +person_id in the CDM.
  • +
  • +cohort_start_date, the date the subject enters the +cohort.
  • +
  • +cohort_end_date, the date the subject leaves the +cohort.
  • +
+
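As a small illustration of this structure, the base R sketch below builds a toy cohort table as a data frame and counts the entries per cohort definition. All subject ids and dates are hypothetical; in a real study the table lives in the database, not in R.

```r
# Hypothetical rows: cohort 1 = target cohort, cohort 2 = outcome cohort.
cohortTable <- data.frame(
  cohort_definition_id = c(1L, 1L, 2L),
  subject_id = c(101, 102, 101),
  cohort_start_date = as.Date(c("2015-02-01", "2016-07-15", "2015-05-20")),
  cohort_end_date = as.Date(c("2018-12-31", "2019-03-01", "2015-05-20"))
)

# Number of entries per cohort definition.
counts <- table(cohortTable$cohort_definition_id)
print(counts)
```

Subject 101 appears in both cohorts, which is how a person in the target cohort who later experiences the outcome is represented.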

How do we fill this table according to our cohort definitions? There +are two options for this:

+
    +
  1. use the interactive cohort builder tool in ATLAS which can be used to create +cohorts based on inclusion criteria and will automatically populate this +cohort table.

  2. +
  3. write your own custom SQL statements to fill the cohort +table.

  4. +
+

Both methods are described below for our example prediction +problem.

+
+
+

ATLAS cohort builder +

+
+Target Cohort ACE inhibitors
Target Cohort ACE inhibitors
+
+

ATLAS allows you to define cohorts interactively by specifying cohort +entry and cohort exit criteria. Cohort entry criteria involve selecting +one or more initial events, which determine the start date for cohort +entry, and optionally specifying additional inclusion criteria which +filter to the qualifying events. Cohort exit criteria are applied to +each cohort entry record to determine the end date when the person’s +episode no longer qualifies for the cohort. For the outcome cohort the +end date is less relevant. As an example, Figure 6 shows how we created +the ACE inhibitors cohort and Figure 7 shows how we created the +angioedema cohort in ATLAS.

+
+Outcome Cohort Angioedema
Outcome Cohort Angioedema
+
+

The T and O cohorts can be found here:

+ +

An in-depth explanation of cohort creation in ATLAS is out of scope for this vignette but can be found on the OHDSI wiki pages.

+

Note that when a cohort is created in ATLAS the cohortid is needed to +extract the data in R. The cohortid can be found at the top of the ATLAS +screen, e.g. 1770617 in Figure 6.

+
+
+

Custom cohorts +

+

It is also possible to create cohorts without the use of ATLAS. Using +custom cohort code (SQL) you can make more advanced cohorts if +needed.

+

For our example study, we need to create a table to hold the cohort data and we need to create SQL code to instantiate this table for both the ACE inhibitor and angioedema cohorts. Therefore, we create a file called AceAngioCohorts.sql with the following contents:

+
  /***********************************
+    File AceAngioCohorts.sql 
+  ***********************************/
+    /*
+    Create a table to store the persons in the T and O cohorts
+  */
+    
+    IF OBJECT_ID('@resultsDatabaseSchema.AceAngioCohort', 'U') IS NOT NULL 
+  DROP TABLE @resultsDatabaseSchema.AceAngioCohort;
+  
+  CREATE TABLE @resultsDatabaseSchema.AceAngioCohort 
+  ( 
+    cohort_definition_id INT, 
+    subject_id BIGINT,
+    cohort_start_date DATE, 
+    cohort_end_date DATE
+  );
+  
+  
+  /*
+    T cohort:  [PatientLevelPrediction vignette]:  T : patients who are newly 
+  dispensed an ACE inhibitor
+  - persons with a drug exposure record of any 'ACE inhibitor' or 
+  any descendants, indexed at the first dispensing
+  - who have >364 days of prior observation before their first dispensing
+  */
+    INSERT INTO @resultsDatabaseSchema.AceAngioCohort (cohort_definition_id, 
+                                                       subject_id, 
+                                                       cohort_start_date, 
+                                                       cohort_end_date)
+  SELECT 1 AS cohort_definition_id,
+  Ace.person_id AS subject_id,
+  Ace.drug_start_date AS cohort_start_date,
+  observation_period.observation_period_end_date AS cohort_end_date
+  FROM
+  (
+    SELECT person_id, min(drug_exposure_start_date) as drug_start_date
+    FROM @cdmDatabaseSchema.drug_exposure
+    WHERE drug_concept_id IN (SELECT descendant_concept_id FROM 
+                              @cdmDatabaseSchema.concept_ancestor WHERE ancestor_concept_id IN 
+                              (1342439,1334456, 1331235, 1373225, 1310756, 1308216, 1363749, 1341927, 1340128, 1335471 /*ace inhibitors*/))
+    GROUP BY person_id
+  ) Ace
+  INNER JOIN @cdmDatabaseSchema.observation_period
+  ON Ace.person_id = observation_period.person_id
+  AND Ace.drug_start_date >= dateadd(dd,364, 
+                                     observation_period.observation_period_start_date)
+  AND Ace.drug_start_date <= observation_period.observation_period_end_date
+  ;
+  
+  /*
+    O cohort:  [PatientLevelPrediction vignette]:  O: Angioedema
+  */
+    INSERT INTO @resultsDatabaseSchema.AceAngioCohort (cohort_definition_id, 
+                                                       subject_id, 
+                                                       cohort_start_date, 
+                                                       cohort_end_date)
+  SELECT 2 AS cohort_definition_id,
+  angioedema.person_id AS subject_id,
+  angioedema.condition_start_date AS cohort_start_date,
+  angioedema.condition_start_date AS cohort_end_date
+  FROM  
+  (
+    SELECT person_id, condition_start_date
+    FROM @cdmDatabaseSchema.condition_occurrence
+    WHERE condition_concept_id IN (SELECT DISTINCT descendant_concept_id FROM 
+                                   @cdmDatabaseSchema.concept_ancestor WHERE ancestor_concept_id IN 
+                                   (432791 /*angioedema*/) OR descendant_concept_id IN 
+                                   (432791 /*angioedema*/))
+  ) angioedema
+    
+    ;
+    
+

This is parameterized SQL which can be used by the SqlRender +package. We use parameterized SQL so we do not have to pre-specify the +names of the CDM and result schemas. That way, if we want to run the SQL +on a different schema, we only need to change the parameter values; we +do not have to change the SQL code. By also making use of translation +functionality in SqlRender, we can make sure the SQL code +can be run in many different environments.
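The idea of parameterized SQL can be shown with a deliberately simplified base R sketch that substitutes @name placeholders with values. The helper renderSketch is hypothetical and written only for this illustration; it is not how SqlRender is implemented, and it performs no dialect translation.

```r
# Hypothetical helper: replace each @name placeholder with its value.
renderSketch <- function(sql, ...) {
  params <- list(...)
  # Substitute longer names first so that a short name can never
  # clobber a longer name that starts with the same characters.
  ordered <- names(params)[order(nchar(names(params)), decreasing = TRUE)]
  for (name in ordered) {
    sql <- gsub(paste0("@", name), params[[name]], sql, fixed = TRUE)
  }
  sql
}

template <- "SELECT COUNT(*) FROM @cohortsDatabaseSchema.AceAngioCohort"
rendered <- renderSketch(template, cohortsDatabaseSchema = "my_results")
print(rendered)
```

The same template can be pointed at a different schema by changing only the argument value, which is the property the vignette relies on.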

+

To execute this sql against our CDM we first need to tell R how to +connect to the server. PatientLevelPrediction uses the DatabaseConnector +package, which provides a function called +createConnectionDetails. Type +?createConnectionDetails for the specific settings required +for the various database management systems (DBMS). For example, one +might connect to a PostgreSQL database using this code:

+
+    connectionDetails <- createConnectionDetails(dbms = "postgresql", 
+                                                 server = "localhost/ohdsi", 
+                                                 user = "joe", 
+                                                 password = "supersecret")
+    
+    cdmDatabaseSchema <- "my_cdm_data"
+    cohortsDatabaseSchema <- "my_results"
+    cdmVersion <- "5"
+

The last three lines define the cdmDatabaseSchema and cohortsDatabaseSchema variables, as well as the CDM version. We will use these later to tell R where the data in CDM format live, where we want to create the cohorts of interest, and what version CDM is used. Note that for Microsoft SQL Server, database schemas need to specify both the database and the schema, so for example cdmDatabaseSchema <- "my_cdm_data.dbo".

+
+    library(SqlRender)
+    sql <- readSql("AceAngioCohorts.sql")
+    sql <- render(sql,
+                  cdmDatabaseSchema = cdmDatabaseSchema,
+                  resultsDatabaseSchema = cohortsDatabaseSchema)
+    sql <- translate(sql, targetDialect = connectionDetails$dbms)
+    
+    connection <- connect(connectionDetails)
+    executeSql(connection, sql)
+

In this code, we first read the SQL from the file into memory. In the next line, we replace the two parameter names with the actual values. We then translate the SQL into the dialect appropriate for the DBMS we already specified in the connectionDetails. Next, we connect to the server, and submit the rendered and translated SQL.

+

If all went well, we now have a table with the events of interest. We +can see how many events per type:

+
+    sql <- paste("SELECT cohort_definition_id, COUNT(*) AS count",
+                 "FROM @cohortsDatabaseSchema.AceAngioCohort",
+                 "GROUP BY cohort_definition_id")
+    sql <- render(sql, cohortsDatabaseSchema = cohortsDatabaseSchema)
+    sql <- translate(sql, targetDialect = connectionDetails$dbms)
+    
+    querySql(connection, sql)
+
##   cohort_definition_id count
+## 1                    1     0
+## 2                    2     0
+
+
+

Study script creation +

+

In this section we assume that our cohorts have been created either +by using ATLAS or a custom SQL script. We will first explain how to +create an R script yourself that will execute our study as we have +defined earlier.

+
+
+

Data extraction +

+

Now we can tell PatientLevelPrediction to extract all necessary data for our analysis. This is done using the FeatureExtraction package. In short, the FeatureExtraction package allows you to specify which features (covariates) need to be extracted, e.g. all conditions and drug exposures. It also supports the creation of custom covariates. For more detailed information on the FeatureExtraction package see its vignettes. For our example study we decided to use these settings:

+
+    covariateSettings <- createCovariateSettings(useDemographicsGender = TRUE,
+                                                 useDemographicsAge = TRUE,
+                                                 useConditionGroupEraLongTerm = TRUE,
+                                                 useConditionGroupEraAnyTimePrior = TRUE,
+                                                 useDrugGroupEraLongTerm = TRUE,
+                                                 useDrugGroupEraAnyTimePrior = TRUE,
+                                                 useVisitConceptCountLongTerm = TRUE,
+                                                 longTermStartDays = -365,
+                                                 endDays = -1)
+

The final step for extracting the data is to run the +getPlpData function and input the connection details, the +database schema where the cohorts are stored, the cohort definition ids +for the cohort and outcome, and the washoutPeriod which is the minimum +number of days prior to cohort index date that the person must have been +observed to be included into the data, and finally input the previously +constructed covariate settings.

+
+databaseDetails <- createDatabaseDetails(
+  connectionDetails = connectionDetails,
+  cdmDatabaseSchema = cdmDatabaseSchema,
+  cohortDatabaseSchema = cohortsDatabaseSchema,
+  cohortTable = 'AceAngioCohort',
+  cohortId = 1,
+  outcomeDatabaseSchema = cohortsDatabaseSchema,
+  outcomeTable = 'AceAngioCohort',
+  outcomeIds = 2
+  )
+
+restrictPlpDataSettings <- createRestrictPlpDataSettings(
+  sampleSize = 10000
+  )
+
+plpData <- getPlpData(
+  databaseDetails = databaseDetails, 
+  covariateSettings = covariateSettings, 
+  restrictPlpDataSettings = restrictPlpDataSettings
+  )
+

Note that if the cohorts are created in ATLAS, the corresponding cohort database schema needs to be selected. There are many additional parameters for the getPlpData function, which are all documented in the PatientLevelPrediction manual. The resulting plpData object uses the package Andromeda (which uses SQLite) to store information in a way that ensures R does not run out of memory, even when the data are large.

+

Creating the plpData object can take considerable computing time, and it is probably a good idea to save it for future sessions. Because plpData uses Andromeda, we cannot use R’s regular save function. Instead, we’ll have to use the savePlpData() function:

+
+savePlpData(plpData, "angio_in_ace_data")
+

We can use the loadPlpData() function to load the data +in a future session.

+
+
+

Additional inclusion criteria +

+

To completely define the prediction problem, the final study population is obtained by applying additional constraints to the two earlier defined cohorts. For example, a minimum time at risk can be enforced (requireTimeAtRisk, minTimeAtRisk), and we can specify whether this also applies to patients with the outcome (includeAllOutcomes). Here we also specify the start and end of the risk window relative to target cohort start. For example, if we want the risk window to start 30 days after the at-risk cohort start and end a year later, we can set riskWindowStart = 30 and riskWindowEnd = 365. In some cases the risk window needs to start at the cohort end date. This can be achieved by setting startAnchor = 'cohort end', which anchors the start of the risk window to the cohort end date instead of the cohort start date.

+

In Appendix 1, we demonstrate the effect of these settings on the +subset of the persons in the target cohort that end up in the final +study population.

+

In the example below all the settings we defined for our study are +imposed:

+
+    populationSettings <- createStudyPopulationSettings(
+      washoutPeriod = 364,
+      firstExposureOnly = FALSE,
+      removeSubjectsWithPriorOutcome = TRUE,
+      priorOutcomeLookback = 9999,
+      riskWindowStart = 1,
+      riskWindowEnd = 365, 
+      minTimeAtRisk = 364,
+      startAnchor = 'cohort start',
+      endAnchor = 'cohort start',
+      requireTimeAtRisk = TRUE,
+      includeAllOutcomes = TRUE
+    )
+
+
+

Splitting the data into training/validation/testing datasets

+

When developing a prediction model using supervised learning (when +you have features paired with labels for a set of patients), the first +step is to design the development/internal validation process. This +requires specifying how to select the model hyper-parameters, how to +learn the model parameters and how to fairly evaluate the model. In +general, the validation set is used to pick hyper-parameters, the +training set is used to learn the model parameters and the test set is +used to perform fair internal validation. However, cross-validation can +be implemented to pick the hyper-parameters on the training data (so a +validation data set is not required). Cross validation can also be used +to estimate internal validation (so a testing data set is not +required).

+

In small data the best approach for internal validation has been shown to be bootstrapping. However, in big data (many patients and many features) bootstrapping is generally not feasible. In big data our research has shown that what matters is having some form of fair evaluation (use a test set or cross validation). For full details see our BMJ Open paper.

+

In the PatientLevelPrediction package, the splitSettings define how +the plpData are partitioned into training/validation/testing data. Cross +validation is always done, but using a test set is optional (when the +data are small, it may be optimal to not use a test set). For the +splitSettings we can use the type (stratified/time/subject) and +testFraction parameters to split the data in a 75%-25% split and run the +patient-level prediction pipeline:

+
+  splitSettings <- createDefaultSplitSetting(
+    trainFraction = 0.75,
+    testFraction = 0.25,
+    type = 'stratified',
+    nfold = 2, 
+    splitSeed = 1234
+    )
+

Note: it is possible to add a custom method to specify how the +plpData are partitioned into training/validation/testing data, see vignette +for custom splitting.

+
+
+

Preprocessing the training data +

+

There are numerous data processing settings that a user must specify when developing a prediction model. These are:
  • Whether to under-sample or over-sample the training data (this may be useful when there is class imbalance, e.g., the outcome is very rare or very common)
  • Whether to perform feature engineering or feature selection (e.g., create latent variables that are not observed in the data or reduce the dimensionality of the data)
  • Whether to remove redundant features and normalize the data (this is required for some models)

+

The default sample settings do nothing; they simply return the trainData unchanged, see below:

+
+  sampleSettings <- createSampleSettings()
+

However, the current package contains methods of under-sampling the +non-outcome patients. To perform undersampling, the type +input should be ‘underSample’ and +numberOutcomestoNonOutcomes must be specified (an integer +specifying the number of non-outcomes per outcome). It is possible to +add any custom function for over/under sampling, see vignette +for custom sampling.

+

It is possible to specify a combination of feature engineering +functions that take as input the trainData and output a new trainData +with different features. The default feature engineering setting does +nothing:

+
+  featureEngineeringSettings <- createFeatureEngineeringSettings()
+

However, it is possible to add custom feature engineering functions +into the pipeline, see vignette +for custom feature engineering.

+

Finally, the preprocessing setting is required. For this setting the user can define minFraction, which removes any feature that is observed in the training data for less than the specified fraction of the patients. So, if minFraction = 0.01 then any feature that is seen in less than 1 percent of the target population is removed. The input normalize specifies whether the features are scaled between 0 and 1; this is required for certain models (e.g., LASSO logistic regression). The input removeRedundancy specifies whether features that are observed in all of the target population are removed.

+
+  preprocessSettings <- createPreprocessSettings(
+    minFraction = 0.01, 
+    normalize = TRUE, 
+    removeRedundancy = TRUE
+      )
+
+
+

Model Development +

+

In the set function of an algorithm the user can specify a list of +eligible values for each hyper-parameter. All possible combinations of +the hyper-parameters are included in a so-called grid search using +cross-validation on the training set. If a user does not specify any +value then the default value is used instead.

+

For example, if we use the following settings for the gradientBoostingMachine: ntrees=c(100,200), maxDepth=4, the grid search will apply the gradient boosting machine algorithm with ntrees=100 and maxDepth=4 plus the default settings for other hyper-parameters, and with ntrees=200 and maxDepth=4 plus the default settings for other hyper-parameters. The hyper-parameters that lead to the best cross-validation performance will then be chosen for the final model. For our problem we choose to build a gradient boosting machine model with the hyper-parameter values listed in the study specification:

+
+gbmModel <- setGradientBoostingMachine(ntrees = 5000, maxDepth = c(4, 7, 10), learnRate = c(0.001,
+    0.01, 0.1, 0.9))
+

The runPlp function requires the plpData, the outcomeId specifying the outcome being predicted, and the settings populationSettings, splitSettings, sampleSettings, featureEngineeringSettings, preprocessSettings and modelSettings to train and evaluate the model.

+
+  gbmResults <- runPlp(
+    plpData = plpData,
+    outcomeId = 2, 
+    analysisId = 'singleDemo2',
+    analysisName = 'Demonstration of runPlp for training single PLP models',
+    populationSettings = populationSettings, 
+    splitSettings = splitSettings,
+    sampleSettings = sampleSettings, 
+    featureEngineeringSettings = featureEngineeringSettings, 
+    preprocessSettings = preprocessSettings,
+    modelSettings = gbmModel,
+    logSettings = createLogSettings(), 
+    executeSettings = createExecuteSettings(
+      runSplitData = T, 
+      runSampleData = T, 
+      runfeatureEngineering = T, 
+      runPreprocessData = T, 
+      runModelDevelopment = T, 
+      runCovariateSummary = T
+    ), 
+    saveDirectory = file.path(getwd(), 'singlePlpExample2')
+    )
+

Under the hood the package will now use the R xgboost package to fit a gradient boosting machine model using 75% of the data and will evaluate the model on the remaining 25%. A results data structure is returned containing information about the model, its performance, etc.

+

You can save the model using:

+
+savePlpModel(gbmResults$model, dirPath = file.path(getwd(), "model"))
+

You can load the model using:

+
+plpModel <- loadPlpModel(file.path(getwd(), "model"))
+

You can also save the full results structure using:

+
+savePlpResult(gbmResults, location = file.path(getwd(), "gbm"))
+

To load the full results structure use:

+
+gbmResults <- loadPlpResult(file.path(getwd(), "gbm"))
+
+
+
+
+
+

Study package creation +

+

The script we created manually above can also be created automatically using a powerful feature in ATLAS. By creating a new prediction study (left menu) you can select the Target and Outcome as created in ATLAS, set all the study parameters, and then download an R package that you can use to execute your study. What is really powerful is that you can add multiple Ts, Os, covariate settings etc. The package will then run all the combinations automatically as separate analyses. The screenshots below explain this process.

+
    +
  1. +
    +Create a new prediction study and select your target and outcome +cohorts. +
    +
    + +
    +
  2. +
  3. +
    +
    +Specify one or more analysis settings. +
    +
    + +
    +
    +
    +
  4. +
  5. +
    +Specify the training settings +
    +
    + +
    +
  6. +
  7. +
    +
    +Specify the execution settings +
    +
    + +
    +
  8. +
+

ATLAS can build an R package for you that will execute the full study against your CDM. The steps below explain how to do this in ATLAS.

+
    +
  1. +
    +

    Under utilities you can find download. Click on the button to review +the full study specification

    +
    +
    +
    +R package download functionality in ATLAS
    R package download functionality in ATLAS
    +
    +
    +
  2. +
  3. +
    +

    You now have to confirm that you indeed want to run all these analyses (the Cartesian product of all the settings for each T and O combination).

    +
    +
    +
    +R package download functionality in ATLAS
    R package download functionality in ATLAS
    +
    +
    +
  4. +
  5. If you agree, you give the package a name, and download the +package as a zipfile.

  6. +
  7. By opening the R package in RStudio and building the package you can run the study using the execute function. There is also an example CodeToRun.R script available in the extras folder of the package with extra instructions.

  8. +
+
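How quickly the number of analyses grows can be sketched in base R. This is only an illustration of a cartesian product; the cohort names and setting labels are hypothetical placeholders, not values produced by ATLAS or by the generated package.

```r
# Hypothetical study components (illustrative names only).
targets    <- c("T1", "T2")
outcomes   <- c("O1", "O2", "O3")
models     <- c("lasso", "gbm")
covariates <- c("default", "demographicsOnly")

# expand.grid() builds the cartesian product: one row per analysis.
analyses <- expand.grid(
  target = targets,
  outcome = outcomes,
  model = models,
  covariateSetting = covariates,
  stringsAsFactors = FALSE
)
analyses$analysisId <- seq_len(nrow(analyses))

nrow(analyses)  # 2 * 3 * 2 * 2 combinations
head(analyses)
```

Reviewing such a table before downloading the package is a cheap way to check that the full set of combinations is really what you intend to run.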
+
+

Internal validation +

+

Once we execute the study, the runPlp() function returns the trained +model and the evaluation of the model on the train/test sets.

+

You can interactively view the results by running: +viewPlp(runPlp=lrResults). This will generate a Shiny App +in your browser in which you can view all performance measures created +by the framework as shown in the figure below.

+
+
Summary of all the performance measures of the +analyses
+
+
+

Furthermore, many interactive plots are available in the Shiny App, for example the ROC curve, in which you can hover over the plot to see the threshold and the corresponding sensitivity and specificity values.

+
+
Example of the interactive ROC curve
+
+
+

To generate and save all the evaluation plots to a folder run the +following code:

+
+plotPlp(lrResults, dirPath = getwd())
+

The plots are described in more detail in the next sections.

+
+
+

Discrimination +

+

The Receiver Operating Characteristics (ROC) plot shows the +sensitivity against 1-specificity on the test set. The plot illustrates +how well the model is able to discriminate between the people with the +outcome and those without. The dashed diagonal line is the performance +of a model that randomly assigns predictions. The higher the area under +the ROC plot the better the discrimination of the model. The plot is +created by changing the probability threshold to assign the positive +class.

+
+
Receiver Operating Characteristic Plot
+
+
+
+
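The threshold sweep behind the ROC plot can be sketched in base R. The data below are simulated and the variable names are illustrative; this is not the package's own implementation.

```r
set.seed(42)
# Simulated test set: outcome labels and predicted risks (illustrative only).
n <- 1000
outcome <- rbinom(n, 1, 0.2)
risk <- plogis(-1.5 + 1.2 * outcome + rnorm(n))

# Sweep the probability threshold from high to low; people with
# risk >= threshold are assigned the positive class.
thresholds <- c(Inf, sort(unique(risk), decreasing = TRUE))
sensitivity <- sapply(thresholds, function(t) mean(risk[outcome == 1] >= t))
fpr <- sapply(thresholds, function(t) mean(risk[outcome == 0] >= t))  # 1 - specificity

# Area under the ROC curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(sensitivity, -1) + tail(sensitivity, -1)) / 2)

# Cross-check against the rank-based (Mann-Whitney) formulation.
r <- rank(risk)
n1 <- sum(outcome == 1)
n0 <- sum(outcome == 0)
aucRank <- (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

c(trapezoid = auc, rankBased = aucRank)
```

Calling `plot(fpr, sensitivity, type = "l")` followed by `abline(0, 1, lty = 2)` draws the curve together with the dashed diagonal of a random model.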

## Calibration

+

The calibration plot shows how close the predicted risk is to the observed risk. The diagonal dashed line thus indicates a perfectly calibrated model. The ten (or fewer) dots represent the mean predicted values for each quantile plotted against the observed fraction of people in that quantile who had the outcome. The straight black line is the linear regression fitted through these ten (mean predicted, observed fraction) points. The straight vertical lines represent the 95% lower and upper confidence intervals of the slope of the fitted line.

+
+
Calibration Plot
+
+
+
+
+
+
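The construction of the calibration points can be sketched in base R as follows. The data are simulated so that the model is well calibrated by construction, and the binning shown is an assumption about the general technique rather than the package's exact code.

```r
set.seed(1)
n <- 5000
risk <- plogis(rnorm(n, mean = -2))  # simulated predicted risks
outcome <- rbinom(n, 1, risk)        # outcomes drawn from those same risks

# Assign each person to one of ten risk quantiles.
breaks <- quantile(risk, probs = seq(0, 1, 0.1))
bin <- cut(risk, breaks = breaks, include.lowest = TRUE, labels = FALSE)

calibration <- data.frame(
  meanPredicted = as.numeric(tapply(risk, bin, mean)),
  observedFraction = as.numeric(tapply(outcome, bin, mean))
)

# Straight-line fit through the ten points; a well calibrated model has
# an intercept near 0 and a slope near 1.
fit <- lm(observedFraction ~ meanPredicted, data = calibration)
coef(fit)
confint(fit)  # 95% confidence intervals for intercept and slope
```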

Smooth Calibration +

+

Similar to the traditional calibration plot shown above, the Smooth Calibration plot shows the relationship between predicted and observed risk. The major difference is that the smooth fit allows for a more fine-grained examination of this relationship. Whereas the traditional plot will be heavily influenced by the areas with the highest density of data, the smooth plot will provide the same information for this region as well as a more accurate interpretation of areas with lower density. The plot also contains information on the distribution of the outcomes relative to predicted risk.

+

However, the increased information comes at a computational cost. It is recommended to use the traditional plot for exploration and then to produce the smooth plot for final versions. To create the smooth calibration plot you have to run the following command:

+ +

See the help function for more information, on how to set the +smoothing method etc.

+

The example below is from another study that better demonstrates the impact of using a smooth calibration plot. The default line fit would not highlight the miscalibration at the lower predicted probability levels that well.

+
+
Smooth Calibration plot
+
+
+
+
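The idea of a smooth calibration curve can be sketched with a loess smoother from base R. This assumes a loess fit of the observed outcome on the predicted risk; the smoothing method and settings used by the package may differ, and the data are simulated.

```r
set.seed(2)
n <- 3000
risk <- plogis(rnorm(n, mean = -2))  # simulated predicted risks
outcome <- rbinom(n, 1, risk)        # simulated observed outcomes

# Locally weighted regression of the observed outcome on the predicted risk.
smoothFit <- loess(outcome ~ risk, span = 0.75, degree = 1)

# Evaluate the smooth curve on a grid inside the observed risk range.
riskRange <- unname(quantile(risk, c(0.01, 0.99)))
grid <- data.frame(risk = seq(riskRange[1], riskRange[2], length.out = 50))
grid$observed <- predict(smoothFit, newdata = grid)

head(grid)
```

Plotting `grid$observed` against `grid$risk` next to the diagonal shows where the curve departs from perfect calibration, including in regions with few people.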

## Preference distribution

+

The preference distribution plots are the preference score +distributions corresponding to i) people in the test set with the +outcome (red) and ii) people in the test set without the outcome +(blue).

+
+
Preference Plot
+
+
+
+

## Predicted probability distribution

+

The prediction distribution box plots are for the predicted risks of +the people in the test set with the outcome (class 1: blue) and without +the outcome (class 0: red).

+

The box plots in the figure show that the predicted probability of the outcome is indeed higher for those with the outcome, but there is also overlap between the two distributions, which leads to imperfect discrimination.

+
+
Prediction Distribution Box Plot
+
+
+
+

## Test-Train similarity

+

The test-train similarity is assessed by plotting the mean covariate +values in the train set against those in the test set for people with +and without the outcome.

+

The results for our example look very promising, since the mean values of the covariates lie on the diagonal.

+
+
Similarity plots of train and test set
+
+
+
+

## Variable scatter plot

+

The variable scatter plot shows the mean covariate value for the people with the outcome against the mean covariate value for the people without the outcome. The color of the dots corresponds to the inclusion (green) or exclusion in the model (blue), respectively. It is highly recommended to use the Shiny App since this allows you to hover over a covariate to show more details (name, value, etc.).

+

The plot shows that the mean of most of the covariates is higher for +subjects with the outcome compared to those without.

+
+
Variable Scatter Plot
+
+
+
+

## Precision recall

+

Precision (P) is defined as the number of true positives (Tp) over +the number of true positives plus the number of false positives +(Fp).

+
+P <- Tp/(Tp + Fp)
+

Recall (R) is defined as the number of true positives (Tp) over the +number of true positives plus the number of false negatives (Fn).

+
+R <- Tp/(Tp + Fn)
+

These quantities are also related to the (F1) score, which is defined +as the harmonic mean of precision and recall.

+
+F1 <- 2 * P * R/(P + R)
+

Note that the precision can either decrease or increase if the +threshold is lowered. Lowering the threshold of a classifier may +increase the denominator, by increasing the number of results returned. +If the threshold was previously set too high, the new results may all be +true positives, which will increase precision. If the previous threshold +was about right or too low, further lowering the threshold will +introduce false positives, decreasing precision.

+

For Recall the denominator does not depend on the classifier +threshold (Tp+Fn is a constant). This means that lowering the classifier +threshold may increase recall, by increasing the number of true positive +results. It is also possible that lowering the threshold may leave +recall unchanged, while the precision fluctuates.

+
+
Precision Recall Plot
+
+
+
+
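The behaviour described above, where precision can move in either direction while recall never decreases as the threshold is lowered, can be checked on a small hand-made example. The risks and labels below are invented for illustration.

```r
# Toy predictions sorted by risk; labels chosen by hand for illustration.
risk    <- c(0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30)
outcome <- c(1,    1,    0,    1,    0,    0,    1,    0)

prAtThreshold <- function(threshold) {
  predicted <- risk >= threshold
  tp <- sum(predicted & outcome == 1)
  fp <- sum(predicted & outcome == 0)
  fn <- sum(!predicted & outcome == 1)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)  # denominator tp + fn is the same for every threshold
  c(threshold = threshold, precision = precision, recall = recall,
    f1 = 2 * precision * recall / (precision + recall))
}

# Lower the threshold step by step.
pr <- as.data.frame(t(sapply(c(0.9, 0.8, 0.7, 0.4), prAtThreshold)))
pr
```

In this example precision falls, rises, and falls again as the threshold drops from 0.9 to 0.4, while recall stays at 0.5 for the first two thresholds and then increases.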

## Demographic summary

+

This plot shows for females and males the expected and observed risk +in different age groups together with a confidence area.

+

The results show that our model is well calibrated across gender and +age groups.

+
+
Demographic Summary Plot
+
+
+
+

# External validation

+

We recommend always performing external validation, i.e. applying the final model to as many new datasets as feasible and evaluating its performance.

+
+# load the trained model
+plpModel <- loadPlpModel(file.path(getwd(), 'model'))
+
+# add details of new database
+validationDatabaseDetails <- createDatabaseDetails()
+
+# to externally validate the model and perform recalibration run:
+validation <- externalValidateDbPlp(
+  plpModel = plpModel,
+  validationDatabaseDetails = validationDatabaseDetails,
+  validationRestrictPlpDataSettings = plpModel$settings$plpDataSettings,
+  settings = createValidationSettings(
+    recalibrate = 'weakRecalibration'
+    ),
+  outputFolder = getwd()
+)
+

This will extract the new plpData from the specified schemas and +cohort tables. It will then apply the same population settings and the +trained plp model. Finally, it will evaluate the performance and return +the standard output as validation$performanceEvaluation and +it will also return the prediction on the population as +validation$prediction. They can be inserted into the shiny +app for viewing the model and validation by running: +viewPlp(runPlp=plpResult, validatePlp=validation ).
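To give an intuition for the recalibration option, the sketch below assumes that 'weakRecalibration' refers to the standard approach of refitting only an intercept and a slope on the logit of the original predictions. That is an assumption about the technique, not the package's code, and the data are simulated.

```r
set.seed(3)
n <- 4000
# Risks from a hypothetical existing model that are too high for the new
# population (true logit = -1 + 0.8 * predicted logit).
predictedRisk <- plogis(rnorm(n, mean = -1.5))
linearPredictor <- qlogis(predictedRisk)
outcome <- rbinom(n, 1, plogis(-1 + 0.8 * linearPredictor))

# Refit only an intercept and a slope on the logit of the original prediction.
recalibration <- glm(outcome ~ linearPredictor, family = binomial())
recalibratedRisk <- predict(recalibration, type = "response")

coef(recalibration)
c(observed = mean(outcome),
  original = mean(predictedRisk),
  recalibrated = mean(recalibratedRisk))
```

The ranking of people is unchanged by this adjustment (as long as the fitted slope is positive), so discrimination stays the same while the mean predicted risk is brought in line with the observed outcome rate.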

+
+
+
+
+

Other functionality +

+

The package has much more functionality than described in this vignette, and contributions have been made by many people in the OHDSI community. The table below provides an overview:

| Functionality | Description | Vignette |
| --- | --- | --- |
| Building Multiple Models | This vignette describes how you can run multiple models automatically | Vignette |
| Custom Models | This vignette describes how you can add your own custom algorithms in the framework | Vignette |
| Custom Splitting Functions | This vignette describes how you can add your own custom training/validation/testing splitting functions in the framework | Vignette |
| Custom Sampling Functions | This vignette describes how you can add your own custom sampling functions in the framework | Vignette |
| Custom Feature Engineering/Selection | This vignette describes how you can add your own custom feature engineering and selection functions in the framework | Vignette |
| Ensemble models | This vignette describes how you can use the framework to build ensemble models, i.e. combine multiple models in a super learner | Vignette |
| Learning curves | Learning curves assess the effect of training set size on model performance by training a sequence of prediction models on successively larger subsets of the training set. A learning curve plot can also help in diagnosing a bias or variance problem. | Vignette |
+
+
+

Demos +

+

We have added several demos in the package that run on simulated +data:

+
+# Show all demos in our package: 
+demo(package = "PatientLevelPrediction")
+
+# For example, to run the SingleModelDemo that runs Lasso and shows you how to run the Shiny App use this call
+demo("SingleModelDemo", package = "PatientLevelPrediction")
+
+
+
+

Acknowledgments +

+

Considerable work has been dedicated to provide the +PatientLevelPrediction package.

+
+citation("PatientLevelPrediction")
+
## 
+## To cite PatientLevelPrediction in publications use:
+## 
+##   Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek P (2018). "Design
+##   and implementation of a standardized framework to generate and
+##   evaluate patient-level prediction models using observational
+##   healthcare data." _Journal of the American Medical Informatics
+##   Association_, *25*(8), 969-975.
+##   <https://doi.org/10.1093/jamia/ocy032>.
+## 
+## A BibTeX entry for LaTeX users is
+## 
+##   @Article{,
+##     author = {J. M. Reps and M. J. Schuemie and M. A. Suchard and P. B. Ryan and P. Rijnbeek},
+##     title = {Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data},
+##     journal = {Journal of the American Medical Informatics Association},
+##     volume = {25},
+##     number = {8},
+##     pages = {969-975},
+##     year = {2018},
+##     url = {https://doi.org/10.1093/jamia/ocy032},
+##   }
+

Further, PatientLevelPrediction makes extensive use of +the Cyclops package.

+
+citation("Cyclops")
+
## 
+## To cite Cyclops in publications use:
+## 
+##   Suchard MA, Simpson SE, Zorych I, Ryan P, Madigan D (2013). "Massive
+##   parallelization of serial inference algorithms for complex
+##   generalized linear models." _ACM Transactions on Modeling and
+##   Computer Simulation_, *23*, 10.
+##   <https://dl.acm.org/doi/10.1145/2414416.2414791>.
+## 
+## A BibTeX entry for LaTeX users is
+## 
+##   @Article{,
+##     author = {M. A. Suchard and S. E. Simpson and I. Zorych and P. Ryan and D. Madigan},
+##     title = {Massive parallelization of serial inference algorithms for complex generalized linear models},
+##     journal = {ACM Transactions on Modeling and Computer Simulation},
+##     volume = {23},
+##     pages = {10},
+##     year = {2013},
+##     url = {https://dl.acm.org/doi/10.1145/2414416.2414791},
+##   }
+

Please reference this paper if you use the PLP Package in +your work:

+

Reps JM, Schuemie +MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a +standardized framework to generate and evaluate patient-level prediction +models using observational healthcare data. J Am Med Inform Assoc. +2018;25(8):969-975.

+

This work is supported in part through the National Science +Foundation grant IIS 1251151.

+
+
+
+

Appendix 1: Study population settings +details +

+

In the figures below the effect of the removeSubjectsWithPriorOutcome, requireTimeAtRisk, and includeAllOutcomes booleans on the final study population is shown. We start with a Target Cohort with firstExposureOnly = false and we require a washout period = 1095. We then subset the target cohort based on additional constraints. The final study population in the Venn diagrams below is colored green.

+
    +
  1. +
    +Require minimum time-at-risk for all persons in the target cohort +
    +
    + +
    +
  2. +
  3. +
    +Require minimum time-at-risk for target cohort, except for persons with outcomes during time-at-risk. +
    +
    + +
    +
  4. +
+


+
+
+Include all persons in the target cohort, exclude persons with prior outcomes +
+
+ +
+
+
    +
  1. +
    +Require minimum time-at-risk for target cohort, except for persons with +outcomes during time-at-risk, exclude persons with prior outcomes +
    +
    + +
    +
  2. +
+


+
+
+Include all persons in target cohort, exclude persons with prior outcomes +
+
+ +
+
+
    +
  1. +
    +Include all persons in target cohort +
    +
    + +
    +
  2. +
+
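The interplay of the three booleans can also be sketched on a toy cohort in base R. The function `selectPopulation` and the data frame columns are hypothetical names introduced for this illustration; they mimic the selection logic described in the captions above and are not the package's implementation.

```r
# Toy target cohort (all values are made up for illustration).
cohort <- data.frame(
  subjectId = 1:6,
  priorOutcome = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  timeAtRisk = c(365, 365, 100, 50, 365, 20),  # days observed in the time-at-risk window
  outcomeInTar = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

selectPopulation <- function(cohort,
                             removeSubjectsWithPriorOutcome,
                             requireTimeAtRisk,
                             includeAllOutcomes,
                             minTimeAtRisk = 364) {
  keep <- rep(TRUE, nrow(cohort))
  if (removeSubjectsWithPriorOutcome) {
    keep <- keep & !cohort$priorOutcome
  }
  if (requireTimeAtRisk) {
    enoughTime <- cohort$timeAtRisk >= minTimeAtRisk
    if (includeAllOutcomes) {
      # People with the outcome are kept even when follow-up is short.
      enoughTime <- enoughTime | cohort$outcomeInTar
    }
    keep <- keep & enoughTime
  }
  cohort$subjectId[keep]
}

selectPopulation(cohort, FALSE, TRUE, FALSE)   # minimum time-at-risk for everyone
selectPopulation(cohort, FALSE, TRUE, TRUE)    # ... except persons with outcomes
selectPopulation(cohort, TRUE, TRUE, TRUE)     # ... and exclude prior outcomes
selectPopulation(cohort, TRUE, FALSE, FALSE)   # only exclude prior outcomes
selectPopulation(cohort, FALSE, FALSE, FALSE)  # everyone in the target cohort
```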
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/ClinicalModels.html b/docs/articles/ClinicalModels.html new file mode 100644 index 000000000..d3cd57891 --- /dev/null +++ b/docs/articles/ClinicalModels.html @@ -0,0 +1,273 @@ + + + + + + + +Clinical Models • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+

Clinical models developed using the OHDSI PatientLevelPrediction +framework +

| Title | Link |
| --- | --- |
| Using Machine Learning Applied to Real-World Healthcare Data for Predictive Analytics: An Applied Example in Bariatric Surgery | Value in Health |
| Development and validation of a prognostic model predicting symptomatic hemorrhagic transformation in acute ischemic stroke at scale in the OHDSI network | PLoS One |
| Wisdom of the CROUD: development and validation of a patient-level prediction model for opioid use disorder using population-level claims data | PLoS One |
| Developing predictive models to determine Patients in End-of-life Care in Administrative datasets | Drug Safety |
| Predictors of diagnostic transition from major depressive disorder to bipolar disorder: a retrospective observational network study | Translational psychiatry |
| Seek COVER: using a disease proxy to rapidly develop and validate a personalized risk calculator for COVID-19 outcomes in an international network | BMC Medical Research Methodology |
| 90-Day all-cause mortality can be predicted following a total knee replacement: an international, network study to develop and validate a prediction model | Knee Surgery, Sports Traumatology, Arthroscopy |
| Machine learning and real-world data to predict lung cancer risk in routine care | Cancer Epidemiology, Biomarkers & Prevention |
| Development and validation of a patient-level model to predict dementia across a network of observational databases | BMC medicine |
+
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/ConstrainedPredictors.html b/docs/articles/ConstrainedPredictors.html new file mode 100644 index 000000000..24fa22693 --- /dev/null +++ b/docs/articles/ConstrainedPredictors.html @@ -0,0 +1,561 @@ + + + + + + + +Constrained predictors • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+

Constrained Predictors +

+
+

How to use the PhenotypeLibrary R package +

+

Here we provide a set of phenotypes that can be used as predictors in +prediction models or best practice research.

+

These phenotypes can be extracted from the PhenotypeLibrary R +package. To install the R package run:

+
+remotes::install_github('ohdsi/PhenotypeLibrary')
+

To extract the cohort definition for Alcoholism with an id of 1165, +just run:

+
+PhenotypeLibrary::getPlCohortDefinitionSet(1165)
+

In general, you can extract all the cohorts by running:

+
+phenotypeDefinitions <- PhenotypeLibrary::getPlCohortDefinitionSet(1152:1215)
+
+
+

The full set of predictor phenotypes +

| Phenotype Name | Disorder classification | OHDSI Phenotype library ID |
| --- | --- | --- |
| Alcoholism | Behavioral | 1165 |
| Smoking | Behavioral | 1166 |
| Anemia | Blood | 1188 |
| Osteoarthritis | Bone | 1184 |
| Osteoporosis | Bone | 1185 |
| Cancer | Cancer | 1215 |
| Atrial fibrillation | Cardiovascular | 1160 |
| Congestive heart failure | Cardiovascular | 1154 |
| Coronary artery disease | Cardiovascular | 1162 |
| Heart valve disorder | Cardiovascular | 1172 |
| Hyperlipidemia | Cardiovascular | 1170 |
| Hypertension | Cardiovascular | 1198 |
| Angina | Cardiovascular | 1159 |
| Skin Ulcer | Debility | 1168 |
| Diabetes type 1 | Endocrine | 1193 |
| Diabetes type 2 | Endocrine | 1194 |
| Hypothyroidism | Endocrine | 1171 |
| Obesity | Endocrine | 1179 |
| Gastroesophageal reflux disease (GERD) | GI | 1178 |
| Gastrointestinal (GI) bleed | GI | 1197 |
| Inflammatory bowel disorder (IBD) | GI/Rheumatology | 1180 |
| Hormonal contraceptives | Gynecologic | 1190 |
| Antibiotics Aminoglycosides | Infection | 1201 |
| Antibiotics Carbapenems | Infection | 1202 |
| Antibiotics Cephalosporins | Infection | 1203 |
| Antibiotics Fluoroquinolones | Infection | 1204 |
| Antibiotics Glycopeptides and lipoglycopeptides | Infection | 1205 |
| Antibiotics Macrolides | Infection | 1206 |
| Antibiotics Monobactams | Infection | 1207 |
| Antibiotics Oxazolidinones | Infection | 1208 |
| Antibiotics Penicillins | Infection | 1209 |
| Antibiotics Polypeptides | Infection | 1210 |
| Antibiotics Rifamycins | Infection | 1211 |
| Antibiotics Sulfonamides | Infection | 1212 |
| Antibiotics Streptogramins | Infection | 1213 |
| Antibiotics Tetracyclines | Infection | 1214 |
| Pneumonia | Infection/Respiratory | 1199 |
| Sepsis | Infection | 1176 |
| Urinary tract infection (UTI) | Infection | 1186 |
| Hepatitis | Liver | 1169 |
| Anxiety | Mood | 1189 |
| Depression (MDD) | Mood | 1161 |
| Psychotic disorder | Mood | 1175 |
| Antiepileptics (pain) | Neurology/Pain | 1183 |
| Seizure | Neurology | 1153 |
| Hemorrhagic stroke | Neurology/Vascular | 1156 |
| Non-hemorrhagic stroke | Neurology/Vascular | 1155 |
| Acetaminophen prescription | Pain/Infection | 1187 |
| Low back pain | Pain | 1173 |
| Neuropathy | Pain/Neurology | 1174 |
| Opioids | Pain | 1182 |
| Acute kidney injury | Kidney | 1163 |
| Chronic kidney disease | Kidney | 1191 |
| Asthma | Respiratory | 1164 |
| Chronic obstructive pulmonary disorder (COPD) | Respiratory | 1192 |
| Dyspnea | Respiratory | 1195 |
| Respiratory failure | Respiratory | 1177 |
| Sleep apnea | Respiratory | 1167 |
| Rheumatoid arthritis | Rheumatology | 1200 |
| Steroids | Rheumatology/Pain/Pulmonary | 1181 |
| Peripheral vascular disease | Vascular | 1157 |
| Aspirin | Vascular | 1158 |
| Deep vein thrombosis (DVT) | Vascular | 1152 |
| Edema | Vascular | 1196 |
| Inpatient visit | NA | NA |
+
+
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/CreatingLearningCurves.html b/docs/articles/CreatingLearningCurves.html new file mode 100644 index 000000000..2f44ca17f --- /dev/null +++ b/docs/articles/CreatingLearningCurves.html @@ -0,0 +1,433 @@ + + + + + + + +Creating Learning Curves • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+

Introduction +

+

This vignette describes how you can use the Observational Health Data +Sciences and Informatics (OHDSI) PatientLevelPrediction +package to create learning curves. This vignette assumes you have read +and are comfortable with building patient level prediction models as +described in the BuildingPredictiveModels +vignette.

+

Prediction models will show overly optimistic performance when predicting on the same data as used for training. Therefore, best practice is to partition our data into a training set and a testing set. We then train our prediction model on the training set portion and assess its ability to generalize to unseen data by measuring its performance on the testing set.

+

Learning curves assess the effect of training set size on model +performance by training a sequence of prediction models on successively +larger subsets of the training set. A learning curve plot can also help +in diagnosing a bias or variance problem as explained below.

+
Learning curve example.
+
+

Figure 1 shows an example of a learning curve plot in which the vertical axis represents the model performance and the horizontal axis the training set size. If the training set size is small, the performance on the training set is high, because a model can often be fitted well to a limited number of training examples. At the same time, the performance on the testing set will be poor, because the model trained on such a limited number of training examples will not generalize well to unseen data in the testing set. As the training set size increases, the performance of the model on the training set will decrease. It becomes more difficult for the model to find a good fit through all the training examples. Also, the model will be trained on a more representative portion of training examples, making it generalize better to unseen data. This can be observed in the increasing testing set performance.

+

The learning curve can help us in diagnosing bias and variance +problems with our classifier which will provide guidance on how to +further improve our model. We can observe high variance (overfitting) in +a prediction model if it performs well on the training set, but poorly +on the testing set (Figure 2). Adding additional data is a common +approach to counteract high variance. From the learning curve it becomes +apparent, that adding additional data may improve performance on the +testing set a little further, as the learning curve has not yet +plateaued and, thus, the model is not saturated yet. Therefore, adding +more data will decrease the gap between training set and testing set, +which is the main indicator for a high variance problem.

+
Prediction model suffering from high +variance.
+
+

Furthermore, we can observe high bias (underfitting) if a prediction +model performs poorly on the training set as well as on the testing set +(Figure 3). The learning curves of training set and testing set have +flattened on a low performance with only a small gap in between them. +Adding additional data will in this case have little to no impact on the +model performance. Choosing another prediction algorithm that can find +more complex (for example non-linear) relationships in the data may be +an alternative approach to consider in this high bias situation.

+
Prediction model suffering from high bias.
+
+
+
+
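The shape of a learning curve can be reproduced with a small base R simulation. The sketch below uses plain logistic regression (`glm`) on simulated data instead of the package's models, and the helper `auc` is defined here only for the illustration.

```r
set.seed(123)
# AUC via the rank-based (Mann-Whitney) formulation.
auc <- function(risk, outcome) {
  r <- rank(risk)
  n1 <- sum(outcome == 1)
  n0 <- sum(outcome == 0)
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Simulated data: 20 covariates, only the first three are predictive.
n <- 3000
p <- 20
x <- matrix(rnorm(n * p), ncol = p)
y <- rbinom(n, 1, plogis(-1 + x[, 1] - 0.5 * x[, 2] + 0.5 * x[, 3]))
data <- data.frame(y = y, x)

# Hold out 20% as the testing set; train on growing parts of the rest.
testIdx <- sample(n, size = 0.2 * n)
test <- data[testIdx, ]
trainPool <- data[-testIdx, ]

trainFractions <- seq(0.1, 0.8, 0.1)
learningCurve <- do.call(rbind, lapply(trainFractions, function(f) {
  train <- trainPool[seq_len(round(f * n)), ]
  fit <- glm(y ~ ., data = train, family = binomial())
  data.frame(
    trainFraction = f,
    trainSize = nrow(train),
    trainEvents = sum(train$y),
    trainAUC = auc(predict(fit, type = "response"), train$y),
    testAUC = auc(predict(fit, newdata = test, type = "response"), test$y)
  )
}))
learningCurve
```

Comparing the `trainAUC` and `testAUC` columns row by row shows the gap between training and testing performance at each training set size, which is the quantity the figures above use to diagnose variance and bias problems.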

Creating the learning curve +

+

Use the PatientLevelPrediction package to create a plpData object. Alternatively, you can make use of the data simulator. The following code snippet creates data for 12000 patients.

+
+set.seed(1234)
+data(plpDataSimulationProfile)
+sampleSize <- 12000
+plpData <- simulatePlpData(
+  plpDataSimulationProfile,
+  n = sampleSize
+)
+

Specify the population settings (this does additional exclusions such +as requiring minimum prior observation or no prior outcome as well as +specifying the time-at-risk period to enable labels to be created):

+
+populationSettings <- createStudyPopulationSettings(
+  binary = TRUE,
+  firstExposureOnly = FALSE,
+  washoutPeriod = 0,
+  removeSubjectsWithPriorOutcome = FALSE,
+  priorOutcomeLookback = 99999,
+  requireTimeAtRisk = FALSE,
+  minTimeAtRisk = 0,
+  riskWindowStart = 0,
+  riskWindowEnd = 365,
+  verbosity = "INFO"
+)
+

Specify the prediction algorithm to be used.

+
+# Use LASSO logistic regression
+modelSettings <- setLassoLogisticRegression()
+

Specify the split settings and a sequence of training set fractions (these override the splitSettings trainFraction). Alternatively, instead of trainFractions, you can provide a sequence of training events (trainEvents). This is recommended, because our research has shown that the number of events is the important determinant of model performance. Make sure that your training set contains the number of events specified.

+
+splitSettings <- createDefaultSplitSetting(
+  testFraction = 0.2,  
+  type = 'stratified',
+  splitSeed = 1000
+  )
+
+trainFractions <- seq(0.1, 0.8, 0.1) # Create eight training set fractions
+
+# alternatively use a sequence of training events by uncommenting the line below.
+# trainEvents <- seq(100, 5000, 100)
+
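Before choosing a trainEvents sequence it can help to estimate how many events each training fraction is likely to contain. The sketch below uses a hypothetical outcome rate of 5%; substitute the rate observed in your own data.

```r
# Hypothetical numbers: total people and outcome rate in the full data set.
populationSize <- 12000
outcomeRate <- 0.05
trainFractions <- seq(0.1, 0.8, 0.1)

expectedEvents <- data.frame(
  trainFraction = trainFractions,
  trainSize = round(trainFractions * populationSize),
  expectedEvents = round(trainFractions * populationSize * outcomeRate)
)
expectedEvents

# Largest number of events the training portion is expected to supply
# (at the largest training fraction).
maxEvents <- max(expectedEvents$expectedEvents)

# Keep only the requested event counts that the training set can provide.
requestedEvents <- seq(100, 5000, 100)
feasibleEvents <- requestedEvents[requestedEvents <= maxEvents]
feasibleEvents
```

Under these assumed numbers most of the values in `seq(100, 5000, 100)` would exceed what the training set can supply, which is why the event sequence should be checked against the data first.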

Create the learning curve object.

+
+learningCurve <- createLearningCurve(
+  plpData = plpData,
+  outcomeId = 2,  
+  parallel = TRUE,
+  cores = 4,
+  modelSettings = modelSettings,
+  saveDirectory = getwd(),
+  analysisId = 'learningCurve',
+  populationSettings = populationSettings,
+  splitSettings = splitSettings,
+  trainFractions = trainFractions,
+  trainEvents = NULL,
+  preprocessSettings = createPreprocessSettings(
+    minFraction = 0.001,
+    normalize = TRUE
+  ),
+  executeSettings = createExecuteSettings(
+    runSplitData = TRUE,
+    runSampleData = FALSE,
+    runfeatureEngineering = FALSE,
+    runPreprocessData = TRUE,
+    runModelDevelopment = TRUE,
+    runCovariateSummary = FALSE
+    )
+)
+

Plot the learning curve object (Figure 4). Specify one of the available metrics: AUROC, AUPRC, sBrier. Moreover, you can specify which quantity to put on the abscissa: the number of observations or the number of events. We recommend the latter, because the number of events is an important determinant of model performance and allows you to better compare learning curves across different prediction problems and databases.

+
+plotLearningCurve(
+  learningCurve,
+  metric = 'AUROC',
+  abscissa = 'events',
+  plotTitle = 'Learning Curve',
+  plotSubtitle = 'AUROC performance'
+)
+
Learning curve plot.
+
+
+
+

Parallel processing +

+

The learning curve object can be created in parallel, which can +reduce computation time significantly. Whether to run the code in +parallel or not is specified using the parallel input. +Currently this functionality is only available for LASSO logistic +regression and gradient boosting machines. Depending on the number of +parallel workers it may require a significant amount of memory. We +advise to use the parallelized learning curve function for parameter +search and exploratory data analysis.

+

When running in parallel, R will find the number of available processing cores automatically and register the required parallel backend. Alternatively, you can provide the number of cores you wish to use via the cores input.
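If you prefer to set the cores input yourself, one common way to pick a value is the `parallel` package that ships with R. How the package detects cores internally is not shown in this vignette, so the snippet below is only a sketch of a reasonable choice.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1.
availableCores <- detectCores()
if (is.na(availableCores)) availableCores <- 1L

# Leave one core free for the rest of the system (a common convention,
# not something the package requires).
cores <- max(1L, availableCores - 1L)
cores
```

Keep in mind that each parallel worker holds its own copy of the data, so memory use grows with the number of cores.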

+
+
+

Demo +

+

We have added a demo of the learning curve:

+
+# Show all demos in our package: 
+ demo(package = "PatientLevelPrediction")
+
+# Run the learning curve
+ demo("LearningCurveDemo", package = "PatientLevelPrediction")
+

Do note that running this demo can take a considerable amount of time +(15 min on Quad core running in parallel)!

+
+
+

Publication +

+

A publication titled ‘How little data do we need for patient-level +prediction?’ uses the learning curve functionality in this package and +can be accessed as preprint in the arXiv archives at https://arxiv.org/abs/2008.07361.

+
+
+

Acknowledgments +

+

Considerable work has been dedicated to provide the +PatientLevelPrediction package.

+
+citation("PatientLevelPrediction")
+
## 
+## To cite PatientLevelPrediction in publications use:
+## 
+##   Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek P (2018). "Design
+##   and implementation of a standardized framework to generate and
+##   evaluate patient-level prediction models using observational
+##   healthcare data." _Journal of the American Medical Informatics
+##   Association_, *25*(8), 969-975.
+##   <https://doi.org/10.1093/jamia/ocy032>.
+## 
+## A BibTeX entry for LaTeX users is
+## 
+##   @Article{,
+##     author = {J. M. Reps and M. J. Schuemie and M. A. Suchard and P. B. Ryan and P. Rijnbeek},
+##     title = {Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data},
+##     journal = {Journal of the American Medical Informatics Association},
+##     volume = {25},
+##     number = {8},
+##     pages = {969-975},
+##     year = {2018},
+##     url = {https://doi.org/10.1093/jamia/ocy032},
+##   }
+

Please reference this paper if you use the PLP Package in +your work:

+

Reps JM, Schuemie +MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a +standardized framework to generate and evaluate patient-level prediction +models using observational healthcare data. J Am Med Inform Assoc. +2018;25(8):969-975.

+
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/CreatingNetworkStudies.html b/docs/articles/CreatingNetworkStudies.html new file mode 100644 index 000000000..bcf3648fa --- /dev/null +++ b/docs/articles/CreatingNetworkStudies.html @@ -0,0 +1,300 @@ + + + + + + + +Making patient-level predictive network study packages • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+
+

Introduction +

+

The OHDSI Patient Level Prediction (PLP) package provides the +framework to implement prediction models at scale. This can range from +developing a large number of models across sites (methodology and study +design insight) to extensive external validation of existing models in +the OHDSI PLP framework (model insight). This vignette describes how you +can use the PatientLevelPrediction package to create a +network study package.

+
+
+

Useful publication +

+

The open access publication A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data details the process used to develop and validate prediction models using the OHDSI prediction framework and tools. This publication describes each of the steps and then demonstrates these by focusing on predicting death in those who have COVID-19.

+
+
+

Main steps for running a network study +

+
+

Step 1 – developing the study +

+
    +
  • Design the study: target/outcome cohort logic, concept sets for medical definitions, and settings for developing new models or for validating existing models that are added to the framework. Suggestion: look in the literature for validated definitions.
  • +
  • Write a protocol that motivates the study and provides full details +(sufficient for people to replicate the study in the future).
  • +
  • Write an R package for implementing the study across diverse +computational environments [see guidance below for structure of package +and use the skeleton github package here: https://github.com/OHDSI/SkeletonPredictionStudy ]
  • +
+
+
+

Step 2 – implementing the study part 1 +

+
    +
  • Get contributors to install the package and dependencies. Ensure the +package is installed correctly for each contributor by asking them to +run the checkInstall functions (as specified in the +InstallationGuide).
  • +
  • Get contributors to run the createCohort function to inspect the +target/outcome definitions. If the definitions are not suitable for a +site, go back to step 1 and revise the cohort definitions.
  • +
+
+
+

Step 3 – implementing the study part 2 (make sure the package is +functioning as planned and the definitions are valid across sites) +

+
    +
  • Get contributors to run main.R with the settings configured for +their environment
  • +
  • Get the contributors to submit the results
  • +
+
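The submission step above can be sketched in base R. This is a minimal illustration, not the skeleton package's own submit code: the folder name `plpResults`, the file `summary.csv`, and the archive name are hypothetical placeholders, and `utils::tar` stands in for whatever compression the study package actually uses.

```r
# Hypothetical results folder with one placeholder file (names are assumptions).
resultsDir <- file.path(tempdir(), "plpResults")
dir.create(resultsDir, showWarnings = FALSE)
writeLines("placeholder", file.path(resultsDir, "summary.csv"))

# Compress the whole folder into a single gzip-compressed tar archive.
archive <- file.path(tempdir(), "plpResults.tar.gz")
oldWd <- setwd(tempdir())  # archive paths relative to the parent folder
utils::tar(archive, files = "plpResults", compression = "gzip")
setwd(oldWd)               # restore the previous working directory

file.exists(archive)
```

Archiving relative to the parent folder keeps site-specific absolute paths out of the file that is sent to the study creator/manager.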
+
+

Step 4 – Publication +

+

The study creator has the first option to be first author; if they +do not wish to be first author, they can pick the most suitable +person from the contributors. All contributors will be listed as authors +on the paper. The last author will be the person who led/managed the +study; if this was the first author, then the first author can pick the +most suitable last author. All authors between the first and last author +will be listed alphabetically by last name.

+
+
+
+

Package Skeleton - File Structure +

+
    +
  • DESCRIPTION: This file describes the R package and the +dependencies
  • +
  • NAMESPACE: This file is created automatically by Roxygen
  • +
  • Readme.md: This file should provide the step by step guidance on +implementing the package
  • +
  • R
  • +
  • helpers.r: all the custom functions used by the package should be in +this file (e.g., checkInstall)
  • +
  • main.r: this file will call the functions in helpers.r to execute +the full study
  • +
  • submit.r: this file will be called at the end to submit the +compressed folder to the study creator/manager.
  • +
  • Man: this folder will contain the documentation for the functions in +helpers.r (this should be automatically generated by roxygen)
  • +
  • Inst
  • +
  • sql/sql_server * targetCohort: the target cohort parameterised SQL +code * outcomeCohort: the outcome cohort parameterised SQL code
  • +
  • plp_models: place any PLP models here
  • +
  • Extras
  • +
+
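The file structure above can be laid out with a few lines of base R. This is a sketch under stated assumptions: the package name `MyPredictionStudy` is hypothetical, the folder names use the lowercase spellings conventional for R packages (`man`, `inst`, `extras`), and the SQL folder is assumed to be `inst/sql/sql_server`. The files are created empty; the real skeleton on GitHub remains the reference.

```r
# Create the skeleton inside a temporary directory.
# "MyPredictionStudy" is a hypothetical package name.
studyRoot <- file.path(tempdir(), "MyPredictionStudy")

folders <- c(
  "R",
  "man",
  file.path("inst", "sql", "sql_server"),  # assumed folder name
  file.path("inst", "plp_models"),
  "extras"
)
files <- c(
  "DESCRIPTION",
  "NAMESPACE",
  "Readme.md",
  file.path("R", "helpers.r"),
  file.path("R", "main.r"),
  file.path("R", "submit.r")
)

for (folder in folders) {
  dir.create(file.path(studyRoot, folder), recursive = TRUE, showWarnings = FALSE)
}
created <- file.create(file.path(studyRoot, files))

# List the files that were created, relative to the package root.
list.files(studyRoot, recursive = TRUE)
```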
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/Figure1.webp b/docs/articles/Figure1.webp new file mode 100644 index 000000000..42ad71d7f Binary files /dev/null and b/docs/articles/Figure1.webp differ diff --git a/docs/articles/InstallationGuide.html b/docs/articles/InstallationGuide.html new file mode 100644 index 000000000..054ce33de --- /dev/null +++ b/docs/articles/InstallationGuide.html @@ -0,0 +1,350 @@ + + + + + + + +Patient-Level Prediction Installation Guide • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+

Introduction +

+

This vignette describes how to install the Observational +Health Data Sciences and Informatics (OHDSI) PatientLevelPrediction +package under Windows, Mac, and Linux.

+
+
+

Software Prerequisites +

+
+

Windows Users +

+

Under Windows the OHDSI Patient Level Prediction (PLP) package +requires installing:

+ +
+
+

Mac/Linux Users +

+

Under Mac and Linux the OHDSI Patient Level Prediction (PLP) package +requires installing:

+ +
+
+
+

Installing the Package +

+

The preferred way to install the package is by using +remotes, which will automatically install the latest +release and all the latest dependencies.

+

If you do not want the official release you could install the +bleeding edge version of the package (latest develop branch).

+

Note that the latest develop branch could contain bugs; please report +them to us if you experience problems.

+
+

Installing PatientLevelPrediction using remotes +

+

To install using remotes run:

+
+install.packages("remotes")
+remotes::install_github("OHDSI/PatientLevelPrediction")
+

When installing, make sure to close any other RStudio sessions that +are using PatientLevelPrediction or any dependency. Keeping +RStudio sessions open can cause locks that prevent the package +from installing.

+
+
+
+

Creating Python Reticulate Environment +

+

Many of the classifiers in PatientLevelPrediction +use a Python backend. To set up a Python environment, run:

+
+library(PatientLevelPrediction)
+reticulate::install_miniconda()
+configurePython(envname='r-reticulate', envtype='conda')
+
+
+

Installation issues +

+

Installation issues need to be posted in our issue tracker: http://github.com/OHDSI/PatientLevelPrediction/issues

+

The list below provides solutions for some common issues:

+
    +
  1. If installing a package in R fails with an error saying +‘Dependency X not available …’, this can sometimes +be fixed by running install.packages('X') and then, once +that completes, reinstalling the package that had the +error.

  2. Installing packages from GitHub with remotes can fail if you have multiple R +sessions open: one session with a library loaded can cause the +library to be locked, and this can prevent the installation of a package that +depends on that library.
+
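The first tip above can be made more systematic by checking which dependencies are missing before reinstalling. The helper below is a hypothetical base R sketch (`findMissingPackages` is not part of PatientLevelPrediction), and the package names passed to it are only examples.

```r
# Return the packages from `pkgs` that are not installed in any library path.
# system.file() returns "" for a package that cannot be found, and it does
# not load the package, so no library gets locked by this check.
findMissingPackages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

notInstalled <- findMissingPackages(c("remotes", "reticulate"))
if (length(notInstalled) > 0) {
  message("Install these first: ", paste(notInstalled, collapse = ", "))
  # install.packages(notInstalled)  # uncomment to install from CRAN
}
```

Running the check in a fresh R session, with all other sessions closed, also avoids the locking problem described in the second tip.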
+

Common issues +

+
+

Python environment for Mac/Linux users +

+

To make sure R uses the r-reticulate Python environment, you may need +to edit your .Rprofile with the location of the Python binary for the +PLP environment. Edit the .Rprofile by running:

+
+usethis::edit_r_profile()
+

and add

+
+Sys.setenv(PATH = paste("your python bin location", Sys.getenv("PATH"), sep=":"))
+

to the file, then save. Here, your python bin location is the location +returned by

+
+reticulate::conda_list() 
+

For example, if the PLP virtual environment is located at +/anaconda3/envs/PLP/bin/python, add:
+Sys.setenv(PATH = paste("/anaconda3/envs/PLP/bin", Sys.getenv("PATH"), +sep=":"))

+
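A slightly more defensive variant of the PATH edit above is sketched here. It assumes the same example location of the Python binary as in the text, derives the folder with `dirname()`, uses the platform's own path separator, and skips the update if the folder is already on the PATH. Treat it as an illustration and not as the recommended .Rprofile content.

```r
# Example location of the Python binary (taken from the example in the text;
# replace it with the location reported on your own machine).
pythonBinary <- "/anaconda3/envs/PLP/bin/python"
pythonDir <- dirname(pythonBinary)

# Split the current PATH into its entries using the platform separator.
oldPath <- Sys.getenv("PATH")
entries <- strsplit(oldPath, .Platform$path.sep, fixed = TRUE)[[1]]

# Prepend the Python folder only if it is not already listed.
if (!(pythonDir %in% entries)) {
  Sys.setenv(PATH = paste(pythonDir, oldPath, sep = .Platform$path.sep))
}
```

Checking before prepending keeps the PATH from growing each time the .Rprofile is sourced.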
+
+
+
+

Acknowledgments +

+

Considerable work has been dedicated to providing the +PatientLevelPrediction package.

+
+citation("PatientLevelPrediction")
+
## 
+## To cite PatientLevelPrediction in publications use:
+## 
+##   Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek P (2018). "Design
+##   and implementation of a standardized framework to generate and
+##   evaluate patient-level prediction models using observational
+##   healthcare data." _Journal of the American Medical Informatics
+##   Association_, *25*(8), 969-975.
+##   <https://doi.org/10.1093/jamia/ocy032>.
+## 
+## A BibTeX entry for LaTeX users is
+## 
+##   @Article{,
+##     author = {J. M. Reps and M. J. Schuemie and M. A. Suchard and P. B. Ryan and P. Rijnbeek},
+##     title = {Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data},
+##     journal = {Journal of the American Medical Informatics Association},
+##     volume = {25},
+##     number = {8},
+##     pages = {969-975},
+##     year = {2018},
+##     url = {https://doi.org/10.1093/jamia/ocy032},
+##   }
+

Please reference this paper if you use the PLP Package in +your work:

+

Reps JM, Schuemie +MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a +standardized framework to generate and evaluate patient-level prediction +models using observational healthcare data. J Am Med Inform Assoc. +2018;25(8):969-975.

+

This work is supported in part through the National Science +Foundation grant IIS 1251151.

+
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/Videos.html b/docs/articles/Videos.html new file mode 100644 index 000000000..7277e6d0f --- /dev/null +++ b/docs/articles/Videos.html @@ -0,0 +1,314 @@ + + + + + + + +Demo Videos • PatientLevelPrediction + + + + + + + + + + + + +
+
+ + + + +
+
+ + + + + +
+

What is a cohort table? +

+ ++++ + + + + + + + + +
Click To Launch | Description of Demo
Video Vignette PLP Package | Learn what a cohort table looks like and what columns are +required.
+
+
+

Setting up a connection between your database and R +

+ ++++ + + + + + + + + +
Click To Launch | Description of Demo
Video Vignette PLP Package | Learn how to configure the connection to your OMOP CDM data from R +using the OHDSI DatabaseConnector package.
+
+
+

Running a single PatientLevelPrediction model +

+ ++++ + + + + + + + + +
Click To Launch | Description of Demo
Video Vignette PLP Package | Learn how to develop and validate a single PatientLevelPrediction +model.
+
+
+

Running multiple PatientLevelPrediction models study +

+ ++++ + + + + + + + + +
Click To Launch | Description of Demo
Video Vignette PLP Package | Learn how to develop and validate multiple PatientLevelPrediction +models.
+
+
+

Exploring the results in the shiny app +

+ ++++ + + + + + + + + +
Click To Launch | Description of Demo
Video Vignette PLP Package | Learn how to interactively explore model performance and the model +via the Shiny apps viewPlp() and viewMultiplePlp().
+
+
+

Validating existing models on OMOP CDM data +

+ ++++ + + + + + + + + +
Click To Launch | Description of Demo
Video Vignette PLP Package | This demo shows how you can add any existing score or logistic model +and validate the model on new OMOP CDM data. This is useful for +benchmarking when developing new models or to perform extensive external +validation of a model across the OHDSI network.
+
+
+ + + +
+ + + + +
+ + + + + + + + diff --git a/docs/articles/atlasdownload1.webp b/docs/articles/atlasdownload1.webp new file mode 100644 index 000000000..6cac340ed Binary files /dev/null and b/docs/articles/atlasdownload1.webp differ diff --git a/docs/articles/atlasdownload2.webp b/docs/articles/atlasdownload2.webp new file mode 100644 index 000000000..452c5ca21 Binary files /dev/null and b/docs/articles/atlasdownload2.webp differ diff --git a/docs/articles/atlasplp1.webp b/docs/articles/atlasplp1.webp new file mode 100644 index 000000000..71a3c1ce9 Binary files /dev/null and b/docs/articles/atlasplp1.webp differ diff --git a/docs/articles/atlasplp3.webp b/docs/articles/atlasplp3.webp new file mode 100644 index 000000000..523d0143c Binary files /dev/null and b/docs/articles/atlasplp3.webp differ diff --git a/docs/articles/demographicSummary.webp b/docs/articles/demographicSummary.webp new file mode 100644 index 000000000..7d0437deb Binary files /dev/null and b/docs/articles/demographicSummary.webp differ diff --git a/docs/articles/example1/ATLAS_O.webp b/docs/articles/example1/ATLAS_O.webp new file mode 100644 index 000000000..85e63dc9e Binary files /dev/null and b/docs/articles/example1/ATLAS_O.webp differ diff --git a/docs/articles/example1/ATLAS_T.webp b/docs/articles/example1/ATLAS_T.webp new file mode 100644 index 000000000..df3a8245f Binary files /dev/null and b/docs/articles/example1/ATLAS_T.webp differ diff --git a/docs/articles/example2/aceinhibitors.webp b/docs/articles/example2/aceinhibitors.webp new file mode 100644 index 000000000..564f8af77 Binary files /dev/null and b/docs/articles/example2/aceinhibitors.webp differ diff --git a/docs/articles/example2/angioedema.webp b/docs/articles/example2/angioedema.webp new file mode 100644 index 000000000..8c728ce50 Binary files /dev/null and b/docs/articles/example2/angioedema.webp differ diff --git a/docs/articles/generalizability.webp b/docs/articles/generalizability.webp new file mode 100644 index 000000000..ba6d14de4 Binary 
files /dev/null and b/docs/articles/generalizability.webp differ diff --git a/docs/articles/index.html b/docs/articles/index.html new file mode 100644 index 000000000..631931e62 --- /dev/null +++ b/docs/articles/index.html @@ -0,0 +1,181 @@ + +Articles • PatientLevelPrediction + + +
+
+ + + +
+ +
+ + +
+ + + + + + + + diff --git a/docs/articles/learningCurve.png b/docs/articles/learningCurve.png new file mode 100644 index 000000000..19cd06691 Binary files /dev/null and b/docs/articles/learningCurve.png differ diff --git a/docs/articles/learningCurveBias.png b/docs/articles/learningCurveBias.png new file mode 100644 index 000000000..3bd9f580a Binary files /dev/null and b/docs/articles/learningCurveBias.png differ diff --git a/docs/articles/learningCurvePlot.png b/docs/articles/learningCurvePlot.png new file mode 100644 index 000000000..a5e1f9e96 Binary files /dev/null and b/docs/articles/learningCurvePlot.png differ diff --git a/docs/articles/learningCurveVariance.png b/docs/articles/learningCurveVariance.png new file mode 100644 index 000000000..3212e6106 Binary files /dev/null and b/docs/articles/learningCurveVariance.png differ diff --git a/docs/articles/popdef1.webp b/docs/articles/popdef1.webp new file mode 100644 index 000000000..83ef7afd6 Binary files /dev/null and b/docs/articles/popdef1.webp differ diff --git a/docs/articles/popdef2.webp b/docs/articles/popdef2.webp new file mode 100644 index 000000000..31887dd1b Binary files /dev/null and b/docs/articles/popdef2.webp differ diff --git a/docs/articles/popdef3.webp b/docs/articles/popdef3.webp new file mode 100644 index 000000000..8b409ed49 Binary files /dev/null and b/docs/articles/popdef3.webp differ diff --git a/docs/articles/popdef4.webp b/docs/articles/popdef4.webp new file mode 100644 index 000000000..2709497e7 Binary files /dev/null and b/docs/articles/popdef4.webp differ diff --git a/docs/articles/popdef5.webp b/docs/articles/popdef5.webp new file mode 100644 index 000000000..748b8901b Binary files /dev/null and b/docs/articles/popdef5.webp differ diff --git a/docs/articles/popdef6.webp b/docs/articles/popdef6.webp new file mode 100644 index 000000000..583dc9fba Binary files /dev/null and b/docs/articles/popdef6.webp differ diff --git a/docs/articles/precisionRecall.webp 
b/docs/articles/precisionRecall.webp new file mode 100644 index 000000000..af6b0cfe5 Binary files /dev/null and b/docs/articles/precisionRecall.webp differ diff --git a/docs/articles/preferencePDF.webp b/docs/articles/preferencePDF.webp new file mode 100644 index 000000000..189a356be Binary files /dev/null and b/docs/articles/preferencePDF.webp differ diff --git a/docs/articles/problems.webp b/docs/articles/problems.webp new file mode 100644 index 000000000..5c1c27bb4 Binary files /dev/null and b/docs/articles/problems.webp differ diff --git a/docs/articles/shinyroc.webp b/docs/articles/shinyroc.webp new file mode 100644 index 000000000..a11724623 Binary files /dev/null and b/docs/articles/shinyroc.webp differ diff --git a/docs/articles/shinysummary.webp b/docs/articles/shinysummary.webp new file mode 100644 index 000000000..0d256ade1 Binary files /dev/null and b/docs/articles/shinysummary.webp differ diff --git a/docs/articles/smoothCalibration.jpeg b/docs/articles/smoothCalibration.jpeg new file mode 100644 index 000000000..72c3cdb7a Binary files /dev/null and b/docs/articles/smoothCalibration.jpeg differ diff --git a/docs/articles/sparseCalibration.webp b/docs/articles/sparseCalibration.webp new file mode 100644 index 000000000..043019e5b Binary files /dev/null and b/docs/articles/sparseCalibration.webp differ diff --git a/docs/articles/sparseRoc.webp b/docs/articles/sparseRoc.webp new file mode 100644 index 000000000..2ea3ea56f Binary files /dev/null and b/docs/articles/sparseRoc.webp differ diff --git a/docs/articles/studydesign.webp b/docs/articles/studydesign.webp new file mode 100644 index 000000000..28717c7d2 Binary files /dev/null and b/docs/articles/studydesign.webp differ diff --git a/docs/articles/variableScatterplot.webp b/docs/articles/variableScatterplot.webp new file mode 100644 index 000000000..de6f8999d Binary files /dev/null and b/docs/articles/variableScatterplot.webp differ diff --git a/docs/authors.html b/docs/authors.html new file mode 
100644 index 000000000..caa14ab0f --- /dev/null +++ b/docs/authors.html @@ -0,0 +1,201 @@ + +Authors and Citation • PatientLevelPrediction + + +
+
+ + + +
+
+
+ + + +
  • +

    Jenna Reps. Author, maintainer. +

    +
  • +
  • +

    Martijn Schuemie. Author. +

    +
  • +
  • +

    Marc Suchard. Author. +

    +
  • +
  • +

    Patrick Ryan. Author. +

    +
  • +
  • +

    Peter Rijnbeek. Author. +

    +
  • +
  • +

    Egill Fridgeirsson. Author. +

    +
  • +
+
+
+

Citation

+ Source: inst/CITATION +
+
+ + +

Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek P (2018). +“Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data.” +Journal of the American Medical Informatics Association, 25(8), 969-975. +https://doi.org/10.1093/jamia/ocy032. +

+
@Article{,
+  author = {J. M. Reps and M. J. Schuemie and M. A. Suchard and P. B. Ryan and P. Rijnbeek},
+  title = {Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data},
+  journal = {Journal of the American Medical Informatics Association},
+  volume = {25},
+  number = {8},
+  pages = {969-975},
+  year = {2018},
+  url = {https://doi.org/10.1093/jamia/ocy032},
+}
+ +
+ +
+ + + +
+ + + + + + + + diff --git a/docs/bootstrap-toc.css b/docs/bootstrap-toc.css new file mode 100644 index 000000000..5a859415c --- /dev/null +++ b/docs/bootstrap-toc.css @@ -0,0 +1,60 @@ +/*! + * Bootstrap Table of Contents v0.4.1 (http://afeld.github.io/bootstrap-toc/) + * Copyright 2015 Aidan Feldman + * Licensed under MIT (https://github.com/afeld/bootstrap-toc/blob/gh-pages/LICENSE.md) */ + +/* modified from https://github.com/twbs/bootstrap/blob/94b4076dd2efba9af71f0b18d4ee4b163aa9e0dd/docs/assets/css/src/docs.css#L548-L601 */ + +/* All levels of nav */ +nav[data-toggle='toc'] .nav > li > a { + display: block; + padding: 4px 20px; + font-size: 13px; + font-weight: 500; + color: #767676; +} +nav[data-toggle='toc'] .nav > li > a:hover, +nav[data-toggle='toc'] .nav > li > a:focus { + padding-left: 19px; + color: #563d7c; + text-decoration: none; + background-color: transparent; + border-left: 1px solid #563d7c; +} +nav[data-toggle='toc'] .nav > .active > a, +nav[data-toggle='toc'] .nav > .active:hover > a, +nav[data-toggle='toc'] .nav > .active:focus > a { + padding-left: 18px; + font-weight: bold; + color: #563d7c; + background-color: transparent; + border-left: 2px solid #563d7c; +} + +/* Nav: second level (shown on .active) */ +nav[data-toggle='toc'] .nav .nav { + display: none; /* Hide by default, but at >768px, show it */ + padding-bottom: 10px; +} +nav[data-toggle='toc'] .nav .nav > li > a { + padding-top: 1px; + padding-bottom: 1px; + padding-left: 30px; + font-size: 12px; + font-weight: normal; +} +nav[data-toggle='toc'] .nav .nav > li > a:hover, +nav[data-toggle='toc'] .nav .nav > li > a:focus { + padding-left: 29px; +} +nav[data-toggle='toc'] .nav .nav > .active > a, +nav[data-toggle='toc'] .nav .nav > .active:hover > a, +nav[data-toggle='toc'] .nav .nav > .active:focus > a { + padding-left: 28px; + font-weight: 500; +} + +/* from https://github.com/twbs/bootstrap/blob/e38f066d8c203c3e032da0ff23cd2d6098ee2dd6/docs/assets/css/src/docs.css#L631-L634 */ 
+nav[data-toggle='toc'] .nav > .active > ul { + display: block; +} diff --git a/docs/bootstrap-toc.js b/docs/bootstrap-toc.js new file mode 100644 index 000000000..1cdd573b2 --- /dev/null +++ b/docs/bootstrap-toc.js @@ -0,0 +1,159 @@ +/*! + * Bootstrap Table of Contents v0.4.1 (http://afeld.github.io/bootstrap-toc/) + * Copyright 2015 Aidan Feldman + * Licensed under MIT (https://github.com/afeld/bootstrap-toc/blob/gh-pages/LICENSE.md) */ +(function() { + 'use strict'; + + window.Toc = { + helpers: { + // return all matching elements in the set, or their descendants + findOrFilter: function($el, selector) { + // http://danielnouri.org/notes/2011/03/14/a-jquery-find-that-also-finds-the-root-element/ + // http://stackoverflow.com/a/12731439/358804 + var $descendants = $el.find(selector); + return $el.filter(selector).add($descendants).filter(':not([data-toc-skip])'); + }, + + generateUniqueIdBase: function(el) { + var text = $(el).text(); + var anchor = text.trim().toLowerCase().replace(/[^A-Za-z0-9]+/g, '-'); + return anchor || el.tagName.toLowerCase(); + }, + + generateUniqueId: function(el) { + var anchorBase = this.generateUniqueIdBase(el); + for (var i = 0; ; i++) { + var anchor = anchorBase; + if (i > 0) { + // add suffix + anchor += '-' + i; + } + // check if ID already exists + if (!document.getElementById(anchor)) { + return anchor; + } + } + }, + + generateAnchor: function(el) { + if (el.id) { + return el.id; + } else { + var anchor = this.generateUniqueId(el); + el.id = anchor; + return anchor; + } + }, + + createNavList: function() { + return $(''); + }, + + createChildNavList: function($parent) { + var $childList = this.createNavList(); + $parent.append($childList); + return $childList; + }, + + generateNavEl: function(anchor, text) { + var $a = $(''); + $a.attr('href', '#' + anchor); + $a.text(text); + var $li = $('
  • '); + $li.append($a); + return $li; + }, + + generateNavItem: function(headingEl) { + var anchor = this.generateAnchor(headingEl); + var $heading = $(headingEl); + var text = $heading.data('toc-text') || $heading.text(); + return this.generateNavEl(anchor, text); + }, + + // Find the first heading level (`

    `, then `

    `, etc.) that has more than one element. Defaults to 1 (for `

    `). + getTopLevel: function($scope) { + for (var i = 1; i <= 6; i++) { + var $headings = this.findOrFilter($scope, 'h' + i); + if ($headings.length > 1) { + return i; + } + } + + return 1; + }, + + // returns the elements for the top level, and the next below it + getHeadings: function($scope, topLevel) { + var topSelector = 'h' + topLevel; + + var secondaryLevel = topLevel + 1; + var secondarySelector = 'h' + secondaryLevel; + + return this.findOrFilter($scope, topSelector + ',' + secondarySelector); + }, + + getNavLevel: function(el) { + return parseInt(el.tagName.charAt(1), 10); + }, + + populateNav: function($topContext, topLevel, $headings) { + var $context = $topContext; + var $prevNav; + + var helpers = this; + $headings.each(function(i, el) { + var $newNav = helpers.generateNavItem(el); + var navLevel = helpers.getNavLevel(el); + + // determine the proper $context + if (navLevel === topLevel) { + // use top level + $context = $topContext; + } else if ($prevNav && $context === $topContext) { + // create a new level of the tree and switch to it + $context = helpers.createChildNavList($prevNav); + } // else use the current $context + + $context.append($newNav); + + $prevNav = $newNav; + }); + }, + + parseOps: function(arg) { + var opts; + if (arg.jquery) { + opts = { + $nav: arg + }; + } else { + opts = arg; + } + opts.$scope = opts.$scope || $(document.body); + return opts; + } + }, + + // accepts a jQuery object, or an options object + init: function(opts) { + opts = this.helpers.parseOps(opts); + + // ensure that the data attribute is in place for styling + opts.$nav.attr('data-toggle', 'toc'); + + var $topContext = this.helpers.createChildNavList(opts.$nav); + var topLevel = this.helpers.getTopLevel(opts.$scope); + var $headings = this.helpers.getHeadings(opts.$scope, topLevel); + this.helpers.populateNav($topContext, topLevel, $headings); + } + }; + + $(function() { + $('nav[data-toggle="toc"]').each(function(i, el) { + var $nav = $(el); + 
Toc.init($nav); + }); + }); +})(); diff --git a/docs/docsearch.css b/docs/docsearch.css new file mode 100644 index 000000000..e5f1fe1df --- /dev/null +++ b/docs/docsearch.css @@ -0,0 +1,148 @@ +/* Docsearch -------------------------------------------------------------- */ +/* + Source: https://github.com/algolia/docsearch/ + License: MIT +*/ + +.algolia-autocomplete { + display: block; + -webkit-box-flex: 1; + -ms-flex: 1; + flex: 1 +} + +.algolia-autocomplete .ds-dropdown-menu { + width: 100%; + min-width: none; + max-width: none; + padding: .75rem 0; + background-color: #fff; + background-clip: padding-box; + border: 1px solid rgba(0, 0, 0, .1); + box-shadow: 0 .5rem 1rem rgba(0, 0, 0, .175); +} + +@media (min-width:768px) { + .algolia-autocomplete .ds-dropdown-menu { + width: 175% + } +} + +.algolia-autocomplete .ds-dropdown-menu::before { + display: none +} + +.algolia-autocomplete .ds-dropdown-menu [class^=ds-dataset-] { + padding: 0; + background-color: rgb(255,255,255); + border: 0; + max-height: 80vh; +} + +.algolia-autocomplete .ds-dropdown-menu .ds-suggestions { + margin-top: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion { + padding: 0; + overflow: visible +} + +.algolia-autocomplete .algolia-docsearch-suggestion--category-header { + padding: .125rem 1rem; + margin-top: 0; + font-size: 1.3em; + font-weight: 500; + color: #00008B; + border-bottom: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--wrapper { + float: none; + padding-top: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--subcategory-column { + float: none; + width: auto; + padding: 0; + text-align: left +} + +.algolia-autocomplete .algolia-docsearch-suggestion--content { + float: none; + width: auto; + padding: 0 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--content::before { + display: none +} + +.algolia-autocomplete .ds-suggestion:not(:first-child) .algolia-docsearch-suggestion--category-header { + padding-top: .75rem; + margin-top: .75rem; 
+ border-top: 1px solid rgba(0, 0, 0, .1) +} + +.algolia-autocomplete .ds-suggestion .algolia-docsearch-suggestion--subcategory-column { + display: block; + padding: .1rem 1rem; + margin-bottom: 0.1; + font-size: 1.0em; + font-weight: 400 + /* display: none */ +} + +.algolia-autocomplete .algolia-docsearch-suggestion--title { + display: block; + padding: .25rem 1rem; + margin-bottom: 0; + font-size: 0.9em; + font-weight: 400 +} + +.algolia-autocomplete .algolia-docsearch-suggestion--text { + padding: 0 1rem .5rem; + margin-top: -.25rem; + font-size: 0.8em; + font-weight: 400; + line-height: 1.25 +} + +.algolia-autocomplete .algolia-docsearch-footer { + width: 110px; + height: 20px; + z-index: 3; + margin-top: 10.66667px; + float: right; + font-size: 0; + line-height: 0; +} + +.algolia-autocomplete .algolia-docsearch-footer--logo { + background-image: url("data:image/svg+xml;utf8,"); + background-repeat: no-repeat; + background-position: 50%; + background-size: 100%; + overflow: hidden; + text-indent: -9000px; + width: 100%; + height: 100%; + display: block; + transform: translate(-8px); +} + +.algolia-autocomplete .algolia-docsearch-suggestion--highlight { + color: #FF8C00; + background: rgba(232, 189, 54, 0.1) +} + + +.algolia-autocomplete .algolia-docsearch-suggestion--text .algolia-docsearch-suggestion--highlight { + box-shadow: inset 0 -2px 0 0 rgba(105, 105, 105, .5) +} + +.algolia-autocomplete .ds-suggestion.ds-cursor .algolia-docsearch-suggestion--content { + background-color: rgba(192, 192, 192, .15) +} diff --git a/docs/docsearch.js b/docs/docsearch.js new file mode 100644 index 000000000..b35504cd3 --- /dev/null +++ b/docs/docsearch.js @@ -0,0 +1,85 @@ +$(function() { + + // register a handler to move the focus to the search bar + // upon pressing shift + "/" (i.e. 
"?") + $(document).on('keydown', function(e) { + if (e.shiftKey && e.keyCode == 191) { + e.preventDefault(); + $("#search-input").focus(); + } + }); + + $(document).ready(function() { + // do keyword highlighting + /* modified from https://jsfiddle.net/julmot/bL6bb5oo/ */ + var mark = function() { + + var referrer = document.URL ; + var paramKey = "q" ; + + if (referrer.indexOf("?") !== -1) { + var qs = referrer.substr(referrer.indexOf('?') + 1); + var qs_noanchor = qs.split('#')[0]; + var qsa = qs_noanchor.split('&'); + var keyword = ""; + + for (var i = 0; i < qsa.length; i++) { + var currentParam = qsa[i].split('='); + + if (currentParam.length !== 2) { + continue; + } + + if (currentParam[0] == paramKey) { + keyword = decodeURIComponent(currentParam[1].replace(/\+/g, "%20")); + } + } + + if (keyword !== "") { + $(".contents").unmark({ + done: function() { + $(".contents").mark(keyword); + } + }); + } + } + }; + + mark(); + }); +}); + +/* Search term highlighting ------------------------------*/ + +function matchedWords(hit) { + var words = []; + + var hierarchy = hit._highlightResult.hierarchy; + // loop to fetch from lvl0, lvl1, etc. 
+ for (var idx in hierarchy) { + words = words.concat(hierarchy[idx].matchedWords); + } + + var content = hit._highlightResult.content; + if (content) { + words = words.concat(content.matchedWords); + } + + // return unique words + var words_uniq = [...new Set(words)]; + return words_uniq; +} + +function updateHitURL(hit) { + + var words = matchedWords(hit); + var url = ""; + + if (hit.anchor) { + url = hit.url_without_anchor + '?q=' + escape(words.join(" ")) + '#' + hit.anchor; + } else { + url = hit.url + '?q=' + escape(words.join(" ")); + } + + return url; +} diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 000000000..e9e2def0d --- /dev/null +++ b/docs/index.html @@ -0,0 +1,359 @@ + + + + + + + +Developing patient level prediction using data in the OMOP Common Data + Model • PatientLevelPrediction + + + + + + + + + + + + +
    +
    + + + + +
    +
    + +
    + +

    Build Status

    +

    codecov.io

    +

    PatientLevelPrediction is part of HADES.

    +
    +
    +

    Introduction +

    +

    PatientLevelPrediction is an R package for building and validating patient-level predictive models using data in the OMOP Common Data Model format.

    +

    Reps JM, Schuemie MJ, Suchard MA, Ryan PB, Rijnbeek PR. Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data. J Am Med Inform Assoc. 2018;25(8):969-975.

    +

    The figure below illustrates the prediction problem we address. Among a population at risk, we aim to predict which patients at a defined moment in time (t = 0) will experience some outcome during a time-at-risk. Prediction is done using only information about the patients in an observation window prior to that moment in time.

    +

    +

    To define a prediction problem we have to define t=0 by a Target Cohort (T), the outcome we would like to predict by an outcome cohort (O), and the time-at-risk (TAR). Furthermore, we have to make design choices for the model we would like to develop, and determine the observational datasets to perform internal and external validation. This conceptual framework works for all types of prediction problems, for example those presented below (T=green, O=red).

    +

    +
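The roles of T, O, and the time-at-risk can be illustrated with a toy labelling exercise in base R. Everything in this sketch is made up for illustration: the two data frames, their column names, and the TAR of 1 to 365 days after t = 0. It does not use the package's own functions, which handle this on OMOP CDM data.

```r
# Hypothetical target cohort (T): one row per person, index date is t = 0.
target <- data.frame(
  personId = c(1, 2, 3),
  indexDate = as.Date(c("2020-01-01", "2020-03-15", "2020-06-01"))
)

# Hypothetical outcome cohort (O): dates on which the outcome was recorded.
outcome <- data.frame(
  personId = c(1, 3),
  outcomeDate = as.Date(c("2020-02-10", "2021-12-01"))
)

# Assumed time-at-risk (TAR): from 1 day to 365 days after t = 0.
tarStart <- 1
tarEnd <- 365

# Keep every person in T; people without an outcome get NA as outcome date.
labelled <- merge(target, outcome, by = "personId", all.x = TRUE)
daysToOutcome <- as.numeric(labelled$outcomeDate - labelled$indexDate)

# The label is TRUE only when the outcome falls inside the TAR.
labelled$outcomeInTar <- !is.na(daysToOutcome) &
  daysToOutcome >= tarStart & daysToOutcome <= tarEnd

labelled[, c("personId", "outcomeInTar")]
```

Person 1 has the outcome 40 days after t = 0 and is labelled as a case; person 2 never has the outcome, and person 3 has it 548 days after t = 0, outside the TAR, so both are labelled as non-cases.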
    +
    +

    Features +

    +
      +
    • Takes one or more target cohorts (Ts) and one or more outcome cohorts (Os) and develops and validates models for all T and O combinations.
    • +
    • Allows for multiple prediction design options.
    • +
    • Extracts the necessary data from a database in OMOP Common Data Model format for multiple covariate settings.
    • +
    • Uses a large set of covariates including for example all drugs, diagnoses, procedures, as well as age, comorbidity indexes, and custom covariates.
    • +
    • Allows you to add custom covariates or cohort covariates.
    • +
    • Includes a large number of state-of-the-art machine learning algorithms that can be used to develop predictive models, including Regularized logistic regression, Random forest, Gradient boosting machines, Decision tree, Naive Bayes, K-nearest neighbours, Neural network, AdaBoost and Support vector machines.
    • +
    • Allows you to add custom algorithms.
    • +
    • Allows you to add custom feature engineering
    • +
    • Allows you to add custom under/over sampling (or any other sampling) [note: based on existing research this is not recommended]
    • +
    • Contains functionality to externally validate models.
    • +
    • Includes functions to plot and explore model performance (ROC + Calibration).
    • +
    • Build ensemble models using EnsemblePatientLevelPrediction.
    • +
    • Build Deep Learning models using DeepPatientLevelPrediction.
    • +
    • Generates learning curves.
    • +
    • Includes a shiny app to interactively view and explore results.
    • +
    • In the shiny app you can create an HTML document (report or protocol) containing all the study results.
    • +
    +
    +
    +

    Screenshots +

    + + + + + + + + + +
    +

    Calibration plot

    +
    +

    ROC plot

    +
    +Calibration Plot + +ROC Plot +
    +

    Demo of the Shiny Apps can be found here:

    + +
    +
    +

    Technology +

    +

    PatientLevelPrediction is an R package, with some functions using Python through reticulate.

    +
    +
    +

    System Requirements +

    +

    Requires R (version 4.0 or higher). Installation on Windows requires RTools. Libraries used in PatientLevelPrediction require Java and Python.

    +

    The Python installation is required for some of the machine learning algorithms. We advise installing Python 3.8 or higher using Anaconda (https://www.continuum.io/downloads).

    +
    +
    +

    Getting Started +

    +
      +
    • To install the package please read the Package Installation guide

    • +
    • Have a look at the video below for an extensive demo of the package.

    • +
    +

    Video Vignette PLP Package

    +

    Please read the main vignette for the package:

    + +

    In addition we have created vignettes that describe advanced functionality in more detail:

    + +

    Package manual: PatientLevelPrediction.pdf

    +
    +
    +

    User Documentation +

    +

    Documentation can be found on the package website.

    +

    PDF versions of the documentation are also available, as mentioned above.

    +
    +
    +

    Support +

    + +
    +
    +

    Contributing +

    +

    Read here how you can contribute to this package.

    +
    +
    +

    License +

    +

    PatientLevelPrediction is licensed under Apache License 2.0

    +
    +
    +

    Development +

    +

    PatientLevelPrediction is being developed in RStudio.

    +
    +
    +

    Acknowledgements +

    +
      +
    • The package is maintained by Jenna Reps and Peter Rijnbeek and has been developed with major contributions from Martijn Schuemie, Patrick Ryan, and Marc Suchard.
    • +
    • We would like to thank the following people for their contributions to the package: Seng Chan You, Ross Williams, Henrik John, Xiaoyong Pan, James Wiggins, Egill Fridgeirsson, Alex Rekkas.
    • +
    • This project is supported in part through the National Science Foundation grant IIS 1251151.
    • +
    +
    + +
    + + +
    + + +
    + +
    +

    +

    Site built with pkgdown 2.0.7.

    +
    + +
    +
    + + + + + + + + diff --git a/docs/link.svg b/docs/link.svg new file mode 100644 index 000000000..88ad82769 --- /dev/null +++ b/docs/link.svg @@ -0,0 +1,12 @@ + + + + + + diff --git a/docs/news/index.html b/docs/news/index.html new file mode 100644 index 000000000..10eee02ad --- /dev/null +++ b/docs/news/index.html @@ -0,0 +1,606 @@ + +Changelog • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    + +
    • Hotfix adding schema to DatabaseConnector::getTableNames when creating results tables
    • +
    +
    + +
    • Add support for R4.4
    • +
    • Fix notes around documentation (vignette engine and brackets in itemize)
    • +
    • Use webp image format where possible (not in pdfs) for smaller size
    • +
    • Make sure random table names are unique in tests
    • +
    • Remove remote info for Eunomia since it’s in CRAN
    • +
    +
    + +
    • Clean up dependencies, tibble removed and IHT and ParallelLogger from CRAN
    • +
    • Use cohortIds for cohortCovariates to comply with FeatureExtraction
    • +
    • Add cdmDatabaseName from DatabaseDetails to model output
    • +
    • Fix bug when attributes weren’t preserved on trainData$covariateData after split
    • +
    • Fix warnings in tests and speed them up
    • +
    • Fix bug in assignment operator in configurePython
    • +
    • Delay evaluation of plpData when using do.call like in learningCurves and runMultiplePlp
    • +
    • Speed up population generation when subjectIds are distinct
    • +
    • Fix bug when population was still generated when provided to runPlp
    • +
    +
    + +
    • fix bug with ohdsi shiny modules version check (issue 415)
    • +
    +
    + +
    • Fix sklearnToJson to be compatible with scikit-learn>=1.3
    • +
    • Fix github actions so it’s not hardcoded to use python 3.7
    • +
    +
    + +
    • added spline feature engineering
    • +
    • added age/sex stratified imputation feature engineering
    • +
    • changed result table execution date types to varchar
    • +
    • updated covariateSummary to use feature engineering
    • +
    +
    + +
    • fixed bug introduced with new reticulate update in model saving to json tests
    • +
    +
    + +
    • fixed bug with database insert if result is incomplete
    • +
    • updated/fixed documentation (Egill)
    • +
    • added model path to models (Henrik)
    • +
    • updated hyper-parameter saving to data.frame and made consistent
    • +
    +
    + +
    • fixed bug with multiple covariate settings in diagnose plp
    • +
    • added min cell count when exporting database results to csv files
    • +
    • light GBM added (thanks Jin Choi and Chungsoo Kim)
    • +
    • fixed minor bugs when uploading results to database
    • +
    +
    + +
    • added ensure_installed(“ResultModelManager”) to getDataMigrator()
    • +
    +
    + +
    • shiny app is now using ShinyAppBuilder with a config saved in the /inst folder
    • +
    +
    + +
    • fixed bugs introduced when sklearn inputs changed
    • +
    • added sklearn model being saved as jsons
    • +
    • made changes around the DatabaseConnector get table names function to make it work with the updated DatabaseConnector
    • +
    • removed check RAM stop (now it just warns)
    • +
    +
    + +
    • Updated test to skip test for FE setting if the model does not fit (this was causing occasional test fail)
    • +
    • replaced .data$ with “” for all dplyr::select to remove warnings
    • +
    +
    + +
    • Fix bug with python type being required to be int
    • +
    +
    + +
    • Allow priorType to be passed down to getCV function in case prior is not ‘laplace’
    • +
    • Seed specified in Cyclops model wasn’t passed to Cyclops
    • +
    +
    + +
    • fixed issue with shiny viewer converting connection details to large json
    • +
    +
    + +
    • added check for cdmDatabaseId into createDatabaseDetails
    • +
    • added test that the cdmDatabaseId check in createDatabaseDetails errors when cdmDatabaseId is NULL
    • +
    • removed session$onSessionEnded(shiny::stopApp) from shiny server
    • +
    +
    + +
    • fixing cox predictions
    • +
    +
    + +
    • forcing cdmDatabaseId to be a string if integer is input
    • +
    +
    + +
    • replaced utils::read.csv with readr::read_csv when inserting results from csv
    • +
    +
    + +
    • replaced gsub with sub when inserting csvs to database
    • +
    +
    + +
    • saved result specification csv in windows to fix odd formatting issue
    • +
    +
    + +
    • fixed sample data bugs
    • +
    • updated to use v1.0.0 of OhdsiShinyModules
    • +
    • updated plp database result tables to use the same structure for cohort and database as other HADES packages
    • +
    • added function to insert csv results into plp database result tables
    • +
    • added input for databaseId (database and version) when extracting data to be consistent with other HADES packages. This is saved in plp objects.
    • +
    +
    + +
    • fixed issue with ‘preprocess’ vs ‘preprocessing’ inconsistently used across models
    • +
    • added metaData tracking for feature engineering or preprocessing when predicting
    • +
    • fixed issue with FE using trainData$covariateData metaData rather than trainData
    • +
    • fixed bug when using sameData for FE
    • +
    +
    + +
    • pulled in multiple bug fixes and test improvements from Egill
    • +
    • pulled in fix for learning curves from Henrik
    • +
    • Pulled in fix for feature engineering from Solomon
    • +
    • Cleaned check messages about comparing class(x) with a string by changing to inherits()
    • +
    +
    + +
    • removed json saving for sklearn models since sklearn-json is no longer working for the latest sklearn
    • +
    +
    + +
    • renamed the input corresponding to the string that gets appended to the results table names to tablePrefix
    • +
    • fixed issues with system.file() from SqlRender code breaking the tests
    • +
    • added an input fileAppend to the function that exports the database tables to csv files
    • +
    • moved the plp model (including preprocessing details) outside of the result database (into a specified folder) due to the size of the objects (too large to insert into the database).
    • +
    +
    + +
    • added saving of plp models into the result database
    • +
    • added default cohortDefinitions in runMultiplePlp
    • +
    +
    + +
    • added modelType to all models for database upload
    • +
    +
    + +
    • moved FeatureExtraction to depends
    • +
    • fixed using inherits()
    • +
    +
    + +
    • moved most of the shiny app code into OhdsiShinyModules
    • +
    • removed shiny dependencies and added OhdsiShinyModules to suggests
    • +
    • fixed bug with linux sklearn saving
    • +
    +
    + +
    • replaced cohortId with targetId for consistency throughout code
    • +
    +
    + +
    • replaced targetId in model design with cohortId for consistency throughout code
    • +
    • replaced plpDataSettings with restrictPlpDataSettings to improve naming consistency
    • +
    • added ability to use initial population in runPlp by adding the population to plpData$population
    • +
    • added splitSettings into modelDesign
    • +
    • replaced saving json settings with ParallelLogger function
    • +
    • updated database result schema (removed researcher_id from tables - if desired a new table with the setting_ids and researcher_id could be added, removed study tables and revised results table to performances table with a reference to model_design_id and development_database_id to enable validation results without a model to be inserted)
    • +
    • added diagnostic code based on PROBAST
    • +
    • added diagnostic shiny module
    • +
    • added code to create sqlite database and populate in uploadToDatabase
    • +
    • add code to convert runPlp+val to sqlite database when viewing shiny
    • +
    • added code to extract database results into csv files: extractDatabaseToCsv()
    • +
    +
    + +
    • pulled in GBM update (default hyper-parameters and variable importance fix) work done by Egill (egillax)
    • +
    +
    + +
    • updated installation documents
    • +
    • added tryCatch around plots to prevent code stopping
    • +
    +
    + +
    • updated result schema (added model_design table with settings and added attrition table)
    • +
    • updated shiny app for new database result schema
    • +
    • removed C++ code for AUC and the Rcpp dependency, now using pROC instead as it is faster
    • +
    • made covariate summary optional when externally validating
    • +
    +
    + +
    • updated json structure for specifying study design (made it friendlier to read)
    • +
    • includes smooth calibration plot fix - work done by Alex (rekkasa)
    • +
    • fixed bug with multiple sample methods or feature engineering settings causing invalid error
    • +
    +
    + +
    • plpModel now saved as json files when possible
    • +
    • Updated runPlp to make more modular
    • +
    • now possible to customise data splitting, feature engineering, sampling (over/under) and learning algorithm
    • +
    • added function for extracting cohort covariates
    • +
    • updated evaluation to evaluate per strata (evaluation column)
    • +
    • updated plpModel structure
    • +
    • updated runPlp structure
    • +
    • updated shiny and package to use tidyr and not reshape2
    • +
    • sklearn learning algorithms share the same fit function
    • +
    • r learning algorithms share the same fit function
    • +
    • interface to cyclops code revised
    • +
    • ensemble learning removed (will be in separate package)
    • +
    • deep learning removed (will be in DeepPatientLevelPrediction package)
    • +
    +
    + +
    • revised toSparseM() to do conversion in one go but check RAM availability beforehand.
    • +
    • removed temporal plpData conversion in toSparseM (this will be done in DeepPatientLevelPrediction)
    • +
    +
    + +
    • shiny can now read csv results
    • +
    • objects loaded via loadPlpFromCsv() can be saved using savePlpResult()
    • +
    +
    + +
    • added database result storage
    • +
    • added interface to database results in shiny
    • +
    • merged in shinyRepo that changed the shiny app to make it modular and added new features
    • +
    • removed deep learning as this is being added into new OHDSI package DeepPatientLevelPrediction
    • +
    +
    + +
    • save xgboost model as json file for transparency
    • +
    • set connectionDetails to NULL in getPlpData
    • +
    +
    + +
    • updated andromeda functions - restrict to pop and tidy covs for speed
    • +
    • quick fix for GBM survival predicting negative values
    • +
    • fixed occasional demoSum error for survival models
    • +
    • updated index creation to use Andromeda function
    • +
    +
    + +
    • fixed bug when normalize data is false
    • +
    • fixed bugs when single feature (gbm + python)
    • +
    • updated GBM
    • +
    +
    + +
    • updated calibration slope
    • +
    • fixed missing age/gender in prediction
    • +
    • fixed shiny intercept bug
    • +
    • fixed diagnostic
    • +
    • fixed missing covariateSettings in load csv plp
    • +
    +
    + +
    • Removed plpData from evaluation
    • +
    • Added recalibration into externalVal
    • +
    • Updated shiny app for recalibration
    • +
    • Added population creation setting to use cohortEndDate as timeAtRisk end
    • +
    • fixed tests
    • +
    +
    + +
    • Reduced imports by adding code to install some dependencies when used
    • +
    +
    + +
    • fixed csv result saving bug when no model param
    • +
    +
    + +
    • fixed r check vignette issues
    • +
    • added conda install to test
    • +
    +
    + +
    • finalised permutation feature importance
    • +
    +
    + +
    • fixed deepNN index issue (reported on github - thanks dapritchard)
    • +
    • add compression to python pickles
    • +
    • removed requirement to have outcomeCount for prediction with python models
    • +
    +
    + +
    • cleaned all checks
    • +
    • fixed bug in python toSparseMatrix
    • +
    • fixed warning in studyPop
    • +
    +
    + +
    • fixed bug (identified by Chungsoo) in covariateSummary
    • +
    • fixed bug with thresholdSummary
    • +
    • edited threshold summary function to make it cleaner
    • +
    • added to ensemble functionality so you can combine multiple models into an ensemble
    • +
    • cleaned up the notes and tests
    • +
    • updated simulated data covariateId in tests to use integer64
    • +
    • fixed description imports (and sorted them)
    • +
    +
    + +
    • fixed Cox model calibration plots
    • +
    • fixed int64 conversion bug
    • +
    +
    + +
    • added baseline risk to Cox model
    • +
    +
    + +
    • updated shiny: added attrition and hyper-parameter grid search into settings
    • +
    +
    + +
    • updated shiny app: added 95% CI to AUC in summary; size is now the complete data size, and a new column valPercent reports what percentage of the data was used for validation
    • +
    +
    + +
    • updated GBMsurvival to use survival metrics and c-stat
    • +
    +
    + +
    • added survival metrics
    • +
    +
    + +
    • added updates and fixes into master from development branch
    • +
    +
    + +
    • fixed bug with pdw data extraction due to multiple person_id columns
    • +
    • fixed bug in shiny app converting covariate values due to tibble
    • +
    +
    + +
    • added calibration updates: cal-in-large, weak cal
    • +
    • updated smooth cal plot (sample for speed in big data)
    • +
    • defaulted to 100 values in calibrationSummary + updated cal plot
    • +
    +
    + +
    • fixed backwards compat with normalization
    • +
    • fixed python joblib dependency
    • +
    +
    + +
    • fixed bug in preprocessing
    • +
    • added cross validation aucs to LR, GBM, RF and MLP
    • +
    • added more settings into MLP
    • +
    • added threads option in LR
    • +
    +
    + +
    • fixed minor bug with shiny dependency
    • +
    • fixed some tests
    • +
    • added standardizedMeanDiff to covariatesummary
    • +
    • updated createStudyPopulation to make it cleaner to read and count outcome per TAR
    • +
    +
    + +
    • Andromeda replaced ff data objects
    • +
    • added age/gender into cohort
    • +
    • fixed python warnings
    • +
    • updated shiny plp viewer
    • +
    +
    + +
    • Fixed bug when running multiple analyses using a data extraction sample with multiple covariate settings
    • +
    +
    + +
    • improved shiny PLP viewer
    • +
    • added diagnostic shiny viewer
    • +
    +
    + +
    • updated external validate code to enable custom covariates using ATLAS cohorts
    • +
    • fixed issues with startAnchor and endAnchor
    • +
    +
    + +
    • Deprecating addExposureDaysToStart and addExposureDaysToEnd arguments in createStudyPopulation, adding new arguments called startAnchor and endAnchor. The hope is this is less confusing.
    • +
    • fixed transfer learning code (can now transfer or fine-tune model)
    • +
    • made view plp shiny apps work when some results are missing
    • +
    +
    + +
    • set up testing
    • +
    • fixed build warnings
    • +
    +
    + +
    • added tests to get >70% coverage (keras tests too slow for travis)
    • +
    • Fixed minor bugs
    • +
    • Fixed deep learning code and removed pythonInR dependency
    • +
    • combined shiny into one file with one interface
    • +
    +
    + +
    • added recalibration using 25% sample in existing models
    • +
    • added option to provide score to probabilities for existing models
    • +
    • fixed warnings with some plots
    • +
    +
    + +

    Small bug fixes:
    • added analysisId into model saving/loading
    • made external validation saving recursive
    • added removal of patients with negative TAR when creating population
    • added option to apply model without preprocessing settings (make them NULL)
    • updated create study population to remove patients with negative time-at-risk

    +
    +
    + +

    Changes:
    • merged in bug fix from Martijn
    • fixed AUC bug causing crash with big data
    • update SQL code to be compatible with v6.0 OMOP CDM
    • added save option to external validate PLP

    +
    +
    + +

    Changes:
    • Updated splitting functions to include a split by subject and renamed personSplitter to randomSplitter
    • Cast indices to integer in python functions to fix bug with non-integer sparse matrix indices

    +
    +
    + +

    Changes:
    • Added GLM status to log (will now inform about any fitting issue in log)
    • Added GBM survival model (still under development)
    • Added RF quantile regression (still under development)
    • Updated viewMultiplePlp() to match PLP skeleton package app
    • Updated single plp vignette with additional example
    • Merge in deep learning updates from Chan

    +
    +
    + +

    Changes:
    • Updated website

    +
    +
    + +

    Changes:
    • Added more tests
    • test files now match R files

    +
    +
    + +

    Changes:
    • Fixed ensemble stacker

    +
    +
    + +

    Changes:
    • Using reticulate for python interface
    • Speed improvements
    • Bug fixes

    +
    +
    + + + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/pkgdown.css b/docs/pkgdown.css new file mode 100644 index 000000000..80ea5b838 --- /dev/null +++ b/docs/pkgdown.css @@ -0,0 +1,384 @@ +/* Sticky footer */ + +/** + * Basic idea: https://philipwalton.github.io/solved-by-flexbox/demos/sticky-footer/ + * Details: https://github.com/philipwalton/solved-by-flexbox/blob/master/assets/css/components/site.css + * + * .Site -> body > .container + * .Site-content -> body > .container .row + * .footer -> footer + * + * Key idea seems to be to ensure that .container and __all its parents__ + * have height set to 100% + * + */ + +html, body { + height: 100%; +} + +body { + position: relative; +} + +body > .container { + display: flex; + height: 100%; + flex-direction: column; +} + +body > .container .row { + flex: 1 0 auto; +} + +footer { + margin-top: 45px; + padding: 35px 0 36px; + border-top: 1px solid #e5e5e5; + color: #666; + display: flex; + flex-shrink: 0; +} +footer p { + margin-bottom: 0; +} +footer div { + flex: 1; +} +footer .pkgdown { + text-align: right; +} +footer p { + margin-bottom: 0; +} + +img.icon { + float: right; +} + +/* Ensure in-page images don't run outside their container */ +.contents img { + max-width: 100%; + height: auto; +} + +/* Fix bug in bootstrap (only seen in firefox) */ +summary { + display: list-item; +} + +/* Typographic tweaking ---------------------------------*/ + +.contents .page-header { + margin-top: calc(-60px + 1em); +} + +dd { + margin-left: 3em; +} + +/* Section anchors ---------------------------------*/ + +a.anchor { + display: none; + margin-left: 5px; + width: 20px; + height: 20px; + + background-image: url(./link.svg); + background-repeat: no-repeat; + background-size: 20px 20px; + background-position: center center; +} + +h1:hover .anchor, +h2:hover .anchor, +h3:hover .anchor, +h4:hover .anchor, +h5:hover .anchor, +h6:hover .anchor { + display: inline-block; +} + +/* Fixes for fixed navbar --------------------------*/ + +.contents h1, 
.contents h2, .contents h3, .contents h4 { + padding-top: 60px; + margin-top: -40px; +} + +/* Navbar submenu --------------------------*/ + +.dropdown-submenu { + position: relative; +} + +.dropdown-submenu>.dropdown-menu { + top: 0; + left: 100%; + margin-top: -6px; + margin-left: -1px; + border-radius: 0 6px 6px 6px; +} + +.dropdown-submenu:hover>.dropdown-menu { + display: block; +} + +.dropdown-submenu>a:after { + display: block; + content: " "; + float: right; + width: 0; + height: 0; + border-color: transparent; + border-style: solid; + border-width: 5px 0 5px 5px; + border-left-color: #cccccc; + margin-top: 5px; + margin-right: -10px; +} + +.dropdown-submenu:hover>a:after { + border-left-color: #ffffff; +} + +.dropdown-submenu.pull-left { + float: none; +} + +.dropdown-submenu.pull-left>.dropdown-menu { + left: -100%; + margin-left: 10px; + border-radius: 6px 0 6px 6px; +} + +/* Sidebar --------------------------*/ + +#pkgdown-sidebar { + margin-top: 30px; + position: -webkit-sticky; + position: sticky; + top: 70px; +} + +#pkgdown-sidebar h2 { + font-size: 1.5em; + margin-top: 1em; +} + +#pkgdown-sidebar h2:first-child { + margin-top: 0; +} + +#pkgdown-sidebar .list-unstyled li { + margin-bottom: 0.5em; +} + +/* bootstrap-toc tweaks ------------------------------------------------------*/ + +/* All levels of nav */ + +nav[data-toggle='toc'] .nav > li > a { + padding: 4px 20px 4px 6px; + font-size: 1.5rem; + font-weight: 400; + color: inherit; +} + +nav[data-toggle='toc'] .nav > li > a:hover, +nav[data-toggle='toc'] .nav > li > a:focus { + padding-left: 5px; + color: inherit; + border-left: 1px solid #878787; +} + +nav[data-toggle='toc'] .nav > .active > a, +nav[data-toggle='toc'] .nav > .active:hover > a, +nav[data-toggle='toc'] .nav > .active:focus > a { + padding-left: 5px; + font-size: 1.5rem; + font-weight: 400; + color: inherit; + border-left: 2px solid #878787; +} + +/* Nav: second level (shown on .active) */ + +nav[data-toggle='toc'] .nav .nav { + 
display: none; /* Hide by default, but at >768px, show it */ + padding-bottom: 10px; +} + +nav[data-toggle='toc'] .nav .nav > li > a { + padding-left: 16px; + font-size: 1.35rem; +} + +nav[data-toggle='toc'] .nav .nav > li > a:hover, +nav[data-toggle='toc'] .nav .nav > li > a:focus { + padding-left: 15px; +} + +nav[data-toggle='toc'] .nav .nav > .active > a, +nav[data-toggle='toc'] .nav .nav > .active:hover > a, +nav[data-toggle='toc'] .nav .nav > .active:focus > a { + padding-left: 15px; + font-weight: 500; + font-size: 1.35rem; +} + +/* orcid ------------------------------------------------------------------- */ + +.orcid { + font-size: 16px; + color: #A6CE39; + /* margins are required by official ORCID trademark and display guidelines */ + margin-left:4px; + margin-right:4px; + vertical-align: middle; +} + +/* Reference index & topics ----------------------------------------------- */ + +.ref-index th {font-weight: normal;} + +.ref-index td {vertical-align: top; min-width: 100px} +.ref-index .icon {width: 40px;} +.ref-index .alias {width: 40%;} +.ref-index-icons .alias {width: calc(40% - 40px);} +.ref-index .title {width: 60%;} + +.ref-arguments th {text-align: right; padding-right: 10px;} +.ref-arguments th, .ref-arguments td {vertical-align: top; min-width: 100px} +.ref-arguments .name {width: 20%;} +.ref-arguments .desc {width: 80%;} + +/* Nice scrolling for wide elements --------------------------------------- */ + +table { + display: block; + overflow: auto; +} + +/* Syntax highlighting ---------------------------------------------------- */ + +pre, code, pre code { + background-color: #f8f8f8; + color: #333; +} +pre, pre code { + white-space: pre-wrap; + word-break: break-all; + overflow-wrap: break-word; +} + +pre { + border: 1px solid #eee; +} + +pre .img, pre .r-plt { + margin: 5px 0; +} + +pre .img img, pre .r-plt img { + background-color: #fff; +} + +code a, pre a { + color: #375f84; +} + +a.sourceLine:hover { + text-decoration: none; +} + +.fl 
{color: #1514b5;} +.fu {color: #000000;} /* function */ +.ch,.st {color: #036a07;} /* string */ +.kw {color: #264D66;} /* keyword */ +.co {color: #888888;} /* comment */ + +.error {font-weight: bolder;} +.warning {font-weight: bolder;} + +/* Clipboard --------------------------*/ + +.hasCopyButton { + position: relative; +} + +.btn-copy-ex { + position: absolute; + right: 0; + top: 0; + visibility: hidden; +} + +.hasCopyButton:hover button.btn-copy-ex { + visibility: visible; +} + +/* headroom.js ------------------------ */ + +.headroom { + will-change: transform; + transition: transform 200ms linear; +} +.headroom--pinned { + transform: translateY(0%); +} +.headroom--unpinned { + transform: translateY(-100%); +} + +/* mark.js ----------------------------*/ + +mark { + background-color: rgba(255, 255, 51, 0.5); + border-bottom: 2px solid rgba(255, 153, 51, 0.3); + padding: 1px; +} + +/* vertical spacing after htmlwidgets */ +.html-widget { + margin-bottom: 10px; +} + +/* fontawesome ------------------------ */ + +.fab { + font-family: "Font Awesome 5 Brands" !important; +} + +/* don't display links in code chunks when printing */ +/* source: https://stackoverflow.com/a/10781533 */ +@media print { + code a:link:after, code a:visited:after { + content: ""; + } +} + +/* Section anchors --------------------------------- + Added in pandoc 2.11: https://github.com/jgm/pandoc-templates/commit/9904bf71 +*/ + +div.csl-bib-body { } +div.csl-entry { + clear: both; +} +.hanging-indent div.csl-entry { + margin-left:2em; + text-indent:-2em; +} +div.csl-left-margin { + min-width:2em; + float:left; +} +div.csl-right-inline { + margin-left:2em; + padding-left:1em; +} +div.csl-indent { + margin-left: 2em; +} diff --git a/docs/pkgdown.js b/docs/pkgdown.js new file mode 100644 index 000000000..6f0eee40b --- /dev/null +++ b/docs/pkgdown.js @@ -0,0 +1,108 @@ +/* http://gregfranko.com/blog/jquery-best-practices/ */ +(function($) { + $(function() { + + $('.navbar-fixed-top').headroom(); + 
+ $('body').css('padding-top', $('.navbar').height() + 10); + $(window).resize(function(){ + $('body').css('padding-top', $('.navbar').height() + 10); + }); + + $('[data-toggle="tooltip"]').tooltip(); + + var cur_path = paths(location.pathname); + var links = $("#navbar ul li a"); + var max_length = -1; + var pos = -1; + for (var i = 0; i < links.length; i++) { + if (links[i].getAttribute("href") === "#") + continue; + // Ignore external links + if (links[i].host !== location.host) + continue; + + var nav_path = paths(links[i].pathname); + + var length = prefix_length(nav_path, cur_path); + if (length > max_length) { + max_length = length; + pos = i; + } + } + + // Add class to parent
  • , and enclosing
  • if in dropdown + if (pos >= 0) { + var menu_anchor = $(links[pos]); + menu_anchor.parent().addClass("active"); + menu_anchor.closest("li.dropdown").addClass("active"); + } + }); + + function paths(pathname) { + var pieces = pathname.split("/"); + pieces.shift(); // always starts with / + + var end = pieces[pieces.length - 1]; + if (end === "index.html" || end === "") + pieces.pop(); + return(pieces); + } + + // Returns -1 if not found + function prefix_length(needle, haystack) { + if (needle.length > haystack.length) + return(-1); + + // Special case for length-0 haystack, since for loop won't run + if (haystack.length === 0) { + return(needle.length === 0 ? 0 : -1); + } + + for (var i = 0; i < haystack.length; i++) { + if (needle[i] != haystack[i]) + return(i); + } + + return(haystack.length); + } + + /* Clipboard --------------------------*/ + + function changeTooltipMessage(element, msg) { + var tooltipOriginalTitle=element.getAttribute('data-original-title'); + element.setAttribute('data-original-title', msg); + $(element).tooltip('show'); + element.setAttribute('data-original-title', tooltipOriginalTitle); + } + + if(ClipboardJS.isSupported()) { + $(document).ready(function() { + var copyButton = ""; + + $("div.sourceCode").addClass("hasCopyButton"); + + // Insert copy buttons: + $(copyButton).prependTo(".hasCopyButton"); + + // Initialize tooltips: + $('.btn-copy-ex').tooltip({container: 'body'}); + + // Initialize clipboard: + var clipboardBtnCopies = new ClipboardJS('[data-clipboard-copy]', { + text: function(trigger) { + return trigger.parentNode.textContent.replace(/\n#>[^\n]*/g, ""); + } + }); + + clipboardBtnCopies.on('success', function(e) { + changeTooltipMessage(e.trigger, 'Copied!'); + e.clearSelection(); + }); + + clipboardBtnCopies.on('error', function() { + changeTooltipMessage(e.trigger,'Press Ctrl+C or Command+C to copy'); + }); + }); + } +})(window.jQuery || window.$) diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml new file mode 100644 
index 000000000..24865cad6 --- /dev/null +++ b/docs/pkgdown.yml @@ -0,0 +1,20 @@ +pandoc: 3.1.11 +pkgdown: 2.0.7 +pkgdown_sha: ~ +articles: + AddingCustomFeatureEngineering: AddingCustomFeatureEngineering.html + AddingCustomModels: AddingCustomModels.html + AddingCustomSamples: AddingCustomSamples.html + AddingCustomSplitting: AddingCustomSplitting.html + BenchmarkTasks: BenchmarkTasks.html + BestPractices: BestPractices.html + BuildingMultiplePredictiveModels: BuildingMultiplePredictiveModels.html + BuildingPredictiveModels: BuildingPredictiveModels.html + ClinicalModels: ClinicalModels.html + ConstrainedPredictors: ConstrainedPredictors.html + CreatingLearningCurves: CreatingLearningCurves.html + CreatingNetworkStudies: CreatingNetworkStudies.html + InstallationGuide: InstallationGuide.html + Videos: Videos.html +last_built: 2024-09-09T14:25Z + diff --git a/docs/reference/MapIds.html b/docs/reference/MapIds.html new file mode 100644 index 000000000..5cb0b7906 --- /dev/null +++ b/docs/reference/MapIds.html @@ -0,0 +1,178 @@ + +Map covariate and row Ids so they start from 1 — MapIds • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function takes covariate data and a cohort/population, remaps the covariate and row ids, restricts to the population and saves/creates the mapping.

    +
    + +
    +
    MapIds(covariateData, cohort = NULL, mapping = NULL)
    +
    + +
    +

    Arguments

    +
    covariateData
    +

    a covariateData object

    + + +
    cohort
    +

    if specified rowIds restricted to the ones in cohort

    + + +
    mapping
    +

    A predefined mapping to use
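The remapping described above can be sketched in base R. This is only an illustration under stated assumptions: the real `covariateData` object is not a plain data.frame, and the column names `newRowId` and `newCovariateId` are hypothetical names chosen for this sketch.

```r
# Hypothetical sparse covariate table and cohort (illustrative data only).
covariates <- data.frame(
  rowId = c(101, 101, 205, 309),
  covariateId = c(1002, 8507001, 1002, 8507001),
  covariateValue = c(63, 1, 47, 1)
)
cohort <- data.frame(rowId = c(101, 309))

# Restrict to the cohort, then build dense mappings that start from 1.
covariates <- covariates[covariates$rowId %in% cohort$rowId, ]
rowMap <- data.frame(rowId = sort(unique(covariates$rowId)))
rowMap$newRowId <- seq_len(nrow(rowMap))
covMap <- data.frame(covariateId = sort(unique(covariates$covariateId)))
covMap$newCovariateId <- seq_len(nrow(covMap))

# Apply both mappings; the maps can be kept and reused on new data.
mapped <- data.frame(
  rowId = rowMap$newRowId[match(covariates$rowId, rowMap$rowId)],
  covariateId = covMap$newCovariateId[match(covariates$covariateId, covMap$covariateId)],
  covariateValue = covariates$covariateValue
)
```

Keeping `rowMap` and `covMap` around corresponds to the idea of passing a predefined `mapping`, so that the same ids are assigned when the model is applied to other data.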

    + +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/PatientLevelPrediction.html b/docs/reference/PatientLevelPrediction.html new file mode 100644 index 000000000..77e24885e --- /dev/null +++ b/docs/reference/PatientLevelPrediction.html @@ -0,0 +1,175 @@ + +PatientLevelPrediction — PatientLevelPrediction • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    A package for running predictions using data in the OMOP CDM

    +
    + + + +
    +

    Author

    +

    Maintainer: Jenna Reps jreps@its.jnj.com

    +

    Authors:

    • Martijn Schuemie

    • +
    • Marc Suchard

    • +
    • Patrick Ryan

    • +
    • Peter Rijnbeek

    • +
    • Egill Fridgeirsson

    • +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/Rplot001.png b/docs/reference/Rplot001.png new file mode 100644 index 000000000..17a358060 Binary files /dev/null and b/docs/reference/Rplot001.png differ diff --git a/docs/reference/accuracy.html b/docs/reference/accuracy.html new file mode 100644 index 000000000..111a58e86 --- /dev/null +++ b/docs/reference/accuracy.html @@ -0,0 +1,190 @@ + +Calculate the accuracy — accuracy • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the accuracy

    +
    + +
    +
    accuracy(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    accuracy value

    +
    +
    +

    Details

    +

    Calculate the accuracy
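As a minimal sketch, accuracy is conventionally the share of correct classifications among all classifications. The function name `accuracySketch` is hypothetical, and it is an assumption that the package function follows this standard definition.

```r
# Standard accuracy: correct predictions divided by all predictions.
accuracySketch <- function(TP, TN, FN, FP) {
  (TP + TN) / (TP + TN + FN + FP)
}

# 90 correct out of 100 predictions gives 0.9.
accuracySketch(TP = 40, TN = 50, FN = 6, FP = 4)
```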

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/addDiagnosePlpToDatabase.html b/docs/reference/addDiagnosePlpToDatabase.html new file mode 100644 index 000000000..31b3a41d3 --- /dev/null +++ b/docs/reference/addDiagnosePlpToDatabase.html @@ -0,0 +1,207 @@ + +Insert a diagnostic result into a PLP result schema database — addDiagnosePlpToDatabase • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function inserts a diagnostic result into the result schema

    +
    + +
    +
    addDiagnosePlpToDatabase(
    +  diagnosePlp,
    +  connectionDetails,
    +  databaseSchemaSettings,
    +  cohortDefinitions,
    +  databaseList = NULL,
    +  overWriteIfExists = T
    +)
    +
    + +
    +

    Arguments

    +
    diagnosePlp
    +

    An object of class diagnosePlp

    + + +
    connectionDetails
    +

    A connectionDetails object created using the +function createConnectionDetails in the +DatabaseConnector package.

    + + +
    databaseSchemaSettings
    +

    An object created by createDatabaseSchemaSettings with all the settings specifying the result tables

    + + +
    cohortDefinitions
    +

    A set of one or more cohorts extracted using ROhdsiWebApi::exportCohortDefinitionSet()

    + + +
    databaseList
    +

    (Optional) If you wish to overwrite the settings in the plp object use createDatabaseList to specify the databases

    + + +
    overWriteIfExists
    +

    (default: T) Whether to delete existing results and overwrite them

    + +
    +
    +

    Value

    + + +

    Returns NULL but uploads the diagnostic into the database schema specified in databaseSchemaSettings

    +
    +
    +

    Details

    +

    This function can be used to upload a diagnostic result into a database

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/addMultipleDiagnosePlpToDatabase.html b/docs/reference/addMultipleDiagnosePlpToDatabase.html new file mode 100644 index 000000000..56ee9c4c8 --- /dev/null +++ b/docs/reference/addMultipleDiagnosePlpToDatabase.html @@ -0,0 +1,202 @@ + +Insert multiple diagnosePlp results saved to a directory into a PLP result schema database — addMultipleDiagnosePlpToDatabase • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function inserts diagnosePlp results into the result schema

    +
    + +
    +
    addMultipleDiagnosePlpToDatabase(
    +  connectionDetails,
    +  databaseSchemaSettings,
    +  cohortDefinitions,
    +  databaseList = NULL,
    +  resultLocation
    +)
    +
    + +
    +

    Arguments

    +
    connectionDetails
    +

    A connectionDetails object created using the +function createConnectionDetails in the +DatabaseConnector package.

    + + +
    databaseSchemaSettings
    +

    An object created by createDatabaseSchemaSettings with all the settings specifying the result tables

    + + +
    cohortDefinitions
    +

    (list) A list of cohortDefinitions (each list must contain: name, id)

    + + +
    databaseList
    +

    (Optional) ...

    + + +
    resultLocation
    +

    The location of the diagnostic results

    + +
    +
    +

    Value

    + + +

    Returns NULL but uploads multiple diagnosePlp results into the database schema specified in databaseSchemaSettings

    +
    +
    +

    Details

    +

    This function can be used to upload diagnosePlp results into a database

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/addMultipleRunPlpToDatabase.html b/docs/reference/addMultipleRunPlpToDatabase.html new file mode 100644 index 000000000..47b53a4c0 --- /dev/null +++ b/docs/reference/addMultipleRunPlpToDatabase.html @@ -0,0 +1,212 @@ + +Populate the PatientLevelPrediction results tables — addMultipleRunPlpToDatabase • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function formats and uploads results that have been generated via an ATLAS prediction package into a database

    +
    + +
    +
    addMultipleRunPlpToDatabase(
    +  connectionDetails,
    +  databaseSchemaSettings = createDatabaseSchemaSettings(resultSchema = "main"),
    +  cohortDefinitions,
    +  databaseList = NULL,
    +  resultLocation = NULL,
    +  resultLocationVector,
    +  modelSaveLocation
    +)
    +
    + +
    +

    Arguments

    +
    connectionDetails
    +

    A connectionDetails object created using the +function createConnectionDetails in the +DatabaseConnector package.

    + + +
    databaseSchemaSettings
    +

    An object created by createDatabaseSchemaSettings with all the settings specifying the result tables

    + + +
    cohortDefinitions
    +

    A set of one or more cohorts extracted using ROhdsiWebApi::exportCohortDefinitionSet()

    + + +
    databaseList
    +

    (Optional) A list created by createDatabaseList to specify the databases

    + + +
    resultLocation
    +

    (string) location of directory where the main package results were saved

    + + +
    resultLocationVector
    +

    (only used when resultLocation is missing) a vector of locations with development or validation results

    + + +
    modelSaveLocation
    +

    The location on the file system where the models will be saved in a subdirectory

    + +
    +
    +

    Value

    + + +

    Returns NULL but uploads all the results in resultLocation to the PatientLevelPrediction result tables in resultSchema

    +
    +
    +

    Details

    +

    This function can be used to upload PatientLevelPrediction results into a database

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/addRunPlpToDatabase.html b/docs/reference/addRunPlpToDatabase.html new file mode 100644 index 000000000..9752d3ec5 --- /dev/null +++ b/docs/reference/addRunPlpToDatabase.html @@ -0,0 +1,207 @@ + +Function to add the run plp (development or validation) to database — addRunPlpToDatabase • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function adds a runPlp or external validation result into a database

    +
    + +
    +
    addRunPlpToDatabase(
    +  runPlp,
    +  connectionDetails,
    +  databaseSchemaSettings,
    +  cohortDefinitions,
    +  modelSaveLocation,
    +  databaseList = NULL
    +)
    +
    + +
    +

    Arguments

    +
    runPlp
    +

    An object of class runPlp or class externalValidatePlp

    + + +
    connectionDetails
    +

    A connectionDetails object created using the +function createConnectionDetails in the +DatabaseConnector package.

    + + +
    databaseSchemaSettings
    +

    An object created by createDatabaseSchemaSettings with all the settings specifying the result tables

    + + +
    cohortDefinitions
    +

    A set of one or more cohorts extracted using ROhdsiWebApi::exportCohortDefinitionSet()

    + + +
    modelSaveLocation
    +

    The location of the directory that models will be saved to

    + + +
    databaseList
    +

    (Optional) If you want to change the database name then use createDatabaseList to specify the database settings, but use the same cdmDatabaseId as was used for model development/validation

    + +
    +
    +

    Value

    + + +

    Returns a data.frame with the database details

    +
    +
    +

    Details

    +

    This function is used when inserting results into the PatientLevelPrediction database results schema

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/averagePrecision.html b/docs/reference/averagePrecision.html new file mode 100644 index 000000000..658174496 --- /dev/null +++ b/docs/reference/averagePrecision.html @@ -0,0 +1,178 @@ + +Calculate the average precision — averagePrecision • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the average precision

    +
    + +
    +
    averagePrecision(prediction)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction object

    + +
    +
    +

    Value

    + + +

    The average precision

    +
    +
    +

    Details

    +

    Calculates the average precision from a prediction object
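One common definition of average precision is the mean of the precision values at the ranks where an outcome occurs, with patients ranked by predicted risk. The sketch below follows that definition. It assumes the prediction object behaves like a data.frame with a predicted-risk column `value` and an `outcomeCount` column; `averagePrecisionSketch` is a hypothetical name, and the package may compute the metric differently.

```r
# Average precision: mean precision at the ranks of the observed outcomes.
averagePrecisionSketch <- function(prediction) {
  ord <- order(prediction$value, decreasing = TRUE)   # highest risk first
  isCase <- prediction$outcomeCount[ord] > 0
  precisionAtK <- cumsum(isCase) / seq_along(isCase)  # precision at each rank
  mean(precisionAtK[isCase])
}

# Illustrative data: outcomes at ranks 1 and 3.
prediction <- data.frame(
  value = c(0.9, 0.8, 0.7, 0.6),
  outcomeCount = c(1, 0, 1, 0)
)
averagePrecisionSketch(prediction)  # (1/1 + 2/3) / 2
```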

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/brierScore.html b/docs/reference/brierScore.html new file mode 100644 index 000000000..678bf1efe --- /dev/null +++ b/docs/reference/brierScore.html @@ -0,0 +1,178 @@ + +brierScore — brierScore • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    brierScore

    +
    + +
    +
    brierScore(prediction)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction object

    + +
    +
    +

    Value

    + + +

    A list containing the brier score and the scaled brier score

    +
    +
    +

    Details

    +

    Calculates the brierScore from a prediction object
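The Brier score is the mean squared difference between the predicted risk and the observed outcome. The sketch below also computes a scaled version relative to a reference value. It is an assumption of this sketch that the reference is built from the mean predicted risk, and that the prediction object has `value` and `outcomeCount` columns; `brierSketch` is a hypothetical name.

```r
# Brier score and a scaled Brier score (1 is perfect, 0 matches the reference).
brierSketch <- function(prediction) {
  brier <- mean((prediction$outcomeCount - prediction$value)^2)
  # Assumed reference: a constant prediction equal to the mean predicted risk.
  brierMax <- mean(prediction$value) * (1 - mean(prediction$value))
  list(brier = brier, brierScaled = 1 - brier / brierMax)
}

prediction <- data.frame(
  value = c(0.8, 0.2, 0.6, 0.4),
  outcomeCount = c(1, 0, 1, 0)
)
brierSketch(prediction)
```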

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/calibrationLine.html b/docs/reference/calibrationLine.html new file mode 100644 index 000000000..425cccd2b --- /dev/null +++ b/docs/reference/calibrationLine.html @@ -0,0 +1,176 @@ + +calibrationLine — calibrationLine • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    calibrationLine

    +
    + +
    +
    calibrationLine(prediction, numberOfStrata = 10)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction object

    + + +
    numberOfStrata
    +

    The number of groups to split the prediction into

    + +
    +
    +

    Details

    +

    Calculates the calibration from a prediction object
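A plausible reading of this function is: split the predictions into `numberOfStrata` risk groups, compare the mean predicted risk with the observed outcome fraction in each group, and fit a straight line through those points. The sketch below implements that reading in base R. The quantile-based grouping, the `value` column, and the name `calibrationSketch` are assumptions, not the package's confirmed behaviour.

```r
# Group predictions into strata and compare predicted with observed risk.
calibrationSketch <- function(prediction, numberOfStrata = 10) {
  # Quantile breaks; unique() guards against duplicated break points.
  breaks <- unique(quantile(prediction$value,
                            probs = seq(0, 1, length.out = numberOfStrata + 1)))
  stratum <- cut(prediction$value, breaks = breaks, include.lowest = TRUE)
  data.frame(
    meanPredicted = as.numeric(tapply(prediction$value, stratum, mean)),
    observedFraction = as.numeric(tapply(prediction$outcomeCount > 0, stratum, mean))
  )
}

# Illustrative data: simulated outcomes that follow the predicted risk.
set.seed(1)
prediction <- data.frame(value = (1:100) / 100)
prediction$outcomeCount <- rbinom(100, 1, prediction$value)

strata <- calibrationSketch(prediction, numberOfStrata = 10)
# Intercept near 0 and slope near 1 suggest good calibration.
coef(lm(observedFraction ~ meanPredicted, data = strata))
```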

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/computeAuc.html b/docs/reference/computeAuc.html new file mode 100644 index 000000000..9c79d5ffb --- /dev/null +++ b/docs/reference/computeAuc.html @@ -0,0 +1,178 @@ + +Compute the area under the ROC curve — computeAuc • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Compute the area under the ROC curve

    +
    + +
    +
    computeAuc(prediction, confidenceInterval = FALSE)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction object as generated using the +predict functions.

    + + +
    confidenceInterval
    +

    Should 95 percent confidence intervals be computed?

    + +
    +
    +

    Details

    +

    Computes the area under the ROC curve for the predicted probabilities, given the true observed +outcomes.
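The area under the ROC curve equals the probability that a randomly chosen patient with the outcome is ranked above a randomly chosen patient without it, which can be computed from ranks (the Mann-Whitney statistic). The sketch below uses that identity. It assumes `value` and `outcomeCount` columns, and `aucSketch` is a hypothetical name, not the package's implementation.

```r
# Rank-based AUC; tied predictions receive average ranks.
aucSketch <- function(prediction) {
  isCase <- prediction$outcomeCount > 0
  r <- rank(prediction$value)
  nCase <- sum(isCase)
  nControl <- sum(!isCase)
  (sum(r[isCase]) - nCase * (nCase + 1) / 2) / (nCase * nControl)
}

prediction <- data.frame(
  value = c(0.9, 0.8, 0.7, 0.6),
  outcomeCount = c(1, 0, 1, 0)
)
aucSketch(prediction)  # 3 of the 4 case/non-case pairs are ordered correctly
```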

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/computeGridPerformance.html b/docs/reference/computeGridPerformance.html new file mode 100644 index 000000000..5865fa6ea --- /dev/null +++ b/docs/reference/computeGridPerformance.html @@ -0,0 +1,183 @@ + +Computes grid performance with a specified performance function — computeGridPerformance • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Computes grid performance with a specified performance function

    +
    + +
    +
    computeGridPerformance(prediction, param, performanceFunct = "computeAuc")
    +
    + +
    +

    Arguments

    +
    prediction
    +

    a dataframe with predictions and outcomeCount per rowId

    + + +
    param
    +

    a list of hyperparameters

    + + +
    performanceFunct
    +

    a string specifying which performance function to use +. Default ``'computeAuc'``

    + +
    +
    +

    Value

    + + +

    A list with overview of the performance

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/configurePython.html b/docs/reference/configurePython.html new file mode 100644 index 000000000..1abf63794 --- /dev/null +++ b/docs/reference/configurePython.html @@ -0,0 +1,181 @@ + +Sets up a virtual environment to use for PLP (can be conda or python) — configurePython • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Sets up a virtual environment to use for PLP (can be conda or python)

    +
    + +
    +
    configurePython(envname = "PLP", envtype = NULL, condaPythonVersion = "3.11")
    +
    + +
    +

    Arguments

    +
    envname
    +

    A string for the name of the virtual environment (default is 'PLP')

    + + +
    envtype
    +

    An option for specifying the environment as 'conda' or 'python'. If NULL then the default is 'conda' for Windows users and 'python' for non-Windows users

    + + +
    condaPythonVersion
    +

    String, Python version to use when creating a conda environment

    + +
    +
    +

    Details

    +

    This function creates a virtual environment that can be used by PatientLevelPrediction +and installs all the required package dependencies. If using python, pip must be set up.

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/covariateSummary.html b/docs/reference/covariateSummary.html new file mode 100644 index 000000000..0ca6ee4c9 --- /dev/null +++ b/docs/reference/covariateSummary.html @@ -0,0 +1,216 @@ + +covariateSummary — covariateSummary • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Summarises the covariateData to calculate the mean and standard deviation per covariate. +If the labels are input it also stratifies this by class label, and if the trainRowIds and testRowIds +specifying the patients in the train/test sets respectively are input, these values are also stratified +by train and test set

    +
    + +
    +
    covariateSummary(
    +  covariateData,
    +  cohort,
    +  labels = NULL,
    +  strata = NULL,
    +  variableImportance = NULL,
    +  featureEngineering = NULL
    +)
    +
    + +
    +

    Arguments

    +
    covariateData
    +

    The covariateData part of the plpData that is +extracted using getPlpData

    + + +
    cohort
    +

    The patient cohort to calculate the summary

    + + +
    labels
    +

    A data.frame with the columns rowId and outcomeCount

    + + +
    strata
    +

    A data.frame containing the columns rowId, strataName

    + + +
    variableImportance
    +

    A data.frame with the columns covariateId and +value (the variable importance value)

    + + +
    featureEngineering
    +

    (currently not used ) +A function or list of functions specifying any feature engineering +to create covariates before summarising

    + +
    +
    +

    Value

    + + +

    A data.frame containing: CovariateCount CovariateMean and CovariateStDev plus these values +for any specified stratification
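To make the returned columns concrete, the sketch below computes a count, mean and standard deviation per covariate from a sparse covariate table in base R. It assumes that a patient without a row for a covariate has value 0, that the cohort has at least two patients, and that the standard deviation is the sample standard deviation over the whole cohort. The function name and the plain data.frame inputs are hypothetical.

```r
# Per-covariate count, mean and standard deviation over a cohort.
covariateSummarySketch <- function(covariates, cohort) {
  n <- nrow(cohort)  # patients without a covariate row contribute zeros
  covariates <- covariates[covariates$rowId %in% cohort$rowId, ]
  sums <- tapply(covariates$covariateValue, covariates$covariateId, sum)
  sumSquares <- tapply(covariates$covariateValue^2, covariates$covariateId, sum)
  counts <- tapply(covariates$covariateValue, covariates$covariateId, length)
  means <- sums / n
  data.frame(
    covariateId = as.numeric(names(sums)),
    CovariateCount = as.integer(counts),
    CovariateMean = as.numeric(means),
    CovariateStDev = as.numeric(sqrt(pmax(0, (sumSquares - n * means^2) / (n - 1))))
  )
}

covariates <- data.frame(
  rowId = c(1, 2, 3, 1),
  covariateId = c(10, 10, 10, 20),
  covariateValue = c(1, 1, 1, 5)
)
cohort <- data.frame(rowId = 1:4)
covariateSummarySketch(covariates, cohort)
```

Stratifying by class label would amount to calling the same function on the subset of the cohort with `outcomeCount > 0` and on the subset without, then joining the results by `covariateId`.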

    +
    +
    +

    Details

    +

    The function calculates various metrics to measure the performance of the model

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createCohortCovariateSettings.html b/docs/reference/createCohortCovariateSettings.html new file mode 100644 index 000000000..1f4bc8bc9 --- /dev/null +++ b/docs/reference/createCohortCovariateSettings.html @@ -0,0 +1,233 @@ + +Extracts covariates based on cohorts — createCohortCovariateSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Extracts covariates based on cohorts

    +
    + +
    +
    createCohortCovariateSettings(
    +  cohortName,
    +  settingId,
    +  cohortDatabaseSchema,
    +  cohortTable,
    +  cohortId,
    +  startDay = -30,
    +  endDay = 0,
    +  count = F,
    +  ageInteraction = F,
    +  lnAgeInteraction = F,
    +  analysisId = 456
    +)
    +
    + +
    +

    Arguments

    +
    cohortName
    +

    Name for the cohort

    + + +
    settingId
    +

    A unique id for the covariate time and

    + + +
    cohortDatabaseSchema
    +

    The schema of the database with the cohort

    + + +
    cohortTable
    +

    the table name that contains the covariate cohort

    + + +
    cohortId
    +

    cohort id for the covariate cohort

    + + +
    startDay
    +

    The number of days prior to index to start observing the cohort

    + + +
    endDay
    +

    The number of days prior to index to stop observing the cohort

    + + +
    count
    +

    If FALSE the covariate value is binary (1 means cohort occurred between index+startDay and index+endDay, 0 means it did not) +If TRUE then the covariate value is the number of unique cohort_start_dates between index+startDay and index+endDay

    + + +
    ageInteraction
    +

    If TRUE multiply the covariate value by the patient's age in years

    + + +
    lnAgeInteraction
    +

    If TRUE multiply the covariate value by the log of the patient's age in years

    + + +
    analysisId
    +

    The analysisId for the covariate

    + +
    +
    +

    Value

    + + +

    An object of class covariateSettings specifying how to create the cohort covariate with the covariateId + cohortId x 100000 + settingId x 1000 + analysisId
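The covariateId arithmetic and the binary versus count behaviour can be sketched in base R. This reads the text above as covariateId = cohortId x 100000 + settingId x 1000 + analysisId, and assumes the observation window includes both end days. Both function names are hypothetical and only illustrate the description.

```r
# covariateId built from the cohort, setting and analysis ids.
cohortCovariateIdSketch <- function(cohortId, settingId, analysisId = 456) {
  cohortId * 100000 + settingId * 1000 + analysisId
}

# Covariate value from cohort start days expressed relative to the index date.
cohortCovariateValueSketch <- function(daysFromIndex, startDay = -30, endDay = 0,
                                       count = FALSE) {
  inWindow <- unique(daysFromIndex[daysFromIndex >= startDay & daysFromIndex <= endDay])
  if (count) length(inWindow) else as.integer(length(inWindow) > 0)
}

cohortCovariateIdSketch(cohortId = 12, settingId = 1)           # 1201456
cohortCovariateValueSketch(c(-45, -20, -20, -3))                # binary: 1
cohortCovariateValueSketch(c(-45, -20, -20, -3), count = TRUE)  # two unique dates: 2
```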

    +
    +
    +

    Details

    +

    The user specifies a cohort and time period and then a covariate is constructed indicating whether they are in the +cohort during the time period relative to the target population cohort index

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createDatabaseDetails.html b/docs/reference/createDatabaseDetails.html new file mode 100644 index 000000000..74b2f2d03 --- /dev/null +++ b/docs/reference/createDatabaseDetails.html @@ -0,0 +1,256 @@ + +Create a setting that holds the details about the cdmDatabase connection for data extraction — createDatabaseDetails • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create a setting that holds the details about the cdmDatabase connection for data extraction

    +
    + +
    +
    createDatabaseDetails(
    +  connectionDetails,
    +  cdmDatabaseSchema,
    +  cdmDatabaseName,
    +  cdmDatabaseId,
    +  tempEmulationSchema = cdmDatabaseSchema,
    +  cohortDatabaseSchema = cdmDatabaseSchema,
    +  cohortTable = "cohort",
    +  outcomeDatabaseSchema = cdmDatabaseSchema,
    +  outcomeTable = "cohort",
    +  targetId = NULL,
    +  outcomeIds = NULL,
    +  cdmVersion = 5,
    +  cohortId = NULL
    +)
    +
    + +
    +

    Arguments

    +
    connectionDetails
    +

    An R object of type connectionDetails created using the +function createConnectionDetails in the +DatabaseConnector package.

    + + +
    cdmDatabaseSchema
    +

    The name of the database schema that contains the OMOP CDM +instance. Requires read permissions to this database. On SQL +Server, this should specify both the database and the schema, +so for example 'cdm_instance.dbo'.

    + + +
    cdmDatabaseName
    +

    A string with the name of the database - this is used in the shiny app and when externally validating models to name the result list and to specify the folder name when saving validation results (defaults to cdmDatabaseSchema if not specified)

    + + +
    cdmDatabaseId
    +

    A string with a unique identifier for the database and version - this is stored in the plp object for future reference and used by the shiny app (defaults to cdmDatabaseSchema if not specified)

    + + +
    tempEmulationSchema
    +

    For DBMSs like Oracle only: the name of the database schema where you +want all temporary tables to be managed. Requires +create/insert permissions to this database.

    + + +
    cohortDatabaseSchema
    +

    The name of the database schema that is the location where the +target cohorts are available. Requires read +permissions to this database.

    + + +
    cohortTable
    +

    The tablename that contains the target cohorts. Expectation is cohortTable +has format of COHORT table: COHORT_DEFINITION_ID, SUBJECT_ID, +COHORT_START_DATE, COHORT_END_DATE.

    + + +
    outcomeDatabaseSchema
    +

    The name of the database schema that is the location where the +data used to define the outcome cohorts is available. Requires read permissions to +this database.

    + + +
    outcomeTable
    +

    The tablename that contains the outcome cohorts. Expectation is +outcomeTable has format of COHORT table: COHORT_DEFINITION_ID, +SUBJECT_ID, COHORT_START_DATE, COHORT_END_DATE.

    + + +
    targetId
    +

    An integer specifying the cohort id for the target cohort

    + + +
    outcomeIds
    +

    A single integer or vector of integers specifying the cohort ids for the outcome cohorts

    + + +
    cdmVersion
    +

    Define the OMOP CDM version used: currently supports "4" and "5".

    + + +
    cohortId
    +

    (deprecated: use targetId) old input for the target cohort id

    + +
    +
    +

    Value

    + + +

    A list with the database specific settings (this is used by the runMultiplePlp function and the skeleton packages)

    +
    +
    +

    Details

    +

    This function simply stores the settings for communicating with the cdmDatabase when extracting +the target cohort and outcomes

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createDatabaseList.html b/docs/reference/createDatabaseList.html new file mode 100644 index 000000000..47e5f277b --- /dev/null +++ b/docs/reference/createDatabaseList.html @@ -0,0 +1,186 @@ + +Create a list with the database details and database meta data entries — createDatabaseList • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function creates a list with the database details and database meta data entries used in the study

    +
    + +
    +
    createDatabaseList(cdmDatabaseSchemas, cdmDatabaseNames, databaseRefIds = NULL)
    +
    + +
    +

    Arguments

    +
    cdmDatabaseSchemas
    +

    (string vector) A vector of the cdmDatabaseSchemas used in the study - if the schemas are not unique per database please also specify databaseRefIds

    + + +
    cdmDatabaseNames
    +

    Sharable names for the databases

    + + +
    databaseRefIds
    +

    (string vector) Unique database identifiers - what you specified as cdmDatabaseId in PatientLevelPrediction::createDatabaseDetails() when developing the models

    + +
    +
    +

    Value

    + + +

    Returns a data.frame with the database details

    +
    +
    +

    Details

    +

    This function is used when inserting database details into the PatientLevelPrediction database results schema

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createDatabaseSchemaSettings.html b/docs/reference/createDatabaseSchemaSettings.html new file mode 100644 index 000000000..d5486f118 --- /dev/null +++ b/docs/reference/createDatabaseSchemaSettings.html @@ -0,0 +1,215 @@ + +Create the PatientLevelPrediction database result schema settings — createDatabaseSchemaSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function specifies where the results schema is and lets you pick a different schema for the cohorts and databases

    +
    + +
    +
    createDatabaseSchemaSettings(
    +  resultSchema = "main",
    +  tablePrefix = "",
    +  targetDialect = "sqlite",
    +  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
    +  cohortDefinitionSchema = resultSchema,
    +  tablePrefixCohortDefinitionTables = tablePrefix,
    +  databaseDefinitionSchema = resultSchema,
    +  tablePrefixDatabaseDefinitionTables = tablePrefix
    +)
    +
    + +
    +

    Arguments

    +
    resultSchema
    +

    (string) The name of the database schema with the result tables.

    + + +
    tablePrefix
    +

    (string) A string that appends to the PatientLevelPrediction result tables

    + + +
    targetDialect
    +

    (string) The database management system being used

    + + +
    tempEmulationSchema
    +

    (string) The temp schema used when the database management system is oracle

    + + +
    cohortDefinitionSchema
    +

    (string) The name of the database schema with the cohort definition tables (defaults to resultSchema).

    + + +
    tablePrefixCohortDefinitionTables
    +

    (string) A string that appends to the cohort definition tables

    + + +
    databaseDefinitionSchema
    +

    (string) The name of the database schema with the database definition tables (defaults to resultSchema).

    + + +
    tablePrefixDatabaseDefinitionTables
    +

    (string) A string that appends to the database definition tables

    + +
    +
    +

    Value

    + + +

    Returns a list of class 'plpDatabaseResultSchema' with all the database settings

    +
    +
    +

    Details

    +

    This function can be used to specify the database settings used to upload PatientLevelPrediction results into a database

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createDefaultExecuteSettings.html b/docs/reference/createDefaultExecuteSettings.html new file mode 100644 index 000000000..e4e409d17 --- /dev/null +++ b/docs/reference/createDefaultExecuteSettings.html @@ -0,0 +1,172 @@ + +Creates default list of settings specifying what parts of runPlp to execute — createDefaultExecuteSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Creates default list of settings specifying what parts of runPlp to execute

    +
    + +
    +
    createDefaultExecuteSettings()
    +
    + +
    +

    Value

    + + +

    list with TRUE for split, preprocess, model development and covariate summary

    +
    +
    +

    Details

    +

    runs split, preprocess, model development and covariate summary

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createDefaultSplitSetting.html b/docs/reference/createDefaultSplitSetting.html new file mode 100644 index 000000000..f92e40aee --- /dev/null +++ b/docs/reference/createDefaultSplitSetting.html @@ -0,0 +1,211 @@ + +Create the settings for defining how the plpData are split into test/validation/train sets using +default splitting functions (either random stratified by outcome, time or subject splitting) — createDefaultSplitSetting • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create the settings for defining how the plpData are split into test/validation/train sets using +default splitting functions (either random stratified by outcome, time or subject splitting)

    +
    + +
    +
    createDefaultSplitSetting(
    +  testFraction = 0.25,
    +  trainFraction = 0.75,
    +  splitSeed = sample(1e+05, 1),
    +  nfold = 3,
    +  type = "stratified"
    +)
    +
    + +
    +

    Arguments

    +
    testFraction
    +

    (numeric) A real number between 0 and 1 indicating the test set fraction of the data

    + + +
    trainFraction
    +

    (numeric) A real number between 0 and 1 indicating the train set fraction of the data. +If not set train is equal to 1 - test

    + + +
    splitSeed
    +

    (numeric) A seed to use when splitting the data for reproducibility (if not set a random number will be generated)

    + + +
    nfold
    +

    (numeric) An integer > 1 specifying the number of folds used in cross validation

    + + +
    type
    +

    (character) Choice of:

    • 'stratified' Each data point is randomly assigned into the test or a train fold set but this is done stratified such that the outcome rate is consistent in each partition

    • +
    • 'time' Older data are assigned into the training set and newer data are assigned into the test set

    • +
    • 'subject' Data are partitioned by subject, if a subject is in the data more than once, all the data points for the subject are assigned either into the test data or into the train data (not both).

    • +
    + +
    +
    +

    Value

    + + +

    An object of class splitSettings

    + + +
    +
    +

    Details

    +

    Returns an object of class splitSettings that specifies the splitting function that will be called and the settings
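The 'stratified' option can be illustrated with a small base R sketch: patients with and without the outcome are shuffled separately, a `testFraction` share of each group goes to the test set, and the remainder is spread over `nfold` cross-validation folds, so the outcome rate stays roughly equal across partitions. The function name, the `index` column, and the use of -1 to mark the test set are assumptions of this sketch, not the package's documented output.

```r
# Stratified assignment of rows to a test set (-1) or a train fold (1..nfold).
stratifiedSplitSketch <- function(labels, testFraction = 0.25, nfold = 3,
                                  splitSeed = 42) {
  set.seed(splitSeed)  # a fixed seed makes the split reproducible
  index <- integer(nrow(labels))
  for (isCase in c(TRUE, FALSE)) {
    rows <- which((labels$outcomeCount > 0) == isCase)
    rows <- rows[sample.int(length(rows))]  # shuffle within the class
    nTest <- round(length(rows) * testFraction)
    inTest <- seq_along(rows) <= nTest
    index[rows[inTest]] <- -1L
    index[rows[!inTest]] <- rep_len(seq_len(nfold), sum(!inTest))
  }
  data.frame(rowId = labels$rowId, index = index)
}

# Illustrative data: 100 patients, 20 of them with the outcome.
labels <- data.frame(rowId = 1:100, outcomeCount = rep(c(1, 0, 0, 0, 0), 20))
split <- stratifiedSplitSketch(labels)
table(split$index, labels$outcomeCount)
```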

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createExecuteSettings.html b/docs/reference/createExecuteSettings.html new file mode 100644 index 000000000..dfb91aeed --- /dev/null +++ b/docs/reference/createExecuteSettings.html @@ -0,0 +1,205 @@ + +Creates list of settings specifying what parts of runPlp to execute — createExecuteSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Creates list of settings specifying what parts of runPlp to execute

    +
    + +
    +
    createExecuteSettings(
    +  runSplitData = F,
    +  runSampleData = F,
    +  runfeatureEngineering = F,
    +  runPreprocessData = F,
    +  runModelDevelopment = F,
    +  runCovariateSummary = F
    +)
    +
    + +
    +

    Arguments

    +
    runSplitData
    +

    TRUE or FALSE whether to split data into train/test

    + + +
    runSampleData
    +

    TRUE or FALSE whether to over or under sample

    + + +
    runfeatureEngineering
    +

    TRUE or FALSE whether to do feature engineering

    + + +
    runPreprocessData
    +

    TRUE or FALSE whether to do preprocessing

    + + +
    runModelDevelopment
    +

    TRUE or FALSE whether to develop the model

    + + +
    runCovariateSummary
    +

    TRUE or FALSE whether to create covariate summary

    + +
    +
    +

    Value

    + + +

    list with TRUE/FALSE for each part of runPlp

    +
    +
    +

    Details

    +

    define what parts of runPlp to execute
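Examples

As a hedged illustration, the sketch below switches on only three of the steps listed above and leaves the others at their default of FALSE. It assumes the package is installed and that the returned list uses the argument names as element names.

```r
library(PatientLevelPrediction)

# Run only data splitting, preprocessing and model development;
# sampling, feature engineering and the covariate summary stay FALSE.
executeSettings <- createExecuteSettings(
  runSplitData = TRUE,
  runPreprocessData = TRUE,
  runModelDevelopment = TRUE
)

# Inspect which steps are switched on
unlist(executeSettings)
```

Because every argument defaults to FALSE, each step to be executed has to be requested explicitly.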

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createFeatureEngineeringSettings.html b/docs/reference/createFeatureEngineeringSettings.html new file mode 100644 index 000000000..18defa02b --- /dev/null +++ b/docs/reference/createFeatureEngineeringSettings.html @@ -0,0 +1,181 @@ + +Create the settings for defining any feature engineering that will be done — createFeatureEngineeringSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create the settings for defining any feature engineering that will be done

    +
    + +
    +
    createFeatureEngineeringSettings(type = "none")
    +
    + +
    +

    Arguments

    +
    type
    +

    (character) Choice of:

    • 'none' No feature engineering - this is the default

    • +
    + +
    +
    +

    Value

    + + +

    An object of class featureEngineeringSettings

    + + +
    +
    +

    Details

    +

    Returns an object of class featureEngineeringSettings that specifies the feature engineering function that will be called and the settings

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createLearningCurve.html b/docs/reference/createLearningCurve.html new file mode 100644 index 000000000..f7e442e2f --- /dev/null +++ b/docs/reference/createLearningCurve.html @@ -0,0 +1,293 @@ + +createLearningCurve — createLearningCurve • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Creates a learning curve object, which can be plotted using the + plotLearningCurve() function.

    +
    + +
    +
    createLearningCurve(
    +  plpData,
    +  outcomeId,
    +  parallel = T,
    +  cores = 4,
    +  modelSettings,
    +  saveDirectory = getwd(),
    +  analysisId = "learningCurve",
    +  populationSettings = createStudyPopulationSettings(),
    +  splitSettings = createDefaultSplitSetting(),
    +  trainFractions = c(0.25, 0.5, 0.75),
    +  trainEvents = NULL,
    +  sampleSettings = createSampleSettings(),
    +  featureEngineeringSettings = createFeatureEngineeringSettings(),
    +  preprocessSettings = createPreprocessSettings(minFraction = 0.001, normalize = T),
    +  logSettings = createLogSettings(),
    +  executeSettings = createExecuteSettings(runSplitData = T, runSampleData = F,
    +    runfeatureEngineering = F, runPreprocessData = T, runModelDevelopment = T,
    +    runCovariateSummary = F)
    +)
    +
    + +
    +

    Arguments

    +
    plpData
    +

    An object of type plpData - the patient level prediction +data extracted from the CDM.

    + + +
    outcomeId
    +

    (integer) The ID of the outcome.

    + + +
    parallel
    +

    Whether to run the code in parallel

    + + +
    cores
    +

    The number of computer cores to use if running in parallel

    + + +
    modelSettings
    +

    An object of class modelSettings created using one of the functions:

    + + +
    saveDirectory
    +

    The path to the directory where the results will be saved (if NULL uses working directory)

    + + +
    analysisId
    +

    (character) Identifier for the analysis. It is used to create, e.g., the result folder. Default is "learningCurve".

    + + +
    populationSettings
    +

    An object of type populationSettings created using createStudyPopulationSettings that +specifies how the data class labels are defined and, additionally, any exclusions to apply to the +plpData cohort

    + + +
    splitSettings
    +

    An object of type splitSettings that specifies how to split the data into train/validation/test. +The default settings can be created using createDefaultSplitSetting.

    + + +
    trainFractions
    +

    A list of training fractions to create models for. +Note, providing trainEvents will override your input to +trainFractions.

    + + +
    trainEvents
    +

    The number of events has been shown to be a determinant of model performance. +Therefore, it is recommended to provide trainEvents rather than +trainFractions. Note, providing trainEvents will override +your input to trainFractions. The format should be as follows:

    • c(500, 1000, 1500) - a list of training events

    • +
    + + +
    sampleSettings
    +

    An object of type sampleSettings that specifies any under/over sampling to be done. +The default is none.

    + + +
    featureEngineeringSettings
    +

    An object of featureEngineeringSettings specifying any feature engineering to be learned (using the train data)

    + + +
    preprocessSettings
    +

    An object of preprocessSettings. This setting specifies the minimum fraction of +target population who must have a covariate for it to be included in the model training +and whether to normalise the covariates before training

    + + +
    logSettings
    +

    An object of logSettings created using createLogSettings +specifying how the logging is done

    + + +
    executeSettings
    +

    An object of executeSettings specifying which parts of the analysis to run

    + +
    +
    +

    Value

    + + +

    A learning curve object containing the various performance measures + obtained by the model for each training set fraction. It can be plotted + using plotLearningCurve.

    +
    + +
    +

    Examples

    +
    if (FALSE) {
    +# define model
    +modelSettings <- PatientLevelPrediction::setLassoLogisticRegression()
    +
    +# create learning curve; arguments are named to match the usage above
    +# (outcomeId = 2 is a placeholder outcome cohort id)
    +learningCurve <- PatientLevelPrediction::createLearningCurve(
    +  plpData = plpData,
    +  outcomeId = 2,
    +  modelSettings = modelSettings
    +)
    +# plot learning curve
    +PatientLevelPrediction::plotLearningCurve(learningCurve)
    +}
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createLogSettings.html b/docs/reference/createLogSettings.html new file mode 100644 index 000000000..05c7e8c73 --- /dev/null +++ b/docs/reference/createLogSettings.html @@ -0,0 +1,194 @@ + +Create the settings for logging the progression of the analysis — createLogSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create the settings for logging the progression of the analysis

    +
    + +
    +
    createLogSettings(verbosity = "DEBUG", timeStamp = T, logName = "runPlp Log")
    +
    + +
    +

    Arguments

    +
    verbosity
    +

    Sets the level of the verbosity. If the log level is at or higher in priority than the logger threshold, a message will print. The levels are:

    • DEBUG Highest verbosity showing all debug statements

    • +
    • TRACE Showing information about start and end of steps

    • +
    • INFO Show informative messages

    • +
    • WARN Show warning messages

    • +
    • ERROR Show error messages

    • +
    • FATAL Be silent except for fatal errors

    • +
    + + +
    timeStamp
    +

    If TRUE a timestamp will be added to each logging statement. Automatically switched on for TRACE level.

    + + +
    logName
    +

    A string reference for the logger

    + +
    +
    +

    Value

    + + +

    An object of class logSettings

    + + +
    +
    +

    Details

    +

    Returns an object of class logSettings that specifies the logger settings
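Examples

A minimal sketch of creating logger settings, assuming the package is installed. The verbosity level is set explicitly to "INFO" here to reduce output relative to the "DEBUG" value shown in the usage.

```r
library(PatientLevelPrediction)

# Log informative messages and above, with a timestamp on each statement
logSettings <- createLogSettings(
  verbosity = "INFO",
  timeStamp = TRUE,
  logName = "runPlp Log"
)

class(logSettings)
```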

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createModelDesign.html b/docs/reference/createModelDesign.html new file mode 100644 index 000000000..f1fdfafec --- /dev/null +++ b/docs/reference/createModelDesign.html @@ -0,0 +1,231 @@ + +Specify settings for deceloping a single model — createModelDesign • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Specify settings for developing a single model

    +
    + +
    +
    createModelDesign(
    +  targetId,
    +  outcomeId,
    +  restrictPlpDataSettings = createRestrictPlpDataSettings(),
    +  populationSettings = createStudyPopulationSettings(),
    +  covariateSettings = FeatureExtraction::createDefaultCovariateSettings(),
    +  featureEngineeringSettings = NULL,
    +  sampleSettings = NULL,
    +  preprocessSettings = NULL,
    +  modelSettings = NULL,
    +  splitSettings = createDefaultSplitSetting(type = "stratified", testFraction = 0.25,
    +    trainFraction = 0.75, splitSeed = 123, nfold = 3),
    +  runCovariateSummary = T
    +)
    +
    + +
    +

    Arguments

    +
    targetId
    +

    The id of the target cohort that will be used for data extraction (e.g., the ATLAS id)

    + + +
    outcomeId
    +

    The id of the outcome that will be used for data extraction (e.g., the ATLAS id)

    + + +
    restrictPlpDataSettings
    +

    The settings specifying the extra restriction settings when extracting the data created using createRestrictPlpDataSettings().

    + + +
    populationSettings
    +

    The population settings specified by createStudyPopulationSettings()

    + + +
    covariateSettings
    +

    The covariate settings, this can be a list or a single 'covariateSetting' object.

    + + +
    featureEngineeringSettings
    +

    Either NULL or an object of class featureEngineeringSettings specifying any feature engineering used during model development

    + + +
    sampleSettings
    +

    Either NULL or an object of class sampleSettings with the over/under sampling settings used for model development

    + + +
    preprocessSettings
    +

    Either NULL or an object of class preprocessSettings created using createPreprocessSettings()

    + + +
    modelSettings
    +

    The model settings such as setLassoLogisticRegression()

    + + +
    splitSettings
    +

    The train/validation/test splitting used by all analyses created using createDefaultSplitSetting()

    + + +
    runCovariateSummary
    +

    Whether to run the covariateSummary

    + +
    +
    +

    Value

    + + +

    A list with analysis settings used to develop a single prediction model

    +
    +
    +

    Details

    +

    This specifies a single analysis for developing a single model
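Examples

The following sketch shows one way a model design might be put together from the settings functions documented in this reference. It assumes the package and its dependencies (including FeatureExtraction for the default covariate settings) are installed; the cohort ids 1 and 2 are placeholders, and the element names `targetId` and `outcomeId` on the returned list are an assumption.

```r
library(PatientLevelPrediction)

# targetId = 1 and outcomeId = 2 are placeholder cohort ids
modelDesign <- createModelDesign(
  targetId = 1,
  outcomeId = 2,
  populationSettings = createStudyPopulationSettings(),
  modelSettings = setLassoLogisticRegression(),
  splitSettings = createDefaultSplitSetting(
    type = "stratified",
    testFraction = 0.25,
    trainFraction = 0.75,
    splitSeed = 123,
    nfold = 3
  ),
  runCovariateSummary = TRUE
)

# The design is a list of analysis settings
names(modelDesign)
```

Several such designs can be collected in a list and passed as the `modelDesignList` argument of `diagnoseMultiplePlp()`.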

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createPlpResultTables.html b/docs/reference/createPlpResultTables.html new file mode 100644 index 000000000..41c22b897 --- /dev/null +++ b/docs/reference/createPlpResultTables.html @@ -0,0 +1,215 @@ + +Create the results tables to store PatientLevelPrediction models and results into a database — createPlpResultTables • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function executes a large set of SQL statements to create tables that can store models and results

    +
    + +
    +
    createPlpResultTables(
    +  connectionDetails,
    +  targetDialect = "postgresql",
    +  resultSchema,
    +  deleteTables = T,
    +  createTables = T,
    +  tablePrefix = "",
    +  tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
    +  testFile = NULL
    +)
    +
    + +
    +

    Arguments

    +
    connectionDetails
    +

    The database connection details

    + + +
    targetDialect
    +

    The database management system being used

    + + +
    resultSchema
    +

    The name of the database schema in which the result tables will be created.

    + + +
    deleteTables
    +

    If true any existing tables matching the PatientLevelPrediction result tables names will be deleted

    + + +
    createTables
    +

    If true the PatientLevelPrediction result tables will be created

    + + +
    tablePrefix
    +

    A string that is added as a prefix to the PatientLevelPrediction result table names

    + + +
    tempEmulationSchema
    +

    The temp schema used when the database management system is oracle

    + + +
    testFile
    +

    (used for testing) The location of an sql file with the table creation code

    + +
    +
    +

    Value

    + + +

    Returns NULL but creates the required tables in the specified database schema(s).

    +
    +
    +

    Details

    +

    This function can be used to create (or delete) PatientLevelPrediction result tables

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createPreprocessSettings.html b/docs/reference/createPreprocessSettings.html new file mode 100644 index 000000000..74d078c2b --- /dev/null +++ b/docs/reference/createPreprocessSettings.html @@ -0,0 +1,192 @@ + +Create the settings for preprocessing the trainData. — createPreprocessSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create the settings for preprocessing the trainData.

    +
    + +
    +
    createPreprocessSettings(
    +  minFraction = 0.001,
    +  normalize = TRUE,
    +  removeRedundancy = TRUE
    +)
    +
    + +
    +

    Arguments

    +
    minFraction
    +

    The minimum fraction of target population who must have a covariate for it to be included in the model training

    + + +
    normalize
    +

    Whether to normalise the covariates before training (Default: TRUE)

    + + +
    removeRedundancy
    +

    Whether to remove redundant features (Default: TRUE)

    + +
    +
    +

    Value

    + + +

    An object of class preprocessingSettings

    + + +
    +
    +

    Details

    +

    Returns an object of class preprocessingSettings that specifies how to preprocess the training data
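Examples

A hedged sketch of creating preprocessing settings, assuming the package is installed. The threshold of 0.01 is an illustrative value meaning that a covariate must be present in at least 1% of the target population to be kept; the element name `minFraction` on the returned list is an assumption.

```r
library(PatientLevelPrediction)

# Keep covariates observed in at least 1% of the target population,
# normalise them and drop redundant features.
preprocessSettings <- createPreprocessSettings(
  minFraction = 0.01,
  normalize = TRUE,
  removeRedundancy = TRUE
)

preprocessSettings$minFraction
```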

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createRandomForestFeatureSelection.html b/docs/reference/createRandomForestFeatureSelection.html new file mode 100644 index 000000000..3b2803c55 --- /dev/null +++ b/docs/reference/createRandomForestFeatureSelection.html @@ -0,0 +1,184 @@ + +Create the settings for random foreat based feature selection — createRandomForestFeatureSelection • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create the settings for random forest based feature selection

    +
    + +
    +
    createRandomForestFeatureSelection(ntrees = 2000, maxDepth = 17)
    +
    + +
    +

    Arguments

    +
    ntrees
    +

    Number of trees in the forest

    + + +
    maxDepth
    +

    Maximum depth of each tree

    + +
    +
    +

    Value

    + + +

    An object of class featureEngineeringSettings

    + + +
    +
    +

    Details

    +

    Returns an object of class featureEngineeringSettings that specifies the feature selection function that will be called and the settings

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createRestrictPlpDataSettings.html b/docs/reference/createRestrictPlpDataSettings.html new file mode 100644 index 000000000..923489204 --- /dev/null +++ b/docs/reference/createRestrictPlpDataSettings.html @@ -0,0 +1,209 @@ + +createRestrictPlpDataSettings define extra restriction settings when calling getPlpData — createRestrictPlpDataSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function creates the settings used to restrict the target cohort when calling getPlpData

    +
    + +
    +
    createRestrictPlpDataSettings(
    +  studyStartDate = "",
    +  studyEndDate = "",
    +  firstExposureOnly = F,
    +  washoutPeriod = 0,
    +  sampleSize = NULL
    +)
    +
    + +
    +

    Arguments

    +
    studyStartDate
    +

    A calendar date specifying the minimum date that a cohort index +date can appear. Date format is 'yyyymmdd'.

    + + +
    studyEndDate
    +

    A calendar date specifying the maximum date that a cohort index +date can appear. Date format is 'yyyymmdd'. Important: the study +end date is also used to truncate risk windows, meaning no outcomes +beyond the study end date will be considered.

    + + +
    firstExposureOnly
    +

    Should only the first exposure per subject be included? Note that +this is typically done in the createStudyPopulation function, +but can already be done here for efficiency reasons.

    + + +
    washoutPeriod
    +

    The minimum required continuous observation time prior to index +date for a person to be included in the at risk cohort. Note that +this is typically done in the createStudyPopulation function, +but can already be done here for efficiency reasons.

    + + +
    sampleSize
    +

    If not NULL, the number of people to sample from the target cohort

    + +
    +
    +

    Value

    + + +

    A setting object of class restrictPlpDataSettings containing a list of extra getPlpData settings

    +
    +
    +

    Details

    +

    Users need to specify the extra restrictions to apply when downloading the target cohort
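Examples

The sketch below restricts the target cohort to a calendar window and a random sample, assuming the package is installed. The dates, washout period and sample size are illustrative values only.

```r
library(PatientLevelPrediction)

# Index dates between 2015-01-01 and 2019-12-31 ('yyyymmdd' format),
# first exposure only, 365 days of prior observation,
# and a random sample of 10000 people from the target cohort.
restrictPlpDataSettings <- createRestrictPlpDataSettings(
  studyStartDate = "20150101",
  studyEndDate = "20191231",
  firstExposureOnly = TRUE,
  washoutPeriod = 365,
  sampleSize = 10000
)

class(restrictPlpDataSettings)
```

Note that, as described for `studyEndDate`, outcomes after 2019-12-31 would not be considered with these settings.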

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createSampleSettings.html b/docs/reference/createSampleSettings.html new file mode 100644 index 000000000..f2723ab9e --- /dev/null +++ b/docs/reference/createSampleSettings.html @@ -0,0 +1,200 @@ + +Create the settings for defining how the trainData from splitData are sampled using +default sample functions. — createSampleSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create the settings for defining how the trainData from splitData are sampled using +default sample functions.

    +
    + +
    +
    createSampleSettings(
    +  type = "none",
    +  numberOutcomestoNonOutcomes = 1,
    +  sampleSeed = sample(10000, 1)
    +)
    +
    + +
    +

    Arguments

    +
    type
    +

    (character) Choice of:

    • 'none' No sampling is applied - this is the default

    • +
    • 'underSample' Undersample the non-outcome class to make the data more balanced

    • +
    • 'overSample' Oversample the outcome class by adding in each outcome multiple times

    • +
    + + +
    numberOutcomestoNonOutcomes
    +

    (numeric) A number specifying the required number of non-outcomes per outcome

    + + +
    sampleSeed
    +

    (numeric) A seed to use when splitting the data for reproducibility (if not set a random number will be generated)

    + +
    +
    +

    Value

    + + +

    An object of class sampleSettings

    + + +
    +
    +

    Details

    +

    Returns an object of class sampleSettings that specifies the sampling function that will be called and the settings
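Examples

As an illustration, the sketch below requests undersampling so that the training data contain two non-outcomes per outcome. It assumes the package is installed; the ratio and the seed are arbitrary example values.

```r
library(PatientLevelPrediction)

# Undersample the non-outcome class to 2 non-outcomes per outcome.
# sampleSeed = 42 is an arbitrary value chosen for reproducibility.
sampleSettings <- createSampleSettings(
  type = "underSample",
  numberOutcomestoNonOutcomes = 2,
  sampleSeed = 42
)

class(sampleSettings)
```

For example, with 500 outcomes in the training data, this ratio would correspond to about 1000 sampled non-outcomes.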

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createSplineSettings.html b/docs/reference/createSplineSettings.html new file mode 100644 index 000000000..9b04d1a1b --- /dev/null +++ b/docs/reference/createSplineSettings.html @@ -0,0 +1,188 @@ + +Create the settings for adding a spline for continuous variables — createSplineSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create the settings for adding a spline for continuous variables

    +
    + +
    +
    createSplineSettings(continousCovariateId, knots, analysisId = 683)
    +
    + +
    +

    Arguments

    +
    continousCovariateId
    +

    The covariateId to apply splines to

    + + +
    knots
    +

    Either the number of knots or a vector of split values

    + + +
    analysisId
    +

    The analysisId to use for the spline covariates

    + +
    +
    +

    Value

    + + +

    An object of class featureEngineeringSettings

    + + +
    +
    +

    Details

    +

    Returns an object of class featureEngineeringSettings that specifies the spline function that will be called and the settings
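Examples

A minimal sketch of adding a spline for a continuous covariate, assuming the package is installed. The covariate id 1002 is an assumption (it is commonly the id of age in years in FeatureExtraction default covariates) and should be replaced with the id of the continuous covariate in your own data.

```r
library(PatientLevelPrediction)

# Spline with 5 knots on a continuous covariate.
# 1002 is an assumed covariate id used for illustration only.
splineSettings <- createSplineSettings(
  continousCovariateId = 1002,
  knots = 5
)

class(splineSettings)
```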

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createStratifiedImputationSettings.html b/docs/reference/createStratifiedImputationSettings.html new file mode 100644 index 000000000..753176d9b --- /dev/null +++ b/docs/reference/createStratifiedImputationSettings.html @@ -0,0 +1,184 @@ + +Create the settings for adding a spline for continuous variables — createStratifiedImputationSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create the settings for stratified imputation of missing covariate values

    +
    + +
    +
    createStratifiedImputationSettings(covariateId, ageSplits = NULL)
    +
    + +
    +

    Arguments

    +
    covariateId
    +

    The covariateId that needs imputed values

    + + +
    ageSplits
    +

    A vector of age splits in years to create age groups

    + +
    +
    +

    Value

    + + +

    An object of class featureEngineeringSettings

    + + +
    +
    +

    Details

    +

    Returns an object of class featureEngineeringSettings that specifies how to do stratified imputation
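Examples

The sketch below requests imputation stratified by age group, assuming the package is installed. The covariate id 123456 is a hypothetical placeholder for a covariate with missing values, and the age cut points are illustrative.

```r
library(PatientLevelPrediction)

# Impute a covariate within age groups split at 30, 50 and 70 years.
# 123456 is a hypothetical covariate id used for illustration only.
imputationSettings <- createStratifiedImputationSettings(
  covariateId = 123456,
  ageSplits = c(30, 50, 70)
)

class(imputationSettings)
```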

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createStudyPopulation.html b/docs/reference/createStudyPopulation.html new file mode 100644 index 000000000..70fb48349 --- /dev/null +++ b/docs/reference/createStudyPopulation.html @@ -0,0 +1,216 @@ + +Create a study population — createStudyPopulation • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create a study population

    +
    + +
    +
    createStudyPopulation(
    +  plpData,
    +  outcomeId,
    +  populationSettings,
    +  population = NULL
    +)
    +
    + +
    +

    Arguments

    +
    plpData
    +

    An object of type plpData as generated using +getPlpData.

    + + +
    outcomeId
    +

    The ID of the outcome.

    + + +
    populationSettings
    +

    An object of class populationSettings created using createStudyPopulationSettings

    + + +
    population
    +

    If specified, this population will be used as the starting point instead of the +cohorts in the plpData object.

    + +
    +
    +

    Value

    + + +

    A data frame specifying the study population. This data frame will have the following columns:

    rowId
    +

    A unique identifier for an exposure

    + +
    subjectId
    +

    The person ID of the subject

    + +
    cohortStartdate
    +

    The index date

    + +
    outcomeCount
    +

    The number of outcomes observed during the risk window

    + +
    timeAtRisk
    +

    The number of days in the risk window

    + +
    survivalTime
    +

    The number of days until either the outcome or the end of the risk window

    + + +
    +
    +

    Details

    +

    Create a study population by enforcing certain inclusion and exclusion criteria, defining +a risk window, and determining which outcomes fall inside the risk window.

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createStudyPopulationSettings.html b/docs/reference/createStudyPopulationSettings.html new file mode 100644 index 000000000..e40884cc2 --- /dev/null +++ b/docs/reference/createStudyPopulationSettings.html @@ -0,0 +1,246 @@ + +create the study population settings — createStudyPopulationSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    create the study population settings

    +
    + +
    +
    createStudyPopulationSettings(
    +  binary = T,
    +  includeAllOutcomes = T,
    +  firstExposureOnly = FALSE,
    +  washoutPeriod = 0,
    +  removeSubjectsWithPriorOutcome = TRUE,
    +  priorOutcomeLookback = 99999,
    +  requireTimeAtRisk = T,
    +  minTimeAtRisk = 364,
    +  riskWindowStart = 1,
    +  startAnchor = "cohort start",
    +  riskWindowEnd = 365,
    +  endAnchor = "cohort start",
    +  restrictTarToCohortEnd = F
    +)
    +
    + +
    +

    Arguments

    +
    binary
    +

    Forces the outcomeCount to be 0 or 1 (use for binary prediction problems)

    + + +
    includeAllOutcomes
    +

    (binary) indicating whether to include people with outcomes who are not observed for the whole at risk period

    + + +
    firstExposureOnly
    +

    Should only the first exposure per subject be included? Note that +this is typically done in the createStudyPopulation function.

    + + +
    washoutPeriod
    +

    The minimum required continuous observation time prior to index +date for a person to be included in the cohort.

    + + +
    removeSubjectsWithPriorOutcome
    +

    Remove subjects that have the outcome prior to the risk window start?

    + + +
    priorOutcomeLookback
    +

    How many days should we look back when identifying prior outcomes?

    + + +
    requireTimeAtRisk
    +

    Should subjects without time at risk be removed?

    + + +
    minTimeAtRisk
    +

    The minimum number of days at risk required to be included

    + + +
    riskWindowStart
    +

    The start of the risk window (in days) relative to the index date (+ +days of exposure if the addExposureDaysToStart parameter is +specified).

    + + +
    startAnchor
    +

    The anchor point for the start of the risk window. Can be "cohort start" or "cohort end".

    + + +
    riskWindowEnd
    +

    The end of the risk window (in days) relative to the index date (+ +days of exposure if the addExposureDaysToEnd parameter is +specified).

    + + +
    endAnchor
    +

    The anchor point for the end of the risk window. Can be "cohort start" or "cohort end".

    + + +
    restrictTarToCohortEnd
    +

    If using a survival model and you want the time-at-risk to end at the cohort end date set this to T

    + +
    +
    +

    Value

    + + +

    A list containing all the settings required for creating the study population

    +
    +
    +

    Details

    +

    Takes as input the inputs to create study population
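Examples

A hedged sketch of settings for a one-year risk window, assuming the package is installed. With `riskWindowStart = 1` and `riskWindowEnd = 365`, both anchored on cohort start, the window spans days 1 to 365 after index, which is why `minTimeAtRisk` is set to 364 days. The washout period of 365 days is an illustrative choice, and the element name `riskWindowEnd` on the returned list is an assumption.

```r
library(PatientLevelPrediction)

# Binary outcome within 1 to 365 days after cohort start,
# requiring 365 days of prior observation and a full time at risk.
populationSettings <- createStudyPopulationSettings(
  binary = TRUE,
  firstExposureOnly = TRUE,
  washoutPeriod = 365,
  removeSubjectsWithPriorOutcome = TRUE,
  priorOutcomeLookback = 99999,
  requireTimeAtRisk = TRUE,
  minTimeAtRisk = 364,
  riskWindowStart = 1,
  startAnchor = "cohort start",
  riskWindowEnd = 365,
  endAnchor = "cohort start"
)

populationSettings$riskWindowEnd
```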

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createTempModelLoc.html b/docs/reference/createTempModelLoc.html new file mode 100644 index 000000000..e91a74cf3 --- /dev/null +++ b/docs/reference/createTempModelLoc.html @@ -0,0 +1,162 @@ + +Create a temporary model location — createTempModelLoc • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create a temporary model location

    +
    + +
    +
    createTempModelLoc()
    +
    + + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createUnivariateFeatureSelection.html b/docs/reference/createUnivariateFeatureSelection.html new file mode 100644 index 000000000..4f4c8fb19 --- /dev/null +++ b/docs/reference/createUnivariateFeatureSelection.html @@ -0,0 +1,180 @@ + +Create the settings for defining any feature selection that will be done — createUnivariateFeatureSelection • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create the settings for defining any feature selection that will be done

    +
    + +
    +
    createUnivariateFeatureSelection(k = 100)
    +
    + +
    +

    Arguments

    +
    k
    +

    This function returns the K features most associated (univariately) to the outcome

    + +
    +
    +

    Value

    + + +

    An object of class featureEngineeringSettings

    + + +
    +
    +

    Details

    +

    Returns an object of class featureEngineeringSettings that specifies the feature selection function that will be called and the settings

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createValidationDesign.html b/docs/reference/createValidationDesign.html new file mode 100644 index 000000000..f3601d2c9 --- /dev/null +++ b/docs/reference/createValidationDesign.html @@ -0,0 +1,200 @@ + +createValidationDesign - Define the validation design for external validation — createValidationDesign • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    createValidationDesign - Define the validation design for external validation

    +
    + +
    +
    createValidationDesign(
    +  targetId,
    +  outcomeId,
    +  populationSettings,
    +  restrictPlpDataSettings,
    +  plpModelList,
    +  recalibrate = NULL,
    +  runCovariateSummary = TRUE
    +)
    +
    + +
    +

    Arguments

    +
    targetId
    +

    The targetId of the target cohort to validate on

    + + +
    outcomeId
    +

    The outcomeId of the outcome cohort to validate on

    + + +
    populationSettings
    +

    A list of population restriction settings created by createStudyPopulationSettings

    + + +
    restrictPlpDataSettings
    +

    A list of plpData restriction settings created by createRestrictPlpDataSettings

    + + +
    plpModelList
    +

    A list of plpModels objects created by runPlp or a path to such objects

    + + +
    recalibrate
    +

    A vector of characters specifying the recalibration method to apply.

    + + +
    runCovariateSummary
    +

    whether to run the covariate summary for the validation data

    + +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/createValidationSettings.html b/docs/reference/createValidationSettings.html new file mode 100644 index 000000000..c21539127 --- /dev/null +++ b/docs/reference/createValidationSettings.html @@ -0,0 +1,182 @@ + +createValidationSettings define optional settings for performing external validation — createValidationSettings • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function creates the settings required by externalValidatePlp

    +
    + +
    +
    createValidationSettings(recalibrate = NULL, runCovariateSummary = T)
    +
    + +
    +

    Arguments

    +
    recalibrate
    +

    A vector of characters specifying the recalibration method to apply

    + + +
    runCovariateSummary
    +

    Whether to run the covariate summary for the validation data

    + +
    +
    +

    Value

    + + +

    A setting object of class validationSettings containing a list of settings for externalValidatePlp

    +
    +
    +

    Details

    +

    Users need to specify whether they want to sample or recalibrate when performing external validation
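Examples

A minimal sketch of validation settings that request recalibration, assuming the package is installed. The method name "weakRecalibration" is an assumption about the accepted values of `recalibrate`; check the package source for the supported methods before relying on it.

```r
library(PatientLevelPrediction)

# Request recalibration and a covariate summary on the validation data.
# "weakRecalibration" is an assumed method name used for illustration.
validationSettings <- createValidationSettings(
  recalibrate = "weakRecalibration",
  runCovariateSummary = TRUE
)

class(validationSettings)
```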

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/diagnoseMultiplePlp.html b/docs/reference/diagnoseMultiplePlp.html new file mode 100644 index 000000000..5c24f5c05 --- /dev/null +++ b/docs/reference/diagnoseMultiplePlp.html @@ -0,0 +1,203 @@ + +Run a list of predictions diagnoses — diagnoseMultiplePlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Run a list of prediction diagnoses

    +
    + +
    +
    diagnoseMultiplePlp(
    +  databaseDetails = createDatabaseDetails(),
    +  modelDesignList = list(createModelDesign(targetId = 1, outcomeId = 2, modelSettings =
    +    setLassoLogisticRegression()), createModelDesign(targetId = 1, outcomeId = 3,
    +    modelSettings = setLassoLogisticRegression())),
    +  cohortDefinitions = NULL,
    +  logSettings = createLogSettings(verbosity = "DEBUG", timeStamp = T, logName =
    +    "diagnosePlp Log"),
    +  saveDirectory = getwd()
    +)
    +
    + +
    +

    Arguments

    +
    databaseDetails
    +

    The database settings created using createDatabaseDetails()

    + + +
    modelDesignList
    +

    A list of model designs created using createModelDesign()

    + + +
    cohortDefinitions
    +

    A list of cohort definitions for the target and outcome cohorts

    + + +
    logSettings
    +

    The setting specifying the logging for the analyses created using createLogSettings()

    + + +
    saveDirectory
    +

    Name of the folder where all the outputs will be written to.

    + +
    +
    +

    Value

    + + +

    A data frame with the following columns:

    analysisId: The unique identifier +for a set of analysis choices.
    targetId: The ID of the target cohort population.
    outcomeId: The ID of the outcome cohort.
    dataLocation: The location where the plpData was saved.
    the settings ids: The ids for all other settings used for model development.
    +
    +

    Details

    +

    This function will run all specified prediction design diagnoses as defined using .

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/diagnosePlp.html b/docs/reference/diagnosePlp.html new file mode 100644 index 000000000..f4282a426 --- /dev/null +++ b/docs/reference/diagnosePlp.html @@ -0,0 +1,267 @@ + +diagnostic - Investigates the prediction problem settings - use before training a model — diagnosePlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function runs a set of prediction diagnoses to help pick a suitable T, O, TAR and determine +whether the prediction problem is worth executing.

    +
    + +
    +
    diagnosePlp(
    +  plpData = NULL,
    +  outcomeId,
    +  analysisId,
    +  populationSettings,
    +  splitSettings = createDefaultSplitSetting(),
    +  sampleSettings = createSampleSettings(),
    +  saveDirectory = NULL,
    +  featureEngineeringSettings = createFeatureEngineeringSettings(),
    +  modelSettings = setLassoLogisticRegression(),
    +  logSettings = createLogSettings(verbosity = "DEBUG", timeStamp = T, logName =
    +    "diagnosePlp Log"),
    +  preprocessSettings = createPreprocessSettings()
    +)
    +
    + +
    +

    Arguments

    +
    plpData
    +

    An object of type plpData - the patient level prediction +data extracted from the CDM. Can also include an initial population as +plpData$population.

    + + +
    outcomeId
    +

    (integer) The ID of the outcome.

    + + +
    analysisId
    +

    (integer) Identifier for the analysis. It is used to create, e.g., the result folder. Default is a timestamp.

    + + +
    populationSettings
    +

    An object of type populationSettings created using createStudyPopulationSettings that +specifies how the data class labels are defined and, in addition, any exclusions to apply to the +plpData cohort

    + + +
    splitSettings
    +

    An object of type splitSettings that specifies how to split the data into train/validation/test. +The default settings can be created using createDefaultSplitSetting.

    + + +
    sampleSettings
    +

    An object of type sampleSettings that specifies any under/over sampling to be done. +The default is none.

    + + +
    saveDirectory
    +

    The path to the directory where the results will be saved (if NULL uses working directory)

    + + +
    featureEngineeringSettings
    +

    An object of featureEngineeringSettings specifying any feature engineering to be learned (using the train data)

    + + +
    modelSettings
    +

    An object of class modelSettings created using one of the functions:

    • setLassoLogisticRegression() A lasso logistic regression model

    • +
    • setGradientBoostingMachine() A gradient boosting machine

    • +
    • setAdaBoost() An ada boost model

    • +
    • setRandomForest() A random forest model

    • +
    • setDecisionTree() A decision tree model

    • +
    • setKNN() A KNN model

    • +
    + + +
    logSettings
    +

    An object of logSettings created using createLogSettings +specifying how the logging is done

    + + +
    preprocessSettings
    +

    An object of preprocessSettings. This setting specifies the minimum fraction of +target population who must have a covariate for it to be included in the model training +and whether to normalise the covariates before training

    + +
    +
    +

    Value

    + + +

    An object containing the model or location where the model is saved, the data selection settings, the preprocessing +and training settings as well as various performance measures obtained by the model.

    +
    distribution
    +

    list for each O of a data.frame containing: i) Time to observation end distribution, ii) Time from observation start distribution, iii) Time to event distribution and iv) Time from last prior event to index distribution (only for patients in T who have O before index)

    + +
    incident
    +

    list for each O of incidence of O in T during TAR

    + +
    characterization
    +

    list for each O of Characterization of T, TnO, Tn~O

    + +
    +
    +

    Details

    +

    Users can define a set of Ts, Os, databases and population settings. The function returns a list of data.frames containing details such as +follow-up time distribution, time-to-event information, characterization details, time from last prior event and +observation time distribution.

    +
    + +
    +

    Examples

    +
    if (FALSE) {
    +#******** EXAMPLE 1 ********* 
    +} 
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/diagnosticOddsRatio.html b/docs/reference/diagnosticOddsRatio.html new file mode 100644 index 000000000..806f33f02 --- /dev/null +++ b/docs/reference/diagnosticOddsRatio.html @@ -0,0 +1,190 @@ + +Calculate the diagnostic odds ratio — diagnosticOddsRatio • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the diagnostic odds ratio

    +
    + +
    +
    diagnosticOddsRatio(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    diagnosticOddsRatio value

    +
    +
    +

    Details

    +

    Calculate the diagnostic odds ratio
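As a minimal sketch of what this metric computes, the block below uses the standard textbook definition of the diagnostic odds ratio, (TP / FN) / (FP / TN). The helper name `dorSketch` and the example counts are hypothetical, and the package's own implementation may handle zero cells differently.

```r
# Hypothetical helper illustrating the textbook definition;
# this is not the package's source code.
dorSketch <- function(TP, TN, FN, FP) {
  # Odds of a positive test among cases divided by the odds among non-cases,
  # written as a single ratio to avoid intermediate rounding.
  (TP * TN) / (FP * FN)
}

dor <- dorSketch(TP = 40, TN = 50, FN = 10, FP = 5)
print(dor)
```

A value above 1 means a positive prediction is more likely among patients with the outcome than among those without it.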

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/evaluatePlp.html b/docs/reference/evaluatePlp.html new file mode 100644 index 000000000..7ad688156 --- /dev/null +++ b/docs/reference/evaluatePlp.html @@ -0,0 +1,183 @@ + +evaluatePlp — evaluatePlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Evaluates the performance of the patient level prediction model

    +
    + +
    +
    evaluatePlp(prediction, typeColumn = "evaluationType")
    +
    + +
    +

    Arguments

    +
    prediction
    +

    The patient level prediction model's prediction

    + + +
    typeColumn
    +

    The column name in the prediction object that is used to +stratify the evaluation

    + +
    +
    +

    Value

    + + +

    A list containing the performance values

    +
    +
    +

    Details

    +

    The function calculates various metrics to measure the performance of the model

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/externalValidateDbPlp.html b/docs/reference/externalValidateDbPlp.html new file mode 100644 index 000000000..79f46b721 --- /dev/null +++ b/docs/reference/externalValidateDbPlp.html @@ -0,0 +1,207 @@ + +externalValidateDbPlp - Validate a model on new databases — externalValidateDbPlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function extracts data using a user specified connection and cdm_schema, applies the model and then calculates the performance

    +
    + +
    +
    externalValidateDbPlp(
    +  plpModel,
    +  validationDatabaseDetails = createDatabaseDetails(),
    +  validationRestrictPlpDataSettings = createRestrictPlpDataSettings(),
    +  settings = createValidationSettings(recalibrate = "weakRecalibration"),
    +  logSettings = createLogSettings(verbosity = "INFO", logName = "validatePLP"),
    +  outputFolder = getwd()
    +)
    +
    + +
    +

    Arguments

    +
    plpModel
    +

    The model object returned by runPlp() containing the trained model

    + + +
    validationDatabaseDetails
    +

    A list of objects of class databaseDetails created using createDatabaseDetails

    + + +
    validationRestrictPlpDataSettings
    +

    A list of population restriction settings created by createRestrictPlpDataSettings()

    + + +
    settings
    +

    A settings object of class validationSettings created using createValidationSettings

    + + +
    logSettings
    +

    An object of logSettings created using createLogSettings +specifying how the logging is done

    + + +
    outputFolder
    +

    The directory to save the validation results to (subfolders are created per database in validationDatabaseDetails)

    + +
    +
    +

    Value

    + + +

    A list containing the performance for each validation_schema

    +
    +
    +

    Details

    +

    Users need to input a trained model (the output of runPlp()) and new database connections. The function will return a list of length equal to the +number of cdm_schemas input with the performance on the new data

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/extractDatabaseToCsv.html b/docs/reference/extractDatabaseToCsv.html new file mode 100644 index 000000000..8e90243ea --- /dev/null +++ b/docs/reference/extractDatabaseToCsv.html @@ -0,0 +1,204 @@ + +Exports all the results from a database into csv files — extractDatabaseToCsv • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Exports all the results from a database into csv files

    +
    + +
    +
    extractDatabaseToCsv(
    +  conn = NULL,
    +  connectionDetails,
    +  databaseSchemaSettings = createDatabaseSchemaSettings(resultSchema = "main"),
    +  csvFolder,
    +  minCellCount = 5,
    +  sensitiveColumns = getPlpSensitiveColumns(),
    +  fileAppend = NULL
    +)
    +
    + +
    +

    Arguments

    +
    conn
    +

    The connection to the database with the results

    + + +
    connectionDetails
    +

    The connectionDetails for the result database

    + + +
    databaseSchemaSettings
    +

    The result database schema settings

    + + +
    csvFolder
    +

    Location to save the csv files

    + + +
    minCellCount
    +

    The min value to show in cells that are sensitive (values less than this value will be replaced with -1)

    + + +
    sensitiveColumns
    +

    A named list (name of table columns belong to) with a list of columns to apply the minCellCount to.

    + + +
    fileAppend
    +

    If set to a string this will be appended to the start of the csv file names

    + +
    +
    +

    Details

    +

    Extracts the results from a database into a set of csv files
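The minCellCount rule described above (values below the minimum are replaced with -1) can be sketched in base R as follows. The helper name `censorSmallCounts` and the example counts are hypothetical; how the package treats zeros or missing values is not stated here, so this sketch simply leaves NA values untouched.

```r
# Hypothetical helper, assuming the rule "replace values below minCellCount with -1".
censorSmallCounts <- function(x, minCellCount = 5) {
  small <- !is.na(x) & x < minCellCount
  x[small] <- -1
  x
}

counts <- c(3, 5, 12, NA)
censored <- censorSmallCounts(counts, minCellCount = 5)
print(censored)
```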

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/f1Score.html b/docs/reference/f1Score.html new file mode 100644 index 000000000..0a2fb3974 --- /dev/null +++ b/docs/reference/f1Score.html @@ -0,0 +1,190 @@ + +Calculate the f1Score — f1Score • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the f1Score

    +
    + +
    +
    f1Score(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    f1Score value

    +
    +
    +

    Details

    +

    Calculate the f1Score
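For orientation, the F1 score is conventionally the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The sketch below assumes that definition; `f1Sketch` and the counts are hypothetical and are not taken from the package source. TN is accepted only to mirror the argument list shown above.

```r
# Hypothetical helper using the conventional definition of F1.
f1Sketch <- function(TP, TN, FN, FP) {
  precision <- TP / (TP + FP)
  recall <- TP / (TP + FN)
  2 * precision * recall / (precision + recall)
}

f1 <- f1Sketch(TP = 40, TN = 50, FN = 10, FP = 5)
print(f1)
```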

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/falseDiscoveryRate.html b/docs/reference/falseDiscoveryRate.html new file mode 100644 index 000000000..73f4dc46b --- /dev/null +++ b/docs/reference/falseDiscoveryRate.html @@ -0,0 +1,190 @@ + +Calculate the falseDiscoveryRate — falseDiscoveryRate • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the falseDiscoveryRate

    +
    + +
    +
    falseDiscoveryRate(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    falseDiscoveryRate value

    +
    +
    +

    Details

    +

    Calculate the falseDiscoveryRate
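A short sketch, assuming the usual definition of the false discovery rate as the share of positive predictions that are wrong, FP / (FP + TP). The name `fdrSketch` and the counts are hypothetical.

```r
# Hypothetical helper; TN and FN are unused but kept to mirror the usage above.
fdrSketch <- function(TP, TN, FN, FP) {
  FP / (FP + TP)
}

fdr <- fdrSketch(TP = 40, TN = 50, FN = 10, FP = 5)
print(fdr)
```

Under this definition the false discovery rate equals one minus the positive predictive value.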

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/falseNegativeRate.html b/docs/reference/falseNegativeRate.html new file mode 100644 index 000000000..dc984b78f --- /dev/null +++ b/docs/reference/falseNegativeRate.html @@ -0,0 +1,190 @@ + +Calculate the falseNegativeRate — falseNegativeRate • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the falseNegativeRate

    +
    + +
    +
    falseNegativeRate(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    falseNegativeRate value

    +
    +
    +

    Details

    +

    Calculate the falseNegativeRate
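A short sketch, assuming the usual definition of the false negative rate as the share of true outcome cases that the model misses, FN / (FN + TP). The name `fnrSketch` and the counts are hypothetical.

```r
# Hypothetical helper; TN and FP are unused but kept to mirror the usage above.
fnrSketch <- function(TP, TN, FN, FP) {
  FN / (FN + TP)
}

fnr <- fnrSketch(TP = 40, TN = 50, FN = 10, FP = 5)
print(fnr)
```

Under this definition the false negative rate equals one minus the sensitivity.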

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/falseOmissionRate.html b/docs/reference/falseOmissionRate.html new file mode 100644 index 000000000..7cf484107 --- /dev/null +++ b/docs/reference/falseOmissionRate.html @@ -0,0 +1,190 @@ + +Calculate the falseOmissionRate — falseOmissionRate • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the falseOmissionRate

    +
    + +
    +
    falseOmissionRate(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    falseOmissionRate value

    +
    +
    +

    Details

    +

    Calculate the falseOmissionRate
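A short sketch, assuming the usual definition of the false omission rate as the share of negative predictions that are wrong, FN / (FN + TN). The name `forSketch` and the counts are hypothetical.

```r
# Hypothetical helper; TP and FP are unused but kept to mirror the usage above.
forSketch <- function(TP, TN, FN, FP) {
  FN / (FN + TN)
}

falseOmission <- forSketch(TP = 40, TN = 50, FN = 10, FP = 5)
print(falseOmission)
```

Under this definition the false omission rate equals one minus the negative predictive value.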

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/falsePositiveRate.html b/docs/reference/falsePositiveRate.html new file mode 100644 index 000000000..0f378c3c8 --- /dev/null +++ b/docs/reference/falsePositiveRate.html @@ -0,0 +1,190 @@ + +Calculate the falsePositiveRate — falsePositiveRate • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the falsePositiveRate

    +
    + +
    +
    falsePositiveRate(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    falsePositiveRate value

    +
    +
    +

    Details

    +

    Calculate the falsePositiveRate
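A short sketch, assuming the usual definition of the false positive rate as the share of patients without the outcome who are predicted positive, FP / (FP + TN). The name `fprSketch` and the counts are hypothetical.

```r
# Hypothetical helper; TP and FN are unused but kept to mirror the usage above.
fprSketch <- function(TP, TN, FN, FP) {
  FP / (FP + TN)
}

fpr <- fprSketch(TP = 40, TN = 50, FN = 10, FP = 5)
print(fpr)
```

Under this definition the false positive rate equals one minus the specificity.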

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/fitPlp.html b/docs/reference/fitPlp.html new file mode 100644 index 000000000..6a394ecec --- /dev/null +++ b/docs/reference/fitPlp.html @@ -0,0 +1,218 @@ + +fitPlp — fitPlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Train various models using a default parameter grid search or user specified parameters

    +
    + +
    +
    fitPlp(trainData, modelSettings, search = "grid", analysisId, analysisPath)
    +
    + +
    +

    Arguments

    +
    trainData
    +

    An object of type TrainData created using splitData +data extracted from the CDM.

    + + +
    modelSettings
    +

    An object of class modelSettings created using one of the functions:

    • setLassoLogisticRegression() A lasso logistic regression model

    • +
    • setGradientBoostingMachine() A gradient boosting machine

    • +
    • setRandomForest() A random forest model

    • +
    • setKNN() A KNN model

    • +
    + + +
    search
    +

    The search strategy for the hyper-parameter selection (currently not used)

    + + +
    analysisId
    +

    The id of the analysis

    + + +
    analysisPath
    +

    The path of the analysis

    + +
    +
    +

    Value

    + + +

    An object of class plpModel containing:

    +
    model
    +

    The trained prediction model

    + +
    preprocessing
    +

    The preprocessing required when applying the model

    + +
    prediction
    +

    The cohort data.frame with the predicted risk column added

    + +
    modelDesign
    +

    A list specifying the modelDesign settings used to fit the model

    + +
    trainDetails
    +

    The model meta data

    + +
    covariateImportance
    +

    The covariate importance for the model

    + +
    +
    +

    Details

    +

    The user can define the machine learning model to train (regularised logistic regression, random forest, +gradient boosting machine, neural network and )

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/getCalibrationSummary.html b/docs/reference/getCalibrationSummary.html new file mode 100644 index 000000000..f32e764d0 --- /dev/null +++ b/docs/reference/getCalibrationSummary.html @@ -0,0 +1,203 @@ + +Get a sparse summary of the calibration — getCalibrationSummary • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Get a sparse summary of the calibration

    +
    + +
    +
    getCalibrationSummary(
    +  prediction,
    +  predictionType,
    +  typeColumn = "evaluation",
    +  numberOfStrata = 100,
    +  truncateFraction = 0.05
    +)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction object as generated using the +predict functions.

    + + +
    predictionType
    +

    The type of prediction (binary or survival)

    + + +
    typeColumn
    +

    A column that is used to stratify the results

    + + +
    numberOfStrata
    +

    The number of strata in the plot.

    + + +
    truncateFraction
    +

    This fraction of probability values will be ignored when plotting, to +avoid the x-axis scale being dominated by a few outliers.

    + +
    +
    +

    Value

    + + +

    A dataframe with the calibration summary

    +
    +
    +

    Details

    +

    Generates a sparse summary showing the predicted probabilities and the observed fractions. Predictions are +stratified into equally sized bins of predicted probabilities.
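The binning idea described above can be sketched in base R. This is an illustration on simulated data, not the package's implementation: the column names `value` (predicted risk) and `outcomeCount` (observed outcome), the output column names, and the use of 10 strata are all assumptions made for the example.

```r
set.seed(42)

# Simulated prediction object; column names are assumptions for this sketch.
prediction <- data.frame(value = runif(1000))
prediction$outcomeCount <- rbinom(1000, 1, prediction$value)

numberOfStrata <- 10

# Quantile-based breaks give strata with (roughly) equal numbers of patients.
breaks <- quantile(prediction$value,
                   probs = seq(0, 1, length.out = numberOfStrata + 1))
prediction$stratum <- cut(prediction$value, breaks = breaks,
                          include.lowest = TRUE, labels = FALSE)

# Mean predicted risk and observed outcome fraction per stratum.
calibration <- aggregate(cbind(value, outcomeCount) ~ stratum,
                         data = prediction, FUN = mean)
names(calibration) <- c("stratum", "averagePredictedProbability",
                        "observedIncidence")
print(calibration)
```

For a well calibrated model the two summary columns should be close to each other within every stratum.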

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/getCohortCovariateData.html b/docs/reference/getCohortCovariateData.html new file mode 100644 index 000000000..b5dcda0a1 --- /dev/null +++ b/docs/reference/getCohortCovariateData.html @@ -0,0 +1,226 @@ + +Extracts covariates based on cohorts — getCohortCovariateData • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Extracts covariates based on cohorts

    +
    + +
    +
    getCohortCovariateData(
    +  connection,
    +  oracleTempSchema = NULL,
    +  cdmDatabaseSchema,
    +  cdmVersion = "5",
    +  cohortTable = "#cohort_person",
    +  rowIdField = "row_id",
    +  aggregated,
    +  cohortIds,
    +  covariateSettings,
    +  ...
    +)
    +
    + +
    +

    Arguments

    +
    connection
    +

    The database connection

    + + +
    oracleTempSchema
    +

    The temp schema if using oracle

    + + +
    cdmDatabaseSchema
    +

    The schema of the OMOP CDM data

    + + +
    cdmVersion
    +

    version of the OMOP CDM data

    + + +
    cohortTable
    +

    the table name that contains the target population cohort

    + + +
    rowIdField
    +

    string representing the unique identifier in the target population cohort

    + + +
    aggregated
    +

    whether the covariate should be aggregated

    + + +
    cohortIds
    +

    cohort id for the target cohort

    + + +
    covariateSettings
    +

    settings for the covariate cohorts and time periods

    + + +
    ...
    +

    additional arguments from FeatureExtraction

    + +
    +
    +

    Value

    + + +

    The models will now be in the package

    +
    +
    +

    Details

    +

    The user specifies a cohort and time period and then a covariate is constructed indicating whether they are in the +cohort during the time periods relative to target population cohort index

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/getDemographicSummary.html b/docs/reference/getDemographicSummary.html new file mode 100644 index 000000000..a4ffe6cdc --- /dev/null +++ b/docs/reference/getDemographicSummary.html @@ -0,0 +1,186 @@ + +Get a calibration per age/gender groups — getDemographicSummary • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Get a calibration per age/gender groups

    +
    + +
    +
    getDemographicSummary(prediction, predictionType, typeColumn = "evaluation")
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction object

    + + +
    predictionType
    +

    The type of prediction (binary or survival)

    + + +
    typeColumn
    +

    A column that is used to stratify the results

    + +
    +
    +

    Value

    + + +

    A dataframe with the calibration summary

    +
    +
    +

    Details

    +

    Generates a data.frame with the calibration for each 5-year age group and gender group

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/getPlpData.html b/docs/reference/getPlpData.html new file mode 100644 index 000000000..88c691ae6 --- /dev/null +++ b/docs/reference/getPlpData.html @@ -0,0 +1,219 @@ + +Get the patient level prediction data from the server — getPlpData • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function executes a large set of SQL statements against the database in OMOP CDM format to +extract the data needed to perform the analysis.

    +
    + +
    +
    getPlpData(databaseDetails, covariateSettings, restrictPlpDataSettings)
    +
    + +
    +

    Arguments

    +
    databaseDetails
    +

    The cdm database details created using createDatabaseDetails()

    + + +
    covariateSettings
    +

    An object of type covariateSettings as created using the +createCovariateSettings function in the +FeatureExtraction package.

    + + +
    restrictPlpDataSettings
    +

    Extra settings to apply to the target population while extracting data. Created using createRestrictPlpDataSettings().

    + +
    +
    +

    Value

    + + +

    Returns an object of type plpData, containing information on the cohorts, their +outcomes, and baseline covariates. Information about multiple outcomes can be captured at once for +efficiency reasons. This object is a list with the following components:

    outcomes
    +

    A data frame listing the outcomes per person, including the time to event, and +the outcome id. Outcomes are not yet filtered based on risk window, since this is done at +a later stage.

    +
    cohorts
    +

    A data frame listing the persons in each cohort, listing their +exposure status as well as the time to the end of the observation period and time to the end of the +cohort (usually the end of the exposure era).

    +
    covariates
    +

    An ffdf object listing the +baseline covariates per person in the two cohorts. This is done using a sparse representation: +covariates with a value of 0 are omitted to save space.

    +
    covariateRef
    +

    An ffdf object describing the covariates that have been extracted.

    + +
    metaData
    +

    A list of objects with information on how the plpData object was +constructed.

    + +

    The generic () and summary() functions have been implemented for this object.

    +
    +
    +

    Details

    +

    Based on the arguments, the at risk cohort data is retrieved, as well as outcomes +occurring in these subjects. The at risk cohort is identified through +user-defined cohorts in a cohort table either inside the CDM instance or in a separate schema. +Similarly, outcomes are identified +through user-defined cohorts in a cohort table either inside the CDM instance or in a separate +schema. Covariates are automatically extracted from the appropriate tables within the CDM. +If you wish to exclude concepts from covariates you will need to +manually add the concept_ids and descendants to the excludedCovariateConceptIds of the +covariateSettings argument.

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/getPredictionDistribution.html b/docs/reference/getPredictionDistribution.html new file mode 100644 index 000000000..f6d9eac2f --- /dev/null +++ b/docs/reference/getPredictionDistribution.html @@ -0,0 +1,191 @@ + +Calculates the prediction distribution — getPredictionDistribution • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculates the prediction distribution

    +
    + +
    +
    getPredictionDistribution(
    +  prediction,
    +  predictionType,
    +  typeColumn = "evaluation"
    +)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction object

    + + +
    predictionType
    +

    The type of prediction (binary or survival)

    + + +
    typeColumn
    +

    A column that is used to stratify the results

    + +
    +
    +

    Value

    + + +

    The 0.00, 0.1, 0.25, 0.5, 0.75, 0.9, 1.00 quantiles of the prediction, +the mean and standard deviation per class

    +
    +
    +

    Details

    +

    Calculates the quantiles from a prediction object
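A minimal base R sketch of the summary described in the Value section: the listed quantiles plus the mean and standard deviation of the predicted risk, computed separately per outcome class. The data are simulated, and the column names `value` and `outcomeCount` are assumptions made for the example rather than a statement about the package's internals.

```r
set.seed(7)

# Simulated prediction object with two outcome classes.
prediction <- data.frame(
  outcomeCount = rep(c(0, 1), each = 50),
  value = c(rbeta(50, 2, 8), rbeta(50, 5, 5))
)

probs <- c(0, 0.1, 0.25, 0.5, 0.75, 0.9, 1)

# Quantiles, mean and standard deviation for one class.
summariseClass <- function(v) {
  c(quantile(v, probs = probs), mean = mean(v), sd = sd(v))
}

# One row per class, one column per summary statistic.
distribution <- t(sapply(split(prediction$value, prediction$outcomeCount),
                         summariseClass))
print(distribution)
```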

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/getPredictionDistribution_binary.html b/docs/reference/getPredictionDistribution_binary.html new file mode 100644 index 000000000..25312a70d --- /dev/null +++ b/docs/reference/getPredictionDistribution_binary.html @@ -0,0 +1,187 @@ + +Calculates the prediction distribution — getPredictionDistribution_binary • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculates the prediction distribution

    +
    + +
    +
    getPredictionDistribution_binary(prediction, evalColumn, ...)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction object

    + + +
    evalColumn
    +

    A column that is used to stratify the results

    + + +
    ...
    +

    Other inputs

    + +
    +
    +

    Value

    + + +

    The 0.00, 0.1, 0.25, 0.5, 0.75, 0.9, 1.00 quantiles of the prediction, +the mean and standard deviation per class

    +
    +
    +

    Details

    +

    Calculates the quantiles from a prediction object

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/getThresholdSummary.html b/docs/reference/getThresholdSummary.html new file mode 100644 index 000000000..743440c53 --- /dev/null +++ b/docs/reference/getThresholdSummary.html @@ -0,0 +1,187 @@ + +Calculate all measures for sparse ROC — getThresholdSummary • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate all measures for sparse ROC

    +
    + +
    +
    getThresholdSummary(prediction, predictionType, typeColumn = "evaluation")
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction object

    + + +
    predictionType
    +

    The type of prediction (binary or survival)

    + + +
    typeColumn
    +

    A column that is used to stratify the results

    + +
    +
    +

    Value

    + + +

    A data.frame with all the measures

    +
    +
    +

    Details

    +

    Calculates the TP, FP, TN, FN, TPR, FPR, accuracy, PPF, FOR and Fmeasure +from a prediction object
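To make the threshold measures concrete, the sketch below counts TP, FP, TN and FN at two cut-offs on a toy prediction table and derives TPR, FPR and accuracy from them. The column names `value` and `outcomeCount`, the chosen thresholds, and the convention that a risk equal to the threshold counts as positive are assumptions for this example; the package may use a different convention.

```r
# Toy prediction object; column names are assumptions for this sketch.
prediction <- data.frame(
  value = c(0.1, 0.4, 0.35, 0.8, 0.65, 0.2),
  outcomeCount = c(0, 0, 1, 1, 1, 0)
)

thresholds <- c(0.3, 0.5)

thresholdSummary <- do.call(rbind, lapply(thresholds, function(th) {
  predictedPositive <- prediction$value >= th
  actualPositive <- prediction$outcomeCount == 1
  TP <- sum(predictedPositive & actualPositive)
  FP <- sum(predictedPositive & !actualPositive)
  TN <- sum(!predictedPositive & !actualPositive)
  FN <- sum(!predictedPositive & actualPositive)
  data.frame(
    threshold = th, TP = TP, FP = FP, TN = TN, FN = FN,
    TPR = TP / (TP + FN),
    FPR = FP / (FP + TN),
    accuracy = (TP + TN) / nrow(prediction)
  )
}))
print(thresholdSummary)
```

Plotting TPR against FPR across many thresholds gives the ROC curve that these rows sparsely sample.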

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/getThresholdSummary_binary.html b/docs/reference/getThresholdSummary_binary.html new file mode 100644 index 000000000..dd31368b0 --- /dev/null +++ b/docs/reference/getThresholdSummary_binary.html @@ -0,0 +1,187 @@ + +Calculate all measures for sparse ROC when prediction is bianry classification — getThresholdSummary_binary • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate all measures for sparse ROC when prediction is binary classification

    +
    + +
    +
    getThresholdSummary_binary(prediction, evalColumn, ...)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction object

    + + +
    evalColumn
    +

    A column that is used to stratify the results

    + + +
    ...
    +

    Other inputs

    + +
    +
    +

    Value

    + + +

    A data.frame with all the measures

    +
    +
    +

    Details

    +

    Calculates the TP, FP, TN, FN, TPR, FPR, accuracy, PPF, FOR and Fmeasure +from a prediction object

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/ici.html b/docs/reference/ici.html new file mode 100644 index 000000000..aefd353ec --- /dev/null +++ b/docs/reference/ici.html @@ -0,0 +1,183 @@ + +Calculate the Integrated Calibration Information from Austin and Steyerberg +https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8281 — ici • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the Integrated Calibration Information from Austin and Steyerberg +https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8281

    +
    + +
    +
    ici(prediction)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    the prediction object found in the plpResult object

    + +
    +
    +

    Value

    + + +

    Integrated Calibration Information

    +
    +
    +

    Details

    +

    Calculate the Integrated Calibration Information
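As a rough sketch of the idea in the cited Austin and Steyerberg paper, the measure can be approximated as the mean absolute difference between the predicted risk and a loess-smoothed estimate of the observed risk. The simulated data, the variable names and the default loess settings below are assumptions for illustration; the package's `ici()` may use different smoothing settings.

```r
set.seed(123)

# Simulated, well calibrated predictions: outcomes are drawn from the predicted risk.
predicted <- runif(500, 0.05, 0.95)
observed <- rbinom(500, 1, predicted)

# Smooth the observed outcomes as a function of the predicted risk.
smoothFit <- loess(observed ~ predicted)
smoothedObserved <- predict(smoothFit)

# Average absolute gap between the smoothed observed risk and the predicted risk.
iciValue <- mean(abs(smoothedObserved - predicted))
print(iciValue)
```

Values close to 0 indicate that predicted risks track observed risks across the whole range of predictions.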

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/index.html b/docs/reference/index.html new file mode 100644 index 000000000..652594c5b --- /dev/null +++ b/docs/reference/index.html @@ -0,0 +1,767 @@ + +Function reference • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    Extracting data from the OMOP CDM database

    +

    Functions for getting the necessary data from the database in Common Data Model and saving/loading.

    +
    +

    createDatabaseDetails()

    +

    Create a setting that holds the details about the cdmDatabase connection for data extraction

    +

    createRestrictPlpDataSettings()

    +

    createRestrictPlpDataSettings define extra restriction settings when calling getPlpData

    +

    getPlpData()

    +

    Get the patient level prediction data from the server

    +

    savePlpData()

    +

    Save the cohort data to folder

    +

    loadPlpData()

    +

    Load the cohort data from a folder

    +

    getCohortCovariateData()

    +

    Extracts covariates based on cohorts

    +

    Settings for designing a prediction models

    +

    Design settings required when developing a model.

    +
    +

    createStudyPopulationSettings()

    +

    create the study population settings

    +

    createDefaultSplitSetting()

    +

    Create the settings for defining how the plpData are split into test/validation/train sets using +default splitting functions (either random stratified by outcome, time or subject splitting)

    +

    createSampleSettings()

    +

    Create the settings for defining how the trainData from splitData are sampled using +default sample functions.

    +

    createFeatureEngineeringSettings()

    +

    Create the settings for defining any feature engineering that will be done

    +

    createPreprocessSettings()

    +

    Create the settings for preprocessing the trainData.

    +

    Optional design settings

    +

    Settings for optional steps that can be used in the PLP pipeline

    +
    +

    createCohortCovariateSettings()

    +

    Extracts covariates based on cohorts

    +

    createRandomForestFeatureSelection()

    +

    Create the settings for random forest based feature selection

    +

    createUnivariateFeatureSelection()

    +

    Create the settings for defining any feature selection that will be done

    +

    createSplineSettings()

    +

    Create the settings for adding a spline for continuous variables

    +

    createStratifiedImputationSettings()

    +

    Create the settings for stratified imputation

    +

    External validation

    +

    +
    +

    createValidationDesign()

    +

    createValidationDesign - Define the validation design for external validation

    +

    validateExternal()

    +

    validateExternal - Validate model performance on new data

    +

    createValidationSettings()

    +

    createValidationSettings define optional settings for performing external validation

    +

    recalibratePlp()

    +

    recalibratePlp

    +

    recalibratePlpRefit()

    +

    recalibratePlpRefit

    +

    Execution settings when developing a model

    +

    Execution settings required when developing a model.

    +
    +

    createLogSettings()

    +

    Create the settings for logging the progression of the analysis

    +

    createExecuteSettings()

    +

    Creates list of settings specifying what parts of runPlp to execute

    +

    createDefaultExecuteSettings()

    +

    Creates default list of settings specifying what parts of runPlp to execute

    +

    Binary Classification Models

    +

    Functions for setting binary classifiers and their hyper-parameter search.

    +
    +

    setAdaBoost()

    +

    Create setting for AdaBoost with python DecisionTreeClassifier base estimator

    +

    setDecisionTree()

    +

    Create setting for the scikit-learn 1.0.1 DecisionTree with python

    +

    setGradientBoostingMachine()

    +

    Create setting for gradient boosting machine model using gbm_xgboost implementation

    +

    setKNN()

    +

    Create setting for knn model

    +

    setLassoLogisticRegression()

    +

    Create setting for lasso logistic regression

    +

    setMLP()

    +

    Create setting for neural network model with python

    +

    setNaiveBayes()

    +

    Create setting for naive bayes model with python

    +

    setRandomForest()

    +

    Create setting for random forest model with python (very fast)

    +

    setSVM()

    +

    Create setting for the python sklearn SVM (SVC function)

    +

    setIterativeHardThresholding()

    +

    Create setting for iterative hard thresholding model

    +

    setLightGBM()

    +

    Create setting for gradient boosting machine model using lightGBM (https://github.com/microsoft/LightGBM/tree/master/R-package).

    +

    Survival Models

    +

    Functions for setting survival models and their hyper-parameter search.

    +
    +

    setCoxModel()

    +

    Create setting for lasso Cox model

    +

    Single Patient-Level Prediction Model

    +

    Functions for training/evaluating/applying a single patient-level-prediction model

    +
    +

    runPlp()

    +

    runPlp - Develop and internally evaluate a model using specified settings

    +

    externalValidateDbPlp()

    +

    externalValidateDbPlp - Validate a model on new databases

    +

    savePlpModel()

    +

    Saves the plp model

    +

    loadPlpModel()

    +

    loads the plp model

    +

    savePlpResult()

    +

    Saves the result from runPlp into the location directory

    +

    loadPlpResult()

    +

    Loads the evaluation dataframe

    +

    diagnosePlp()

    +

    diagnostic - Investigates the prediction problem settings - use before training a model

    +

    Multiple Patient-Level Prediction Models

    +

    Functions for training multiple patient-level prediction models in an efficient way.

    +
    +

    createModelDesign()

    +

    Specify settings for developing a single model

    +

    runMultiplePlp()

    +

    Run a list of prediction analyses

    +

    validateMultiplePlp()

    +

    Externally validate multiple plp models across new datasets

    +

    savePlpAnalysesJson()

    +

    Save the modelDesignList to a json file

    +

    loadPlpAnalysesJson()

    +

    Load the multiple prediction json settings from a file

    +

    diagnoseMultiplePlp()

    +

    Run a list of prediction diagnoses

    +

    Individual pipeline functions

    +

    Functions for running parts of the PLP workflow

    +
    +

    createStudyPopulation()

    +

    Create a study population

    +

    splitData()

    +

    Split the plpData into test/train sets using splitting settings of class splitSettings

    +

    preprocessData()

    +

    A function that wraps around FeatureExtraction::tidyCovariateData to normalise the data +and remove rare or redundant features

    +

    fitPlp()

    +

    fitPlp

    +

    predictPlp()

    +

    predictPlp

    +

    evaluatePlp()

    +

    evaluatePlp

    +

    covariateSummary()

    +

    covariateSummary

    +

    Saving results into database

    +

    Functions for saving the prediction model and performances into a database.

    +
    +

    insertResultsToSqlite()

    +

    Create sqlite database with the results

    +

    createPlpResultTables()

    +

    Create the results tables to store PatientLevelPrediction models and results into a database

    +

    addMultipleRunPlpToDatabase()

    +

    Populate the PatientLevelPrediction results tables

    +

    addRunPlpToDatabase()

    +

    Function to add the run plp (development or validation) to database

    +

    createDatabaseSchemaSettings()

    +

    Create the PatientLevelPrediction database result schema settings

    +

    createDatabaseList()

    +

    Create a list with the database details and database meta data entries

    +

    addDiagnosePlpToDatabase()

    +

    Insert a diagnostic result into a PLP result schema database

    +

    addMultipleDiagnosePlpToDatabase()

    +

    Insert multiple diagnosePlp results saved to a directory into a PLP result schema database

    +

    extractDatabaseToCsv()

    +

    Exports all the results from a database into csv files

    +

    insertCsvToDatabase()

    +

    Function to insert results into a database from csvs

    +

    insertModelDesignInDatabase()

    +

    Insert a model design into a PLP result schema database

    +

    migrateDataModel()

    +

    Migrate Data model

    +

    Shiny Viewers

    +

    Functions for viewing results via a shiny app

    +
    +

    viewPlp()

    +

    viewPlp - Interactively view the performance and model settings

    +

    viewMultiplePlp()

    +

    Open a local shiny app for viewing the results of multiple PLP analyses

    +

    viewDatabaseResultPlp()

    +

    Open a local shiny app for viewing the results of PLP analyses from a database

    +

    Plotting

    +

    Functions for various performance plots

    +
    +

    plotPlp()

    +

    Plot all the PatientLevelPrediction plots

    +

    plotSparseRoc()

    +

    Plot the ROC curve using the sparse thresholdSummary data frame

    +

    plotSmoothCalibration()

    +

    Plot the smooth calibration as detailed in Van Calster et al. "A calibration hierarchy for risk models +was defined: from utopia to empirical data" (2016)

    +

    plotSparseCalibration()

    +

    Plot the calibration

    +

    plotSparseCalibration2()

    +

    Plot the conventional calibration

    +

    plotDemographicSummary()

    +

    Plot the Observed vs. expected incidence, by age and gender

    +

    plotF1Measure()

    +

    Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame

    +

    plotGeneralizability()

    +

    Plot the train/test generalizability diagnostic

    +

    plotPrecisionRecall()

    +

    Plot the precision-recall curve using the sparse thresholdSummary data frame

    +

    plotPredictedPDF()

    +

    Plot the Predicted probability density function, showing prediction overlap between true and false cases

    +

    plotPreferencePDF()

    +

    Plot the preference score probability density function, showing prediction overlap between true and false cases

    +

    plotPredictionDistribution()

    +

    Plot the side-by-side boxplots of prediction distribution, by class

    +

    plotVariableScatterplot()

    +

    Plot the variable importance scatterplot

    +

    outcomeSurvivalPlot()

    +

    Plot the outcome incidence over time

    +

    Learning Curves

    +

    Functions for creating and plotting learning curves

    +
    +

    createLearningCurve()

    +

    createLearningCurve

    +

    plotLearningCurve()

    +

    plotLearningCurve

    +

    Simulation

    +

    Functions for simulating plpData objects.

    +
    +

    simulatePlpData()

    +

    Generate simulated data

    +

    plpDataSimulationProfile

    +

    A simulation profile

    +

    Data manipulation functions

    +

    Functions for manipulating data

    +
    +

    toSparseM()

    +

    Convert the plpData in COO format into a sparse R matrix

    +

    MapIds()

    +

    Map covariate and row Ids so they start from 1

    +

    Helper/utility functions

    +

    +
    +

    listAppend()

    +

    join two lists

    +

    listCartesian()

    +

    Cartesian product

    +

    createTempModelLoc()

    +

    Create a temporary model location

    +

    configurePython()

    +

    Sets up a virtual environment to use for PLP (can be conda or python)

    +

    setPythonEnvironment()

    +

    Use the virtual environment created using configurePython()

    +

    Evaluation measures

    +

    +
    +

    accuracy()

    +

    Calculate the accuracy

    +

    averagePrecision()

    +

    Calculate the average precision

    +

    brierScore()

    +

    brierScore

    +

    calibrationLine()

    +

    calibrationLine

    +

    computeAuc()

    +

    Compute the area under the ROC curve

    +

    f1Score()

    +

    Calculate the f1Score

    +

    falseDiscoveryRate()

    +

    Calculate the falseDiscoveryRate

    +

    falseNegativeRate()

    +

    Calculate the falseNegativeRate

    +

    falseOmissionRate()

    +

    Calculate the falseOmissionRate

    +

    falsePositiveRate()

    +

    Calculate the falsePositiveRate

    +

    ici()

    +

    Calculate the Integrated Calibration Index from Austin and Steyerberg +https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8281

    +

    modelBasedConcordance()

    +

    Calculate the model-based concordance, which is a calculation of the expected discrimination performance of a model under the assumption the model predicts the "TRUE" outcome +as detailed in van Klaveren et al. https://pubmed.ncbi.nlm.nih.gov/27251001/

    +

    negativeLikelihoodRatio()

    +

    Calculate the negativeLikelihoodRatio

    +

    negativePredictiveValue()

    +

    Calculate the negativePredictiveValue

    +

    positiveLikelihoodRatio()

    +

    Calculate the positiveLikelihoodRatio

    +

    positivePredictiveValue()

    +

    Calculate the positivePredictiveValue

    +

    sensitivity()

    +

    Calculate the sensitivity

    +

    specificity()

    +

    Calculate the specificity

    +

    computeGridPerformance()

    +

    Computes grid performance with a specified performance function

    +

    diagnosticOddsRatio()

    +

    Calculate the diagnostic odds ratio

    +

    getCalibrationSummary()

    +

    Get a sparse summary of the calibration

    +

    getDemographicSummary()

    +

    Get a calibration per age/gender groups

    +

    getThresholdSummary()

    +

    Calculate all measures for sparse ROC

    +

    getThresholdSummary_binary()

    +

    Calculate all measures for sparse ROC when prediction is binary classification

    +

    getPredictionDistribution()

    +

    Calculates the prediction distribution

    +

    getPredictionDistribution_binary()

    +

    Calculates the prediction distribution

    +

    Saving/loading models as json

    +

    Functions for saving or loading models as json

    +
    +

    sklearnFromJson()

    +

    Loads sklearn python model from json

    +

    sklearnToJson()

    +

    Saves sklearn python model object to json in path

    +

    Load/save for sharing

    +

    Functions for loading/saving objects for sharing

    +
    +

    savePlpShareable()

    +

    Save the plp result as json files and csv files for transparent sharing

    +

    loadPlpShareable()

    +

    Loads the plp result saved as json/csv files for transparent sharing

    +

    loadPrediction()

    +

    Loads the prediction dataframe from a saved file

    +

    savePrediction()

    +

    Saves the prediction dataframe to RDS

    +

    Feature importance

    +

    +
    +

    pfi()

    +

    pfi

    +

    Other functions

    +

    +
    +

    predictCyclops()

    +

    Create predictive probabilities

    + + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/insertCsvToDatabase.html b/docs/reference/insertCsvToDatabase.html new file mode 100644 index 000000000..80fd8e6cf --- /dev/null +++ b/docs/reference/insertCsvToDatabase.html @@ -0,0 +1,200 @@ + +Function to insert results into a database from csvs — insertCsvToDatabase • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function converts a folder with csv results into plp objects and loads them into a plp result database

    +
    + +
    +
    insertCsvToDatabase(
    +  csvFolder,
    +  connectionDetails,
    +  databaseSchemaSettings,
    +  modelSaveLocation,
    +  csvTableAppend = ""
    +)
    +
    + +
    +

    Arguments

    +
    csvFolder
    +

    The location to the csv folder with the plp results

    + + +
    connectionDetails
    +

    Connection details for the plp results database that the csv results will be inserted into

    + + +
    databaseSchemaSettings
    +

    An object created by createDatabaseSchemaSettings with all the settings specifying the result tables to insert the csv results into

    + + +
    modelSaveLocation
    +

    The location to save any models from the csv folder - this should be the same location you picked when inserting other models into the database

    + + +
    csvTableAppend
    +

    A string appended to the csv file names

    + +
    +
    +

    Value

    + + +

    Returns a data.frame indicating whether the results were imported into the database

    +
    +
    +

    Details

    +

    The user needs to have plp csv results in a single folder and an existing plp result database

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/insertModelDesignInDatabase.html b/docs/reference/insertModelDesignInDatabase.html new file mode 100644 index 000000000..a69c6aa60 --- /dev/null +++ b/docs/reference/insertModelDesignInDatabase.html @@ -0,0 +1,197 @@ + +Insert a model design into a PLP result schema database — insertModelDesignInDatabase • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function inserts a model design and all the settings into the result schema

    +
    + +
    +
    insertModelDesignInDatabase(
    +  object,
    +  conn,
    +  databaseSchemaSettings,
    +  cohortDefinitions
    +)
    +
    + +
    +

    Arguments

    +
    object
    +

    An object of class modelDesign, runPlp or externalValidatePlp

    + + +
    conn
    +

    A connection to a database created by using the +function connect in the +DatabaseConnector package.

    + + +
    databaseSchemaSettings
    +

    An object created by createDatabaseSchemaSettings with all the settings specifying the result tables

    + + +
    cohortDefinitions
    +

    A set of one or more cohorts extracted using ROhdsiWebApi::exportCohortDefinitionSet()

    + +
    +
    +

    Value

    + + +

    Returns NULL but uploads the model design into the database schema specified in databaseSchemaSettings

    +
    +
    +

    Details

    +

    This function can be used to upload a model design into a database

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/insertResultsToSqlite.html b/docs/reference/insertResultsToSqlite.html new file mode 100644 index 000000000..1fe355a82 --- /dev/null +++ b/docs/reference/insertResultsToSqlite.html @@ -0,0 +1,195 @@ + +Create sqlite database with the results — insertResultsToSqlite • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function creates an sqlite database with the PLP result schema and inserts all results

    +
    + +
    +
    insertResultsToSqlite(
    +  resultLocation,
    +  cohortDefinitions,
    +  databaseList = NULL,
    +  sqliteLocation = file.path(resultLocation, "sqlite")
    +)
    +
    + +
    +

    Arguments

    +
    resultLocation
    +

    (string) location of directory where the main package results were saved

    + + +
    cohortDefinitions
    +

    A set of one or more cohorts extracted using ROhdsiWebApi::exportCohortDefinitionSet()

    + + +
    databaseList
    +

    A list created by createDatabaseList to specify the databases

    + + +
    sqliteLocation
    +

    (string) location of directory where the sqlite database will be saved

    + +
    +
    +

    Value

    + + +

    Returns the location of the sqlite database file

    +
    +
    +

    Details

    +

    This function can be used to upload PatientLevelPrediction results into an sqlite database

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/listAppend.html b/docs/reference/listAppend.html new file mode 100644 index 000000000..2f370ad63 --- /dev/null +++ b/docs/reference/listAppend.html @@ -0,0 +1,176 @@ + +join two lists — listAppend • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    join two lists

    +
    + +
    +
    listAppend(a, b)
    +
    + +
    +

    Arguments

    +
    a
    +

    A list

    + + +
    b
    +

    Another list

    + +
    +
    +

    Details

    +

    This function joins two lists

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/listCartesian.html b/docs/reference/listCartesian.html new file mode 100644 index 000000000..5c5c573b7 --- /dev/null +++ b/docs/reference/listCartesian.html @@ -0,0 +1,174 @@ + +Cartesian product — listCartesian • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Computes the Cartesian product of all the combinations of elements in a list

    +
    + +
    +
    listCartesian(allList)
    +
    + +
    +

    Arguments

    +
    allList
    +

    a list of lists

    + +
    +
    +

    Value

    + + +

    A list with all possible combinations from the input list of lists
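
The idea can be sketched in base R as follows. This is an illustration of the technique under stated assumptions, not the package's implementation: the helper name `cartesianSketch` and the hyper-parameter names in the usage example are invented for this example.

```r
# Hypothetical helper (not the package's listCartesian()): builds every
# combination that takes one element from each inner list.
cartesianSketch <- function(allList) {
  # One row per combination, one column per input element, holding indices.
  indexGrid <- expand.grid(lapply(allList, seq_along))
  lapply(seq_len(nrow(indexGrid)), function(i) {
    combination <- lapply(
      seq_along(allList),
      function(j) allList[[j]][[indexGrid[i, j]]]
    )
    names(combination) <- names(allList)
    combination
  })
}

# Example: a small hyper-parameter grid (names are illustrative).
grid <- cartesianSketch(list(
  ntrees = list(100, 500),
  maxDepth = list(4, 8, 16)
))
print(length(grid))
```

Working on indices rather than on the values themselves keeps the sketch valid when the inner lists hold arbitrary objects (vectors, NULL-free nested lists, and so on) that `expand.grid` could not store in a data frame column directly.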

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/loadPlpAnalysesJson.html b/docs/reference/loadPlpAnalysesJson.html new file mode 100644 index 000000000..4d1a0a0cc --- /dev/null +++ b/docs/reference/loadPlpAnalysesJson.html @@ -0,0 +1,181 @@ + +Load the multiple prediction json settings from a file — loadPlpAnalysesJson • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Load the multiple prediction json settings from a file

    +
    + +
    +
    loadPlpAnalysesJson(jsonFileLocation)
    +
    + +
    +

    Arguments

    +
    jsonFileLocation
    +

    The location of the file 'predictionAnalysisList.json' with the modelDesignList

    + +
    +
    +

    Details

    +

    This function interprets a json with the multiple prediction settings and creates a list +that can be combined with connection settings to run a multiple prediction study

    +
    + +
    +

    Examples

    +
    if (FALSE) {
    +modelDesignList <- loadPlpAnalysesJson('location of json settings')$analysis
    +}
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/loadPlpData.html b/docs/reference/loadPlpData.html new file mode 100644 index 000000000..ab34cac18 --- /dev/null +++ b/docs/reference/loadPlpData.html @@ -0,0 +1,191 @@ + +Load the cohort data from a folder — loadPlpData • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    loadPlpData loads an object of type plpData from a folder in the file +system.

    +
    + +
    +
    loadPlpData(file, readOnly = TRUE)
    +
    + +
    +

    Arguments

    +
    file
    +

    The name of the folder containing the data.

    + + +
    readOnly
    +

    If true, the data is opened read only.

    + +
    +
    +

    Value

    + + +

    An object of class plpData.

    +
    +
    +

    Details

    +

    The data will be read from a set of files in the folder specified by the user.

    +
    + +
    +

    Examples

    +
    # todo
    +
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/loadPlpModel.html b/docs/reference/loadPlpModel.html new file mode 100644 index 000000000..3248a457e --- /dev/null +++ b/docs/reference/loadPlpModel.html @@ -0,0 +1,172 @@ + +loads the plp model — loadPlpModel • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    loads the plp model

    +
    + +
    +
    loadPlpModel(dirPath)
    +
    + +
    +

    Arguments

    +
    dirPath
    +

    The location of the model

    + +
    +
    +

    Details

    +

    Loads a plp model that was saved using savePlpModel()

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/loadPlpResult.html b/docs/reference/loadPlpResult.html new file mode 100644 index 000000000..25327ba94 --- /dev/null +++ b/docs/reference/loadPlpResult.html @@ -0,0 +1,172 @@ + +Loads the evalaution dataframe — loadPlpResult • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Loads the evaluation dataframe

    +
    + +
    +
    loadPlpResult(dirPath)
    +
    + +
    +

    Arguments

    +
    dirPath
    +

    The directory where the evaluation was saved

    + +
    +
    +

    Details

    +

    Loads the evaluation

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/loadPlpShareable.html b/docs/reference/loadPlpShareable.html new file mode 100644 index 000000000..a69479f3d --- /dev/null +++ b/docs/reference/loadPlpShareable.html @@ -0,0 +1,172 @@ + +Loads the plp result saved as json/csv files for transparent sharing — loadPlpShareable • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Loads the plp result saved as json/csv files for transparent sharing

    +
    + +
    +
    loadPlpShareable(loadDirectory)
    +
    + +
    +

    Arguments

    +
    loadDirectory
    +

    The directory with the results as json/csv files

    + +
    +
    +

    Details

    +

    Load the main results from json/csv files into a runPlp object

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/loadPrediction.html b/docs/reference/loadPrediction.html new file mode 100644 index 000000000..a4715f9c5 --- /dev/null +++ b/docs/reference/loadPrediction.html @@ -0,0 +1,172 @@ + +Loads the prediciton dataframe to csv — loadPrediction • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Loads the prediction dataframe from a saved file

    +
    + +
    +
    loadPrediction(fileLocation)
    +
    + +
    +

    Arguments

    +
    fileLocation
    +

    The location with the saved prediction

    + +
    +
    +

    Details

    +

    Loads the prediction RDS file

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/migrateDataModel.html b/docs/reference/migrateDataModel.html new file mode 100644 index 000000000..5810fedca --- /dev/null +++ b/docs/reference/migrateDataModel.html @@ -0,0 +1,180 @@ + +Migrate Data model — migrateDataModel • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Migrate data from current state to next state

    +

    It is strongly advised that you have a backup of all data: either sqlite files, a backup database (in the case you +are using a postgres backend), or the csv/zip files from your data generation.

    +
    + +
    +
    migrateDataModel(connectionDetails, databaseSchema, tablePrefix = "")
    +
    + +
    +

    Arguments

    +
    connectionDetails
    +

    DatabaseConnector connection details object

    + + +
    databaseSchema
    +

    String naming the schema where the database tables live

    + + +
    tablePrefix
    +

    (Optional) Use if a table prefix is used before table names (e.g. "cd_")

    + +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/modelBasedConcordance.html b/docs/reference/modelBasedConcordance.html new file mode 100644 index 000000000..b18d502df --- /dev/null +++ b/docs/reference/modelBasedConcordance.html @@ -0,0 +1,182 @@ + +Calculate the model-based concordance, which is a calculation of the expected discrimination performance of a model under the assumption the model predicts the "TRUE" outcome +as detailed in van Klaveren et al. https://pubmed.ncbi.nlm.nih.gov/27251001/ — modelBasedConcordance • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the model-based concordance, which is a calculation of the expected discrimination performance of a model under the assumption the model predicts the "TRUE" outcome +as detailed in van Klaveren et al. https://pubmed.ncbi.nlm.nih.gov/27251001/

    +
    + +
    +
    modelBasedConcordance(prediction)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    the prediction object found in the plpResult object

    + +
    +
    +

    Value

    + + +

    model-based concordance value

    +
    +
    +

    Details

    +

    Calculate the model-based concordance
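
One way to read the definition: if the model's predictions are taken as the true outcome probabilities, an ordered pair (i, j) has subject i with the outcome and subject j without it with probability p_i(1 - p_j), and that pair is concordant when p_i > p_j. The sketch below computes the expected share of concordant pairs under that reading. The helper name `mbcSketch` is an assumption, the quadratic-memory approach is chosen for clarity rather than speed, and the package's function may handle ties or scaling differently.

```r
# Hypothetical helper (not the package's modelBasedConcordance()):
# expected concordance when the predictions are treated as true risks.
mbcSketch <- function(p) {
  weight <- outer(p, 1 - p)       # weight[i, j] = p_i * (1 - p_j)
  diag(weight) <- 0               # a subject is never paired with itself
  concordant <- outer(p, p, ">")  # TRUE where p_i > p_j
  sum(weight * concordant) / sum(weight)
}

mbc <- mbcSketch(c(0.1, 0.5, 0.9))
print(mbc)
```

Because the calculation uses only the predicted values and not the observed outcomes, it reflects the spread of predicted risk in the population rather than how well the predictions match what happened.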

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/negativeLikelihoodRatio.html b/docs/reference/negativeLikelihoodRatio.html new file mode 100644 index 000000000..a1fa5b4f2 --- /dev/null +++ b/docs/reference/negativeLikelihoodRatio.html @@ -0,0 +1,190 @@ + +Calculate the negativeLikelihoodRatio — negativeLikelihoodRatio • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the negativeLikelihoodRatio

    +
    + +
    +
    negativeLikelihoodRatio(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    negativeLikelihoodRatio value

    +
    +
    +

    Details

    +

    Calculate the negativeLikelihoodRatio
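
For reference, the negative likelihood ratio is conventionally defined as the false negative rate divided by the specificity, that is (FN / (TP + FN)) / (TN / (TN + FP)). The base-R sketch below applies that standard formula to the four counts; the helper name is hypothetical and the code is not taken from the package.

```r
# Hypothetical helper using the standard definition:
# LR- = (1 - sensitivity) / specificity.
negativeLikelihoodRatioSketch <- function(TP, TN, FN, FP) {
  falseNegativeRate <- FN / (TP + FN)
  specificity <- TN / (TN + FP)
  falseNegativeRate / specificity
}

nlr <- negativeLikelihoodRatioSketch(TP = 40, TN = 90, FN = 10, FP = 10)
print(nlr)
```

Values well below 1 mean a negative prediction substantially lowers the odds of the outcome.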

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/negativePredictiveValue.html b/docs/reference/negativePredictiveValue.html new file mode 100644 index 000000000..816870bb3 --- /dev/null +++ b/docs/reference/negativePredictiveValue.html @@ -0,0 +1,190 @@ + +Calculate the negativePredictiveValue — negativePredictiveValue • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the negativePredictiveValue

    +
    + +
    +
    negativePredictiveValue(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    negativePredictiveValue value

    +
    +
    +

    Details

    +

    Calculate the negativePredictiveValue
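
The negative predictive value is conventionally the share of negative predictions that are correct, TN / (TN + FN). A minimal base-R sketch of that standard formula follows; the helper name is hypothetical, and the unused `TP` and `FP` arguments are kept only to mirror the four-count signature shown above.

```r
# Hypothetical helper using the standard definition: NPV = TN / (TN + FN).
negativePredictiveValueSketch <- function(TP, TN, FN, FP) {
  TN / (TN + FN)
}

npv <- negativePredictiveValueSketch(TP = 40, TN = 90, FN = 10, FP = 10)
print(npv)
```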

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/outcomeSurvivalPlot.html b/docs/reference/outcomeSurvivalPlot.html new file mode 100644 index 000000000..fc364c7e5 --- /dev/null +++ b/docs/reference/outcomeSurvivalPlot.html @@ -0,0 +1,208 @@ + +Plot the outcome incidence over time — outcomeSurvivalPlot • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the outcome incidence over time

    +
    + +
    +
    outcomeSurvivalPlot(
    +  plpData,
    +  outcomeId,
    +  populationSettings = createStudyPopulationSettings(binary = T, includeAllOutcomes = T,
    +    firstExposureOnly = FALSE, washoutPeriod = 0, removeSubjectsWithPriorOutcome = TRUE,
    +    priorOutcomeLookback = 99999, requireTimeAtRisk = F, riskWindowStart = 1, startAnchor
    +    = "cohort start", riskWindowEnd = 3650, endAnchor = "cohort start"),
    +  riskTable = T,
    +  confInt = T,
    +  yLabel = "Fraction of those who are outcome free in target population"
    +)
    +
    + +
    +

    Arguments

    +
    plpData
    +

    The plpData object returned by running getPlpData()

    + + +
    outcomeId
    +

    The cohort id corresponding to the outcome

    + + +
    populationSettings
    +

    The population settings created using createStudyPopulationSettings

    + + +
    riskTable
    +

    (binary) Whether to include a table at the bottom of the plot showing the number of people at risk over time

    + + +
    confInt
    +

    (binary) Whether to include a confidence interval

    + + +
    yLabel
    +

    (string) The label for the y-axis

    + +
    +
    +

    Value

    + + +

    TRUE if it ran

    +
    +
    +

    Details

    +

    This creates a survival plot that can be used to pick a suitable time-at-risk period

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/pfi.html b/docs/reference/pfi.html new file mode 100644 index 000000000..8a039c50a --- /dev/null +++ b/docs/reference/pfi.html @@ -0,0 +1,216 @@ + +pfi — pfi • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the permutation feature importance for a PLP model.

    +
    + +
    +
    pfi(
    +  plpResult,
    +  population,
    +  plpData,
    +  repeats = 1,
    +  covariates = NULL,
    +  cores = NULL,
    +  log = NULL,
    +  logthreshold = "INFO"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    An object of type runPlp

    + + +
    population
    +

    The population created using createStudyPopulation() who will have their risks predicted

    + + +
    plpData
    +

    An object of type plpData - the patient level prediction +data extracted from the CDM.

    + + +
    repeats
    +

    The number of times to permute each covariate

    + + +
    covariates
    +

    A vector of covariates to calculate the pfi for. If NULL it uses all covariates included in the model.

    + + +
    cores
    +

    Number of cores to use when running this (it runs in parallel)

    + + +
    log
    +

    A location to save the log for running pfi

    + + +
    logthreshold
    +

    The log threshold (e.g., INFO, TRACE, ...)

    + +
    +
    +

    Value

    + + +

    A dataframe with the covariateIds and the pfi (change in AUC caused by permuting the covariate) value

    +
    +
    +

    Details

    +

    The function permutes each covariate (feature) <repeats> times and calculates the mean change in AUC caused by the permutation.
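
The idea can be sketched in base R without the package. Everything below is an assumption made for illustration: the simulated data, the `glm` model, the helper names `aucSketch` and `pfiSketch`, and the sign convention (baseline AUC minus permuted AUC). The package's own implementation may differ.

```r
# Sketch of permutation feature importance using base R only.
set.seed(42)

# Simulated data: x1 is predictive, x2 is pure noise (made-up example)
n <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 1.5 * x1))
dat <- data.frame(y = y, x1 = x1, x2 = x2)

model <- glm(y ~ x1 + x2, data = dat, family = binomial())

# AUC via the rank-sum (Mann-Whitney) formulation
aucSketch <- function(pred, outcome) {
  r <- rank(pred)
  nPos <- sum(outcome == 1)
  nNeg <- sum(outcome == 0)
  (sum(r[outcome == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

baseAuc <- aucSketch(predict(model, dat, type = "response"), dat$y)

# Permute one covariate `repeats` times and average the drop in AUC
pfiSketch <- function(covariate, repeats = 5) {
  drops <- vapply(seq_len(repeats), function(i) {
    permuted <- dat
    permuted[[covariate]] <- sample(permuted[[covariate]])
    baseAuc - aucSketch(predict(model, permuted, type = "response"), permuted$y)
  }, numeric(1))
  mean(drops)
}

importance <- data.frame(
  covariate = c("x1", "x2"),
  pfi = c(pfiSketch("x1"), pfiSketch("x2"))
)
print(importance)
```

In this simulation the informative covariate `x1` should show a clearly larger AUC drop than the noise covariate `x2`, whose value should sit near zero.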

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotDemographicSummary.html b/docs/reference/plotDemographicSummary.html new file mode 100644 index 000000000..8756040c8 --- /dev/null +++ b/docs/reference/plotDemographicSummary.html @@ -0,0 +1,199 @@ + +Plot the Observed vs. expected incidence, by age and gender — plotDemographicSummary • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the Observed vs. expected incidence, by age and gender

    +
    + +
    +
    plotDemographicSummary(
    +  plpResult,
    +  typeColumn = "evaluation",
    +  saveLocation = NULL,
    +  fileName = "roc.png"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    A plp result object as generated using the runPlp function.

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the observed vs. expected incidence, by age and gender

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotF1Measure.html b/docs/reference/plotF1Measure.html new file mode 100644 index 000000000..aec0a7eae --- /dev/null +++ b/docs/reference/plotF1Measure.html @@ -0,0 +1,198 @@ + +Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame — plotF1Measure • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the F1 measure efficiency frontier using the sparse thresholdSummary data frame

    +
    + +
    +
    plotF1Measure(
    +  plpResult,
    +  typeColumn = "evaluation",
    +  saveLocation = NULL,
    +  fileName = "roc.png"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    A plp result object as generated using the runPlp function.

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the F1 measure efficiency frontier using the sparse thresholdSummary data frame

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotGeneralizability.html b/docs/reference/plotGeneralizability.html new file mode 100644 index 000000000..f91a77613 --- /dev/null +++ b/docs/reference/plotGeneralizability.html @@ -0,0 +1,195 @@ + +Plot the train/test generalizability diagnostic — plotGeneralizability • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the train/test generalizability diagnostic

    +
    + +
    +
    plotGeneralizability(
    +  covariateSummary,
    +  saveLocation = NULL,
    +  fileName = "Generalizability.png"
    +)
    +
    + +
    +

    Arguments

    +
    covariateSummary
    +

    A prediction object as generated using the +runPlp function.

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the train/test generalizability diagnostic

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotLearningCurve.html b/docs/reference/plotLearningCurve.html new file mode 100644 index 000000000..803ab6e4f --- /dev/null +++ b/docs/reference/plotLearningCurve.html @@ -0,0 +1,226 @@ + +plotLearningCurve — plotLearningCurve • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create a plot of the learning curve using the object returned +from createLearningCurve.

    +
    + +
    +
    plotLearningCurve(
    +  learningCurve,
    +  metric = "AUROC",
    +  abscissa = "events",
    +  plotTitle = "Learning Curve",
    +  plotSubtitle = NULL,
    +  fileName = NULL
    +)
    +
    + +
    +

    Arguments

    +
    learningCurve
    +

    An object returned by createLearningCurve +function.

    + + +
    metric
    +

    Specifies the metric to be plotted:

    • 'AUROC' - use the area under the Receiver Operating + Characteristic curve

    • +
    • 'AUPRC' - use the area under the Precision-Recall curve

    • +
    • 'sBrier' - use the scaled Brier score

    • +
    + + +
    abscissa
    +

    Specify the abscissa metric to be plotted:

    • 'events' - use number of events

    • +
    • 'observations' - use number of observations

    • +
    + + +
    plotTitle
    +

    Title of the learning curve plot.

    + + +
    plotSubtitle
    +

    Subtitle of the learning curve plot.

    + + +
    fileName
    +

    Filename of plot to be saved, for example 'plot.png'. +See the function ggsave in the ggplot2 package for supported file +formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to +file in a different format.

    +
    + +
    +

    Examples

    +
    if (FALSE) {
    +# create learning curve object
    +learningCurve <- createLearningCurve(population,
    +                                     plpData,
    +                                     modelSettings)
    +# plot the learning curve
    +plotLearningCurve(learningCurve)
    +}
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotPlp.html b/docs/reference/plotPlp.html new file mode 100644 index 000000000..0ae53ad83 --- /dev/null +++ b/docs/reference/plotPlp.html @@ -0,0 +1,187 @@ + +Plot all the PatientLevelPrediction plots — plotPlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot all the PatientLevelPrediction plots

    +
    + +
    +
    plotPlp(plpResult, saveLocation = NULL, typeColumn = "evaluation")
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    Object returned by the runPlp() function

    + + +
    saveLocation
    +

    Name of the directory where the plots should be saved (NULL means no saving)

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type +(to stratify the plots)

    + +
    +
    +

    Value

    + + +

    TRUE if it ran

    +
    +
    +

    Details

    +

    Create a directory with all the plots

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotPrecisionRecall.html b/docs/reference/plotPrecisionRecall.html new file mode 100644 index 000000000..9296f567e --- /dev/null +++ b/docs/reference/plotPrecisionRecall.html @@ -0,0 +1,198 @@ + +Plot the precision-recall curve using the sparse thresholdSummary data frame — plotPrecisionRecall • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the precision-recall curve using the sparse thresholdSummary data frame

    +
    + +
    +
    plotPrecisionRecall(
    +  plpResult,
    +  typeColumn = "evaluation",
    +  saveLocation = NULL,
    +  fileName = "roc.png"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    A plp result object as generated using the runPlp function.

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the precision-recall curve using the sparse thresholdSummary data frame

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotPredictedPDF.html b/docs/reference/plotPredictedPDF.html new file mode 100644 index 000000000..f95ad0bc3 --- /dev/null +++ b/docs/reference/plotPredictedPDF.html @@ -0,0 +1,198 @@ + +Plot the Predicted probability density function, showing prediction overlap between true and false cases — plotPredictedPDF • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the Predicted probability density function, showing prediction overlap between true and false cases

    +
    + +
    +
    plotPredictedPDF(
    +  plpResult,
    +  typeColumn = "evaluation",
    +  saveLocation = NULL,
    +  fileName = "PredictedPDF.png"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    A plp result object as generated using the runPlp function.

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the predicted probability density function, showing prediction overlap between true and false cases

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotPredictionDistribution.html b/docs/reference/plotPredictionDistribution.html new file mode 100644 index 000000000..f39445b5c --- /dev/null +++ b/docs/reference/plotPredictionDistribution.html @@ -0,0 +1,199 @@ + +Plot the side-by-side boxplots of prediction distribution, by class — plotPredictionDistribution • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the side-by-side boxplots of prediction distribution, by class

    +
    + +
    +
    plotPredictionDistribution(
    +  plpResult,
    +  typeColumn = "evaluation",
    +  saveLocation = NULL,
    +  fileName = "PredictionDistribution.png"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    A plp result object as generated using the runPlp function.

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the side-by-side boxplots of prediction distribution, by class

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotPreferencePDF.html b/docs/reference/plotPreferencePDF.html new file mode 100644 index 000000000..8b26d68a7 --- /dev/null +++ b/docs/reference/plotPreferencePDF.html @@ -0,0 +1,204 @@ + +Plot the preference score probability density function, showing prediction overlap between true and false cases — plotPreferencePDF • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the preference score probability density function, showing prediction overlap between true and false cases

    +
    + +
    +
    plotPreferencePDF(
    +  plpResult,
    +  typeColumn = "evaluation",
    +  saveLocation = NULL,
    +  fileName = "plotPreferencePDF.png"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    A plp result object as generated using the runPlp function.

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the preference score probability density function, showing prediction overlap between true and false cases

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotSmoothCalibration.html b/docs/reference/plotSmoothCalibration.html new file mode 100644 index 000000000..bb6bd7cd4 --- /dev/null +++ b/docs/reference/plotSmoothCalibration.html @@ -0,0 +1,235 @@ + +Plot the smooth calibration as detailed in Van Calster et al. "A calibration hierarchy for risk models +was defined: from utopia to empirical data" (2016) — plotSmoothCalibration • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the smooth calibration as detailed in Van Calster et al. "A calibration hierarchy for risk models +was defined: from utopia to empirical data" (2016)

    +
    + +
    +
    plotSmoothCalibration(
    +  plpResult,
    +  smooth = "loess",
    +  span = 0.75,
    +  nKnots = 5,
    +  scatter = FALSE,
    +  bins = 20,
    +  sample = TRUE,
    +  typeColumn = "evaluation",
    +  saveLocation = NULL,
    +  fileName = "smoothCalibration.pdf"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    The result of running the runPlp function: an object containing the +model or the location where the model is saved, the data selection settings, the +preprocessing and training settings as well as various performance measures +obtained by the model.

    + + +
    smooth
    +

    options: 'loess' or 'rcs'

    + + +
    span
    +

    This specifies the width of span used for loess. This will allow for faster +computing and lower memory usage.

    + + +
    nKnots
    +

    The number of knots to be used by the rcs evaluation. Default is 5

    + + +
    scatter
    +

    Whether to plot the decile calibrations as points on the graph. Default is FALSE.

    + + +
    bins
    +

    The number of bins for the histogram. Default is 20.

    + + +
    sample
    +

    If using loess then by default 20,000 patients will be sampled to save time

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object.

    +
    +
    +

    Details

    +

    Create a plot showing the smoothed calibration
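
To give a feel for what a loess-smoothed calibration curve represents, here is a minimal base R sketch. It assumes simulated, perfectly calibrated predictions and uses `stats::loess` with the same `span = 0.75` shown in the usage above; it is not the package's plotting code, and the object names are hypothetical.

```r
# Sketch: smooth observed outcomes against predicted risk with loess.
set.seed(7)

# Simulated predictions that are calibrated by construction (made-up data)
n <- 3000
p <- runif(n, 0.01, 0.6)
y <- rbinom(n, 1, p)

# Smooth the binary outcome as a function of the predicted risk
fit <- loess(y ~ p, span = 0.75)

# Evaluate the smoothed observed risk on a grid of predicted risks
grid <- data.frame(p = seq(0.05, 0.55, by = 0.05))
calib <- data.frame(
  predicted = grid$p,
  observed = predict(fit, newdata = grid)
)
print(calib)
```

For a well-calibrated model the `observed` column should track `predicted` closely, which corresponds to a curve lying near the diagonal in the plot.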

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotSparseCalibration.html b/docs/reference/plotSparseCalibration.html new file mode 100644 index 000000000..cb1bc73d3 --- /dev/null +++ b/docs/reference/plotSparseCalibration.html @@ -0,0 +1,199 @@ + +Plot the calibration — plotSparseCalibration • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the calibration

    +
    + +
    +
    plotSparseCalibration(
    +  plpResult,
    +  typeColumn = "evaluation",
    +  saveLocation = NULL,
    +  fileName = "roc.png"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    A plp result object as generated using the runPlp function.

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the calibration

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotSparseCalibration2.html b/docs/reference/plotSparseCalibration2.html new file mode 100644 index 000000000..34929c48a --- /dev/null +++ b/docs/reference/plotSparseCalibration2.html @@ -0,0 +1,199 @@ + +Plot the conventional calibration — plotSparseCalibration2 • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the conventional calibration

    +
    + +
    +
    plotSparseCalibration2(
    +  plpResult,
    +  typeColumn = "evaluation",
    +  saveLocation = NULL,
    +  fileName = "roc.png"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    A plp result object as generated using the runPlp function.

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the calibration

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotSparseRoc.html b/docs/reference/plotSparseRoc.html new file mode 100644 index 000000000..37ef3c135 --- /dev/null +++ b/docs/reference/plotSparseRoc.html @@ -0,0 +1,198 @@ + +Plot the ROC curve using the sparse thresholdSummary data frame — plotSparseRoc • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the ROC curve using the sparse thresholdSummary data frame

    +
    + +
    +
    plotSparseRoc(
    +  plpResult,
    +  typeColumn = "evaluation",
    +  saveLocation = NULL,
    +  fileName = "roc.png"
    +)
    +
    + +
    +

    Arguments

    +
    plpResult
    +

    A plp result object as generated using the runPlp function.

    + + +
    typeColumn
    +

    The name of the column specifying the evaluation type

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the Receiver Operating Characteristic (ROC) curve.

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plotVariableScatterplot.html b/docs/reference/plotVariableScatterplot.html new file mode 100644 index 000000000..752c4e840 --- /dev/null +++ b/docs/reference/plotVariableScatterplot.html @@ -0,0 +1,195 @@ + +Plot the variable importance scatterplot — plotVariableScatterplot • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Plot the variable importance scatterplot

    +
    + +
    +
    plotVariableScatterplot(
    +  covariateSummary,
    +  saveLocation = NULL,
    +  fileName = "VariableScatterplot.png"
    +)
    +
    + +
    +

    Arguments

    +
    covariateSummary
    +

    A prediction object as generated using the +runPlp function.

    + + +
    saveLocation
    +

    Directory to save plot (if NULL plot is not saved)

    + + +
    fileName
    +

    Name of the file to save the plot to, for example +'plot.png'. See the function ggsave in the ggplot2 package for +supported file formats.

    + +
    +
    +

    Value

    + + +

    A ggplot object. Use the ggsave function to save to file in a different +format.

    +
    +
    +

    Details

    +

    Create a plot showing the variable importance scatterplot

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/plpDataSimulationProfile.html b/docs/reference/plpDataSimulationProfile.html new file mode 100644 index 000000000..cdf677259 --- /dev/null +++ b/docs/reference/plpDataSimulationProfile.html @@ -0,0 +1,184 @@ + +A simulation profile — plpDataSimulationProfile • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    A simulation profile

    +
    + +
    +
    data(plpDataSimulationProfile)
    +
    + +
    +

    Format

    +

    A data frame containing the following elements:

    covariatePrevalence
    +

    prevalence of all covariates

    + +
    outcomeModels
    +

    regression model parameters to simulate outcomes

    + +
    metaData
    +

    settings used to simulate the profile

    + +
    covariateRef
    +

    covariateIds and covariateNames

    + +
    timePrevalence
    +

    time window

    + +
    exclusionPrevalence
    +

    prevalence of exclusion of covariates

    + + +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/positiveLikelihoodRatio.html b/docs/reference/positiveLikelihoodRatio.html new file mode 100644 index 000000000..15d6fb14f --- /dev/null +++ b/docs/reference/positiveLikelihoodRatio.html @@ -0,0 +1,190 @@ + +Calculate the positiveLikelihoodRatio — positiveLikelihoodRatio • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the positiveLikelihoodRatio

    +
    + +
    +
    positiveLikelihoodRatio(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    positiveLikelihoodRatio value

    +
    +
    +

    Details

    +

    Calculate the positiveLikelihoodRatio
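
As a reference point, the positive likelihood ratio is conventionally sensitivity / (1 - specificity). The sketch below assumes that definition; `plrSketch` and the counts are hypothetical and independent of the package source.

```r
# Hypothetical helper illustrating the conventional definition:
# PLR = sensitivity / (1 - specificity)
plrSketch <- function(TP, TN, FN, FP) {
  sensitivity <- TP / (TP + FN)
  specificity <- TN / (TN + FP)
  sensitivity / (1 - specificity)
}

# Made-up confusion-matrix counts
plr <- plrSketch(TP = 80, TN = 90, FN = 20, FP = 10)
print(plr)
```

Values above 1 indicate that a positive prediction raises the odds of the outcome. Note that the ratio is undefined when there are no false positives (specificity of 1).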

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/positivePredictiveValue.html b/docs/reference/positivePredictiveValue.html new file mode 100644 index 000000000..8e32622a2 --- /dev/null +++ b/docs/reference/positivePredictiveValue.html @@ -0,0 +1,190 @@ + +Calculate the positivePredictiveValue — positivePredictiveValue • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the positivePredictiveValue

    +
    + +
    +
    positivePredictiveValue(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    positivePredictiveValue value

    +
    +
    +

    Details

    +

    Calculate the positivePredictiveValue
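
The positive predictive value (precision) is conventionally the fraction of positive predictions that are truly positive, TP / (TP + FP). A minimal sketch under that assumption, with a hypothetical helper name and made-up counts:

```r
# Hypothetical helper: PPV = TP / (TP + FP)
# TN and FN are accepted only to mirror the four-count signature.
ppvSketch <- function(TP, TN, FN, FP) {
  TP / (TP + FP)
}

# Made-up confusion-matrix counts
ppv <- ppvSketch(TP = 80, TN = 90, FN = 20, FP = 10)
print(ppv)
```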

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/predictCyclops.html b/docs/reference/predictCyclops.html new file mode 100644 index 000000000..8482299c3 --- /dev/null +++ b/docs/reference/predictCyclops.html @@ -0,0 +1,188 @@ + +Create predictive probabilities — predictCyclops • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create predictive probabilities

    +
    + +
    +
    predictCyclops(plpModel, data, cohort)
    +
    + +
    +

    Arguments

    +
    plpModel
    +

    An object of type predictiveModel as generated using +fitPlp.

    + + +
    data
    +

    The new plpData containing the covariateData for the new population

    + + +
    cohort
    +

    The cohort to calculate the prediction for

    + +
    +
    +

    Value

    + + +

    The value column in the result data.frame is: logistic: probabilities of the outcome, poisson: +Poisson rate (per day) of the outcome, survival: hazard rate (per day) of the outcome.

    +
    +
    +

    Details

    +

    Generates predictions for the population specified in plpData given the model.

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/predictPlp.html b/docs/reference/predictPlp.html new file mode 100644 index 000000000..1765cf0ec --- /dev/null +++ b/docs/reference/predictPlp.html @@ -0,0 +1,191 @@ + +predictPlp — predictPlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Predict the risk of the outcome using the input plpModel for the input plpData

    +
    + +
    +
    predictPlp(plpModel, plpData, population, timepoint)
    +
    + +
    +

    Arguments

    +
    plpModel
    +

    An object of type plpModel - a patient level prediction model

    + + +
    plpData
    +

    An object of type plpData - the patient level prediction +data extracted from the CDM.

    + + +
    population
    +

    The population created using createStudyPopulation() who will have their risks predicted or a cohort without the outcome known

    + + +
    timepoint
    +

    The timepoint to predict risk (survival models only)

    + +
    +
    +

    Value

    + + +

    A dataframe containing the prediction for each person in the population with an attribute metaData containing prediction details.

    +
    +
    +

    Details

    +

    The function applies the trained model to the plpData to make predictions

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/preprocessData.html b/docs/reference/preprocessData.html new file mode 100644 index 000000000..c8c042505 --- /dev/null +++ b/docs/reference/preprocessData.html @@ -0,0 +1,188 @@ + +A function that wraps around FeatureExtraction::tidyCovariateData to normalise the data +and remove rare or redundant features — preprocessData • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    A function that wraps around FeatureExtraction::tidyCovariateData to normalise the data +and remove rare or redundant features

    +
    + +
    +
    preprocessData(covariateData, preprocessSettings)
    +
    + +
    +

    Arguments

    +
    covariateData
    +

    The covariate part of the training data created by splitData after being sampled and having +any required feature engineering

    + + +
    preprocessSettings
    +

    The settings for the preprocessing created by createPreprocessSettings

    + +
    +
    +

    Value

    + + +

    The data processed

    +
    +
    +

    Details

    +

    Returns an object of class covariateData that has been processed

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/recalibratePlp.html b/docs/reference/recalibratePlp.html new file mode 100644 index 000000000..d9fb6af46 --- /dev/null +++ b/docs/reference/recalibratePlp.html @@ -0,0 +1,196 @@ + +recalibratePlp — recalibratePlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Train various models using a default parameter grid search or user-specified parameters

    +
    + +
    +
    recalibratePlp(
    +  prediction,
    +  analysisId,
    +  typeColumn = "evaluationType",
    +  method = c("recalibrationInTheLarge", "weakRecalibration")
    +)
    +
    + +
    +

    Arguments

    +
    prediction
    +

    A prediction dataframe

    + + +
    analysisId
    +

    The model analysisId

    + + +
    typeColumn
    +

    The column name where the strata types are specified

    + + +
    method
    +

    Method used to recalibrate ('recalibrationInTheLarge' or 'weakRecalibration' )

    + +
    +
    +

    Value

    + + +

    An object of class runPlp that is recalibrated on the new data
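
The two `method` values name standard recalibration ideas. The base R sketch below shows one common textbook formulation on simulated data: recalibration-in-the-large refits only the intercept on the logit scale, while weak recalibration refits both intercept and slope. It is an assumption that this resembles what recalibratePlp does internally; `recalibrateSketch` and the simulated vectors are hypothetical.

```r
# Hypothetical sketch of two textbook recalibration approaches (base R only).
recalibrateSketch <- function(risk, outcome,
                              method = c("recalibrationInTheLarge", "weakRecalibration")) {
  method <- match.arg(method)
  # Work on the logit scale; clip to avoid infinite values at 0 or 1.
  risk <- pmin(pmax(risk, 1e-8), 1 - 1e-8)
  lp <- qlogis(risk)
  if (method == "recalibrationInTheLarge") {
    # Keep the slope fixed at 1 (via the offset) and refit the intercept only.
    fit <- glm(outcome ~ 1 + offset(lp), family = binomial())
    newLp <- lp + coef(fit)[["(Intercept)"]]
  } else {
    # Refit both the intercept and the slope of the linear predictor.
    fit <- glm(outcome ~ lp, family = binomial())
    newLp <- coef(fit)[["(Intercept)"]] + coef(fit)[["lp"]] * lp
  }
  plogis(newLp)
}

set.seed(42)
trueRisk <- plogis(rnorm(2000, mean = -2))
outcome <- rbinom(2000, size = 1, prob = trueRisk)
overPredicted <- plogis(qlogis(trueRisk) + 1)  # a model that over-predicts risk

inTheLarge <- recalibrateSketch(overPredicted, outcome, "recalibrationInTheLarge")
weak <- recalibrateSketch(overPredicted, outcome, "weakRecalibration")
```

After either adjustment the mean predicted risk matches the observed outcome rate in the data used for recalibration, which is the property both methods are designed to restore.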

    +
    +
    +

    Details

    +

    The user can define the machine learning model to train (regularised logistic regression, random forest, gradient boosting machine, neural network and )

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/recalibratePlpRefit.html b/docs/reference/recalibratePlpRefit.html new file mode 100644 index 000000000..f9338fe9b --- /dev/null +++ b/docs/reference/recalibratePlpRefit.html @@ -0,0 +1,188 @@ + +recalibratePlpRefit — recalibratePlpRefit • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Train various models using a default parameter grid search or user-specified parameters

    +
    + +
    +
    recalibratePlpRefit(plpModel, newPopulation, newData)
    +
    + +
    +

    Arguments

    +
    plpModel
    +

    The trained plpModel (runPlp$model)

    + + +
    newPopulation
    +

    The population created using createStudyPopulation() who will have their risks predicted

    + + +
    newData
    +

    An object of type plpData - the patient level prediction data extracted from the CDM.

    + +
    +
    +

    Value

    + + +

    An object of class runPlp that is recalibrated on the new data

    +
    +
    +

    Details

    +

    The user can define the machine learning model to train (regularised logistic regression, random forest, gradient boosting machine, neural network and )

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/runMultiplePlp.html b/docs/reference/runMultiplePlp.html new file mode 100644 index 000000000..5e36a1c9f --- /dev/null +++ b/docs/reference/runMultiplePlp.html @@ -0,0 +1,213 @@ + +Run a list of predictions analyses — runMultiplePlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Run a list of prediction analyses

    +
    + +
    +
    runMultiplePlp(
    +  databaseDetails = createDatabaseDetails(),
    +  modelDesignList = list(createModelDesign(targetId = 1, outcomeId = 2, modelSettings =
    +    setLassoLogisticRegression()), createModelDesign(targetId = 1, outcomeId = 3,
    +    modelSettings = setLassoLogisticRegression())),
    +  onlyFetchData = F,
    +  cohortDefinitions = NULL,
    +  logSettings = createLogSettings(verbosity = "DEBUG", timeStamp = T, logName =
    +    "runPlp Log"),
    +  saveDirectory = getwd(),
    +  sqliteLocation = file.path(saveDirectory, "sqlite")
    +)
    +
    + +
    +

    Arguments

    +
    databaseDetails
    +

    The database settings created using createDatabaseDetails()

    + + +
    modelDesignList
    +

    A list of model designs created using createModelDesign()

    + + +
    onlyFetchData
    +

    Only fetches and saves the data object to the output folder without running the analysis.

    + + +
    cohortDefinitions
    +

    A list of cohort definitions for the target and outcome cohorts

    + + +
    logSettings
    +

    The setting specifying the logging for the analyses created using createLogSettings()

    + + +
    saveDirectory
    +

    Name of the folder where all the outputs will be written.

    + + +
    sqliteLocation
    +

    (optional) The location of the sqlite database with the results

    + +
    +
    +

    Value

    + + +

    A data frame with the following columns:

    • analysisId: The unique identifier for a set of analysis choices.
    • targetId: The ID of the target cohort population.
    • outcomeId: The ID of the outcome cohort.
    • dataLocation: The location where the plpData was saved.
    • the settings ids: The IDs for all other settings used for model development.
    +
    +

    Details

    +

    This function will run all specified predictions as defined using .

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/runPlp.html b/docs/reference/runPlp.html new file mode 100644 index 000000000..2eb237596 --- /dev/null +++ b/docs/reference/runPlp.html @@ -0,0 +1,279 @@ + +runPlp - Develop and internally evaluate a model using specified settings — runPlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This provides a general framework for training patient level prediction models. The user can select from various default feature selection methods or incorporate their own, and can likewise select from a range of default classifiers or incorporate their own. There are three types of evaluation for the model: patient (randomly splits people into train/validation sets), year (splits data into train/validation sets based on index year, with older data in training and newer data in validation), or both (the same as year splitting, but checks that no patients overlap between the training set and the validation set; any overlaps are removed from the validation set).

    +
    + +
    +
    runPlp(
    +  plpData,
    +  outcomeId = plpData$metaData$call$outcomeIds[1],
    +  analysisId = paste(Sys.Date(), plpData$metaData$call$outcomeIds[1], sep = "-"),
    +  analysisName = "Study details",
    +  populationSettings = createStudyPopulationSettings(),
    +  splitSettings = createDefaultSplitSetting(type = "stratified", testFraction = 0.25,
    +    trainFraction = 0.75, splitSeed = 123, nfold = 3),
    +  sampleSettings = createSampleSettings(type = "none"),
    +  featureEngineeringSettings = createFeatureEngineeringSettings(type = "none"),
    +  preprocessSettings = createPreprocessSettings(minFraction = 0.001, normalize = T),
    +  modelSettings = setLassoLogisticRegression(),
    +  logSettings = createLogSettings(verbosity = "DEBUG", timeStamp = T, logName =
    +    "runPlp Log"),
    +  executeSettings = createDefaultExecuteSettings(),
    +  saveDirectory = getwd()
    +)
    +
    + +
    +

    Arguments

    +
    plpData
    +

    An object of type plpData - the patient level prediction data extracted from the CDM. Can also include an initial population as plpData$population.

    + + +
    outcomeId
    +

    (integer) The ID of the outcome.

    + + +
    analysisId
    +

    (integer) Identifier for the analysis. It is used to create, e.g., the result folder. Default is a timestamp.

    + + +
    analysisName
    +

    (character) Name for the analysis

    + + +
    populationSettings
    +

    An object of type populationSettings created using createStudyPopulationSettings that specifies how the data class labels are defined and, additionally, any exclusions to apply to the plpData cohort

    + + +
    splitSettings
    +

    An object of type splitSettings that specifies how to split the data into train/validation/test. The default settings can be created using createDefaultSplitSetting.

    + + +
    sampleSettings
    +

    An object of type sampleSettings that specifies any under/over sampling to be done. The default is none.

    + + +
    featureEngineeringSettings
    +

    An object of featureEngineeringSettings specifying any feature engineering to be learned (using the train data)

    + + +
    preprocessSettings
    +

    An object of preprocessSettings. This setting specifies the minimum fraction of the target population who must have a covariate for it to be included in the model training and whether to normalise the covariates before training

    + + +
    modelSettings
    +

    An object of class modelSettings created using one of the functions:

    • setLassoLogisticRegression() A lasso logistic regression model

    • +
    • setGradientBoostingMachine() A gradient boosting machine

    • +
    • setAdaBoost() An ada boost model

    • +
    • setRandomForest() A random forest model

    • +
    • setDecisionTree() A decision tree model

    • +
    • setKNN() A KNN model

    • +
    + + +
    logSettings
    +

    An object of logSettings created using createLogSettings specifying how the logging is done

    + + +
    executeSettings
    +

    An object of executeSettings specifying which parts of the analysis to run

    + + +
    saveDirectory
    +

    The path to the directory where the results will be saved (if NULL uses working directory)

    + +
    +
    +

    Value

    + + +

    An object containing the following:

    +

    +
    • model The developed model of class plpModel

    • +
    • executionSummary A list containing the hardware details, R package details and execution time

    • +
    • performanceEvaluation Various internal performance metrics in sparse format

    • +
    • prediction The plpData cohort table with the predicted risks added as a column (named value)

    • +
    • covariateSummary A characterization of the features for patients with and without the outcome during the time at risk

    • +
    • analysisRef A list with details about the analysis

    • +
    +
    +

    Details

    +

    This function takes as input the plpData extracted from an OMOP CDM database and follows the specified settings to develop and internally validate a model for the specified outcomeId.
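
The default `splitSettings` in the usage above describe a stratified split with a 25% test set and 3 cross-validation folds in the remaining data. The base R sketch below illustrates that general technique on simulated labels. It is not the code behind createDefaultSplitSetting; `stratifiedSplitSketch` is a hypothetical helper, and the index coding (-1 for test, 1..nfold for training folds) is a convention assumed here for illustration.

```r
# Hypothetical sketch of a stratified train/test split with fold assignment.
stratifiedSplitSketch <- function(outcome, testFraction = 0.25, nfold = 3, splitSeed = 123) {
  set.seed(splitSeed)
  index <- integer(length(outcome))  # -1 = test, 1..nfold = training fold
  for (label in unique(outcome)) {
    rows <- which(outcome == label)
    rows <- rows[sample.int(length(rows))]  # shuffle within the stratum
    nTest <- round(length(rows) * testFraction)
    isTest <- seq_along(rows) <= nTest
    index[rows[isTest]] <- -1L
    # Spread the remaining rows of this stratum evenly across the folds.
    index[rows[!isTest]] <- rep_len(seq_len(nfold), sum(!isTest))
  }
  index
}

outcome <- rep(c(0, 1), times = c(900, 100))
index <- stratifiedSplitSketch(outcome)
table(index, outcome)
```

Splitting within each outcome class keeps the outcome rate the same in the test set and in every fold, which matters when outcomes are rare.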

    +
    + +
    +

    Examples

    + +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/savePlpAnalysesJson.html b/docs/reference/savePlpAnalysesJson.html new file mode 100644 index 000000000..675b05e57 --- /dev/null +++ b/docs/reference/savePlpAnalysesJson.html @@ -0,0 +1,200 @@ + +Save the modelDesignList to a json file — savePlpAnalysesJson • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Save the modelDesignList to a json file

    +
    + +
    +
    savePlpAnalysesJson(
    +  modelDesignList = list(createModelDesign(targetId = 1, outcomeId = 2, modelSettings =
    +    setLassoLogisticRegression()), createModelDesign(targetId = 1, outcomeId = 3,
    +    modelSettings = setLassoLogisticRegression())),
    +  cohortDefinitions = NULL,
    +  saveDirectory = NULL
    +)
    +
    + +
    +

    Arguments

    +
    modelDesignList
    +

    A list of modelDesigns created using createModelDesign()

    + + +
    cohortDefinitions
    +

    A list of the cohortDefinitions (generally extracted from ATLAS)

    + + +
    saveDirectory
    +

    The directory to save the modelDesignList settings

    + +
    +
    +

    Details

    +

    This function creates a json file with the modelDesignList saved

    +
    + +
    +

    Examples

    +
    if (FALSE) {
    +savePlpAnalysesJson(
    +modelDesignList = list(
    +createModelDesign(targetId = 1, outcomeId = 2, modelSettings = setLassoLogisticRegression()), 
    +createModelDesign(targetId = 1, outcomeId = 3, modelSettings = setLassoLogisticRegression())
    +),
    +saveDirectory = 'C:/bestModels'
    +)
    +}
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/savePlpData.html b/docs/reference/savePlpData.html new file mode 100644 index 000000000..54a2af0ae --- /dev/null +++ b/docs/reference/savePlpData.html @@ -0,0 +1,193 @@ + +Save the cohort data to folder — savePlpData • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    savePlpData saves an object of type plpData to folder.

    +
    + +
    +
    savePlpData(plpData, file, envir = NULL, overwrite = F)
    +
    + +
    +

    Arguments

    +
    plpData
    +

    An object of type plpData as generated using getPlpData.

    + + +
    file
    +

    The name of the folder where the data will be written. The folder should not yet exist.

    + + +
    envir
    +

    The environment in which to evaluate variables when saving

    + + +
    overwrite
    +

    Whether to force overwrite an existing file

    + +
    +
    +

    Details

    +

    The data will be written to a set of files in the folder specified by the user.

    +
    + +
    +

    Examples

    +
    # todo
    +
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/savePlpModel.html b/docs/reference/savePlpModel.html new file mode 100644 index 000000000..786674942 --- /dev/null +++ b/docs/reference/savePlpModel.html @@ -0,0 +1,176 @@ + +Saves the plp model — savePlpModel • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Saves the plp model

    +
    + +
    +
    savePlpModel(plpModel, dirPath)
    +
    + +
    +

    Arguments

    +
    plpModel
    +

    A trained classifier returned by running runPlp()$model

    + + +
    dirPath
    +

    A location to save the model to

    + +
    +
    +

    Details

    +

    Saves the plp model to a user-specified folder

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/savePlpResult.html b/docs/reference/savePlpResult.html new file mode 100644 index 000000000..52064bba2 --- /dev/null +++ b/docs/reference/savePlpResult.html @@ -0,0 +1,176 @@ + +Saves the result from runPlp into the location directory — savePlpResult • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Saves the result from runPlp into the location directory

    +
    + +
    +
    savePlpResult(result, dirPath)
    +
    + +
    +

    Arguments

    +
    result
    +

    The result of running runPlp()

    + + +
    dirPath
    +

    The directory to save the csv

    + +
    +
    +

    Details

    +

    Saves the result from runPlp into the location directory

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/savePlpShareable.html b/docs/reference/savePlpShareable.html new file mode 100644 index 000000000..828518125 --- /dev/null +++ b/docs/reference/savePlpShareable.html @@ -0,0 +1,180 @@ + +Save the plp result as json files and csv files for transparent sharing — savePlpShareable • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Save the plp result as json files and csv files for transparent sharing

    +
    + +
    +
    savePlpShareable(result, saveDirectory, minCellCount = 10)
    +
    + +
    +

    Arguments

    +
    result
    +

    An object of class runPlp with development or validation results

    + + +
    saveDirectory
    +

    The directory in which to save the results as csv files

    + + +
    minCellCount
    +

    Minimum cell count for the covariateSummary and certain evaluation results

    + +
    +
    +

    Details

    +

    Saves the main results json/csv files (these files can be read by the shiny app)
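
The `minCellCount` argument reflects a common privacy practice: counts that are positive but below a threshold are suppressed before results are shared. The base R sketch below illustrates that idea only. How the package itself marks suppressed cells is not stated here, so replacing them with NA is an assumption, and `censorSmallCounts` is a hypothetical helper.

```r
# Hypothetical sketch of small-cell suppression; the NA marker is an assumption.
censorSmallCounts <- function(counts, minCellCount = 10) {
  censored <- counts
  tooSmall <- !is.na(counts) & counts > 0 & counts < minCellCount
  censored[tooSmall] <- NA
  censored
}

covariateCounts <- c(0, 3, 9, 10, 250)
censoredCounts <- censorSmallCounts(covariateCounts)
```

Zero counts are left untouched in this sketch because they do not identify anyone; only the values 3 and 9 are suppressed with the default threshold of 10.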

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/savePrediction.html b/docs/reference/savePrediction.html new file mode 100644 index 000000000..6ff5a6ff3 --- /dev/null +++ b/docs/reference/savePrediction.html @@ -0,0 +1,180 @@ + +Saves the prediction dataframe to RDS — savePrediction • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Saves the prediction dataframe to RDS

    +
    + +
    +
    savePrediction(prediction, dirPath, fileName = "prediction.rds")
    +
    + +
    +

    Arguments

    +
    prediction
    +

    The prediction data.frame

    + + +
    dirPath
    +

    The directory to save the prediction RDS

    + + +
    fileName
    +

    The name of the RDS file that will be saved in dirPath

    + +
    +
    +

    Details

    +

    Saves the prediction data frame returned by predict.R to an RDS file and returns the fileLocation where the prediction is saved
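
Going by the description above, the behaviour amounts to writing the data frame with saveRDS and returning the resulting path. A minimal base R sketch follows; `savePredictionSketch` is a hypothetical name rather than the package function, and creating a missing directory is an assumption.

```r
# Hypothetical stand-alone sketch; not the package's own implementation.
savePredictionSketch <- function(prediction, dirPath, fileName = "prediction.rds") {
  if (!dir.exists(dirPath)) {
    dir.create(dirPath, recursive = TRUE)  # assumption: create missing folders
  }
  fileLocation <- file.path(dirPath, fileName)
  saveRDS(prediction, file = fileLocation)
  fileLocation  # return where the prediction was written
}

prediction <- data.frame(rowId = 1:3, value = c(0.10, 0.45, 0.80))
location <- savePredictionSketch(prediction, file.path(tempdir(), "plpSketch"))
restored <- readRDS(location)
```

Returning the file location lets the caller pass it straight to readRDS later, as the last line shows.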

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/sensitivity.html b/docs/reference/sensitivity.html new file mode 100644 index 000000000..512fa41d5 --- /dev/null +++ b/docs/reference/sensitivity.html @@ -0,0 +1,190 @@ + +Calculate the sensitivity — sensitivity • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the sensitivity

    +
    + +
    +
    sensitivity(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    sensitivity value

    +
    +
    +

    Details

    +

    Calculate the sensitivity
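
As a worked illustration, sensitivity (the true positive rate) is conventionally TP / (TP + FN). The sketch below is a stand-alone base R version under that standard definition. `sensitivitySketch` is a hypothetical name, not the package function, and returning NA when there are no positives is an assumption.

```r
# Hypothetical stand-alone sketch; not the package's own implementation.
# TN and FP are accepted only to mirror the four-argument signature above;
# they do not enter the calculation.
sensitivitySketch <- function(TP, TN, FN, FP) {
  stopifnot(is.numeric(TP), is.numeric(FN), TP >= 0, FN >= 0)
  denominator <- TP + FN
  if (denominator == 0) {
    return(NA_real_)  # assumption: undefined when there are no positives
  }
  TP / denominator
}

sens <- sensitivitySketch(TP = 80, TN = 900, FN = 20, FP = 50)
```

With 80 true positives and 20 false negatives, 80 of the 100 people who had the outcome were identified, giving a sensitivity of 0.8.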

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setAdaBoost.html b/docs/reference/setAdaBoost.html new file mode 100644 index 000000000..2ad4c4aea --- /dev/null +++ b/docs/reference/setAdaBoost.html @@ -0,0 +1,195 @@ + +Create setting for AdaBoost with python DecisionTreeClassifier base estimator — setAdaBoost • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for AdaBoost with python DecisionTreeClassifier base estimator

    +
    + +
    +
    setAdaBoost(
    +  nEstimators = list(10, 50, 200),
    +  learningRate = list(1, 0.5, 0.1),
    +  algorithm = list("SAMME.R"),
    +  seed = sample(1e+06, 1)
    +)
    +
    + +
    +

    Arguments

    +
    nEstimators
    +

    (list) The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early.

    + + +
    learningRate
    +

    (list) Weight applied to each classifier at each boosting iteration. A higher learning rate increases the contribution of each classifier. There is a trade-off between the learningRate and nEstimators parameters.

    + + +
    algorithm
    +

    (list) If ‘SAMME.R’ then use the SAMME.R real boosting algorithm. base_estimator must support calculation of class probabilities. If ‘SAMME’ then use the SAMME discrete boosting algorithm. The SAMME.R algorithm typically converges faster than SAMME, achieving a lower test error with fewer boosting iterations.

    + + +
    seed
    +

    A seed for the model
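
Because each hyperparameter is supplied as a list of candidate values, the defaults in the usage above imply a grid of combinations. The base R sketch below simply enumerates that grid. How the package expands and searches the grid internally is not shown in this page, so treat this as an illustration under that assumption.

```r
# Candidate values copied from the defaults in the usage above.
nEstimators <- list(10, 50, 200)
learningRate <- list(1, 0.5, 0.1)
algorithm <- list("SAMME.R")

# Enumerate every combination (illustrative; not the package's internal code).
paramGrid <- expand.grid(
  nEstimators = unlist(nEstimators),
  learningRate = unlist(learningRate),
  algorithm = unlist(algorithm),
  stringsAsFactors = FALSE
)
nrow(paramGrid)
```

Three values for nEstimators, three for learningRate and one for algorithm give 3 x 3 x 1 = 9 candidate settings, so adding values to any list multiplies the number of models to fit.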

    + +
    + +
    +

    Examples

    +
    if (FALSE) {
    +model.adaBoost <- setAdaBoost(nEstimators = list(10,50,200), learningRate = list(1, 0.5, 0.1),
    +                              algorithm = list('SAMME.R'), seed = sample(1000000,1)
    +                              )
    +}
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setCoxModel.html b/docs/reference/setCoxModel.html new file mode 100644 index 000000000..072cb424c --- /dev/null +++ b/docs/reference/setCoxModel.html @@ -0,0 +1,215 @@ + +Create setting for lasso Cox model — setCoxModel • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for lasso Cox model

    +
    + +
    +
    setCoxModel(
    +  variance = 0.01,
    +  seed = NULL,
    +  includeCovariateIds = c(),
    +  noShrinkage = c(),
    +  threads = -1,
    +  upperLimit = 20,
    +  lowerLimit = 0.01,
    +  tolerance = 2e-07,
    +  maxIterations = 3000
    +)
    +
    + +
    +

    Arguments

    +
    variance
    +

    Numeric: prior distribution starting variance

    + + +
    seed
    +

    An option to add a seed when training the model

    + + +
    includeCovariateIds
    +

    A set of covariate IDs to limit the analysis to

    + + +
    noShrinkage
    +

    A set of covariates which are to be forced to be included in the final model. Default is the intercept

    + + +
    threads
    +

    An option to set number of threads when training model

    + + +
    upperLimit
    +

    Numeric: Upper prior variance limit for grid-search

    + + +
    lowerLimit
    +

    Numeric: Lower prior variance limit for grid-search

    + + +
    tolerance
    +

    Numeric: maximum relative change in convergence criterion from successive iterations to achieve convergence

    + + +
    maxIterations
    +

    Integer: maximum iterations of Cyclops to attempt before returning a failed-to-converge error

    + +
    + +
    +

    Examples

    +
    model.lr <- setCoxModel()
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setDecisionTree.html b/docs/reference/setDecisionTree.html new file mode 100644 index 000000000..1a6f0906f --- /dev/null +++ b/docs/reference/setDecisionTree.html @@ -0,0 +1,227 @@ + +Create setting for the scikit-learn 1.0.1 DecisionTree with python — setDecisionTree • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for the scikit-learn 1.0.1 DecisionTree with python

    +
    + +
    +
    setDecisionTree(
    +  criterion = list("gini"),
    +  splitter = list("best"),
    +  maxDepth = list(as.integer(4), as.integer(10), NULL),
    +  minSamplesSplit = list(2, 10),
    +  minSamplesLeaf = list(10, 50),
    +  minWeightFractionLeaf = list(0),
    +  maxFeatures = list(100, "sqrt", NULL),
    +  maxLeafNodes = list(NULL),
    +  minImpurityDecrease = list(10^-7),
    +  classWeight = list(NULL),
    +  seed = sample(1e+06, 1)
    +)
    +
    + +
    +

    Arguments

    +
    criterion
    +

    The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.

    + + +
    splitter
    +

    The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

    + + +
    maxDepth
    +

    (list) The maximum depth of the tree. If NULL, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

    + + +
    minSamplesSplit
    +

    The minimum number of samples required to split an internal node

    + + +
    minSamplesLeaf
    +

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least minSamplesLeaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    + + +
    minWeightFractionLeaf
    +

    The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sampleWeight is not provided.

    + + +
    maxFeatures
    +

    (list) The number of features to consider when looking for the best split (int/'sqrt'/NULL)

    + + +
    maxLeafNodes
    +

    (list) Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes. (int/NULL)

    + + +
    minImpurityDecrease
    +

    Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.

    + + +
    classWeight
    +

    (list) Weights associated with classes 'balance' or NULL

    + + +
    seed
    +

    The random state seed

    + +
    + +
    +

    Examples

    +
    if (FALSE) {
    +model.decisionTree <- setDecisionTree(maxDepth = list(as.integer(10)), minSamplesLeaf = list(10), seed = NULL)
    +}
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setGradientBoostingMachine.html b/docs/reference/setGradientBoostingMachine.html new file mode 100644 index 000000000..125a8d43b --- /dev/null +++ b/docs/reference/setGradientBoostingMachine.html @@ -0,0 +1,222 @@ + +Create setting for gradient boosting machine model using gbm_xgboost implementation — setGradientBoostingMachine • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for gradient boosting machine model using gbm_xgboost implementation

    +
    + +
    +
    setGradientBoostingMachine(
    +  ntrees = c(100, 300),
    +  nthread = 20,
    +  earlyStopRound = 25,
    +  maxDepth = c(4, 6, 8),
    +  minChildWeight = 1,
    +  learnRate = c(0.05, 0.1, 0.3),
    +  scalePosWeight = 1,
    +  lambda = 1,
    +  alpha = 0,
    +  seed = sample(1e+07, 1)
    +)
    +
    + +
    +

    Arguments

    +
    ntrees
    +

    The number of trees to build

    + + +
    nthread
    +

    The number of computer threads to use (how many cores do you have?)

    + + +
    earlyStopRound
    +

    If the performance does not increase over earlyStopRound number of trees then training stops (this prevents overfitting)

    + + +
    maxDepth
    +

    Maximum depth of each tree - a large value will lead to slow model training

    + + +
    minChildWeight
    +

    Minimum sum of instance weights in a child node - larger values are more conservative

    + + +
    learnRate
    +

    The boosting learn rate

    + + +
    scalePosWeight
    +

    Controls weight of positive class in loss - useful for imbalanced classes

    + + +
    lambda
    +

    L2 regularization on weights - larger is more conservative

    + + +
    alpha
    +

    L1 regularization on weights - larger is more conservative

    + + +
    seed
    +

    An option to add a seed when training the final model
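
The `earlyStopRound` rule described above can be pictured as follows: training stops once the validation metric has failed to improve for that many consecutive rounds. The function below is a hypothetical base R sketch of that rule, assuming a metric where higher is better. It does not show how the package invokes xgboost.

```r
# Hypothetical sketch: find the round at which training would stop.
earlyStopSketch <- function(validationMetric, earlyStopRound) {
  bestValue <- -Inf
  bestRound <- 0
  for (i in seq_along(validationMetric)) {
    if (validationMetric[i] > bestValue) {
      bestValue <- validationMetric[i]
      bestRound <- i
    } else if (i - bestRound >= earlyStopRound) {
      # No improvement for earlyStopRound consecutive rounds: stop here.
      return(list(stoppedAt = i, bestRound = bestRound))
    }
  }
  list(stoppedAt = length(validationMetric), bestRound = bestRound)
}

auc <- c(0.70, 0.74, 0.76, 0.755, 0.758, 0.759, 0.75)
result <- earlyStopSketch(auc, earlyStopRound = 3)
```

Here the best value occurs at round 3 and the next three rounds do not beat it, so training stops at round 6 and the seventh round is never run.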

    + +
    + +
    +

    Examples

    +
    model.gbm <- setGradientBoostingMachine(ntrees=c(10,100), nthread=20,
    +                           maxDepth=c(4,6), learnRate=c(0.1,0.3))
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setIterativeHardThresholding.html b/docs/reference/setIterativeHardThresholding.html new file mode 100644 index 000000000..6a234802c --- /dev/null +++ b/docs/reference/setIterativeHardThresholding.html @@ -0,0 +1,225 @@ + +Create setting for lasso logistic regression — setIterativeHardThresholding • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for iterative hard thresholding model

    +
    + +
    +
    setIterativeHardThresholding(
    +  K = 10,
    +  penalty = "bic",
    +  seed = sample(1e+05, 1),
    +  exclude = c(),
    +  forceIntercept = F,
    +  fitBestSubset = FALSE,
    +  initialRidgeVariance = 10000,
    +  tolerance = 1e-08,
    +  maxIterations = 10000,
    +  threshold = 1e-06,
    +  delta = 0
    +)
    +
    + +
    +

    Arguments

    +
    K
    +

    The maximum number of non-zero predictors

    + + +
    penalty
    +

    Specifies the IHT penalty; possible values are `BIC` or `AIC` or a numeric value

    + + +
    seed
    +

    An option to add a seed when training the model

    + + +
    exclude
    +

    A vector of numbers or covariateId names to exclude from prior

    + + +
    forceIntercept
    +

    Logical: Force intercept coefficient into regularization

    + + +
    fitBestSubset
    +

    Logical: Fit final subset with no regularization

    + + +
    initialRidgeVariance
    +

    integer

    + + +
    tolerance
    +

    numeric

    + + +
    maxIterations
    +

    integer

    + + +
    threshold
    +

    numeric

    + + +
    delta
    +

    numeric

    + +
    + +
    +

    Examples

    + +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setKNN.html b/docs/reference/setKNN.html new file mode 100644 index 000000000..e3025432b --- /dev/null +++ b/docs/reference/setKNN.html @@ -0,0 +1,183 @@ + +Create setting for knn model — setKNN • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for knn model

    +
    + +
    +
    setKNN(k = 1000, indexFolder = file.path(getwd(), "knn"), threads = 1)
    +
    + +
    +

    Arguments

    +
    k
    +

    The number of neighbors to consider

    + + +
    indexFolder
    +

    The directory where the results and intermediate steps are output

    + + +
    threads
    +

    The number of threads to use when applying big knn

    + +
    + +
    +

    Examples

    +
    if (FALSE) {
    +model.knn <- setKNN(k=10000)
    +}
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setLassoLogisticRegression.html b/docs/reference/setLassoLogisticRegression.html new file mode 100644 index 000000000..05f3b0a0b --- /dev/null +++ b/docs/reference/setLassoLogisticRegression.html @@ -0,0 +1,225 @@ + +Create setting for lasso logistic regression — setLassoLogisticRegression • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for lasso logistic regression

    +
    + +
    +
    setLassoLogisticRegression(
    +  variance = 0.01,
    +  seed = NULL,
    +  includeCovariateIds = c(),
    +  noShrinkage = c(0),
    +  threads = -1,
    +  forceIntercept = F,
    +  upperLimit = 20,
    +  lowerLimit = 0.01,
    +  tolerance = 2e-06,
    +  maxIterations = 3000,
    +  priorCoefs = NULL
    +)
    +
    + +
    +

    Arguments

    +
    variance
    +

    Numeric: prior distribution starting variance

    + + +
    seed
    +

    An option to add a seed when training the model

    + + +
    includeCovariateIds
    +

    A set of covariate IDs to limit the analysis to

    + + +
    noShrinkage
    +

    A set of covariates which are to be forced to be included in the final model. Default is the intercept

    + + +
    threads
    +

    An option to set number of threads when training model

    + + +
    forceIntercept
    +

    Logical: Force intercept coefficient into prior

    + + +
    upperLimit
    +

    Numeric: Upper prior variance limit for grid-search

    + + +
    lowerLimit
    +

    Numeric: Lower prior variance limit for grid-search

    + + +
    tolerance
    +

    Numeric: maximum relative change in convergence criterion from successive iterations to achieve convergence

    + + +
    maxIterations
    +

    Integer: maximum iterations of Cyclops to attempt before returning a failed-to-converge error

    + + +
    priorCoefs
    +

    Use coefficients from a previous model as starting points for model fit (transfer learning)

    + +
    + +
    +

    Examples

    +
    model.lr <- setLassoLogisticRegression()
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setLightGBM.html b/docs/reference/setLightGBM.html new file mode 100644 index 000000000..d15720ee7 --- /dev/null +++ b/docs/reference/setLightGBM.html @@ -0,0 +1,234 @@ + +Create setting for gradient boosting machine model using lightGBM (https://github.com/microsoft/LightGBM/tree/master/R-package). — setLightGBM • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for gradient boosting machine model using lightGBM (https://github.com/microsoft/LightGBM/tree/master/R-package).

    +
    + +
    +
    setLightGBM(
    +  nthread = 20,
    +  earlyStopRound = 25,
    +  numIterations = c(100),
    +  numLeaves = c(31),
    +  maxDepth = c(5, 10),
    +  minDataInLeaf = c(20),
    +  learningRate = c(0.05, 0.1, 0.3),
    +  lambdaL1 = c(0),
    +  lambdaL2 = c(0),
    +  scalePosWeight = 1,
    +  isUnbalance = FALSE,
    +  seed = sample(1e+07, 1)
    +)
    +
    + +
    +

    Arguments

    +
    nthread
    +

    The number of computer threads to use (how many cores do you have?)

    + + +
    earlyStopRound
    +

    If the performance does not increase over earlyStopRound number of trees then training stops (this prevents overfitting)

    + + +
    numIterations
    +

    Number of boosting iterations.

    + + +
    numLeaves
    +

    This hyperparameter sets the maximum number of leaves. Increasing this parameter can lead to higher model complexity and potential overfitting.

    + + +
    maxDepth
    +

    This hyperparameter sets the maximum depth of each tree. Increasing this parameter can also lead to higher model complexity and potential overfitting.

    + + +
    minDataInLeaf
    +

    This hyperparameter sets the minimum number of data points that must be present in a leaf node. Increasing this parameter can help to reduce overfitting

    + + +
    learningRate
    +

    This hyperparameter controls the step size at each iteration of the gradient descent algorithm. Lower values can lead to slower convergence but may result in better performance.

    + + +
    lambdaL1
    +

    This hyperparameter controls L1 regularization, which can help to reduce overfitting by encouraging sparse models.

    + + +
    lambdaL2
    +

    This hyperparameter controls L2 regularization, which can also help to reduce overfitting by discouraging large weights in the model.

    + + +
    scalePosWeight
    +

    Controls weight of positive class in loss - useful for imbalanced classes

    + + +
    isUnbalance
    +

    This parameter cannot be used at the same time with scalePosWeight, choose only one of them. While enabling this should increase the overall performance metric of your model, it will also result in poor estimates of the individual class probabilities.

    + + +
    seed
    +

    An option to add a seed when training the final model

    + +
    + +
    +

    Examples

    +
    model.lightgbm <- setLightGBM(
    +    numLeaves = c(20, 31, 50), maxDepth = c(-1, 5, 10),
    +    minDataInLeaf = c(10, 20, 30), learningRate = c(0.05, 0.1, 0.3)
    +)
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setMLP.html b/docs/reference/setMLP.html new file mode 100644 index 000000000..f7f77fb07 --- /dev/null +++ b/docs/reference/setMLP.html @@ -0,0 +1,281 @@ + +Create setting for neural network model with python — setMLP • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for neural network model with python

    +
    + +
    +
    setMLP(
    +  hiddenLayerSizes = list(c(100), c(20)),
    +  activation = list("relu"),
    +  solver = list("adam"),
    +  alpha = list(0.3, 0.01, 1e-04, 1e-06),
    +  batchSize = list("auto"),
    +  learningRate = list("constant"),
    +  learningRateInit = list(0.001),
    +  powerT = list(0.5),
    +  maxIter = list(200, 100),
    +  shuffle = list(TRUE),
    +  tol = list(1e-04),
    +  warmStart = list(TRUE),
    +  momentum = list(0.9),
    +  nesterovsMomentum = list(TRUE),
    +  earlyStopping = list(FALSE),
    +  validationFraction = list(0.1),
    +  beta1 = list(0.9),
    +  beta2 = list(0.999),
    +  epsilon = list(1e-08),
    +  nIterNoChange = list(10),
    +  seed = sample(1e+05, 1)
    +)
    +
    + +
    +

    Arguments

    +
    hiddenLayerSizes
    +

    (list of vectors) The ith element represents the number of neurons in the ith hidden layer.

    + + +
    activation
    +

    (list) Activation function for the hidden layer.

    • "identity": no-op activation, useful to implement linear bottleneck, returns f(x) = x

    • +
    • "logistic": the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).

    • +
    • "tanh": the hyperbolic tan function, returns f(x) = tanh(x).

    • +
    • "relu": the rectified linear unit function, returns f(x) = max(0, x)

    • +
    + + +
    solver
    +

    (list) The solver for weight optimization. (‘lbfgs’, ‘sgd’, ‘adam’)

    + + +
    alpha
    +

    (list) L2 penalty (regularization term) parameter.

    + + +
    batchSize
    +

    (list) Size of minibatches for stochastic optimizers. If the solver is ‘lbfgs’, the classifier will not use minibatch. When set to “auto”, batchSize=min(200, n_samples).

    + + +
    learningRate
    +

    (list) Only used when solver='sgd'. Learning rate schedule for weight updates: ‘constant’, ‘invscaling’ or ‘adaptive’, default=’constant’

    + + +
    learningRateInit
    +

    (list) Only used when solver=’sgd’ or ‘adam’. The initial learning rate used. It controls the step-size in updating the weights.

    + + +
    powerT
    +

    (list) Only used when solver=’sgd’. The exponent for inverse scaling learning rate. It is used in updating effective learning rate when the learning_rate is set to ‘invscaling’.

    + + +
    maxIter
    +

    (list) Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

    + + +
    shuffle
    +

    (list) boolean: Whether to shuffle samples in each iteration. Only used when solver=’sgd’ or ‘adam’.

    + + +
    tol
    +

    (list) Tolerance for the optimization. When the loss or score is not improving by at least tol for nIterNoChange consecutive iterations, unless learning_rate is set to ‘adaptive’, convergence is considered to be reached and training stops.

    + + +
    warmStart
    +

    (list) When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.

    + + +
    momentum
    +

    (list) Momentum for gradient descent update. Should be between 0 and 1. Only used when solver=’sgd’.

    + + +
    nesterovsMomentum
    +

    (list) Whether to use Nesterov’s momentum. Only used when solver=’sgd’ and momentum > 0.

    + + +
    earlyStopping
    +

    (list) boolean Whether to use early stopping to terminate training when validation score is not improving. If set to true, it will automatically set aside 10 percent of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs.

    + + +
    validationFraction
    +

    (list) The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if earlyStopping is True.

    + + +
    beta1
    +

    (list) Exponential decay rate for estimates of first moment vector in adam, should be in 0 to 1.

    + + +
    beta2
    +

    (list) Exponential decay rate for estimates of second moment vector in adam, should be in 0 to 1.

    + + +
    epsilon
    +

    (list) Value for numerical stability in adam.

    + + +
    nIterNoChange
    +

    (list) Maximum number of epochs to not meet tol improvement. Only effective when solver=’sgd’ or ‘adam’.

    + + +
    seed
    +

    A seed for the model
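
    Most of the arguments above are lists of candidate values. As a rough sketch, assuming the candidates are expanded into a full Cartesian grid (which the list-valued arguments suggest but this page does not state), the base-R snippet below counts how many settings three of the documented defaults would produce. The `grid` object is purely illustrative and is not created by the package.

```r
# Illustration only: count the combinations for three of the default
# candidate lists, assuming a full Cartesian grid over them.
grid <- expand.grid(
  hiddenLayerSizes = c(100, 20),           # two single-layer sizes
  alpha            = c(0.3, 0.01, 1e-04, 1e-06),
  maxIter          = c(200, 100)
)

# Each row is one hyperparameter combination: 2 * 4 * 2 of them.
nrow(grid)
```

    Under that assumption, adding candidate values to any list multiplies the number of settings, so long lists can make training noticeably slower.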

    + +
    + +
    +

    Examples

    +
    if (FALSE) {
    +model.mlp <- setMLP()
    +}
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setNaiveBayes.html b/docs/reference/setNaiveBayes.html new file mode 100644 index 000000000..a0389c1d8 --- /dev/null +++ b/docs/reference/setNaiveBayes.html @@ -0,0 +1,169 @@ + +Create setting for naive bayes model with python — setNaiveBayes • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for naive bayes model with python

    +
    + +
    +
    setNaiveBayes()
    +
    + + +
    +

    Examples

    +
    if (FALSE) {
    +model.nb <- setNaiveBayes()
    +}
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setPythonEnvironment.html b/docs/reference/setPythonEnvironment.html new file mode 100644 index 000000000..ce517c7b7 --- /dev/null +++ b/docs/reference/setPythonEnvironment.html @@ -0,0 +1,176 @@ + +Use the virtual environment created using configurePython() — setPythonEnvironment • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Use the virtual environment created using configurePython()

    +
    + +
    +
    setPythonEnvironment(envname = "PLP", envtype = NULL)
    +
    + +
    +

    Arguments

    +
    envname
    +

    A string for the name of the virtual environment (default is 'PLP')

    + + +
    envtype
    +

    An option for specifying the environment as 'conda' or 'python'. If NULL then the default is 'conda' for Windows users and 'python' for non-Windows users

    + +
    +
    +

    Details

    +

    This function sets PatientLevelPrediction to use a virtual environment

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setRandomForest.html b/docs/reference/setRandomForest.html new file mode 100644 index 000000000..1b44fefc4 --- /dev/null +++ b/docs/reference/setRandomForest.html @@ -0,0 +1,253 @@ + +Create setting for random forest model with python (very fast) — setRandomForest • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for random forest model with python (very fast)

    +
    + +
    +
    setRandomForest(
    +  ntrees = list(100, 500),
    +  criterion = list("gini"),
    +  maxDepth = list(4, 10, 17),
    +  minSamplesSplit = list(2, 5),
    +  minSamplesLeaf = list(1, 10),
    +  minWeightFractionLeaf = list(0),
    +  mtries = list("sqrt", "log2"),
    +  maxLeafNodes = list(NULL),
    +  minImpurityDecrease = list(0),
    +  bootstrap = list(TRUE),
    +  maxSamples = list(NULL, 0.9),
    +  oobScore = list(FALSE),
    +  nJobs = list(NULL),
    +  classWeight = list(NULL),
    +  seed = sample(1e+05, 1)
    +)
    +
    + +
    +

    Arguments

    +
    ntrees
    +

    (list) The number of trees to build

    + + +
    criterion
    +

    (list) The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Note: this parameter is tree-specific.

    + + +
    maxDepth
    +

    (list) The maximum depth of the tree. If NULL, then nodes are expanded until all leaves are pure or until all leaves contain fewer than minSamplesSplit samples.

    + + +
    minSamplesSplit
    +

    (list) The minimum number of samples required to split an internal node

    + + +
    minSamplesLeaf
    +

    (list) The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least minSamplesLeaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    + + +
    minWeightFractionLeaf
    +

    (list) The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sampleWeight is not provided.

    + + +
    mtries
    +

    (list) The number of features to consider when looking for the best split:

    • int then consider max_features features at each split.

    • +
    • float then max_features is a fraction and round(max_features * n_features) features are considered at each split

    • +
    • 'sqrt' then max_features=sqrt(n_features)

    • +
    • 'log2' then max_features=log2(n_features)

    • +
    • NULL then max_features=n_features

    • +
    + + +
    maxLeafNodes
    +

    (list) Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

    + + +
    minImpurityDecrease
    +

    (list) A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    + + +
    bootstrap
    +

    (list) Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

    + + +
    maxSamples
    +

    (list) If bootstrap is True, the number of samples to draw from X to train each base estimator.

    + + +
    oobScore
    +

    (list) Whether to use out-of-bag samples to estimate the generalization score. Only available if bootstrap=True.

    + + +
    nJobs
    +

    The number of jobs to run in parallel.

    + + +
    classWeight
    +

    (list) Weights associated with classes. If not given, all classes are assumed to have weight one. One of NULL, “balanced” or “balanced_subsample”

    + + +
    seed
    +

    A seed when training the final model

    + +
    + +
    +

    Examples

    +
    if (FALSE) {
    +model.rf <- setRandomForest(mtries = list("sqrt", 5, 20), ntrees = list(10, 100),
    +                           maxDepth = list(5, 20))
    +}
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/setSVM.html b/docs/reference/setSVM.html new file mode 100644 index 000000000..c8ca9a9e7 --- /dev/null +++ b/docs/reference/setSVM.html @@ -0,0 +1,222 @@ + +Create setting for the python sklearn SVM (SVC function) — setSVM • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Create setting for the python sklearn SVM (SVC function)

    +
    + +
    +
    setSVM(
    +  C = list(1, 0.9, 2, 0.1),
    +  kernel = list("rbf"),
    +  degree = list(1, 3, 5),
    +  gamma = list("scale", 1e-04, 3e-05, 0.001, 0.01, 0.25),
    +  coef0 = list(0),
    +  shrinking = list(TRUE),
    +  tol = list(0.001),
    +  classWeight = list(NULL),
    +  cacheSize = 500,
    +  seed = sample(1e+05, 1)
    +)
    +
    + +
    +

    Arguments

    +
    C
    +

    (list) Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty.

    + + +
    kernel
    +

    (list) Specifies the kernel type to be used in the algorithm. One of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’. If none is given ‘rbf’ will be used.

    + + +
    degree
    +

    (list) Degree of the polynomial kernel function. Only used by the ‘poly’ kernel and ignored by all other kernels.

    + + +
    gamma
    +

    (list) Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’: ‘scale’, ‘auto’ or a float, default=’scale’. With ‘scale’, 1 / (n_features * X.var()) is used; with ‘auto’, 1 / n_features is used.

    + + +
    coef0
    +

    (list) independent term in kernel function. It is only significant in poly/sigmoid.

    + + +
    shrinking
    +

    (list) whether to use the shrinking heuristic.

    + + +
    tol
    +

    (list) Tolerance for stopping criterion.

    + + +
    classWeight
    +

    (list) Class weight based on imbalance either 'balanced' or NULL

    + + +
    cacheSize
    +

    Specify the size of the kernel cache (in MB).

    + + +
    seed
    +

    A seed for the model

    + +
    + +
    +

    Examples

    +
    if (FALSE) {
    +model.svm <- setSVM(kernel = list("rbf"), seed = NULL)
    +}
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/simulatePlpData.html b/docs/reference/simulatePlpData.html new file mode 100644 index 000000000..28d042f5a --- /dev/null +++ b/docs/reference/simulatePlpData.html @@ -0,0 +1,185 @@ + +Generate simulated data — simulatePlpData • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    simulatePlpData creates a plpData object with simulated data.

    +
    + +
    +
    simulatePlpData(plpDataSimulationProfile, n = 10000)
    +
    + +
    +

    Arguments

    +
    plpDataSimulationProfile
    +

    An object of type plpDataSimulationProfile as generated +using the
    createplpDataSimulationProfile function.

    + + +
    n
    +

    The size of the population to be generated.

    + +
    +
    +

    Value

    + + +

    An object of type plpData.

    +
    +
    +

    Details

    +

    This function generates simulated data that is in many ways similar to the original data on which +the simulation profile is based. It contains the same outcome, comparator, and outcome concept IDs, +and the covariates and their first-order statistics should be comparable.

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/sklearnFromJson.html b/docs/reference/sklearnFromJson.html new file mode 100644 index 000000000..207925bfd --- /dev/null +++ b/docs/reference/sklearnFromJson.html @@ -0,0 +1,168 @@ + +Loads sklearn python model from json — sklearnFromJson • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Loads sklearn python model from json

    +
    + +
    +
    sklearnFromJson(path)
    +
    + +
    +

    Arguments

    +
    path
    +

    path to the model json file

    + +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/sklearnToJson.html b/docs/reference/sklearnToJson.html new file mode 100644 index 000000000..4f18ac540 --- /dev/null +++ b/docs/reference/sklearnToJson.html @@ -0,0 +1,172 @@ + +Saves sklearn python model object to json in path — sklearnToJson • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Saves sklearn python model object to json in path

    +
    + +
    +
    sklearnToJson(model, path)
    +
    + +
    +

    Arguments

    +
    model
    +

    a fitted sklearn python model object

    + + +
    path
    +

    path to the saved model file

    + +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/specificity.html b/docs/reference/specificity.html new file mode 100644 index 000000000..e16bf46a6 --- /dev/null +++ b/docs/reference/specificity.html @@ -0,0 +1,190 @@ + +Calculate the specificity — specificity • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Calculate the specificity

    +
    + +
    +
    specificity(TP, TN, FN, FP)
    +
    + +
    +

    Arguments

    +
    TP
    +

    Number of true positives

    + + +
    TN
    +

    Number of true negatives

    + + +
    FN
    +

    Number of false negatives

    + + +
    FP
    +

    Number of false positives

    + +
    +
    +

    Value

    + + +

    specificity value

    +
    +
    +

    Details

    +

    Calculate the specificity
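
    As a rough illustration of the quantity computed here, specificity is the fraction of actual negatives that are correctly classified, TN / (TN + FP). The helper below, `specificitySketch`, is a hypothetical base-R stand-in and not the package's implementation. It accepts TP and FN only to mirror the documented argument list.

```r
# Hypothetical stand-in for illustration; not the PatientLevelPrediction source.
specificitySketch <- function(TP, TN, FN, FP) {
  # TP and FN are unused: specificity depends only on the actual negatives.
  TN / (TN + FP)
}

# 90 true negatives and 10 false positives: 90 / (90 + 10).
specificitySketch(TP = 40, TN = 90, FN = 5, FP = 10)
```

    Note that this sketch returns NaN when there are no actual negatives (TN + FP = 0). How the package itself handles that case is not stated on this page.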

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/splitData.html b/docs/reference/splitData.html new file mode 100644 index 000000000..1b6baefdf --- /dev/null +++ b/docs/reference/splitData.html @@ -0,0 +1,197 @@ + +Split the plpData into test/train sets using a splitting settings of class splitSettings — splitData • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Split the plpData into test/train sets using a splitting settings of class splitSettings

    +
    + +
    +
    splitData(
    +  plpData = plpData,
    +  population = population,
    +  splitSettings = splitSettings
    +)
    +
    + +
    +

    Arguments

    +
    plpData
    +

    An object of type plpData - the patient level prediction +data extracted from the CDM.

    + + +
    population
    +

    The population created using createStudyPopulation that define who will be used to develop the model

    + + +
    splitSettings
    +

    An object of type splitSettings specifying the split - the default can be created using createDefaultSplitSetting

    + +
    +
    +

    Value

    + + +

    A list containing the training data (Train) and, optionally, the test data (Test)

    + + +
    +
    +

    Details

    +

    Returns a list containing the training data (Train) and optionally the test data (Test). Train is an Andromeda object containing

    • covariateRef: a table with the covariate information

    • +
    • labels: a table (rowId, outcomeCount, ...) for each data point in the train data (outcomeCount is the class label)

    • +
    • folds: a table (rowId, index) specifying which training fold each data point is in.

    • +

    Test is an Andromeda object containing

    • covariateRef: a table with the covariate information

    • +
    • labels: a table (rowId, outcomeCount, ...) for each data point in the test data (outcomeCount is the class label)

    • +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/toSparseM.html b/docs/reference/toSparseM.html new file mode 100644 index 000000000..7c240da30 --- /dev/null +++ b/docs/reference/toSparseM.html @@ -0,0 +1,206 @@ + +Convert the plpData in COO format into a sparse R matrix — toSparseM • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Converts the standard plpData to a sparse matrix

    +
    + +
    +
    toSparseM(plpData, cohort = NULL, map = NULL)
    +
    + +
    +

    Arguments

    +
    plpData
    +

    An object of type plpData with covariate in coo format - the patient level prediction +data extracted from the CDM.

    + + +
    cohort
    +

    If specified the plpData is restricted to the rowIds in the cohort (otherwise plpData$labels is used)

    + + +
    map
    +

    A covariate map (telling us the column number for covariates)

    + +
    +
    +

    Value

    + + +

    Returns a list containing the data as a sparse matrix, the plpData covariateRef +and a data.frame named map that tells us what covariate corresponds to each column. +This object is a list with the following components:

    data
    +

    A sparse matrix with the rows corresponding to each person in the plpData and the columns corresponding to the covariates.

    + +
    covariateRef
    +

    The plpData covariateRef.

    + +
    map
    +

    A data.frame containing the data column ids and the corresponding covariateId from covariateRef.

    + + +
    +
    +

    Details

    +

    This function converts the covariate file from ffdf in COO format into a sparse matrix from +the package Matrix
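
    The conversion described above can be sketched with toy data. The snippet below is a minimal illustration using `Matrix::sparseMatrix`, not the package's implementation: the `covariates` data frame, its column names (`rowId`, `covariateId`, `covariateValue`) and the `columnId` column of `map` are assumptions made for the example.

```r
library(Matrix)

# Toy covariates in COO (triplet) form: one row per non-zero value.
# Column names are assumed for illustration.
covariates <- data.frame(
  rowId          = c(1, 1, 2, 3),
  covariateId    = c(1002, 2005, 1002, 3010),
  covariateValue = c(1, 1, 1, 0.5)
)

# Map each distinct covariateId to a column index of the matrix.
map <- data.frame(covariateId = sort(unique(covariates$covariateId)))
map$columnId <- seq_len(nrow(map))

# Look up the column for every triplet and build the sparse matrix.
columnId <- map$columnId[match(covariates$covariateId, map$covariateId)]
sparseData <- sparseMatrix(
  i    = covariates$rowId,
  j    = columnId,
  x    = covariates$covariateValue,
  dims = c(max(covariates$rowId), nrow(map))
)

dim(sparseData)  # one row per person, one column per distinct covariate
```

    Keeping `map` alongside the matrix is what allows a column index to be traced back to its covariateId, which appears to be the role of the `map` component in the returned list.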

    +
    + +
    +

    Examples

    +
    #TODO
    +
    +
    +
    +
    +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/validateExternal.html b/docs/reference/validateExternal.html new file mode 100644 index 000000000..ce79f9c15 --- /dev/null +++ b/docs/reference/validateExternal.html @@ -0,0 +1,188 @@ + +externalValidatePlp - Validate model performance on new data — validateExternal • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    externalValidatePlp - Validate model performance on new data

    +
    + +
    +
    validateExternal(
    +  validationDesignList,
    +  databaseDetails,
    +  logSettings,
    +  outputFolder
    +)
    +
    + +
    +

    Arguments

    +
    validationDesignList
    +

    A list of objects created with createValidationDesign

    + + +
    databaseDetails
    +

    A list of objects of class +databaseDetails created using createDatabaseDetails

    + + +
    logSettings
    +

    An object of logSettings created +using createLogSettings

    + + +
    outputFolder
    +

    The directory to save the validation results to +(subfolders are created per database in validationDatabaseDetails)

    + +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/validateMultiplePlp.html b/docs/reference/validateMultiplePlp.html new file mode 100644 index 000000000..c75da7891 --- /dev/null +++ b/docs/reference/validateMultiplePlp.html @@ -0,0 +1,202 @@ + +externally validate the multiple plp models across new datasets — validateMultiplePlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This function loads all the models in a multiple plp analysis folder and +validates the models on new data

    +
    + +
    +
    validateMultiplePlp(
    +  analysesLocation,
    +  validationDatabaseDetails,
    +  validationRestrictPlpDataSettings = createRestrictPlpDataSettings(),
    +  recalibrate = NULL,
    +  cohortDefinitions = NULL,
    +  saveDirectory = NULL
    +)
    +
    + +
    +

    Arguments

    +
    analysesLocation
    +

    The location where the multiple plp analyses are

    + + +
    validationDatabaseDetails
    +

    A single or list of validation database settings created using createDatabaseDetails()

    + + +
    validationRestrictPlpDataSettings
    +

    The settings specifying the extra restriction settings when extracting the data created using createRestrictPlpDataSettings().

    + + +
    recalibrate
    +

    A vector of recalibration methods (currently supports 'RecalibrationintheLarge' and/or 'weakRecalibration')

    + + +
    cohortDefinitions
    +

    A list of cohortDefinitions

    + + +
    saveDirectory
    +

    The location to save the validation results to

    + +
    +
    +

    Details

    +

    Users need to input a location where the results of the multiple plp analyses +are found and the connection and database settings for the new data

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/viewDatabaseResultPlp.html b/docs/reference/viewDatabaseResultPlp.html new file mode 100644 index 000000000..6ff722a2e --- /dev/null +++ b/docs/reference/viewDatabaseResultPlp.html @@ -0,0 +1,204 @@ + +open a local shiny app for viewing the result of a PLP analyses from a database — viewDatabaseResultPlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Open a local shiny app for viewing the results of PLP analyses from a database

    +
    + +
    +
    viewDatabaseResultPlp(
    +  mySchema,
    +  myServer,
    +  myUser,
    +  myPassword,
    +  myDbms,
    +  myPort = NULL,
    +  myTableAppend
    +)
    +
    + +
    +

    Arguments

    +
    mySchema
    +

    Database result schema containing the result tables

    + + +
    myServer
    +

    server with the result database

    + + +
    myUser
    +

    Username for the connection to the result database

    + + +
    myPassword
    +

    Password for the connection to the result database

    + + +
    myDbms
    +

    database management system for the result database

    + + +
    myPort
    +

    Port for the connection to the result database

    + + +
    myTableAppend
    +

    A string appended to the results tables (optional)

    + +
    +
    +

    Details

    +

    Opens a shiny app for viewing the results of the models from a database

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/viewMultiplePlp.html b/docs/reference/viewMultiplePlp.html new file mode 100644 index 000000000..60cbf1a87 --- /dev/null +++ b/docs/reference/viewMultiplePlp.html @@ -0,0 +1,173 @@ + +open a local shiny app for viewing the result of a multiple PLP analyses — viewMultiplePlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    Open a local shiny app for viewing the results of multiple PLP analyses

    +
    + +
    +
    viewMultiplePlp(analysesLocation)
    +
    + +
    +

    Arguments

    +
    analysesLocation
    +

    The directory containing the results (with the analysis_x folders)

    + +
    +
    +

    Details

    +

    Opens a shiny app for viewing the results of the models from various T, O, Tar and settings.

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/reference/viewPlp.html b/docs/reference/viewPlp.html new file mode 100644 index 000000000..5dc710218 --- /dev/null +++ b/docs/reference/viewPlp.html @@ -0,0 +1,186 @@ + +viewPlp - Interactively view the performance and model settings — viewPlp • PatientLevelPrediction + + +
    +
    + + + +
    +
    + + +
    +

    This is a shiny app for viewing interactive plots of the performance and the settings

    +
    + +
    +
    viewPlp(runPlp, validatePlp = NULL, diagnosePlp = NULL)
    +
    + +
    +

    Arguments

    +
    runPlp
    +

    The output of runPlp() (an object of class 'runPlp')

    + + +
    validatePlp
    +

    The output of externalValidatePlp (an object of class 'validatePlp')

    + + +
    diagnosePlp
    +

    The output of diagnosePlp()

    + +
    +
    +

    Value

    + + +

    Opens a shiny app for interactively viewing the results

    +
    +
    +

    Details

    +

    Input the result of runPlp to view the plots interactively

    +
    + +
    + +
    + + +
    + +
    +

    Site built with pkgdown 2.0.7.

    +
    + +
    + + + + + + + + diff --git a/docs/sitemap.xml b/docs/sitemap.xml new file mode 100644 index 000000000..5c677f919 --- /dev/null +++ b/docs/sitemap.xml @@ -0,0 +1,456 @@ + + + + /404.html + + + /articles/AddingCustomFeatureEngineering.html + + + /articles/AddingCustomModels.html + + + /articles/AddingCustomSamples.html + + + /articles/AddingCustomSplitting.html + + + /articles/BenchmarkTasks.html + + + /articles/BestPractices.html + + + /articles/BuildingMultiplePredictiveModels.html + + + /articles/BuildingPredictiveModels.html + + + /articles/ClinicalModels.html + + + /articles/ConstrainedPredictors.html + + + /articles/CreatingLearningCurves.html + + + /articles/CreatingNetworkStudies.html + + + /articles/InstallationGuide.html + + + /articles/Videos.html + + + /articles/index.html + + + /authors.html + + + /index.html + + + /news/index.html + + + /reference/MapIds.html + + + /reference/PatientLevelPrediction.html + + + /reference/accuracy.html + + + /reference/addDiagnosePlpToDatabase.html + + + /reference/addMultipleDiagnosePlpToDatabase.html + + + /reference/addMultipleRunPlpToDatabase.html + + + /reference/addRunPlpToDatabase.html + + + /reference/averagePrecision.html + + + /reference/brierScore.html + + + /reference/calibrationLine.html + + + /reference/computeAuc.html + + + /reference/computeGridPerformance.html + + + /reference/configurePython.html + + + /reference/covariateSummary.html + + + /reference/createCohortCovariateSettings.html + + + /reference/createDatabaseDetails.html + + + /reference/createDatabaseList.html + + + /reference/createDatabaseSchemaSettings.html + + + /reference/createDefaultExecuteSettings.html + + + /reference/createDefaultSplitSetting.html + + + /reference/createExecuteSettings.html + + + /reference/createFeatureEngineeringSettings.html + + + /reference/createLearningCurve.html + + + /reference/createLogSettings.html + + + /reference/createModelDesign.html + + + /reference/createPlpResultTables.html + + + 
/reference/createPreprocessSettings.html + + + /reference/createRandomForestFeatureSelection.html + + + /reference/createRestrictPlpDataSettings.html + + + /reference/createSampleSettings.html + + + /reference/createSplineSettings.html + + + /reference/createStratifiedImputationSettings.html + + + /reference/createStudyPopulation.html + + + /reference/createStudyPopulationSettings.html + + + /reference/createTempModelLoc.html + + + /reference/createUnivariateFeatureSelection.html + + + /reference/createValidationDesign.html + + + /reference/createValidationSettings.html + + + /reference/diagnoseMultiplePlp.html + + + /reference/diagnosePlp.html + + + /reference/diagnosticOddsRatio.html + + + /reference/evaluatePlp.html + + + /reference/externalValidateDbPlp.html + + + /reference/extractDatabaseToCsv.html + + + /reference/f1Score.html + + + /reference/falseDiscoveryRate.html + + + /reference/falseNegativeRate.html + + + /reference/falseOmissionRate.html + + + /reference/falsePositiveRate.html + + + /reference/fitPlp.html + + + /reference/getCalibrationSummary.html + + + /reference/getCohortCovariateData.html + + + /reference/getDemographicSummary.html + + + /reference/getPlpData.html + + + /reference/getPredictionDistribution.html + + + /reference/getPredictionDistribution_binary.html + + + /reference/getThresholdSummary.html + + + /reference/getThresholdSummary_binary.html + + + /reference/ici.html + + + /reference/index.html + + + /reference/insertCsvToDatabase.html + + + /reference/insertModelDesignInDatabase.html + + + /reference/insertResultsToSqlite.html + + + /reference/listAppend.html + + + /reference/listCartesian.html + + + /reference/loadPlpAnalysesJson.html + + + /reference/loadPlpData.html + + + /reference/loadPlpModel.html + + + /reference/loadPlpResult.html + + + /reference/loadPlpShareable.html + + + /reference/loadPrediction.html + + + /reference/migrateDataModel.html + + + /reference/modelBasedConcordance.html + + + 
/reference/negativeLikelihoodRatio.html + + + /reference/negativePredictiveValue.html + + + /reference/outcomeSurvivalPlot.html + + + /reference/pfi.html + + + /reference/plotDemographicSummary.html + + + /reference/plotF1Measure.html + + + /reference/plotGeneralizability.html + + + /reference/plotLearningCurve.html + + + /reference/plotPlp.html + + + /reference/plotPrecisionRecall.html + + + /reference/plotPredictedPDF.html + + + /reference/plotPredictionDistribution.html + + + /reference/plotPreferencePDF.html + + + /reference/plotSmoothCalibration.html + + + /reference/plotSparseCalibration.html + + + /reference/plotSparseCalibration2.html + + + /reference/plotSparseRoc.html + + + /reference/plotVariableScatterplot.html + + + /reference/plpDataSimulationProfile.html + + + /reference/positiveLikelihoodRatio.html + + + /reference/positivePredictiveValue.html + + + /reference/predictCyclops.html + + + /reference/predictPlp.html + + + /reference/preprocessData.html + + + /reference/recalibratePlp.html + + + /reference/recalibratePlpRefit.html + + + /reference/runMultiplePlp.html + + + /reference/runPlp.html + + + /reference/savePlpAnalysesJson.html + + + /reference/savePlpData.html + + + /reference/savePlpModel.html + + + /reference/savePlpResult.html + + + /reference/savePlpShareable.html + + + /reference/savePrediction.html + + + /reference/sensitivity.html + + + /reference/setAdaBoost.html + + + /reference/setCoxModel.html + + + /reference/setDecisionTree.html + + + /reference/setGradientBoostingMachine.html + + + /reference/setIterativeHardThresholding.html + + + /reference/setKNN.html + + + /reference/setLassoLogisticRegression.html + + + /reference/setLightGBM.html + + + /reference/setMLP.html + + + /reference/setNaiveBayes.html + + + /reference/setPythonEnvironment.html + + + /reference/setRandomForest.html + + + /reference/setSVM.html + + + /reference/simulatePlpData.html + + + /reference/sklearnFromJson.html + + + /reference/sklearnToJson.html + + + 
/reference/specificity.html + + + /reference/splitData.html + + + /reference/toSparseM.html + + + /reference/validateExternal.html + + + /reference/validateMultiplePlp.html + + + /reference/viewDatabaseResultPlp.html + + + /reference/viewMultiplePlp.html + + + /reference/viewPlp.html + + diff --git a/vignettes/BestPractices.rmd b/vignettes/BestPractices.rmd index 743938fd9..bfc1bc792 100644 --- a/vignettes/BestPractices.rmd +++ b/vignettes/BestPractices.rmd @@ -23,47 +23,267 @@ output: number_sections: yes toc: yes --- - -```{=html} -``` + ## Best practice publications using the OHDSI PatientLevelPrediction framework -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Topic | Research Summary | Link | -+=======================+======================================================================================================================================+=====================================================================================================================+ -| Problem Specification | When is prediction suitable in observational data? 
| Guidelines needed | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Data Creation | Comparison of cohort vs case-control design | [Journal of Big Data](https://journalofbigdata.springeropen.com/articles/10.1186/s40537-021-00501-2) | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Data Creation | Addressing loss to follow-up (right censoring) | [BMC medical informatics and decision makingk](https://link.springer.com/article/10.1186/s12911-021-01408-x) | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Data Creation | Investigating how to address left censoring in features construction | [BMC Medical Research Methodology](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01370-2) | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Data Creation | Impact of over/under-sampling | Paper under review | 
-+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Data Creation | Impact of phenotypes | Study Done - Paper submitted | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Model development | How much data do we need for prediction - Learning curves at scale | [International Journal of Medical Informatics](https://www.sciencedirect.com/science/article/pii/S1386505622000764) | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Model development | What impact does test/train/validation design have on model performance | [BMJ Open](https://bmjopen.bmj.com/content/11/12/e050146) | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Model development | What is the impact of the classifier | [JAMIA](https://academic.oup.com/jamia/article/25/8/969/4989437?login=true) | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Model 
development | Can we find hyper-parameter combinations per classifier that consistently lead to good performing models when using claims/EHR data? | Study needs to be done | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Model development | Can we use ensembles to combine different algorithm models within a database to improve models transportability? | Study Complete | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Model development | Can we use ensembles to combine models developed using different databases to improve models transportability? | [BMC Medical Informatics and Decision Making](https://link.springer.com/article/10.1186/s12911-022-01879-6) | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Evaluation | How should we present model performance? 
(e.g., new visualizations) | [JAMIA Open](https://academic.oup.com/jamiaopen/article/4/1/ooab017/6168493?searchresult=1) | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Evaluation | How to interpret external validation performance (can we figure out why the performance drops or stays consistent)? | Study needs to be done | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Evaluation | Recalibration methods | Study needs to be done | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ -| Evaluation | Is there a way to automatically simplify models? | [Study protocol under development](https://ohdsi-studies.github.io/FeatureSelectionComparison/docs/Protocol.html) | -+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +Topic + +Research Summary + +Link +
    +Problem Specification + +When is prediction suitable in observational data? + +Guidelines needed +
    +Data Creation + +Comparison of cohort vs case-control design + +Journal of Big Data +
    +Data Creation + +Addressing loss to follow-up (right censoring) + +BMC Medical Informatics and Decision Making
    +Data Creation + +Investigating how to address left censoring in features construction + +BMC Medical Research Methodology +
    +Data Creation + +Impact of over/under-sampling + + Journal of Big Data
    +Data Creation + +Impact of phenotypes + +Study Done - Paper submitted +
    +Model development + +How much data do we need for prediction - Learning curves at scale + +International Journal of Medical Informatics +
    +Model development + +What impact does test/train/validation design have on model performance + +BMJ Open +
    +Model development + +What is the impact of the classifier + +JAMIA +
    +Model development + +Can we find hyper-parameter combinations per classifier that consistently lead to good performing models when using claims/EHR data? + +Study needs to be done +
    +Model development + +Can we use ensembles to combine different algorithm models within a database to improve models transportability? + + Caring is Sharing–Exploiting the Value in Data for Health and Innovation +
    +Model development + +Can we use ensembles to combine models developed using different databases to improve models transportability? + + BMC Medical Informatics and Decision Making +
    +Model development + +Impact of regularization method + + JAMIA +
    +Evaluation + +Why prediction is not suitable for risk factor identification + + Machine Learning for Healthcare Conference +
    +Evaluation + +Iterative pairwise external validation to put validation into context + + Drug Safety +
    +Evaluation + +A novel method to estimate external validation using aggregate statistics + + Study under review +
    +Evaluation + +How should we present model performance? (e.g., new visualizations) + +JAMIA Open +
    +Evaluation + +How to interpret external validation performance (can we figure out why the performance drops or stays consistent)? + +Study needs to be done +
    +Evaluation + +Recalibration methods + +Study needs to be done +
    +Evaluation + +Is there a way to automatically simplify models? + +Study protocol under development +
    + diff --git a/vignettes/ClinicalModels.rmd b/vignettes/ClinicalModels.rmd new file mode 100644 index 000000000..3b6a5e5ae --- /dev/null +++ b/vignettes/ClinicalModels.rmd @@ -0,0 +1,46 @@ +--- +title: "Clinical Models" +author: "Jenna Reps, Peter R. Rijnbeek" +date: '`r Sys.Date()`' +header-includes: + - \usepackage{fancyhdr} + - \pagestyle{fancy} + - \fancyhead{} + - \fancyhead[CO,CE]{Clinical Models} + - \fancyfoot[CO,CE]{PatientLevelPrediction Package Version `r utils::packageVersion("PatientLevelPrediction")`} + - \fancyfoot[LE,RO]{\thepage} + - \renewcommand{\headrulewidth}{0.4pt} + - \renewcommand{\footrulewidth}{0.4pt} +output: + pdf_document: + includes: + in_header: preamble.tex + number_sections: yes + toc: yes + word_document: + toc: yes + html_document: + number_sections: yes + toc: yes +--- + +```{=html} + +``` + +## Clinical models developed using the OHDSI PatientLevelPrediction framework + +| Title | Link | +|----------------------|-------| +| Using Machine Learning Applied to Real-World Healthcare Data for Predictive Analytics: An Applied Example in Bariatric Surgery | [Value in Health](https://www.sciencedirect.com/science/article/pii/S1098301519300737) | +| Development and validation of a prognostic model predicting symptomatic hemorrhagic transformation in acute ischemic stroke at scale in the OHDSI network | [PLoS One](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0226718) | +| Wisdom of the CROUD: development and validation of a patient-level prediction model for opioid use disorder using population-level claims data | [PLoS One](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0228632) | +| Developing predictive models to determine Patients in End-of-life Care in Administrative datasets | [Drug Safety](https://link.springer.com/article/10.1007/s40264-020-00906-7) | +| Predictors of diagnostic transition from major depressive disorder to bipolar disorder: a retrospective observational network study 
| [Translational Psychiatry](https://www.nature.com/articles/s41398-021-01760-6) | +| Seek COVER: using a disease proxy to rapidly develop and validate a personalized risk calculator for COVID-19 outcomes in an international network | [BMC Medical Research Methodology](https://link.springer.com/article/10.1186/s12874-022-01505-z) | +| 90-Day all-cause mortality can be predicted following a total knee replacement: an international, network study to develop and validate a prediction model | [Knee Surgery, Sports Traumatology, Arthroscopy](https://link.springer.com/article/10.1007/s00167-021-06799-y) | +| Machine learning and real-world data to predict lung cancer risk in routine care | [Cancer Epidemiology, Biomarkers & Prevention](https://aacrjournals.org/cebp/article-abstract/32/3/337/718495) | +| Development and validation of a patient-level model to predict dementia across a network of observational databases | [BMC Medicine](https://link.springer.com/article/10.1186/s12916-024-03530-9) | \ No newline at end of file