Is there a way to use the exact measurement values in PLP? #459

Closed
arani11 opened this issue Jun 7, 2024 · 3 comments

arani11 commented Jun 7, 2024

We used the PLP package to develop a model, and weight came out as the top predictor. However, our understanding is that PLP uses it as a binary variable (yes/no) indicating whether weight was recorded. We would like to use weight as a continuous variable in our prediction model. Is there a way to do this with the PLP package?

egillax (Collaborator) commented Jun 7, 2024

Hi @arani11 ,

This depends on how you specify your covariateSettings. For example, to extract age, sex, and measurement values I'd use:

library(PatientLevelPrediction)

# myDatabaseDetails is assumed to come from createDatabaseDetails()
plpData <- getPlpData(
  databaseDetails = myDatabaseDetails,
  covariateSettings = FeatureExtraction::createCovariateSettings(
    useDemographicsAge = TRUE,
    useDemographicsGender = TRUE,
    useMeasurementValueLongTerm = TRUE  # continuous measurement values
  ),
  restrictPlpDataSettings = createRestrictPlpDataSettings()
)

Alternatively, if you're using runMultiplePlp, the covariateSettings are defined the same way in the modelDesign:

# popSettings is assumed to come from createStudyPopulationSettings()
modelDesign <- createModelDesign(
  targetId = 1,
  outcomeId = 2,
  restrictPlpDataSettings = createRestrictPlpDataSettings(),
  populationSettings = popSettings,
  covariateSettings = FeatureExtraction::createCovariateSettings(
    useDemographicsAge = TRUE,
    useDemographicsGender = TRUE,
    useMeasurementValueLongTerm = TRUE  # continuous measurement values
  ),
  preprocessSettings = createPreprocessSettings(),
  splitSettings = createDefaultSplitSetting(splitSeed = 42),
  modelSettings = setLassoLogisticRegression(seed = 42)
)

Here useMeasurementValueLongTerm extracts the measurement values themselves (from the last year) instead of only the binary indicators, which come from useMeasurementLongTerm. If you're using the default covariateSettings, I think you get the binary indicators.
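As a rough sketch of the difference between the two flags, the settings below request the binary indicators and the values side by side and spell out the look-back window. The window arguments (longTermStartDays = -365, endDays = 0) are written out on the assumption that they match the "last year" window described above; check the defaults in your FeatureExtraction version. Keeping both flags on is only one option, not a recommendation from this thread.

```r
# Sketch only: assumes the FeatureExtraction package is installed.
covariateSettings <- FeatureExtraction::createCovariateSettings(
  useDemographicsAge = TRUE,
  useDemographicsGender = TRUE,
  useMeasurementLongTerm = TRUE,       # binary: was the measurement recorded?
  useMeasurementValueLongTerm = TRUE,  # continuous: the recorded value itself
  longTermStartDays = -365,            # assumed window start: one year before index
  endDays = 0                          # assumed window end: the index date
)
```

The resulting object can be passed as covariateSettings to either getPlpData or createModelDesign, as in the snippets above.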

Be careful, though: missing measurement values will be treated as zero. You would need some kind of imputation to handle that, which is possible using the featureEngineering functionality. The createStratifiedImputationSettings function was recently added as a feature engineering method and is the first imputation method built into the package. If you need any help with that, let me know and I can throw together a short example, either with that one or with a simple mean imputation.
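To make the zero-for-missing issue concrete, here is a toy base R sketch. It does not use the PLP feature engineering API; the weight vector is made-up illustration data, and all variable names are hypothetical.

```r
# Toy data: weight in kg for five hypothetical patients; NA = no measurement recorded.
weight <- c(72.5, NA, 88.0, NA, 65.5)

# Treating missing values as zero, as happens without imputation.
weight_as_zero <- ifelse(is.na(weight), 0, weight)

# Simple mean imputation: fill missing entries with the mean of the observed values.
observed_mean <- mean(weight, na.rm = TRUE)
weight_imputed <- ifelse(is.na(weight), observed_mean, weight)

mean(weight_as_zero)  # pulled toward zero by the two patients without a measurement
mean(weight_imputed)  # equal to observed_mean
```

In a real model the imputation value should be learned on the training split only and then applied to the test split, which is the kind of bookkeeping a feature engineering step is meant to handle.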

egillax (Collaborator) commented Dec 9, 2024

More imputation methods will be added in 6.4; this is tracked in #461. Closing this issue in favour of that one.

egillax closed this as completed Dec 9, 2024

arani11 (Author) commented Dec 10, 2024 via email
