Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model folder upload #195

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Model folder upload #195

wants to merge 2 commits into from

Conversation

azimov
Copy link
Collaborator

@azimov azimov commented Dec 10, 2024

I cannot find a reason this folder would be called 'dbmodels' and in all executions 'models' is the folder found. Users can adjust this setting in PLP, but not within Strategus

Copy link

codecov bot commented Dec 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.55%. Comparing base (a599bb6) to head (728ad57).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #195   +/-   ##
========================================
  Coverage    98.55%   98.55%           
========================================
  Files           14       14           
  Lines         3450     3450           
========================================
  Hits          3400     3400           
  Misses          50       50           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -130,7 +130,7 @@ PatientLevelPredictionModule <- R6::R6Class(
csvFolder = resultsFolder,
connectionDetails = resultsConnectionDetails,
databaseSchemaSettings = databaseSchemaSettings,
modelSaveLocation = file.path(resultsFolder, "dbmodels"),
modelSaveLocation = file.path(resultsFolder, "models"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was taken from https://github.com/OHDSI/PatientLevelPredictionModule/blob/main/Main.R#L197-L204. Tagging @jreps to confirm she has no concerns otherwise this looks fine to me.

@jreps
Copy link
Collaborator

jreps commented Dec 11, 2024

I think there is a bug somewhere in the module or PLP as that folder 'dbmodels' should be created without issue - this pull request will let users upload the results but does not solve the main issue (there is something going wrong somewhere).

@anthonysena
Copy link
Collaborator

Tracing this: the PLP module's execute function calls PatientLevelPrediction::extractDatabaseToCsv (ref). Within extractDatabaseToCsv the 'models' folder is hard-coded (ref).

@jreps just let us know if you believe this is a bug in PLP or not. Thanks!

@jreps
Copy link
Collaborator

jreps commented Dec 11, 2024

that being hard coded is fine. The function PatientLevelPrediction::insertCsvToDatabase takes as input the modelSaveLocation and that is called in PatientLevelPrediction::addRunPlpToDatabase which calls PatientLevelPrediction:::insertModelInDatabase. PatientLevelPrediction:::insertModelInDatabase has code to create the modelSaveLocation - so I don't see why there is an issue saying modelSaveLocation doesn't exist.

@anthonysena
Copy link
Collaborator

anthonysena commented Dec 11, 2024

Looking at some PLP results I have from running Strategus shows the following directory structure under the results:

Subdirectories for <database>/strategusOutput/PatientLevelPredictionModule/models

  • folder-1-1
  • folder-2-1
  • folder-3-1
  • folder-4-1

And another set of subdirectories for <database>/strategusOutput/PatientLevelPredictionModule/dbmodels/models

  • folder-1-3
  • folder-2-3
  • folder-3-3
  • folder-4-3

Looking at the contents of the subfolders, the contents of models/folder-1-1 have the identical files (and file sizes) of the ones found in dbmodels/models/folder-1-3 and so on.

Not sure what the intended behavior should be here but it looks like specifying modelSaveLocation = file.path(resultsFolder, "dbmodels") is creating another set of exported models in a different location.

@jreps
Copy link
Collaborator

jreps commented Dec 12, 2024

yeah - that seems to be working then. I don't know why Jamie was hitting an issue that said the modelSaveLocation did not exist when it was set to be file.path(resultsFolder, "dbmodels"). That error is the problem, but the results you have seem to suggest things are working fine in those examples.

In PLP when we export the results to csv we cannot make a model a csv, it is just too complex to be represented as a csv, so everything else gets saved as a csv and the models get saved in their own folder. In this case the module sets that to be the a subdirectory 'models' in the resultFolder. Similarly, when you insert results from R into a database it asks for you to specify a directory for the models that the database knows about as the models cannot be saved into the database (many databases have a limit on the text length), the modelSaveLocation is where the models are saved for the database. When we extract the results from csv it creates the models into the 'model' folder and then when you load into the database it also saves these models (but this could, and most of the time should be, a different location), in the module it is set to the 'dbmodels' folder and that is why you get the two duplicate folders.

Jamie mentioned a silent error while uploading, but this resolves when the database model folder is the same as the csv model folder (that is why he made the pull request changes), but that means there is an error somewhere, since it should be possible for these to be different locations. However, I can't see why there is an error since the PLP function for inserting the model creates the database model directory if it doesn't exist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants