computeStandardizedDifference does not handle temporal covariate data #225
Comments
I have refactored the original code. Please feel free to use it.
Adding a reprex, as suggested by @ginberg, to illustrate the problem @gowthamrao has described:
packageVersion("FeatureExtraction")
#> [1] '3.4.0'
# 4283893204 = condition_era group: Sinusitis
temporalCovariateSettings <- FeatureExtraction::createTemporalCovariateSettings(
useConditionEraGroupOverlap = TRUE,
temporalStartDays = c(-365, -364),
temporalEndDays = c(-365, -364),
includedCovariateConceptIds = 4283893
)
# Execute the analysis on Eunomia
connectionDetails <- Eunomia::getEunomiaConnectionDetails()
Eunomia::createCohorts(
connectionDetails = connectionDetails
)
#> Connecting using SQLite driver
#> Creating cohort: Celecoxib
#> | | | 0% | |======================================================================| 100%
#> Executing SQL took 0.026 secs
#> Creating cohort: Diclofenac
#> | | | 0% | |======================================================================| 100%
#> Executing SQL took 0.0169 secs
#> Creating cohort: GiBleed
#> | | | 0% | |======================================================================| 100%
#> Executing SQL took 0.0279 secs
#> Creating cohort: NSAIDs
#> | | | 0% | |======================================================================| 100%
#> Executing SQL took 0.0735 secs
#> Cohorts created in table main.cohort
#> cohortId name
#> 1 1 Celecoxib
#> 2 2 Diclofenac
#> 3 3 GiBleed
#> 4 4 NSAIDs
#> description
#> 1 A simplified cohort definition for new users of celecoxib, designed specifically for Eunomia.
#> 2 A simplified cohort definition for new users ofdiclofenac, designed specifically for Eunomia.
#> 3 A simplified cohort definition for gastrointestinal bleeding, designed specifically for Eunomia.
#> 4 A simplified cohort definition for new users of NSAIDs, designed specifically for Eunomia.
#> count
#> 1 1844
#> 2 850
#> 3 479
#> 4 2694
covariateData <- FeatureExtraction::getDbCovariateData(
connectionDetails = connectionDetails,
cdmDatabaseSchema = "main",
cohortDatabaseSchema = "main",
cohortTable = "cohort",
covariateSettings = temporalCovariateSettings,
aggregated = TRUE
)
#> Connecting using SQLite driver
#> Currently in a tryCatch or withCallingHandlers block, so unable to add global calling handlers. ParallelLogger will not capture R messages, errors, and warnings, only explicit calls to ParallelLogger. (This message will not be shown again this R session)
#> Sending temp tables to server
#> Inserting data took 0.0144 secs
#> Inserting data took 0.0312 secs
#> Constructing features on server
#> | | | 0% | |===== | 8% | |=========== | 15% | |================ | 23% | |====================== | 31% | |=========================== | 38% | |================================ | 46% | |====================================== | 54% | |=========================================== | 62% | |================================================ | 69% | |====================================================== | 77% | |=========================================================== | 85% | |================================================================= | 92% | |======================================================================| 100%
#> Executing SQL took 0.102 secs
#> Fetching data from server
#> Warning: Low disk space in 'C:/Users/asena5/AppData/Local/Temp/1/Rtmp2fzzCg'. Only 9.5 GB left.
#> Use options(warnDiskSpaceThreshold = <n>) to set the number of bytes for this warning to trigger.
#> This warning will not be shown for this file location again during this R session.
#> Fetching data took 0.177 secs
covariateData$covariates
#> # Source: table<covariates> [4 x 5]
#> # Database: sqlite 3.41.2 [C:\Users\asena5\AppData\Local\Temp\1\Rtmp2fzzCg\file4085a72dbc.sqlite]
#> cohortDefinitionId covariateId timeId sumValue averageValue
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 4283893204 1 4 0.00217
#> 2 1 4283893204 2 4 0.00217
#> 3 4 4283893204 1 4 0.00148
#> 4 4 4283893204 2 4 0.00148
FeatureExtraction::computeStandardizedDifference(
covariateData1 = covariateData,
covariateData2 = covariateData,
cohortId1 = 1,
cohortId2 = 4
)
#> # A tibble: 4 × 8
#> covariateId mean1 sd1 mean2 sd2 sd stdDiff covariateName
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 4283893204 0.00217 0.0465 0.00148 0.0385 0.0427 -0.0160 condition_era group:…
#> 2 4283893204 0.00217 0.0465 0.00148 0.0385 0.0427 -0.0160 condition_era group:…
#> 3 4283893204 0.00217 0.0465 0.00148 0.0385 0.0427 -0.0160 condition_era group:…
#> 4 4283893204 0.00217 0.0465 0.00148 0.0385 0.0427 -0.0160 condition_era group:…
Created on 2024-02-16 with reprex v2.1.0
As mentioned in this issue, the resulting output lacks the …
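For reference, the values in each row of the tibble above can be reproduced by hand from the counts printed earlier (sumValue 4, cohort counts 1844 and 2694). This is a minimal sketch that assumes a binary covariate with standard deviation sqrt(p * (1 - p)) and a pooled standard deviation of sqrt((sd1^2 + sd2^2) / 2). Those formulas are inferred from the printed numbers, not taken from the package source.

```r
# Proportions from the reprex: 4 persons with the covariate in each cohort
mean1 <- 4 / 1844   # cohort 1 (Celecoxib)
mean2 <- 4 / 2694   # cohort 4 (NSAIDs)

# Assumed formulas (inferred from the output, not from the package code)
sd1 <- sqrt(mean1 * (1 - mean1))
sd2 <- sqrt(mean2 * (1 - mean2))
pooledSd <- sqrt((sd1^2 + sd2^2) / 2)
stdDiff <- (mean2 - mean1) / pooledSd

# Round to three significant digits, as in the tibble print
signif(c(sd1 = sd1, sd2 = sd2, sd = pooledSd, stdDiff = stdDiff), 3)
```

Because both time windows have identical counts here, every row carries the same numbers, so the duplicated rows are only noticeable by their count.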
Hi @ginberg @anthonysena, this issue also seems to apply to the create table 1 function. If you continue @anthonysena's example with this function, it will cause an error.
The covariates table in a covariateData object created with temporal covariate settings has a timeId column. computeStandardizedDifference does not appear to account for it: it joins only by covariateId instead of by covariateId and timeId, which produces a Cartesian product.
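The effect can be sketched with base R alone. The two data frames below are toy copies of the aggregated rows from the reprex (cohorts 1 and 4, timeId 1 and 2). The names cohort1, cohort4, byCovariateOnly and byCovariateAndTime are made up for illustration, and merge() stands in for whatever join the package performs internally.

```r
# Toy copies of the aggregated covariates for the two cohorts in the reprex
cohort1 <- data.frame(
  covariateId = c(4283893204, 4283893204),
  timeId = c(1, 2),
  mean1 = c(0.00217, 0.00217)
)
cohort4 <- data.frame(
  covariateId = c(4283893204, 4283893204),
  timeId = c(1, 2),
  mean2 = c(0.00148, 0.00148)
)

# Joining on covariateId alone pairs every timeId of cohort 1
# with every timeId of cohort 4: 2 x 2 = 4 rows
byCovariateOnly <- merge(cohort1, cohort4, by = "covariateId")
nrow(byCovariateOnly)

# Joining on covariateId and timeId keeps one row per time window: 2 rows
byCovariateAndTime <- merge(cohort1, cohort4, by = c("covariateId", "timeId"))
nrow(byCovariateAndTime)
```

The four-row result matches the number of rows in the tibble returned by computeStandardizedDifference above, while the two-row result is what one would expect for two time windows.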