Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Units of time are not consistent in the executionTimes.csv output file #1144

Open
ablack3 opened this issue Oct 9, 2024 · 1 comment
Open

Comments

@ablack3
Copy link
Collaborator

ablack3 commented Oct 9, 2024

It seem like the executionTimes.csv file records one second, one minute, and one hour as 1 in the executionTime column. Let's use a consistent time unit like minutes for these times.

remotes::install_github("ohdsi/CohortDiagnostics")
exportFolder <- tempfile()
dir.create(exportFolder)
CohortDiagnostics:::timeExecution(
  exportFolder,
  taskName = "test 1 second",
  cohortIds = NULL,
  parent = NULL,
  start = NA,
  execTime = NA,
  expr = {
    Sys.sleep(1)
  })
#>            task           startTime cohortIds executionTime parent
#> 1 test 1 second 2024-10-09 07:31:13           1.005063 secs

CohortDiagnostics:::timeExecution(
  exportFolder,
  taskName = "test 1 minute",
  cohortIds = NULL,
  parent = NULL,
  start = NA,
  execTime = NA,
  expr = {
    Sys.sleep(60)
  })
#>            task           startTime cohortIds executionTime parent
#> 1 test 1 minute 2024-10-09 07:31:14            1.00007 mins

start <- as.POSIXct("2024-10-09 03:37:46") 
oneHour <- start - as.POSIXct("2024-10-09 02:37:46")
CohortDiagnostics:::timeExecution(
  exportFolder,
  taskName = "test 1 hour",
  cohortIds = NULL,
  parent = NULL,
  start = start,
  execTime = oneHour,
  expr = NULL)
#>          task           startTime cohortIds executionTime parent
#> 1 test 1 hour 2024-10-09 03:37:46                 1 hours

list.files(exportFolder)
#> [1] "executionTimes.csv"
readr::read_csv(file.path(exportFolder, "executionTimes.csv"))
#> Rows: 3 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (1): task
#> dbl  (1): executionTime
#> lgl  (2): cohortIds, parent
#> dttm (1): startTime
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 3 × 5
#>   task          startTime           cohortIds executionTime parent
#>   <chr>         <dttm>              <lgl>             <dbl> <lgl> 
#> 1 test 1 second 2024-10-09 05:31:13 NA                 1.01 NA    
#> 2 test 1 minute 2024-10-09 05:31:14 NA                 1.00 NA    
#> 3 test 1 hour   2024-10-09 01:37:46 NA                 1    NA

Created on 2024-10-09 with reprex v2.1.1

Here is an example execution times file where I think the different units make it confusing.
exportConceptInformation took 22 (seconds or minutes?) while executeDiagnostics took 18 (hours?). I'm not quite sure what the units are.

image
@ablack3 ablack3 changed the title Time units are not consitant in the execution_times.csv output file Time units are not consitant in the executionTimes.csv output file Oct 9, 2024
@ablack3 ablack3 changed the title Time units are not consitant in the executionTimes.csv output file Units of time are not consistent in the executionTimes.csv output file Oct 9, 2024
ablack3 added a commit to darwin-eu-dev/CohortDiagnostics that referenced this issue Oct 9, 2024
@ablack3
Copy link
Collaborator Author

ablack3 commented Oct 9, 2024

This is fixed in the Darwin Sprint branch we are currently working on. I was thinking of renaming the column executionTime to executionTimeMinutes just to make the unit clear but I'm not sure if that will break anything or if it is clear enough without changing the column name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant