Merge branch 'main' of https://github.com/OHDSI/Hades
# Conflicts:
#	docs/renv.html
schuemie committed Dec 3, 2024
2 parents 9bf4b3a + a94c2b1 commit 9cb580a
Showing 15 changed files with 4,179 additions and 590 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/R_CMD_check_Hades.yaml
@@ -157,7 +157,7 @@ jobs:
- name: Download package tarball
if: ${{ env.new_version != '' }}
uses: actions/download-artifact@v2
uses: actions/download-artifact@v4.1.7
with:
name: package_tarball

4 changes: 3 additions & 1 deletion Rmd/installingHades.Rmd
@@ -46,11 +46,13 @@ These releases are currently captured as [renv](https://rstudio.github.io/renv/a
- 2023Q1: [renv lock file](https://raw.githubusercontent.com/OHDSI/Hades/main/hadesWideReleases/2023Q1/renv.lock)
- 2023Q3: [renv lock file](https://raw.githubusercontent.com/OHDSI/Hades/main/hadesWideReleases/2023Q3/renv.lock)
- 2024Q1: [renv lock file](https://raw.githubusercontent.com/OHDSI/Hades/main/hadesWideReleases/2024Q1/renv.lock)
- 2024Q3: [renv lock file](https://raw.githubusercontent.com/OHDSI/Hades/refs/heads/main/hadesWideReleases/2024Q3/renv.lock)


To build the R library corresponding to the latest release in your current RStudio project, you can use:

```r
download.file("https://raw.githubusercontent.com/OHDSI/Hades/main/hadesWideReleases/2024Q1/renv.lock", "renv.lock")
download.file("https://raw.githubusercontent.com/OHDSI/Hades/refs/heads/main/hadesWideReleases/2024Q3/renv.lock", "renv.lock")
install.packages("renv")
renv::restore()
```
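As a hedged sketch, the release label in the lock-file URL can be parameterized so the same steps work for any release in the list above. The helper names `hadesLockFileUrl()` and `restoreHadesRelease()` are hypothetical and not part of HADES or renv. The URL pattern is copied from the list above and assumes the lock files stay on the `main` branch.

```r
# Hypothetical helper (not part of HADES): build the raw GitHub URL of the
# renv lock file for a HADES-wide release label such as "2024Q3".
hadesLockFileUrl <- function(release) {
  # Release labels in the list above follow the pattern <year>Q<quarter>
  stopifnot(
    is.character(release),
    length(release) == 1,
    grepl("^[0-9]{4}Q[1-4]$", release)
  )
  sprintf(
    "https://raw.githubusercontent.com/OHDSI/Hades/main/hadesWideReleases/%s/renv.lock",
    release
  )
}

# Hypothetical wrapper: download the lock file and restore the library.
# Nothing is downloaded or installed until this function is called.
restoreHadesRelease <- function(release, lockFile = "renv.lock") {
  download.file(hadesLockFileUrl(release), lockFile)
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  renv::restore(lockfile = lockFile, prompt = FALSE)
}
```

Calling `restoreHadesRelease("2024Q3")` would then perform the same three steps as the snippet above. A label that does not match the pattern fails before any download is attempted, but the helper does not check that the release actually exists.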
2 changes: 1 addition & 1 deletion Rmd/rSetup.Rmd
@@ -98,7 +98,7 @@ We will need to install C and Fortran compilers in order to build R packages fro
1. Open the App Store in macOS and install Xcode. Xcode is a large program. If disk space is scarce you could also try installing only the Xcode command line tools by following the instructions [here](https://railsapps.github.io/xcode-command-line-tools.html). Verify that the C compiler is installed by opening the terminal and running the command `clang`. You should see an error that says "no input files".
2. Download and install the gfortran compiler `gfortran-12.2-universal.pkg` from <https://cran.r-project.org/bin/macosx/tools/>. Verify the installation by opening the terminal and running the command `/opt/gfortran/bin/gfortran`. You should see an error that says "no input files". R 4.2.3 will look in the wrong place for gfortran, so you must point it to the right location. Run `usethis::edit_r_makevars()` to open `~/.R/Makevars` and add:
2. Download and install the gfortran compiler `gfortran-12.2-universal.pkg` from <https://cran.r-project.org/bin/macosx/tools/>. Verify the installation by opening the terminal and running the command `/opt/gfortran/bin/gfortran`. You should see an error that says "no input files". R 4.4.1 will look in the wrong place for gfortran, so you must point it to the right location. Run `usethis::edit_r_makevars()` to open `~/.R/Makevars` and add:
```{r, eval=FALSE}
F77 = /opt/gfortran/bin/gfortran
5 changes: 4 additions & 1 deletion docs/installingHades.html
@@ -492,10 +492,13 @@ <h1>HADES-wide releases</h1>
<li>2024Q1: <a
href="https://raw.githubusercontent.com/OHDSI/Hades/main/hadesWideReleases/2024Q1/renv.lock">renv
lock file</a></li>
<li>2024Q3: <a
href="https://raw.githubusercontent.com/OHDSI/Hades/refs/heads/main/hadesWideReleases/2024Q3/renv.lock">renv
lock file</a></li>
</ul>
<p>To build the R library corresponding to the latest release in your
current RStudio project, you can use:</p>
<pre class="r"><code>download.file(&quot;https://raw.githubusercontent.com/OHDSI/Hades/main/hadesWideReleases/2024Q1/renv.lock&quot;, &quot;renv.lock&quot;)
<pre class="r"><code>download.file(&quot;https://raw.githubusercontent.com/OHDSI/Hades/refs/heads/main/hadesWideReleases/2024Q3/renv.lock&quot;, &quot;renv.lock&quot;)
install.packages(&quot;renv&quot;)
renv::restore()</code></pre>
</div>
14 changes: 7 additions & 7 deletions docs/rSetup.html
@@ -436,8 +436,8 @@ <h1 class="title toc-ignore">Setting up the R environment</h1>
<li><strong>R</strong> is a statistical computing environment. It comes
with a basic user interface that is primarily a command-line interface.
For stability, we’ve picked one R version that we aim to support for a
while moving forward. Since 2023-04-01 our target is <strong>R
4.2.3</strong>. We highly recommend installing R 4.2.3 for maximum
while moving forward. Since 2024-06-14 our target is <strong>R
4.4.1</strong>. We highly recommend installing R 4.4.1 for maximum
compatibility.</li>
<li><strong>RTools</strong> is a set of programs that is required on
Windows to build R packages from source.</li>
@@ -468,8 +468,8 @@ <h1>Instructions for Windows</h1>
<h2>Installing R</h2>
<ol style="list-style-type: decimal">
<li><p>Go to <a
href="https://cran.r-project.org/bin/windows/base/old/4.2.3/">https://cran.r-project.org/bin/windows/base/old/4.2.3/</a>,
click on ‘R-4.2.3-win.exe’ to download.</p></li>
href="https://cran.r-project.org/bin/windows/base/old/4.4.1/">https://cran.r-project.org/bin/windows/base/old/4.4.1/</a>,
click on ‘R-4.4.1-win.exe’ to download.</p></li>
<li><p>After the download has completed, run the installer. Use the
default options everywhere, with two exceptions: First, it is better not
to install into program files. Instead, just make R a subfolder of your
@@ -487,7 +487,7 @@ <h2>Installing R</h2>
<h2>Installing RTools</h2>
<ol style="list-style-type: decimal">
<li><p>Go to <a
href="https://cran.r-project.org/bin/windows/Rtools/rtools42/rtools.html">https://cran.r-project.org/bin/windows/Rtools/rtools42/rtools.html</a>,
href="https://cran.r-project.org/bin/windows/Rtools/rtools44/rtools.html">https://cran.r-project.org/bin/windows/Rtools/rtools44/rtools.html</a>,
and download the RTools installer.</p></li>
<li><p>After downloading has completed run the installer. Select the
default options everywhere.</p></li>
@@ -535,7 +535,7 @@ <h2>Installing R</h2>
<ol style="list-style-type: decimal">
<li><p>Go to <a href="https://cran.r-project.org/bin/macosx/base/"
class="uri">https://cran.r-project.org/bin/macosx/base/</a>, click on
‘R-4.2.3.pkg’ to download.</p></li>
‘R-4.4.1.pkg’ to download.</p></li>
<li><p>After the download has completed, run the installer and accept
all of the default options. You should see a screen indicating that the
installation was successful.</p>
@@ -561,7 +561,7 @@ <h2>Installing R build tools</h2>
class="uri">https://cran.r-project.org/bin/macosx/tools/</a>. Verify the
installation by opening the terminal and running the command
<code>/opt/gfortran/bin/gfortran</code>. You should see an error that
says “no input files”. R 4.2.3 will look in the wrong place for
says “no input files”. R 4.4.1 will look in the wrong place for
gfortran, so you must point it to the right location. Run
<code>usethis::edit_r_makevars()</code> to open
<code>~.R/Makevars</code> and add:</p>
235 changes: 235 additions & 0 deletions extras/AnalyzeHadesIssues.R
@@ -0,0 +1,235 @@
library(httr)
library(jsonlite)
library(dplyr)

cacheFolder <- "e:/temp/issueCache"

# Fetch all issues and comments for all HADES repos ----------------------------
issueToRow <- function(issue) {
  # GitHub returns a null body for issues without a description; use empty text
  # so every column of the tibble has length 1
  body <- if (is.null(issue$body)) "" else issue$body
  row <- tibble(
    number = issue$number,
    title = issue$title,
    body = sprintf("%s:\n%s", issue$user$login, body),
    closed = issue$state == "closed",
    dateCreated = as.Date(gsub("T.*$", "", issue$created_at)),
    dateUpdated = as.Date(gsub("T.*$", "", issue$updated_at)),
    dateClosed = if (is.null(issue$closed_at)) as.Date(NA) else as.Date(gsub("T.*$", "", issue$closed_at))
  )
  return(row)
}

# Function to get all issues (open and closed) from a GitHub repository.
# Note: this endpoint also returns pull requests, which GitHub treats as issues.
getIssuesFromRepo <- function(repo, token) {
  url <- paste0("https://api.github.com/repos/ohdsi/", repo, "/issues?state=all&per_page=100")
  issues <- list()

  # Paginate through all issues
  while (!is.null(url)) {
    response <- GET(url, add_headers(Authorization = paste("token", token)))

    if (status_code(response) != 200) {
      stop("Failed to fetch data from GitHub API. Status code: ", status_code(response))
    }

    issuesPage <- content(response, as = "parsed", type = "application/json")
    issues <- append(issues, lapply(issuesPage, issueToRow))

    # Check for pagination and extract the next URL if available
    url <- headers(response)$`link`
    if (!is.null(url)) {
      # If the header has no rel="next" entry, gsub() returns its input
      # unchanged, which signals that this was the last page
      nextLink <- gsub(".*<(.*)>; rel=\"next\".*", "\\1", url)
      if (nextLink == url) {
        url <- NULL
      } else {
        url <- nextLink
      }
    }
  }
  issues <- bind_rows(issues) |>
    mutate(repo = !!repo)
  return(issues)
}

# Function to extract all messages (comments) from a particular issue.
# Only the first page (up to 100 comments) is fetched.
getIssueComments <- function(repo, issue_number, token) {
  url <- paste0("https://api.github.com/repos/ohdsi/", repo, "/issues/", issue_number, "/comments?per_page=100")
  response <- GET(url, add_headers(Authorization = paste("token", token)))

  if (status_code(response) != 200) {
    stop("Failed to fetch issue comments. Status code: ", status_code(response))
  }

  comments <- content(response, as = "parsed", type = "application/json")
  if (length(comments) == 0) {
    return("")
  } else {
    text <- sapply(comments, function(x) sprintf("%s:\n%s", x$user$login, x$body))
    text <- paste(text, collapse = "\n\n")
    return(text)
  }
}

# Main function to fetch all issues and comments for a list of repositories
fetchAllIssuesAndComments <- function(repositories, token, cacheFolder) {
  if (!dir.exists(cacheFolder)) {
    dir.create(cacheFolder, recursive = TRUE)
  }
  results <- list()

  for (repo in repositories) {
    fileName <- file.path(cacheFolder, sprintf("issues_%s.rds", repo))
    if (file.exists(fileName)) {
      issues <- readRDS(fileName)
    } else {
      cat("Fetching issues for repo:", repo, "\n")
      issues <- getIssuesFromRepo(repo, token)
      issues <- issues |>
        mutate(comments = "")
      for (i in seq_len(nrow(issues))) {
        issue <- issues[i, ]
        issue_number <- issue$number
        issues$comments[i] <- getIssueComments(repo, issue_number, token)
      }
      saveRDS(issues, fileName)
    }
    results[[repo]] <- issues
  }
  results <- bind_rows(results)
  return(results)
}

# repositories <- c("SelfControlledCaseSeries", "CohortMethod")
repositories <- readr::read_csv("extras/packages.csv")
issues <- fetchAllIssuesAndComments(repositories$name, Sys.getenv("GITHUB_PAT"), cacheFolder)
saveRDS(issues, file.path(cacheFolder, "allIssues.rds"))


# Analyze issues with comments using GPT-4o ------------------------------------
getGpt4Response <- function(systemPrompt, prompt) {
  json <- jsonlite::toJSON(
    list(
      messages = list(
        list(
          role = "system",
          content = systemPrompt
        ),
        list(
          role = "user",
          content = prompt
        ),
        list(
          role = "assistant",
          content = ""
        )
      )
    ),
    auto_unbox = TRUE
  )

  response <- POST(
    url = keyring::key_get("genai_gpt4o_endpoint"),
    body = json,
    add_headers(
      "Content-Type" = "application/json",
      "api-key" = keyring::key_get("genai_api_gpt4_key")
    )
  )
  result <- content(response, "text", encoding = "UTF-8")
  result <- jsonlite::fromJSON(result)
  text <- result$choices$message$content
  return(text)
}

issues <- readRDS(file.path(cacheFolder, "allIssues.rds"))

systemPrompt <- "You are an expert in health analytics and data platforms with a focus on identifying developer burden related to specific database and query engines. Your goal is to classify issues by their relevance to specific platforms such as bigquery, duckdb, oracle, postgresql, redshift, snowflake, spark (including DataBricks), sql server, sqlite, and synapse. Be aware that in HADES packages all source SQL lives in the `inst/sql/sql_server` folder, from where it is translated, so only a mention of the `inst/sql/sql_server` folder does not imply SQL Server is involved. Additionally, you must determine whether each issue would have been raised if the platform in question was not supported. Your responses should be concise, and fit the exact specified output format so it can be parsed."


promptTemplate <- '
You are given an issue from the OHDSI Health-Analytics-Data-to-Evidence Suite (HADES) repositories. Based on the issue title, body, and comments, please answer the following:
### Issue Information:
--- Title start
%s
--- Title end
--- Body start
%s
--- Body end
--- Comments start
%s
--- Comments End
### Questions:
1. **Platform Relevance**: Which platform(s) is directly relevant to this issue? Choose from the following: bigquery, duckdb, oracle, postgresql, redshift, snowflake, spark (includes DataBricks), sql server, sqlite, synapse, or "none" if it is not platform-specific.
2. **Necessity of Platform Support**: Would this issue have been raised if the platform(s) identified above was/were not supported? Answer "yes" or "no" and provide a brief explanation.
### Expected Output Format:
{
"platforms": ["{platform1}", "{platform2}", ...],
"would_exist_without_platform": "yes/no"
}
'

gpt4CacheFolder <- file.path(cacheFolder, "gpt4Responses")
if (!dir.exists(gpt4CacheFolder)) {
  dir.create(gpt4CacheFolder, recursive = TRUE)
}
# Create the result columns up front instead of relying on implicit creation
# inside the loop
issues$platforms <- NA_character_
issues$existWithoutPlatform <- NA
pb <- txtProgressBar(style = 3)
for (i in seq_len(nrow(issues))) {
  issue <- issues[i, ]
  fileName <- file.path(gpt4CacheFolder, sprintf("%s_issue%s.txt", issue$repo, issue$number))
  if (file.exists(fileName)) {
    response <- paste(readLines(fileName), collapse = "\n")
  } else {
    prompt <- sprintf(promptTemplate, issue$title, issue$body, issue$comments)
    response <- getGpt4Response(systemPrompt, prompt)
    writeLines(response, fileName)
  }
  # Strip Markdown code fences and anything after the first closing brace
  # so that only the JSON object is parsed
  response <- gsub("}.*", "}", gsub("```", "", gsub("```json", "", response)))
  parsed <- jsonlite::fromJSON(response)
  platforms <- gsub(" \\(includes DataBricks\\)", "", paste(parsed$platforms, collapse = ";"))
  issues$platforms[i] <- platforms
  issues$existWithoutPlatform[i] <- parsed$would_exist_without_platform == "yes"
  setTxtProgressBar(pb, i / nrow(issues))
}
close(pb)

issues <- issues |>
  select(repo, number, closed, dateCreated, dateUpdated, dateClosed, platforms, existWithoutPlatform)
saveRDS(issues, file.path(cacheFolder, "allIssuesWithPlatforms.rds"))


# Compute numbers per platform and repo ----------------------------------------
issues <- readRDS(file.path(cacheFolder, "allIssuesWithPlatforms.rds"))

platforms <- readr::read_csv("extras/supportedPlatforms.csv")
platforms <- platforms |>
  filter(status == "Supported") |>
  pull("abbreviation")
counts <- list()
for (i in seq_along(platforms)) {
  platform <- platforms[i]
  # Inside filter(), `platforms` refers to the column in `issues`, while
  # `platform` is the loop variable defined above
  counts[[i]] <- issues |>
    filter(
      grepl(platform, platforms),
      existWithoutPlatform == FALSE
    ) |>
    group_by(repo) |>
    summarise(issues = n()) |>
    mutate(platform = !!platform)
}
counts <- bind_rows(counts)

counts |>
  group_by(platform) |>
  summarise(issues = sum(issues)) |>
  arrange(desc(issues)) |>
  readr::write_csv(file.path(cacheFolder, "CountsPerPlatform.csv"))

counts |>
  arrange(platform, desc(issues)) |>
  print(n = 200)

issues |>
  group_by(repo) |>
  summarise(
    issues = n(),
    platformIssues = sum(!existWithoutPlatform)
  ) |>
  mutate(fraction = platformIssues / issues) |>
  arrange(desc(platformIssues)) |>
  readr::write_csv(file.path(cacheFolder, "CountsPerRepo.csv"))
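
The pagination step in `getIssuesFromRepo()` above depends on pulling the `rel="next"` URL out of the `Link` response header. A minimal base-R sketch of that step as a standalone function, so it can be checked without calling the API, might look like this. The name `extractNextLink()` is hypothetical and not part of this commit. The sketch assumes URLs in the header contain no commas, and the header value in the usage example is illustrative rather than captured from a real response.

```r
# Hypothetical helper (not part of this commit): return the URL marked
# rel="next" in a Link header, or NULL if there is no next page.
extractNextLink <- function(linkHeader) {
  if (is.null(linkHeader)) {
    return(NULL)
  }
  # A Link header is a comma-separated list of '<url>; rel="name"' entries.
  # Assumption: the URLs themselves contain no commas.
  parts <- strsplit(linkHeader, ",\\s*")[[1]]
  nextPart <- parts[grepl('rel="next"', parts, fixed = TRUE)]
  if (length(nextPart) == 0) {
    return(NULL)
  }
  # Keep only the text between the angle brackets
  sub("^<([^>]*)>.*$", "\\1", nextPart[1])
}

# Illustrative header value
exampleHeader <- paste0(
  '<https://api.github.com/repositories/1/issues?state=all&per_page=100&page=2>; rel="next", ',
  '<https://api.github.com/repositories/1/issues?state=all&per_page=100&page=5>; rel="last"'
)
nextUrl <- extractNextLink(exampleHeader)
```

Returning `NULL` when there is no next page means the result can be assigned straight to `url` in a `while (!is.null(url))` loop, without the extra comparison against the unchanged header string.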
4 changes: 2 additions & 2 deletions extras/Releasing/CreateRenvLockFile.R
@@ -9,15 +9,15 @@ renv::activate()
# Install HADES (from scratch) -------------------------------------------------
install.packages("remotes")
options(install.packages.compile.from.source = "never")
remotes::install_github("ohdsi/Hades")
remotes::install_github("ohdsi/Hades", upgrade = "never")

# Create renv lock file --------------------------------------------------------
remotes::install_github("ohdsi/OhdsiRTools")

packagesUtils <- c("keyring")
packagesForPlp <- c("lightgbm", "survminer", "parallel")
packagesForDatabaseConnector <- c("duckdb", "RSQLite", "aws.s3", "R.utils", "odbc")
install.packages(c(packagesForPlp, packagesUtils, packagesForBulkImport))
install.packages(c(packagesForPlp, packagesUtils, packagesForDatabaseConnector))

OhdsiRTools::createRenvLockFile(
rootPackage = "Hades",
Binary file removed extras/Releasing/Dependencies.rds
Binary file not shown.
10 changes: 6 additions & 4 deletions extras/Releasing/RunCheckOnAllHadesPackages.R
@@ -9,10 +9,12 @@ saveRDS(prepareForPackageCheck(), "Dependencies.rds")
dependencies <- readRDS("Dependencies.rds")
# Skipping Hydra, as renv seems to clash with skeleton renv environments:
dependencies <- dependencies[dependencies$name != "Hydra", ]
for (i in 1:nrow(dependencies)) {
if (dependencies$name[i] == "PatientLevelPrediction") {
# Temp workaround for https://github.com/OHDSI/PatientLevelPrediction/issues/435
additionalCheckArgs <- c("--no-build-vignettes", "--ignore-vignettes")
for (i in 29:nrow(dependencies)) {
if (dependencies$name[i] == "DeepPatientLevelPrediction") {
# This package is failing unit tests, apparently because Python modules are
# missing. Skipping the tests for now, hoping DeepPlp users are savvy enough
# to figure this out
additionalCheckArgs <- c("--no-tests")
} else {
additionalCheckArgs <- c()
}
2 changes: 1 addition & 1 deletion extras/Releasing/SetupPython.R
@@ -6,7 +6,7 @@ reticulate::use_virtualenv("r-reticulate")
# Test: np <- reticulate::import('numpy')

# Packages needed by DeepPatientLevelPrediction --------------------------------
reticulate::py_install(c("polars", "tqdm", "connectorx", "scikit-learn", "pyarrow"))
reticulate::py_install(c("polars", "tqdm", "connectorx", "scikit-learn", "pyarrow", "pynvml"))
reticulate::py_install("torch")
# Test: torch <- reticulate::import('torch')
# Error in py_module_import(module, convert = convert) :