💨 dadR - Apache Spark-enabled R package for analyzing the Discharge Abstract Database

Disclaimer

Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2014-15). However, the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.

Why dadR

The Discharge Abstract Database (DAD) is large, and the flat SPSS .sav format is not amenable to fast processing or to data mining for clinical insights. dadR uses Apache Spark to parallelize search and extraction. Most functions return a Spark data frame. The package also includes clustering and other machine learning functions.

Installation

devtools::install_github("E-Health/dadR")

Work in progress ...... (Feedback and contributions welcome!)

Modules

How to use

Install Apache Spark (https://spark.apache.org/).
Researchers can download DAD from Odesi. Please make sure that you comply with the licensing terms.

library(SparkR)
library(data.table)
library(foreign)
library(dadR)

# Set the Spark master URL here, e.g. "local[*]" for local mode
# or "spark://<host>:7077" for a standalone cluster
sparkR.session(
  master = "local[*]",
  sparkConfig = list(
    spark.driver.memory = "3g",
    spark.executor.memory = "3g")
)
# First run: read the SPSS file. A CSV file named dadr is created
# automatically and can be reused in later sessions.
DADSparkInit(savFile = "path/to/dad_sample_2015.sav")

# Later runs: load the previously created CSV file instead
DADSparkInit(csvFile = "path/to/dadr.csv")
spark_df <- DADSameDisease("J08")
r_df <- collect(spark_df)

# All records with the diagnosis J08
(r_dt <- as.data.table(r_df))
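
Once collected, the result is an ordinary R data frame, so the usual data.table tools apply. The following is a minimal sketch that uses a small made-up table in place of the collected result. The column names AGE_GROUP and TOTAL_LOS are hypothetical and are not taken from the DAD data dictionary, so substitute the columns present in your extract.

```r
library(data.table)

# Toy stand-in for the collected result; column names are hypothetical.
r_df <- data.frame(
  AGE_GROUP = c("A", "A", "B", "B", "B"),
  TOTAL_LOS = c(2, 4, 1, 3, 5)
)
r_dt <- as.data.table(r_df)

# Number of records and mean length of stay per group
summary_dt <- r_dt[, .(n = .N, mean_los = mean(TOTAL_LOS)), by = AGE_GROUP]
print(summary_dt)
```

Because collect() copies every matching record into the R session, summaries like this are best run on results that fit comfortably in driver memory.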

Testing

Update the Spark master URL in dadR/tests/testthat/helper.R before running the tests.

devtools::load_all() # Repeat on error
devtools::test()
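
To avoid editing the helper file on every machine, the master URL could be read from an environment variable instead. This is only a sketch of that idea: the function name spark_master_url and the variable DADR_SPARK_MASTER are hypothetical and are not part of dadR.

```r
# Hypothetical helper (not part of dadR): resolve the Spark master URL
# from an environment variable, falling back to local mode on all cores.
spark_master_url <- function(default = "local[*]") {
  Sys.getenv("DADR_SPARK_MASTER", unset = default)
}

# Returns "local[*]" unless DADR_SPARK_MASTER is set in the environment
spark_master_url()
```

The returned value can then be passed as the master argument of sparkR.session().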

Contributors

Bell Eapen (McMaster U) canehealth.com
This package is developed and tested using Compute Canada resources.
See also: 🔦 QRMine | Qualitative Research Support Tools in Python

Citation

Please cite dadR in your publications if it helped your research. Here is an example BibTeX entry:


@misc{eapenbr2018,
  title={dadR - Spark enabled R package for analyzing discharge abstract database.},
  author={Eapen, Bell Raj and contributors},
  year={2018},
  publisher={GitHub},
  journal = {GitHub repository},
  howpublished={\url{https://github.com/E-Health/dadR}}
}
