The 2021 Applied Data Analytics workshop for UNCF / Excelencia focused on increasing data-based research capacity among institutions of higher education that serve a large share of Black and Latinx students. Participants worked in teams with data from the National Center for Science and Engineering Statistics and the Institute for Research on Innovation and Science. They received training in core data concepts such as record linkage and data visualization, as well as cutting-edge training in machine learning.
This repository contains the class materials for the UNCF / Excelencia applied data analytics program.
Datasets Used in the Class:
- Survey of Earned Doctorates
- Survey of Doctorate Recipients
- Higher Education Research and Development Survey
- UMETRICS (provided by the Institute for Research on Innovation and Science)
- United States Patent data (provided by the United States Patent and Trademark Office)
- Federal Reporter (https://federalreporter.nih.gov/)
Class Program
Day 1 - Overview, Project Scoping, and Privacy and Confidentiality
Day 2 - Dataset Introduction
Day 3 - Applications of Dataset Exploration
Day 4 - Basics of Data Visualization
Day 5 - Applications of Data Visualization
Day 6 - Text Analysis
Day 7 - Application of Text Analysis
Day 8 - Interim Presentations
Day 9 - Unsupervised Machine Learning
Day 10 - Inference and Imputation
Day 11 - Privacy and Confidentiality
Day 12 - Bias and Ethics
References
The notebooks in this repository were inspired by previous applied data analytics class materials and notebooks.