Medical school admissions dataset curation via web scraping and exploratory data analysis

MichaelJWelsh/med-school-dataset-curation-and-analysis

Medical School Admissions Dataset Curation via Web Scraping and Exploratory Data Analysis

Across the ~170 medical schools in the U.S., the average acceptance rate is around 5.5%. Airfare for interviews alone can exceed $500, on top of the hundreds of dollars it costs to submit primary and secondary applications to just a single school. Despite these expenses, it's necessary to apply to 20-30 schools to get an acceptance, and many applicants cannot afford, either literally or figuratively, to be rejected and have to reapply the following year. How do you pick a list of schools that maximizes your chances of acceptance?

In this final project, medical school admission statistics are scraped from the internet and turned into a dataset. The dataset includes MCAT/GPA quantiles, in-state and out-of-state acceptance rates and bias, demographics, geography, funding and institution type, residency match rates by specialty, and more. Exploratory data analysis is then performed to determine which schools my fiancée, given her background, should apply to in order to maximize her chances of acceptance this cycle.
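As a rough illustration of the kind of question such a dataset can answer, here is a minimal sketch that shortlists schools whose MCAT/GPA range contains an applicant's scores, then ranks them by the acceptance rate that applies to the applicant's residency. This is not the project's actual code or data: the field names, the choice of 10th/90th percentiles, the home state, and every row below are hypothetical placeholders invented for the example.

```python
from dataclasses import dataclass


@dataclass
class School:
    # All field names are hypothetical; the real dataset's columns are not published.
    name: str
    state: str
    mcat_p10: float          # assumed 10th-percentile MCAT of accepted applicants
    mcat_p90: float          # assumed 90th-percentile MCAT of accepted applicants
    gpa_p10: float           # assumed 10th-percentile GPA of accepted applicants
    gpa_p90: float           # assumed 90th-percentile GPA of accepted applicants
    in_state_rate: float     # acceptance rate for in-state applicants
    out_of_state_rate: float # acceptance rate for out-of-state applicants


def shortlist(schools, mcat, gpa, home_state):
    """Keep schools whose 10th-90th percentile bands contain both scores,
    sorted by the residency-appropriate acceptance rate (highest first)."""
    picks = []
    for s in schools:
        mcat_ok = s.mcat_p10 <= mcat <= s.mcat_p90
        gpa_ok = s.gpa_p10 <= gpa <= s.gpa_p90
        if mcat_ok and gpa_ok:
            rate = s.in_state_rate if s.state == home_state else s.out_of_state_rate
            picks.append((s.name, rate))
    return sorted(picks, key=lambda pick: pick[1], reverse=True)


# Synthetic placeholder rows, NOT real admissions statistics.
schools = [
    School("School A", "PA", 505, 518, 3.4, 3.9, 0.12, 0.02),
    School("School B", "NY", 512, 524, 3.7, 4.0, 0.05, 0.04),
    School("School C", "OH", 500, 515, 3.3, 3.9, 0.09, 0.06),
]

result = shortlist(schools, mcat=510, gpa=3.6, home_state="PA")
print(result)  # [('School A', 0.12), ('School C', 0.06)]
```

Using the in-state or out-of-state rate depending on residency is one simple way to account for the in/out-of-state bias mentioned above. A real analysis would likely weigh many more of the dataset's fields.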

Drexel class: Data Science 521 Data Analysis and Interpretation

Data Usage

Due to AAMC data usage policies, I am not allowed to share or distribute the data I collected or the code used to parse and transform it into a dataset. I can share my exploratory data analysis and presentation, which I hope you'll enjoy.
