Learning Resources for Clinical Data Science

This is an overview for those from a clinical background to begin a career in statistics / data science / machine learning. Re-training in this field requires that you unlearn most of what you have been taught about statistics. Forget tests of significance. Most importantly, it requires a significant comitment of time, energy and love. But it's extremely rewarding.

Path to Data Science (Short Version)

Below is a comprehensive path to learning data science, focussed around the R statistical computer language. We choose R because its powerful, free and has a thriving online community. We've tried a lot of different things and have listed what we feel were of most use. Everyone is different, so if you found something particularly useful, let us know so we can add it to the list. Ultimately the langugage doesn't matter, and Python is equally a good choice. Our recommendation would be to pick one language and become very proficient in it.

There is no shortcut to advancing in this field. Find a project, something that you feel passionate about, and get stuck in; learn by doing. A fantastic reasource to start is the tidyverse and R for data science, availible for free online here: http://r4ds.had.co.nz

Skill development (The Long Version)

Below is a list of some areas that are important to develop on your journey into clinical data science. These are aimed at someone who is interested and motivated, but from a clinical background, and so not formally trained in these methods. I have inlcuded links to amazon for books, but mostly just for reference. Ask for a link to a shared dropbox folder for electronic materials for the majority of them, plenty of others are published online for free. We also keep other copies in the lab which we are happy for you to borrow (please ask first).

Maths

The foundation of data science is rooted in probability and statistics. As clinicians this can be extremely limiting in the long run as we typically have at most an A-level in maths (often from a very long time ago). Key areas to focus on are:
- Probability
- Linear algebra
- Statistical notation
It isn't necessary to become completely independently proficient in these areas. Rather, the focus should be to gain an intution for what is going on. Ultimately you want to become an applied researcher, but some core theory is required.
Once you have these basic tools, you will have the basic skills to understand the language of statistics and make the most out of your data (... and get away from hypothesis testing!). All that is needed here is a conceptual overview of how these things work.

The following resources are extremely useful:

Harvard Stat 110 (https://www.youtube.com/watch?v=KbB0FjPg0mw&list=PL2SOU6wwxB0uwwH80KTQ6ht66KWxbzTIo)
- The stategic practice is availible here (https://projects.iq.harvard.edu/stat110/home)
Introduction to Probability (https://www.amazon.co.uk/Introduction-Probability-Chapman-Statistical-Science-ebook/dp/B00MMOJ19I/ref=sr_1_3?ie=UTF8&qid=1516904017&sr=8-3&keywords=introduction+to+probability) the companion text book to the above course
Khan academy (www.khanacademy.com)
3Blue1Brown (https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw)

Other resources for statistics:

Discovering statistics using R; Field. A good high level overview with practical applicatons in R.
Statistical rethinking (https://www.amazon.co.uk/Statistical-Rethinking-Bayesian-Examples-Chapman/dp/1482253445/ref=sr_1_1?ie=UTF8&qid=1516907232&sr=8-1&keywords=statistical+rethinking). This book is a full course in bayesian statistics. It takes a narrative approach, and goes easy on the maths. You'll get a lot more out of it if you have covered the STAT110 course first.

Online Courses

There has been an explosion in the availibility of online courses. They vary from free to expensive, and their quality is often not related to cost. Here we compile a list of the best:

Name	Subject	Website	Cost	Rating
Essence of Linear Algebra	Linear Algebra	https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw/playlists	Free	5/5
The World of Maths	General Maths at all levels	www.khanacademy.com	Free	5/5
Machine Learning - Coursera	ML	https://www.coursera.org/learn/machine-learning	£58	Recommended
Maths for Machine Learning	Linear Algebra	https://www.coursera.org/learn/linear-algebra-machine-learning/home/welcome	£48	4/5

R

Datacamp.com is an excellent resource that uses gamification for learning the basics of R. Beyond learning how R works, it's use can be rather limited as it teaches a fairly shallow understanding of the topic. After you have got to grips with the basics, move away and start to get hands on with your own project
Learn tidy data practices here https://www.tidyverse.org/
R for data science http://r4ds.had.co.nz/
Advanced R (https://www.amazon.co.uk/Advanced-Chapman-Hall-Hadley-Wickham/dp/1466586966/ref=sr_1_1?s=books&ie=UTF8&qid=1516907344&sr=1-1&keywords=advanced+R). Really excellent book by Hadley Wickham (a don in the R universe). Great to help push your programming to the next level. Also free online.
Data Science for Doctors: This is a shameless plug for our own course that we run at the RCoA every few months. It provides a comprehensive introduction to R for the uninitiated.

Python

We need some recommendations here

UNIX

vim adventures (www.vimadventures.com) is a brilliant way to learn your way around vim (a unix based text editor).
William E. Shotts has written an ebook on unix which is open source and a great overview of how unix works http://linuxcommand.org/tlcl.php

Databases

Need stuff here

Specific Statistical Topics

Machine Learning - "An Introduction to Statistical Learning", James, Spriner. Solid basic introduction to the common methods used in machine learning. Accessible, written for R and lots of practice questions.
Mixed Effects Models - "Mixed Effects Models and Extensions in Ecology with R" Zuur, Springer. Absolutely brilliant and accessible books to the topic of modelling correlated data. Be a little cautious of their model selection proceedures however, as they are questionable.

Interesting Reads

The Book of Why (causal inference)
Observation and Experiment
The Lady Tasting Tea, Salsburg (history of statistics)

Lab Meetings

We meet every Monday morning to discuss ongoing projects. Everyone is welcome to attend and watch or present. No matter the size, scale or complexity, we would encourge you to present any work as a way to crowd source solutions to problems and figure out what the next step might be.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
src		src
README.md		README.md
learn_the_command_line.md		learn_the_command_line.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Learning Resources for Clinical Data Science

Path to Data Science (Short Version)

Skill development (The Long Version)

Maths

Online Courses

R

Python

UNIX

Databases

Specific Statistical Topics

Interesting Reads

Lab Meetings

About

Releases

Packages

Languages

uclhal/learning_resources

Folders and files

Latest commit

History

Repository files navigation

Learning Resources for Clinical Data Science

Path to Data Science (Short Version)

Skill development (The Long Version)

Maths

Online Courses

R

Python

UNIX

Databases

Specific Statistical Topics

Interesting Reads

Lab Meetings

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages