This repo will give you a very brief introduction to what Python can do for you as someone who works with data, and some of the advantages of using Python over tools like Excel.
We have two notebooks in here:
- rap_example_pipeline_python.ipynb: to show some data processing in a simple RAP (Reproducible Analytical Pipelines) process
- Visualising_data_with_dtale: to show a visualisation tool called D-Tale
You can run these in GitHub Codespaces, locally on any machine with Python installed (see instructions below), or with Google Colab at these links:
RAP Example Pipeline notebook:
The rap_example_pipeline_python.ipynb notebook demonstrates - in a very simple way - the three main steps common to almost all analytical processes:
- Get the data - in this case we'll download the artificial HES data from the web
- Process the data - we'll do a simple aggregation
- Making outputs - we'll make some charts
We'll use the artificial Hospital Episode Statistics (HES) data. This has the same structure as the real HES, but it's mock data with no personally identifiable information in it.
One of the great advantages of Python is that you can use other people's code in your projects. This can save you a lot of time!
How this works is like this:
- Someone writes some useful Python code
- They think, "Hey this would be really useful, I want to share this with the world!"
- They turn their code into something called a "package"
- They upload their package to a central package repository called PyPi
- You install their package from PyPi and use their code in your project!
In this demo, we'll use a data manipulation package called Pandas
.
Pandas can do pretty much anything you can do with other tools like Excel and SQL. You can filter, group, create aggregations, make pivot tables, and, in combination with other packages, even make charts!
Simply load up the rap_example_pipeline_python.ipynb notebook in Codespaces, Google Colab, or in your local environment if you know how to set this up, and follow the instructions inside!
The Visualising_data_with_dtale.ipynb file showcases a Python package called D-Tale, which you can use to visualise and explore datasets. This notebook loads up some publically available Covid data into D-Tale, so that you can play around with it's features.
With just a few lines of code you'll get a GUI interface, from where you can process data, create charts, and perform various kinds of analysis in a few clicks.
One useful feature of this package is that when you do something with it, such as change a data type of a column or create a chart, it can display the Python code it used to carry out that function! So it's a pretty handy learning tool as well as a nice way to explore data.
As with the example pipeline notebook, just load up the Visualising_data_with_dtale.ipynb file and click Run All. D-Tale should open in a new browser tab.
Try having a look around the D-Tale interface and see what you can do with it. For example...
- The date column appears to have been imported as a string. Can you convert it to a date type? (hint: click the column name at the top for some transformations you can do)
- Could you find a way to group the data by country and count the number of covid cases?
- Once you have done step 2, could you use this aggregate data and make a bar chart out of it?
Keep an eye out for the <>
icons - if you click these, D-Tale will display the Python code it's using under the bonnet!
This code requires:
- Python (> 3.6), the official Python website has instructions for downloading and installing python.
- Clone the repository. To learn about what this means, and how to use Git, see the Git guide.
git clone https://github.com/nhsengland/python_colab_tutorial.git
- Set up your environment.
If you're using Windows, enter the following commands into the Command Line or Powershell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt
If you're using Linux or MacOS, enter the following commands into the Terminal:
python -m venv .venv
source venv/bin/activate
python -m pip install -r requirements.txt
For Visual Studio Code it is necessary that you change your default interpreter to the virtual environment you just created .venv. To do this use the shortcut Ctrl-Shift-P, search for Python: Select interpreter and select .venv from the list.
For more information on how to use virtual environments and why they are important, see the virtual environments guide.
You can also load the repo into a GitHub Codespace (click the blue "Code" button and then click "Create Codespace"), or in Google Colab (see the links at the top of the page)
This codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation.
Any HTML or Markdown documentation is © Crown copyright and available under the terms of the Open Government 3.0 licence.
- Warren Davies
- Sam Hollings
- Jennifer Struthers
- Ameersohail Syed
- The RAP team!