In this repository I've collected some analyses of Covid-19 data for Italy, written in Python with Jupyter Notebook. The structure of the repository is as follows:
Italy_Covid-19/
│
├── Data/
│ ├── national_trend.csv
│ ├── regions_trend.csv
│ ├── provinces_trend.csv
├── Shapefiles/
│ ├── regions/
│ │ ├── italy_regions.shp
│ │ ├── italy_regions.shx
│ │ ├── italy_regions.prj
│ │ ├── italy_regions.dbf
│ │ ├── italy_regions.cpg
│ ├── provinces/
│ │ ├── italy_provinces.shp
│ │ ├── italy_provinces.shx
│ │ ├── italy_provinces.prj
│ │ ├── italy_provinces.dbf
│ │ ├── italy_provinces.cpg
├── src/
│ ├── EDA.ipynb
│ ├── Forecasting.ipynb
│ ├── load_data.py
The Jupyter notebooks have been built to work with the project structure above; once the repository has been downloaded, one just needs to run Jupyter and open the notebooks to start working.
Data is imported directly into the notebooks from the official GitHub repository of the Italian Civil Protection: (https://github.com/pcm-dpc/COVID-19).
Data can also be downloaded manually as CSV files from that repository and saved in the Data directory.
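As a minimal sketch of the two loading options above (not necessarily what load_data.py does), the same pandas call can read either the remote CSV or a manually downloaded copy. The `load_trend` helper is hypothetical, the mapping between the upstream file and `national_trend.csv` is an assumption, and the `data` column is assumed to hold the report date:

```python
import pandas as pd

# Raw national-trend CSV in the official repository. Which upstream file
# corresponds to which local file in Data/ is an assumption.
NATIONAL_URL = (
    "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/"
    "dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv"
)
# Relative path as seen from a notebook running inside src/.
LOCAL_NATIONAL = "../Data/national_trend.csv"


def load_trend(source):
    """Read a trend CSV from a URL, a local path, or a file-like object.

    Assumes the file has a `data` column with the report date.
    """
    df = pd.read_csv(source, parse_dates=["data"])
    # Keep rows in chronological order regardless of how the file is sorted.
    return df.sort_values("data").reset_index(drop=True)
```

For example, `load_trend(NATIONAL_URL)` reads the data online, while `load_trend(LOCAL_NATIONAL)` reads the copy saved in the Data directory.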
In order to plot geographic data I've also downloaded shapefiles from the following site: (http://www.diva-gis.org/gdata)
There are two main shapefiles:
- italy_regions.shp: shapefile with the regions of Italy.
- italy_provinces.shp: shapefile with the provinces of Italy.
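A rough sketch of how the regions shapefile could be combined with the regional data to draw a map. Both helpers are hypothetical; the column names `denominazione_regione`, `data` and `totale_casi` (CSV side) and `NAME_1` (shapefile side) are assumptions to check against the actual files, and the `data` column is assumed to be already parsed as dates:

```python
import pandas as pd


def latest_by_region(df, region_col="denominazione_regione",
                     date_col="data", value_col="totale_casi"):
    """Keep, for each region, the value reported on its most recent date."""
    idx = df.groupby(region_col)[date_col].idxmax()
    return df.loc[idx, [region_col, value_col]].reset_index(drop=True)


def plot_regions(shapefile_path, latest, shape_name_col="NAME_1",
                 region_col="denominazione_regione", value_col="totale_casi"):
    """Draw a choropleth of value_col over the regions shapefile."""
    # Imported here so that latest_by_region works without geopandas installed.
    import geopandas as gpd

    regions = gpd.read_file(shapefile_path)
    # Left join: regions whose names do not match the CSV get missing values.
    merged = regions.merge(latest, left_on=shape_name_col,
                           right_on=region_col, how="left")
    return merged.plot(column=value_col, legend=True)
```

Region names in the shapefile and in the CSV may not match exactly, so some names may need to be harmonized before the merge.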
So far I've been working on the following notebooks, saved in the src directory (source code):
- EDA.ipynb: a basic exploratory data analysis (EDA); the code is commented to explain its usage. To plot geographic data I've used the geopandas package (its official page suggests creating a dedicated virtual environment and installing the geopandas dependencies there, to avoid possible conflicts).
- Forecasting.ipynb: in this notebook I've tried to forecast the trend of total cases in each region. I've used:
  - Logistic model
  - LSTM: an implementation of a Long Short-Term Memory network to model the growth of coronavirus cases with a data-driven approach.
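To illustrate the logistic model, here is a minimal sketch that fits a three-parameter logistic curve to cumulative cases with `scipy.optimize.curve_fit`. The parameterization (plateau `K`, growth rate `r`, inflection day `t0`), the starting guess and the `fit_logistic` helper are assumptions for illustration, not necessarily what Forecasting.ipynb does:

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, K, r, t0):
    """Logistic curve: K is the plateau, r the growth rate, t0 the inflection day."""
    return K / (1.0 + np.exp(-r * (t - t0)))


def fit_logistic(cases):
    """Fit the logistic curve to cumulative cases (one value per day).

    Returns the fitted parameters (K, r, t0); days are counted from 0.
    """
    y = np.asarray(cases, dtype=float)
    t = np.arange(len(y), dtype=float)
    # Rough starting point (an assumption): plateau near the largest observed
    # value, moderate growth rate, inflection in the middle of the series.
    p0 = [y.max(), 0.1, t.mean()]
    params, _ = curve_fit(logistic, t, y, p0=p0, maxfev=10000)
    return params
```

Given the fitted parameters, `logistic(np.arange(len(cases) + 14), *params)` extends the curve two weeks past the observed data. With few observations before the inflection point, the estimate of the plateau tends to be unstable.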
The aim of these analyses is simply to get an overall view of the behavior of the coronavirus in my country. Predictive modeling poses some difficulties:
- Small amount of historical data.
- Lack of important information in the current dataset, such as exogenous variables (e.g. restriction policies) and epidemiological variables characterizing the spread of the virus.
The latter motivates the use of a data-driven approach that exploits the potential of Deep Learning methods (such as LSTM). Anyone who is interested in the analysis and has suggestions or hints about possible predictive models can write to me at the following email address: [email protected]
Data comes from various public sources; I believe that the resulting dataset can be classified under the Public Domain Dedication and License.