This project's main goal is to apply machine learning to road accident prediction using aerial or satellite imagery of mainland Portugal. The strategy consisted of dividing the study area into 200 m x 200 m squares and using data taken from the Autoridade Nacional da Proteção Civil (Portugal) [1] API, downloaded with the tools made available in this repository [2]. The data gathered consists of occurrences of various accidents in mainland Portugal from early 2016 to March 2020. Incidents not directly related to road accident risk were removed, and the resulting dataset was cleaned and processed to allow for easier use and data exploration.
Areas with 0 (green), 1 (yellow), 2 (orange), 3 (red), 4 (grey), and 5 or more (black) accidents
Benfica stadium as a reference for the size of a 200 m x 200 m cell
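The gridding step can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes accident locations are already expressed in a projected coordinate system measured in metres (for example ETRS89 / Portugal TM06, EPSG:3763, which is an assumption and not stated in this document). The function names and sample points are hypothetical. The color mapping follows the figure legend above.

```python
import math
from collections import Counter

CELL_SIZE_M = 200  # cell side length used in this project

# Legend colors from the accident map: index = accident count, last entry = 5 or more.
COLORS = ["green", "yellow", "orange", "red", "grey", "black"]


def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) of the grid cell containing a projected point."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def count_accidents_per_cell(points, cell_size=CELL_SIZE_M):
    """Count how many accident points fall inside each grid cell."""
    return Counter(cell_index(x, y, cell_size) for x, y in points)


def cell_color(accident_count):
    """Map an accident count to the legend color (5 or more is black)."""
    return COLORS[min(accident_count, 5)]


# Hypothetical accident locations in projected metres (not real data).
accidents = [(10.0, 20.0), (150.0, 199.9), (200.0, 0.0), (-0.1, 399.0)]
counts = count_accidents_per_cell(accidents)

for cell, n in sorted(counts.items()):
    print(cell, n, cell_color(n))
```

Using `math.floor` rather than integer truncation keeps cells aligned for negative coordinates as well, which matters if the grid origin lies inside the study area.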
The resulting dataset was then used to create two datasets: one distinguishing between cells where accidents did and did not occur in the given time frame, and another dividing the cells where accidents occurred into three risk levels. Both datasets were balanced to avoid potential problems caused by class imbalance, such as overfitting to the majority class.
Number of cells containing roads with and without accidents in the given period
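The labelling and balancing step could look roughly like the sketch below. The document does not say how the balancing was done; random undersampling of the larger class is one common option and is an assumption here, as are the function names and the toy cell counts.

```python
import random


def split_by_label(cell_counts):
    """Split cell ids into cells with at least one accident and cells with none."""
    accident = [cell for cell, n in cell_counts.items() if n >= 1]
    safe = [cell for cell, n in cell_counts.items() if n == 0]
    return accident, safe


def balance(class_a, class_b, seed=0):
    """Randomly undersample so both classes end up with the same number of items."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    n = min(len(class_a), len(class_b))
    return rng.sample(class_a, n), rng.sample(class_b, n)


# Hypothetical accident counts per cell id (not real data).
cell_counts = {"c0": 0, "c1": 2, "c2": 0, "c3": 0, "c4": 1, "c5": 0}

accident_cells, safe_cells = split_by_label(cell_counts)
balanced_accident, balanced_safe = balance(accident_cells, safe_cells)
print(len(balanced_accident), len(balanced_safe))
```

Undersampling discards data from the larger class; oversampling or class weights are alternatives when the smaller class is too small to train on.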
Once the datasets were created, the aerial imagery was downloaded using the Bing Maps API [3], the watermark was cropped out, and the images were resized to 128x128 pixels.
Satellite imagery taken from the Bing Maps API: original vs. resized
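The crop-and-resize step can be illustrated with plain Python lists. This is only a sketch of the idea: in practice an imaging library would normally do this work. The watermark position (a strip along the bottom edge), the strip height, the tile size, and the nearest-neighbour resampling are all assumptions made for the example.

```python
def crop_bottom(pixels, strip_height):
    """Drop the bottom rows, where the watermark is assumed to sit."""
    if not 0 <= strip_height < len(pixels):
        raise ValueError("strip_height must be smaller than the image height")
    return pixels[: len(pixels) - strip_height]


def resize_nearest(pixels, out_w, out_h):
    """Resize a row-major pixel grid using nearest-neighbour sampling."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [
        [pixels[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Hypothetical 256x281 greyscale tile whose bottom 25 rows hold the watermark.
tile = [[(r + c) % 256 for c in range(256)] for r in range(281)]

clean = crop_bottom(tile, 25)            # 256 rows remain
small = resize_nearest(clean, 128, 128)  # final 128x128 model input
print(len(small), len(small[0]))
```

Cropping before resizing keeps the watermark pixels from being blended into the downsampled image.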
Using these images, we trained and validated several deep learning models for each of the datasets and obtained the following results.
Metrics for the first dataset without transfer learning for the different deep learning architectures
Metrics for the first dataset with transfer learning for the different deep learning architectures
Metrics for the second dataset without transfer learning or data augmentation for the different deep learning architectures
Metrics for the second dataset with transfer learning or data augmentation for the different deep learning architectures
We also used the trained models to implement a couple of proofs of concept that show some possible use cases for the models. These include a REST API that predicts the risk associated with a given pair of coordinates, and a library capable of generating RiskMaps from a GeoJSON specification file. A RiskMap for a given area is a GeoTIFF file in which each portion of the map is colored according to its risk.
Satellite imagery of an area in the Braga district and its associated RiskMap
- First Dataset - contains images of areas with and without accidents
  - ACCIDENTS
  - SAFE
  - READY TO TRAIN DATASET - images already separated into testing and training folders
- Second Dataset - contains images of areas with various levels of risk, depending on the number of accidents that occurred during a given time frame
  - Level1
  - Level2
  - Level3
  - READY TO TRAIN DATASET - images already separated into testing and training folders
- Data gathered from the Portuguese National Emergency and Civil Protection Authority - this dataset contains a wide array of accidents and events, both traffic and non-traffic related, that took place from 1/01/2016 to 24/03/2020.
- Grid Accidents - using the data in the dataset above, we extracted the traffic accidents and created this dataset containing the number of accidents in each 200 m x 200 m cell in mainland Portugal.
- Portugal District - this GeoJSON file contains the boundaries of the districts of mainland Portugal.
- MODEL1 - contains around 30335 grid cells of 200 m x 200 m where 1 or more accidents occurred in the above time frame and 37000 where none occurred.
- MODEL2 - contains the 30335 grid cells where accidents occurred, divided into 3 danger categories according to their accident count.