This study aims to shed light on various aspects of traffic incidents in Calgary, with a specific emphasis on understanding the frequency and patterns of incidents based on different neighborhoods, time of day, and the safety concerns for pedestrians and cyclists.
Calgary, as a bustling urban center, experiences its fair share of traffic incidents, impacting the lives of its residents and visitors.
One of the key guiding questions explored in the project is how the frequency of incidents differs based on the neighborhoods of Calgary. By mapping and analyzing incident data geographically, the main project aim is to identify areas that may have a higher incidence of traffic-related incidents. Understanding these neighborhood-specific variations will enable us to pinpoint areas that require targeted interventions and proactive measures to enhance safety and reduce the occurrence of incidents.
Another crucial aspect investigated is how the frequency of traffic incidents in Calgary changes with the time of day. Traffic patterns and congestion levels vary significantly throughout the day, and understanding how incidents are influenced by these temporal factors is essential for effective traffic management. Analyzing incident data across different time intervals makes it possible to identify peak incident hours, assess the impact of rush-hour congestion, and develop strategies to alleviate these issues during specific time periods.
In addition to neighborhood and temporal analysis, the project focuses on identifying the most dangerous areas for pedestrians and cyclists in Calgary. Pedestrians and cyclists are vulnerable road users who require special attention and protection. By examining incident data involving pedestrians and cyclists, the aim is to identify high-risk areas where these road users face increased dangers. This knowledge will aid in implementing targeted safety measures, improving infrastructure, and designing initiatives to promote safer conditions for pedestrians and cyclists in Calgary.
Through this data-driven approach and advanced analytical techniques, the aim is to provide evidence-based insights and actionable recommendations for traffic management authorities, urban planners, and policymakers. By answering the guiding questions above, the project can contribute to the development of effective strategies to enhance traffic safety, optimize resource allocation, and create a more pedestrian- and cyclist-friendly city.
The dataset used for the Calgary Traffic Incidents analysis in this project is obtained from the City of Calgary's official data portal, data.calgary.ca. This dataset is a comprehensive archive of traffic incidents within the city, regularly updated every 10 minutes to ensure the most current information is available. It covers a substantial time span, starting from December 6, 2016, up to the present day.
It's important to note that the dataset may have occasional gaps due to system or script malfunctions, which can result in missing or incomplete incident records. However, every effort is made to maintain the accuracy and completeness of the data.
The dataset provides essential information about each traffic incident, including a description of the incident itself, geolocation, and address details. This information enables precise identification and mapping of incident locations. Additionally, the dataset includes the quadrant of the city where each incident occurred, allowing for further analysis and exploration of incident patterns specific to different areas within Calgary.
Another crucial component of the dataset is the time of each incident, providing insights into when incidents occur throughout the day. This temporal information allows for analyzing the frequency and distribution of incidents based on different time intervals, enabling the identification of peak incident hours and temporal trends.
The dataset is made available under the Open Government License - City of Calgary, which encourages the free use, distribution, and modification of the data while maintaining acknowledgment of the City of Calgary as the original source.
It's worth noting that for more detailed information and context regarding the traffic incidents, users can refer to the Calgary Traffic Report page, which provides additional insights, updates, and relevant information related to traffic incidents in the city.
By leveraging this comprehensive and regularly updated dataset, our project aims to extract meaningful insights, identify trends, and develop strategies to enhance traffic management, improve road safety, and optimize transportation systems in Calgary.
Dataset: https://data.calgary.ca/Transportation-Transit/Traffic-Incidents/35ra-9556
The data cleaning step in this project involves preparing the initial dataset obtained from the City of Calgary's data portal for analysis by addressing missing or incomplete information and identifying potential issues with the dataset.
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import seaborn as s
import matplotlib.pyplot as plt
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
import datetime
df2 = pd.read_csv('Traffic_Incidents.csv')
df2
| | INCIDENT INFO | DESCRIPTION | START_DT | MODIFIED_DT | QUADRANT | Longitude | Latitude | Count | id | Point |
---|---|---|---|---|---|---|---|---|---|---|
0 | Westbound 16 Avenue at Deerfoot Trail NE | Stalled vehicle. Partially blocking the right... | 2022/06/21 07:31:40 AM | 2022/06/21 07:33:16 AM | NE | -114.026687 | 51.067485 | 1 | 2022-06-21T07:31:4051.067485129276236-114.0266... | POINT (-114.02668672232672 51.067485129276236) |
1 | 11 Avenue and 4 Street SW | Traffic incident. Blocking multiple lanes | 2022/06/21 04:02:11 AM | 2022/06/21 04:12:38 AM | SW | -114.071481 | 51.042624 | 1 | 2022-06-21T04:02:1151.04262449261462-114.07148... | POINT (-114.07148057660925 51.04262449261462) |
2 | 68 Street and Memorial Drive E | Traffic incident. | 2022/06/20 11:53:08 PM | 2022/06/20 11:55:42 PM | NE | -113.935553 | 51.052474 | 1 | 2022-06-20T23:53:0851.0524735056658-113.935553... | POINT (-113.935553325751 51.0524735056658) |
3 | Eastbound 16 Avenue and 36 Street NE | Traffic incident. Blocking the left shoulder | 2022/06/20 04:43:21 PM | 2022/06/20 05:17:05 PM | NE | -113.989219 | 51.067086 | 1 | 2022-06-20T16:43:2151.06708565896752-113.98921... | POINT (-113.98921905311566 51.06708565896752) |
4 | Barlow Trail and 61 Avenue SE | Traffic incident. | 2022/06/20 04:42:12 PM | 2022/06/20 05:28:21 PM | SE | -113.985727 | 50.998727 | 1 | 2022-06-20T16:42:1250.99872748477766-113.98572... | POINT (-113.98572655353505 50.99872748477766) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
39581 | Aero Gate and 11 Street NE | Traffic incident. | 2023/05/17 07:22:44 AM | 2023/05/17 09:00:43 AM | NE | -114.039109 | 51.120601 | 1 | 2023-05-17T07:22:4451.12060124821737-114.03910... | POINT (-114.03910908721207 51.12060124821737) |
39582 | Eastbound Glenmore Trail and Crowchild Trail SW | Multi-vehicle incident. Blocking the left lane | 2023/05/17 08:59:09 AM | 2023/05/17 09:00:43 AM | SW | -114.122814 | 51.001318 | 1 | 2023-05-17T08:59:0951.00131795970452-114.12281... | POINT (-114.12281385934625 51.00131795970452) |
39583 | 17 Avenue and 36 Street SE | NB 36 St closed at 17 Ave. WB 17 Ave closed at... | 2023/05/17 04:18:57 AM | 2023/05/17 09:11:54 AM | SE | -113.981546 | 51.038037 | 1 | 2023-05-17T04:18:5751.03803711297289-113.98154... | POINT (-113.9815464139313 51.03803711297289) |
39584 | Northbound Crowchild Trail approaching Kensin... | Stalled vehicle. Blocking the middle lane | 2023/05/17 01:12:03 PM | 2023/05/17 01:13:47 PM | NW | -114.118501 | 51.052492 | 1 | 2023-05-17T13:12:0351.05249233303009-114.11850... | POINT (-114.11850138362924 51.05249233303009) |
39585 | Northbound Falconridge Boulevard and Falworth... | Traffic incident. | 2023/05/17 04:58:17 PM | 2023/05/17 05:40:31 PM | NE | -113.956155 | 51.102931 | 1 | 2023-05-17T16:58:1751.10293072063497-113.95615... | POINT (-113.9561550734143 51.10293072063497) |
39586 rows × 10 columns
The initial dataset contains 39,586 rows of data. However, approximately 35.4% of these rows do not have quadrant data, which is a crucial attribute for analyzing incident patterns across different areas of Calgary. Despite this missing information, it is observed that most of the incident descriptions contain indirect indications of the quadrant in which they occurred. This valuable information allows us to fill the gaps in the quadrant data and retain these rows for analysis, ensuring that important incident data is not lost.
# Infer a missing quadrant from the quadrant code in the incident address.
for i, row in df2.iterrows():
    if pd.isna(row['QUADRANT']):
        if 'NE' in row['INCIDENT INFO']:
            df2.at[i, 'QUADRANT'] = 'NE'
        elif 'NW' in row['INCIDENT INFO']:
            df2.at[i, 'QUADRANT'] = 'NW'
        elif 'SW' in row['INCIDENT INFO']:
            df2.at[i, 'QUADRANT'] = 'SW'
        elif 'SE' in row['INCIDENT INFO']:
            df2.at[i, 'QUADRANT'] = 'SE'
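The same fill can be written without an explicit loop. The sketch below is a vectorized variant on a made-up three-row frame. The regular expression, which only accepts a quadrant code at the very end of the address, is an assumption and is stricter than the substring test used in the loop.

```python
import pandas as pd

# Toy rows (made up) mimicking the INCIDENT INFO / QUADRANT columns.
toy = pd.DataFrame({
    "INCIDENT INFO": [
        "11 Avenue and 4 Street SW",
        "Aero Gate and 11 Street NE",
        "Deerfoot Trail and Glenmore Trail",
    ],
    "QUADRANT": [None, "NE", None],
})

# Assumption: the quadrant code is the last token of the address.
inferred = toy["INCIDENT INFO"].str.extract(r"\b(NE|NW|SE|SW)\s*$", expand=False)

# Keep existing values and fill only the gaps.
toy["QUADRANT"] = toy["QUADRANT"].fillna(inferred)
print(toy["QUADRANT"].tolist())
```

Rows whose address carries no quadrant code stay missing, which matches the rows that remain unfilled after the loop.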
df2[df2.isna().any(axis=1)].isnull().sum()
INCIDENT INFO 0
DESCRIPTION 2
START_DT 0
MODIFIED_DT 14057
QUADRANT 301
Longitude 0
Latitude 0
Count 0
id 0
Point 0
dtype: int64
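Absolute counts like these are easier to judge as shares of the whole table. For instance, 14,057 missing MODIFIED_DT values out of 39,586 rows is about 35.5%. A minimal sketch of that calculation on a made-up frame (the toy values are not from the dataset):

```python
import pandas as pd

# Toy frame (made up) with the same kind of gaps as the real data.
toy = pd.DataFrame({
    "DESCRIPTION": ["Traffic incident.", None, "Stalled vehicle.", "Traffic incident."],
    "MODIFIED_DT": [None, None, "2022/06/21 07:33:16 AM", None],
    "QUADRANT": ["NE", "SW", None, "SE"],
})

# isna().mean() gives the fraction of missing values per column.
missing_pct = (toy.isna().mean() * 100).round(1)
print(missing_pct)
```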
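# Copy the filtered frame so that later column assignments do not raise SettingWithCopyWarning.
df = df2.dropna(subset=['DESCRIPTION']).copy()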
df
| | INCIDENT INFO | DESCRIPTION | START_DT | MODIFIED_DT | QUADRANT | Longitude | Latitude | Count | id | Point |
---|---|---|---|---|---|---|---|---|---|---|
0 | Westbound 16 Avenue at Deerfoot Trail NE | Stalled vehicle. Partially blocking the right... | 2022/06/21 07:31:40 AM | 2022/06/21 07:33:16 AM | NE | -114.026687 | 51.067485 | 1 | 2022-06-21T07:31:4051.067485129276236-114.0266... | POINT (-114.02668672232672 51.067485129276236) |
1 | 11 Avenue and 4 Street SW | Traffic incident. Blocking multiple lanes | 2022/06/21 04:02:11 AM | 2022/06/21 04:12:38 AM | SW | -114.071481 | 51.042624 | 1 | 2022-06-21T04:02:1151.04262449261462-114.07148... | POINT (-114.07148057660925 51.04262449261462) |
2 | 68 Street and Memorial Drive E | Traffic incident. | 2022/06/20 11:53:08 PM | 2022/06/20 11:55:42 PM | NE | -113.935553 | 51.052474 | 1 | 2022-06-20T23:53:0851.0524735056658-113.935553... | POINT (-113.935553325751 51.0524735056658) |
3 | Eastbound 16 Avenue and 36 Street NE | Traffic incident. Blocking the left shoulder | 2022/06/20 04:43:21 PM | 2022/06/20 05:17:05 PM | NE | -113.989219 | 51.067086 | 1 | 2022-06-20T16:43:2151.06708565896752-113.98921... | POINT (-113.98921905311566 51.06708565896752) |
4 | Barlow Trail and 61 Avenue SE | Traffic incident. | 2022/06/20 04:42:12 PM | 2022/06/20 05:28:21 PM | SE | -113.985727 | 50.998727 | 1 | 2022-06-20T16:42:1250.99872748477766-113.98572... | POINT (-113.98572655353505 50.99872748477766) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
39581 | Aero Gate and 11 Street NE | Traffic incident. | 2023/05/17 07:22:44 AM | 2023/05/17 09:00:43 AM | NE | -114.039109 | 51.120601 | 1 | 2023-05-17T07:22:4451.12060124821737-114.03910... | POINT (-114.03910908721207 51.12060124821737) |
39582 | Eastbound Glenmore Trail and Crowchild Trail SW | Multi-vehicle incident. Blocking the left lane | 2023/05/17 08:59:09 AM | 2023/05/17 09:00:43 AM | SW | -114.122814 | 51.001318 | 1 | 2023-05-17T08:59:0951.00131795970452-114.12281... | POINT (-114.12281385934625 51.00131795970452) |
39583 | 17 Avenue and 36 Street SE | NB 36 St closed at 17 Ave. WB 17 Ave closed at... | 2023/05/17 04:18:57 AM | 2023/05/17 09:11:54 AM | SE | -113.981546 | 51.038037 | 1 | 2023-05-17T04:18:5751.03803711297289-113.98154... | POINT (-113.9815464139313 51.03803711297289) |
39584 | Northbound Crowchild Trail approaching Kensin... | Stalled vehicle. Blocking the middle lane | 2023/05/17 01:12:03 PM | 2023/05/17 01:13:47 PM | NW | -114.118501 | 51.052492 | 1 | 2023-05-17T13:12:0351.05249233303009-114.11850... | POINT (-114.11850138362924 51.05249233303009) |
39585 | Northbound Falconridge Boulevard and Falworth... | Traffic incident. | 2023/05/17 04:58:17 PM | 2023/05/17 05:40:31 PM | NE | -113.956155 | 51.102931 | 1 | 2023-05-17T16:58:1751.10293072063497-113.95615... | POINT (-113.9561550734143 51.10293072063497) |
39584 rows × 10 columns
Further investigation reveals that the gaps in quadrant data and the MODIFIED_DT column exist from March 2019 until October 2021. Notably, during the period from June 14, 2019, to September 15, 2019, there is only one row of data available. This fact suggests the possibility of data entry rule changes during that period or a potential incident that led to the loss of data. It is essential to keep this information in mind during the analysis of the results to ensure accurate interpretation and avoid any potential biases introduced by these gaps.
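One way to locate such gaps is to count incidents per calendar month and look for months with no rows. The sketch below does this on made-up timestamps written in the same text format as START_DT; the variable names are illustrative.

```python
import pandas as pd

# Toy timestamps (made up) in the same text format as START_DT.
toy = pd.DataFrame({"START_DT": [
    "2019/05/02 08:15:00 AM",
    "2019/05/20 05:40:00 PM",
    "2019/06/03 07:05:00 AM",
    "2019/09/18 04:30:00 PM",
]})

# Parse with an explicit format; %I and %p handle the 12-hour clock.
start = pd.to_datetime(toy["START_DT"], format="%Y/%m/%d %I:%M:%S %p")

# Count incidents per calendar month, then add the months that have no rows.
monthly = start.dt.to_period("M").value_counts().sort_index()
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
monthly = monthly.reindex(full_range, fill_value=0)

empty_months = [str(p) for p in monthly[monthly == 0].index]
print(empty_months)
```

Applied to the real START_DT column, the same monthly series would also show months with unusually few rows, not only empty ones.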
Additionally, the initial dataset includes a column named DESCRIPTION, which provides valuable insights into the types of incidents that occur in different locations. Parsing and analyzing this column can provide a better understanding of the specific types of incidents that take place across various areas of Calgary, enabling a more comprehensive analysis of the dataset.
#display(pd.DataFrame(df2['DESCRIPTION'].unique()))
#df['col'].str.contains('partial_string').any()
df.loc[df['DESCRIPTION'].str.contains('Traffic incident'), 'itype'] = 'Traffic incident'
df.loc[df['DESCRIPTION'].str.contains('Stalled vehicle') | df['DESCRIPTION'].str.contains('stalled vehicle'), 'itype'] = 'Stalled vehicle'
df.loc[df['DESCRIPTION'].str.contains('Single vehicle incident'), 'itype'] = 'Single vehicle incident'
df.loc[df['DESCRIPTION'].str.contains('Two vehicle incident') | df['DESCRIPTION'].str.contains('2 vehicle incident'), 'itype'] = 'Two vehicle incident'
df.loc[(df['DESCRIPTION'].str.contains('Multi-vehicle incident')) | (df['DESCRIPTION'].str.contains('Multi vehicle incident')), 'itype'] = 'Multi-vehicle incident'
df.loc[(df['DESCRIPTION'].str.contains('incident involving a cyclist')|(df['DESCRIPTION'].str.contains('bicycle'))), 'itype'] = 'Incident involving a cyclist'
df.loc[(df['DESCRIPTION'].str.contains('Traffic signal')) | (df['DESCRIPTION'].str.contains('Traffic light'))
| (df['DESCRIPTION'].str.contains('traffic signal')) | (df['DESCRIPTION'].str.contains('traffic Signal'))
| (df['DESCRIPTION'].str.contains('traffic light')) | (df['DESCRIPTION'].str.contains('light'))
| (df['DESCRIPTION'].str.contains('Signal light')) | (df['DESCRIPTION'].str.contains('Light stuck'))
| (df['DESCRIPTION'].str.contains('Light on flash') | (df['DESCRIPTION'].str.contains('signals'))), 'itype'] = 'Traffic lights incident'
df.loc[(df['DESCRIPTION'].str.contains('Slow traffic')) | (df['DESCRIPTION'].str.contains('slow moving'))
| (df['DESCRIPTION'].str.contains('Please go slow')), 'itype'] = 'Slow traffic'
df.loc[(df['DESCRIPTION'].str.contains('Blocking') |df['DESCRIPTION'].str.contains('blocked')| df['DESCRIPTION'].str.contains('blocking'))
& (df['DESCRIPTION'].str.contains('lane') | df['DESCRIPTION'].str.contains('shoulder')
| df['DESCRIPTION'].str.contains('Traffic') | df['DESCRIPTION'].str.contains('traffic')), 'itype'] = 'Blocked lane/shoulder/ramp'
df.loc[(df['DESCRIPTION'].str.contains('closed')) | (df['DESCRIPTION'].str.contains('closures')) | (df['DESCRIPTION'].str.contains('ramp'))
| (df['DESCRIPTION'].str.contains('disruption')), 'itype'] = 'Road closed'
df.loc[(df['DESCRIPTION'].str.contains('fire')), 'itype'] = 'Fire incident'
df.loc[(df['DESCRIPTION'].str.contains('Police')) | (df['DESCRIPTION'].str.contains('police')), 'itype'] = 'Police incident'
df.loc[(df['DESCRIPTION'].str.contains('LRT')) | (df['DESCRIPTION'].str.contains('train')) | (df['DESCRIPTION'].str.contains('Train'))
| (df['DESCRIPTION'].str.contains('Railway')), 'itype'] = 'LRT/Railway incident'
df.loc[(df['DESCRIPTION'].str.contains('incident involving a pedestrian')) | (df['DESCRIPTION'].str.contains('pedestrian incident'))
| (df['DESCRIPTION'].str.contains('Incident involving a pedestrian')), 'itype'] = 'Incident involving a pedestrian'
df.loc[(df['DESCRIPTION'].str.contains('going incident')), 'itype'] = 'Ongoing incident'
# Anything that matched none of the rules above is labelled 'Other'.
df.loc[df['itype'].isna(), 'itype'] = 'Other'
df['itype'].value_counts()
Blocked lane/shoulder/ramp 13831
Two vehicle incident 8596
Traffic incident 7679
Single vehicle incident 2461
Multi-vehicle incident 1654
Traffic lights incident 1564
Road closed 1451
Incident involving a pedestrian 1359
Stalled vehicle 426
Incident involving a cyclist 203
Ongoing incident 132
Police incident 82
Other 68
LRT/Railway incident 51
Fire incident 19
Slow traffic 8
Name: itype, dtype: int64
df['itype'].count()
39584
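Because the keyword rules are applied one after another, a later match overwrites an earlier one, so the order of the rules determines the final label. The sketch below expresses that idea as an ordered rule table on made-up descriptions. It covers only a subset of the categories, and the case-insensitive patterns are illustrative assumptions, not the exact conditions used above.

```python
import pandas as pd

# Toy descriptions (made up) in the style of the DESCRIPTION column.
toy = pd.DataFrame({"DESCRIPTION": [
    "Traffic incident.",
    "Traffic incident. Blocking the left lane",
    "Two vehicle incident.",
    "Road closed for police investigation",
    "Debris on the road",
]})

# Ordered rules: a later match overwrites an earlier one.
rules = [
    ("Traffic incident", r"traffic incident"),
    ("Two vehicle incident", r"(?:two|2) vehicle incident"),
    ("Blocked lane/shoulder/ramp", r"block(?:ing|ed).*(?:lane|shoulder|traffic)"),
    ("Road closed", r"closed|closures"),
    ("Police incident", r"police"),
]

toy["itype"] = "Other"
for label, pattern in rules:
    mask = toy["DESCRIPTION"].str.contains(pattern, case=False, regex=True, na=False)
    toy.loc[mask, "itype"] = label

print(toy["itype"].tolist())
```

Keeping the rules in one list makes the precedence explicit: the fourth description ends up as a police incident only because that rule comes last.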
By addressing missing quadrant data, noting the gaps in the MODIFIED_DT column, and leveraging the information in the DESCRIPTION column, the data cleaning step ensures that the dataset is prepared for subsequent analysis, enabling accurate and meaningful insights into the Calgary Traffic Incidents data.
To gain a better understanding of the incident distribution across different neighborhoods in Calgary, an additional step was taken in the data processing pipeline. Since the initial dataset did not include the neighborhood information for each incident, the dataset was merged with the geo-spatial data of community boundaries from the City of Calgary's open data portal.
The merging process involved using the latitude and longitude coordinates provided in the initial dataset to connect each incident with its corresponding neighborhood. This was achieved by utilizing the sjoin function from the geopandas library, which performs a spatial join based on the geometry of each community boundary.
By merging the incident data with the community boundary data, each incident was assigned to the specific neighborhood in which it occurred. This integration allowed for a comprehensive analysis of the incident distribution across different neighborhoods within Calgary.
Following the merging process, the data was grouped by the neighborhood name and incident type. This grouping facilitated the aggregation of incidents, providing a count of each incident type within each neighborhood. As a result, a dataframe was obtained, containing the names of neighborhoods and the corresponding counts of each incident type.
This enriched dataframe with neighborhood information and incident counts enables a more detailed exploration of which communities in Calgary experience a higher frequency of incidents. This information can be valuable for understanding the safety concerns and traffic patterns specific to each neighborhood, aiding in targeted interventions and resource allocation to improve traffic management and enhance safety measures.
By leveraging the power of geospatial data and integrating it with the initial incident dataset, this step in the data processing pipeline contributes to a more comprehensive analysis of the incident distribution across neighborhoods in Calgary.
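To make the behavior of the spatial join concrete, here is a minimal sketch on made-up geometry: two square "communities" and three points, one of which lies outside both squares. All names and coordinates are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Two made-up square "communities" and three made-up incident points.
areas = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:4326",
)
points = gpd.GeoDataFrame(
    {"itype": ["Traffic incident", "Stalled vehicle", "Road closed"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# how="left" keeps every point; points outside all polygons get a missing name.
joined = gpd.sjoin(points, areas, how="left", predicate="within")
print(joined[["itype", "name"]])
```

With a left join, incidents that fall outside every boundary stay in the table with a missing name. A later groupby on the name drops those rows by default, so it is worth counting them before aggregating.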
census_file = './Census by Community 2019.geojson'
communities = gpd.read_file(census_file)
communities.index
RangeIndex(start=0, stop=306, step=1)
df_comm = communities[['name', 'geometry']]
df_comm.index
RangeIndex(start=0, stop=306, step=1)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude))
df_comm = df_comm.to_crs(epsg=4326)
gdf.crs = "EPSG:4326"
merged = gpd.sjoin(gdf, df_comm, how='left', predicate='within')
incident_counts = merged.groupby(['name', 'itype']).size()
incident_counts = incident_counts.reset_index(name='count')
incident_counts['count'].sum()
incident_counts
| | name | itype | count |
---|---|---|---|
0 | 01B | Blocked lane/shoulder/ramp | 13 |
1 | 01B | Multi-vehicle incident | 1 |
2 | 01B | Road closed | 2 |
3 | 01B | Single vehicle incident | 2 |
4 | 01B | Stalled vehicle | 2 |
... | ... | ... | ... |
2346 | WOODLANDS | Single vehicle incident | 1 |
2347 | WOODLANDS | Traffic incident | 4 |
2348 | WOODLANDS | Traffic lights incident | 4 |
2349 | WOODLANDS | Two vehicle incident | 7 |
2350 | YORKVILLE | Traffic incident | 2 |
2351 rows × 3 columns
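The long table above holds one row per community and incident type. A wide layout with one row per community can also be built in one step with pd.crosstab, as sketched here on made-up rows (community names are used only as labels; the counts are not real).

```python
import pandas as pd

# Toy join result (made up): one row per incident with its community and type.
toy = pd.DataFrame({
    "name": ["BELTLINE", "BELTLINE", "BELTLINE", "TEMPLE"],
    "itype": ["Traffic incident", "Traffic incident", "Road closed", "Traffic incident"],
})

# Long format: one row per (community, type) pair.
long_counts = toy.groupby(["name", "itype"]).size().reset_index(name="count")

# Wide format in a single step; missing combinations become integer zeros.
wide_counts = pd.crosstab(toy["name"], toy["itype"])
wide_counts["total"] = wide_counts.sum(axis=1)
print(wide_counts)
```

Unlike a pivot followed by fillna, the crosstab result keeps integer counts instead of floats.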
gdf
| | INCIDENT INFO | DESCRIPTION | START_DT | MODIFIED_DT | QUADRANT | Longitude | Latitude | Count | id | Point | itype | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Westbound 16 Avenue at Deerfoot Trail NE | Stalled vehicle. Partially blocking the right... | 2022/06/21 07:31:40 AM | 2022/06/21 07:33:16 AM | NE | -114.026687 | 51.067485 | 1 | 2022-06-21T07:31:4051.067485129276236-114.0266... | POINT (-114.02668672232672 51.067485129276236) | Blocked lane/shoulder/ramp | POINT (-114.02669 51.06749) |
1 | 11 Avenue and 4 Street SW | Traffic incident. Blocking multiple lanes | 2022/06/21 04:02:11 AM | 2022/06/21 04:12:38 AM | SW | -114.071481 | 51.042624 | 1 | 2022-06-21T04:02:1151.04262449261462-114.07148... | POINT (-114.07148057660925 51.04262449261462) | Blocked lane/shoulder/ramp | POINT (-114.07148 51.04262) |
2 | 68 Street and Memorial Drive E | Traffic incident. | 2022/06/20 11:53:08 PM | 2022/06/20 11:55:42 PM | NE | -113.935553 | 51.052474 | 1 | 2022-06-20T23:53:0851.0524735056658-113.935553... | POINT (-113.935553325751 51.0524735056658) | Traffic incident | POINT (-113.93555 51.05247) |
3 | Eastbound 16 Avenue and 36 Street NE | Traffic incident. Blocking the left shoulder | 2022/06/20 04:43:21 PM | 2022/06/20 05:17:05 PM | NE | -113.989219 | 51.067086 | 1 | 2022-06-20T16:43:2151.06708565896752-113.98921... | POINT (-113.98921905311566 51.06708565896752) | Blocked lane/shoulder/ramp | POINT (-113.98922 51.06709) |
4 | Barlow Trail and 61 Avenue SE | Traffic incident. | 2022/06/20 04:42:12 PM | 2022/06/20 05:28:21 PM | SE | -113.985727 | 50.998727 | 1 | 2022-06-20T16:42:1250.99872748477766-113.98572... | POINT (-113.98572655353505 50.99872748477766) | Traffic incident | POINT (-113.98573 50.99873) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
39581 | Aero Gate and 11 Street NE | Traffic incident. | 2023/05/17 07:22:44 AM | 2023/05/17 09:00:43 AM | NE | -114.039109 | 51.120601 | 1 | 2023-05-17T07:22:4451.12060124821737-114.03910... | POINT (-114.03910908721207 51.12060124821737) | Traffic incident | POINT (-114.03911 51.12060) |
39582 | Eastbound Glenmore Trail and Crowchild Trail SW | Multi-vehicle incident. Blocking the left lane | 2023/05/17 08:59:09 AM | 2023/05/17 09:00:43 AM | SW | -114.122814 | 51.001318 | 1 | 2023-05-17T08:59:0951.00131795970452-114.12281... | POINT (-114.12281385934625 51.00131795970452) | Blocked lane/shoulder/ramp | POINT (-114.12281 51.00132) |
39583 | 17 Avenue and 36 Street SE | NB 36 St closed at 17 Ave. WB 17 Ave closed at... | 2023/05/17 04:18:57 AM | 2023/05/17 09:11:54 AM | SE | -113.981546 | 51.038037 | 1 | 2023-05-17T04:18:5751.03803711297289-113.98154... | POINT (-113.9815464139313 51.03803711297289) | Road closed | POINT (-113.98155 51.03804) |
39584 | Northbound Crowchild Trail approaching Kensin... | Stalled vehicle. Blocking the middle lane | 2023/05/17 01:12:03 PM | 2023/05/17 01:13:47 PM | NW | -114.118501 | 51.052492 | 1 | 2023-05-17T13:12:0351.05249233303009-114.11850... | POINT (-114.11850138362924 51.05249233303009) | Blocked lane/shoulder/ramp | POINT (-114.11850 51.05249) |
39585 | Northbound Falconridge Boulevard and Falworth... | Traffic incident. | 2023/05/17 04:58:17 PM | 2023/05/17 05:40:31 PM | NE | -113.956155 | 51.102931 | 1 | 2023-05-17T16:58:1751.10293072063497-113.95615... | POINT (-113.9561550734143 51.10293072063497) | Traffic incident | POINT (-113.95616 51.10293) |
39584 rows × 12 columns
incident_counts_pivot = incident_counts.pivot(index=['name'], columns='itype', values='count')
incident_counts_pivot.reset_index(inplace=True)
incident_counts_pivot = incident_counts_pivot.fillna(0)
incident_counts_pivot['total'] = incident_counts_pivot.select_dtypes(include=['number']).sum(axis=1)
final_df = df_comm.merge(incident_counts_pivot, left_on='name', right_on='name')
final_df['total'].sum()
final_df.reset_index(inplace=True)
final_df
| | index | name | geometry | Blocked lane/shoulder/ramp | Fire incident | Incident involving a cyclist | Incident involving a pedestrian | LRT/Railway incident | Multi-vehicle incident | Ongoing incident | Other | Police incident | Road closed | Single vehicle incident | Slow traffic | Stalled vehicle | Traffic incident | Traffic lights incident | Two vehicle incident | total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | LEGACY | MULTIPOLYGON (((-114.02200 50.86308, -114.0213... | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 8.0 | 3.0 | 1.0 | 20.0 |
1 | 1 | HIGHLAND PARK | MULTIPOLYGON (((-114.06916 51.09565, -114.0667... | 14.0 | 0.0 | 0.0 | 4.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 4.0 | 8.0 | 0.0 | 0.0 | 34.0 | 5.0 | 42.0 | 114.0 |
2 | 2 | CORNERSTONE | MULTIPOLYGON (((-113.91840 51.17607, -113.9166... | 9.0 | 0.0 | 2.0 | 3.0 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 4.0 | 19.0 | 0.0 | 0.0 | 65.0 | 0.0 | 28.0 | 134.0 |
3 | 3 | MONTGOMERY | MULTIPOLYGON (((-114.16458 51.08145, -114.1644... | 33.0 | 0.0 | 4.0 | 16.0 | 0.0 | 6.0 | 6.0 | 0.0 | 1.0 | 4.0 | 12.0 | 0.0 | 0.0 | 76.0 | 10.0 | 75.0 | 243.0 |
4 | 4 | TEMPLE | MULTIPOLYGON (((-113.93513 51.09608, -113.9351... | 47.0 | 0.0 | 1.0 | 11.0 | 0.0 | 3.0 | 4.0 | 0.0 | 0.0 | 2.0 | 4.0 | 0.0 | 0.0 | 38.0 | 9.0 | 36.0 | 155.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
290 | 290 | 01H | MULTIPOLYGON (((-114.27030 51.10060, -114.2702... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 3.0 | 0.0 | 2.0 | 10.0 |
291 | 291 | HIDDEN VALLEY | MULTIPOLYGON (((-114.09474 51.15426, -114.0947... | 57.0 | 0.0 | 0.0 | 2.0 | 0.0 | 7.0 | 0.0 | 0.0 | 1.0 | 8.0 | 23.0 | 0.0 | 3.0 | 68.0 | 4.0 | 42.0 | 215.0 |
292 | 292 | RIVERBEND | MULTIPOLYGON (((-114.01581 50.98069, -114.0148... | 28.0 | 0.0 | 0.0 | 3.0 | 0.0 | 5.0 | 1.0 | 0.0 | 0.0 | 8.0 | 5.0 | 0.0 | 0.0 | 20.0 | 2.0 | 22.0 | 94.0 |
293 | 293 | RIDEAU PARK | MULTIPOLYGON (((-114.07153 51.02885, -114.0715... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
294 | 294 | FRANKLIN | MULTIPOLYGON (((-113.98171 51.06697, -113.9817... | 109.0 | 1.0 | 0.0 | 7.0 | 1.0 | 7.0 | 0.0 | 0.0 | 0.0 | 7.0 | 7.0 | 0.0 | 0.0 | 31.0 | 8.0 | 49.0 | 227.0 |
295 rows × 20 columns
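The boundary file contains 306 communities, while this table has 295 rows. That difference is consistent with the default inner join of merge, which drops communities that have no matched incident. If those communities should appear on the map with a count of zero, a left join is one option. A minimal sketch with made-up stand-ins for the two frames:

```python
import pandas as pd

# Toy stand-ins (made up) for the community table and the pivoted counts.
communities = pd.DataFrame({"name": ["A", "B", "C"]})
counts = pd.DataFrame({"name": ["A", "B"], "total": [5, 2]})

# Default merge is an inner join: community C disappears.
inner = communities.merge(counts, on="name")

# A left join keeps C; its missing total is then set to zero.
left = communities.merge(counts, on="name", how="left")
left["total"] = left["total"].fillna(0).astype(int)
print(left)
```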
#final_df[final_df['name']=='BURNS INDUSTRIAL']
The use of the Plotly library for visualization in this project offers several advantages in terms of interactive and visually appealing representations of the data. Plotly is a powerful data visualization library that provides a range of chart types and interactive features, making it well-suited for showcasing spatial and temporal data.
One of the key visualizations used is the choropleth_mapbox, which is ideal for displaying geographic data such as incident distribution across neighborhoods. This chart type uses a map layout and color-coding to represent the density or intensity of a particular variable (in this case, incident counts) within different areas or regions. By leveraging Plotly's choropleth_mapbox, it becomes possible to create a visually compelling and informative representation of the incident distribution across Calgary's neighborhoods.
The bar chart is another visualization utilized in this project. Bar charts are effective for displaying categorical data and comparing frequencies or counts across different categories. Plotly's bar chart functionality allows for interactive exploration of the data, providing users with the ability to zoom in, hover over bars for detailed information, and customize the visual appearance of the chart.
Density_mapbox is a visualization tool that offers a heatmap-like representation of density patterns across a geographical area. In the context of this project, density_mapbox can be used to visualize the intensity of incidents in different neighborhoods, highlighting areas with higher incident densities. This visualization allows for a quick understanding of areas that may require more attention in terms of traffic management and safety measures.
Lastly, bar_polar is a chart type that can be used for visualizing cyclic or periodic data, such as incidents occurring throughout the day or year. It provides a circular representation, allowing for the exploration of patterns and trends over time. Plotly's bar_polar enables the creation of interactive and customizable polar bar charts, offering an intuitive way to analyze and compare incident frequencies within different time periods.
By utilizing Plotly's diverse chart types, including choropleth_mapbox, bar, density_mapbox, and bar_polar, the visualization step in this project ensures that the analyzed data is presented in a visually engaging and informative manner. This facilitates a deeper understanding of incident patterns, time-related trends, and the spatial distribution of incidents in Calgary.
The first guiding question concerns how the number of incidents varies across Calgary's neighborhoods. Combining the incident data with the community boundaries shows which areas experience incidents more often and may therefore need more attention in traffic management and safety measures. At the quadrant level, the south-east of Calgary tends to have the highest number of incidents, which suggests that it experiences more traffic-related issues than the other quadrants.
final_df = final_df.to_crs(epsg=4326)
# Need to be careful with the options or you'll end up with just the base map
fig = px.choropleth_mapbox(final_df, geojson=final_df,
locations=final_df.index,
color="total",
color_continuous_scale=['white', 'red'],
center={"lat": 51.0486, "lon": -114.0708}, # Calgary
mapbox_style='open-street-map',
opacity=0.75,
zoom=10,
title = 'Calgary Traffic incidents map by neighbourhood',
hover_name='name')
fig.update_layout(margin={"r":50,"t":50,"l":50,"b":50},
autosize=True,
height=1000 )
fig.update_geos(showcountries=True, showcoastlines=True, showland=True, fitbounds="locations")
fig.write_image("fig1.png")
The incident types most prevalent in the south-east are blocked lanes or roads and multi-vehicle incidents, which points to congestion and traffic-flow challenges in that part of the city.
qs = gpd.read_file('City Quadrants.geojson')
qs = qs.to_crs(epsg=4326)
merged2 = gpd.sjoin(gdf, qs, how='left', predicate='within')
incident_counts2 = merged2.groupby(['QUADRANT', 'itype']).size()
incident_counts2 = incident_counts2.reset_index(name='count')
incident_counts2['count'].sum()
incident_counts2
| | QUADRANT | itype | count |
---|---|---|---|
0 | NE | Blocked lane/shoulder/ramp | 3993 |
1 | NE | Fire incident | 3 |
2 | NE | Incident involving a cyclist | 35 |
3 | NE | Incident involving a pedestrian | 275 |
4 | NE | LRT/Railway incident | 8 |
... | ... | ... | ... |
59 | SW | Slow traffic | 1 |
60 | SW | Stalled vehicle | 81 |
61 | SW | Traffic incident | 1672 |
62 | SW | Traffic lights incident | 416 |
63 | SW | Two vehicle incident | 1824 |
64 rows Ă— 3 columns
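As a hedged aside, the long-format counts above can be pivoted into a quadrant-by-type table to see the total per quadrant and which incident type dominates in each. The sketch below uses a small hypothetical frame called `sample`; its values are invented for illustration and are not taken from the Calgary data. Only the column names `QUADRANT`, `itype` and `count` follow the code above.

```python
import pandas as pd

# Hypothetical long-format counts; the numbers are illustrative only.
sample = pd.DataFrame({
    "QUADRANT": ["NE", "NE", "SE", "SE", "SW", "SW"],
    "itype": ["Two vehicle incident", "Stalled vehicle"] * 3,
    "count": [10, 2, 25, 4, 8, 3],
})

# One row per quadrant, one column per incident type.
wide = sample.pivot_table(index="QUADRANT", columns="itype",
                          values="count", aggfunc="sum", fill_value=0)

# Total incidents per quadrant and the most frequent type in each quadrant.
totals = wide.sum(axis=1)
dominant = wide.idxmax(axis=1)

print(totals.sort_values(ascending=False))
print(dominant)
```

Applied to `incident_counts2`, the same two lines would give the per-quadrant totals and leading incident types that the text describes, assuming the column names shown in the table.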
fig = px.bar(incident_counts2, x="QUADRANT", y="count", color="itype", title="Number of Traffic Incidents in each quadrant of Calgary")
#fig.show()
#fig.write_html("ct5_itypes.html")
fig.write_image("fig2.png")
Furthermore, the analysis reveals that several neighborhoods in the central part of Calgary have the highest number of incidents. These neighborhoods include Bridgeland/Riverside, Beltline, Downtown commercial core, Burns industrial, and Albert park/Radisson heights, among others. The concentration of incidents in these central neighborhoods may be attributed to their high population density, commercial activities, or transportation hubs, which can contribute to increased traffic volume and potential incidents.
Understanding the neighborhoods with the highest number of incidents is crucial for traffic management authorities and policymakers to prioritize resources and implement targeted interventions. By focusing on these areas, measures such as improved infrastructure, traffic flow management, and safety campaigns can be implemented to reduce incidents and enhance the overall safety of residents and commuters in Calgary.
#df5 = df[df['itype'] == 'Incident involving a pedestrian'] #Incident involving a cyclist
fig = px.bar(final_df.sort_values(by=['total'], ascending=False).iloc[:20], y='name', x='total',
color = 'total', orientation='h', text = 'total',
title = 'Top 20 neighborhoods by number of incidents', width=750, height=550)
fig.update_layout(barmode='stack', yaxis={'categoryorder': 'total ascending'})
fig.update_layout(margin=dict(t=40, b=0, l=0, r=0))
#fig.show()
fig.write_image("fig3.png")
#fig.write_html("top20.html")
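The ranking step can also be sketched with `DataFrame.nlargest`, together with the share of all incidents that falls in the top-ranked neighborhoods, which is one way to judge how concentrated the problem is. The frame `toy` and its names and totals are hypothetical; only the column names `name` and `total` mirror `final_df`.

```python
import pandas as pd

# Hypothetical per-neighborhood totals; names and numbers are made up.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "total": [120, 340, 95, 410, 210],
})

# Three neighborhoods with the most incidents.
top3 = toy.nlargest(3, "total")

# Fraction of all incidents that occur in those three neighborhoods.
share_top3 = top3["total"].sum() / toy["total"].sum()

print(top3[["name", "total"]])
print(f"Share of incidents in the top 3: {share_top3:.1%}")
```

A high share would support concentrating interventions on a few neighborhoods, while a low share would suggest the incidents are spread more evenly.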
The question aims to explore how the number of traffic incidents varies throughout different periods of the day. By analyzing the incident data in relation to the time of occurrence, insights can be gained into the temporal patterns and peak hours of incidents in Calgary.
Based on the findings, a timeline analysis reveals two distinct rush hours during which the most traffic incidents occur. The first rush hour happens in the morning at around 8 am, likely corresponding to the peak commuting time when individuals are traveling to work or school. The second rush hour occurs in the evening at around 5 pm, which aligns with the time when people typically finish work and begin their journey home. These periods of increased traffic activity and congestion contribute to a higher frequency of incidents during these specific hours.
import datetime

df['Datetime'] = pd.to_datetime(df['START_DT'])
df['Datetime'].min().time()
# Count the incidents of each type that start within each hour of the day
rows = []
itypes = pd.unique(df['itype'])
for hour in range(24):
    tmin = datetime.time(hour, 0, 0)
    tmax = datetime.time(hour, 59, 59)
    # Incidents whose start time falls inside the current hour
    dftemp = df.loc[((df['Datetime'].dt.time >= tmin)&(df['Datetime'].dt.time <= tmax))]
    for itype in itypes:
        count_itype = len(dftemp[dftemp['itype'] == itype])
        #print(tmin, itype, count_itype)
        rows.append({'Hour': hour, 'itype': itype, 'Count': count_itype})
# Build the result once instead of concatenating inside the loop
dft = pd.DataFrame(rows, columns = ['Hour', 'itype', 'Count'])
dft.head()
| | Hour | itype | Count |
|---|---|---|---|
| 0 | 0 | Blocked lane/shoulder/ramp | 137 |
| 1 | 0 | Traffic incident | 129 |
| 2 | 0 | Incident involving a pedestrian | 24 |
| 3 | 0 | Single vehicle incident | 87 |
| 4 | 0 | Road closed | 32 |
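The same hour-by-type table can probably be produced without an explicit double loop. The following is a minimal sketch, assuming the timestamps parse cleanly with `pd.to_datetime`; the `log` frame and its rows are hypothetical, and only the column names `START_DT` and `itype` follow the code above.

```python
import pandas as pd

# Hypothetical incident log; timestamps and types are illustrative only.
log = pd.DataFrame({
    "START_DT": ["2023-01-02 08:15:00", "2023-01-02 08:40:00",
                 "2023-01-02 17:05:00", "2023-01-03 08:55:00"],
    "itype": ["Traffic incident", "Stalled vehicle",
              "Traffic incident", "Traffic incident"],
})
log["Datetime"] = pd.to_datetime(log["START_DT"])
log["Hour"] = log["Datetime"].dt.hour.astype("int64")

# Count incidents per (hour, type) in a single pass.
counts = log.groupby(["Hour", "itype"]).size()

# Reindex over every hour/type combination so that empty cells appear as 0,
# which matches the shape produced by looping over all 24 hours.
full_index = pd.MultiIndex.from_product(
    [range(24), sorted(log["itype"].unique())], names=["Hour", "itype"])
hourly = counts.reindex(full_index, fill_value=0).reset_index(name="Count")

print(hourly[hourly["Count"] > 0])
```

Grouping on `dt.hour` also avoids the boundary comparison against `59:59`, so a timestamp with fractional seconds in the last second of an hour is still counted.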
# Copy the slices so that the dtype change below does not modify a view of dft
dft_morning = dft[dft['Hour']<12].copy()
dft_evening = dft[dft['Hour']>=12].copy()
dft_morning['Hour'] = dft_morning['Hour'].astype(str)
dft_evening['Hour'] = dft_evening['Hour'].astype(str)
Morning Rush Hour:
fig = px.bar_polar(dft_morning, r="Count", theta="Hour", color="itype",
title="Number of Traffic Incidents by hour",
color_discrete_sequence= px.colors.qualitative.Dark24,
range_r = [0, 4000],
log_r = False
)
fig.update_layout(
font=dict(size=14)
)
#fig.show()
#fig.write_html("morning_total.html")
fig.write_image("fig4.png")
Evening Rush Hour:
fig = px.bar_polar(dft_evening, r="Count", theta="Hour", color="itype",
color_discrete_sequence= px.colors.qualitative.Dark24,
title="Number of Traffic Incidents by hour",
range_r = [0, 4000],
log_r = False
)
fig.update_layout(
font=dict(size=14)
)
#fig.show()
#fig.write_html("evening_total.html")
fig.write_image("fig5.png")
Furthermore, the analysis indicates that incidents connected with blocked lanes or roads contribute significantly to the overall increase in the total number of incidents during rush hours. Blocked lanes or roads can result from accidents, construction work, or other factors that impede the normal flow of traffic. The prevalence of incidents related to blocked lanes or roads during rush hours suggests that traffic congestion and disruptions play a significant role in the occurrence of incidents during peak periods.
fig = px.area(dft, x="Hour", y="Count", color="itype")
fig.update_xaxes(range=[0, 24])
#fig.show()
#fig.write_html("day_total.html")
fig.write_image("fig6.png")
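One way to check the claim about blocked lanes is to look at each incident type's share of the hourly total rather than its raw count. The sketch below assumes a long-format table with the same columns as `dft`; the `toy` frame and its numbers are hypothetical.

```python
import pandas as pd

# Hypothetical hourly counts in the same long format as dft.
toy = pd.DataFrame({
    "Hour": [3, 3, 8, 8, 17, 17],
    "itype": ["Blocked lane/shoulder/ramp", "Traffic incident"] * 3,
    "Count": [5, 15, 60, 40, 90, 60],
})

# Total incidents in each hour, broadcast back to every row of that hour.
hour_total = toy.groupby("Hour")["Count"].transform("sum")
toy["Share"] = toy["Count"] / hour_total

# Share of blocked-lane incidents in each hour.
blocked = toy[toy["itype"] == "Blocked lane/shoulder/ramp"]
print(blocked[["Hour", "Count", "Share"]])
```

A share that rises at 8 am and 5 pm would support the reading that blocked lanes drive the rush-hour increase, whereas a flat share would indicate that all incident types rise together.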
The guiding question seeks to identify the locations and time periods where pedestrians and cyclists are most vulnerable to incidents. By filtering the incident data specifically for pedestrian and cyclist-related incidents, it becomes possible to gain insights into the areas and times that pose higher risks for these road users in Calgary.
df5 = df[df['itype'] == 'Incident involving a pedestrian'] # pedestrian incidents only
fig = px.density_mapbox(df5, lat='Latitude', lon='Longitude', z='Count', radius=5, zoom=10)
fig.update_layout(margin={"r":50,"t":50,"l":50,"b":50},
autosize=True,
height=800, width = 750,
mapbox_style="open-street-map")
#fig.show()
#fig.write_html("heatmap_pedestrains.html")
fig.write_image("fig7.png")
Based on the analysis, the top three worst neighborhoods for pedestrians are identified as Beltline, Downtown commercial core, and Forest Lawn. These neighborhoods have a higher frequency of incidents involving pedestrians, indicating that pedestrians in these areas are at a greater risk of being involved in accidents or facing safety concerns.
fig = px.bar(final_df.sort_values(by=['Incident involving a pedestrian'], ascending=False).iloc[:10], y='name', x='Incident involving a pedestrian',
color = 'Incident involving a pedestrian', orientation='h', text = 'Incident involving a pedestrian',
title = 'Top 10 neighborhoods dangerous for pedestrians', width=800, height=400)
fig.update_layout(barmode='stack', yaxis={'categoryorder': 'total ascending'})
fig.update_layout(margin=dict(t=40, b=0, l=0, r=0))
#fig.show()
#fig.write_html("pedestrians.html")
fig.write_image("fig8.png")
When examining the time periods, it is observed that the morning rush hour at 8 am and the time window from 2 pm to 6 pm are particularly dangerous for pedestrians. The increased incidents during these periods may be attributed to higher pedestrian activity during commute hours and school dismissal times, as well as potential congestion and traffic-related challenges in these neighborhoods.
dft_m2 = dft_morning[dft_morning['itype'] == 'Incident involving a pedestrian']
fig = px.bar_polar(dft_m2, r="Count", theta="Hour", color="itype",
color_discrete_sequence= px.colors.qualitative.Dark24,
title="Number of incidents with pedestrians by morning hour",
range_r = [0, 125],
log_r = False
)
fig.update_layout(
font=dict(size=14)
)
#fig.show()
#fig.write_html("morning_pedestrians.html")
fig.write_image("fig9.png")
dft_e2 = dft_evening[dft_evening['itype'] == 'Incident involving a pedestrian']
fig = px.bar_polar(dft_e2, r="Count", theta="Hour", color="itype",
color_discrete_sequence= px.colors.qualitative.Dark24,
title="Number of incidents with pedestrians by evening hour",
range_r = [0, 125],
log_r = False
)
fig.update_layout(
font=dict(size=14)
)
#fig.show()
#fig.write_html("evening_pedestrians.html")
fig.write_image("fig10.png")
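The peak hours read off the polar charts can also be extracted numerically. The helper below is a hypothetical function, `peak_hours`, that is not part of the original notebook; it assumes a long-format table with `Hour`, `itype` and `Count` columns like `dft`, and the sample numbers are invented.

```python
import pandas as pd

def peak_hours(hourly, itype, n=3):
    """Return the n hours with the most incidents of the given type."""
    subset = hourly[hourly["itype"] == itype]
    return subset.nlargest(n, "Count")[["Hour", "Count"]].reset_index(drop=True)

# Hypothetical hourly counts; the values are illustrative only.
toy = pd.DataFrame({
    "Hour": [7, 8, 12, 15, 17, 8],
    "itype": ["Incident involving a pedestrian"] * 5
             + ["Incident involving a cyclist"],
    "Count": [20, 55, 30, 48, 60, 4],
})

top = peak_hours(toy, "Incident involving a pedestrian", n=3)
print(top)
```

Calling the same helper with `'Incident involving a cyclist'` would give the corresponding ranking for cyclists.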
Similarly, the top three worst neighborhoods for cyclists are identified as Beltline, Downtown commercial core, and Sunnyside.
df5 = df[df['itype'] == 'Incident involving a cyclist'] #Incident involving a cyclist
fig = px.density_mapbox(df5, lat='Latitude', lon='Longitude', z='Count', radius=5, zoom=10)
fig.update_layout(margin={"r":50,"t":50,"l":50,"b":50},
autosize=True,
height=800, width = 750,
mapbox_style="open-street-map")
#fig.show()
#fig.write_html("heatmap_cyclists.html")
fig.write_image("fig11.png")
These areas have a higher incidence of cyclist-related incidents, highlighting the risks faced by cyclists in these locations.
#df5 = df[df['itype'] == 'Incident involving a pedestrian'] #Incident involving a cyclist
fig = px.bar(final_df.sort_values(by=['Incident involving a cyclist'], ascending=False).iloc[:10], y='name', x='Incident involving a cyclist',
color = 'Incident involving a cyclist', orientation='h', text = 'Incident involving a cyclist',
title = 'Top 10 neighborhoods dangerous for cyclists', width=800, height=400)
fig.update_layout(barmode='stack', yaxis={'categoryorder': 'total ascending'})
fig.update_layout(margin=dict(t=40, b=0, l=0, r=0))
#fig.show()
#fig.write_html("cyclists.html")
fig.write_image("fig12.png")
For cyclists, the analysis indicates that the window from 4 pm to 5 pm and the period around 7 pm are particularly dangerous. These time periods may coincide with peak traffic hours and increased cyclist presence on the road, potentially leading to a higher risk of incidents.
dft_m2 = dft_morning[dft_morning['itype'] == 'Incident involving a cyclist']
fig = px.bar_polar(dft_m2, r="Count", theta="Hour", color="itype",
color_discrete_sequence= px.colors.qualitative.Dark24,
title="Number of incidents with cyclists by morning hour",
range_r = [0, 25],
log_r = False
)
fig.update_layout(
font=dict(size=14)
)
#fig.show()
#fig.write_html("morning_cyclists.html")
fig.write_image("fig13.png")
dft_e2 = dft_evening[dft_evening['itype'] == 'Incident involving a cyclist']
fig = px.bar_polar(dft_e2, r="Count", theta="Hour", color="itype",
color_discrete_sequence= px.colors.qualitative.Dark24,
title="Number of incidents with cyclists by evening hour",
range_r = [0, 25],
log_r = False
)
fig.update_layout(
font=dict(size=14)
)
#fig.show()
#fig.write_html("evening_cyclists.html")
fig.write_image("fig14.png")
The findings emphasize the importance of focusing on these specific neighborhoods and time periods to implement targeted safety measures and interventions. Enhancements such as improved infrastructure, designated cycling lanes, traffic management strategies, and educational campaigns can help mitigate risks for pedestrians and cyclists in these areas, ultimately promoting safer and more accessible transportation options.
Overall, the findings highlight the importance of analyzing incident frequency based on neighborhoods, as it allows for a better understanding of localized traffic challenges and helps guide decision-making processes to improve traffic safety and management.
Understanding the temporal patterns of traffic incidents is vital for effective traffic management and the implementation of strategies to reduce incidents during peak hours. By focusing on targeted interventions such as improving traffic flow, managing construction schedules, and enhancing public transportation options, authorities can mitigate congestion and decrease the occurrence of incidents, ultimately improving road safety and minimizing travel disruptions.
The analysis of the frequency of traffic incidents based on the time of day highlights the presence of two rush hours, namely in the morning around 8 am and in the evening around 5 pm, during which the number of incidents is notably higher. Additionally, incidents related to blocked lanes or roads contribute significantly to the overall increase in incidents during these peak periods. These findings emphasize the importance of implementing measures to manage traffic flow and address congestion issues during these specific timeframes to enhance road safety and optimize transportation systems in Calgary.
By understanding the most dangerous areas and times for pedestrians and cyclists in Calgary, authorities and policymakers can prioritize resources, implement proactive safety measures, and raise awareness to ensure the well-being and protection of vulnerable road users.