Using this Python notebook I will:
- Explore 3 Chicago datasets
- Load the 3 datasets into 3 tables in a Db2 database
- Execute SQL queries
To complete this notebook I will be using three datasets that are available on the city of Chicago's Data Portal:
This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.
For this notebook I will use a snapshot of this dataset which can be downloaded from: Census Data
A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2
This dataset shows all school level performance data used to create CPS School Report Cards for the 2011-2012 school year. This dataset is provided by the city of Chicago's Data Portal.
For this notebook I will use a snapshot of this dataset which can be downloaded from: Chicago Public School
A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.
This dataset is quite large - over 1.5GB in size with over 6.5 million rows. For the purposes of this notebook I will use a much smaller sample of this dataset which can be downloaded from: Chicago Crime Data
A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
.CSV files:
-
CENSUS_DATA: Census Dataset
-
CHICAGO_PUBLIC_SCHOOLS Chicago Public School
-
CHICAGO_CRIME_DATA: Chicago Crime Data
To analyze the data using SQL, it first needs to be stored in the database.
While it is easier to read the dataset into a Pandas dataframe and then PERSIST it into the database, its results in mapping to default datatypes which may not be optimal for SQL querying. For example a long textual field may map to a CLOB instead of a VARCHAR.
I loaded to the datsets into the IBM DB2 cloud console.
I first load the SQL extension and establish a connection with the database
%load_ext sql
Then in the next cell, enter the db2 connection string.
# Connection string is in the format:
# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name
# Entered the connection string for the Db2 on Cloud database instance below
%sql ibm_db_sa://fwh43441:nc7qjqmk-czlqdbv@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
'Connected: fwh43441@BLUDB'
# Rows in Crime table
%sql Select Count(*) FROM CHICAGO_CRIME_DATA;
* ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
1 |
---|
533 |
%sql Select * from CHICAGO_CRIME_DATA LIMIT 10;
* ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
id | case_number | DATE | block | iucr | primary_type | description | location_description | arrest | domestic | beat | district | ward | community_area_number | fbicode | x_coordinate | y_coordinate | YEAR | updatedon | latitude | longitude | location |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3512276 | HK587712 | 2004-08-28 17:50:56 | 047XX S KEDZIE AVE | 890 | THEFT | FROM BUILDING | SMALL RETAIL STORE | FALSE | FALSE | 911 | 9 | 14 | 58 | 6 | 1155838 | 1873050 | 2004 | 2018-02-10 15:50:01 | 41.80744050 | -87.70395585 | (41.8074405, -87.703955849) |
3406613 | HK456306 | 2004-06-26 12:40:00 | 009XX N CENTRAL PARK AVE | 820 | THEFT | $500 AND UNDER | OTHER | FALSE | FALSE | 1112 | 11 | 27 | 23 | 6 | 1152206 | 1906127 | 2004 | 2018-02-28 15:56:25 | 41.89827996 | -87.71640551 | (41.898279962, -87.716405505) |
8002131 | HT233595 | 2011-04-04 05:45:00 | 043XX S WABASH AVE | 820 | THEFT | $500 AND UNDER | NURSING HOME/RETIREMENT HOME | FALSE | FALSE | 221 | 2 | 3 | 38 | 6 | 1177436 | 1876313 | 2011 | 2018-02-10 15:50:01 | 41.81593313 | -87.62464213 | (41.815933131, -87.624642127) |
7903289 | HT133522 | 2010-12-30 16:30:00 | 083XX S KINGSTON AVE | 840 | THEFT | FINANCIAL ID THEFT: OVER $300 | RESIDENCE | FALSE | FALSE | 423 | 4 | 7 | 46 | 6 | 1194622 | 1850125 | 2010 | 2018-02-10 15:50:01 | 41.74366532 | -87.56246276 | (41.743665322, -87.562462756) |
10402076 | HZ138551 | 2016-02-02 19:30:00 | 033XX W 66TH ST | 820 | THEFT | $500 AND UNDER | ALLEY | FALSE | FALSE | 831 | 8 | 15 | 66 | 6 | 1155240 | 1860661 | 2016 | 2018-02-10 15:50:01 | 41.77345530 | -87.70648047 | (41.773455295, -87.706480471) |
7732712 | HS540106 | 2010-09-29 07:59:00 | 006XX W CHICAGO AVE | 810 | THEFT | OVER $500 | PARKING LOT/GARAGE(NON.RESID.) | FALSE | FALSE | 1323 | 12 | 27 | 24 | 6 | 1171668 | 1905607 | 2010 | 2018-02-10 15:50:01 | 41.89644677 | -87.64493868 | (41.896446772, -87.644938678) |
10769475 | HZ534771 | 2016-11-30 01:15:00 | 050XX N KEDZIE AVE | 810 | THEFT | OVER $500 | STREET | FALSE | FALSE | 1713 | 17 | 33 | 14 | 6 | 1154133 | 1933314 | 2016 | 2018-02-10 15:50:01 | 41.97284491 | -87.70860008 | (41.972844913, -87.708600079) |
4494340 | HL793243 | 2005-12-16 16:45:00 | 005XX E PERSHING RD | 860 | THEFT | RETAIL THEFT | GROCERY FOOD STORE | TRUE | FALSE | 213 | 2 | 3 | 38 | 6 | 1180448 | 1879234 | 2005 | 2018-02-28 15:56:25 | 41.82387989 | -87.61350386 | (41.823879885, -87.613503857) |
3778925 | HL149610 | 2005-01-28 17:00:00 | 100XX S WASHTENAW AVE | 810 | THEFT | OVER $500 | STREET | FALSE | FALSE | 2211 | 22 | 19 | 72 | 6 | 1160129 | 1838040 | 2005 | 2018-02-28 15:56:25 | 41.71128051 | -87.68917910 | (41.711280513, -87.689179097) |
3324217 | HK361551 | 2004-05-13 14:15:00 | 033XX W BELMONT AVE | 820 | THEFT | $500 AND UNDER | SMALL RETAIL STORE | FALSE | FALSE | 1733 | 17 | 35 | 21 | 6 | 1153590 | 1921084 | 2004 | 2018-02-28 15:56:25 | 41.93929582 | -87.71092344 | (41.939295821, -87.710923442) |
%sql Select COUNT(*) from CHICAGO_CRIME_DATA Where arrest = "TRUE";
* ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
1 |
---|
163 |
%sql Select distinct primary_type from CHICAGO_CRIME_DATA where location_description = 'GAS STATION';
* ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
primary_type |
---|
CRIMINAL TRESPASS |
NARCOTICS |
ROBBERY |
THEFT |
%sql Select COMMUNITY_AREA_NAME from CENSUS where COMMUNITY_AREA_NAME like'B%';
* ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
community_area_name |
---|
Belmont Cragin |
Burnside |
Brighton Park |
Bridgeport |
Beverly |
%sql select name_of_school from SCHOOLS where community_area_number >= 10 and community_area_number <=15 and healthy_school_certified = 'Yes';
* ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
name_of_school |
---|
Rufus M Hitch Elementary School |
%sql Select AVG(SAFETY_SCORE)as AVG_SAFETY_SCORE FROM SCHOOLS;
* ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
avg_safety_score |
---|
49.504873 |
%sql select community_area_name as comm_area_name, avg(college_enrollment) as avg_enroll \
from SCHOOLS group by community_area_name \
order by avg_enroll desc limit 5;
* ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
comm_area_name | avg_enroll |
---|---|
ARCHER HEIGHTS | 2411.500000 |
MONTCLARE | 1317.000000 |
WEST ELSDON | 1233.333333 |
BRIGHTON PARK | 1205.875000 |
BELMONT CRAGIN | 1198.833333 |
%sql select COMMUNITY_AREA_NAME,SAFETY_SCORE from SCHOOLS \
WHERE SAFETY_SCORE = (SELECT MIN(SAFETY_SCORE) FROM SCHOOLS);
* ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
community_area_name | safety_score |
---|---|
WASHINGTON PARK | 1 |
Finding the Per Capita Income of the Community Area which has a school Safety Score of 1.
%sql select COMMUNITY_AREA_NAME,per_capita_income \
from CENSUS \
where community_area_number = (select community_area_number from SCHOOLS where safety_score = 1);
UsageError: Line magic function `%sql` not found.