Skip to content

WhitSade/Exploring-Chicago-Census-Datasets

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

Exploring Chicago Census Datastes

Introduction

Using this Python notebook I will:

  1. Explore 3 Chicago datasets
  2. Load the 3 datasets into 3 tables in a Db2 database
  3. Execute SQL queries

Understanding the datasets

To complete this notebook I will be using three datasets that are available on the city of Chicago's Data Portal:

  1. Socioeconomic Indicators in Chicago
  2. Chicago Public Schools
  3. Chicago Crime Data

1. Socioeconomic Indicators in Chicago

This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.

For this notebook I will use a snapshot of this dataset which can be downloaded from: Census Data

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2

2. Chicago Public Schools

This dataset shows all school level performance data used to create CPS School Report Cards for the 2011-2012 school year. This dataset is provided by the city of Chicago's Data Portal.

For this notebook I will use a snapshot of this dataset which can be downloaded from: Chicago Public School

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t

3. Chicago Crime Data

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.

This dataset is quite large - over 1.5GB in size with over 6.5 million rows. For the purposes of this notebook I will use a much smaller sample of this dataset which can be downloaded from: Chicago Crime Data

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2

Downloading the datasets

.CSV files:

  1. CENSUS_DATA: Census Dataset

  2. CHICAGO_PUBLIC_SCHOOLS Chicago Public School

  3. CHICAGO_CRIME_DATA: Chicago Crime Data

Storing the datasets in database tables

To analyze the data using SQL, it first needs to be stored in the database.

While it is easier to read the dataset into a Pandas dataframe and then PERSIST it into the database, its results in mapping to default datatypes which may not be optimal for SQL querying. For example a long textual field may map to a CLOB instead of a VARCHAR.

I loaded to the datsets into the IBM DB2 cloud console.

Connecting to the database

I first load the SQL extension and establish a connection with the database

%load_ext sql

Then in the next cell, enter the db2 connection string.

# Connection string is in the format:
# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name
# Entered the connection string for the Db2 on Cloud database instance below
%sql ibm_db_sa://fwh43441:nc7qjqmk-czlqdbv@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
'Connected: fwh43441@BLUDB'

Exploring the Dataset

Finding the total number of crimes recorded in the CRIME table
# Rows in Crime table
%sql Select Count(*) FROM CHICAGO_CRIME_DATA;
 * ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
1
533
Retrieving the first 10 rows from the CRIME table
%sql Select * from CHICAGO_CRIME_DATA LIMIT 10;
 * ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
id case_number DATE block iucr primary_type description location_description arrest domestic beat district ward community_area_number fbicode x_coordinate y_coordinate YEAR updatedon latitude longitude location
3512276 HK587712 2004-08-28 17:50:56 047XX S KEDZIE AVE 890 THEFT FROM BUILDING SMALL RETAIL STORE FALSE FALSE 911 9 14 58 6 1155838 1873050 2004 2018-02-10 15:50:01 41.80744050 -87.70395585 (41.8074405, -87.703955849)
3406613 HK456306 2004-06-26 12:40:00 009XX N CENTRAL PARK AVE 820 THEFT $500 AND UNDER OTHER FALSE FALSE 1112 11 27 23 6 1152206 1906127 2004 2018-02-28 15:56:25 41.89827996 -87.71640551 (41.898279962, -87.716405505)
8002131 HT233595 2011-04-04 05:45:00 043XX S WABASH AVE 820 THEFT $500 AND UNDER NURSING HOME/RETIREMENT HOME FALSE FALSE 221 2 3 38 6 1177436 1876313 2011 2018-02-10 15:50:01 41.81593313 -87.62464213 (41.815933131, -87.624642127)
7903289 HT133522 2010-12-30 16:30:00 083XX S KINGSTON AVE 840 THEFT FINANCIAL ID THEFT: OVER $300 RESIDENCE FALSE FALSE 423 4 7 46 6 1194622 1850125 2010 2018-02-10 15:50:01 41.74366532 -87.56246276 (41.743665322, -87.562462756)
10402076 HZ138551 2016-02-02 19:30:00 033XX W 66TH ST 820 THEFT $500 AND UNDER ALLEY FALSE FALSE 831 8 15 66 6 1155240 1860661 2016 2018-02-10 15:50:01 41.77345530 -87.70648047 (41.773455295, -87.706480471)
7732712 HS540106 2010-09-29 07:59:00 006XX W CHICAGO AVE 810 THEFT OVER $500 PARKING LOT/GARAGE(NON.RESID.) FALSE FALSE 1323 12 27 24 6 1171668 1905607 2010 2018-02-10 15:50:01 41.89644677 -87.64493868 (41.896446772, -87.644938678)
10769475 HZ534771 2016-11-30 01:15:00 050XX N KEDZIE AVE 810 THEFT OVER $500 STREET FALSE FALSE 1713 17 33 14 6 1154133 1933314 2016 2018-02-10 15:50:01 41.97284491 -87.70860008 (41.972844913, -87.708600079)
4494340 HL793243 2005-12-16 16:45:00 005XX E PERSHING RD 860 THEFT RETAIL THEFT GROCERY FOOD STORE TRUE FALSE 213 2 3 38 6 1180448 1879234 2005 2018-02-28 15:56:25 41.82387989 -87.61350386 (41.823879885, -87.613503857)
3778925 HL149610 2005-01-28 17:00:00 100XX S WASHTENAW AVE 810 THEFT OVER $500 STREET FALSE FALSE 2211 22 19 72 6 1160129 1838040 2005 2018-02-28 15:56:25 41.71128051 -87.68917910 (41.711280513, -87.689179097)
3324217 HK361551 2004-05-13 14:15:00 033XX W BELMONT AVE 820 THEFT $500 AND UNDER SMALL RETAIL STORE FALSE FALSE 1733 17 35 21 6 1153590 1921084 2004 2018-02-28 15:56:25 41.93929582 -87.71092344 (41.939295821, -87.710923442)
How many crimes involve an arrest?
%sql Select COUNT(*) from CHICAGO_CRIME_DATA Where arrest = "TRUE";
 * ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
1
163
Which unique types of crimes have been recorded at GAS STATION locations?
%sql Select distinct primary_type from CHICAGO_CRIME_DATA where location_description = 'GAS STATION';
 * ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
primary_type
CRIMINAL TRESPASS
NARCOTICS
ROBBERY
THEFT
In the CENUS_DATA table: Listing all Community Areas whose names start with the letter ‘B’.
%sql Select COMMUNITY_AREA_NAME from CENSUS where COMMUNITY_AREA_NAME like'B%';
 * ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
community_area_name
Belmont Cragin
Burnside
Brighton Park
Bridgeport
Beverly
Which schools in Community Areas 10 to 15 are healthy school certified?
%sql select name_of_school from SCHOOLS where community_area_number >= 10 and community_area_number <=15 and healthy_school_certified = 'Yes';
 * ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
name_of_school
Rufus M Hitch Elementary School
What is the average school Safety Score?
%sql Select AVG(SAFETY_SCORE)as AVG_SAFETY_SCORE FROM SCHOOLS;
 * ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
avg_safety_score
49.504873
Listing the top 5 Community Areas by average College Enrollment [number of students]
%sql select community_area_name as comm_area_name, avg(college_enrollment) as avg_enroll \
from SCHOOLS group by community_area_name \
order by avg_enroll desc limit 5;
 * ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
comm_area_name avg_enroll
ARCHER HEIGHTS 2411.500000
MONTCLARE 1317.000000
WEST ELSDON 1233.333333
BRIGHTON PARK 1205.875000
BELMONT CRAGIN 1198.833333
Using a sub-query to determine which Community Area has the least value for school Safety Score
%sql select COMMUNITY_AREA_NAME,SAFETY_SCORE from SCHOOLS \
WHERE SAFETY_SCORE = (SELECT MIN(SAFETY_SCORE) FROM SCHOOLS);
 * ibm_db_sa://fwh43441:***@dashdb-txn-sbox-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB
Done.
community_area_name safety_score
WASHINGTON PARK 1

Finding the Per Capita Income of the Community Area which has a school Safety Score of 1.

%sql select  COMMUNITY_AREA_NAME,per_capita_income \
from CENSUS \
where community_area_number = (select community_area_number from SCHOOLS where safety_score = 1);
UsageError: Line magic function `%sql` not found.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published