
Tableau-CitiBike

citi-bike

Link to citibike Analysis

Table of contents

Technologies

Objective

citibike NYC Rider and Station Analysis: 2019 vs 2020

citibike Ridership: Pre-COVID-19 vs. During COVID-19

* Has the customer base changed?
* Have the top station locations changed due to the work-from-home (WFH) lifestyle?
* Have trip totals gone up or down due to the COVID-19 pandemic?

The following visualizations will illustrate citibike ridership data from August, September, October 2019 as compared to August, September, October 2020.

These months were selected because the weather is generally pleasant, which makes bike riding one of the preferred modes of transportation. The data also captures tourism in August and student populations in September.

Data Cleaning

I collected the data from Citi Bike Data. I used the Citi Bike trip history CSV files for August, September, and October of both 2019 and 2020. The files are very large and include trip and rider data for every trip from every station over the entire month. I cleaned the data with pandas in a Jupyter notebook, using the pandas concat function to combine all the CSV files into one DataFrame.

concat
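The notebook code for this step isn't reproduced here, so below is a minimal sketch of the combine step, assuming each monthly file has the same columns. The function name `combine_trip_files` and the `data/` folder in the comment are hypothetical, and two tiny in-memory CSVs stand in for the real monthly files.

```python
import io

import pandas as pd


def combine_trip_files(sources):
    """Read each monthly trip CSV and stack them into one DataFrame.

    `sources` may be file paths or file-like objects (anything pd.read_csv accepts).
    """
    frames = [pd.read_csv(src) for src in sources]
    # ignore_index=True renumbers rows 0..n-1 instead of repeating
    # each file's own 0-based index.
    return pd.concat(frames, ignore_index=True)


# On disk this might be called as (hypothetical folder):
#   combine_trip_files(sorted(glob.glob("data/*-citibike-tripdata.csv")))
# Toy in-memory CSVs stand in for two monthly files here.
aug = io.StringIO("starttime,gender\n2019-08-01 00:00:01,1\n")
sep = io.StringIO("starttime,gender\n2019-09-01 00:00:05,2\n2019-09-02 08:15:00,0\n")
combined = combine_trip_files([aug, sep])
print(len(combined))  # 3
```

Without `ignore_index=True` the combined frame would contain duplicate index labels (one 0, 1, 2, ... run per file), which makes later row lookups ambiguous.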

Then I extracted the ‘year’ and ‘month’ from the ‘start date’ column into separate columns. This made the dates easier to filter and display in my Tableau story.

clean year

clean month
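A minimal sketch of the year/month split, assuming the raw start column is named `starttime` (an assumption about the source files; the README calls it ‘start date’). The rows are toy values.

```python
import pandas as pd

# Toy rows; "starttime" is an assumed name for the raw start date column.
trips = pd.DataFrame({
    "starttime": ["2019-08-01 00:00:01.4680", "2020-10-31 23:59:58.1230"],
})

start = pd.to_datetime(trips["starttime"])
trips["Trip Year"] = start.dt.year
# month_name() yields "August", "October", ..., which reads better in charts than 8, 10.
trips["Trip Month"] = start.dt.month_name()

print(trips[["Trip Year", "Trip Month"]])
```

One caveat with month names: Tableau will sort them alphabetically unless you set a custom sort, so keeping a numeric month column alongside can help.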

Rider gender was stored as numeric codes in the original data set, so I replaced the numbers with the more meaningful labels ‘male’ and ‘female’.

gender
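A sketch of the relabeling with `Series.map`, assuming the coding described in Citi Bike's trip-data documentation for these years (0 = unknown, 1 = male, 2 = female); check it against the files you download. The raw column name `gender` and the toy values are assumptions.

```python
import pandas as pd

# Assumed coding: 0 = unknown, 1 = male, 2 = female.
GENDER_LABELS = {0: "unknown", 1: "male", 2: "female"}

trips = pd.DataFrame({"gender": [1, 2, 0, 1]})  # toy values
# Codes missing from the dict would become NaN, which makes bad codes easy to spot.
trips["Rider Gender"] = trips["gender"].map(GENDER_LABELS)
print(trips["Rider Gender"].tolist())  # ['male', 'female', 'unknown', 'male']
```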

To display age in my visualizations, I calculated each rider's age by subtracting the rider's ‘birth year’ from the ‘Trip Year’ and stored the result in a new ‘Rider Age’ column.

age
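The age calculation is a single vectorized subtraction. This sketch assumes the raw column is named `birth year`; the rows are toy values, with 1885 included to mimic the kind of implausible entry that produces outlying ages.

```python
import pandas as pd

trips = pd.DataFrame({
    "Trip Year": [2019, 2020, 2020],
    "birth year": [1990, 1969, 1885],  # toy values; 1885 mimics an implausible entry
})
# Approximate age: ignores whether the rider's birthday had passed at trip time.
trips["Rider Age"] = trips["Trip Year"] - trips["birth year"]
print(trips["Rider Age"].tolist())  # [29, 51, 135]
```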

I included the ‘unknown’ genders and outlying rider ages in my data sets, but I filtered them out of the final visualizations for clarity.
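The README applies this filter in Tableau. If you would rather drop those rows in pandas before exporting, a boolean mask does the same job. The age cutoffs below are hypothetical; choose whatever range you consider plausible.

```python
import pandas as pd

# Hypothetical cutoffs, not taken from the original analysis.
MIN_AGE, MAX_AGE = 16, 90

trips = pd.DataFrame({
    "Rider Gender": ["male", "unknown", "female", "male"],
    "Rider Age": [29, 40, 135, 51],
})  # toy values

# between() is inclusive on both ends by default.
keep = (trips["Rider Gender"] != "unknown") & trips["Rider Age"].between(MIN_AGE, MAX_AGE)
chart_data = trips[keep]
print(chart_data)
```

Filtering in Tableau, as the README does, keeps the full data available for other views. Filtering in pandas shrinks the export, which helps with Tableau Public's size limits.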

Data Aggregation

The data from Citi Bike was too large to use in Tableau in its original form. I created several aggregations of the data to produce smaller data frames that Tableau Public could handle. The smaller data frames also made the visualizations easier to build.

To calculate total Citi Bike trips per month for each year, I used .groupby to group the data by ‘Trip Year’ and ‘Trip Month’ and counted the trips in each group.

month_df = clean_df3.groupby(['Trip Year','Trip Month']).count()

Total Data
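`.count()` returns one column per field and counts only non-null values, so a missing ‘birth year’ lowers that column's total. The sketch below, using toy data, contrasts it with `.size()`, which counts rows per group regardless of missing values. The column name `Total Trips` is hypothetical.

```python
import pandas as pd

clean_df3 = pd.DataFrame({
    "Trip Year": [2019, 2019, 2019, 2020],
    "Trip Month": ["August", "August", "September", "August"],
    "birth year": [1990, None, 1985, 1970],  # toy values with one missing entry
})

# One column per field; each counts non-null values only.
per_column = clean_df3.groupby(["Trip Year", "Trip Month"]).count()

# One number per group (row count); reset_index turns the group keys
# back into ordinary columns, which is convenient for CSV export.
month_df = (
    clean_df3.groupby(["Trip Year", "Trip Month"])
    .size()
    .reset_index(name="Total Trips")
)
print(month_df)
```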

To create the user data frame, I used .groupby to group the data by ‘Trip Year’, ‘Trip Month’, ‘Rider Gender’, ‘Rider Age’, and ‘User Type’, then added .count() to count the trips in each group.

user_df1 = user_df.groupby(["Trip Year", "Trip Month", 'Rider Gender', 'Rider Age', 'User Type']).count()

user df
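A sketch of the user aggregation and its export for Tableau Public, using toy rows. It uses `.size()` so each group gets a single trip count, and the output column name `Trip Count` is hypothetical.

```python
import pandas as pd

user_df = pd.DataFrame({
    "Trip Year": [2019, 2019, 2020, 2020, 2020],
    "Trip Month": ["August"] * 5,
    "Rider Gender": ["male", "male", "female", "female", "male"],
    "Rider Age": [29, 29, 35, 35, 51],
    "User Type": ["Subscriber", "Subscriber", "Customer", "Customer", "Subscriber"],
})  # toy values

group_cols = ["Trip Year", "Trip Month", "Rider Gender", "Rider Age", "User Type"]
user_counts = user_df.groupby(group_cols).size().reset_index(name="Trip Count")

# Pass a filename (e.g. the hypothetical "user_data.csv") to write to disk;
# with no path, to_csv returns the CSV text instead.
csv_text = user_counts.to_csv(index=False)
print(csv_text)
```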

Visualizations

To create the visualizations in Tableau, I imported my data sets and joined them on common fields such as ‘station name’, ‘longitude’, and ‘latitude’.
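The README doesn't show how the station data sets were built. One plausible pandas sketch produces one row per station per year with its name, coordinates, and trip total, which gives Tableau shared fields to join on and a value to plot on the map. The raw column names (`start station name`, `start station latitude`, `start station longitude`) and the renamed output columns are assumptions, and the rows are toy values.

```python
import pandas as pd

# Toy trips; raw column names are assumptions about the source files.
trips = pd.DataFrame({
    "Trip Year": [2019, 2019, 2019, 2020],
    "start station name": ["Station A", "Station A", "Station B", "Station A"],
    "start station latitude": [40.75, 40.75, 40.72, 40.75],
    "start station longitude": [-73.99, -73.99, -73.98, -73.99],
})

station_keys = ["Trip Year", "start station name",
                "start station latitude", "start station longitude"]
# Count trips per station per year, then give the columns join-friendly names.
start_stations = (
    trips.groupby(station_keys).size().reset_index(name="Total Trips")
    .rename(columns={
        "start station name": "Station Name",
        "start station latitude": "Latitude",
        "start station longitude": "Longitude",
    })
)
print(start_stations)
```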

I used year, gender, and age as filters in my visualizations. The main purpose of my story was to compare ridership and station data from 2019 and 2020, so I used a ‘Trip Year’ filter to create a copy of each chart for each year.

year filter

As part of the storytelling process, I experimented with different visualizations of the same data to see which version was clearer and more impactful. Below are two versions of ‘Ridership by Age and Gender’. The bar chart displays more detailed data, but the overall chart is visually overwhelming.

bar chart age

The line chart shows fewer details, but it is clearer and cleaner as a visualization.

line chart age

For the map visualizations, I placed ‘Longitude’ on Columns and ‘Latitude’ on Rows, then plotted each station using the SUM of its total trips.

long/lat

I used color to show the value of each map point: blue represents fewer trips and red represents more trips. I also added custom tooltips to display all the relevant data points associated with each station location.

color map

To add the ZIP code layer, I used Tableau's Map Layers.

map layers

I used Tableau's ‘Create Set’ feature to build the visualizations showing the Top 10 trip stations.

create set
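Tableau builds the set through its dialog rather than code. For checking the Top 10 lists outside Tableau, here is a rough pandas equivalent that keeps the 10 busiest stations within each year. It assumes a station table like the one described above, with hypothetical column names, and uses toy data in which 2020 has only three stations.

```python
import pandas as pd

start_stations = pd.DataFrame({
    "Trip Year": [2019] * 12 + [2020] * 3,
    "Station Name": [f"Station {i}" for i in range(12)] + ["Station X", "Station Y", "Station Z"],
    "Total Trips": list(range(100, 1300, 100)) + [50, 70, 60],
})  # toy values

# Sort busiest-first, keep the first 10 rows of each year, then tidy the order.
top10 = (
    start_stations.sort_values("Total Trips", ascending=False)
    .groupby("Trip Year")
    .head(10)
    .sort_values(["Trip Year", "Total Trips"], ascending=[True, False])
)
print(top10)
```

Because this ranks within each year separately, a station can appear in the 2019 list and drop out in 2020, which is the comparison the story is after.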

Analysis

After reviewing the visualizations, I concluded the following:

conclusion

Tableau Story

Below is the final Tableau Story. You can also view it on the Tableau Public site: citibike Analysis

intro

rider age

rider type

start stations

top 10 start

top 10 start map

end stations

top 10 end

top 10 end map

conclusion

Resources

CitiBike Data Sources:

201908-citibike-tripdata.csv

201909-citibike-tripdata.csv

201910-citibike-tripdata.csv

202008-citibike-tripdata.csv

202009-citibike-tripdata.csv

202010-citibike-tripdata.csv

Contact

Sara Simoes