Hi fellow redditors! We are two r/sg redditors (u/Sproinkerino & u/captmomo) who wanted to work on a mini-project to improve our programming/analytics skills, and we would like to share the work we have done with you guys! We are pretty new to this, so any feedback is much appreciated.
Using Python, we scraped SMRT's Twitter feed to gather delay/breakdown data from the past 5 years. We then cleaned the data to work out the delays for specific lines and stations.
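To give a rough idea of what the cleaning step involves, here is a minimal sketch. It is a simplified illustration and not the actual script: the sample tweets are made up, the keyword list and the `[NSL]`-style line tags are assumptions about how the tweets are worded, and `STATIONS` is only a tiny subset of the real station list.

```python
import re
from collections import Counter

# Line tags as they are assumed to appear in the tweets (e.g. "[NSL]").
LINE_CODES = {
    "NSL": "North South Line",
    "EWL": "East West Line",
    "CCL": "Circle Line",
    "DTL": "Downtown Line",
}

# Tiny illustrative subset; a real run would need the full station list.
STATIONS = ["Ang Mo Kio", "Bishan", "Jurong East", "Bugis"]

# Keywords treated as signalling a delay (an assumption, tune as needed).
DELAY_PATTERN = re.compile(r"\b(delay\w*|fault|no train service)\b", re.IGNORECASE)
LINE_PATTERN = re.compile(r"\b(" + "|".join(LINE_CODES) + r")\b")


def parse_tweet(text):
    """Return (line_name, stations) for a delay tweet, or None otherwise."""
    if not DELAY_PATTERN.search(text):
        return None
    line_match = LINE_PATTERN.search(text)
    line = LINE_CODES[line_match.group(1)] if line_match else None
    lowered = text.lower()
    stations = [s for s in STATIONS if s.lower() in lowered]
    return line, stations


def count_delays(tweets):
    """Tally delay tweets per station and per line."""
    station_counts, line_counts = Counter(), Counter()
    for text in tweets:
        parsed = parse_tweet(text)
        if parsed is None:
            continue  # not a delay announcement
        line, stations = parsed
        if line:
            line_counts[line] += 1
        station_counts.update(stations)
    return station_counts, line_counts


# Made-up tweets, only for illustration.
SAMPLE_TWEETS = [
    "[NSL]: Due to a track fault, please add 20mins travel time from Ang Mo Kio to Bishan.",
    "[EWL] UPDATE: Train services have resumed.",
    "[NSL]: No train service between Ang Mo Kio and Yishun due to a signalling fault.",
    "Happy National Day from all of us!",
]

station_counts, line_counts = count_delays(SAMPLE_TWEETS)
```

Simple substring matching like this will miss hashtags such as `#AngMoKio` and stations outside the lookup list, so the real data needs a fair bit more normalisation than shown here.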
Next, we used ggplot2/ggmap in R to plot a map of all the stations along with the number of delays at each one.
Here is the plot: https://imgur.com/a/b7D728p
The darker the node, the more delays the station has experienced.
| Category | Result |
|:--|:--|
| Station with the most breakdowns | NS16 Ang Mo Kio |
| Line with the most breakdowns | North South Line |
| Line with the fewest breakdowns | Downtown Line |
We also created a fun little webapp to find the probability of a delay when travelling from one station to another! Link: https://mrt-breakdown.herokuapp.com/
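For anyone wondering how a station-to-station probability could be put together, here is one simple way to think about it. This is only a sketch and not necessarily what the webapp does: it assumes each station has an independent daily chance of a delay, and the rates in `DAILY_DELAY_RATE` are made-up numbers, not our results.

```python
from math import prod

# Hypothetical chance of a delay at each station on a given day (made-up values).
DAILY_DELAY_RATE = {
    "Ang Mo Kio": 0.02,
    "Bishan": 0.01,
    "Braddell": 0.005,
}


def route_delay_probability(route, rates=DAILY_DELAY_RATE):
    """Chance of hitting at least one delay along the route.

    Assumes delays at different stations are independent, so the chance of
    a clean trip is the product of each station's no-delay chance.
    Stations missing from `rates` are treated as never delayed.
    """
    no_delay = prod(1.0 - rates.get(station, 0.0) for station in route)
    return 1.0 - no_delay


trip = ["Ang Mo Kio", "Bishan", "Braddell"]
probability = route_delay_probability(trip)
```

The independence assumption is a big simplification, since one fault usually affects several neighbouring stations at once, so treat a number from this kind of model as a rough guide at best.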
We hope this mini-project was interesting to you guys! Any feedback will be greatly appreciated! :)