Housing in GTA
Members: Darvy Teav, Arefin Shamsil, Wei Zhang
Project Introduction:
Key question: What factors contribute to housing prices within the different regions of GTA?
People often consider buying a house an excellent investment, and the outlook for the GTA housing market has stayed optimistic, even during COVID. But with a burgeoning population, people are worried about affordable housing. The following general questions come to mind:
-How is population density influencing dwelling numbers?
-Are housing supplies and demands in balance?
-Can school ranking affect housing prices?
Scope:
-GTA area: Durham Region, York Region, City of Toronto, Peel Region, and Halton Region.
-Years:
Price, Supply, and Demand: 2018-2021
School: 2018-2020
Population and Dwelling: 2016-2021 Census
Type of house: Detached, Semi-Detached, Condo Apt, Town House
Schooling: Elementary School
Limitations:
- APIs are expensive and reserved for real estate agents
- For schooling, the Fraser Institute only had data until 2020
- Census data was only available for 2016 and 2021, as those were the 2 most recent years in which the government sent out the survey for Canadians to fill out
Data Source:
- Census 2021 Population and Dwellings
https://data.peelregion.ca/datasets/RegionofPeel::census-2021-cd-csd-population-and-dwellings/about
-GTA Housing price between 2018-2021
https://trreb.ca/index.php/market-news/mls-home-price-index/mls-home-price-index-archive
-GTA Elementary school ranking 2018-2020
https://www.fraserinstitute.org/sites/default/files/ontario-secondary-school-rankings-2018.pdf
https://www.fraserinstitute.org/sites/default/files/ontario-elementary-school-rankings-2019-12659.pdf
https://www.fraserinstitute.org/sites/default/files/ontario-elementary-school-rankings-2020-13385.pdf
-GTA region
https://www.ureachtoronto.ca/city-services/
-Geoapify
https://apidocs.geoapify.com/docs/places/#about
Data Cleaning:
For the GTA Census 2021 data, the csv file was downloaded from the source website.
The GTA housing prices were downloaded as PDF files, one per month for each year. We had to convert each PDF into an Excel file. Because the conversion was not always clean, we had to scan through the data and manually fix some values. Once that was completed, we aggregated all the files into one csv file for all of us to use. To merge our datasets for the project, we all used the average house price and the GTA regions as the common fields.
We also used Geoapify to gather the longitude and latitude of each region to create a map of the GTA based on the average housing price of 2021.
The GTA school ranking data was downloaded from the data source as a csv file for each year.
We used Pandas to clean and format all the datasets. We created multiple Jupyter Notebooks for the data exploration and cleanup processes. All raw data files and cleaned datasets are stored separately in this repository. The graphs used in the presentation are saved separately in this repository as well.
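The aggregation step described above could look roughly like the following. This is a minimal sketch, not the notebooks' actual code: the column names (`region`, `avg_price`, `sales`), the month keys, and every number are made up for illustration.

```python
import pandas as pd

# Hypothetical monthly extracts standing in for the converted PDF tables.
# Column names and all values are assumptions made up for illustration.
monthly_frames = {
    "2021-01": pd.DataFrame({
        "region": [" durham", "York "],
        "avg_price": [900_000, 1_300_000],
        "sales": [700, 1_100],
    }),
    "2021-02": pd.DataFrame({
        "region": ["Durham", "york"],
        "avg_price": [920_000, 1_340_000],
        "sales": [650, 1_200],
    }),
}

# Stack the monthly tables and tag each row with its month.
combined = pd.concat(
    [frame.assign(month=month) for month, frame in monthly_frames.items()],
    ignore_index=True,
)

# Normalise region labels so the same region always matches when merging.
combined["region"] = combined["region"].str.strip().str.title()
combined["year"] = pd.to_datetime(combined["month"], format="%Y-%m").dt.year

# One row per region and year: mean of the monthly average prices, total sales.
yearly = combined.groupby(["region", "year"], as_index=False).agg(
    avg_price=("avg_price", "mean"),
    sales=("sales", "sum"),
)
print(yearly)
```

Note that the mean of monthly averages is not weighted by sales volume; a weighted average would be a reasonable alternative if the monthly counts differ a lot.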
Key Findings:
By analysing Census 2021 data, we see that the population and the number of dwellings in the GTA have both grown. Compared with the Census 2016 data, the percentage change in dwellings is higher than the percentage change in population in Toronto, York, and Peel. When public dwellings are excluded and population density is taken into account, population density and private dwelling numbers show a positive relationship.
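A comparison of this kind can be sketched as below. The region labels, column names, and counts are placeholders rather than census figures; only the percentage-change arithmetic is the point.

```python
import pandas as pd

# Placeholder counts, NOT real census figures.
census = pd.DataFrame({
    "region": ["Region A", "Region B"],
    "population_2016": [100_000, 200_000],
    "population_2021": [110_000, 210_000],
    "dwellings_2016": [40_000, 80_000],
    "dwellings_2021": [46_000, 82_000],
})

# Percentage change from 2016 to 2021 for each measure.
for measure in ("population", "dwellings"):
    old = census[f"{measure}_2016"]
    new = census[f"{measure}_2021"]
    census[f"{measure}_pct_change"] = (new - old) / old * 100

# Flag regions where dwellings grew faster than population.
census["dwellings_outpaced_population"] = (
    census["dwellings_pct_change"] > census["population_pct_change"]
)
print(census[["region", "population_pct_change", "dwellings_pct_change",
              "dwellings_outpaced_population"]])
```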
Housing prices increased overall from 2018 to 2021, but prices in Toronto, York, and Halton dropped in 2020. With COVID work-from-home policies, people appear to have taken the opportunity to pursue cheaper housing. Based on the average sold price and the number of houses sold in each month of 2021, we drew a graph to understand the relationship between housing supply and demand. It showed that the equilibrium point between housing supply and demand is 14,007 sold houses at an average price of $989,172. The data shows that the housing market is a seller’s market. All the points above the equilibrium point indicate that the housing market is in excess supply.
Since it would have been costly to get more accurate data for housing demand, we measured demand by the number of houses sold per month. After aggregating the sum for each year and each region, we found that demand increased each year except 2019, when it dropped by 3.42%. From 2018 to 2021, however, we saw an overall increase of 36.34%. The drop in 2019 could be due to the rise in the prime interest rate from 3.70% in July 2018 to 3.95% in January 2019. In addition, there was a lot of talk in the news about a housing bubble, in which hype in demand was causing housing prices to inflate. The two regions that were not affected in 2019 were Durham and York. These 2 regions followed the price increase trend. Based on the regression analysis, the correlation between demand and average house price is 0.42. This indicates a moderate positive relationship between the two. The maximum demand is 33,182 houses and the minimum is 559 houses. The median is 2,898 houses. Looking at the scatter plot, the majority of the data is clustered in the 600 to 4,500 range. The outliers contribute to the weaker r-squared and correlation coefficient.
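The year-over-year change and the correlation figures above can be reproduced in outline as follows. This is a sketch with invented numbers; it assumes a Pearson correlation and a simple one-variable linear regression, for which r-squared is the square of the correlation coefficient.

```python
import numpy as np
import pandas as pd

# Invented yearly totals of houses sold (the proxy for demand).
yearly_sales = pd.Series([1_000, 950, 1_100, 1_400], index=[2018, 2019, 2020, 2021])

# Change versus the previous year, and overall change from first to last year.
yoy_pct = yearly_sales.pct_change() * 100
overall_pct = (yearly_sales.iloc[-1] / yearly_sales.iloc[0] - 1) * 100

# Invented monthly observations: houses sold and average sold price.
sales = np.array([600, 900, 1_500, 2_900, 4_500, 33_000])
price = np.array([850_000, 910_000, 880_000, 990_000, 1_050_000, 1_100_000])

# Pearson correlation; for simple linear regression, r-squared equals r ** 2.
r = np.corrcoef(sales, price)[0, 1]
r_squared = r ** 2

print(yoy_pct.round(2))
print(round(overall_pct, 2), round(r, 2), round(r_squared, 2))
```

A single very large observation, like the last point here, can pull the fitted line and change the coefficient noticeably, which is the outlier effect described above.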
Shifting to supply, we measured it by the number of new listings per month, not the number of active listings. In 2020, supply continued to increase except in the York and Durham regions. York Region was the only one that continued to follow the average price trend. The increase in housing on the market in Toronto was partially due to COVID. Many people saw an opportunity to move out of the city since working from home felt permanent. Within the GTA, some people saw the dip in price as an opportunity to move into Toronto. From 2018 to 2021, we saw an overall increase in supply of 33.77%. One thing to note: in Peel Region in 2020, the housing price dipped but supply continued to increase, whereas Durham Region did the opposite, with housing prices continuing to increase while supply dropped. The regression analysis shows a trend very similar to demand. The correlation coefficient is 0.44 and r-squared is 0.2. We used the same average house price as in the demand analysis, since it is fairly well known that the listing price of a house is purposely set low to attract buyers and create bidding wars that generate more profit. As a result, this creates a sense of strong demand for houses.
Turning to the supply and demand curve for 2021, the equilibrium point is at 14,007 houses at an average price of $989,172. To generate both lines, we took the max and min points for both supply and demand and created arrays of the same size to plot the lines on the graph. Based on the quotient of supply over demand used to determine whether it is a buyer’s or seller’s market, we can see that it is a seller’s market, since the supply is nowhere near 5 times the size of demand. If the quotient were greater than 7, it would be a buyer’s market. In addition, the data supports a surplus in the market, as the average price sits around $1.1M, which is above the equilibrium price. This indicates that there are affordability issues rather than a shortage in the market.
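One way to find such an equilibrium point is to treat each curve as a straight line through its two endpoints and solve for the intersection. The sketch below assumes that approach; the endpoints are invented, and the helper names `line_through`, `intersection`, and `classify_market` are hypothetical. The 5 and 7 thresholds come from the text above, while the "balanced" label for values in between is an assumption.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two (quantity, price) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def intersection(line_a, line_b):
    """Return the (quantity, price) point where two non-parallel lines cross."""
    (m1, b1), (m2, b2) = line_a, line_b
    if m1 == m2:
        raise ValueError("parallel lines have no single intersection point")
    x = (b2 - b1) / (m1 - m2)
    return x, m1 * x + b1


def classify_market(supply, demand, seller_below=5, buyer_above=7):
    """Label the market from the supply/demand quotient (thresholds from the text)."""
    quotient = supply / demand
    if quotient < seller_below:
        return "seller's market"
    if quotient > buyer_above:
        return "buyer's market"
    return "balanced market"  # assumed label for the range in between


# Invented endpoints: demand slopes down, supply slopes up.
demand_line = line_through((1_000, 1_200_000), (5_000, 800_000))
supply_line = line_through((1_000, 800_000), (5_000, 1_200_000))

eq_quantity, eq_price = intersection(demand_line, supply_line)
print(eq_quantity, eq_price)
print(classify_market(supply=14_000, demand=10_000))
```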
One factor we assumed would affect housing prices in the GTA is school ranking. From the available data, we compiled elementary school ranking distributions for the GTA from 2018 to 2020. The overall ranking distributions by area were quite similar over the years. The best rankings were consistently observed in the Halton and York regions, while the poorer rankings were observed in the Toronto, Peel, and Durham regions. A correlation test between housing prices and the corresponding school rankings by area showed a very weak negative correlation over the years. This could indicate that elementary school rankings may not have an effect on the housing market. However, we found that housing price, sales volume, and school ranking all have wider distributions in the Toronto, Peel, and Durham regions, generating a large number of outliers. This could have affected the correlations. Finally, a trend analysis combining school ranking, sale price, and sale volume by area shows a greater preference for purchasing houses in the Toronto, Peel, and Durham regions, despite their lower school rankings.
In conclusion, we would highlight that the most important contributing factors to housing prices are population growth, regional demand, and supply. York Region had the highest average price, whereas Durham had the lowest. COVID did have an effect on the housing market, causing a fluctuation between 2019 and 2020. The only region to respond in tandem with the ups and downs in housing price was York Region. The housing crisis is not due to a shortage but is more of an affordability issue, since it is a seller's market. School ranking is one attribute among the regional demands. Our research suggests that there could be other factors contributing to the rising housing prices, such as proximity to amenities, employment opportunities, or access to culture and communities.
Graphing images in order of the presentation
Sources
-Arrows added to graphs
https://matplotlib.org/stable/tutorials/text/annotations.html
-Calculating buyer's or seller's market
https://rocketmortgage.ca/
-Finding point of intersection
https://stackoverflow.com/questions/20677795/how-do-i-compute-the-intersection-point-of-two-lines
-Horizontal and vertical lines for graphs
https://pub.towardsai.net/make-your-matplotlib-plots-stand-out-using-this-cheat-sheet-8c666de90433
-Interest Rate
https://www2.gov.bc.ca/assets/gov/british-columbians-our-governments/government-finances/historical-effective-prime-rate.pdf
-Stack/pd.wide_to_long function
https://towardsdatascience.com/wide-to-long-data-how-and-when-to-use-pandas-melt-stack-and-wide-to-long-7c1e0f462a98
-Stacking
https://courses.lumenlearning.com/wm-microeconomics/chapter/equilibrium-surplus-and-shortage/
-Ticklabels to non scientific
https://www.tutorialspoint.com/prevent-scientific-notation-in-matplotlib-pyplot#:~:text=Using%20ticklabel_format()%20method%20with,show%20the%20figure%2C%20use%20plt.