UBC-MDS/pyhousehunter

Python House Hunter

Python package for searching for housing on Craigslist.

  • Authors: Ela Bandari, Junting He, Ling (Elina) Lin, Alex Truong

Overview

Hunting for rentals in Canada can be an exhausting and frustrating experience. This Python package aims to simplify the process by scraping listing information from Craigslist and organizing the extracted data for the user. Instead of manually checking the website for new listings, the user receives matching results by email according to their selection criteria.

Functions

Function Name | Input | Output | Description
scraper | url, online | Pandas DataFrame | Scrape data from rental websites into a Pandas DataFrame
data_cleaner | Pandas DataFrame | Pandas DataFrame | Clean the extracted data
data_filter | Pandas DataFrame, min_price, max_price, sqrt_ft, num_bedroom, city_name | Pandas DataFrame | Filter the cleaned data set based on user inputs
send_email | Pandas DataFrame, email address | csv file | Send the organized listing information to the user's email

Our Package in the Python Ecosystem

To the best of our knowledge, no existing Python package covers the entire rental search process end to end. This package handles every step: scraping rental websites, processing the data, and emailing users the updated listing information. General-purpose Craigslist scrapers exist in the Python ecosystem, such as https://github.com/narfman0/craigslist-scraper and https://github.com/juliomalegria/python-craigslist, but they do not focus on house rentals or provide emailing functionality.

Installation

$ pip install -i https://test.pypi.org/simple/ pyhousehunter

Features

The pyhousehunter package contains the following four functions:

  • scraper() The scraper function scrapes all listings available on a given Craigslist housing URL (e.g. https://vancouver.craigslist.org/d/apartments-housing-for-rent/search/apa).
  • data_cleaner() The data returned by the scraper is not very tidy, so the data_cleaner function uses Pandas and regular expressions to produce a cleaned Pandas DataFrame containing the scraped information.
  • data_filter() The data_filter function filters the cleaned Pandas DataFrame based on the user's specifications. Users can specify their price range, minimum square footage, minimum number of bedrooms, and desired city.
  • send_email() The send_email function emails the user the listings that meet their specifications in CSV format. The user must specify a valid email address and can optionally change the email subject.

Dependencies

  • python = ^3.8
  • beautifulsoup4 = ^4.9.3
  • requests = ^2.25.1
  • pandas = ^1.2.3
  • regex = ^2020.11.13
  • geotext = ^0.4.0
  • python-semantic-release = ^7.15.0
  • pytest-cov = ^2.11.1
  • pytest = ^6.2.2
  • codecov = ^2.1.11
  • flake8 = ^3.8.4
  • Sphinx = ^3.5.2
  • sphinxcontrib-napoleon = ^0.7
  • nbsphinx = ^0.8.2
  • ipykernel = ^5.5.0

Usage

Scraping

The first function in our package is scraper(). Pass it the URL of the main housing and apartment rentals page of a Craigslist BC site and set online = True to scrape directly from the internet. When online = False, the scraper function reads from a local HTML file instead, which is handy if the Craigslist website is down or for internal development and testing. Please note that you cannot pass the URL of an individual listing.

from pyhousehunter import scraper
# Craigslist rental page URL
url = "https://vancouver.craigslist.org/d/apartments-housing-for-rent/search/apa"
# Scrape Craigslist
scraped_data = scraper.scraper(url, online=True)
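
Because scraper() expects a search page rather than an individual listing, it can help to sanity-check the URL before calling it. The snippet below is a minimal sketch using only the standard library. The helper name looks_like_search_page is hypothetical and not part of pyhousehunter, and the heuristic (search pages contain a "/search/" path segment, individual postings end in ".html") is an assumption about Craigslist URLs, not something the package enforces.

```python
from urllib.parse import urlparse


def looks_like_search_page(url: str) -> bool:
    """Heuristic URL check (hypothetical helper, not part of pyhousehunter)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    # Assumption: the host is craigslist.org or one of its regional subdomains.
    is_craigslist = host == "craigslist.org" or host.endswith(".craigslist.org")
    # Assumption: search pages contain "/search/", while individual
    # postings end in ".html".
    return is_craigslist and "/search/" in path and not path.endswith(".html")


search_url = "https://vancouver.craigslist.org/d/apartments-housing-for-rent/search/apa"
print(looks_like_search_page(search_url))
```

A check like this could run before scraper.scraper(url, online=True) to give a clearer error message than a failed scrape would.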

Cleaning

Our data_cleaner() function is a straightforward and powerful tool. It turns the raw Pandas DataFrame generated by the scraper() function into a clean and tidy DataFrame object. It takes a single input: the output of the scraper() function.

from pyhousehunter import cleaner
# Clean the scraped data
cleaned_data = cleaner.data_cleaner(scraped_data)
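
To give a feel for the kind of regex-based tidying this step involves, here is a small standalone sketch. It is not the package's implementation: the helper names, the output keys, and the raw text formats ("$1,850", "2br - 750ft2") are all assumptions chosen for illustration.

```python
import re
from typing import Optional


def parse_price(raw: str) -> Optional[int]:
    """Pull an integer dollar amount out of text such as '$1,850'."""
    match = re.search(r"\$\s*([\d,]+)", raw)
    if match is None:
        return None
    digits = match.group(1).replace(",", "")
    return int(digits) if digits else None


def parse_housing(raw: str) -> dict:
    """Extract bedrooms and square feet from text such as '2br - 750ft2'."""
    bedrooms = re.search(r"(\d+)\s*br\b", raw, flags=re.IGNORECASE)
    area = re.search(r"(\d+)\s*ft2?\b", raw, flags=re.IGNORECASE)
    return {
        # Missing values become None rather than raising an error.
        "bedrooms": int(bedrooms.group(1)) if bedrooms else None,
        "square_feet": int(area.group(1)) if area else None,
    }


print(parse_price("$1,850"))
print(parse_housing("2br - 750ft2"))
```

Returning None for missing fields keeps incomplete listings in the data so that a later filtering step can decide how to treat them.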

Filtering

The data_filter() function allows you to filter the cleaned data to find the rentals meeting your specifications. Its inputs are the Pandas DataFrame generated by data_cleaner(), numeric values for the minimum price, maximum price, minimum square feet, and minimum number of bedrooms, and a string with the city name of the desired rentals. It outputs a Pandas DataFrame object with the matching results.

from pyhousehunter import filter
# Filter data based on preferences
filtered_data = filter.data_filter(
    df=cleaned_data,
    min_price=1500,
    max_price=2000,
    sqrt_ft=500,
    num_bedroom=1,
    city_name="Vancouver",
)
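
The criteria described above can be sketched on plain dictionaries to make the logic explicit. This is an illustration only, not the package's code: the matches helper, the record keys, and the sample listings are made up, and the case-insensitive city comparison is an assumption.

```python
def matches(listing, min_price, max_price, sqrt_ft, num_bedroom, city_name):
    """Return True when a listing satisfies every criterion (hypothetical helper)."""
    return (
        min_price <= listing["price"] <= max_price
        and listing["square_feet"] >= sqrt_ft      # minimum size
        and listing["bedrooms"] >= num_bedroom     # minimum bedrooms
        and listing["city"].lower() == city_name.lower()
    )


# Made-up sample records, for illustration only.
listings = [
    {"price": 1800, "square_feet": 650, "bedrooms": 1, "city": "Vancouver"},
    {"price": 2400, "square_feet": 900, "bedrooms": 2, "city": "Vancouver"},
    {"price": 1600, "square_feet": 450, "bedrooms": 1, "city": "Burnaby"},
]

selected = [
    row for row in listings
    if matches(row, min_price=1500, max_price=2000, sqrt_ft=500,
               num_bedroom=1, city_name="Vancouver")
]
print(selected)
```

Only the first record passes: the second exceeds the maximum price, and the third is both too small and in a different city.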

Emailing

At this stage, your filtered results are ready to be emailed to your inbox as a .csv file. You will need to input your email address and the filtered DataFrame. You can also set the optional email_subject argument to choose your own email subject. Once the email has been sent, the function will let you know that the "Email has been successfully sent". If there is a problem sending the email, the function will print "The email was not sent. The following SMTP error occurred in the process: <error>". We hope this package has facilitated your house-hunting.

from pyhousehunter import emailer
# Send email (placeholder address; replace with your own)
emailer.send_email(email_recipient="user@example.com", filtered_data=filtered_data, email_subject="Results from Saturday March 13th")
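
For readers curious how a CSV could be attached to an email and how the two status messages above might be produced, here is a minimal standard-library sketch. It is not necessarily how send_email works internally: build_message and send_and_report are hypothetical helpers, the attachment name listings.csv is an assumption, and nothing is actually sent because the transport is passed in as a callable.

```python
import csv
import io
import smtplib
from email.message import EmailMessage


def build_message(rows, recipient, subject="Rental listings"):
    """Build an email with the rows attached as a CSV file (nothing is sent)."""
    if not rows:
        raise ValueError("rows must contain at least one listing")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content("Your filtered listings are attached.")
    message.add_attachment(
        buffer.getvalue().encode("utf-8"),
        maintype="text",
        subtype="csv",
        filename="listings.csv",  # assumed file name
    )
    return message


def send_and_report(message, send):
    """Call send(message) and print the outcome; send is any callable."""
    try:
        send(message)
    except smtplib.SMTPException as error:
        print(
            "The email was not sent. "
            f"The following SMTP error occurred in the process: {error}"
        )
        return False
    print("Email has been successfully sent")
    return True
```

In practice, send could wrap the send_message method of an smtplib.SMTP connection; the server details that would require are not specified in this README.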

Documentation

The official documentation is hosted on Read the Docs: https://pyhousehunter.readthedocs.io/en/latest/

Contributors

The names and GitHub handles of the core development team are listed below.

Name | GitHub Handle
Alex Truong Hai Yen | athy9193
Ela Bandari | elabandari
Elina Lin | elina-linglin
Junting He | juntinghe

We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab.

Credits

This package was created with Cookiecutter and the UBC-MDS/cookiecutter-ubc-mds project template, modified from the pyOpenSci/cookiecutter-pyopensci and audreyr/cookiecutter-pypackage project templates.