Python package for searching for housing on Craigslist.
- Authors: Ela Bandari, Junting He, Ling (Elina) Lin, Alex Truong
Hunting for rentals in Canada can be exhausting and frustrating, but installing our package can make the process easier. This Python package facilitates house hunting by scraping listing information from Craigslist and organizing the extracted data for the user. Instead of manually checking the website for new listings, the user receives an email with the results that match their selection criteria.
Function Name | Input | Output | Description |
---|---|---|---|
scraper | url, online | Pandas DataFrame | Scrape data from rental websites into a Pandas DataFrame |
data_cleaner | Pandas DataFrame | Pandas DataFrame | Clean the extracted data |
data_filter | Pandas DataFrame, min_price, max_price, sqrt_ft, num_bedroom, city_name | Pandas DataFrame | Filter the cleaned data set based on user inputs |
send_email | Pandas DataFrame, email address | csv file | Send the organized listing information to user email |
To the best of our knowledge, no existing Python package simplifies the entire rental search process with such comprehensive functionality. This package takes care of every step: scraping rental websites, processing the data, and emailing users the updated listing information. Plenty of general scraper packages exist in the Python ecosystem, such as https://github.com/narfman0/craigslist-scraper and https://github.com/juliomalegria/python-craigslist, but they lack the focus on house rentals and the emailing functionality.
$ pip install -i https://test.pypi.org/simple/ pyhousehunter
The pyhousehunter package contains the following four functions:
- scraper()
  - The scraper function scrapes all listings available on a given Craigslist housing URL (e.g. https://vancouver.craigslist.org/d/apartments-housing-for-rent/search/apa).
- data_cleaner()
  - The data returned by the scraper is not very tidy, so the data_cleaner function uses Pandas and regular expressions to create a cleaned Pandas DataFrame containing the scraped information.
- data_filter()
  - The data_filter function filters the cleaned Pandas DataFrame based on the user's specifications. Users can specify their price range, minimum size, number of bedrooms, bathrooms, and desired municipality.
- send_email()
  - The send_email function emails the user the houses meeting their specifications in csv format. The user must specify a valid email address and has the option to change the email subject.
- python = ^3.8
- beautifulsoup4 = ^4.9.3
- requests = ^2.25.1
- pandas = ^1.2.3
- regex = ^2020.11.13
- geotext = ^0.4.0
- python-semantic-release = ^7.15.0
- pytest-cov = ^2.11.1
- pytest = ^6.2.2
- codecov = ^2.1.11
- flake8 = ^3.8.4
- Sphinx = ^3.5.2
- sphinxcontrib-napoleon = ^0.7
- nbsphinx = ^0.8.2
- ipykernel = ^5.5.0
The first function in our package is scraper(). Here you input the URL of the main housing and apartment rentals page of Craigslist BC and set the argument online = True to scrape directly from the internet. When online = False, the scraper function scrapes from a local HTML file instead, which may be handy if the Craigslist website is down or for internal development and testing. Please note that you cannot input the URL of an individual listing.
from pyhousehunter import scraper
# Craigslist rental page url
url = "https://vancouver.craigslist.org/d/apartments-housing-for-rent/search/apa"
# Scrape Craigslist
scraped_data = scraper.scraper(url, online = True)
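Because the scraper expects a search page rather than an individual listing, it can help to check a URL before passing it in. The snippet below is a minimal sketch of such a check and is not part of pyhousehunter: the helper name looks_like_search_page is hypothetical, the listing URL is made up for illustration, and the rule itself (search pages contain a "/search/" path segment, individual listings end in ".html") is an assumption about how Craigslist URLs are usually shaped.

```python
from urllib.parse import urlparse


def looks_like_search_page(url: str) -> bool:
    """Return True when the URL appears to be a Craigslist search page.

    Assumption: search pages contain a "/search/" path segment, while
    individual listings end in ".html".
    """
    parsed = urlparse(url)
    if not parsed.netloc.endswith("craigslist.org"):
        return False
    path = parsed.path
    return "/search/" in path and not path.endswith(".html")


search_url = "https://vancouver.craigslist.org/d/apartments-housing-for-rent/search/apa"
# Made-up listing URL, used only to illustrate the check
listing_url = "https://vancouver.craigslist.org/van/apa/d/example-listing/1234567890.html"

print(looks_like_search_page(search_url))
print(looks_like_search_page(listing_url))
```

A check like this only inspects the shape of the URL; it does not confirm that the page exists or that it can be scraped.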
Our data_cleaner() function is a straightforward and powerful tool. It turns the raw Pandas DataFrame generated by the scraper() function into a clean and tidy DataFrame object. It has a single input, which is the output of the scraper() function.
from pyhousehunter import cleaner
# Clean the scraped data
cleaned_data = cleaner.data_cleaner(scraped_data)
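To give a feel for the kind of regular-expression cleaning this step involves, here is a small standalone sketch. It is not the package's implementation: the helper names parse_price and parse_housing are hypothetical, and the raw formats ("$1,850" for price, "2br - 750ft2 -" for housing details) are assumptions about what scraped Craigslist text may look like.

```python
import re


def parse_price(raw: str):
    """Turn a price string such as "$1,850" into an int, or None if absent."""
    match = re.search(r"\$\s*(\d[\d,]*)", raw)
    return int(match.group(1).replace(",", "")) if match else None


def parse_housing(raw: str):
    """Extract (bedrooms, square feet) from text such as "2br - 750ft2 -".

    Either value is None when it does not appear in the text.
    """
    beds = re.search(r"(\d+)\s*br", raw, flags=re.IGNORECASE)
    area = re.search(r"(\d+)\s*ft", raw, flags=re.IGNORECASE)
    return (
        int(beds.group(1)) if beds else None,
        int(area.group(1)) if area else None,
    )


print(parse_price("$1,850"))
print(parse_housing("2br - 750ft2 -"))
print(parse_housing("1br -"))
```

Returning None for missing values keeps incomplete listings visible, so a later step can decide whether to drop them.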
The data_filter() function allows you to filter the cleaned data to find the rentals meeting your specifications. Its inputs are the Pandas DataFrame generated by data_cleaner(), numeric values for the minimum price, maximum price, minimum square feet, and minimum number of bedrooms, and a string with the city name of the desired rentals. It outputs a Pandas DataFrame object with the matching results.
from pyhousehunter import filter
# Filter data based on preferences
filtered_data = filter.data_filter(df = cleaned_data, min_price = 1500, max_price = 2000, sqrt_ft = 500, num_bedroom = 1, city_name = "Vancouver")
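The filtering criteria can be illustrated without the package. The sketch below uses a plain list of dictionaries as a stand-in for the DataFrame; the function filter_listings and the column names (price, sqrt_ft, num_bedroom, city) are hypothetical, and the case-insensitive city match and the dropping of rows with missing values are assumptions rather than documented behaviour of data_filter.

```python
def filter_listings(rows, min_price, max_price, sqrt_ft, num_bedroom, city_name):
    """Keep rows that satisfy every criterion; rows with missing values are dropped."""
    wanted_city = city_name.strip().lower()
    kept = []
    for row in rows:
        values = (row["price"], row["sqrt_ft"], row["num_bedroom"], row["city"])
        if any(value is None for value in values):
            continue  # assumption: incomplete listings cannot be matched
        if (
            min_price <= row["price"] <= max_price
            and row["sqrt_ft"] >= sqrt_ft
            and row["num_bedroom"] >= num_bedroom
            and row["city"].strip().lower() == wanted_city
        ):
            kept.append(row)
    return kept


# Made-up sample listings
rows = [
    {"price": 1800, "sqrt_ft": 650, "num_bedroom": 1, "city": "Vancouver"},
    {"price": 2400, "sqrt_ft": 800, "num_bedroom": 2, "city": "Vancouver"},  # too expensive
    {"price": 1600, "sqrt_ft": 450, "num_bedroom": 1, "city": "Vancouver"},  # too small
    {"price": 1700, "sqrt_ft": 700, "num_bedroom": 2, "city": "Burnaby"},    # wrong city
    {"price": 1900, "sqrt_ft": None, "num_bedroom": 1, "city": "Vancouver"}, # missing size
]

matches = filter_listings(rows, min_price=1500, max_price=2000,
                          sqrt_ft=500, num_bedroom=1, city_name="Vancouver")
print(matches)
```

With these made-up rows, only the first listing satisfies all five criteria.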
At this stage, your filtered results are ready to be emailed to your inbox as a .csv file. You will need to input your email address and the filtered DataFrame. You can use the optional email_subject argument to set your own email subject. Once the email has been sent, the function will let you know that the "Email has been successfully sent". If there was a problem sending the email, the function will print "The email was not sent. The following SMTP error occurred in the process: <error>". We hope this package has facilitated your house-hunting.
from pyhousehunter import emailer
# Send email
emailer.send_email(email_recipient = "[email protected]", filtered_data = filtered_data, email_subject = "Results from Saturday March 13th")
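For readers curious how a csv attachment can be put into an email, the following sketch builds such a message with the Python standard library only. It is not how send_email is implemented: build_listing_email is a hypothetical helper, the recipient address and the listings.csv filename are placeholders, and the message is only constructed here, not sent.

```python
import csv
import io
from email.message import EmailMessage


def build_listing_email(rows, email_recipient, email_subject="Your rental listings"):
    """Build an email that carries the listings as a csv attachment."""
    if not rows:
        raise ValueError("rows must contain at least one listing")

    # Write the listings to an in-memory csv document
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

    message = EmailMessage()
    message["To"] = email_recipient
    message["Subject"] = email_subject
    message.set_content("Your filtered listings are attached as a csv file.")
    # Attaching a str with subtype="csv" produces a text/csv part
    message.add_attachment(buffer.getvalue(), subtype="csv", filename="listings.csv")
    return message


# Made-up listing and placeholder recipient
rows = [{"price": 1800, "sqrt_ft": 650, "num_bedroom": 1, "city": "Vancouver"}]
message = build_listing_email(rows, "renter@example.com", "Results from Saturday March 13th")
attachment = next(message.iter_attachments())
print(attachment.get_filename(), attachment.get_content_type())
```

Delivering a message like this would typically go through smtplib.SMTP and its send_message method, with smtplib.SMTPException caught to report failures; the server host, port, and credentials depend on your mail provider and are not specified here.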
The official documentation is hosted on Read the Docs: https://pyhousehunter.readthedocs.io/en/latest/
The names and GitHub handles of the core development team are listed below.
Name | GitHub Handle |
---|---|
Alex Truong Hai Yen | athy9193 |
Ela Bandari | elabandari |
Elina Lin | elina-linglin |
Junting He | juntinghe |
We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab.
This package was created with Cookiecutter and the UBC-MDS/cookiecutter-ubc-mds project template, modified from the pyOpenSci/cookiecutter-pyopensci project template and audreyr/cookiecutter-pypackage.