This crawler scrapes all houses sold in the past 3 months in the U.S. and saves them into "gp/gp/houses1.csv" and "gp/gp/houses2.csv", which are easy to import into either MongoDB or MySQL. Temporary downloaded files can be found in "gp/gp/spiders/download". It scrapes the following fields:
- url to the house
- sold date
- price
- baths
- beds
- yearbuilt
- sqft
- lotsize
- type
- daysonmarket
- state
- county
- city
- address
- latitude
- longitude
- zipcode

However, the website may be missing some of these fields; in that case the field is left blank.
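As one possible way to move the scraped CSV into a database, the sketch below reads rows into dictionaries keyed by the header names; the sample data and the pymongo collection names are illustrative assumptions, not part of this project:

```python
import csv
import io

# Illustrative sample row; real data would come from gp/gp/houses1.csv.
sample = io.StringIO(
    "url,sold date,price,beds\n"
    "https://www.redfin.com/example,2019-05-01,350000,3\n"
)

# csv.DictReader turns each row into a dict, which is the shape
# MongoDB documents (or parameterized SQL inserts) expect.
rows = list(csv.DictReader(sample))

# With pymongo installed and a local MongoDB running, the rows could then
# be inserted directly (hypothetical names, shown for illustration):
#   from pymongo import MongoClient
#   MongoClient()["redfin"]["houses"].insert_many(rows)

print(rows[0]["price"])  # -> 350000
```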
Before running "main.py", please install the following:
- scrapy
- selenium
- the newest version of Firefox

(csv, re, and time are part of the Python standard library and need no separate installation.)
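The third-party dependencies can be installed from PyPI; note that Selenium also needs geckodriver on the PATH to drive Firefox (the brew command below is just one example for macOS, an assumption about your platform):

```shell
# Install the two third-party Python packages (csv, re, and time ship
# with Python itself, so only these two need installing).
pip install scrapy selenium

# Selenium drives Firefox through geckodriver, which must be on PATH.
# Example for macOS with Homebrew (adapt to your platform):
#   brew install geckodriver
```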
This crawler combines Scrapy and Selenium. RedfinSpider yields requests asynchronously and processes responses produced by the downloader middleware. That middleware uses selenium.webdriver to generate each response; this lets us bypass redfin.com's crawler-blocking mechanism and get the HTML we want. The webdriver is instantiated in the spider and functions as the downloader, in the form of a Firefox browser.
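The request flow above can be sketched as follows. This is a simplified illustration, not this project's actual middleware: to keep the snippet self-contained and runnable, `DummyDriver` stands in for the Firefox webdriver the spider holds, and the middleware returns raw HTML instead of a `scrapy.http.HtmlResponse`.

```python
class DummyDriver:
    """Stand-in for selenium.webdriver.Firefox (illustration only)."""
    def get(self, url):
        # A real driver would load the page in the browser; here we
        # fake the browser-rendered result.
        self.page_source = "<html><body>rendered by browser</body></html>"


class SeleniumMiddleware:
    """Downloader-middleware pattern: fetch via the spider's webdriver.

    In a real Scrapy project, process_request(request, spider) would call
    spider.driver.get(request.url) and return a scrapy.http.HtmlResponse,
    short-circuiting Scrapy's default downloader.
    """
    def process_request(self, url, driver):
        driver.get(url)           # the browser fetches and renders the page
        return driver.page_source # rendered HTML handed back to the spider


html = SeleniumMiddleware().process_request(
    "https://www.redfin.com/example", DummyDriver()
)
print("rendered by browser" in html)  # -> True
```

Because the response body comes from a full browser, pages that block plain HTTP clients still render normally, which is the point of routing Redfin requests through Selenium.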