Skip to content

philshem/gmaps_popular_times_scraper

Repository files navigation

Scraper of Google Maps "Popular Times" for business entries

Turn this:

screenshot of google maps popular times

into a machine readable dataset. (This is really unofficial. YMMV.)

to get the code

git clone https://github.com/philshem/gmaps_popular_times_scraper.git
cd gmaps_popular_times_scraper

to configure the code

Install required packages (selenium and beautifulsoup4)

pip3 install -r requirements.txt

Modify these lines in the code config.py to point to your path of Chrome and chromedriver.

CHROME_BINARY_LOCATION = '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
CHROMEDRIVER_BINARY_LOCATION = '/usr/local/bin/chromedriver'

Chromedriver downloads are here. Make sure you use the version that matches your Chrome version.

to run the code

Run the scraper by putting a URL as the system argument:

python3 scrape_gm.py "$URL_TO_CSV"

or specifically for a list of URLs stored in a google sheet

python3 scrape_gm.py "https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet={sheet_name}"

The URL should point to any CSV (local or http) that has as the first column a valid google maps url. For example, a valid google maps URL:

https://www.google.com/maps/place/Der+Gr%C3%BCne+Libanon/@47.3809042,8.5325368,17z/data=!3m1!4b1!4m5!3m4!1s0x47900a0e662015b7:0x54fec14b60b7f528!8m2!3d47.3809006!4d8.5347255

Or a shortened one is also valid:

https://goo.gl/maps/r2xowUB3UZX7ZL2u6

Note that the html page source can be saved to the folder html/ by setting the parameter in config.py. If the html files are saved as cache, with a timestamp for when they were retrieved, and can be cleaned out once in a while. Logs are saved to logs/, which makes an archive of the URLs retrieved based on the CSV input file.

results

The output data (sample_output.csv) has this structure (abbreviated):

place,url,scrape_time,day_of_week,hour_of_day,popularity_percent_normal,popularity_percent_current
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,13,38,
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,14,45,
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,15,61,
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,16,79,30
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,17,90,
AnRYn1F8NfSGLexf7,https://goo.gl/maps/AnRYn1F8NfSGLexf7,20200318_163629,Wednesday,18,88,

All technical timestamps, for example 20200318_163629, are in the machine's time. The hour from the column hour_of_day is in the local time of the mapped place.

Data in csv format is saved to data/. You can use the code (csv2sql.py) to convert to a SQLite3 database. Or this awk command to take all individual CSVs for each place and time, and write to one big CSV called all.csv

awk 'FNR==NR||FNR>2' data/*.csv > all.csv

dataviz

And to visualize the data for a week of one Kebab shop in Zürich (note that Friday at 12 is max crowd!)

shawarma popularity

About

Scraper for Google Maps "Popular Times" for place entries

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages