Aim to build a database of famous places
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Things you need to install and how to install them
Python 3.x
Scrapy
Python: Python 3.x. Select the Python distribution for your machine and follow the installation guide there.
Scrapy: Latest version. Follow the instructions there to download and install the latest Scrapy on your local machine.
Summary: My goal for this repo is to scrape some data from a movie website using Scrapy.
Website: link. This is the root link for my test.
1. To get a copy of this repo, open a terminal, navigate to a location of your choice, and run:
git clone https://github.com/otakenz/famous_places_scrape.git
2. To check out the source code for web crawling and scraping, navigate to the spider folder in the terminal. The path will look something like this:
user//xxx/famous_place/famous_place/spider/
3. There you will find two output files in JSON format, namely famous_locations.json and famous_movies_and_location.json.
4. To test the two Python scripts, run the following from the same path:
scrapy crawl filming_locations_spider
scrapy crawl locations_spider
- If you wish to write the output to a JSON file, run:
scrapy crawl filming_locations_spider -o YOUR_FILE_NAME.json
scrapy crawl locations_spider -o YOUR_FILE_NAME.json
Note: Scrapy supports other output formats, such as XML and CSV. The format is inferred from the extension of the file name passed to -o.
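The export commands in step 4 can also be driven and inspected from Python. The sketch below is only an illustration, not code from this repo: the helper names (crawl_command, load_items, group_by) are hypothetical, and the "country" field name in the usage comment is an assumption about what the spiders emit. It relies on the fact that a .json feed written by -o is a single JSON array of items.

```python
import json
from collections import defaultdict
from pathlib import Path


def crawl_command(spider, output_file):
    """Build the `scrapy crawl` command from step 4 as an argument list."""
    # Scrapy infers the feed format from the extension (.json, .csv, .xml).
    return ["scrapy", "crawl", spider, "-o", output_file]


def load_items(path):
    """Load a JSON feed written by `scrapy crawl ... -o file.json`."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def group_by(items, key):
    """Group scraped items (dicts) by one field.

    Items that lack the field are collected under the key None.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item.get(key)].append(item)
    return dict(groups)


# Example usage (needs Scrapy installed; run from the spider folder):
# import subprocess
# subprocess.run(crawl_command("locations_spider", "locations.json"), check=True)
# by_country = group_by(load_items("locations.json"), "country")  # assumed field name
```

Note that -o appends to an existing file, which leaves a .json feed invalid if the file already has content, so it is safest to export to a fresh file name on each run.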
filming_locations_spider is programmed to index all the website links on the target webpage.
locations_spider is programmed to populate the database with country, state, movie title, and filming location.
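To make "indexing all the links on a page" concrete, here is a minimal sketch of the idea using only the Python standard library. It is a stand-in for illustration, not the repo's spider (which uses Scrapy); the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                url = urljoin(self.base_url, value)
                # Keep first-seen order and skip duplicates.
                if url not in self.links:
                    self.links.append(url)


def index_links(html, base_url):
    """Return the de-duplicated absolute links found in `html`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A second stage would then visit each indexed link and extract the fields listed above, which is the role locations_spider plays here.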
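Note: Running step 4 takes some time. To stop a crawl early, press Ctrl+C; Scrapy then shuts down gracefully, and a second Ctrl+C forces an immediate exit. Ctrl+Z only suspends the process in most Unix shells rather than ending it.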
Scrapy documentation: link