Pokemon PokeAPI Scrapy demo project using spiders and simple yields to save data to .csv files
Demonstrates how to fetch and crawl data from REST APIs using Python. This repository contains spiders that fetch and crawl data from REST API JSON responses and save it to .csv files with headers.
clone this repository
git clone https://github.com/esperancaleonardo/scrapy-pokemon-web-crawler.git
create a Python virtual env inside the repo, alongside the /crawler folder
cd scrapy-pokemon-web-crawler
python3 -m venv .venv
activate the virtual env and install the dependencies from requirements.txt
source .venv/bin/activate
pip install -r requirements.txt
from the root folder of the project, /scrapy-pokemon-web-crawler, first run the list_pokemons spider, using the -o flag to create a .csv output file and save the data
scrapy crawl list_pokemons -L INFO -o pokemons_list.csv
it will create a .csv with the structure (columns) below:
pokemon_id | pokemon_name | pokemon_api_url |
---|---|---|
1 | bulbasaur | https://pokeapi.co/api/v2/pokemon/1/ |
... | ... | ... |
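For reference, here is a minimal sketch of what a list spider like this could look like, assuming PokeAPI's paginated /api/v2/pokemon list endpoint; the class name, page size, and pagination handling are illustrative, not necessarily the repo's actual code:

```python
import scrapy


class ListPokemonsSpider(scrapy.Spider):
    name = 'list_pokemons'
    allowed_domains = ['pokeapi.co']
    # PokeAPI answers with {'count': ..., 'next': ..., 'results': [{'name', 'url'}, ...]}
    start_urls = ['https://pokeapi.co/api/v2/pokemon?limit=100']

    def parse(self, response):
        data = response.json()
        for result in data['results']:
            url = result['url']  # e.g. https://pokeapi.co/api/v2/pokemon/1/
            yield {
                'pokemon_id': url.rstrip('/').split('/')[-1],
                'pokemon_name': result['name'],
                'pokemon_api_url': url,
            }
        # follow the pagination links until 'next' is null
        if data.get('next'):
            yield scrapy.Request(data['next'], callback=self.parse)
```

each yielded dict becomes one row of the -o feed, with the dict keys as the .csv headers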
then, from the same folder, run the pokemon spider to crawl the info of every pokemon listed in the .csv file crawled above
scrapy crawl pokemon -L INFO -o pokemons_info.csv
it will create a .csv with the structure (columns) below:
id | name | order | base_exp | sprite | hp | attack | defense | speed | height | weight |
---|---|---|---|---|---|---|---|---|---|---|
1 | bulbasaur | 1 | 64 | ... | 45 | 49 | 49 | 45 | 7 | 69 |
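Again as a hedged sketch: a detail spider like this could read the urls from pokemons_list.csv and pull the fields above out of each JSON response; the input file name and the field mapping below are assumptions based on the tables in this README, not the repo's actual code:

```python
import csv

import scrapy


class PokemonSpider(scrapy.Spider):
    name = 'pokemon'
    allowed_domains = ['pokeapi.co']

    def start_requests(self):
        # read the urls crawled by the list_pokemons spider
        with open('pokemons_list.csv') as f:
            for row in csv.DictReader(f):
                yield scrapy.Request(row['pokemon_api_url'])

    def parse(self, response):
        data = response.json()
        # PokeAPI lists stats as [{'base_stat': ..., 'stat': {'name': ...}}, ...]
        stats = {s['stat']['name']: s['base_stat'] for s in data['stats']}
        yield {
            'id': data['id'],
            'name': data['name'],
            'order': data['order'],
            'base_exp': data['base_experience'],
            'sprite': data['sprites']['front_default'],
            'hp': stats.get('hp'),
            'attack': stats.get('attack'),
            'defense': stats.get('defense'),
            'speed': stats.get('speed'),
            'height': data['height'],
            'weight': data['weight'],
        }
```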
to generate a new spider, run this inside the repo folder
scrapy genspider <spidername> <domaintocrawl>
it will create a .py file named after the given spider, containing a spider template, inside the /spiders folder
```python
import scrapy


# generated with 'scrapy genspider myspider example.com'
class MyspiderSpider(scrapy.Spider):
    name = 'myspider'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com/']

    def parse(self, response):
        pass
```
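after filling in parse (for JSON APIs, Scrapy's response.json() decodes the body directly, available since Scrapy 2.2), the new spider runs just like the ones above, with -o choosing the output .csv:

scrapy crawl myspider -L INFO -o myspider_output.csv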
Python version: 3.10.5
Libraries: scrapy (see requirements.txt for the pinned versions)