This repo contains my solutions to the Modern Information Retrieval course projects.
The description of each phase of the project can be found in the corresponding MIR_PHASE# directory.
Spiders for crawling ResearchGate and Semantic Scholar are located in `paper_crawler/paper_crawler/spiders`.
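
For a rough sense of what such a spider looks like, here is a minimal sketch; the class name, spider name, start URL, and CSS selectors are hypothetical placeholders, not the selectors actually used by the project's ResearchGate/Semantic Scholar spiders.

```python
import scrapy


class PaperSpider(scrapy.Spider):
    """Hypothetical sketch only; the real spiders live in paper_crawler/paper_crawler/spiders."""

    name = "paper_example"                              # placeholder name
    start_urls = ["https://www.semanticscholar.org/"]   # placeholder start page

    def parse(self, response):
        # Follow links to individual paper pages (placeholder selector), passing
        # extra data to the callback via cb_kwargs (see the request/response docs
        # linked in the references below).
        for href in response.css("a.paper-link::attr(href)").getall():
            yield response.follow(
                href, callback=self.parse_paper, cb_kwargs={"source": "semanticscholar"}
            )

    def parse_paper(self, response, source):
        # Extract a minimal item from a paper page (placeholder selectors).
        yield {
            "source": source,
            "title": response.css("h1::text").get(),
            "abstract": response.css('meta[name="description"]::attr(content)').get(),
        }
```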
To set up the environment, install the dependencies:

- `pip install -r requirements.txt`
- `conda install -c numba icc_rt` (Intel SVML runtime used by Numba)
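
After installing the dependencies, a crawl can be started from inside the paper_crawler project directory (next to scrapy.cfg). The sketch below uses Scrapy's CrawlerProcess API; the spider name passed to crawl() is a hypothetical placeholder, so check the modules in paper_crawler/paper_crawler/spiders for the actual names.

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Run this from inside the paper_crawler project so that
# get_project_settings() picks up the project configuration.
process = CrawlerProcess(get_project_settings())

# "semanticscholar" is a placeholder; use a spider name defined in
# paper_crawler/paper_crawler/spiders.
process.crawl("semanticscholar")
process.start()  # blocks until the crawl finishes
```

Equivalently, `scrapy crawl <spider-name>` can be run from the same directory.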
Useful Scrapy references:

- https://www.digitalocean.com/community/tutorials/how-to-crawl-a-web-page-with-scrapy-and-python-3
- https://blog.scrapinghub.com/building-spiders-made-easy-gui-for-your-scrapy-shell
- https://github.com/further-reading/scrapy-gui
- https://www.pythongasm.com/introduction-to-scrapy/
- https://docs.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-callback-arguments