Create spider for player transfers #89

Open
BlckLvls wants to merge 1 commit into main
BlckLvls
Add PlayerTransfersSpider for scraping transfer data

Description

This pull request introduces a new spider, PlayerTransfersSpider, which scrapes transfer history data for football players from transfermarkt.co.uk. The spider uses the Scrapy framework and follows the structure of our BaseSpider class.

Features

  • Parses player profile pages to extract the transfer history API URL
  • Makes API calls to retrieve detailed transfer data
  • Extracts key information including transfer dates, clubs involved, market values, and transfer fees

Implementation Details

  • Uses JSON parsing to handle API responses
  • Implements two main parsing methods: parse and parse_transfers
  • Includes Scrapy contracts for testing and documentation purposes
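
As a rough illustration of the JSON-handling step described above, here is a minimal sketch of what a parse_transfers-style helper might do. The payload shape and every key name ("transfers", "from", "to", "clubName", "marketValue", "fee") are assumptions made for illustration, not the actual API schema; in a Scrapy callback the `body` argument would typically be `response.text`.

```python
import json


def parse_transfers(body):
    """Turn a JSON response body into flat transfer records.

    All key names below are hypothetical; adjust them to the real payload.
    """
    data = json.loads(body)
    for transfer in data.get("transfers", []):
        yield {
            "date": transfer.get("date"),
            # `or {}` guards against a missing or null club object
            "from_club": (transfer.get("from") or {}).get("clubName"),
            "to_club": (transfer.get("to") or {}).get("clubName"),
            "market_value": transfer.get("marketValue"),
            "fee": transfer.get("fee"),
        }


# Made-up sample payload, only to show the expected input shape.
sample = json.dumps({
    "transfers": [
        {
            "date": "2020-01-01",
            "from": {"clubName": "Club A"},
            "to": {"clubName": "Club B"},
            "marketValue": "1.0m",
            "fee": "500k",
        }
    ]
})
records = list(parse_transfers(sample))
```

Yielding plain dicts keeps the helper usable both from a Scrapy callback and from a standalone script.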

Please review and provide feedback, especially regarding the structure and any potential optimizations.

@dcaribou
Owner

Hi @BlckLvls,

First of all, thanks a lot for your PR. Transfer history would be a really nice addition to the project, and what you've done here is a smart use of Scrapy to fetch it. Here's my feedback.

Scrapy's main purpose is to scrape data from a website's HTML, where you use CSS or XPath queries to extract parts of the page, follow the links you find there, and so on. See, for example, the games spider.

For extracting data from a REST API, I feel Scrapy is a bit of overkill, and you're probably better off simply using a library like requests. I actually did something very similar for the marketValue endpoint (you can have a look at the code here), where I just used requests to query the endpoint and saved the results in a file. You are very welcome to contribute this functionality there.
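
To make the suggestion concrete, the requests-based approach could look roughly like the sketch below. This is not the existing marketValue code: the URL template is a placeholder, and the function names and output layout (one JSON object per line) are assumptions. The `session` argument is expected to behave like a `requests.Session`, i.e. `get()` returns a response with `raise_for_status()` and `json()`.

```python
import json
from pathlib import Path

# Hypothetical placeholder; the real endpoint is whatever URL the spider
# currently extracts from the player profile page.
URL_TEMPLATE = "https://example.com/api/transferHistory/{player_id}"


def fetch_transfers(session, player_id, url_template=URL_TEMPLATE):
    """Query the endpoint for one player and return the decoded JSON."""
    url = url_template.format(player_id=player_id)
    response = session.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()


def save_transfers(session, player_ids, out_path):
    """Write one JSON object per line, one line per player."""
    out_path = Path(out_path)
    with out_path.open("w", encoding="utf-8") as fh:
        for player_id in player_ids:
            payload = fetch_transfers(session, player_id)
            record = {"player_id": player_id, "response": payload}
            fh.write(json.dumps(record) + "\n")
    return out_path
```

With requests installed, this would be called as `save_transfers(requests.Session(), player_ids, "transfers.json")`. Passing the session in also makes the functions easy to exercise with a stub, without touching the network.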
