[FBref] Scraper returns old season data #328

Closed
mhd0528 opened this issue Aug 16, 2023 · 2 comments
Labels
FBref Issue or pull request related to the FBref scraper

Comments

@mhd0528
Contributor

mhd0528 commented Aug 16, 2023

Hi,
I tried the following commands to get data for the current season (2023-24), but they return data from the 1923-24 season instead.

import soccerdata as sd
fbref = sd.FBref(leagues='ENG-Premier League', seasons='2023-24')
fbref.read_schedule()
                                                                   week  day  ... notes   game_id
league             season game                                                ...
ENG-Premier League 2324   1923-08-25 Arsenal-Newcastle Utd            1  Sat  ...  <NA>  9b8e5a81
                          1923-08-25 Birmingham-Aston Villa           1  Sat  ...  <NA>  1a908e21
                          1923-08-25 Blackburn-Chelsea                1  Sat  ...  <NA>  feb28dde
                          1923-08-25 Cardiff City-Bolton              1  Sat  ...  <NA>  6cfbbf26
                          1923-08-25 Everton-Nott'ham Forest          1  Sat  ...  <NA>  8931066c
                          ...                                                                 ...  ...  ...   ...       ...
                          1924-05-03 Huddersfield-Nott'ham Forest    42  Sat  ...  <NA>  0964ff7d
                          1924-05-03 Manchester City-West Ham        42  Sat  ...  <NA>  a0a50c50
                          1924-05-03 Notts County-Liverpool          42  Sat  ...  <NA>  dafa1982
                          1924-05-03 Tottenham-Burnley               42  Sat  ...  <NA>  47a88b95
                          1924-05-03 West Brom-Sheffield Utd         42  Sat  ...  <NA>  7fc5fb2b

I also tried removing or disabling the cache before calling read_schedule(), but it still doesn't find the new data.
I think there might be something wrong with the season parsing/reading part.
Maybe modifying the following line would fix it:

season_ends = date(datetime.strptime(season[-2:], "%y").year, 7, 1)

Thanks in advance!
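
For readers following along, the quoted line can be tried in isolation. The sketch below wraps it in a helper named season_end_from_code (a hypothetical name, not part of soccerdata) and assumes only the Python standard library. Python's %y directive maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999, so a season ending in "24" resolves to 2024 here, which suggests this line alone may not explain the 1923-24 rows.

```python
from datetime import date, datetime


def season_end_from_code(season: str) -> date:
    """Hypothetical wrapper around the line quoted in the issue.

    The last two characters of the season string are read as a
    two-digit year, and the season is taken to end on 1 July.
    """
    return date(datetime.strptime(season[-2:], "%y").year, 7, 1)


# '24' is parsed as 2024 (not 1924) by the %y directive.
print(season_end_from_code("2023-24"))  # 2024-07-01
print(season_end_from_code("2324"))     # 2024-07-01
# Two-digit years from 69 upwards fall in the 1900s.
print(season_end_from_code("9899"))     # 1999-07-01
```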

@probberechts
Owner

This is probably related to #97 and can be solved by invalidating the cache:

import soccerdata as sd
fbref = sd.FBref(leagues='ENG-Premier League', seasons='2023-24', no_cache=True)
fbref.read_schedule()
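
After re-running with no_cache=True, a quick sanity check on the returned index can confirm that the schedule belongs to the requested season. The following is a minimal sketch, not part of soccerdata: games_outside_season is a hypothetical helper, it assumes each game label starts with an ISO date as in the output above, and it assumes a season runs from 1 July of the start year to 30 June of the next. The second sample label is made up for illustration.

```python
from datetime import date


def games_outside_season(game_labels, season):
    """Return the labels whose leading date falls outside `season`.

    `season` is a string such as '2023-24'. The season window
    (1 July to 30 June) is an assumption made for this sketch.
    """
    start_year = int(season[:4])
    season_start = date(start_year, 7, 1)
    season_end = date(start_year + 1, 7, 1)
    stale = []
    for label in game_labels:
        # The first 10 characters are expected to be 'YYYY-MM-DD'.
        played = date.fromisoformat(label[:10])
        if not (season_start <= played < season_end):
            stale.append(label)
    return stale


labels = [
    "1923-08-25 Arsenal-Newcastle Utd",  # taken from the output above
    "2023-08-12 Home Team-Away Team",    # made-up label for illustration
]
print(games_outside_season(labels, "2023-24"))
# ['1923-08-25 Arsenal-Newcastle Utd']
```

With a real schedule, the labels could be taken from the "game" index level shown in the output above, for example via fbref.read_schedule().index.get_level_values("game"). An empty result would indicate that every game falls inside the requested season.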

@probberechts probberechts added the FBref Issue or pull request related to the FBref scraper label Aug 21, 2023
@probberechts probberechts changed the title FBref returns old season data [FBref] Scraper returns old season data Aug 21, 2023
@mhd0528
Contributor Author

mhd0528 commented Aug 25, 2023

Hi,
Yes, that totally solves the issue! Thanks so much for helping!
