What you need to know:
- These files are not immediately usable. You have to replace the URLs with the ones you want to use, and the scraping file in particular will only work with Eredivisie links unless you modify it!
- The text files generated by the scraping file cannot be read into the cleaning file directly. You will have to convert them to csv first with https://konklone.io/json/ or adapt pandas' native csv reading functions to fit the data source.
- There's tons of bad-practice code in here! I know! I iterate through a dataframe instead of vectorizing, I index an iterable object unnecessarily, I import libraries more than once. I'll get around to fixing that ASAP, but I have a full-time job and I wanted to get this out there as quickly as possible so other people could play with this stuff.
- None of the data has been used for material gain, just tinkering. If anyone wants the files taken down, I will take them down.
- The PV file that you'll need comes from here: https://github.com/Friends-of-Tracking-Data-FoTD/LaurieOnTracking/blob/master/EPV_grid.csv where Laurie Shaw has done some work that I can only dream of! Thank you Laurie!
- You'll have to put the minutes file together yourself. It needs the player names, ids, and minutes played for each player in your sample, saved as a csv to your desktop. Otherwise per 90 calculations won't be possible.
- All the PV data is ultimately aggregated, for me at least, in Tableau. So without aggregation this dataset could get wonky! You've been forewarned!
- Again, what you find in this repo is a work in progress. Cheers.
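
For anyone putting the minutes file together, here is a rough sketch of how it can feed a per 90 calculation. The column names (`player_id`, `player_name`, `minutes`, `pv_added`) and the toy numbers are assumptions for illustration, not necessarily what the cleaning file expects, so rename them to match your own data.

```python
import pandas as pd

# Hypothetical minutes file contents. In practice you would load your own csv,
# e.g. pd.read_csv("minutes.csv"); the column names here are assumptions.
minutes = pd.DataFrame({
    "player_id": [1, 2],
    "player_name": ["Player A", "Player B"],
    "minutes": [900, 450],
})

# Hypothetical event-level PV values, one row per event.
events = pd.DataFrame({
    "player_id": [1, 1, 2],
    "pv_added": [0.25, 0.25, 0.5],
})

# Sum PV per player, attach the minutes, then scale to a per 90 rate.
totals = events.groupby("player_id", as_index=False)["pv_added"].sum()
per90 = totals.merge(minutes, on="player_id", how="left")
per90["pv_per_90"] = per90["pv_added"] / per90["minutes"] * 90

print(per90[["player_name", "pv_added", "minutes", "pv_per_90"]])
```

Joining on the player id rather than the name avoids trouble with players whose names are spelled differently across sources.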
Housekeeping:
- Datacrape for PV is for retrieving event data that will later be used to attribute PV to events, players, and teams
- Cleaning xy data is for getting the event data cleaned up and applying PV values to said events, as I saw fit
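
To give an idea of what applying PV to events can look like, here is a minimal vectorized sketch. It is not necessarily how the cleaning file does it. The function `pv_at`, the 0 to 100 coordinate scale, and the toy 2x4 grid are all assumptions. With the real grid you would load the csv into a 2D array first (for example with `np.loadtxt("EPV_grid.csv", delimiter=",")`) and make sure every event is oriented toward the same goal.

```python
import numpy as np

def pv_at(grid, x, y, x_max=100.0, y_max=100.0):
    """Return the grid value of the cell containing each (x, y) location.

    Hypothetical helper: assumes x runs along the pitch length (grid columns),
    y across the pitch width (grid rows), both scaled from 0 to x_max / y_max.
    """
    n_rows, n_cols = grid.shape
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Scale coordinates to cell indices; clip so x == x_max lands in the last cell.
    col = np.clip((x / x_max * n_cols).astype(int), 0, n_cols - 1)
    row = np.clip((y / y_max * n_rows).astype(int), 0, n_rows - 1)
    return grid[row, col]

# Toy 2x4 grid standing in for the real PV file.
grid = np.array([
    [0.01, 0.02, 0.05, 0.20],
    [0.01, 0.03, 0.06, 0.30],
])

# Two hypothetical events with start and end locations.
start_x, start_y = [10, 60], [20, 80]
end_x, end_y = [60, 100], [20, 80]

# PV added by each event is the end value minus the start value.
pv_added = pv_at(grid, end_x, end_y) - pv_at(grid, start_x, start_y)
print(pv_added)
```

Because the lookup works on whole arrays, it can be applied to entire dataframe columns at once instead of iterating row by row.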