feat: add game lineups crawler #72

Merged
merged 8 commits into from
Sep 28, 2023
Conversation

LarchLiu
Contributor

@LarchLiu LarchLiu commented Sep 28, 2023

Fix #59

The lineups are extracted from the lineups page.

Why extract from the lineups page?

[Image: WX20230928-122451]

In some cases, lineup information is presented as shown in the image.

  1. A Club has no data regarding Substitutes and there is no data about players' positions.
  2. B Club has no data regarding the positions of the starting lineup's players.

The structure of the data:
(If a club has no formation data, as with A Club, the players' position information is used to build the formation data.)

lineups: {
    href: '/spielbericht/aufstellung/spielbericht/3956749',
    home_club: {
      formation: '4-2-3-1',
      starting_lineup: [
        { number: '1', href: '/sergio-padt/profil/spieler/79573', name: 'Sergio Padt', team_captain: 0, position: 'Goalkeeper' },
        { number: '22', href: '/franco-russo/profil/spieler/378463', name: 'Franco Russo', team_captain: 0, position: 'Centre-Back' },
        { number: '32', href: '/igor-plastun/profil/spieler/97335', name: 'Igor Plastun', team_captain: 0, position: 'Centre-Back' },
        { number: '14', href: '/denny-gropper/profil/spieler/624717', name: 'Denny Gropper', team_captain: 0, position: 'Left-Back' },
        { number: '16', href: '/aslak-witry/profil/spieler/313944', name: 'Aslak Witry', team_captain: 0, position: 'Right-Back' },
        { number: '23', href: '/show/profil/spieler/568216', name: 'Show', team_captain: 0, position: 'Defensive Midfield' },
        { number: '30', href: '/pedro-naressi/profil/spieler/646679', name: 'Pedro Naressi', team_captain: 0, position: 'Defensive Midfield' },
        { number: '20', href: '/nonato/profil/spieler/546172', name: 'Nonato', team_captain: 0, position: 'Attacking Midfield' },
        { number: '37', href: '/bernard-tekpetey/profil/spieler/422380', name: 'Bernard Tekpetey', team_captain: 0, position: 'Left Winger' },
        { number: '11', href: '/kiril-despodov/profil/spieler/221540', name: 'Kiril Despodov', team_captain: 1, position: 'Right Winger' },
        { number: '9', href: '/igor-thiago/profil/spieler/739443', name: 'Igor Thiago', team_captain: 0, position: 'Centre-Forward' },
      ],
      substitutes: [
        { number: '12', href: '/simon-sluga/profil/spieler/188201', name: 'Simon Sluga', team_captain: 0, position: 'Goalkeeper' },
        { number: '67', href: '/damyan-hristov/profil/spieler/817632', name: 'Damyan Hristov', team_captain: 0, position: 'Goalkeeper' },
        { number: '5', href: '/georgi-terziev/profil/spieler/95790', name: 'Georgi Terziev', team_captain: 0, position: 'Centre-Back' },
        { number: '2', href: '/pipa/profil/spieler/423166', name: 'Pipa', team_captain: 0, position: 'Right-Back' },
        { number: '8', href: '/claude-goncalves/profil/spieler/280178', name: 'Claude Gon\u00E7alves', team_captain: 0, position: 'Defensive Midfield' },
        { number: '64', href: '/dominik-yankov/profil/spieler/541287', name: 'Dominik Yankov', team_captain: 0, position: 'Attacking Midfield' },
        { number: '17', href: '/jorginho/profil/spieler/277111', name: 'Jorginho', team_captain: 0, position: 'Left Winger' },
        { number: '90', href: '/spas-delev/profil/spieler/124869', name: 'Spas Delev', team_captain: 0, position: 'Left Winger' },
        { number: '10', href: '/matias-tissera/profil/spieler/503001', name: 'Mat\u00EDas Tissera', team_captain: 0, position: 'Centre-Forward' },
      ],
    },
    away_club: {
      formation: '4-3-3 Attacking',
      starting_lineup: [
        { number: '16', href: '/bart-verbruggen/profil/spieler/565093', name: 'Bart Verbruggen', team_captain: 0, position: 'Goalkeeper' },
        { number: '14', href: '/jan-vertonghen/profil/spieler/43250', name: 'Jan Vertonghen', team_captain: 1, position: 'Centre-Back' },
        { number: '56', href: '/zeno-debast/profil/spieler/548193', name: 'Zeno Debast', team_captain: 0, position: 'Centre-Back' },
        { number: '54', href: '/killian-sardella/profil/spieler/454336', name: 'Killian Sardella', team_captain: 0, position: 'Left-Back' },
        { number: '62', href: '/amir-murillo/profil/spieler/354482', name: 'Amir Murillo', team_captain: 0, position: 'Right-Back' },
        { number: '21', href: '/amadou-diawara/profil/spieler/355501', name: 'Amadou Diawara', team_captain: 0, position: 'Defensive Midfield' },
        { number: '61', href: '/kristian-arnstad/profil/spieler/581106', name: 'Kristian Arnstad', team_captain: 0, position: 'Central Midfield' },
        { number: '10', href: '/yari-verschaeren/profil/spieler/502302', name: 'Yari Verschaeren', team_captain: 0, position: 'Central Midfield' },
        { number: '29', href: '/mario-stroeykens/profil/spieler/588866', name: 'Mario Stroeykens', team_captain: 0, position: 'Left Winger' },
        { number: '36', href: '/anders-dreyer/profil/spieler/342389', name: 'Anders Dreyer', team_captain: 0, position: 'Right Winger' },
        { number: '13', href: '/islam-slimani/profil/spieler/174915', name: 'Islam Slimani', team_captain: 0, position: 'Centre-Forward' },
      ],
      substitutes: [
        { number: '30', href: '/hendrik-van-crombrugge/profil/spieler/137326', name: 'Hendrik Van Crombrugge', team_captain: 0, position: 'Goalkeeper' },
        { number: '26', href: '/colin-coosemans/profil/spieler/154351', name: 'Colin Coosemans', team_captain: 0, position: 'Goalkeeper' },
        { number: '5', href: '/moussa-ndiaye/profil/spieler/649022', name: 'Moussa N\'Diaye', team_captain: 0, position: 'Left-Back' },
        { number: '27', href: '/noah-sadiki/profil/spieler/727089', name: 'Noah Sadiki', team_captain: 0, position: 'Right-Back' },
        { number: '25', href: '/adrien-trebel/profil/spieler/159622', name: 'Adrien Tr\u00E9bel', team_captain: 0, position: 'Central Midfield' },
        { number: '71', href: '/theo-leoni/profil/spieler/418071', name: 'Th\u00E9o Leoni', team_captain: 0, position: 'Central Midfield' },
        { number: '18', href: '/majeed-ashimeru/profil/spieler/360140', name: 'Majeed Ashimeru', team_captain: 0, position: 'Central Midfield' },
        { number: '24', href: '/ishaq-abdulrazak/profil/spieler/775535', name: 'Ishaq Abdulrazak', team_captain: 0, position: 'Central Midfield' },
        { number: '23', href: '/henrik-bellman/profil/spieler/367980', name: 'Henrik Bellman', team_captain: 0, position: 'Right Midfield' },
        { number: '11', href: '/lior-refaelov/profil/spieler/24484', name: 'Lior Refaelov', team_captain: 0, position: 'Attacking Midfield' },
        { number: '32', href: '/nilson-angulo/profil/spieler/903611', name: 'Nilson Angulo', team_captain: 0, position: 'Attacking Midfield' },
        { number: '9', href: '/benito-raman/profil/spieler/112930', name: 'Benito Raman', team_captain: 0, position: 'Centre-Forward' },
      ],
    },
  },
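
A minimal sketch of the fallback mentioned above, where the formation is built from the players' positions. This is not the code from this PR: the `derive_formation` helper and the `POSITION_LINE` grouping are hypothetical, and the grouping only covers the position labels that appear in the sample data. It counts outfield players per line, so it yields a coarse three-line shape such as `4-3-3` rather than a label like `4-2-3-1`.

```python
from collections import Counter

# Hypothetical grouping of position labels into formation lines.
# Only labels seen in the sample data above are listed; goalkeepers
# and unknown labels are deliberately left out of the count.
POSITION_LINE = {
    "Centre-Back": "defence",
    "Left-Back": "defence",
    "Right-Back": "defence",
    "Defensive Midfield": "midfield",
    "Central Midfield": "midfield",
    "Right Midfield": "midfield",
    "Attacking Midfield": "midfield",
    "Left Winger": "attack",
    "Right Winger": "attack",
    "Centre-Forward": "attack",
}


def derive_formation(starting_lineup):
    """Return a coarse formation string, or None if no positions are known."""
    counts = Counter(
        POSITION_LINE.get(player.get("position")) for player in starting_lineup
    )
    lines = [counts[line] for line in ("defence", "midfield", "attack")]
    if not any(lines):
        # No usable position data at all (e.g. a club without positions).
        return None
    return "-".join(str(n) for n in lines if n)
```

For a club whose starting lineup carries no `position` values at all, the helper returns `None`, so the caller can leave `formation` empty instead of guessing.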

@dcaribou
Owner

Hey @LarchLiu, I've one suggestion for you, let me know what you think.

Instead of enhancing the games crawler, what do you think about creating a new one for the lineups (let's say we call it game_lineups)?

The game_lineups crawler can take the output of the games crawler as its input, and extract the lineups in the exact same way as you already implemented here. So, you would run it in two steps

  1. Run the games crawler to get the list of games in (for example) a file → scrapy crawl games -a parents=samples/competitions.json > games.json
  2. Run the game_lineups crawler to get the lineups from those games → scrapy crawl game_lineups -a parents=samples/games.json > game_lineups.json
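
As a rough illustration of step 2, here is how a child crawler could turn the games output into lineups page paths. This is only a sketch under assumptions: it assumes the games output is one JSON object per line with an `href` field ending in the game id, and the helper names (`lineups_href`, `lineup_requests`) are hypothetical. The `/spielbericht/aufstellung/spielbericht/<id>` pattern is taken from the sample data in this PR.

```python
import json


def lineups_href(game_href):
    """Build the lineups page path from a game href ending in the game id."""
    game_id = game_href.rstrip("/").rsplit("/", 1)[-1]
    return f"/spielbericht/aufstellung/spielbericht/{game_id}"


def lineup_requests(lines):
    """Yield one request description per parent game.

    `lines` is any iterable of JSON strings, for example an open file
    holding the games crawler output (assumed to be JSON lines).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the parents file
        game = json.loads(line)
        yield {"parent": game, "href": lineups_href(game["href"])}
```

Keeping the parent game on each yielded item lets the lineups output be joined back to the game it came from.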

I see there are a couple of advantages of doing this

  • The games crawler code becomes smaller and easier to follow
  • games and game_lineups can be executed independently, which means one can run and complete successfully even if the other one is failing

These were also the criteria for splitting the appearances scraping into its own crawler (it could've been part of the players crawler).

Let me know your thoughts.

@LarchLiu
Contributor Author

Yes, I think the idea you proposed is much better. 🙌
Since I am not very familiar with this framework, I just tried to implement the functionality first. So this PR is just a draft: I wanted to see what you think about extracting the data from the lineups page (in #59 you wanted to scrape it from the game page) and about the structure of the lineups data.

@dcaribou
Owner

Sure, I understand.

Well, what you implemented makes total sense to me and I think extracting the data from the lineups tab (instead of the match sheet tab as I suggested in #59) is a great idea 😄

@LarchLiu LarchLiu marked this pull request as ready for review September 28, 2023 10:44
@LarchLiu LarchLiu changed the title from "feat: add lineups info" to "feat: add game lineups crawler" Sep 28, 2023
@LarchLiu
Contributor Author

😱 OMG, successful check finnnnnally. I don't know how to write in python.

Owner

@dcaribou dcaribou left a comment

Awesome!

Just a minor detail: could you rename the class as suggested to keep it consistent?

tfmkt/spiders/game_lineups.py (outdated, resolved)
@dcaribou
Owner

Great addition to the project 🚀
Many thanks again for your contribution @LarchLiu 🙏

Are you able to merge yourself?

@LarchLiu
Contributor Author

No, I have no write access to this repo.

@dcaribou dcaribou merged commit 3e4ccb8 into dcaribou:main Sep 28, 2023
1 check passed

Successfully merging this pull request may close these issues.

Scrape starting lineup from games page