A data pipeline that ingests data from cricsheet.org, transforms the semi-structured data into a relational database, and does performance analysis via SQL

d-s-dc/Cricsheet-ingest

Cricsheet-ingest 🏏

This code downloads ODI match data from the Cricsheet website, pre-processes it with Spark, and loads it into a MySQL database for analysis.
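The "semi-structured to relational" step can be pictured with a minimal sketch in plain Python. It assumes match files shaped like Cricsheet's JSON format (innings containing overs containing deliveries). The function name `flatten_match`, the column names, and the sample match are hypothetical and are not taken from this repository, which does this work in Spark.

```python
from typing import Any, Dict, Iterator


def flatten_match(match_id: str, match: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per delivery from a nested match document."""
    for innings_no, innings in enumerate(match.get("innings", []), start=1):
        for over in innings.get("overs", []):
            deliveries = over.get("deliveries", [])
            for ball_no, delivery in enumerate(deliveries, start=1):
                runs = delivery.get("runs", {})
                yield {
                    "match_id": match_id,
                    "innings": innings_no,
                    "batting_team": innings.get("team"),
                    "over": over.get("over"),
                    "ball": ball_no,
                    "batter": delivery.get("batter"),
                    "bowler": delivery.get("bowler"),
                    "runs_batter": runs.get("batter", 0),
                    "runs_extras": runs.get("extras", 0),
                    "runs_total": runs.get("total", 0),
                }


# Made-up sample in the assumed layout; not real match data.
SAMPLE_MATCH = {
    "innings": [
        {
            "team": "Team A",
            "overs": [
                {
                    "over": 0,
                    "deliveries": [
                        {"batter": "Batter A", "bowler": "Bowler X",
                         "runs": {"batter": 1, "extras": 0, "total": 1}},
                        {"batter": "Batter B", "bowler": "Bowler X",
                         "runs": {"batter": 0, "extras": 1, "total": 1}},
                        {"batter": "Batter B", "bowler": "Bowler X",
                         "runs": {"batter": 4, "extras": 0, "total": 4}},
                    ],
                }
            ],
        }
    ]
}

ROWS = list(flatten_match("sample-1", SAMPLE_MATCH))

if __name__ == "__main__":
    for row in ROWS:
        print(row)
```

Each resulting row has the same fixed set of columns, which is the shape a relational table such as a per-delivery table in MySQL would need.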

Requirements

  1. MySQL, Java, and Python must be installed. Installation steps are not provided here because they differ by operating system.
  2. The JAVA_HOME variable must be set
    • Ubuntu - run the following in a terminal
      JAVA_HOME=$(readlink -f "$(which java)" | sed -E 's/\/b.*//')
      echo "export JAVA_HOME=${JAVA_HOME}" >> ~/.bashrc
      
    • Mac - Link
    • Windows - Link
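
Before starting the pipeline, it may help to confirm that JAVA_HOME is actually usable. The following is a minimal sketch of such a check, not part of this repository; the helper name `java_home_problem` is hypothetical, and it assumes a standard JDK layout with the executable under `bin/`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def java_home_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a description of the problem, or None if JAVA_HOME looks usable."""
    java_home = env.get("JAVA_HOME", "").strip()
    if not java_home:
        return "JAVA_HOME is not set"
    bin_dir = Path(java_home) / "bin"
    # Accept both the Unix and the Windows executable name.
    if not any((bin_dir / name).is_file() for name in ("java", "java.exe")):
        return f"no java executable found under {bin_dir}"
    return None


if __name__ == "__main__":
    problem = java_home_problem()
    print(problem or "JAVA_HOME looks usable")
```

If the check reports that JAVA_HOME is not set right after editing ~/.bashrc, open a new terminal or run `source ~/.bashrc` so the exported variable is picked up.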

Running

Before running, ensure that the MYSQL_PASSWORD variable in the .env file is set to your MySQL root password.
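
As an illustration of how a script could pick up this setting, here is a minimal standard-library sketch. It assumes the .env file holds plain `KEY=VALUE` lines; the helper names `read_env_file` and `mysql_password` are hypothetical, and how main.py actually loads the file is not shown in this README.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def mysql_password(path: str = ".env") -> str:
    """Return MYSQL_PASSWORD from the .env file, else from the environment."""
    password = read_env_file(path).get("MYSQL_PASSWORD") or os.environ.get(
        "MYSQL_PASSWORD", ""
    )
    if not password:
        raise RuntimeError("MYSQL_PASSWORD is not set in .env or the environment")
    return password
```

Failing early with a clear error when the password is missing is usually easier to diagnose than a connection failure later in the run.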

Open the repository folder in a terminal and run the following:

pip install -r requirements.txt && python main.py
