Skip to content

This is a canonical example of a data science project, which can be used for testing various purposes for the AI Validation team

License

Notifications You must be signed in to change notification settings

MinBZK/example-data-science-project

Example Data Science Project

Description

This repository contains a typical example of a data science project written in Python and is used for testing purposes when we need to have a more realstic example of an algorithm in use by the Dutch Government.

The typical steps are:

  1. exploratory data analysis
  2. prepping data
  3. training models
  4. evaluating models
  5. serving a result
  6. monitoring

Building an algorithm is a non-linear process, because of dependencies and cycles in the above-mentioned steps. Because of this, it is hard to define this in a standard project template. Therefore, I have chosen to make a single Python file and comment out the chain-of-thought that one could have in order to e.g. mitigate bias or choosing a specific model. So running __main__.py will make a model and does inference on it without anything to return.

Kaggle

Before you can use this project you need a kaggle account and create a token. You can store the token in ~/.kaggle/kaggle.json

Tools used

Different frameworks are used for different stages in the project. For the exploratory data analyses PyCaret is used to very quickly check a bunch of models to give an indication what kind of model to train for production purposes later on. For the "production" model, scikitlearn is used to generate a non-deep machine learning model. For the analysis on fairness and mitigation of bias both FairLearn and AIF360 are used. Possible extensions are:

  • For the analysis on data drift for the monitoring of the model evidentlyai, now just a very small part has been used

  • For experiment tracking and logging with MLflow.

This repository is inspired by the Thesis of Guusje Juijn

About

This is a canonical example of a data science project, which can be used for testing various purposes for the AI Validation team

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages