ch-ant/Pythia

 
 

Pythia

Java library that produces an automated statistical profile of an input dataset.

A standard dataset is a plain text file in which each line is a record and the fields of each record are separated by a delimiter (e.g. tab, comma, or pipe). After you register a dataset and declare the data analysis methods to run, the system automatically produces a statistical profile of the dataset and generates reports of its findings.
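To make the dataset format concrete, here is a minimal, self-contained sketch of what profiling such a file can involve. It does not use Pythia's API: the class `MiniProfiler`, its `profile` method, and the `ColumnStats` type are hypothetical names invented for this illustration. The sketch assumes each line is one record, splits it on a literal separator, and reports the count, minimum, maximum, and mean of the numeric values found at each field position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not part of the Pythia library.
public class MiniProfiler {

    /** Summary statistics for the numeric values found in one column. */
    public static final class ColumnStats {
        public final int column;   // zero-based field position
        public final long count;   // number of numeric values seen
        public final double min;
        public final double max;
        public final double mean;

        ColumnStats(int column, long count, double min, double max, double mean) {
            this.column = column;
            this.count = count;
            this.min = min;
            this.max = max;
            this.mean = mean;
        }

        @Override
        public String toString() {
            return String.format("column %d: count=%d min=%s max=%s mean=%s",
                    column, count, min, max, mean);
        }
    }

    /** Profiles delimited records; columns with no numeric values are omitted. */
    public static List<ColumnStats> profile(List<String> lines, String separator) {
        // Quote the separator so that characters such as '|' are not treated as regex syntax.
        String sep = Pattern.quote(separator);
        // One accumulator per column: {count, sum, min, max}.
        List<double[]> acc = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(sep, -1); // -1 keeps trailing empty fields
            for (int i = 0; i < fields.length; i++) {
                while (acc.size() <= i) {
                    acc.add(new double[] {0, 0,
                            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY});
                }
                try {
                    double v = Double.parseDouble(fields[i]);
                    double[] a = acc.get(i);
                    a[0]++;
                    a[1] += v;
                    a[2] = Math.min(a[2], v);
                    a[3] = Math.max(a[3], v);
                } catch (NumberFormatException e) {
                    // Non-numeric field (text, empty value): skipped in this sketch.
                }
            }
        }
        List<ColumnStats> stats = new ArrayList<>();
        for (int i = 0; i < acc.size(); i++) {
            double[] a = acc.get(i);
            if (a[0] > 0) {
                stats.add(new ColumnStats(i, (long) a[0], a[2], a[3], a[1] / a[0]));
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        // Two pipe-separated records: a text field followed by two numeric fields.
        List<String> lines = List.of("apple|1.5|10", "pear|2.5|30");
        for (ColumnStats s : profile(lines, "|")) {
            System.out.println(s);
        }
    }
}
```

Keying the results by field position avoids assuming that the file has a header row, which the format description above does not state. A real profiler would go well beyond these four descriptive statistics.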

Important Note


This fork is only meant to demonstrate the "Automated Highlight Identification in a Data Profiling System" diploma thesis. It is not meant for actual development.

The official Pythia development repo of the DAINTINESS-Group can be found here.

  • The thesis PDF (English) is in the repository's root directory.
  • Thesis demonstration video (Greek).
