A standard dataset is a plain text file in which each line is a record and the fields of each record are separated by a delimiter (e.g., a tab, comma, or pipe). After you register a dataset and declare the data analysis methods to run, the system produces a fully automatic statistical profile of the dataset and generates reports of its findings.
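To make the dataset format concrete, here is a minimal sketch of reading a delimited text file and computing a very small per-column profile. It is written in standalone Python and does not use or reflect Pythia's actual API; the function name `profile_delimited_text`, the assumption of a header row, and the chosen statistics are all hypothetical and serve only as illustration.

```python
import csv
import io
import statistics


def profile_delimited_text(text, separator=","):
    """Return a tiny per-column profile of a delimited dataset.

    Assumption (not from Pythia): the first line is a header row
    that names the fields.
    """
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    header = next(reader)
    columns = {name: [] for name in header}

    # Each non-empty line is one record; split fields go to their column.
    for record in reader:
        if not record:
            continue
        for name, value in zip(header, record):
            columns[name].append(value)

    profile = {}
    for name, values in columns.items():
        if not values:
            profile[name] = {"count": 0}
            continue
        try:
            # Treat the column as numeric only if every value parses.
            numbers = [float(v) for v in values]
        except ValueError:
            profile[name] = {"count": len(values), "distinct": len(set(values))}
        else:
            profile[name] = {
                "count": len(numbers),
                "min": min(numbers),
                "max": max(numbers),
                "mean": statistics.mean(numbers),
            }
    return profile


# A pipe-separated dataset with two records.
sample = "name|price\ncar|100\nbike|50\n"
print(profile_delimited_text(sample, separator="|"))
```

The same function handles tab- or comma-separated files by passing a different `separator`; a real profiler would also need to deal with missing values, quoting, and type inference, which this sketch ignores.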
This fork exists only to demonstrate the diploma thesis "Automated Highlight Identification in a Data Profiling System". It is not intended for active development.
The official Pythia development repo of the DAINTINESS-Group can be found here.
- The thesis PDF (English) is in the root directory.
- Thesis demonstration video (Greek).