GitHub - ch-ant/Pythia: An automated dataset profiler

Pythia

Java library that produces an automated statistical profile of an input dataset.

A standard dataset is a plain text file in which each line is a record whose fields are separated by a delimiter (e.g. tab, comma, or pipe). After you register a dataset and declare the data analysis methods to run, the system produces a fully automatic statistical profile of the dataset and generates reports of its findings.
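
To make the input format concrete, here is a minimal sketch in plain Java of profiling a delimited text dataset. It does not use Pythia's API: the class `DelimitedProfileSketch` and its `profile` method are hypothetical names chosen for illustration, and the sketch assumes the first line is a header row naming the fields. It only collects count, min, max, and mean for values that parse as numbers, which is far less than a full profile.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT part of the Pythia library.
public class DelimitedProfileSketch {

    // Collects summary statistics per column for values that parse as numbers.
    // Assumption: the first line is a header row with unique field names.
    static Map<String, DoubleSummaryStatistics> profile(List<String> lines, String separator) {
        Map<String, DoubleSummaryStatistics> stats = new LinkedHashMap<>();
        if (lines.isEmpty()) {
            return stats;
        }
        // Quote the separator so characters such as '|' are not treated as regex syntax.
        String sep = Pattern.quote(separator);
        String[] header = lines.get(0).split(sep, -1);
        for (String name : header) {
            stats.put(name, new DoubleSummaryStatistics());
        }
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(sep, -1);
            for (int i = 0; i < header.length && i < fields.length; i++) {
                try {
                    stats.get(header[i]).accept(Double.parseDouble(fields[i].trim()));
                } catch (NumberFormatException e) {
                    // Non-numeric value: ignored in this sketch.
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        // A tiny pipe-separated dataset held in memory instead of a file.
        List<String> lines = Arrays.asList("id|price", "1|10.0", "2|20.0", "3|n/a");
        profile(lines, "|").forEach((name, s) ->
                System.out.println(name + ": count=" + s.getCount() + ", mean=" + s.getAverage()));
    }
}
```

In real use the lines would come from the registered file (for example via `java.nio.file.Files.readAllLines`), and the separator would be whatever delimiter the dataset declares.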

Important Note

This fork is only meant to demonstrate the "Automated Highlight Identification in a Data Profiling System" diploma thesis. It is not meant for actual development.

The official Pythia development repo of the DAINTINESS-Group can be found here.

The thesis PDF (English) can be found in the root directory.
Thesis demonstration video (Greek).
