Repository Name: Optimized Data Preparation for Predictive Process Monitoring Using Large Language Models (LLMs)
This repository contains the full implementation, data, and results supporting the dissertation:
"Optimized Data Preparation Pipelines for Predictive Process Monitoring Using LLMs."
Predictive Process Monitoring (PPM) relies heavily on high-quality event logs for accurate predictions and insights. This research explores the potential of Large Language Models (LLMs) to address challenges in data preparation, including missing values, synonym variability, and homonym ambiguity. The repository demonstrates how LLM-based imputation enhances data quality, resulting in improved PPM performance.
- LLM-Based Data Imputation: Python scripts for restoring missing values and handling semantic variability in event logs using LLMs like GPT or LLaMA.
- Synonym and Homonym Transformation Tools: Utilities for generating transformed datasets to simulate real-world challenges.
- PPM Model Evaluation: End-to-end pipeline to train and evaluate PPM models using both classic and LLM-enhanced datasets.
- Performance Metrics and Analysis: Detailed metrics (accuracy, F1-score) and visualizations comparing LLM-based and traditional methods.
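As a rough illustration of the transformation and imputation ideas above (the actual scripts live in `experiments/preprocessing/`; the synonym map and helper names below are hypothetical, not taken from the repository):

```python
import random

# Hypothetical synonym map for activity labels in a credit event log.
SYNONYMS = {
    "Submit Application": ["File Application", "Send Application"],
    "Check Credit": ["Verify Credit", "Assess Creditworthiness"],
}

def apply_synonym_transformation(trace, synonyms, rng=random.Random(42)):
    """Replace each activity label with a randomly chosen synonym,
    simulating the semantic variability found in real-world logs."""
    return [rng.choice(synonyms.get(act, [act])) for act in trace]

def mask_events(trace, rate, rng=random.Random(0)):
    """Randomly drop activity labels to simulate missing values;
    an LLM-based imputer would later restore the None entries."""
    return [None if rng.random() < rate else act for act in trace]

trace = ["Submit Application", "Check Credit", "Submit Application"]
print(apply_synonym_transformation(trace, SYNONYMS))
print(mask_events(trace, rate=0.5))
```

In this sketch the masked `None` entries would be handed to an LLM (e.g. GPT or LLaMA) with the surrounding trace as context, and the model's suggested activity label written back into the log.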
📂 datasets/
├── credit/ # Original and transformed event logs for the Credit dataset
├── pub/ # Original and transformed event logs for the Pub dataset
📂 models/
├── llm/ # Scripts and checkpoints for fine-tuning LLMs
├── traditional/ # Scripts for rule-based and statistical imputation methods
📂 experiments/
├── preprocessing/ # Synonym and homonym transformation scripts
├── training/ # Code for training PPM models on restored datasets
├── evaluation/ # Evaluation scripts and metrics visualization
📂 results/
├── tables/ # Tabulated metrics for each experiment
├── figures/ # Visualizations (accuracy, F1-score trends)
📂 thesis/
├── latex/ # LaTeX source files for the dissertation
├── figures/ # Diagrams and images used in the thesis
├── bibliography/ # References and citations
1. Clone the Repository:
git clone https://github.com/Yeoonsu/dissertation.git
cd dissertation
2. Install Dependencies:
pip install -r requirements.txt
3. Run Experiments:
- Generate transformed datasets:
python experiments/preprocessing/transform_data.py --dataset credit --transformation synonym
- Train a PPM model:
python experiments/training/train_ppm.py --dataset credit --method llm
- Evaluate performance:
python experiments/evaluation/evaluate_model.py --dataset credit --method llm
4. Visualize Results:
- Accuracy and F1-score plots:
python experiments/evaluation/plot_results.py --output results/figures/
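The evaluation step boils down to comparing predicted next activities against the ground truth. A minimal, self-contained sketch of how accuracy and macro-F1 can be computed (pure Python; this is illustrative and does not reuse code from `evaluate_model.py`):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight,
    a common choice when activity labels are imbalanced."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["Check Credit", "Approve", "Check Credit", "Reject"]
y_pred = ["Check Credit", "Approve", "Approve", "Reject"]
print(f"accuracy = {accuracy(y_true, y_pred):.2f}")   # accuracy = 0.75
print(f"macro-F1 = {macro_f1(y_true, y_pred):.2f}")   # macro-F1 = 0.78
```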
- Reproducibility: All experiments are fully documented and reproducible.
- Data Diversity: Includes two datasets (Credit and Pub) with varied linguistic challenges.
- Open Access: Researchers can adapt the pipeline for domain-specific PPM tasks.
If you use this repository, please cite:
@phdthesis{yeonsu2024thesis,
  title  = {Optimized Data Preparation Pipelines for Predictive Process Monitoring Using LLMs},
  author = {Yeonsu Kim},
  year   = {2024},
  school = {UNIST},
  url    = {https://github.com/Yeoonsu/dissertation}
}
This project is licensed under the MIT License.
For questions or collaborations, please contact:
Yeonsu Kim - [email protected]