GitHub - cho-amy/waffle-iron

Large Language Model (LLM) Processing Pipeline Project

Authors:

Freeman Chen
Abhi Erra
Amy Cho
Karthik Ayyalasomayajula
Ronel Solomon

Intro:

Ever clicked on a headline so compelling that you just couldn't resist, only to find out the story was about as exciting as watching paint dry? 🎨 Welcome to the world of clickbait, the internet's version of 'bait and switch.' 🎣 But what if we told you there's a way to sift through the sensational to find the substantial? 🕵️‍♂️ Enter our project: a large language model (LLM) processing pipeline that doesn't just read between the lines; it reads between the clicks. 👀

Project Description

This repository contains all necessary code, documentation, and resources used in our research for building and automating a large language model processing pipeline. It is designed to serve as a practical framework for analyzing text data at scale, specifically targeting the identification and comparison of clickbait content in news articles.

Getting Started

Dependencies

Python 3.8
Apache Spark
MongoDB Atlas
Apache Airflow
Google Cloud Services (GCS)

Installation

Clone the repository to your local machine:

git clone https://github.com/cho-amy/waffle-iron

Configuration and Execution

Refer to the individual guides within the repository for configuring and executing each component of the pipeline:

API_gcs.py: Scripts for calling external APIs to gather data, along with scripts and notebooks for cleaning and preprocessing raw text data.
aggregates_to_mongo.py: Documentation and configuration files for storing data in MongoDB Atlas, plus code for manipulating and analyzing text data, including feature extraction and model training.
airflow_call.py: Configuration files and scripts for automating the pipeline using Apache Airflow.
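
The three components above suggest a gather, aggregate, store flow that a scheduler runs in order. The sketch below illustrates only that ordering and is not the repository's code: every function name, the record fields (`source`, `title`, `body`), and the sample data are hypothetical, and a plain Python list stands in for the MongoDB Atlas collection so the example runs without any external service.

```python
from collections import defaultdict


def fetch_articles():
    """Stand-in for the API-calling step; returns invented sample records."""
    return [
        {"source": "site-a", "title": "First headline", "body": "First body."},
        {"source": "site-a", "title": "Second headline", "body": "Second body."},
        {"source": "site-b", "title": "Third headline", "body": "Third body."},
    ]


def aggregate_by_source(articles):
    """Group articles by source into one summary document per source."""
    grouped = defaultdict(list)
    for article in articles:
        grouped[article["source"]].append(article["title"])
    return [
        {"source": source, "article_count": len(titles), "titles": titles}
        for source, titles in sorted(grouped.items())
    ]


def store(documents, collection):
    """Stand-in for the database write; appends to an in-memory list."""
    collection.extend(documents)
    return len(documents)


def run_pipeline(collection):
    """Run the stages in order, as a scheduler would on each run."""
    articles = fetch_articles()
    documents = aggregate_by_source(articles)
    return store(documents, collection)


# Hypothetical usage: an empty list plays the role of the target collection.
collection = []
stored = run_pipeline(collection)
```

In the actual pipeline each stage would presumably talk to a real service (an external API, Google Cloud, MongoDB Atlas) and Apache Airflow would trigger the run; keeping the stages as separate functions is one way to make each of them testable on its own.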

ML images:

Similarity Scores:
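
This README does not say how the similarity scores are computed. As a rough illustration of the idea of comparing a headline against its article body, the sketch below uses cosine similarity over simple word counts. This metric is an assumption chosen for the example, not necessarily the project's method, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the words the two texts share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has nothing to compare
    return dot / (norm_a * norm_b)


# Hypothetical headline/body pair, invented for illustration.
headline = "You won't believe what this cat did"
body = "The city council approved the new budget on Tuesday."
score = cosine_similarity(headline, body)
```

A low score between a headline and its body would hint that the headline promises something the article does not deliver. A production pipeline would more likely compare embeddings from a language model than raw word counts.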

