
SynthLlama

SynthLlama is a project for generating synthetic data using language models. It allows you to upload PDFs, select the desired data format, and specify the amount of data needed. As a result, you can obtain your dataset in either JSON or CSV format.
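The JSON and CSV export step can be pictured with a minimal sketch. This is not SynthLlama's actual code: the function name `serialize_records`, the record fields, and the sample rows are all hypothetical. The sketch assumes each generated record is a flat dictionary and that all records share the same keys.

```python
import csv
import io
import json


def serialize_records(records, fmt):
    """Serialize a list of flat dicts to a JSON or CSV string.

    Hypothetical helper for illustration; assumes every record
    has the same keys as the first one.
    """
    if fmt == "json":
        return json.dumps(records, ensure_ascii=False, indent=2)
    if fmt == "csv":
        if not records:
            return ""
        buffer = io.StringIO()
        # Column order follows the key order of the first record.
        writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")


# Hypothetical records standing in for model-generated output.
sample = [
    {"question": "What is synthetic data?", "answer": "Data generated by a model."},
    {"question": "Which formats are supported?", "answer": "JSON and CSV."},
]

json_text = serialize_records(sample, "json")
csv_text = serialize_records(sample, "csv")
```

Either string can then be written to a file or offered as a download. Nested records would need flattening before the CSV branch could handle them.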



Introduction

Large models require substantial training data, and collecting it manually is not always feasible. Synthetic data can supplement training sets where real data is lacking. This project aims to address that gap by generating synthetic data from your own documents, so that data scarcity is less of a limitation.

Installation

Clone the repository and install the dependencies:

git clone https://github.com/cows-cats/SynthLlama.git

cd SynthLlama

pip install -r requirements.txt

Usage

In the first terminal, start the API:

python api.py

In the second terminal, start the Streamlit app:

streamlit run streamlit1.py
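If keeping two terminals open is inconvenient, both processes can be started from a single script. The following is only a convenience sketch and is not part of the repository: `build_commands` and `run_all` are hypothetical names, and it assumes it is run from the repository root, where `api.py` and `streamlit1.py` live, with the dependencies already installed.

```python
import subprocess
import sys


def build_commands(python=sys.executable):
    """Return the two commands from the Usage section as argument lists."""
    return [
        [python, "api.py"],
        # "python -m streamlit run ..." is equivalent to "streamlit run ..."
        # but uses the same interpreter as this script.
        [python, "-m", "streamlit", "run", "streamlit1.py"],
    ]


def run_all():
    """Start both processes and wait; stop whatever is left on exit."""
    procs = [subprocess.Popen(cmd) for cmd in build_commands()]
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        pass
    finally:
        for proc in procs:
            if proc.poll() is None:  # still running
                proc.terminate()
```

Calling `run_all()` launches both commands, and pressing Ctrl+C stops them together. The sketch does not wait for the API to become ready before starting the UI, so the two-terminal approach remains the simpler option if start-up order matters.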
