SynthLlama

SynthLlama generates synthetic data with language models. You upload PDFs, select the desired data format, and specify how much data you need; the resulting dataset can be downloaded as JSON or CSV.
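To illustrate the two output formats, here is a minimal sketch of serializing a list of generated records as JSON or CSV using only the Python standard library. It is not taken from the project's code: the function names `to_json` and `to_csv` and the sample records are hypothetical and chosen only for illustration.

```python
import csv
import io
import json


def to_json(records):
    """Serialize a list of dict records as a pretty-printed JSON array."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialize a list of dict records as CSV text with a header row."""
    if not records:
        return ""
    # Collect column names in first-seen order so records with
    # differing keys still fit into one table.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buffer.getvalue()


# Hypothetical records standing in for model-generated output.
sample = [
    {"question": "What is synthetic data?", "answer": "Data generated by a model."},
    {"question": "Which formats are supported?", "answer": "JSON and CSV."},
]
```

Calling `to_json(sample)` or `to_csv(sample)` returns a string that can be written to a `.json` or `.csv` file.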

Introduction

Large models require substantial amounts of data, and collecting it manually is not always feasible. Synthetic data can supplement training data where it is lacking. This project aims to close that gap by augmenting training sets with synthetic data, so that data scarcity becomes less of a limitation.

Installation

Clone the repository and install the dependencies:

git clone https://github.com/cows-cats/SynthLlama.git

cd SynthLlama

pip install -r requirements.txt

Usage

In the first terminal, start the API:

python api.py

In a second terminal, start the Streamlit app:

streamlit run streamlit1.py
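If you prefer a single terminal, both processes can be started from one script. The following is a minimal sketch, assuming it is run from the repository root where `api.py` and `streamlit1.py` live. The helper names `build_commands` and `launch_all` are hypothetical and not part of the project; the sketch uses `python -m streamlit run`, the module form of the `streamlit run` command.

```python
import subprocess
import sys


def build_commands(api_script="api.py", app_script="streamlit1.py"):
    """Return the two commands from the Usage section as argument lists."""
    return [
        [sys.executable, api_script],
        [sys.executable, "-m", "streamlit", "run", app_script],
    ]


def launch_all(commands):
    """Start every command, wait for all of them, and return their exit codes.

    On Ctrl+C, any process that is still running is terminated.
    """
    processes = [subprocess.Popen(command) for command in commands]
    try:
        for process in processes:
            process.wait()
    except KeyboardInterrupt:
        pass
    finally:
        for process in processes:
            if process.poll() is None:
                process.terminate()
    return [process.wait() for process in processes]
```

Calling `launch_all(build_commands())` starts the API and the Streamlit app together; press Ctrl+C to stop both.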
