This repository is a documentation of a research project I did into AI safety in imitating music artists, and cointains a Jupyter notebook (Song_Generation.ipynb
) for generating song lyrics using the dolly-v2-3b LLM. The notebook uses the Hugging Face transformers
library for fine-tuning the model.
Everything is 100% free to replicate using google colab compute and this dataset created. 💯
- Using a model that is trained on singing/rapping vocal data rather than plain voice data to improve the flow of the AI generated voice.
Before running the notebook, ensure that you have the required dependencies installed. You can install them using the following commands:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets
Please note that there might be dependency conflicts, and you may need to resolve them based on the error messages.
The notebook downloads a dataset of Drake's lyrics from Kaggle using the Kaggle API. Make sure to provide your Kaggle API key and follow the instructions to secure it.
The notebook processes the downloaded lyrics data, removes empty entries, and generates prompts for each song using OpenAI's GPT-3.5-turbo model.
The notebook then loads the training data from a Google Sheets CSV that I have publicly postest for fine-tuning the model.
The model is fine-tuned on the prepared dataset using the DrakeTrainer
class, a custom trainer based on the transformers
library.
The notebook demonstrates song generation using prompts. It provides examples of generating lyrics based on a given prompt and showcases the generated results.
This project is licensed under the MIT License.
Feel free to explore, experiment, and create your own songs using this notebook!