An NLP text generation project born out of the #60DaysOfUdacity challenge during Facebook's Secure and Private AI Scholarship.
Sixty is a text generation AI trained on a corpus of messages extracted from the archive of Udacity's Slack channel, 60 Days of Udacity, during Facebook's Secure and Private AI Scholarship. As part of the #60DaysOfUdacity challenge, participants had to post their progress in the challenge course to the Slack channel every day for 60 days in a row. Hence, the text generated by Sixty closely resembles the updates people were posting in the channel.
Sixty uses OpenAI's GPT-2 model to generate text, specifically the smallest released version, 117M. The original GPT-2 117M release was not used directly because it does not include fine-tuning support. Instead, Max Woolf's gpt-2-simple, which builds on GPT-2 and adds fine-tuning, was used.
This repository does not follow the usual folder naming conventions. Instead, the folders are named and numbered sequentially, so that anyone browsing the repository knows where to start and in what order to continue.
The last two folders in this repository contain notebooks that lay out a straightforward template for training GPT-2 on any text corpus. You simply need to plug in the text corpus (a CSV or text file), and the gpt-2-simple library takes care of everything else. Read the markdown cells and the comments in the notebooks to learn more.
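As a rough illustration of that template, the sketch below prepares a text corpus and hands it to gpt-2-simple. It is not taken from the notebooks: the helper names `build_corpus` and `finetune_and_sample`, the file name, the step count, and the `<|startoftext|>`/`<|endoftext|>` delimiters are assumptions made for this example. The model name `"124M"` is also an assumption; it is the name recent gpt-2-simple versions appear to use for the smallest model, which this README refers to as 117M.

```python
from pathlib import Path


def build_corpus(messages, out_path):
    """Write each non-empty message to out_path as one delimited sample.

    Hypothetical helper: the start/end tokens are an assumed convention
    for marking where each update begins and ends.
    """
    cleaned = [m.strip() for m in messages if m and m.strip()]
    text = "\n".join(f"<|startoftext|>{m}<|endoftext|>" for m in cleaned)
    Path(out_path).write_text(text + "\n", encoding="utf-8")
    return len(cleaned)


def finetune_and_sample(corpus_path, model_name="124M", steps=1000,
                        run_name="run1"):
    """Fine-tune GPT-2 on corpus_path and return a few generated samples.

    Hypothetical wrapper around gpt-2-simple; steps and run_name are
    illustrative values, not the settings used to train Sixty.
    """
    # Imported lazily so build_corpus works without TensorFlow installed.
    import gpt_2_simple as gpt2

    gpt2.download_gpt2(model_name=model_name)  # fetch the base weights
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, dataset=corpus_path, model_name=model_name,
                  steps=steps, run_name=run_name)
    return gpt2.generate(sess, run_name=run_name, nsamples=3,
                         prefix="<|startoftext|>",
                         truncate="<|endoftext|>",
                         return_as_list=True)
```

A typical call would be `build_corpus(list_of_messages, "corpus.txt")` followed by `finetune_and_sample("corpus.txt")`. The second step needs gpt-2-simple and TensorFlow installed, and in practice a GPU.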
Some example text generated by Sixty:
- day 2 - completed till deep learning with pytorch section 4 - completed 2 more steps and coded a second image recognition and classification notebook - continuing the deep learning with differential privacy lesson :female-student:
- day 9 of #60daysofudacity: - restarted the deep learning with pytorch course. - continued reading chapter 2 of the book "the algorithmic foundations of differential privacy" (i read it twice) - followed the suggested reading by google cloud, i was able to make some progress. i have already started with the intro to dl with pytorch course. i will continue tomorrow. - read "the algorithmic foundations of differential privacy" (i almost finished it!) - read "chapter 1: the promise of differential privacy" by cynthia dwork
- day 10: 1. i updated my #60daysofudacity project in github: 2. i finished lesson 6.6 and started working on the final project 6.7 which is going to be implemented using pytorch.
- day 11 :torch_heart_big: :torch_heart_big: day 10: completed lesson 6: differential privacy for deep learning :torch_heart_big:
In the examples above, notice the text styling as well as the insertion of emojis (:text between colons:) done by Sixty on its own. Sixty has even learned to use the proper hashtags!
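One way to check a generated sample for these markers is a pair of regular expressions. This is only a heuristic sketch and not part of the repository. The function name `extract_markers` and both patterns are assumptions. The shortcode pattern would also match text such as the `:30:` in a timestamp.

```python
import re

# Slack-style emoji shortcodes, e.g. :torch_heart_big: or :female-student:
EMOJI_SHORTCODE = re.compile(r":([a-z0-9_+\-]+):")
# Hashtags, e.g. #60daysofudacity
HASHTAG = re.compile(r"#(\w+)")


def extract_markers(text):
    """Return (emoji shortcode names, hashtag names) found in text."""
    return EMOJI_SHORTCODE.findall(text), HASHTAG.findall(text)


sample = "day 11 :torch_heart_big: :torch_heart_big: day 10: completed lesson 6"
emojis, hashtags = extract_markers(sample)
print(emojis, hashtags)
```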