sexting-dataset

Chats scraped from public resources on the internet

Structure

  • clean folder contains conversations from different websites in txt format.
  • raw folder contains the original files; some are images, which I converted to text using OCR.
  • sexting_dataset.txt contains all the chats put together.
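
The relationship between the clean folder and the combined file can be illustrated with a minimal sketch. It assumes the combined file is a plain concatenation of the txt files in clean, joined by a blank line; the function name `combine_chats`, the separator, and the output filename are hypothetical and may not match how sexting_dataset.txt was actually built.

```python
from pathlib import Path


def combine_chats(clean_dir: str, out_path: str) -> int:
    """Concatenate every .txt file in clean_dir into out_path.

    Returns the number of files merged. The blank-line separator between
    files is an assumption, not something the README specifies.
    """
    # Sort so the output order is stable across runs.
    files = sorted(Path(clean_dir).glob("*.txt"))
    # Undecodable bytes are replaced rather than raising, since scraped
    # and OCR-derived text may not be clean UTF-8.
    parts = [f.read_text(encoding="utf-8", errors="replace").strip() for f in files]
    Path(out_path).write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return len(files)
```

For example, `combine_chats("clean", "combined.txt")` run from the repository root would write one file with all conversations from clean, in filename order.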

Bias and contributions

These are mostly cis-heterosexual conversations. If you have data that could make this dataset bigger or more diverse, feel free to contribute.

Sources

This is the list of websites I took the data from. If a page is no longer available, try accessing it through https://web.archive.org

Support my work

Mathias's open-source projects are supported by his ko-fi. If you found this project helpful, any monetary contributions are appreciated and will be put to good creative use.

License

MIT