A Rust pipeline for extracting HUMONGOUS, a dataset of web-based text extracted from Common Crawl and ready for multilingual language modeling.

cfoster0/humongous-rs
