Common Crawl Miner

This is a tool for mining parallel web pages from the CommonCrawl data hosted on AWS. It is based on the CommonCrawl example codebase:

This was developed during the 2012 Machine Translation Marathon by:

Herve Saint-Amand ([email protected]), Magdalena Plamada ([email protected]), and Jason Smith ([email protected])
