This is a tool for mining parallel web pages (pages that are translations of one another) from the CommonCrawl data hosted on AWS. It is based on the CommonCrawl example codebase:
https://github.com/commoncrawl/commoncrawl-examples
This was developed during the 2012 Machine Translation Marathon by:
Herve Saint-Amand, Magdalena Plamada and Jason Smith.