This Python script builds a zip archive of some of the most important GitHub repositories, using the GitHub API to fetch their contents. The files are then concatenated by programming language, as determined by GitHub's Linguist library.
The result is a corpus organized by programming language and suitable for training or evaluating keyboard layouts, among other uses. All code in the corpus remains under its original copyright.
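The grouping step can be sketched roughly as follows. This is a minimal illustration, not the script's actual code. It assumes that Linguist's languages.yml has already been parsed into a dict (for example with PyYAML's yaml.safe_load), which maps each language name to its attributes, including an "extensions" list. The helper names build_extension_map and concatenate_by_language, and the sample data, are hypothetical. The sketch also simplifies matters by looking at extensions only, whereas Linguist itself also considers filenames, shebang lines and heuristics.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_extension_map(languages):
    """Map each file extension to a language name (hypothetical helper).

    `languages` is assumed to be the parsed languages.yml:
    language name -> attributes, where attributes may hold an "extensions" list.
    """
    ext_map = {}
    for name, attrs in languages.items():
        for ext in attrs.get("extensions", []):
            # Several languages can claim one extension (e.g. ".h");
            # this sketch simply keeps the first language seen.
            ext_map.setdefault(ext.lower(), name)
    return ext_map


def concatenate_by_language(files, ext_map):
    """Group (path, text) pairs by language and join each group's texts."""
    groups = defaultdict(list)
    for path, text in files:
        language = ext_map.get(PurePosixPath(path).suffix.lower())
        if language is not None:  # skip files whose extension is unknown
            groups[language].append(text)
    return {lang: "\n".join(texts) for lang, texts in groups.items()}


# Hypothetical sample data in the shape of a parsed languages.yml.
languages = {
    "Python": {"type": "programming", "extensions": [".py", ".pyw"]},
    "C": {"type": "programming", "extensions": [".c", ".h"]},
}
ext_map = build_extension_map(languages)
files = [
    ("src/main.py", "print('hi')"),
    ("lib/util.c", "int x;"),
    ("lib/util.h", "extern int x;"),
    ("README", "docs"),
]
corpus = concatenate_by_language(files, ext_map)
```

Here corpus maps "Python" to the single Python file and "C" to the two C files joined by a newline, while the extensionless README is dropped.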
The script is written for Python 3 and requires two third-party packages: the excellent requests package and the PyYAML library.
This code is released under the MIT License.