Python code to build a corpus sorted by programming language from GitHub's "important" repos

soli/proglang_corpus

Proglang Corpus

This Python script builds a zip archive of some of the most important GitHub repositories, using GitHub's API to access their contents. The files are then concatenated by programming language, as determined by GitHub's linguist repository.
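As a rough illustration of the grouping step, the sketch below maps file extensions to languages and concatenates file contents per language. It is not the script's actual code: the function names `extension_map` and `concatenate_by_language` are hypothetical, and it assumes the language data has already been parsed from linguist's `languages.yml` into a dict whose entries may carry an `extensions` list.

```python
import os
from collections import defaultdict


def extension_map(languages):
    """Map each file extension to a language name.

    `languages` is assumed to be the dict obtained by parsing linguist's
    languages.yml (for example with yaml.safe_load), where each entry may
    carry an "extensions" list. Several languages can share an extension;
    this sketch simply keeps the first one encountered.
    """
    mapping = {}
    for name, info in languages.items():
        for ext in info.get("extensions", []):
            mapping.setdefault(ext.lower(), name)
    return mapping


def concatenate_by_language(files, ext_to_lang):
    """Group file contents by language.

    `files` is an iterable of (path, text) pairs. Files whose extension
    is not in `ext_to_lang` are skipped.
    """
    corpus = defaultdict(list)
    for path, text in files:
        ext = os.path.splitext(path)[1].lower()
        lang = ext_to_lang.get(ext)
        if lang is not None:
            corpus[lang].append(text)
    # Join the pieces of each language into a single text blob.
    return {lang: "\n".join(parts) for lang, parts in corpus.items()}
```

Matching on extension alone is a simplification: shared extensions such as `.h` are assigned to whichever language comes first in the input, which may not be the right choice for a given file.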

The result is a corpus organized by programming language and suitable for tasks such as training or evaluating keyboard layouts and other methods. All the code in the corpus remains under its original copyright.

Requirements

Both the excellent requests package and pyyaml library are necessary to run this script (written for Python 3).

License

This code is released under the MIT license.
