This is a poorly structured repository where Kamu's team shares root dataset manifests for data we find useful or interesting.
We find that searching for, ingesting, and cleaning data from the wild variety of formats publishers use today is the most tedious and boring part of data science. Eliminating it is one of the big goals of Kamu.
Until that happens, this repo serves as a great catalog of examples of how to deal with different ingestion scenarios.
We gladly accept PRs!