Add support for compressed JSON (gz or bz2) in input #1
Comments
Mmh, what is preventing us from storing the intermediate JSON in a compressed format, then uncompressing it and streaming it to |
Nothing is preventing us from doing so. The main advantage of doing it directly in Go is performance, IMHO. The bzip2 package of the Go standard library (http://golang.org/pkg/compress/bzip2/) implements the reader interface, and the JSON decoder can read JSON directly from a reader. That way, the JSON is uncompressed and decoded in a single pass. Besides, it only takes a few lines of code to implement that option. Since everything comes from the standard library, it does not require extra testing. So I see no reason not to implement it :) |
Fair enough. |
Yeah for sure :) |
I bet that the pure Go version will be faster. Even if the Go standard implementation is much slower than bzcat(1), in the end, the bzcat solution will need to read the bzipped file, output the uncompressed JSON and |
Since we are dealing with a huge amount of data, it is very slow to re-parse all the projects with the source code parsers every time we update the source analyzer. Thus, it makes sense to store the intermediate JSON. However, the JSON files are really big and use a lot of disk space, so it would be useful to compress them.