Sort, grep and join files where every line is JSON. JSON has long surpassed CSV as the most common, useful and robust format in much of the code that I encounter, but as yet it has no parallel to the array of fine-tuned command-line tools that target CSV. This collection fills some of the gaps.
The JSON lines format is formally described at http://jsonlines.org/. Other pages of interest are Wikipedia and Newline Delimited JSON.
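As a rough illustration of the format, the sketch below reads text one line at a time and keeps only the lines that parse as JSON. `parseJsonLines` is a hypothetical helper written for this example; it is not part of jline and makes no claim about how the jline tools are implemented.

```javascript
// Minimal sketch: treat every line as an independent JSON document.
// parseJsonLines is a hypothetical helper, not a jline function.
function parseJsonLines(text) {
  const records = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      // Malformed line: drop it and carry on with the next one.
    }
  }
  return records;
}

const sample = [
  '{"foo":{"bar":9,"bat":49},"mitz":"ding"}',
  "this line is not JSON",
  '{"foo":{"bar":4,"bat":9},"mitz":"do"}',
].join("\n");

console.log(parseJsonLines(sample).length); // 2
```

Because each line stands alone, one bad line does not invalidate the rest of the file, which is what makes the format convenient for line-oriented tools.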
Install node, if you don't have it already:
which node || { curl "https://gist.githubusercontent.com/bitdivine/309a1594e891dec70461/raw/4a96a04dfa179eee531647347c485a8750b9ea66/install-nodejs.sh" > node-installer.sh && sudo sh node-installer.sh; }
Install jline with the node package manager:
sudo npm install -g jline
Check that jline is in your PATH:
which jline-pretty
If jline-pretty is not found, have a look at the troubleshooting page.
Assume a file dat.jsonl with lines:
{"foo":{"bar":9,"bat":49},"mitz":"ding"}
{"foo":{"bar":4,"bat":9},"mitz":"do"}
{"foo":{"bar":6,"bat":17},"mitz":"mnogo"}
Then:
cat dat.jsonl | jline-clean # Removes all but clean JSON lines
cat dat.jsonl | jline-sort foo.bar # Sorts on the key foo.bar
cat dat.jsonl | jline-filter 'foo.bat>9' # Keeps only the lines where foo.bat > 9
cat dat.jsonl | jline-select foo.bar:bar foo.bat:bat mitz # Selects and renames fields
cat dat.jsonl | jline-pretty # Pretty-prints the JSON
# For a full awk-like tool, write javascript:
cat dat.jsonl | jline-foreach 'console.log(Math.round(record.foo.bar / record.foo.bat))'
cat dat.jsonl | jline-foreach begin::global.sum=0 sum+=record.foo.bar end::'emit(sum)'
cat dat.jsonl | jline-foreach 'begin::global.c=require("crypto")' 'record.hash=c.createHash("sha512").update(record.passwd).digest("hex");emit(record)'
echo '{"url":"http://winning.gold"}' | jline-foreach 'beg::req=require("request-promise")' 'req(record.url).then(console.log)'
... etc
# Import from other formats:
cat data.csv | jline-csv2jl # Convert CSV into lines of JSON dictionaries
cat data.csv | jline-csv2jla # Convert CSV into lines of JSON arrays
echo "SELECT 1,2,3;" | mysql | jline-mysql2jl # Get data from MySQL, one line per record
jline-json2jl -a --path=data.sales dump.json # Extract an array from inside a JSON blob
# Export to other formats:
cat data.jline | jline-csv
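To make the intent of the sort, filter, select and foreach commands above concrete, here is a plain Node.js sketch of the same operations on the sample records. It is an illustration only: `getPath` is a hypothetical helper, the ascending sort order and the flat output shape of the select step are assumptions, and none of this is taken from jline's source.

```javascript
// Sample records from dat.jsonl above.
const lines = [
  '{"foo":{"bar":9,"bat":49},"mitz":"ding"}',
  '{"foo":{"bar":4,"bat":9},"mitz":"do"}',
  '{"foo":{"bar":6,"bat":17},"mitz":"mnogo"}',
];
const records = lines.map((line) => JSON.parse(line));

// Hypothetical helper: look up a dotted path such as "foo.bar".
function getPath(record, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), record);
}

// In the spirit of `jline-sort foo.bar` (ascending order is an assumption).
const sorted = [...records].sort(
  (a, b) => getPath(a, "foo.bar") - getPath(b, "foo.bar")
);

// In the spirit of `jline-filter 'foo.bat>9'`.
const filtered = records.filter((r) => getPath(r, "foo.bat") > 9);

// In the spirit of `jline-select foo.bar:bar foo.bat:bat mitz`
// (the flat output shape is an assumption).
const selected = records.map((r) => ({
  bar: getPath(r, "foo.bar"),
  bat: getPath(r, "foo.bat"),
  mitz: getPath(r, "mitz"),
}));

// In the spirit of the begin / per-record / end sum with jline-foreach.
let sum = 0;
for (const r of records) sum += r.foo.bar;

for (const r of sorted) console.log(JSON.stringify(r));
console.log(sum); // 19
```

With the sample data, the sorted records come out with foo.bar equal to 4, 6, then 9, and the filter keeps the "ding" and "mnogo" records.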
Use --help or refer to the markdown files:
- clean - Keeps only well-formed JSON lines.
- filter - Keeps only the records you choose.
- parsePath
'abba.cadabba[4].u' -> ['abba','cadabba',4,'u']
- sort - Sorts by a given key.
- select - Selects just a few fields from each record.
- breakdown by - Sums fields by a breakdown.
- foreach - Execute arbitrary code for each record. Awk for JSON lines.
- map - DEPRECATED as foreach is much more powerful without being more complicated.
- jline-mysql2jl - EXPERIMENTAL - mysql output to JSON dictionaries
- jline-mysql2jla - EXPERIMENTAL - mysql output to JSON arrays
- jline-json2jl - EXPERIMENTAL - JSON to JSON lines.
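The parsePath entry in the list above maps a path string to an array of keys and indices. A minimal sketch of that mapping follows; it is a re-implementation written for illustration, so jline's own parsePath may handle quoting, escapes or other edge cases differently.

```javascript
// Illustrative re-implementation; jline's own parsePath may differ.
function parsePath(path) {
  const parts = [];
  // Match either a bracketed integer index, or a run of characters
  // that contains no '.', '[' or ']'.
  const token = /\[(\d+)\]|([^.\[\]]+)/g;
  let match;
  while ((match = token.exec(path)) !== null) {
    if (match[1] !== undefined) {
      parts.push(Number(match[1])); // array index becomes a number
    } else {
      parts.push(match[2]); // object key stays a string
    }
  }
  return parts;
}

console.log(parsePath("abba.cadabba[4].u")); // [ 'abba', 'cadabba', 4, 'u' ]
```

Returning numbers for bracketed indices and strings for keys lets a caller tell array access apart from property access when walking a record.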