JSON, reshape, regex captures, and more
Primary features:
- JSON is now a supported format for input and output. Miller handles tabular data, whereas JSON supports arbitrarily deeply nested data structures, so if you want general JSON processing you should use jq. But if you have tabular data represented in JSON then Miller can now handle that for you. Please see the reference page and the FAQ.
- Reshape is a standard data-processing idiom, now available in Miller: http://johnkerl.org/miller/doc/reference.html#reshape
- Incidentally (not part of this release, but new since the last release) Miller is now available in FreeBSD's package manager: https://www.freshports.org/textproc/miller/. A full list of distributions containing Miller may be found here.
- Miller is not yet available from within Fedora/CentOS, but as a step toward this goal, an SRPM is included in this release (see file-list below).
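
As a rough illustration of the JSON and reshape features above, the following sketch converts a small CSV file to JSON and then pivots it from long to wide form. The file name long.csv and its fields are made up for the example. The reshape -s option is taken from later Miller documentation, so it is an assumption that it behaves identically in this release; the reshape link above is the authoritative reference.

```shell
# Hypothetical sample data in "long" form: one key/value pair per record.
cat > long.csv <<'EOF'
id,key,value
1,x,3
1,y,4
2,x,5
2,y,6
EOF

# Run the Miller commands only if mlr is installed.
if command -v mlr >/dev/null 2>&1; then
  # CSV in, JSON out: each tabular record becomes one flat JSON object.
  mlr --icsv --ojson cat long.csv

  # Long-to-wide reshape: the values of "key" become column names,
  # filled from "value", grouped by the remaining field "id".
  mlr --icsv --opprint reshape -s key,value long.csv
fi
```

The data here is flat on purpose: nested JSON is the case where jq is the better tool.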
DSL enhancements for mlr put and mlr filter:
- Regex captures \0 through \9: http://johnkerl.org/miller/doc/reference.html#Regex_captures
- Ternary operator in expression right-hand sides: e.g. mlr put '$y = $x < 0.5 ? 0 : 1'
- Boolean literals true and false
- Final semicolon is now allowed: e.g. mlr put '$x=1;$y=2;'
- Environment variables are now accessible, where environment-variable names may be string literals or arbitrary expressions: mlr put '$home = ENV["HOME"]' or mlr put '$value = ENV[$name]'.
- While records are still string-to-string maps for input and output, and between then-chained commands, types are preserved between multiple statements within a put. Example: mlr put '$y = string($x); $z = $y . $y' works as expected, without requiring mlr put '$y = string($x); $z = string($y) . string($y)' as before.
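
The DSL items above can be tried on a tiny input. This is a minimal sketch: the file sample.dkvp and its fields x and name are invented for the example, while the put expressions are the ones quoted in the list. Regex captures are left out here; see the linked reference for their syntax.

```shell
# Hypothetical sample input in Miller's default DKVP format.
cat > sample.dkvp <<'EOF'
x=0.3,name=HOME
x=0.7,name=HOME
EOF

# Run the Miller commands only if mlr is installed.
if command -v mlr >/dev/null 2>&1; then
  # Ternary operator on the right-hand side of an assignment.
  mlr put '$y = $x < 0.5 ? 0 : 1' sample.dkvp

  # Environment variable looked up by a literal name and by a field value;
  # the trailing semicolon is accepted.
  mlr put '$home = ENV["HOME"]; $value = ENV[$name];' sample.dkvp

  # Types are kept between statements inside one put, so $y is still
  # a string when it is concatenated with itself.
  mlr put '$y = string($x); $z = $y . $y' sample.dkvp
fi
```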
Bug fixes:
- Mixed-format join, e.g. CSV file joined with DKVP file, was incorrectly computing default separators (IRS, IFS, IPS). This resulted in records not being joined together.
- Segmentation violation on non-standard-input read of files with size an exact multiple of page size and not ending in IRS, e.g. newline. (This is less of a corner case than it sounds: for example, leave a long-running program running with output redirected to a file, then in a sleep-and-process loop, have Miller process that file. The former program's stdio library will likely be doing block-sized buffered I/O, where block sizes will often be multiples of system page size and the block will almost surely not end in a newline.)
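
For anyone wanting to check the second fix, the triggering condition can be approximated as follows. This is only a sketch of the file shape described above (exactly one page long, no trailing newline), not the project's own regression test; the file name page_sized.dkvp is hypothetical.

```shell
# Build a one-record DKVP file whose size is exactly one system page
# and which does not end in a newline.
page=$(getconf PAGESIZE)
{ printf 'x='; head -c "$((page - 2))" /dev/zero | tr '\0' 'a'; } > page_sized.dkvp

# Show the size in bytes and the final byte of the file.
wc -c < page_sized.dkvp
tail -c 1 page_sized.dkvp | od -c

# Run Miller only if mlr is installed; read by file name, not standard input.
if command -v mlr >/dev/null 2>&1; then
  mlr cat page_sized.dkvp > /dev/null && echo "read OK"
fi
```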
Acknowledgements: Big thank-yous to @gregfr and @aaronwolen for feature requests including reshape and regex captures, and to @jungle-boogie for his work getting Miller into FreeBSD. Also, ongoing thanks to @0-wiz-0 for his past work on configure support, making it possible for Miller to be put to use in multiple operating systems.