New data-rearrangers: nest, shuffle, repeat; misc. features
Major features in this release:
- mlr nest is a companion to mlr reshape, which was introduced in Miller 3.4.0: it allows unpacking key-value pairs which are nested within field values, and repacking them. Please see http://johnkerl.org/miller/doc/reference.html#nest.
- mlr shuffle is a simple output-record permutor: http://johnkerl.org/miller/doc/reference.html#shuffle
- mlr repeat can be used as a data generator, to expand a few input records (or even a single one) into arbitrarily many. This is particularly useful in conjunction with pseudorandom-number generators. It can also be used to reconstruct individual samples from data which have been count-aggregated, so that statistics such as mode, percentiles, etc. may be computed on them. Please see http://johnkerl.org/miller/doc/reference.html#repeat.
- mlr put and mlr filter now accept a -f {filename} option, so that the DSL expression may be placed within a file instead of being typed out on the command line when desired. Please see http://johnkerl.org/miller/doc/reference.html#put and http://johnkerl.org/miller/doc/reference.html#filter.
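To make the nest and repeat ideas concrete, here is a small Python sketch of the underlying record transformations. It is an illustration only, not Miller's implementation: the function names (explode_values, implode_values, repeat_by_count), the ";" separator, and the sample field names are all assumptions chosen for the example.

```python
from statistics import median, mode


def explode_values(record, field, sep=";"):
    """Unpack: emit one record per nested value found in record[field]."""
    return [{**record, field: value} for value in record[field].split(sep)]


def implode_values(records, field, sep=";"):
    """Repack: merge records that agree on every field other than `field`."""
    groups = {}
    for rec in records:
        key = tuple((k, v) for k, v in rec.items() if k != field)
        # Remember the first record of each group to preserve field order.
        first, values = groups.setdefault(key, (rec, []))
        values.append(rec[field])
    return [{**first, field: sep.join(values)} for first, values in groups.values()]


def repeat_by_count(records, count_field):
    """Expand each record into as many copies as its count field says."""
    out = []
    for rec in records:
        out.extend(dict(rec) for _ in range(int(rec[count_field])))
    return out


# Nested values are unpacked into separate records, then repacked.
nested = {"x": "a;b;c", "y": "1"}
flat = explode_values(nested, "x")
repacked = implode_values(flat, "x")

# Count-aggregated data are expanded back into individual samples,
# after which ordinary statistics can be computed on them.
aggregated = [{"value": "3", "count": "2"}, {"value": "7", "count": "3"}]
samples = [int(r["value"]) for r in repeat_by_count(aggregated, "count")]
sample_mode = mode(samples)
sample_median = median(samples)
```

Here flat holds three records, one per nested value, and repacked is the original single record again. The two aggregated rows expand to five samples, which is what makes the mode and median computable directly.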
Minor features:
- put/filter DSL string literals now may include \t, \", etc.: e.g. mlr put '$out = $left . "\t" . $right'
- There is now a typeof function for the put/filter DSLs: mlr put '$xtype = typeof($x)'. This is occasionally useful for debugging type-conversion questions.
- You may now do mlr --nr-progress-mod 1000000 ... to get something printed to stderr every 1000000th input record, and so on. For long-running aggregations on large input file(s), this can provide reassurance that processing is indeed proceeding apace. Example:
$ mlr --nr-progress-mod 100000 check data/big.dkvp
NR=100000 FNR=100000 FILENAME=data/big.dkvp
NR=200000 FNR=200000 FILENAME=data/big.dkvp
NR=300000 FNR=300000 FILENAME=data/big.dkvp
NR=400000 FNR=400000 FILENAME=data/big.dkvp
NR=500000 FNR=500000 FILENAME=data/big.dkvp
NR=600000 FNR=600000 FILENAME=data/big.dkvp
NR=700000 FNR=700000 FILENAME=data/big.dkvp
...
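The progress-reporting behavior shown above can be sketched in a few lines of Python. This is a hedged illustration of the idea, not Miller's code: the function name count_with_progress and its parameters are hypothetical, and it handles a single input stream, so NR and FNR are always equal here.

```python
import sys


def count_with_progress(records, progress_mod, filename="(stdin)", err=sys.stderr):
    """Consume records, writing a progress line every progress_mod-th record.

    Returns the total number of records seen.
    """
    nr = 0
    for nr, _record in enumerate(records, start=1):
        if nr % progress_mod == 0:
            # Progress goes to a separate stream so that it never mixes
            # with the actual output data.
            print(f"NR={nr} FNR={nr} FILENAME={filename}", file=err)
    return nr
```

Writing progress to stderr rather than stdout is the important design point: the output data can still be piped or redirected cleanly while the progress lines remain visible on the terminal.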
mlr cat -n had a bug wherein it counted zero-up while its documentation claimed it counted one-up. Now it counts one-up as documented.