diff --git a/docs/src/data-diving-examples.md b/docs/src/data-diving-examples.md
index 4a62754030..39738f193d 100644
--- a/docs/src/data-diving-examples.md
+++ b/docs/src/data-diving-examples.md
@@ -271,19 +271,19 @@ The histogram shows the different distribution of 0/1 flags:
mlr --opprint histogram -f flag,u,v --lo -0.1 --hi 1.1 --nbins 12 data/colored-shapes.dkvp
-bin_lo bin_hi flag_count u_count v_count
--0.010000000000000002 0.09000000000000002 6058 0 36
-0.09000000000000002 0.19000000000000003 0 1062 988
-0.19000000000000003 0.29000000000000004 0 985 1003
-0.29000000000000004 0.39000000000000007 0 1024 1014
-0.39000000000000007 0.4900000000000001 0 1002 991
-0.4900000000000001 0.5900000000000002 0 989 1041
-0.5900000000000002 0.6900000000000002 0 1001 1016
-0.6900000000000002 0.7900000000000001 0 972 962
-0.7900000000000001 0.8900000000000002 0 1035 1070
-0.8900000000000002 0.9900000000000002 0 995 993
-0.9900000000000002 1.0900000000000003 4020 1013 939
-1.0900000000000003 1.1900000000000002 0 0 25
+bin_lo bin_hi flag_count u_count v_count
+-0.1 0.000000000000000013877787807814457 6058 0 36
+0.000000000000000013877787807814457 0.10000000000000003 0 1062 988
+0.10000000000000003 0.20000000000000004 0 985 1003
+0.20000000000000004 0.30000000000000004 0 1024 1014
+0.30000000000000004 0.40000000000000013 0 1002 991
+0.40000000000000013 0.5000000000000001 0 989 1041
+0.5000000000000001 0.6000000000000002 0 1001 1016
+0.6000000000000002 0.7000000000000002 0 972 962
+0.7000000000000002 0.8000000000000002 0 1035 1070
+0.8000000000000002 0.9000000000000002 0 995 993
+0.9000000000000002 1 4020 1013 939
+1 1.1 0 0 25
Look at univariate stats by color and shape. In particular, color-dependent flag probabilities pop out, aligning with their original Bernoulli probabilities from the data-generator script:
diff --git a/docs/src/manpage.md b/docs/src/manpage.md
index 8b0683d39c..70bb446bca 100644
--- a/docs/src/manpage.md
+++ b/docs/src/manpage.md
@@ -50,7 +50,7 @@ MILLER(1) MILLER(1)
insertion-ordered hash map. This encompasses a variety of data formats, including but not limited to the familiar CSV, TSV, and JSON. (Miller can handle positionally-indexed data as a special case.) This
- manpage documents mlr 6.5.0-dev.
+ manpage documents mlr 6.6.0.
1mEXAMPLES0m
mlr --icsv --opprint cat example.csv
@@ -197,7 +197,7 @@ MILLER(1) MILLER(1)
most-frequent nest nothing put regularize remove-empty-columns rename reorder repeat reshape sample sec2gmtdate sec2gmt seqgen shuffle skip-trivial-records sort sort-within-records split stats1 stats2 step summary tac tail tee
- template top utf8-to-latin1 unflatten uniq unsparsify
+ template top utf8-to-latin1 unflatten uniq unspace unsparsify
1mFUNCTION LIST0m
abs acos acosh any append apply arrayify asin asinh asserting_absent
@@ -2080,6 +2080,15 @@ MILLER(1) MILLER(1)
With -n, produces only one record which is the unique-record count. With neither -c nor -n, produces unique records.
+ 1munspace0m
+ Usage: mlr unspace [options]
+ Replaces spaces in record keys and/or values with _. This is helpful for PPRINT output.
+ Options:
+ -f {x} Replace spaces with specified filler character.
+ -k Unspace only keys, not keys and values.
+ -v Unspace only values, not keys and values.
+ -h|--help Show this message.
+
1munsparsify0m
Usage: mlr unsparsify [options]
Prints records with the union of field names over all input records.
@@ -3135,7 +3144,7 @@ MILLER(1) MILLER(1)
int: declares an integer local variable in the current curly-braced scope. Type-checking happens at assignment: 'int x = 0.0' is an error.
- map
+ 1mmap0m
map: declares a map-valued local variable in the current curly-braced scope. Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is always OK. map b = a is OK or not depending on whether a is a map.
@@ -3288,5 +3297,5 @@ MILLER(1) MILLER(1)
- 2022-12-05 MILLER(1)
+ 2023-01-01 MILLER(1)
diff --git a/docs/src/manpage.txt b/docs/src/manpage.txt
index e0e99eb7d5..224be25208 100644
--- a/docs/src/manpage.txt
+++ b/docs/src/manpage.txt
@@ -29,7 +29,7 @@ MILLER(1) MILLER(1)
insertion-ordered hash map. This encompasses a variety of data formats, including but not limited to the familiar CSV, TSV, and JSON.
(Miller can handle positionally-indexed data as a special case.) This
- manpage documents mlr 6.5.0-dev.
+ manpage documents mlr 6.6.0.
1mEXAMPLES0m
mlr --icsv --opprint cat example.csv
@@ -176,7 +176,7 @@ MILLER(1) MILLER(1)
most-frequent nest nothing put regularize remove-empty-columns rename reorder repeat reshape sample sec2gmtdate sec2gmt seqgen shuffle skip-trivial-records sort sort-within-records split stats1 stats2 step summary tac tail tee
- template top utf8-to-latin1 unflatten uniq unsparsify
+ template top utf8-to-latin1 unflatten uniq unspace unsparsify
1mFUNCTION LIST0m
abs acos acosh any append apply arrayify asin asinh asserting_absent
@@ -2059,6 +2059,15 @@ MILLER(1) MILLER(1)
With -n, produces only one record which is the unique-record count. With neither -c nor -n, produces unique records.
+ 1munspace0m
+ Usage: mlr unspace [options]
+ Replaces spaces in record keys and/or values with _. This is helpful for PPRINT output.
+ Options:
+ -f {x} Replace spaces with specified filler character.
+ -k Unspace only keys, not keys and values.
+ -v Unspace only values, not keys and values.
+ -h|--help Show this message.
+
1munsparsify0m
Usage: mlr unsparsify [options]
Prints records with the union of field names over all input records.
@@ -3114,7 +3123,7 @@ MILLER(1) MILLER(1)
int: declares an integer local variable in the current curly-braced scope. Type-checking happens at assignment: 'int x = 0.0' is an error.
- map
+ 1mmap0m
map: declares a map-valued local variable in the current curly-braced scope. Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is always OK. map b = a is OK or not depending on whether a is a map.
@@ -3267,4 +3276,4 @@ MILLER(1) MILLER(1)
- 2022-12-05 MILLER(1)
+ 2023-01-01 MILLER(1)
diff --git a/docs/src/operating-on-all-fields.md b/docs/src/operating-on-all-fields.md
index 452f4486d9..476b685dd8 100644
--- a/docs/src/operating-on-all-fields.md
+++ b/docs/src/operating-on-all-fields.md
@@ -24,10 +24,9 @@ Suppose you want to replace spaces with underscores in your column names:
cat data/spaces.csv
-a b c,def,g h i
-123,4567,890
-2468,1357,3579
-9987,3312,4543
+column 1,column 2,column 3
+apple,ball,cat
+dale egg,fish,gale
The simplest way is to use `mlr rename` with `-g` (for global replace, not just first occurrence of space within each field) and `-r` for pattern-matching (rather than explicit single-column renames):
@@ -36,20 +35,18 @@ The simplest way is to use `mlr rename` with `-g` (for global replace, not just
mlr --csv rename -g -r ' ,_' data/spaces.csv
-a_b_c,def,g_h_i
-123,4567,890
-2468,1357,3579
-9987,3312,4543
+column_1,column_2,column_3
+apple,ball,cat
+dale egg,fish,gale
mlr --csv --opprint rename -g -r ' ,_' data/spaces.csv
-a_b_c def g_h_i
-123 4567 890
-2468 1357 3579
-9987 3312 4543
+column_1 column_2 column_3
+apple ball cat
+dale egg fish gale
You can also do this with a for-loop:
@@ -69,10 +66,9 @@ $* = newrec
mlr --icsv --opprint put -f data/bulk-rename-for-loop.mlr data/spaces.csv
-a_b_c def g_h_i
-123 4567 890
-2468 1357 3579
-9987 3312 4543
+column_1 column_2 column_3
+apple ball cat
+dale egg fish gale
## Bulk rename of fields with carriage returns
diff --git a/docs/src/reference-verbs.md b/docs/src/reference-verbs.md
index 1bbeb2e703..5fca68607f 100644
--- a/docs/src/reference-verbs.md
+++ b/docs/src/reference-verbs.md
@@ -4099,7 +4099,7 @@ The primary use-case is for PPRINT output, which is space-delimited. For example
cat data/spaces.csv
-column 1, column 2, column 3
+column 1,column 2,column 3
apple,ball,cat
dale egg,fish,gale
@@ -4108,40 +4108,40 @@ dale egg,fish,gale
mlr --icsv --opprint cat data/spaces.csv
-column 1 column 2 column 3
-apple ball cat
-dale egg fish gale
+column 1 column 2 column 3
+apple ball cat
+dale egg fish gale
mlr --icsv --opprint cat data/spaces.csv
-column 1 column 2 column 3
-apple ball cat
-dale egg fish gale
+column 1 column 2 column 3
+apple ball cat
+dale egg fish gale
mlr --icsv --opprint unspace data/spaces.csv
-column_1 _column_2 _column_3
-apple ball cat
-dale_egg fish gale
+column_1 column_2 column_3
+apple ball cat
+dale_egg fish gale
mlr --icsv --opprint unspace data/spaces.csv | mlr --ipprint --oxtab cat
-column_1 apple
-_column_2 ball
-_column_3 cat
+column_1 apple
+column_2 ball
+column_3 cat
-column_1 dale_egg
-_column_2 fish
-_column_3 gale
+column_1 dale_egg
+column_2 fish
+column_3 gale
## unsparsify
diff --git a/docs/src/spaces.csv b/docs/src/spaces.csv
index 6fc75cea31..50c2f89d06 100644
--- a/docs/src/spaces.csv
+++ b/docs/src/spaces.csv
@@ -3,4 +3,3 @@ Zone,Total MWh
17,39.8
24,7.4
30,50.5
-
diff --git a/internal/pkg/go-csv/csv_reader.go b/internal/pkg/go-csv/csv_reader.go
index 708e62fbde..507e9a94ca 100644
--- a/internal/pkg/go-csv/csv_reader.go
+++ b/internal/pkg/go-csv/csv_reader.go
@@ -473,4 +473,3 @@ parseField:
}
return dst, err
}
-
diff --git a/internal/pkg/go-csv/csv_writer.go b/internal/pkg/go-csv/csv_writer.go
index 4f352e68d8..ac64b4d54c 100644
--- a/internal/pkg/go-csv/csv_writer.go
+++ b/internal/pkg/go-csv/csv_writer.go
@@ -179,4 +179,3 @@ func (w *Writer) fieldNeedsQuotes(field string) bool {
r1, _ := utf8.DecodeRuneInString(field)
return unicode.IsSpace(r1)
}
-
diff --git a/internal/pkg/version/version.go b/internal/pkg/version/version.go
index 7f08f9bca9..96afe00cc0 100644
--- a/internal/pkg/version/version.go
+++ b/internal/pkg/version/version.go
@@ -4,4 +4,4 @@ package version
// Nominally things like "6.0.0" for a release, then "6.0.0-dev" in between.
// This makes it clear that a given build is on the main dev branch, not a
// particular snapshot tag.
-var STRING string = "6.5.0-dev"
+var STRING string = "6.6.0"
diff --git a/man/manpage.txt b/man/manpage.txt
index e0e99eb7d5..224be25208 100644
--- a/man/manpage.txt
+++ b/man/manpage.txt
@@ -29,7 +29,7 @@ MILLER(1) MILLER(1)
insertion-ordered hash map. This encompasses a variety of data formats, including but not limited to the familiar CSV, TSV, and JSON. (Miller can handle positionally-indexed data as a special case.) This
- manpage documents mlr 6.5.0-dev.
+ manpage documents mlr 6.6.0.
1mEXAMPLES0m
mlr --icsv --opprint cat example.csv
@@ -176,7 +176,7 @@ MILLER(1) MILLER(1)
most-frequent nest nothing put regularize remove-empty-columns rename reorder repeat reshape sample sec2gmtdate sec2gmt seqgen shuffle skip-trivial-records sort sort-within-records split stats1 stats2 step summary tac tail tee
- template top utf8-to-latin1 unflatten uniq unsparsify
+ template top utf8-to-latin1 unflatten uniq unspace unsparsify
1mFUNCTION LIST0m
abs acos acosh any append apply arrayify asin asinh asserting_absent
@@ -2059,6 +2059,15 @@ MILLER(1) MILLER(1)
With -n, produces only one record which is the unique-record count. With neither -c nor -n, produces unique records.
+ 1munspace0m
+ Usage: mlr unspace [options]
+ Replaces spaces in record keys and/or values with _. This is helpful for PPRINT output.
+ Options:
+ -f {x} Replace spaces with specified filler character.
+ -k Unspace only keys, not keys and values.
+ -v Unspace only values, not keys and values.
+ -h|--help Show this message.
+
1munsparsify0m
Usage: mlr unsparsify [options]
Prints records with the union of field names over all input records.
@@ -3114,7 +3123,7 @@ MILLER(1) MILLER(1)
int: declares an integer local variable in the current curly-braced scope. Type-checking happens at assignment: 'int x = 0.0' is an error.
- map
+ 1mmap0m
map: declares a map-valued local variable in the current curly-braced scope. Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is always OK. map b = a is OK or not depending on whether a is a map.
@@ -3267,4 +3276,4 @@ MILLER(1) MILLER(1)
- 2022-12-05 MILLER(1)
+ 2023-01-01 MILLER(1)
diff --git a/man/mlr.1 b/man/mlr.1
index 4711b829f1..99c5f4aac0 100644
--- a/man/mlr.1
+++ b/man/mlr.1
@@ -2,12 +2,12 @@
.\" Title: mlr
.\" Author: [see the "AUTHOR" section]
.\" Generator: ./mkman.rb
-.\" Date: 2022-12-05
+.\" Date: 2023-01-01
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
-.TH "MILLER" "1" "2022-12-05" "\ \&" "\ \&"
+.TH "MILLER" "1" "2023-01-01" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Portability definitions
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -47,7 +47,7 @@ on integer-indexed fields: if the natural data structure for the latter is the array, then Miller's natural data structure is the insertion-ordered hash map. This encompasses a variety of data formats, including but not limited to the familiar CSV, TSV, and JSON. (Miller can handle positionally-indexed data as
-a special case.) This manpage documents mlr 6.5.0-dev.
+a special case.) This manpage documents mlr 6.6.0.
.SH "EXAMPLES"
.sp
@@ -217,7 +217,7 @@ json-stringify join label latin1-to-utf8 least-frequent merge-fields most-frequent nest nothing put regularize remove-empty-columns rename reorder repeat reshape sample sec2gmtdate sec2gmt seqgen shuffle skip-trivial-records sort sort-within-records split stats1 stats2 step summary tac tail tee
-template top utf8-to-latin1 unflatten uniq unsparsify
+template top utf8-to-latin1 unflatten uniq unspace unsparsify
.fi
.if n \{\
.RE
@@ -2604,6 +2604,21 @@ Options:
.fi
.if n \{\
.RE
+.SS "unspace"
+.if n \{\
+.RS 0
+.\}
+.nf
+Usage: mlr unspace [options]
+Replaces spaces in record keys and/or values with _. This is helpful for PPRINT output.
+Options:
+-f {x} Replace spaces with specified filler character.
+-k Unspace only keys, not keys and values.
+-v Unspace only values, not keys and values.
+-h|--help Show this message.
+.fi
+.if n \{\
+.RE
.SS "unsparsify"
.if n \{\
.RS 0
diff --git a/miller.spec b/miller.spec
index 9dc84b0e9f..c396bcc38a 100644
--- a/miller.spec
+++ b/miller.spec
@@ -1,6 +1,6 @@
Summary: Name-indexed data processing tool
Name: miller
-Version: 6.5.0
+Version: 6.6.0
Release: 1%{?dist}
License: BSD
Source: https://github.com/johnkerl/miller/releases/download/%{version}/miller-%{version}.tar.gz
@@ -36,6 +36,9 @@ make install
%{_mandir}/man1/mlr.1*
%changelog
+* Sun Jan 1 2023 John Kerl