diff --git a/README.md b/README.md
index 33ccced..687fa92 100644
--- a/README.md
+++ b/README.md
@@ -248,16 +248,36 @@ More [examples](http://shenwei356.github.io/csvtk/usage/) and [tutorial](http://
 
 **Attention**
 
-1. The CSV parser requires all the lines have same number of fields/columns.
-    Even lines with spaces will cause error.
-    Use '-I/--ignore-illegal-row' to skip these lines if neccessary.
-2. By default, csvtk thinks your files have header row, if not, switch flag `-H` on.
+1. By default, csvtk assumes input files have a header row; if not, switch on flag `-H`.
+2. By default, csvtk handles CSV files; use flag `-t` for tab-delimited files.
 3. Column names better be unique.
 4. By default, lines starting with `#` will be ignored, if the header row
-    starts with `#`, please assign flag `-C` another rare symbol, e.g. `'$'`.
-5. By default, csvtk handles CSV files, use flag `-t` for tab-delimited files.
-6. If `"` exists in tab-delimited files, use flag `-l`.
-7. Do not mix use field (column) numbers and names.
+    starts with `#`, please assign another rare symbol to flag `-C`, e.g. `$`.
+5. Do not mix field (column) numbers and names when specifying the columns to operate on.
+6. The CSV parser requires all lines to have the same number of fields/columns.
+    Even lines containing only spaces will cause an error.
+    Use `-I/--ignore-illegal-row` to skip these lines if necessary.
+    You can also use `csvtk fix` to fix files with different numbers of columns in rows.
+7. If double quotes appear in fields that are not enclosed in double quotes, e.g.,
+
+        x,a "b" c,1
+
+    It would report the error:
+
+        bare " in non-quoted-field
+
+    Please switch on the flag `-l` or use `csvtk fix-quotes` to fix it.
+
+8. If some fields have a double quote at only one end (the beginning or the end), e.g.,
+
+        x,d "e","a" b c,1
+
+    It would report the error:
+
+        extraneous or missing " in quoted-field
+
+    Please use `csvtk fix-quotes` to fix it, and use `csvtk del-quotes` to restore the
+    original format as needed.
 
 Examples
 
diff --git a/csvtk/cmd/fix-quotes.go b/csvtk/cmd/fix-quotes.go
index 2a5c532..cfb070d 100644
--- a/csvtk/cmd/fix-quotes.go
+++ b/csvtk/cmd/fix-quotes.go
@@ -100,12 +100,13 @@ Limitation:
 		var iLine int
 		var reQuotedDelimiter = regexp.MustCompile(fmt.Sprintf(`(^|%c)".*%c.*"($|%c)`, d, d, d))
 		var hasQuotedDelimiter bool
+		var commentChar = byte(config.CommentChar)
 		for scanner.Scan() {
 			iLine++
 			line = scanner.Text()
 
 			hasQuotedDelimiter = reQuotedDelimiter.MatchString(line)
-			if len(line) == 0 || line[0] == byte(config.CommentChar) {
+			if len(line) == 0 || line[0] == commentChar {
 				outfh.WriteString(line)
 				outfh.WriteByte('\n')
 
diff --git a/csvtk/cmd/root.go b/csvtk/cmd/root.go
index 3aed3a8..1b905d2 100644
--- a/csvtk/cmd/root.go
+++ b/csvtk/cmd/root.go
@@ -43,17 +43,27 @@ Source code: https://github.com/shenwei356/csvtk
 
 Attention:
 
-  1. The CSV parser requires all the lines have same number of fields/columns.
-     Even lines with spaces will cause error.
-     Use '-I/--ignore-illegal-row' to skip these lines if neccessary.
-     You can also use 'csvtk fix' to fix files with different numbers of columns in rows.
-  2. By default, csvtk thinks your files have header row, if not, switch flag "-H" on.
+  1. By default, csvtk assumes input files have a header row; if not, switch on flag "-H".
+  2. By default, csvtk handles CSV files; use flag "-t" for tab-delimited files.
   3. Column names better be unique.
   4. By default, lines starting with "#" will be ignored, if the header row
      starts with "#", please assign flag "-C" another rare symbol, e.g. '$'.
-  5. By default, csvtk handles CSV files, use flag "-t" for tab-delimited files.
-  6. If double quotes exist in fields, use flag "-l".
-  7. Do not mix use field (column) numbers and names.
+  5. Do not mix field (column) numbers and names when specifying the columns to operate on.
+  6. The CSV parser requires all lines to have the same number of fields/columns.
+     Even lines containing only spaces will cause an error.
+     Use "-I/--ignore-illegal-row" to skip these lines if necessary.
+     You can also use "csvtk fix" to fix files with different numbers of columns in rows.
+  7. If double quotes appear in fields that are not enclosed in double quotes, e.g.,
+         x,a "b" c,1
+     It would report the error:
+         bare " in non-quoted-field
+     Please switch on the flag "-l" or use "csvtk fix-quotes" to fix it.
+  8. If some fields have a double quote at only one end (the beginning or the end), e.g.,
+         x,d "e","a" b c,1
+     It would report the error:
+         extraneous or missing " in quoted-field
+     Please use "csvtk fix-quotes" to fix it, and use "csvtk del-quotes" to restore the
+     original format as needed.
 
 Environment variables for frequently used global flags:
 
diff --git a/doc/docs/usage.md b/doc/docs/usage.md
index ecf02a4..4a85a9a 100644
--- a/doc/docs/usage.md
+++ b/doc/docs/usage.md
@@ -4,17 +4,36 @@
 
 **Attention**
 
-1. The CSV parser requires all the lines have same number of fields/columns.
-    Even lines with spaces will cause error.
-    Use '-I/--ignore-illegal-row' to skip these lines if neccessary.
-    You can also use 'csvtk fix' to fix files with different numbers of columns in rows.
-2. By default, csvtk thinks your files have header row, if not, switch flag "-H" on.
+1. By default, csvtk assumes input files have a header row; if not, switch on flag `-H`.
+2. By default, csvtk handles CSV files; use flag `-t` for tab-delimited files.
 3. Column names better be unique.
-4. By default, lines starting with "#" will be ignored, if the header row
-    starts with "#", please assign flag "-C" another rare symbol, e.g. '$'.
-5. By default, csvtk handles CSV files, use flag "-t" for tab-delimited files.
-6. If double quotes exist in fields, use flag "-l".
-7. Do not mix use field (column) numbers and names.
+4. By default, lines starting with `#` will be ignored; if the header row
+    starts with `#`, please assign another rare symbol to flag `-C`, e.g. `$`.
+5. Do not mix field (column) numbers and names when specifying the columns to operate on.
+6. The CSV parser requires all lines to have the same number of fields/columns.
+    Even lines containing only spaces will cause an error.
+    Use `-I/--ignore-illegal-row` to skip these lines if necessary.
+    You can also use `csvtk fix` to fix files with different numbers of columns in rows.
+7. If double quotes appear in fields that are not enclosed in double quotes, e.g.,
+
+        x,a "b" c,1
+
+    It would report the error:
+
+        bare " in non-quoted-field
+
+    Please switch on the flag `-l` or use `csvtk fix-quotes` to fix it.
+
+8. If some fields have a double quote at only one end (the beginning or the end), e.g.,
+
+        x,d "e","a" b c,1
+
+    It would report the error:
+
+        extraneous or missing " in quoted-field
+
+    Please use `csvtk fix-quotes` to fix it, and use `csvtk del-quotes` to restore the
+    original format as needed.
 
 
 
@@ -109,17 +128,27 @@ Source code: https://github.com/shenwei356/csvtk
 
 Attention:
 
-  1. The CSV parser requires all the lines have same number of fields/columns.
-     Even lines with spaces will cause error.
-     Use '-I/--ignore-illegal-row' to skip these lines if neccessary.
-     You can also use 'csvtk fix' to fix files with different numbers of columns in rows.
-  2. By default, csvtk thinks your files have header row, if not, switch flag "-H" on.
+  1. By default, csvtk assumes input files have a header row; if not, switch on flag "-H".
+  2. By default, csvtk handles CSV files; use flag "-t" for tab-delimited files.
   3. Column names better be unique.
   4. By default, lines starting with "#" will be ignored, if the header row
      starts with "#", please assign flag "-C" another rare symbol, e.g. '$'.
-  5. By default, csvtk handles CSV files, use flag "-t" for tab-delimited files.
-  6. If double quotes exist in fields, use flag "-l".
-  7. Do not mix use field (column) numbers and names.
+  5. Do not mix field (column) numbers and names when specifying the columns to operate on.
+  6. The CSV parser requires all lines to have the same number of fields/columns.
+     Even lines containing only spaces will cause an error.
+     Use "-I/--ignore-illegal-row" to skip these lines if necessary.
+     You can also use "csvtk fix" to fix files with different numbers of columns in rows.
+  7. If double quotes appear in fields that are not enclosed in double quotes, e.g.,
+         x,a "b" c,1
+     It would report the error:
+         bare " in non-quoted-field
+     Please switch on the flag "-l" or use "csvtk fix-quotes" to fix it.
+  8. If some fields have a double quote at only one end (the beginning or the end), e.g.,
+         x,d "e","a" b c,1
+     It would report the error:
+         extraneous or missing " in quoted-field
+     Please use "csvtk fix-quotes" to fix it, and use "csvtk del-quotes" to restore the
+     original format as needed.
 
 Environment variables for frequently used global flags: