
new commands: fix-quotes, del-quotes. #260
shenwei356 committed Nov 24, 2023
1 parent a7ff18b commit a913fad
Showing 11 changed files with 606 additions and 16 deletions.
17 changes: 11 additions & 6 deletions CHANGELOG.md
@@ -1,16 +1,21 @@
- [csvtk v0.28.1](https://github.com/shenwei356/csvtk/releases/tag/v0.28.1)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/csvtk/v0.28.1/total.svg)](https://github.com/shenwei356/csvtk/releases/tag/v0.28.1)
- [csvtk v0.29.0](https://github.com/shenwei356/csvtk/releases/tag/v0.29.0)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/csvtk/v0.29.0/total.svg)](https://github.com/shenwei356/csvtk/releases/tag/v0.29.0)
- new commands:
- [`fix-quotes`](https://bioinf.shenwei.me/csvtk/usage/#fix-quotes): fix malformed CSV/TSV caused by double-quotes. [#260](https://github.com/shenwei356/csvtk/issues/260)
- [`del-quotes`](https://bioinf.shenwei.me/csvtk/usage/#del-quotes): remove extra double-quotes added by `fix-quotes`.
- `csvtk del-header`:
- fix deleting headers of 2nd and later files. [#257](https://github.com/shenwei356/csvtk/issues/257)
- `csvtk concat`:
- fix panic when no data found.
- `csvtk sort`:
- support column name containing colons. [#254](https://github.com/shenwei356/csvtk/issues/254)
- `csvtk filter2`:
- update doc: add the `in` keyword. [#195](https://github.com/shenwei356/csvtk/pull/195)
- fix specifying the position for the new column containing only a constant string. [#252](https://github.com/shenwei356/csvtk/issues/252)
- `csvtk plot`:
- add a new flag `--tick-label-size`.
- `csvtk del-header`:
- fix deleting headers of 2nd and later files. [#257](https://github.com/shenwei356/csvtk/issues/257)
- `csvtk concat`:
- fix panic when no data found.
- `csvtk pretty`:
- replace tabs with spaces.
- [csvtk v0.28.0](https://github.com/shenwei356/csvtk/releases/tag/v0.28.0)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/csvtk/v0.28.0/total.svg)](https://github.com/shenwei356/csvtk/releases/tag/v0.28.0)
- `csvtk`:
7 changes: 5 additions & 2 deletions README.md
@@ -1,7 +1,8 @@
# csvtk - a cross-platform, efficient and practical CSV/TSV toolkit

- **Documents:** [http://bioinf.shenwei.me/csvtk](http://bioinf.shenwei.me/csvtk/)
( [**Usage**](http://bioinf.shenwei.me/csvtk/usage/) and [**Tutorial**](http://bioinf.shenwei.me/csvtk/tutorial/)). [中文介绍](http://bioinf.shenwei.me/csvtk/chinese)
( [**Usage**](http://bioinf.shenwei.me/csvtk/usage/), [**Tutorial**](http://bioinf.shenwei.me/csvtk/tutorial/) and [**FAQs**](http://bioinf.shenwei.me/csvtk/faq/)).
[中文介绍](http://bioinf.shenwei.me/csvtk/chinese)
- **Source code:** [https://github.com/shenwei356/csvtk](https://github.com/shenwei356/csvtk) [![GitHub stars](https://img.shields.io/github/stars/shenwei356/csvtk.svg?style=social&label=Star&?maxAge=2592000)](https://github.com/shenwei356/csvtk)
[![license](https://img.shields.io/github/license/shenwei356/csvtk.svg?maxAge=2592000)](https://github.com/shenwei356/csvtk/blob/master/LICENSE)
- **Latest version:** [![Latest Stable Version](https://img.shields.io/github/release/shenwei356/csvtk.svg?style=flat)](https://github.com/shenwei356/csvtk/releases)
@@ -63,7 +64,7 @@ It could save you lots of time in (not) writing Python/R scripts.

## Subcommands

51 subcommands in total.
53 subcommands in total.

**Information**

@@ -108,6 +109,8 @@ It could save you lots of time in (not) writing Python/R scripts.
**Edit**

- [`fix`](https://bioinf.shenwei.me/csvtk/usage/#fix): fix CSV/TSV with different numbers of columns in rows
- [`fix-quotes`](https://bioinf.shenwei.me/csvtk/usage/#fix-quotes): fix malformed CSV/TSV caused by double-quotes
- [`del-quotes`](https://bioinf.shenwei.me/csvtk/usage/#del-quotes): remove extra double-quotes added by `fix-quotes`
- [`add-header`](https://bioinf.shenwei.me/csvtk/usage/#add-header): add column names
- [`del-header`](https://bioinf.shenwei.me/csvtk/usage/#del-header): delete column names
- [`rename`](https://bioinf.shenwei.me/csvtk/usage/#rename): renames column names with new names
125 changes: 125 additions & 0 deletions csvtk/cmd/del-quotes.go
@@ -0,0 +1,125 @@
// Copyright © 2016-2023 Wei Shen <[email protected]>
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

package cmd

import (
	"fmt"
	"runtime"
	"strings"
	"unicode"
	"unicode/utf8"

	"github.com/shenwei356/xopen"
	"github.com/spf13/cobra"
)

// delQuotesCmd represents the del-quotes command
var delQuotesCmd = &cobra.Command{
	Use:   "del-quotes",
	Short: "remove extra double quotes added by 'fix-quotes'",
	Long: `remove extra double quotes added by 'fix-quotes'
Limitation:
1. Values containing line breaks are not supported.
`,
	Run: func(cmd *cobra.Command, args []string) {
		config := getConfigs(cmd)
		files := getFileListFromArgsAndFile(cmd, args, true, "infile-list", true)
		if len(files) > 1 {
			checkError(fmt.Errorf("no more than one file should be given"))
		}
		runtime.GOMAXPROCS(config.NumCPUs)

		outfh, err := xopen.Wopen(config.OutFile)
		checkError(err)
		defer outfh.Close()

		if config.Tabs {
			config.Delimiter = '\t'
		}

		file := files[0]
		csvReader, err := newCSVReaderByConfig(config, file)
		if err != nil {
			if err == xopen.ErrNoContent {
				log.Warningf("csvtk del-quotes: skipping empty input file: %s", file)
				return
			}
			checkError(err)
		}

		csvReader.Read(ReadOption{
			FieldStr:      "1-",
			ShowRowNumber: config.ShowRowNumber,
		})

		d := string(config.Delimiter)
		var i int
		var v string
		for record := range csvReader.Ch {
			if record.Err != nil {
				checkError(record.Err)
			}
			for i, v = range record.Selected {
				// re-add quotes only to values that contain the delimiter
				// if fieldNeedsQuotes(v, config.Delimiter) {
				if strings.Contains(v, d) {
					record.Selected[i] = `"` + v + `"`
				}
			}
			outfh.WriteString(strings.Join(record.Selected, d))
			outfh.WriteByte('\n')
		}

		readerReport(&config, csvReader, file)
	},
}

func init() {
	RootCmd.AddCommand(delQuotesCmd)
}

// copied from https://cs.opensource.google/go/go/+/refs/tags/go1.21.4:src/encoding/csv/writer.go;l=157
func fieldNeedsQuotes(field string, comma rune) bool {
	if field == "" {
		return false
	}

	if field == `\.` {
		return true
	}

	if comma < utf8.RuneSelf {
		for i := 0; i < len(field); i++ {
			c := field[i]
			if c == '\n' || c == '\r' || c == '"' || c == byte(comma) {
				return true
			}
		}
	} else {
		if strings.ContainsRune(field, comma) || strings.ContainsAny(field, "\"\r\n") {
			return true
		}
	}

	r1, _ := utf8.DecodeRuneInString(field)
	return unicode.IsSpace(r1)
}
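The quoting rule in the `del-quotes` loop above can be reproduced outside csvtk with the standard `encoding/csv` package. This is a minimal sketch, assuming the input is fully quoted output such as `fix-quotes` might produce; `requote` is a hypothetical helper name, not part of csvtk. It also shows why the help text lists line breaks as a limitation: the rule checks only for the delimiter, unlike `fieldNeedsQuotes`, so a value containing `\n` is written unquoted.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// requote (hypothetical) joins one record with delim and wraps a field in
// double quotes only when it contains the delimiter, mirroring the del-quotes
// loop. Embedded quotes are not doubled and line breaks are not checked.
func requote(fields []string, delim rune) string {
	d := string(delim)
	out := make([]string, len(fields))
	for i, v := range fields {
		if strings.Contains(v, d) {
			out[i] = `"` + v + `"`
		} else {
			out[i] = v
		}
	}
	return strings.Join(out, d)
}

func main() {
	// Assumed input: every field wrapped in double quotes.
	in := "\"id\",\"name\"\n\"1\",\"Smith, J\"\n"
	records, err := csv.NewReader(strings.NewReader(in)).ReadAll()
	if err != nil {
		panic(err)
	}
	for _, rec := range records {
		fmt.Println(requote(rec, ','))
	}
	// prints:
	// id,name
	// 1,"Smith, J"

	// A value with a line break stays unquoted and would split the row.
	fmt.Printf("%q\n", requote([]string{"x", "a\nb"}, ',')) // "x,a\nb"
}
```

Swapping `strings.Contains` for a `fieldNeedsQuotes`-style check would also quote values with line breaks, quotes, or leading spaces. Correct output would then also require doubling embedded quotes, as `encoding/csv`'s writer does.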