Benchmarks for the concatenation of a large quantity of CSV files using awk, csvtk, goawk, mlr, qsv, xsv, and a naive custom shell script.

derekmahar/benchmark_cat_csv

benchmark_cat_csv

Benchmarks the concatenation of a large number of CSV files using Awk (The One True Awk), csvtk, GoAWK, Miller, qsv, xsv, and a naive custom shell script.
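
For orientation, concatenating CSV files that share a header means emitting the header once and then the data rows of every file. The following is a minimal sketch of that idea, not the repository's cat_csv_custom script. The function name cat_csv_sketch is hypothetical, and the sketch assumes GNU find, sort, tail, and xargs, that every file has the same one-line header, and that every file ends with a newline.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; this is NOT the repository's cat_csv_custom script.
# Usage: cat_csv_sketch DATA_PATH
cat_csv_sketch() {
  local dir=$1 first
  # Take the header from the first file in sorted order.
  first=$(find "$dir" -name '*.csv' | sort | head -n 1)
  [ -n "$first" ] || return 1
  head -n 1 "$first"
  # Pass file names NUL-delimited so xargs can batch them safely;
  # tail -q -n +2 prints each file from its second line without name banners.
  find "$dir" -name '*.csv' -print0 | sort -z | xargs -0 tail -q -n +2
}
```

Batching through xargs keeps the command line within the system's argument-length limit, which matters once the file count reaches the hundreds of thousands.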

Dependencies

  • All of the scripts require GNU Bash.
  • cat_csv_awk requires awk.
  • cat_csv_csvtk requires csvtk.
  • cat_csv_custom requires GNU Bash read, GNU cat, GNU tail, and GNU xargs.
  • cat_csv_goawk requires goawk.
  • cat_csv_mlr requires mlr.
  • cat_csv_qsv requires qsv.
  • cat_csv_xsv requires xsv.
  • run_tests requires Hyperfine.

Generate Test Input CSV Files

The generate_test_data script creates FILE_COUNT CSV files in the directory DATA_PATH. Each file contains a header line (Column) followed by a single row holding the file number. For example:

$ ./generate_test_data
Usage: ./generate_test_data [DATA_PATH] [FILE_COUNT]
$ ./generate_test_data data 100000
$ find data -name '*.csv' | sort | head -n 5; echo "..."; find data -name '*.csv' | sort | tail -n 5
data/0000001.csv
data/0000002.csv
data/0000003.csv
data/0000004.csv
data/0000005.csv
...
data/0099996.csv
data/0099997.csv
data/0099998.csv
data/0099999.csv
data/0100000.csv
$ cat data/0000001.csv
Column
1
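
A generator with the behaviour shown above could be sketched as follows. This is an illustration under stated assumptions, not the repository's generate_test_data script: the function name gen_csv_files is hypothetical, and the seven-digit zero padding and the Column header are inferred from the example output.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; this is NOT the repository's generate_test_data script.
# Usage: gen_csv_files DATA_PATH FILE_COUNT
gen_csv_files() {
  local data_path=$1 file_count=$2 i name
  mkdir -p "$data_path" || return 1
  for ((i = 1; i <= file_count; i++)); do
    # Zero-pad the file number to seven digits, e.g. 0000001.csv.
    printf -v name '%s/%07d.csv' "$data_path" "$i"
    # Each file holds the header line and one row with the file number.
    printf 'Column\n%d\n' "$i" > "$name"
  done
}
```

Calling `gen_csv_files data 100000` would then populate the data directory. Using the Bash builtin `printf -v` avoids spawning a subshell for every file name.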
