Benchmarks concatenating a large number of CSV files using Awk (The One True Awk), csvtk, GoAWK, Miller, qsv, xsv, and a naive custom shell script.
- All of the scripts require GNU Bash.
- cat_csv_awk requires awk.
- cat_csv_csvtk requires csvtk.
- cat_csv_custom requires GNU Bash read, GNU cat, GNU tail, and GNU xargs.
- cat_csv_goawk requires goawk.
- cat_csv_mlr requires mlr.
- cat_csv_qsv requires qsv.
- cat_csv_xsv requires xsv.
- run_tests requires Hyperfine.
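As a rough illustration of what a naive shell concatenation can look like with the tools listed for cat_csv_custom, here is a minimal sketch. It is not the repository's script: the function name cat_csv_sketch and its single directory argument are hypothetical, and it assumes every file shares the same header row and that GNU tail and xargs are available.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of naive CSV concatenation; not the repository's cat_csv_custom.
cat_csv_sketch() {
  local data_path=$1 header
  # Glob expansion is sorted, so zero-padded file names stay in numeric order.
  local files=("$data_path"/*.csv)
  [ -e "${files[0]}" ] || return 1
  # Take the header row from the first file only.
  IFS= read -r header < "${files[0]}"
  printf '%s\n' "$header"
  # printf is a Bash builtin, so the long file list is not subject to the
  # exec argument-length limit; xargs then batches the names into tail calls.
  # tail -q suppresses per-file banners and -n +2 skips each file's header.
  printf '%s\0' "${files[@]}" | xargs -0 tail -q -n +2
}
```

Calling cat_csv_sketch data > combined.csv would write one header followed by the data rows of every file in data.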
The generate_test_data script creates FILE_COUNT CSV files in the directory DATA_PATH. Each file has a single column, with a header row (Column) and one data row holding the file number. For example:
$ ./generate_test_data
Usage: ./generate_test_data [DATA_PATH] [FILE_COUNT]
$ ./generate_test_data data 100000
$ find data -name '*.csv' | sort | head -n 5; echo "..."; find data -name '*.csv' | sort | tail -n 5
data/0000001.csv
data/0000002.csv
data/0000003.csv
data/0000004.csv
data/0000005.csv
...
data/0099996.csv
data/0099997.csv
data/0099998.csv
data/0099999.csv
data/0100000.csv
$ cat data/0000001.csv
Column
1
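The layout shown above (seven-digit, zero-padded file names, each file holding a Column header and the file number) can be reproduced with a short Bash function. This is only a sketch under those assumptions, not the repository's generate_test_data; the function name generate_test_data_sketch is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the repository's generate_test_data script.
generate_test_data_sketch() {
  local data_path=$1 file_count=$2 i file
  mkdir -p "$data_path"
  for ((i = 1; i <= file_count; i++)); do
    # Zero-pad the file number to seven digits, as in the listing above.
    printf -v file '%s/%07d.csv' "$data_path" "$i"
    # Each file gets a header row and a single data row with the file number.
    printf 'Column\n%d\n' "$i" > "$file"
  done
}
```

For instance, generate_test_data_sketch data 5 would create data/0000001.csv through data/0000005.csv.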