Use SQL to query your CSV file, and return a new CSV.
- SQlite 3.6.16 or newer.
Add this line to your application's Gemfile:
gem 'csvsql'
And then execute:
$ bundle
Or install it yourself as:
$ gem install csvsql
name,total:int,created_at:datetime
a,12,2018-1-1 11:22
b,21,2018-3-1 10:20
c,39,2018-1-19 20:10
The first line is title for each columns. We use :
split the title name and column type. the column type is varchar(255)
if you did give a type.
There are same types: int
, integer
float
, double
, date
, datetime
, varchar(255)
.
All sqlite3 type is supported, but maybe some type cannot convert to a sqlite3 value type.
# csvsql [options] SQL
csvsql -h # Get help
$ csvsql -i mydata.csv "select * from csv where created_at > '2018-1-1' and total > 20" > /tmp/result.csv
# name:varchar(255),total:int,created_at:datetime
# b,21,2018-3-1 10:20:00
# c,39,2018-1-19 20:10:00
csvsql -i mydata.csv "select count(*) as total_record from csv where created_at > '2018-1-1'" > /tmp/result.csv
# total_record:integer
# 2
csvsql -i mydata.csv "select name, total from csv where total < 30" > /tmp/result.csv
# name:varchar(255),total:integer
# a,12
# b,21
If not give a csv by -i
, we will get the input from stdin.
csvsql -i mydata.csv "select name, total from csv where total < 30" | csvsql "select count(*) as count from csv"
# count:integer
# 2
It will save the parsed data to ~/.csvsql_cache
folder. it will not parse data again next time if the data file didn't change
csvsql -i large.csv -c "select count(*) from csv"
# the second, it will be fast.
csvsql -i large.csv -c "select count(*) from csv"
For multiple files, we should name each files. This name will be a table name
.
csvsql -i users.csv:users -i posts.csv:posts "select * from posts join users on posts.user_id = users.id where users.role = 'guest'"
With cache, those name will be a database name
, the table name is csv
. If a csv file was updated, its db will automatically update. Other files still use cache.
csvsql -i users.csv:users -i posts.csv:posts -c "select * from posts.csv join users.csv on posts.csv.user_id = users.csv.id where users.csv.role = 'guest'"
This command will remove all data in the ~/.csvsql_cache
csvsql --clear-cache
Data
title,desc,created_at:datetime
1asdfklajskdjfk alksd flka sdfkja sldfk ,lka sdfa sdlkfj alkr jl2kjlkajslkdfjak,2017-10-20 20:20
10,000 lines
$ time csvsql -i /tmp/a.csv "select count(*) from csv"
output
count(*)
100000
real 0m5.070s
user 0m4.776s
sys 0m0.256s
10,000 lines with cache
$ time csvsql -c -i /tmp/a.csv "select count(*) from csv"
Output
count(*)
100000
real 0m4.677s
user 0m4.309s
sys 0m0.253s
Second output
$ time csvsql -c -i /tmp/a.csv "select count(*) from csv"
count(*)
100000
real 0m0.502s
user 0m0.327s
sys 0m0.151s
- 1,000,000 lines
- 1,000,000 lines with cache
- Output columns have no a date type.
Because the SQLite3 version too old. Upgrate it.
- Make sure your code has some testing.
- Run
rubocop -a
before commit your code.
Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/csvsql.
The gem is available as open source under the terms of the MIT License. /