Skip to content

Examples

ccoffey edited this page Jun 26, 2011 · 1 revision

Examples

For clarity's sake, each section of examples will contain a data set, some example queries and the output of each query.

data_set_0

#Import sql4csv and create a queryable data set from a .csv file.
from novacode import sql4csv
ds_0 = sql4csv('ds_0.csv')

#Extract two columns from the data set.
ds_0.query('select $fname, $lname')

data_set_0_query_0


#Extract two columns from the data set but this time rename them.
ds_0.query('select $fname as $first_name, $lname as $last_name')

data_set_0_query_1


#Combine two columns from the data set.
ds_0.query('select #0($fname, $lname) as $fullname', [lambda a, b: '%s %s' % (a, b)])

data_set_0_query_2


sql4csv is very flexible, each of the following queries produces the same result. You choose how you wish to write your queries, sql4csv won't impose too many restrictions on you.

#Extract two columns and their combination from the data set.
ds_0.query('select $fname, $lname, #0($fname, $lname) as $fullname', [lambda a, b: '%s %s' % (a, b)])
ds_0.query('select #0($fname as $fname, $lname as $lname) as $fullname', [lambda a, b: '%s %s' % (a, b)])

data_set_0_query_3


All of the examples so far have dealt with string data only, .csv files don't store type information. That's why the following query might not return what you expect.

ds_0.query('select #0($age, $fav_num) as $result', [lambda a, b: a + b])

data_set_0_query_3


If you want to play with non string types, then you have to tell sql4csv about them.

#Import sql4csv, create a queryable data set from a .csv file and specify the type for all non string fields.
from novacode import sql4csv
ds_0 = sql4csv('ds_0.csv', fieldtypes={'age': int, 'fav_num': float})

Now that sql4csv knows the type of age and fav_num, lets revisit the above example.

ds_0.query('select #0($age, $fav_num) as $result', [lambda a, b: a + b])

data_set_0_query_3


Filter the returned rows by using a where clause, once again --to show the flexibility of sql4csv-- each of the following queries returns the same answer.

ds_0.query('select * where $age = 21)

#0() is simply a function that returns 21.
ds_0.query('select * where $age = #0()', [lambda: 21])

#This is a contrived example, it simply shows the use of higher order functions. 
ds_0.query('select * where $age = #0(#1(), #2())', [lambda x, y: x * y, lambda: 7, lambda: 3])

data_set_0_query_6


You can have many conditions in your where clause joined using (and || or).

ds_0.query('select * where ($age = 21) or ($age = 23) and $fav_num > 2')

data_set_0_query_7


Working with very large .csv files.

All of the above queries have use query() which returns its results as a list of dictionaries. If you are processing a huge .csv file you probably don't want to do this. If you did, your program would block until query() returned and the memory requirements would be huge. Instead you should call lazy_query() which returns an iterator. Below is an example using lazy_query().

#Import sql4csv, create a queryable data set from this huge .csv file.
from novacode import sql4csv
ds_huge = sql4csv('ds_huge.csv', fieldtypes={'age': int, 'fav_num': float})

#Do a lazy query on this huge data set.
results = ds_huge.lazy_query('select * where $age > 18 and $age < 30')

#Iterate through the lazy query results. 
for row in results:
    #Do something here like: print or write to file.

Clone this wiki locally