This is a python based interface layer for storing and retreiving documents
from a pymongo database. The major modules uses are pymongo
, gridfs
,
pickle
, numpy
.
There are some restrictions to work around for using mongodb
to store typical scientific data structures (typically numpy arrays). First,
documents have to be less than 16MB, and the default python MongoDB driver
doesn't automatically store custom objects in a document. We use MongoDB's
custom filesystem, GridFS
, to store the numpy arrays behind the scenes.
However, when storing and retrieving your document, you'll never have to
think about this.
Core methods:
save(dictionary)
any python dictionary, even with NumPy arrays as values
load(query)
, where query is a regular pymongo query
delete(objectId)
Usage:
import monogowrapper as mdb
db = mdb.MongoWrapper(dbName='test',
collectionName='test_collection',
hostname="localhost",
port="27017")
my_dict = {"name": "Important experiment",
"data":np.random.random((100,100))}
The dictionary's just as you'd expect it to be:
print my_dict
{'data': array([[ 0.773217, 0.517796, 0.209353, ..., 0.042116, 0.845194,
0.733732],
[ 0.281073, 0.182046, 0.453265, ..., 0.873993, 0.361292,
0.551493],
[ 0.678787, 0.650591, 0.370826, ..., 0.494303, 0.39029 ,
0.521739],
...,
[ 0.854548, 0.075026, 0.498936, ..., 0.043457, 0.282203,
0.359131],
[ 0.099201, 0.211464, 0.739155, ..., 0.796278, 0.645168,
0.975352],
[ 0.94907 , 0.363454, 0.912208, ..., 0.480943, 0.810243,
0.217947]]),
'name': 'Important experiment'}
And saving is really super easy.
db.save(my_dict)
Loading the dictionary back in is just as super duper easy.
my_loaded_dict = db.load({"name":"Important experiment"})
You'll notice some extra stuff in the dictionary, along with everything we saved:
print my_loaded_dict
{u'_id': ObjectId('513797159ee8623b8e4c5868'),
u'_npObjectIDs': [ObjectId('513797159ee8623b8e4c5866')],
u'data': array([[ 0.464792, 0.356568, 0.941366, ..., 0.306581, 0.748432,
0.422235],
[ 0.328053, 0.374875, 0.42858 , ..., 0.151123, 0.224338,
0.355108],
[ 0.582536, 0.279675, 0.123347, ..., 0.608384, 0.829723,
0.551811],
...,
[ 0.647574, 0.890914, 0.249452, ..., 0.833569, 0.224882,
0.166035],
[ 0.360945, 0.426059, 0.112768, ..., 0.275226, 0.065119,
0.155346],
[ 0.278053, 0.532446, 0.592866, ..., 0.632853, 0.678774,
0.902657]]),
u'insertion_date': datetime.datetime(2013, 3, 6, 14, 20, 53, 293000),
u'name': u'Important experiment'}
The added keys are:
_npObjectIDs
- a list of all the GridFS IDs used to store the numpy arrays._id
- MongoDB id for the inserted document, automatically addedinsertion_date
- current date of insertion, useful for indexing