Skip to content

A wrapper for MongoDB that makes saving and loading NumPy arrays transparent.

Notifications You must be signed in to change notification settings

dattalab/mongowrapper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

mongo-numpy

A wrapper for MongoDB that makes saving and loading NumPy arrays transparent.

This is a python based interface layer for storing and retreiving documents from a pymongo database. The major modules uses are pymongo, gridfs, pickle, numpy.

There are some restrictions to work around for using mongodb to store typical scientific data structures (typically numpy arrays). First, documents have to be less than 16MB, and the default python MongoDB driver doesn't automatically store custom objects in a document. We use MongoDB's custom filesystem, GridFS, to store the numpy arrays behind the scenes. However, when storing and retrieving your document, you'll never have to think about this.

Core methods:

save(dictionary) any python dictionary, even with NumPy arrays as values
load(query), where query is a regular pymongo query
delete(objectId)

Usage:

import monogowrapper as mdb
db = mdb.MongoWrapper(dbName='test',
                      collectionName='test_collection', 
                      hostname="localhost", 
                      port="27017") 
my_dict = {"name": "Important experiment", 
            "data":np.random.random((100,100))}

The dictionary's just as you'd expect it to be:

print my_dict
{'data': array([[ 0.773217,  0.517796,  0.209353, ...,  0.042116,  0.845194,
         0.733732],
       [ 0.281073,  0.182046,  0.453265, ...,  0.873993,  0.361292,
         0.551493],
       [ 0.678787,  0.650591,  0.370826, ...,  0.494303,  0.39029 ,
         0.521739],
       ..., 
       [ 0.854548,  0.075026,  0.498936, ...,  0.043457,  0.282203,
         0.359131],
       [ 0.099201,  0.211464,  0.739155, ...,  0.796278,  0.645168,
         0.975352],
       [ 0.94907 ,  0.363454,  0.912208, ...,  0.480943,  0.810243,
         0.217947]]),
 'name': 'Important experiment'}

And saving is really super easy.

db.save(my_dict)

Loading the dictionary back in is just as super duper easy.

my_loaded_dict = db.load({"name":"Important experiment"})

You'll notice some extra stuff in the dictionary, along with everything we saved:

print my_loaded_dict
{u'_id': ObjectId('513797159ee8623b8e4c5868'),
 u'_npObjectIDs': [ObjectId('513797159ee8623b8e4c5866')],
 u'data': array([[ 0.464792,  0.356568,  0.941366, ...,  0.306581,  0.748432,
         0.422235],
       [ 0.328053,  0.374875,  0.42858 , ...,  0.151123,  0.224338,
         0.355108],
       [ 0.582536,  0.279675,  0.123347, ...,  0.608384,  0.829723,
         0.551811],
       ..., 
       [ 0.647574,  0.890914,  0.249452, ...,  0.833569,  0.224882,
         0.166035],
       [ 0.360945,  0.426059,  0.112768, ...,  0.275226,  0.065119,
         0.155346],
       [ 0.278053,  0.532446,  0.592866, ...,  0.632853,  0.678774,
         0.902657]]),
 u'insertion_date': datetime.datetime(2013, 3, 6, 14, 20, 53, 293000),
 u'name': u'Important experiment'}

The added keys are:

  • _npObjectIDs - a list of all the GridFS IDs used to store the numpy arrays.
  • _id - MongoDB id for the inserted document, automatically added
  • insertion_date - current date of insertion, useful for indexing

About

A wrapper for MongoDB that makes saving and loading NumPy arrays transparent.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages