BSON object too large during storage of item to Girder #181
While processing large biology matrices, I came across a "BSON too large" error as I was trying to save an in-memory table to Girder. Is this a case where a Girder item is too big because of the way the assetstore is configured using GridFS? To avoid this, should we reconfigure and start with a different assetstore architecture? Does this require a hand install? Oh, I see the assetstore creation in girder-setup.py. I'll look for the filesystem option.
Comments
Does it need to do a conversion before saving? The problem is that the converted object gets too large for the MongoDB record size limit, since Romanesco uses MongoDB for its task queue. If there are no conversions, it should work with larger files. |
Is this happening inside romanesco or inside girder? Where do you see the error message? |
The output matrix does sit safely in local storage after an analysis is finished; it is a table:rows object. So the size limitation only applies during conversion / processing jobs. Oh, I was able to save it as rows.json just now. To Zach's question: I saw the error when I tried to save a large table:rows object as a CSV using the Flow UI. Jeff got it right that it happened during Romanesco conversion. Won't I bump into this problem any time I need to run an analysis on a big dataset, though? Will converting the assetstore to files have any beneficial effect on this, or is it a size limitation on objects Romanesco can work with?
|
The real fix for this is that we shouldn't be passing data objects through the message queue, but rather downloading them via input specs, perhaps using the girder_io utility in Romanesco (a sketch of what that binding might look like is below).
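A hedged sketch of the difference between the two approaches; the girder_io field names here are assumptions for illustration, not a confirmed Romanesco API:

```python
# Illustrative only: the girder_io field names below are assumptions,
# not a confirmed API.

# Today: the table itself is embedded in the task message, so the whole
# message must fit in a single 16 MB BSON document in the queue.
inline_input = {
    "format": "rows",
    "data": {"fields": ["species", "value"], "rows": [["a", 1], ["b", 2]]},
}

# Proposed: the message carries only a reference; the worker downloads
# the file from Girder before running the task, so the message stays tiny.
girder_input = {
    "format": "csv",
    "mode": "girder",            # fetch mode handled by the girder_io utility
    "id": "<girder_file_id>",    # placeholder: ID of the file in Girder
    "host": "localhost",
    "port": 8080,
    "token": "<girder_token>",   # lets the worker authenticate the download
}
```
|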
It's a Romanesco limitation, not a Girder one. Perhaps if we used a different message queue instead of MongoDB, this problem would go away. We could also enforce that data is stored directly to Girder in its original format on upload (and store results directly to Girder after running an analysis) before running analyses on it (Minerva takes this sort of approach). However, that would not allow the "no login needed" method of running analyses that we wanted for Arbor. It's odd that the data is small enough to be sent back to the browser as table:rows; the same data in table:rows.json format must expand enough to go beyond the size limitation (16 MB, I believe?).
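For context, you can measure a payload against that cap with pymongo's bundled bson module; a minimal sketch:

```python
import bson  # ships with pymongo

# Encode a document the way MongoDB would and compare it against the
# 16 MB BSON document limit that the task queue runs into.
doc = {"format": "rows.json", "data": "x" * (17 * 1024 * 1024)}  # ~17 MB payload
size = len(bson.BSON.encode(doc))
print(size > 16 * 1024 * 1024)  # True: MongoDB would reject this document
```
|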
16 MB seems like a decent place to start enforcing login - perhaps we change it so data goes into a working area if the user is logged in, and if they are logged out they get a friendlier message in this instance instructing them to log in to work with larger files. |
This does sound like a good strategy. So the near-term workaround at the "analysis level" is to pass proxies between analyses (a table that includes the filename where the full data is stored; see the sketch below). This seems like the only quick fix, since the assetstore architecture has no effect on this problem - it is a message queue thing.
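A minimal sketch of such a proxy table, with illustrative placeholder field names:

```python
# The proxy is itself a tiny table:rows object, small enough to pass
# through the message queue. Each analysis that receives it downloads
# the real file from Girder itself, so the full data never rides in a
# queue message. Field names are illustrative placeholders.
proxy_table = {
    "fields": ["name", "girder_item_id"],
    "rows": [["expression_matrix.csv", "<girder_item_id>"]],
}
```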
|
Correct, changing the Girder assetstore should not affect this issue. |
I could also make a quick conversion utility script to support a workaround of "download, convert, upload, save without conversion" if that would be helpful. Local-machine conversions can of course go up to any size. Also, note that larger data can pass between steps of a workflow just fine - it's the final output that has size restrictions. |
If this isn't too much trouble, I'd appreciate the example of how this would work.
|
I realized my suggestion has a limitation - a downloadable file needs to be in a serialized format - so table:rows cannot be downloaded (this relates to that other issue); it is "stuck" in the browser :( Here is the conversion script anyway, in case it helps:
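A hypothetical sketch of the download / convert locally / upload workaround, assuming Romanesco's convert() and the girder_client library; IDs, host, and credentials are placeholders:

```python
# Hypothetical reconstruction: assumes romanesco.convert() and the
# girder_client library. All IDs and credentials are placeholders.
import romanesco
import girder_client

gc = girder_client.GirderClient(host="localhost", port=8080)
gc.authenticate("username", "password")

# 1. Download the serialized table (rows.json) from Girder to disk.
gc.downloadFile("<girder_file_id>", "table.rows.json")

# 2. Convert locally -- no message queue involved, so no 16 MB limit.
with open("table.rows.json") as f:
    result = romanesco.convert(
        "table",
        {"format": "rows.json", "data": f.read()},
        {"format": "csv"},
    )
with open("table.csv", "w") as f:
    f.write(result["data"])

# 3. Upload the CSV back to Girder in its final format, so saving it
#    requires no further conversion job.
gc.uploadFileToItem("<girder_item_id>", "table.csv")
```
|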