This issue is for the discussion of how to implement a feature store for Paintera label data. New features such as navigating to the smallest segment (that intersects with a synapse) require low-latency responses to user requests. The label and image features that are required to execute such requests (fragment size in this example) must be stored in a way that allows efficient retrieval; computing them on the fly is not feasible, even with caching. Currently, there is one such feature: the `label-to-block-mapping`, an index of the blocks that contain each label. This index can be generated efficiently from a "summary" of each block that contains the set of all labels in that block, i.e. `unique-labels`. Changes to the voxel values of fragments, e.g. painting, affect only a small subset of the blocks, and features can be updated efficiently when they can be composed from blockwise summaries:

```
data ---map---> block summaries ---reduce---> label/object/fragment features
```

Features that cannot be calculated in such a way should be updated only offline, on explicit user request.
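The map/reduce scheme above can be sketched as follows. This is a minimal, hypothetical illustration in Python, not Paintera code: each block is assumed to be given as a flat list of label ids keyed by its block grid position, a block summary is assumed to be the voxel count per label (whose keys are the block's unique labels), and `summarize_block`, `reduce_summaries` and `update_block` are made-up names. The point is that repainting one block only touches the labels of that block.

```python
from collections import Counter, defaultdict


def summarize_block(labels):
    """Map step (hypothetical): a block summary is the voxel count per label.

    Its keys are exactly the block's unique labels.
    """
    return Counter(labels)


def reduce_summaries(summaries):
    """Reduce step: merge all block summaries into per-label features."""
    label_to_blocks = defaultdict(set)  # label -> positions of blocks containing it
    sizes = Counter()                   # label -> total voxel count (fragment size)
    for pos, summary in summaries.items():
        for label, count in summary.items():
            label_to_blocks[label].add(pos)
            sizes[label] += count
    return label_to_blocks, sizes


def update_block(pos, new_labels, summaries, label_to_blocks, sizes):
    """Incremental update after painting one block.

    Only labels present in the old or the new version of this block are touched.
    """
    old = summaries.get(pos, Counter())
    new = summarize_block(new_labels)
    # Retract the old contribution of this block ...
    for label, count in old.items():
        sizes[label] -= count
        label_to_blocks[label].discard(pos)
    # ... and add the new one.
    for label, count in new.items():
        sizes[label] += count
        label_to_blocks[label].add(pos)
    # Forget labels that no longer occur in any block.
    for label in old.keys() - new.keys():
        if sizes[label] == 0:
            del sizes[label]
            del label_to_blocks[label]
    summaries[pos] = new
```

With fragment sizes kept current this way, a request like "smallest fragment" becomes a lookup over `sizes` (or over an index on it, once the sizes live in a database).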
Currently, Paintera "abuses" N5 to mimic a database for `label-to-block-mapping` and `unique-labels`. While the latter falls well within the regime of N5 (data stored in blocks, indexed by block positions), the former is clearly not a good fit for N5, in particular because the set of labels will likely be sparse, i.e. if the max label is `N`, there will be label ids `0 < id < N + 1` that are not present in the data. For such a feature store, it would make much more sense to use a database, and a relational database is probably appropriate for that purpose. To be consistent, the block summaries should probably be stored in a database as well, but as far as I can tell this would need to be a non-relational database.
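To make the sparsity argument concrete, here is a minimal sketch of a relational `label-to-block-mapping` using Python's built-in `sqlite3` module (SQLite is only one possible choice). The table, column and function names are hypothetical. Only (label, block) pairs that actually occur are stored, so label ids missing from the data cost nothing, and the primary key doubles as the index for "which blocks contain label X?" lookups. Note that SQLite integers are signed 64-bit, so unsigned 64-bit label ids above 2^63 - 1 would need special handling.

```python
import sqlite3


def create_label_block_table(conn):
    # One row per (label, block) pair that actually occurs. Labels that are
    # absent from the data have no rows, so sparse label ids cost nothing.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS label_to_block ("
        " label INTEGER NOT NULL,"
        " block_x INTEGER NOT NULL,"
        " block_y INTEGER NOT NULL,"
        " block_z INTEGER NOT NULL,"
        " PRIMARY KEY (label, block_x, block_y, block_z))"
    )
    # Secondary index so that all rows of one block can be replaced quickly.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS label_to_block_by_block"
        " ON label_to_block (block_x, block_y, block_z)"
    )


def replace_block_labels(conn, block, labels):
    """Replace the rows of one block, e.g. after painting changed its unique labels."""
    x, y, z = block
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM label_to_block"
            " WHERE block_x = ? AND block_y = ? AND block_z = ?",
            (x, y, z),
        )
        conn.executemany(
            "INSERT INTO label_to_block VALUES (?, ?, ?, ?)",
            [(label, x, y, z) for label in set(labels)],
        )


def blocks_containing(conn, label):
    """Look up all blocks that contain `label` (served by the primary key index)."""
    rows = conn.execute(
        "SELECT block_x, block_y, block_z FROM label_to_block"
        " WHERE label = ? ORDER BY block_x, block_y, block_z",
        (label,),
    )
    return [tuple(row) for row in rows]
```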
- Label/fragment/object-based features are useful for analyzing connectomes and for data-driven proofreading
  - Segment features may be composed from fragment features, but that is not necessarily true in all cases
  - Some features are computed from both the label data and the underlying raw data
- All features should be stored in a database
  - A relational database is probably a good choice because fragment features are essentially a table where each row is a fragment and each column is a feature
  - Indices over individual features allow for efficient queries
  - What would be a good design?
    - One table per feature, or
    - a single table with one column per feature (or more than one for vector features)
- Some features, e.g. histograms, means, counts, etc., can be composed from block summaries and can be updated efficiently after small changes, e.g. painting
  - These block summaries should be stored in a database as well, but a relational database will not be useful here. A key-value store is probably much better for this. Maybe even stick with N5?
- Other features need to be computed over the entire dataset at all times, and their updates must be triggered offline.
- Features should be initialized in the Paintera conversion helper, including generation of indices.
- It should be possible to (re-)generate or add new features for an existing Paintera label dataset.
- How can we properly store the information about how features are generated in the database or the Paintera dataset, e.g. as a JSON object?
  Should `"javaClass"` be a part of `"feature"`, or should `"javaClass"` be inferred from `"name"`?

cc @igorpisarev @axtimwalde

Thanks for starting this discussion @hanslovsky

I agree that the label-to-block mapping is a perfect candidate to be stored in a database, and it probably makes sense to store segment counts and similar features in a database as well, if only for the flexibility.

Of the common solutions, I really like SQLite: it's file-based and doesn't require a server, which makes it very easy to set up and use. Instead of talking to a database server over the network, the application simply calls the SQLite library directly. In my experience, it's really nice for storing application data locally and can handle large tables and complex queries very efficiently.

There also seems to be a similar embedded database for NoSQL (key-value) data: BerkeleyDB
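Following up on the SQLite suggestion, the two table layouts from the design question above could look roughly like this. It is a sketch with hypothetical table and column names (`fragment_features`, `feature_size`, `size`, `mean_intensity`); the candidate fragment ids stand in for, e.g., the fragments that intersect a synapse, which would come from a separate lookup that is not shown.

```python
import sqlite3


def create_schemas(conn):
    # Option A: one wide table, one column per scalar feature.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fragment_features ("
        " fragment_id INTEGER PRIMARY KEY, size INTEGER, mean_intensity REAL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_ff_size ON fragment_features (size)")
    # Option B: one narrow table per feature (only `size` shown here).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_size ("
        " fragment_id INTEGER PRIMARY KEY, value INTEGER)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_fs_value ON feature_size (value)")


# The same "smallest fragment among candidates" query for both layouts.
# Ties are broken by fragment id so the result is deterministic.
QUERIES = {
    "wide": "SELECT fragment_id FROM fragment_features"
            " WHERE fragment_id IN ({ids}) ORDER BY size, fragment_id LIMIT 1",
    "narrow": "SELECT fragment_id FROM feature_size"
              " WHERE fragment_id IN ({ids}) ORDER BY value, fragment_id LIMIT 1",
}


def smallest_fragment(conn, candidates, layout="wide"):
    """Return the id of the smallest candidate fragment, or None."""
    ids = list(candidates)
    if not ids:
        return None
    # One "?" placeholder per candidate; very large candidate sets would be
    # better served by a temporary table and a join.
    sql = QUERIES[layout].format(ids=",".join("?" * len(ids)))
    row = conn.execute(sql, ids).fetchone()
    return None if row is None else row[0]
```

The wide table answers multi-feature queries without joins, but adding a feature means an `ALTER TABLE ... ADD COLUMN` followed by filling that column for every row. One table per feature lets each feature be added, dropped, or regenerated independently, at the cost of joins when several features are combined in one query.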
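For the block summaries, a key-value layout in the spirit of the BerkeleyDB suggestion might look like the sketch below. It uses Python's standard `dbm` module as a stand-in for an embedded key-value store (these are not BerkeleyDB bindings), and the `"x,y,z"` key format and JSON-encoded values are assumptions for illustration. Rebuilding the label-to-block mapping from scratch is then a full scan over all summaries; after painting, only the changed blocks' entries would be rewritten.

```python
import dbm
import json


def block_key(pos):
    """Hypothetical key format: the block's grid position as "x,y,z"."""
    return ",".join(str(c) for c in pos)


def put_summary(db, pos, counts):
    """Store one block summary (label -> voxel count) as a JSON string."""
    db[block_key(pos)] = json.dumps({str(label): n for label, n in counts.items()})


def get_summary(db, pos):
    """Load one block summary; blocks without an entry have an empty summary."""
    try:
        raw = db[block_key(pos)]
    except KeyError:
        return {}
    return {int(label): n for label, n in json.loads(raw).items()}


def label_to_blocks(db):
    """Rebuild the label-to-block mapping with a full scan over all summaries."""
    mapping = {}
    for key in db.keys():  # dbm returns keys as bytes
        pos = tuple(int(c) for c in key.decode().split(","))
        for label in get_summary(db, pos):
            mapping.setdefault(label, set()).add(pos)
    return mapping
```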