Trying to implement variable-length string arrays (#16) made it evident that having numcodecs-like filters would be a huge step towards supporting general data types; this is the result of multiple discussions with @bogovicj and @axtimwalde, see also the corresponding issue.
However, the `Filter` interface in its current state contains no methods and is not used anywhere. I suggest fleshing out the `Filter` interface such that implementers of this interface

1. are de-/serializable from/to json via an annotation similar to `@CompressionType`;
2. can be daisy-chained (see the sketch after this list).
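To make the intent concrete, here is a minimal sketch of what such an interface could look like. Everything in it is an assumption for illustration: the annotation name `FilterType`, the method names `apply`/`invert`, and the type parameter `T` (the common input/output type discussed below) are placeholders, not an agreed design; only `@CompressionType` exists today as the pattern being mirrored.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Hypothetical fleshed-out Filter interface. T is the common input/output
 * type that makes daisy-chaining possible (buffers or DataBlocks, see below).
 */
public interface Filter<T> {

	/**
	 * Marks a filter implementation with its json type string, analogous to
	 * the existing @CompressionType annotation (name is a placeholder).
	 */
	@Retention(RetentionPolicy.RUNTIME)
	@Target(ElementType.TYPE)
	@interface FilterType {

		String value();
	}

	/** Forward application of the filter (e.g. before compression on write). */
	T apply(T data);

	/** Inverse application, undoing {@link #apply} (e.g. after decompression on read). */
	T invert(T data);
}
```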
For the second point to work, the methods for applying and inverting a filter must have the same input and output type. I see two possibilities for this type:
1. Plain buffers (this is the case in numcodecs). This would either require changing the `BlockReader` and `BlockWriter` interfaces to work with buffers instead of `DataBlock`s, which seems unnatural given their names, or manually exposing the raw data of a `DataBlock` after creation, which seems to go against the intention of the concept.
2. `DataBlock`s. This would allow filters to create a new `DataBlock` if necessary (e.g., when the size of the raw data changes) or to modify the data in place if possible; a chaining sketch follows below.
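Assuming the second option and the hypothetical `Filter<T>` sketch above with `T = DataBlock<?>`, daisy-chaining could look roughly like this; `FilterChain` and its method names are made up for illustration, only `DataBlock` itself is an existing n5 type.

```java
import java.util.List;

import org.janelia.saalfeldlab.n5.DataBlock;

/**
 * Hypothetical helper that applies a list of DataBlock-based filters in order
 * and inverts them in reverse order.
 */
public class FilterChain {

	private final List<Filter<DataBlock<?>>> filters;

	public FilterChain(final List<Filter<DataBlock<?>>> filters) {

		this.filters = filters;
	}

	/** Apply all filters in order, e.g. before compression when writing a block. */
	public DataBlock<?> apply(DataBlock<?> block) {

		for (final Filter<DataBlock<?>> filter : filters)
			// a filter may return a new DataBlock (e.g. if the raw data size
			// changes) or the same block, modified in place
			block = filter.apply(block);
		return block;
	}

	/** Undo all filters in reverse order, e.g. after decompression when reading a block. */
	public DataBlock<?> invert(DataBlock<?> block) {

		for (int i = filters.size() - 1; i >= 0; --i)
			block = filters.get(i).invert(block);
		return block;
	}
}
```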
Adding filters would also allow de-/serializing custom objects in a way that is compatible with the Python implementation of zarr.
A downside of this would be that, for general objects, a `DataBlock` cannot know the number of deserialized bytes before deserialization. This would probably necessitate some changes in the `DataBlock` interface and in the way `DataBlock`s are created in the reading process (right now, they pre-allocate an array of the right size to hold the decompressed data).
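A rough sketch of the kind of change this implies, under the assumption that the decompressed/defiltered bytes have to be fully materialized before the block's data array can be sized (plain-Java illustration, no n5 API referenced):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DeferredAllocation {

	/**
	 * Reads a decompressed stream whose final length is unknown up front.
	 * Only after this returns is the size known, so a DataBlock for general
	 * objects could only be created (or its array allocated) at that point,
	 * instead of being pre-allocated before reading.
	 */
	public static byte[] readAllBytes(final InputStream decompressed) throws IOException {

		final ByteArrayOutputStream out = new ByteArrayOutputStream();
		final byte[] chunk = new byte[8192];
		int n;
		while ((n = decompressed.read(chunk)) != -1)
			out.write(chunk, 0, n);
		return out.toByteArray();
	}
}
```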