Discuss design for read #83

Open
oruebel opened this issue Aug 29, 2024 · 0 comments
Labels
category: proposal proposed enhancements or new features

Comments

@oruebel
Contributor

oruebel commented Aug 29, 2024

1. User API

  • Read would be done via the Container classes
  • The Container classes would expose all their properties (datasets, attributes etc.) they own via respective access functions
  • The access functions would not return the actual data, but instead return a standard wrapper BaseRecordingData that a user can then use to access the data values. This is to: a) allow lazy loading, and b) support different backends transparently
  • For accessing typed objects, e.g., to retrieve an ElectricalSeries, we would have similar access functions, but we'll need to be able to construct the Containers from the io

Pseudo-code for reading

BaseReadData channels = electricalSeries->channels()
channels.values()
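
The lazy-loading idea behind the pseudo-code can be sketched in C++ as follows. This is only an illustration under assumptions: LazyReadData, DemoSeries, and the loader callback are hypothetical names that are not part of the proposal, and the value type is fixed to int for brevity.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical lazy wrapper: it stores a loader callback instead of the data.
class LazyReadData {
public:
    using Loader = std::function<std::vector<int>()>;

    explicit LazyReadData(Loader loader) : m_loader(std::move(loader)) {}

    // The backend is only queried when values() is called.
    std::vector<int> values() const { return m_loader(); }

private:
    Loader m_loader;
};

// Hypothetical stand-in for a Container that exposes one property.
class DemoSeries {
public:
    explicit DemoSeries(int* loadCounter) : m_loadCounter(loadCounter) {}

    // Access function: returns the wrapper, not the data values.
    LazyReadData channels() const {
        int* counter = m_loadCounter;
        return LazyReadData([counter]() {
            ++(*counter);  // counts how often the "backend" is actually read
            return std::vector<int>{0, 1, 2, 3};
        });
    }

private:
    int* m_loadCounter;
};
```

Calling channels() does not touch the backend; the read happens only on values(), which is what makes the access functions cheap to call.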

2. Proposed Implementation for reading data arrays

BaseReadData

  • Create a new BaseReadData class for reading data arrays (datasets, attributes) from a file.
  • Update BaseRecordingData to inherit from BaseReadData such that we can also read during a recording. This may be optional if we can create BaseReadData objects separately (open question).

BaseIO

  • Add abstract methods for reading objects from a file that the I/O backends then need to implement. The functions would return a BaseReadData object that is then used to read the data:
    • BaseReadData readDataset(path)
    • BaseReadData readAttribute(path)
  • Add methods to read data values from a Dataset or Attribute that the BaseReadData can call to perform the read

HDF5IO

  • Implement the specific versions of the BaseReadData class required for reading arrays from an HDF5 file:
    • HDF5ReadDataset for reading values from a dataset
    • HDF5ReadAttribute for reading an attribute
  • Implement the new read methods readDataset and readAttribute and return HDF5ReadDataset and HDF5ReadAttribute respectively
  • Implement the methods for reading data values from a Dataset or Attribute that the HDF5ReadDataset and HDF5ReadAttribute wrappers can call to perform the read
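
The split between BaseIO and a backend described above might look roughly like the sketch below. To keep it self-contained it uses a hypothetical in-memory backend (MemoryIO, MemoryReadData) instead of HDF5. The readValues and put helpers, the use of std::unique_ptr as return type, and the fixed double value type are all assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Read wrapper interface (value type fixed to double for brevity).
class BaseReadData {
public:
    virtual ~BaseReadData() = default;
    virtual std::vector<double> values() const = 0;
};

// Abstract read methods that every I/O backend has to implement.
class BaseIO {
public:
    virtual ~BaseIO() = default;
    virtual std::unique_ptr<BaseReadData> readDataset(const std::string& path) const = 0;
    virtual std::unique_ptr<BaseReadData> readAttribute(const std::string& path) const = 0;
};

class MemoryIO;

// Backend-specific wrapper; plays the role of HDF5ReadDataset / HDF5ReadAttribute.
class MemoryReadData : public BaseReadData {
public:
    MemoryReadData(const MemoryIO& io, std::string path)
        : m_io(io), m_path(std::move(path)) {}
    std::vector<double> values() const override;

private:
    const MemoryIO& m_io;
    std::string m_path;
};

// Hypothetical in-memory backend standing in for HDF5IO.
class MemoryIO : public BaseIO {
public:
    void put(const std::string& path, std::vector<double> data) {
        m_store[path] = std::move(data);
    }
    std::unique_ptr<BaseReadData> readDataset(const std::string& path) const override {
        return std::make_unique<MemoryReadData>(*this, path);
    }
    std::unique_ptr<BaseReadData> readAttribute(const std::string& path) const override {
        return std::make_unique<MemoryReadData>(*this, path);
    }
    // Value-level read that the wrapper calls back into.
    std::vector<double> readValues(const std::string& path) const {
        auto it = m_store.find(path);
        if (it == m_store.end()) {
            throw std::out_of_range("no object at " + path);
        }
        return it->second;
    }

private:
    std::map<std::string, std::vector<double>> m_store;
};

inline std::vector<double> MemoryReadData::values() const {
    return m_io.readValues(m_path);
}
```

In this sketch a missing path is only reported when values() is called, not when the wrapper is created, which follows from the lazy design. Whether the real backends should validate paths eagerly is a separate decision.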

Container

  • Store the io object on the Container so that we can call io->readDataset and io->readAttribute in the read methods

NWB types: TimeSeries, ElectricalSeries etc.

  • Remove storage of properties from the Container classes and replace them with access methods that return BaseReadData objects instead. This allows for reading in both read and write mode and avoids keeping data in memory that we have already written to disk. For example, in TimeSeries, these member variables would need to be replaced by access methods:
    /**
    * @brief Base unit of measurement for working with the data. Actual stored
    * values are not necessarily stored in these units. To access the data in
    * these units, multiply ‘data’ by ‘conversion’ and add ‘offset’.
    */
    std::string unit;
    /**
    * @brief The description of the TimeSeries.
    */
    std::string description;
    /**
    * @brief Human-readable comments about the TimeSeries.
    */
    std::string comments;
    /**
    * @brief Size used in dataset creation. Can be expanded when writing if
    * needed.
    */
    SizeArray dsetSize;
    /**
    * @brief Chunking size used in dataset creation.
    */
    SizeArray chunkSize;
    /**
    * @brief Scalar to multiply each element in data to convert it to the
    * specified ‘unit’.
    */
    float conversion;
    /**
    * @brief Smallest meaningful difference between values in data, stored in
    * the units specified by ‘unit’.
    */
    float resolution;
    /**
    * @brief Scalar to add to the data after scaling by ‘conversion’ to finalize
    * its coercion to the specified ‘unit’.
    */
    float offset;
    /**
    * @brief The starting time of the TimeSeries.
    */
    float startingTime = 0.0;
  • Add access methods that return BaseReadData for fields that the classes do not yet expose
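
As a rough sketch of the two points above (the Container keeps the io object, and TimeSeries exposes access methods instead of stored members), consider the following. DemoIO, StringReadData, readString, and the path layout "<container path>/description" are hypothetical and chosen only to keep the example small; the actual location of each field in the file follows the NWB schema.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal I/O stand-in that serves string values from a map.
class DemoIO {
public:
    void put(const std::string& path, const std::string& value) {
        m_attrs[path] = value;
    }
    std::string readString(const std::string& path) const {
        auto it = m_attrs.find(path);
        return it == m_attrs.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> m_attrs;
};

// Hypothetical wrapper returned by the access methods.
class StringReadData {
public:
    StringReadData(std::shared_ptr<const DemoIO> io, std::string path)
        : m_io(std::move(io)), m_path(std::move(path)) {}
    // Reads the current value from the backend on every call.
    std::string values() const { return m_io->readString(m_path); }

private:
    std::shared_ptr<const DemoIO> m_io;
    std::string m_path;
};

// The Container stores its path and the io object used for reading.
class Container {
public:
    Container(std::string path, std::shared_ptr<const DemoIO> io)
        : m_path(std::move(path)), m_io(std::move(io)) {}
    virtual ~Container() = default;

protected:
    std::string m_path;
    std::shared_ptr<const DemoIO> m_io;
};

class TimeSeries : public Container {
public:
    using Container::Container;

    // Replaces the stored `std::string description;` member.
    StringReadData description() const {
        return StringReadData(m_io, m_path + "/description");
    }
    // Replaces the stored `std::string comments;` member.
    StringReadData comments() const {
        return StringReadData(m_io, m_path + "/comments");
    }
};
```

Because the wrapper reads through the io object each time, the TimeSeries object itself holds no copy of the values, which matches the goal of not keeping already-written data in memory.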

3. Proposed implementation for reading whole Containers (e.g., to read an ElectricalSeries)

  • Add access methods for retrieving typed objects to the Container that owns them, e.g., NWBFile owns ElectricalSeries objects and would provide the method to retrieve them
  • Add an abstract factory method (templated on the return type) to Container to create an instance of the specific Container type using only the io and the path of the Container as input. The specific Container classes, such as TimeSeries, will then need to implement a corresponding constructor that uses io and path as input.

Step 1: Define the Template Factory Method in Container

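#include <memory>
#include <string>
#include <type_traits>

class BaseIO;  // forward declaration; the I/O interface is defined elsewhere

class Container {
public:
    // Factory: builds a Container subclass T from the io object and the path of the object in the file.
    // Note that T's constructor takes (path, io), i.e., the reverse argument order.
    template <typename T>
    static std::unique_ptr<T> create(const BaseIO& io, const std::string& path) {
        static_assert(std::is_base_of<Container, T>::value, "T must be a derived class of Container");
        return std::unique_ptr<T>(new T(path, io));
    }
};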

Step 2: Implement the constructors on the specific Container classes (e.g., TimeSeries)

class TimeSeries : public Container {
public:
    TimeSeries(const std::string& path, const BaseIO& io) {
        // Implementation of TimeSeries constructor
    }
};
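
Putting Step 1 and Step 2 together, a call site could look like the sketch below. It is a minimal illustration, not the final design: the empty BaseIO placeholder, the path() and io() accessors, the Container constructor, and the ElectricalSeries subclass are assumptions added so the example compiles on its own.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Placeholder for the real I/O interface (assumption for this sketch).
class BaseIO {
public:
    virtual ~BaseIO() = default;
};

class Container {
public:
    Container(std::string path, const BaseIO& io)
        : m_path(std::move(path)), m_io(io) {}
    virtual ~Container() = default;

    // Factory templated on the concrete Container type to create.
    template <typename T>
    static std::unique_ptr<T> create(const BaseIO& io, const std::string& path) {
        static_assert(std::is_base_of<Container, T>::value,
                      "T must be a derived class of Container");
        return std::unique_ptr<T>(new T(path, io));
    }

    const std::string& path() const { return m_path; }
    const BaseIO& io() const { return m_io; }

private:
    std::string m_path;
    const BaseIO& m_io;  // the io object must outlive the Container
};

class TimeSeries : public Container {
public:
    TimeSeries(const std::string& path, const BaseIO& io) : Container(path, io) {}
};

// Subclasses can reuse the (path, io) constructor of their parent.
class ElectricalSeries : public TimeSeries {
public:
    using TimeSeries::TimeSeries;
};
```

A caller would then write, for example, `auto es = Container::create<ElectricalSeries>(io, "/acquisition/es0");`. Passing a type that does not derive from Container fails at compile time through the static_assert.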

4. Proposed implementation for reading untyped groups (e.g., /acquisition)

I'm not sure we'll need to do this, since a group by itself does not define data. To access the contents, we could define access methods on the parent Container class (e.g., NWBFile) that owns the untyped group.
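
One way such an access method on the parent could look is sketched below. The method name getAcquisition, the reduced NWBFile and ElectricalSeries classes, and the BaseIO placeholder are hypothetical; the sketch only shows that the untyped /acquisition group can stay an implementation detail of its owner.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Placeholder for the real I/O interface (assumption for this sketch).
class BaseIO {
public:
    virtual ~BaseIO() = default;
};

// Reduced stand-in for a typed object stored inside /acquisition.
class ElectricalSeries {
public:
    ElectricalSeries(std::string path, const BaseIO& io)
        : m_path(std::move(path)), m_io(io) {}
    const std::string& path() const { return m_path; }
    const BaseIO& io() const { return m_io; }

private:
    std::string m_path;
    const BaseIO& m_io;
};

class NWBFile {
public:
    explicit NWBFile(const BaseIO& io) : m_io(io) {}

    // Access method on the owner of the untyped /acquisition group:
    // the caller names the object and never handles the group itself.
    template <typename T>
    std::unique_ptr<T> getAcquisition(const std::string& name) const {
        return std::make_unique<T>("/acquisition/" + name, m_io);
    }

private:
    const BaseIO& m_io;
};
```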

@oruebel oruebel added the category: proposal proposed enhancements or new features label Aug 29, 2024
@oruebel oruebel mentioned this issue Aug 31, 2024
27 tasks