Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add extension best practices #338

Open
wants to merge 21 commits into
base: dev
Choose a base branch
from
Open
Changes from 2 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
5649098
add best practices from https://github.com/NeurodataWithoutBorders/nw…
bendichter Feb 2, 2023
a750c3f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 2, 2023
5c33003
Merge branch 'dev' into extension_best_practices
bendichter Feb 2, 2023
99d2482
Update docs/best_practices/extensions.rst
bendichter Feb 2, 2023
4076d70
Update conf.py
bendichter Feb 2, 2023
ef20e9d
Update requirements-rtd.txt
bendichter Feb 2, 2023
6af0fe7
fix underline length
bendichter Feb 3, 2023
25cdd41
Merge remote-tracking branch 'origin/extension_best_practices' into e…
bendichter Feb 3, 2023
21c5e8a
fix bullet list
bendichter Feb 3, 2023
2e00f7f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 3, 2023
f17e1b9
adjust language
bendichter Feb 3, 2023
fe72670
Merge remote-tracking branch 'origin/extension_best_practices' into e…
bendichter Feb 3, 2023
9ffe41a
fix external links
bendichter Feb 3, 2023
73d5b9b
fix warning
bendichter Feb 3, 2023
aa57eae
fix warning
bendichter Feb 3, 2023
ba32682
Update docs/best_practices/extensions.rst
bendichter Feb 3, 2023
efd03d8
Update docs/best_practices/extensions.rst
bendichter Feb 3, 2023
20a3b31
Merge branch 'dev' into extension_best_practices
CodyCBakerPhD Feb 7, 2023
7131b01
Merge branch 'dev' into extension_best_practices
oruebel Mar 3, 2023
d2d503e
Merge branch 'dev' into extension_best_practices
oruebel Jul 18, 2023
f38013c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 18, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
158 changes: 155 additions & 3 deletions docs/best_practices/extensions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,15 +16,103 @@ If an extension is required, tutorials for the process may be found through the
It is also encouraged for extensions to contain their own check functions for their own best practices.
See the` :ref:`adding_custom_checks` section of the Developer Guide for how to do this.

Define new ``neurodata_types`` at the top-level (a.k.a., do not nest type definitions)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Rather than nesting definitions of ``neurodata_types``, all new types should be defined
at the top-level of the schema. To include a specific ``neurodata_type`` in another type
use the ``neurodata_type_inc`` key instead. For example:

Use Existing Neurodata Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. tabs::

.. tab:: Do This

.. tabs::

.. code-tab:: py Python

from pynwb.spec import NWBGroupSpec

# Define the first type
type1_ext = NWBGroupSpec(
name='custom_type1',
doc='Example extension type 1',
neurodata_type_def='MyNewType1',
neurodata_type_inc='LabMetaData',
)

# Then define the second type and include the first type
type2_ext = NWBGroupSpec(
name='custom_type2',
doc='Example extension type 2',
neurodata_type_def='MyNewType2',
groups=[NWBGroupSpec(neurodata_type_inc='MyNewType1',
doc='Included group of ype MyNewType1')]
)

.. code-tab:: yaml YAML

groups:
- neurodata_type_def: MyNewType1
neurodata_type_inc: LabMetaData
name: custom_type1
doc: Example extension type 1
- neurodata_type_def: MyNewType2
neurodata_type_inc: NWBContainer
name: custom_type2
doc: Example extension type 2
groups:
- neurodata_type_inc: MyNewType1
doc: Included group of ype MyNewType1

.. tab:: DON'T do this

.. tabs::

.. code-tab:: py Python

from pynwb.spec import NWBGroupSpec

# Do NOT define a new type via ``neurodata_type_def`` within the
# definition of another type. Always define the types separately
# and use ``neurodata_type_inc`` to include the type
type2_ext = NWBGroupSpec(
name='custom_type2',
doc='Example extension type 2',
neurodata_type_def='MyNewType2',
groups=[NWBGroupSpec(
name='custom_type1',
doc='Example extension type 1',
neurodata_type_def='MyNewType1',
neurodata_type_inc='LabMetaData',
)]
)

.. code-tab:: yaml YAML

groups:
- neurodata_type_def: MyNewType2
neurodata_type_inc: NWBContainer
name: custom_type2
doc: Example extension type 2
groups:
- neurodata_type_def: MyNewType1
neurodata_type_inc: LabMetaData
name: custom_type1
doc: Example extension type 1

Build on and Reuse Existing Neurodata Types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When possible, use existing types when creating extensions either by creating new ``neurodata_types`` that inherit from
existing ones, or by creating ``neurodata_types`` that contain existing ones. Building on existing types facilitates the
reuse of existing functionality and interpretation of the data. If a community extension already exists that has a
similar scope, it is preferable to use that extension rather than creating a new one.
similar scope, it is preferable to use that extension rather than creating a new one. For example:
* Extend ``TimeSeries`` for storing timeseries data. NWB provides main types of ``TimeSeries``
and you should identify the most specific type of ``TimeSeries`` relevant for your use case
(e.g., extend ``ElectricalSeries`` to define a new kind of electrical recording).
Comment on lines +112 to +114
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these TimeSeries/ElectricalSeries be :py:class: intersphinx references?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for below on TimeIntervals / DynamicTable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest intersphinx to the nwb-schema docs since this is for extensions

* Extend ``DynamicTable`` to store tabular data/
bendichter marked this conversation as resolved.
Show resolved Hide resolved
* Extend ``TimeIntervals`` to store specific annotations of intervals in time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Extend ``TimeIntervals`` to store specific annotations of intervals in time.
* Extend ``TimeIntervals`` to store specific annotations of intervals in time.
Strive for backward compatible changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NWB is already incorporated in many tools - proposing a change that will make already released NWB datasets non-compliant will cause a lot of confusion and will lead to significant cost to update codes.



Provide Documentation
Expand All @@ -45,3 +133,67 @@ anybody who receives the data also receives the necessary data to interpret it.
.. note::
In :pynwb-docs:`PyNWB <>`, the extension is cached automatically. This can be specified explicitly with
``io.write(filepath, cache_spec=True)``


Use Attributes for small metadata related to a particular data object (Group or Dataset)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Attributes should be used mainly to store small metadata (usually less than 64 Kbytes) that
is associated with a particular Group or Dataset. Typical uses of attributes are, e.g., to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reading this, it's hard to grasp 'how big' something is to be less than 64 KB

Since most metadata are strings, maybe it would be useful to mention a relative scale to that?

A standard Python string under 64 KB would correspond to ~1300 characters, from

import sys

sys.getsizeof("a")
> 50

64_000 / 50
> 1280

Google also says the average number of characters per word in English is 4.7, so that's ~272 words per string

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly relevant to fields like description, or sometimes reference_frame

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of relating this to characters. Do you know how big characters are within HDF5? Is it the same as Python strings in memory?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'll have to investigate that a bit deeper, but most likely not the same as Python... I seem to remember us having some issues related to their byte string types here on the Inspector, but maybe that was for fields that are technically set as datasets, not attributes

Even in Python, if you apply a specific encoding it can change the size quite a bit

sys.getsizeof("a".encode("utf-8"))
> 34

which would put the max closer to ~400 words

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, from https://docs.h5py.org/en/stable/strings.html#storing-strings and https://docs.h5py.org/en/stable/special.html#h5py.string_dtype and manually inspecting some of these attributes with HDFView

It seems HDF5 stores with UTF-8 encoding, so about ~400 words on average from above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~400 words

words or characters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

words or characters?

words using Google's estimate of average characters per word in English

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just shy of ~1900 characters if that's easier to communicate

define the ``unit`` of measurement of a dataset or to store a short ``description`` of
a group or dataset. For larger data, datasets should be used instead.

In practice, the main difference is that in PyNWB and MatNWB all Attributes are read into memory when reading the
bendichter marked this conversation as resolved.
Show resolved Hide resolved
NWB file. If you would like to allow users to read a file without reading all of these particular data values, use a
Dataset.


Link data to relevant metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Often metadata relevant to a particular type of data is stored elsewhere, e.g., information
about the ``Device`` used. To ensure relevant metadata can be uniquely identified, the data
should include links to the relevant metadata. NWB provides a few key mechanisms for linking:

* Use ``links`` (defined via ``NWBLinkSpec``) to link to a particular dataset or group
* Use ``DynamicTableRegion`` to link to a set of rows in a ``DynamicTable``
* Use a ``dataset`` with an object reference data type to store collections of links
to other objects, e.g., the following dtype to define a dataset of links to ``TimeSeries``

.. code-block:: yaml
dtype:
target_type: TimeSeries
reftype: object


Best practices for object names
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Names for groups, datasets, attributes, or links should typically:

* **Use lowercase letters only**
CodyCBakerPhD marked this conversation as resolved.
Show resolved Hide resolved
* **Use ``_`` instead of `` `` to separate parts in names**. E.g., use the name
CodyCBakerPhD marked this conversation as resolved.
Show resolved Hide resolved
``starting_time`` instead of ``starting time``
* **Explicit**. E.g., avoid the use of ambiguous abbreviation in names.


Best practices for naming ``neurodata_types``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For defining new types via ``neurodata_type_def`` use:

* **Use camelcase:** notation, i.e., names of types should NOT include spaces,
CodyCBakerPhD marked this conversation as resolved.
Show resolved Hide resolved
bendichter marked this conversation as resolved.
Show resolved Hide resolved
always start with an uppercase letter, and use a single capitalized letter to
separate parts of the name. E.g,. ``neurodata_type_def: LaserMeasurement``
* **Use the postfix ``Series`` when extending a ``TimeSeries`` type.** E.g., when
creating a new ``TimeSeries`` for laser measurements then add ``Series`` to
the type name, e.g,. ``neurodata_type_def: LaserMeasurementSeries``
* **Use the postfix ``Table`` when extending a ``DynamicTable`` type.** e.g.,
``neurodata_type_def: LaserSettingsTable``
* **Explicit**. E.g., avoid the use of ambiguous abbreviation in names.

Copy link
Contributor

@oruebel oruebel Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Limit flexibility: Consider data reuse and tool developers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One of the aims of NWB is to make reusing data easier. This means that when proposing an extension you need to put yourself in the shoes of someone who will receive an NWB dataset and attempt to analyze it. Additionally, consider developers that will try to write tools that take NWB datasets as inputs. It’s worth assessing how much additional code different ways of approaching your extension will lead to.

Use the ``ndx-template`` to create new extensions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By using the :nwb_extension_git:`ndx-template` to create new extensions helps ensure
that extensions can be easily shared and reused and published via the :ndx-catalog:`NDX Catalog <>`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
that extensions can be easily shared and reused and published via the :ndx-catalog:`NDX Catalog <>`.
that extensions can be easily shared and reused and published via the :ndx-catalog:`NDX Catalog <>`.
Get the community involved
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Try to reach out to colleagues working with the type of data you are trying to add support for. The more eyes you will get on your extension the better it will get.