Merge branch 'main' into set-probe
CodyCBakerPhD authored Nov 28, 2023
2 parents 6f5f9ba + 554e07b commit 9afd9a9
Showing 42 changed files with 1,179 additions and 380 deletions.
32 changes: 8 additions & 24 deletions .github/workflows/add-to-dashboard.yml
@@ -1,35 +1,19 @@
-name: Add Issue or PR to Dashboard
+name: Add Issue or Pull Request to Dashboard

on:
  issues:
-    types: opened
+    types:
+      - opened
  pull_request:
    types:
      - opened

jobs:
-  issue_opened:
-    name: Add Issue to Dashboard
-    runs-on: ubuntu-latest
-    if: github.event_name == 'issues'
-    steps:
-      - name: Add Issue to Dashboard
-        uses: leonsteinhaeuser/[email protected]
-        with:
-          gh_token: ${{ secrets.MY_GITHUB_TOKEN }}
-          organization: catalystneuro
-          project_id: 3
-          resource_node_id: ${{ github.event.issue.node_id }}
-  pr_opened:
-    name: Add PR to Dashboard
+  add-to-project:
+    name: Add issue or pull request to project
    runs-on: ubuntu-latest
-    if: github.event_name == 'pull_request' && github.event.action == 'opened'
    steps:
-      - name: Add PR to Dashboard
-        uses: leonsteinhaeuser/[email protected]
+      - uses: actions/[email protected]
        with:
-          gh_token: ${{ secrets.MY_GITHUB_TOKEN }}
-          organization: catalystneuro
-          project_id: 3
-          resource_node_id: ${{ github.event.pull_request.node_id }}
+          project-url: https://github.com/orgs/catalystneuro/projects/3
+          github-token: ${{ secrets.PROJECT_TOKEN }}
2 changes: 1 addition & 1 deletion .github/workflows/dev-testing.yml
@@ -18,7 +18,7 @@ env:

jobs:
run:
-name: Dev Branch Testing with Python 3.9 and ubuntu-latest
+name: Ubuntu tests with Python ${{ matrix.python-version }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
2 changes: 1 addition & 1 deletion .github/workflows/doctests.yml
@@ -4,7 +4,7 @@ on:

jobs:
run:
-name: Doctests on ${{ matrix.os }} with Python ${{ matrix.python-version }}
+name: ${{ matrix.os }} Python ${{ matrix.python-version }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
2 changes: 1 addition & 1 deletion .github/workflows/formatwise-installation-testing.yml
@@ -6,7 +6,7 @@ on:

jobs:
run:
-name: Formatwise gallery tests for ${{ format.type }}:${{ format.name }} on ${{ matrix.os }} with Python ${{ matrix.python-version }}
+name: ${{ format.type }}:${{ format.name }} on ${{ matrix.os }} with Python ${{ matrix.python-version }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
2 changes: 1 addition & 1 deletion .github/workflows/live-service-testing.yml
@@ -12,7 +12,7 @@ env:

jobs:
run:
-name: Live service testing on ${{ matrix.os }} with Python ${{ matrix.python-version }}
+name: ${{ matrix.os }} Python ${{ matrix.python-version }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
2 changes: 1 addition & 1 deletion .github/workflows/testing.yml
@@ -14,7 +14,7 @@ on:

jobs:
run:
-name: Minimal and full tests on ${{ matrix.os }} with Python ${{ matrix.python-version }}
+name: ${{ matrix.os }} Python ${{ matrix.python-version }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -5,13 +5,21 @@
* Changed the metadata schema for `Fluorescence` and `DfOverF` where the traces metadata can be provided as a dict instead of a list of dicts.
The name of the plane segmentation is used to determine which traces to add to the `Fluorescence` and `DfOverF` containers. [PR #632](https://github.com/catalystneuro/neuroconv/pull/632)
* Modify the filtering of traces to also filter out traces with empty values. [PR #649](https://github.com/catalystneuro/neuroconv/pull/649)
* Added tool function `get_default_dataset_configurations` for identifying and collecting all fields of an in-memory `NWBFile` that could become datasets on disk; and return instances of the Pydantic dataset models filled with default values for chunking/buffering/compression. [PR #569](https://github.com/catalystneuro/neuroconv/pull/569)
* Added `set_probe()` method to `BaseRecordingExtractorInterface`. [PR #639](https://github.com/catalystneuro/neuroconv/pull/639)

### Fixes
* Fixed GenericDataChunkIterator (in hdmf.py) in the case where the number of dimensions is 1 and the size in bytes is greater than the threshold of 1 GB. [PR #638](https://github.com/catalystneuro/neuroconv/pull/638)
* Changed `np.floor` and `np.prod` usage to `math.floor` and `math.prod` in various files. [PR #638](https://github.com/catalystneuro/neuroconv/pull/638)
* Updated minimal required version of DANDI CLI; updated `run_conversion_from_yaml` API function and tests to be compatible with naming changes. [PR #664](https://github.com/catalystneuro/neuroconv/pull/664)

# v0.4.5
### Improvements
* Change metadata extraction library from `fparse` to `parse`. [PR #654](https://github.com/catalystneuro/neuroconv/pull/654)
* The `dandi` CLI/API is now an optional dependency; it is still required to use the `tool` function for automated upload as well as the YAML-based NeuroConv CLI. [PR #655](https://github.com/catalystneuro/neuroconv/pull/655)



# v0.4.5 (November 6, 2023)

### Back-compatibility break
* The `CEDRecordingInterface` has now been removed; use the `Spike2RecordingInterface` instead. [PR #602](https://github.com/catalystneuro/neuroconv/pull/602)
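The new `set_probe()` method noted in the CHANGELOG entry above (PR #639) is not itself shown in this diff, so its exact signature is unknown here. The sketch below is a hypothetical usage that assumes the method accepts a probeinterface.Probe, mirroring the SpikeInterface `recording.set_probe()` API; the interface class and file path are placeholders.

from probeinterface import generate_linear_probe

from neuroconv.datainterfaces import SpikeGLXRecordingInterface  # any BaseRecordingExtractorInterface subclass

# Build a toy probe and map its contacts to the recorded channels.
probe = generate_linear_probe(num_elec=384, ypitch=20.0)
probe.set_device_channel_indices(list(range(384)))

# Hypothetical call: the argument type is assumed, not confirmed by this diff (see PR #639).
interface = SpikeGLXRecordingInterface(file_path="path/to/recording_g0_t0.imec0.ap.bin")  # placeholder path
interface.set_probe(probe)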
6 changes: 5 additions & 1 deletion docs/developer_guide/testing_suite.rst
@@ -20,7 +20,7 @@ Then install all required and optional dependencies in a fresh environment.

.. code:: bash
-pip install -e . neuroconv[test,full]
+pip install -e .[test,full]
Then simply run all tests with pytest
@@ -29,6 +29,10 @@ Then simply run all tests with pytest
pytest
.. note::

You will likely observe many failed tests if the test data is not available. See the section 'Testing on Example Data' for instructions on how to download the test data.


Minimal
-------
5 changes: 3 additions & 2 deletions requirements-minimal.txt
@@ -6,8 +6,9 @@ h5py>=3.9.0
hdmf>=3.11.0
hdmf_zarr>=0.4.0
pynwb>=2.3.2;python_version>='3.8'
nwbinspector>=0.4.31
pydantic>=1.10.13,<2.0.0
psutil>=5.8.0
tqdm>=4.60.0
dandi>=0.57.0
pandas
fparse
parse>=1.20.0
3 changes: 1 addition & 2 deletions requirements-testing.txt
@@ -2,6 +2,5 @@ pytest
pytest-cov
ndx-events>=0.2.0 # for special tests to ensure load_namespaces is set to allow NWBFile load at all times
parameterized>=0.8.1
-scikit-learn # For SI Waveform tests
-numba; python_version <= '3.10' # For SI Waveform tests
ndx-miniscope
+spikeinterface[qualitymetrics]>=0.99.1
5 changes: 4 additions & 1 deletion setup.py
@@ -18,6 +18,9 @@
testing_suite_dependencies = f.readlines()

extras_require = defaultdict(list)
extras_require["dandi"].append("dandi>=0.58.1")
extras_require["full"].extend(extras_require["dandi"])

extras_require.update(test=testing_suite_dependencies, docs=documentation_dependencies)
for modality in ["ophys", "ecephys", "icephys", "behavior", "text"]:
modality_path = root / "src" / "neuroconv" / "datainterfaces" / modality
@@ -75,7 +78,7 @@
extras_require=extras_require,
entry_points={
"console_scripts": [
"neuroconv = neuroconv.tools.yaml_conversion_specification.yaml_conversion_specification:run_conversion_from_yaml_cli",
"neuroconv = neuroconv.tools.yaml_conversion_specification._yaml_conversion_specification:run_conversion_from_yaml_cli",
],
},
license="BSD-3-Clause",
2 changes: 1 addition & 1 deletion src/neuroconv/datainterfaces/ecephys/requirements.txt
@@ -1,2 +1,2 @@
-spikeinterface>=0.98.2
+spikeinterface>=0.99.1
packaging<22.0
1 change: 1 addition & 0 deletions src/neuroconv/tools/__init__.py
@@ -1,3 +1,4 @@
"""Collection of all helper functions that require at least one external dependency (some being optional as well)."""
from .importing import get_package
from .nwb_helpers import get_module
from .path_expansion import LocalPathExpander
Expand Down
5 changes: 5 additions & 0 deletions src/neuroconv/tools/data_transfers/__init__.py
@@ -0,0 +1,5 @@
"""Collection of helper functions for assessing and performing automated data transfers."""
from ._aws import estimate_s3_conversion_cost
from ._dandi import automatic_dandi_upload
from ._globus import get_globus_dataset_content_sizes, transfer_globus_content
from ._helpers import estimate_total_conversion_runtime
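Because the new data_transfers subpackage re-exports these helpers, they can be imported directly from its namespace. A minimal illustration based only on the __init__ above:

from neuroconv.tools.data_transfers import (
    automatic_dandi_upload,
    estimate_s3_conversion_cost,
    estimate_total_conversion_runtime,
    get_globus_dataset_content_sizes,
    transfer_globus_content,
)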
34 changes: 34 additions & 0 deletions src/neuroconv/tools/data_transfers/_aws.py
@@ -0,0 +1,34 @@
"""Collection of helper functions for assessing and performing automated data transfers related to AWS."""


def estimate_s3_conversion_cost(
    total_mb: float,
    transfer_rate_mb: float = 20.0,
    conversion_rate_mb: float = 17.0,
    upload_rate_mb: float = 40.0,
    compression_ratio: float = 1.7,
):
    """
    Estimate potential cost of performing an entire conversion on S3 using full automation.

    Parameters
    ----------
    total_mb: float
        The total amount of data (in MB) that will be transferred, converted, and uploaded to dandi.
    transfer_rate_mb : float, default: 20.0
        Estimate of the transfer rate for the data.
    conversion_rate_mb : float, default: 17.0
        Estimate of the conversion rate for the data. Can vary widely depending on conversion options and type of data.
        Figure of 17MB/s is based on extensive compression of high-volume, high-resolution ecephys.
    upload_rate_mb : float, default: 40.0
        Estimate of the upload rate of a single file to the DANDI Archive.
    compression_ratio : float, default: 1.7
        Estimate of the final average compression ratio for datasets in the file. Can vary widely.
    """
    c = 1 / compression_ratio  # compressed_size = total_size * c
    total_mb_s = (
        total_mb**2 / 2 * (1 / transfer_rate_mb + (2 * c + 1) / conversion_rate_mb + 2 * c**2 / upload_rate_mb)
    )
    cost_gb_m = 0.08 / 1e3  # $0.08 / GB Month
    cost_mb_s = cost_gb_m / (1e3 * 2.628e6)  # assuming 30 day month; unsure how amazon weights shorter months?
    return cost_mb_s * total_mb_s
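A minimal usage sketch of the helper above; the 1 TB figure is purely illustrative. Note that the estimate grows quadratically with total_mb in the expression above.

from neuroconv.tools.data_transfers import estimate_s3_conversion_cost

# Illustrative only: estimate the S3 cost for roughly 1 TB (1,000,000 MB) of source data
# using the default transfer, conversion, upload, and compression assumptions above.
estimated_cost_in_usd = estimate_s3_conversion_cost(total_mb=1e6)
print(f"Estimated S3 conversion cost: ${estimated_cost_in_usd:.2f}")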
118 changes: 118 additions & 0 deletions src/neuroconv/tools/data_transfers/_dandi.py
@@ -0,0 +1,118 @@
"""Collection of helper functions for assessing and performing automated data transfers for the DANDI archive."""
import os
from pathlib import Path
from shutil import rmtree
from tempfile import mkdtemp
from typing import Union
from warnings import warn

from pynwb import NWBHDF5IO

from ...utils import FolderPathType, OptionalFolderPathType


def automatic_dandi_upload(
    dandiset_id: str,
    nwb_folder_path: FolderPathType,
    dandiset_folder_path: OptionalFolderPathType = None,
    version: str = "draft",
    staging: bool = False,
    cleanup: bool = False,
    number_of_jobs: Union[int, None] = None,
    number_of_threads: Union[int, None] = None,
):
    """
    Fully automated upload of NWBFiles to a DANDISet.

    Requires an API token set as an environment variable named DANDI_API_KEY.
    To set this in your bash terminal in Linux or MacOS, run
        export DANDI_API_KEY=...
    or in Windows
        set DANDI_API_KEY=...
    DO NOT STORE THIS IN ANY PUBLICLY SHARED CODE.

    Parameters
    ----------
    dandiset_id : str
        Six-digit string identifier for the DANDISet the NWBFiles will be uploaded to.
    nwb_folder_path : folder path
        Folder containing the NWBFiles to be uploaded.
    dandiset_folder_path : folder path, optional
        A separate folder location within which to download the dandiset.
        Used in cases where you do not have write permissions for the parent of the 'nwb_folder_path' directory.
        Default behavior downloads the DANDISet to a folder adjacent to the 'nwb_folder_path'.
    version : {None, "draft", "version"}
        The default is "draft".
    staging : bool, default: False
        Is the DANDISet hosted on the staging server? This is mostly for testing purposes.
        The default is False.
    cleanup : bool, default: False
        Whether to remove the dandiset folder path and nwb_folder_path.
        Defaults to False.
    number_of_jobs : int, optional
        The number of jobs to use in the DANDI upload process.
    number_of_threads : int, optional
        The number of threads to use in the DANDI upload process.
    """
    from dandi.download import download as dandi_download
    from dandi.organize import organize as dandi_organize
    from dandi.upload import upload as dandi_upload

    assert os.getenv("DANDI_API_KEY"), (
        "Unable to find environment variable 'DANDI_API_KEY'. "
        "Please retrieve your token from DANDI and set this environment variable."
    )

    dandiset_folder_path = (
        Path(mkdtemp(dir=nwb_folder_path.parent)) if dandiset_folder_path is None else dandiset_folder_path
    )
    dandiset_path = dandiset_folder_path / dandiset_id
    # Odd bit of logic upstream: https://github.com/dandi/dandi-cli/blob/master/dandi/cli/cmd_upload.py#L92-L96
    if number_of_threads is not None and number_of_threads > 1 and number_of_jobs is None:
        number_of_jobs = -1

    url_base = "https://gui-staging.dandiarchive.org" if staging else "https://dandiarchive.org"
    dandiset_url = f"{url_base}/dandiset/{dandiset_id}/{version}"
    dandi_download(urls=dandiset_url, output_dir=str(dandiset_folder_path), get_metadata=True, get_assets=False)
    assert dandiset_path.exists(), "DANDI download failed!"

    # TODO: need PR on DANDI to expose number of jobs
    dandi_organize(
        paths=str(nwb_folder_path), dandiset_path=str(dandiset_path), devel_debug=True if number_of_jobs == 1 else False
    )
    organized_nwbfiles = dandiset_path.rglob("*.nwb")

    # DANDI has yet to implement forcing of session_id inclusion in organize step
    # This manually enforces it when only a single session per subject is organized
    for organized_nwbfile in organized_nwbfiles:
        if "ses" not in organized_nwbfile.stem:
            with NWBHDF5IO(path=organized_nwbfile, mode="r") as io:
                nwbfile = io.read()
                session_id = nwbfile.session_id
            dandi_stem = organized_nwbfile.stem
            dandi_stem_split = dandi_stem.split("_")
            dandi_stem_split.insert(1, f"ses-{session_id}")
            corrected_name = "_".join(dandi_stem_split) + ".nwb"
            organized_nwbfile.rename(organized_nwbfile.parent / corrected_name)
    organized_nwbfiles = dandiset_path.rglob("*.nwb")
    # The above block can be removed once they add the feature

    assert len(list(dandiset_path.iterdir())) > 1, "DANDI organize failed!"

    dandi_instance = "dandi-staging" if staging else "dandi"  # Test
    dandi_upload(
        paths=[str(x) for x in organized_nwbfiles],
        dandi_instance=dandi_instance,
        jobs=number_of_jobs,
        jobs_per_file=number_of_threads,
    )

    # Cleanup should be confirmed manually; Windows especially can complain
    if cleanup:
        try:
            rmtree(path=dandiset_folder_path)
            rmtree(path=nwb_folder_path)
        except PermissionError:  # pragma: no cover
            warn("Unable to clean up source files and dandiset! Please manually delete them.")
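A hedged usage sketch of automatic_dandi_upload(): the dandiset ID and folder path below are placeholders, the call targets the staging server, and DANDI_API_KEY must already be set in the environment as the docstring above describes.

from pathlib import Path

from neuroconv.tools.data_transfers import automatic_dandi_upload

# Placeholder dandiset ID and folder; a Path is passed because the helper above
# uses nwb_folder_path.parent when no dandiset_folder_path is supplied.
automatic_dandi_upload(
    dandiset_id="200560",
    nwb_folder_path=Path("path/to/nwb_files"),
    staging=True,
)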