Implement find_full_path within ephys modules #35

Merged
merged 25 commits into from
Jan 11, 2022
25 commits
f69e491
Update .gitignore
kabilar May 1, 2021
1ce53f3
Merge branch 'main' of https://github.com/datajoint/element-array-eph…
kabilar May 1, 2021
6472c19
Merge branch 'main' of https://github.com/datajoint/element-array-eph…
kabilar Aug 18, 2021
a0f49d2
Merge branch 'main' of https://github.com/datajoint/element-array-eph…
kabilar Sep 17, 2021
4f4be8d
Move functions to `element-data-loader`
kabilar Sep 20, 2021
ffaf60b
Add element_data_loader for multiple root dirs
kabilar Sep 27, 2021
b6b39c0
Update author
kabilar Sep 27, 2021
2be1f08
Fix import
kabilar Sep 28, 2021
68ef14b
[WIP] Print directory path
kabilar Sep 28, 2021
2233c5d
Fix OpenEphys session path
kabilar Sep 28, 2021
ab426c1
Update comments
kabilar Sep 28, 2021
49c554b
[WIP] Update directory path
kabilar Sep 28, 2021
b98192b
[WIP] Add print statement
kabilar Sep 28, 2021
cf533a2
Remove test print statement
kabilar Sep 29, 2021
44be355
Fix module import
kabilar Sep 30, 2021
139e99b
Update module import
kabilar Oct 4, 2021
9881350
Fixed doc string
kabilar Oct 4, 2021
818cc53
Update module import
kabilar Oct 4, 2021
665cc28
Fix for missing `fileTimeSecs`
kabilar Oct 4, 2021
84bb616
[WIP] Add print statement
kabilar Oct 4, 2021
1a4a7f5
Remove print statement
kabilar Oct 4, 2021
4ca9b32
Suggested adds re upstream components
CBroz1 Dec 30, 2021
09e8a96
Update error message
kabilar Jan 3, 2022
ce6adf1
Merge branch 'main' of https://github.com/kabilar/element-array-ephys…
kabilar Jan 3, 2022
6f9507c
Rename package
kabilar Jan 11, 2022
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# User data
.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
43 changes: 28 additions & 15 deletions README.md
@@ -1,5 +1,4 @@
# DataJoint Element - Array Electrophysiology Element
DataJoint Element for array electrophysiology.

This repository features DataJoint pipeline design for extracellular array electrophysiology,
with ***Neuropixels*** probe and ***kilosort*** spike sorting method.
@@ -13,12 +12,16 @@ ephys pipeline.

See [Background](Background.md) for the background information and development timeline.

## The Pipeline Architecture
## Element architecture

![element-array-ephys diagram](images/attached_array_ephys_element.svg)

As the diagram depicts, the array ephys element starts immediately downstream from ***Session***,
and also requires some notion of ***Location*** as a dependency for ***InsertionLocation***.
and also requires some notion of ***Location*** as a dependency for ***InsertionLocation***. We
provide an [example workflow](https://github.com/datajoint/workflow-array-ephys/) with a
[pipeline script](https://github.com/datajoint/workflow-array-ephys/blob/main/workflow_array_ephys/pipeline.py)
that models (a) combining this Element with the corresponding [Element-Session](https://github.com/datajoint/element-session)
, and (b) declaring a ***SkullReference*** table to provide Location.

### The design of probe

@@ -45,14 +48,24 @@ This ephys element features automatic ingestion for spike sorting results from t
+ ***WaveformSet*** - A set of spike waveforms for units from a given CuratedClustering

## Installation
```
pip install element-array-ephys
```

If you already have an older version of ***element-array-ephys*** installed using `pip`, upgrade with
```
pip install --upgrade element-array-ephys
```
+ Install `element-array-ephys`
```
pip install element-array-ephys
```

+ Upgrade `element-array-ephys` previously installed with `pip`
```
pip install --upgrade element-array-ephys
```

+ Install `element-interface`

+ `element-interface` is a dependency of `element-array-ephys`, however it is not contained within `requirements.txt`.

```
pip install "element-interface @ git+https://github.com/datajoint/element-interface"
```

## Usage

@@ -65,12 +78,12 @@ To activate the `element-array-ephys`, ones need to provide:
+ schema name for the ephys module

2. Upstream tables
+ Session table
+ SkullReference table (Reference table for InsertionLocation, specifying the skull reference)
+ Session table: A set of keys identifying a recording session (see [Element-Session](https://github.com/datajoint/element-session)).
+ SkullReference table: A reference table for InsertionLocation, specifying the skull reference (see [example pipeline](https://github.com/datajoint/workflow-array-ephys/blob/main/workflow_array_ephys/pipeline.py)).

3. Utility functions
+ get_ephys_root_data_dir()
+ get_session_directory()
3. Utility functions. See [example definitions here](https://github.com/datajoint/workflow-array-ephys/blob/main/workflow_array_ephys/paths.py)
+ get_ephys_root_data_dir(): Returns your root data directory.
+ get_session_directory(): Returns the path of the session data relative to the root.
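A minimal sketch of what these two user-supplied functions might look like. Everything specific here is an assumption made for illustration and is not taken from this PR: the `CONFIG` dictionary stands in for wherever the workflow keeps its settings, and the `subject`/`session_id` key fields and the `<subject>/<session_id>` folder layout are hypothetical.

```python
import pathlib

# Hypothetical settings store; a real workflow would likely read this from its own config.
CONFIG = {"ephys_root_data_dir": ["/mnt/lab_data", "/Volumes/lab_data"]}


def get_ephys_root_data_dir() -> list:
    """Return every candidate root data directory, always as a list of Paths."""
    roots = CONFIG.get("ephys_root_data_dir", [])
    # Accept a single directory as well as a list of directories.
    if isinstance(roots, (str, pathlib.Path)):
        roots = [roots]
    return [pathlib.Path(root) for root in roots]


def get_session_directory(session_key: dict) -> str:
    """Return the session directory relative to a root (assumed layout: <subject>/<session_id>)."""
    relative = pathlib.Path(str(session_key["subject"])) / str(session_key["session_id"])
    return relative.as_posix()
```

Returning a relative path from `get_session_directory()` lets the same database entry resolve on machines that mount the data at different roots.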

For more detail, check the docstring of the `element-array-ephys`:

69 changes: 0 additions & 69 deletions element_array_ephys/__init__.py
@@ -1,69 +0,0 @@
import datajoint as dj
import pathlib
import uuid
import hashlib


dj.config['enable_python_native_blobs'] = True


def find_full_path(root_directories, relative_path):
    """
    Given a relative path, search and return the full-path
    from provided potential root directories (in the given order)
    :param root_directories: potential root directories
    :param relative_path: the relative path to find the valid root directory
    :return: root_directory (pathlib.Path object)
    """
    relative_path = pathlib.Path(relative_path)

    if relative_path.exists():
        return relative_path

    # turn to list if only a single root directory is provided
    if isinstance(root_directories, (str, pathlib.Path)):
        root_directories = [root_directories]

    for root_dir in root_directories:
        if (pathlib.Path(root_dir) / relative_path).exists():
            return pathlib.Path(root_dir) / relative_path

    raise FileNotFoundError('No valid full-path found (from {})'
                            ' for {}'.format(root_directories, relative_path))


def find_root_directory(root_directories, full_path):
    """
    Given multiple potential root directories and a full-path,
    search and return one directory that is the parent of the given path
    :param root_directories: potential root directories
    :param full_path: the relative path to search the root directory
    :return: full-path (pathlib.Path object)
    """
    full_path = pathlib.Path(full_path)

    if not full_path.exists():
        raise FileNotFoundError(f'{full_path} does not exist!')

    # turn to list if only a single root directory is provided
    if isinstance(root_directories, (str, pathlib.Path)):
        root_directories = [root_directories]

    try:
        return next(pathlib.Path(root_dir) for root_dir in root_directories
                    if pathlib.Path(root_dir) in set(full_path.parents))

    except StopIteration:
        raise FileNotFoundError('No valid root directory found (from {})'
                                ' for {}'.format(root_directories, full_path))


def dict_to_uuid(key):
    """
    Given a dictionary `key`, returns a hash string as UUID
    """
    hashed = hashlib.md5()
    for k, v in sorted(key.items()):
        hashed.update(str(k).encode())
        hashed.update(str(v).encode())
    return uuid.UUID(hex=hashed.hexdigest())
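To illustrate how the two path helpers above are meant to round-trip, here is a small sketch that runs them against a temporary directory tree. Compact copies of the helpers are included only so the block runs on its own; the directory names (`root_a`, `root_b`, `subject1/session1`) and the `demo()` function are made up for the example.

```python
import pathlib
import tempfile


def find_full_path(root_directories, relative_path):
    """Compact copy of the helper above: first root under which relative_path exists."""
    relative_path = pathlib.Path(relative_path)
    if relative_path.exists():
        return relative_path
    if isinstance(root_directories, (str, pathlib.Path)):
        root_directories = [root_directories]
    for root_dir in root_directories:
        candidate = pathlib.Path(root_dir) / relative_path
        if candidate.exists():
            return candidate
    raise FileNotFoundError(
        f"No valid full-path found (from {root_directories}) for {relative_path}")


def find_root_directory(root_directories, full_path):
    """Compact copy of the helper above: the root that is a parent of full_path."""
    full_path = pathlib.Path(full_path)
    if not full_path.exists():
        raise FileNotFoundError(f"{full_path} does not exist!")
    if isinstance(root_directories, (str, pathlib.Path)):
        root_directories = [root_directories]
    parents = set(full_path.parents)
    for root_dir in root_directories:
        if pathlib.Path(root_dir) in parents:
            return pathlib.Path(root_dir)
    raise FileNotFoundError(
        f"No valid root directory found (from {root_directories}) for {full_path}")


def demo():
    """Build two candidate roots, place a session under the second, and resolve it."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = pathlib.Path(tmp).resolve()
        root_a, root_b = tmp / "root_a", tmp / "root_b"
        session = root_b / "subject1" / "session1"
        root_a.mkdir()
        session.mkdir(parents=True)

        roots = [root_a, root_b]
        full = find_full_path(roots, "subject1/session1")  # relative -> full
        root = find_root_directory(roots, full)            # full -> root
        # The relative path stored in the database can be recovered from the pair.
        return full == session, root == root_b, full.relative_to(root).as_posix()
```

Because roots are searched in the given order, the order of the list returned by `get_ephys_root_data_dir()` decides which copy wins when the same relative path exists under more than one root.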
65 changes: 39 additions & 26 deletions element_array_ephys/ephys.py
@@ -4,9 +4,10 @@
import numpy as np
import inspect
import importlib
from element_interface.utils import find_root_directory, find_full_path, dict_to_uuid

from .readers import spikeglx, kilosort, openephys
from . import probe, find_full_path, find_root_directory, dict_to_uuid
from . import probe

schema = dj.schema()

@@ -46,7 +47,6 @@ def activate(ephys_schema_name, probe_schema_name=None, *, create_schema=True,
global _linking_module
_linking_module = linking_module

# activate
probe.activate(probe_schema_name, create_schema=create_schema,
create_tables=create_tables)
schema.activate(ephys_schema_name, create_schema=create_schema,
@@ -57,9 +57,10 @@ def activate(ephys_schema_name, probe_schema_name=None, *, create_schema=True,

def get_ephys_root_data_dir() -> list:
"""
All data paths, directories in DataJoint Elements are recommended to be stored as
relative paths, with respect to some user-configured "root" directory,
which varies from machine to machine (e.g. different mounted drive locations)
All data paths, directories in DataJoint Elements are recommended to be
stored as relative paths, with respect to some user-configured "root"
directory, which varies from machine to machine (e.g. different mounted
drive locations)

get_ephys_root_data_dir() -> list
This user-provided function retrieves the possible root data directories
@@ -78,7 +79,7 @@ def get_session_directory(session_key: dict) -> str:
Retrieve the session directory containing the
recorded Neuropixels data for a given Session
:param session_key: a dictionary of one Session `key`
:return: a string for full path to the session directory
:return: a string for relative or full path to the session directory
"""
return _linking_module.get_session_directory(session_key)

@@ -140,21 +141,24 @@ class EphysFile(dj.Part):
"""

def make(self, key):
sess_dir = pathlib.Path(get_session_directory(key))

session_dir = find_full_path(get_ephys_root_data_dir(),
get_session_directory(key))

inserted_probe_serial_number = (ProbeInsertion * probe.Probe & key).fetch1('probe')

# search session dir and determine acquisition software
for ephys_pattern, ephys_acq_type in zip(['*.ap.meta', '*.oebin'],
['SpikeGLX', 'Open Ephys']):
ephys_meta_filepaths = [fp for fp in sess_dir.rglob(ephys_pattern)]
ephys_meta_filepaths = [fp for fp in session_dir.rglob(ephys_pattern)]
if ephys_meta_filepaths:
acq_software = ephys_acq_type
break
else:
raise FileNotFoundError(
f'Ephys recording data not found!'
f' Neither SpikeGLX nor Open Ephys recording files found')
f' Neither SpikeGLX nor Open Ephys recording files found'
f' in {session_dir}')

if acq_software == 'SpikeGLX':
for meta_filepath in ephys_meta_filepaths:
@@ -187,12 +191,13 @@ def make(self, key):
'acq_software': acq_software,
'sampling_rate': spikeglx_meta.meta['imSampRate']})

root_dir = find_root_directory(get_ephys_root_data_dir(), meta_filepath)
root_dir = find_root_directory(get_ephys_root_data_dir(),
meta_filepath)
self.EphysFile.insert1({
**key,
'file_path': meta_filepath.relative_to(root_dir).as_posix()})
elif acq_software == 'Open Ephys':
dataset = openephys.OpenEphys(sess_dir)
dataset = openephys.OpenEphys(session_dir)
for serial_number, probe_data in dataset.probes.items():
if str(serial_number) == inserted_probe_serial_number:
break
Comment on lines 201 to 203
Member

Suggested change
- for serial_number, probe_data in dataset.probes.items():
-     if str(serial_number) == inserted_probe_serial_number:
-         break
+ for serial_number in dataset.probes:
+     if str(serial_number) == inserted_probe_serial_number:
+         break

Collaborator Author

The variable probe_data is used in the next section after the for loop, so we would need to keep this statement as is.

Member

Then I would get probe_data after the loop

Suggested change
- for serial_number, probe_data in dataset.probes.items():
-     if str(serial_number) == inserted_probe_serial_number:
-         break
+ for serial_number in dataset.probes:
+     if str(serial_number) == inserted_probe_serial_number:
+         probe_data = dataset.probes[serial_number]
+         break
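
The lookup discussed in this thread can be sketched in isolation as follows. This is only an illustration: a plain dictionary stands in for `dataset.probes`, the helper name `get_probe_data` and the sample serial numbers are hypothetical, and raising when nothing matches is a choice made for this sketch rather than something the PR confirms.

```python
def get_probe_data(probes: dict, inserted_probe_serial_number: str):
    """Return the entry whose serial number matches, comparing as strings."""
    for serial_number in probes:
        # Keys may be ints while the inserted serial number is a string.
        if str(serial_number) == inserted_probe_serial_number:
            return probes[serial_number]
    # Assumption of this sketch: fail loudly instead of leaving a stale value.
    raise KeyError(f"No probe with serial number {inserted_probe_serial_number}")


# Stand-in for dataset.probes, with made-up serial numbers and payloads.
probes = {18005116102: "probe A data", 18194819132: "probe B data"}
```

Wrapping the loop in a function makes the matched value the return value, so nothing depends on a loop variable still being bound after the `break`.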

@@ -220,8 +225,7 @@ def make(self, key):
'acq_software': acq_software,
'sampling_rate': probe_data.ap_meta['sample_rate']})

root_dir = find_root_directory(
get_ephys_root_data_dir(),
root_dir = find_root_directory(get_ephys_root_data_dir(),
probe_data.recording_info['recording_files'][0])
self.EphysFile.insert([{**key,
'file_path': fp.relative_to(root_dir).as_posix()}
@@ -290,8 +294,11 @@ def make(self, key):
shank, shank_col, shank_row, _ = spikeglx_recording.apmeta.shankmap['data'][recorded_site]
electrode_keys.append(probe_electrodes[(shank, shank_col, shank_row)])
elif acq_software == 'Open Ephys':
sess_dir = pathlib.Path(get_session_directory(key))
loaded_oe = openephys.OpenEphys(sess_dir)

session_dir = find_full_path(get_ephys_root_data_dir(),
get_session_directory(key))

loaded_oe = openephys.OpenEphys(session_dir)
oe_probe = loaded_oe.probes[probe_sn]

lfp_channel_ind = np.arange(
@@ -442,16 +449,16 @@ class Curation(dj.Manual):
curation_id: int
---
curation_time: datetime # time of generation of this set of curated clustering results
curation_output_dir: varchar(255) # output directory of the curated results, relative to clustering root data directory
curation_output_dir: varchar(255) # output directory of the curated results, relative to root data directory
quality_control: bool # has this clustering result undergone quality control?
manual_curation: bool # has manual curation been performed on this clustering result?
curation_note='': varchar(2000)
"""

def create1_from_clustering_task(self, key, curation_note=''):
"""
A convenient function to create a new corresponding "Curation"
for a particular "ClusteringTask"
A function to create a new corresponding "Curation" for a particular
"ClusteringTask"
"""
if key not in Clustering():
raise ValueError(f'No corresponding entry in Clustering available'
@@ -465,8 +472,10 @@ def create1_from_clustering_task(self, key, curation_note=''):
# Synthesize curation_id
curation_id = dj.U().aggr(self & key, n='ifnull(max(curation_id)+1,1)').fetch1('n')
self.insert1({**key, 'curation_id': curation_id,
'curation_time': creation_time, 'curation_output_dir': output_dir,
'quality_control': is_qc, 'manual_curation': is_curated,
'curation_time': creation_time,
'curation_output_dir': output_dir,
'quality_control': is_qc,
'manual_curation': is_curated,
'curation_note': curation_note})


@@ -613,8 +622,9 @@ def yield_unit_waveforms():
spikeglx_meta_filepath = get_spikeglx_meta_filepath(key)
neuropixels_recording = spikeglx.SpikeGLX(spikeglx_meta_filepath.parent)
elif acq_software == 'Open Ephys':
sess_dir = pathlib.Path(get_session_directory(key))
openephys_dataset = openephys.OpenEphys(sess_dir)
session_dir = find_full_path(get_ephys_root_data_dir(),
get_session_directory(key))
openephys_dataset = openephys.OpenEphys(session_dir)
neuropixels_recording = openephys_dataset.probes[probe_serial_number]

def yield_unit_waveforms():
@@ -659,11 +669,13 @@ def get_spikeglx_meta_filepath(ephys_recording_key):
except FileNotFoundError:
# if not found, search in session_dir again
if not spikeglx_meta_filepath.exists():
sess_dir = pathlib.Path(get_session_directory(ephys_recording_key))
session_dir = find_full_path(get_ephys_root_data_dir(),
get_session_directory(
ephys_recording_key))
inserted_probe_serial_number = (ProbeInsertion * probe.Probe
& ephys_recording_key).fetch1('probe')

spikeglx_meta_filepaths = [fp for fp in sess_dir.rglob('*.ap.meta')]
spikeglx_meta_filepaths = [fp for fp in session_dir.rglob('*.ap.meta')]
for meta_filepath in spikeglx_meta_filepaths:
spikeglx_meta = spikeglx.SpikeGLXMeta(meta_filepath)
if str(spikeglx_meta.probe_SN) == inserted_probe_serial_number:
@@ -696,8 +708,9 @@ def get_neuropixels_channel2electrode_map(ephys_recording_key, acq_software):
for recorded_site, (shank, shank_col, shank_row, _) in enumerate(
spikeglx_meta.shankmap['data'])}
elif acq_software == 'Open Ephys':
sess_dir = pathlib.Path(get_session_directory(ephys_recording_key))
openephys_dataset = openephys.OpenEphys(sess_dir)
session_dir = find_full_path(get_ephys_root_data_dir(),
get_session_directory(ephys_recording_key))
openephys_dataset = openephys.OpenEphys(session_dir)
probe_serial_number = (ProbeInsertion & ephys_recording_key).fetch1('probe')
probe_dataset = openephys_dataset.probes[probe_serial_number]
