Merge pull request datajoint#38 from kabilar/main
Rename packages.  Refactor instructions in `README`.
dimitri-yatsenko authored Jan 14, 2022
2 parents b8cb931 + 20fbf98 commit 4edf634
Showing 7 changed files with 52 additions and 3,095 deletions.
287 changes: 28 additions & 259 deletions README.md
# DataJoint Workflow - Array Electrophysiology

Workflow for extracellular array electrophysiology data acquired with a polytrode probe (e.g.
Neuropixels, Neuralynx) using the `SpikeGLX` or `OpenEphys` acquisition software and processed
with MATLAB- or python-based `Kilosort` spike sorting software.

A complete electrophysiology workflow can be built using the DataJoint Elements.
+ [element-lab](https://github.com/datajoint/element-lab)
+ [element-animal](https://github.com/datajoint/element-animal)
+ [element-session](https://github.com/datajoint/element-session)
+ [element-array-ephys](https://github.com/datajoint/element-array-ephys)

This repository provides demonstrations for:
1. Setting up a workflow using DataJoint Elements (see
[workflow_array_ephys/pipeline.py](workflow_array_ephys/pipeline.py))
2. Ingesting data/metadata based on a predefined file structure, file naming
convention, and directory lookup methods (see
[workflow_array_ephys/paths.py](workflow_array_ephys/paths.py) and the sketch below)
3. Ingesting clustering results
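
A minimal sketch of what such a directory lookup method can look like (the function name and config key below mirror the `dj_local_conf.json` template later in this README, but are illustrative assumptions, not the exact contents of `paths.py`):

```python
# Hypothetical sketch of a directory lookup helper in the spirit of paths.py.
# The "custom" config key matches the dj_local_conf.json template below.
import datajoint as dj

def get_ephys_root_data_dir():
    """Return the root data directory (or list of roots) from dj.config."""
    return dj.config["custom"]["ephys_root_data_dir"]
```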

## Workflow architecture

The electrophysiology workflow presented here uses components from 4 DataJoint
Elements (`element-lab`, `element-animal`, `element-session`,
`element-array-ephys`) assembled to form a fully functional workflow.
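
Schematically, the assembly in `pipeline.py` activates each Element onto its own database schema and links them together. The sketch below conveys the idea only; the exact `activate()` signatures and schema names are assumptions (see [workflow_array_ephys/pipeline.py](workflow_array_ephys/pipeline.py) for the actual code):

```python
# Illustrative sketch only: activating Elements onto prefixed schemas.
# Signatures and linking details are assumptions; consult pipeline.py.
import datajoint as dj
from element_lab import lab
from element_animal import subject
from element_session import session
from element_array_ephys import probe, ephys

db_prefix = dj.config["custom"]["database.prefix"]

lab.activate(db_prefix + "lab")
subject.activate(db_prefix + "subject", linking_module=__name__)
session.activate(db_prefix + "session", linking_module=__name__)
# ephys is activated last: it references probe, subject, and session tables.
ephys.activate(db_prefix + "ephys", db_prefix + "probe", linking_module=__name__)
```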

### element-lab

![element-lab](https://github.com/datajoint/element-lab/raw/main/images/element_lab_diagram.svg)

### element-animal

![element-animal](https://github.com/datajoint/element-animal/blob/main/images/subject_diagram.svg)

### assembled with element-array-ephys

![element-array-ephys](images/attached_array_ephys_element.svg)

## Installation instructions

### Step 1 - Clone this project

Clone this repository from [here](https://github.com/datajoint/workflow-array-ephys)

+ Launch a new terminal and change directory to where you want to clone the repository
```
cd C:/Projects
```
+ Clone the repository:
```
git clone https://github.com/datajoint/workflow-array-ephys
```
+ Change directory to `workflow-array-ephys`
```
cd workflow-array-ephys
```
### Step 2 - Set up a virtual environment
It is highly recommended (though not strictly required) to create a virtual environment to run the pipeline.
+ You can create one with `virtualenv` or `conda`. Below are the commands for `virtualenv`.
+ If `virtualenv` is not yet installed, run `pip install --user virtualenv`
+ To create a new virtual environment named `venv`:
```
virtualenv venv
```
+ To activate the virtual environment:
+ On Windows:
```
.\venv\Scripts\activate
```
+ On Linux/macOS:
```
source venv/bin/activate
```
### Step 3 - Install this repository
From the root of the cloned repository directory:
```
pip install -e .
```
Note: the `-e` flag installs this repository in editable mode,
in case there is a need to modify the code (e.g. the `pipeline.py` or `paths.py` scripts).
If no such modification is required, `pip install .` is sufficient.
### Step 4 - Jupyter Notebook
+ Register an IPython kernel with Jupyter
```
ipython kernel install --name=workflow-array-ephys
```
### Step 5 - Configure the `dj_local_conf.json`
We provide a tutorial notebook [01-configuration](notebooks/01-configuration.ipynb) to guide the configuration.
At the root of the repository folder,
create a new file `dj_local_conf.json` with the following template:
```json
{
  "database.host": "<hostname>",
  "database.user": "<username>",
  "database.password": "<password>",
  "loglevel": "INFO",
  "safemode": true,
  "display.limit": 7,
  "display.width": 14,
  "display.show_tuple_count": true,
  "custom": {
    "database.prefix": "<neuro_>",
    "ephys_root_data_dir": ["Full path to root directory of raw data",
                            "Full path to root directory of processed data"]
  }
}
```

+ Specify the database's `hostname`, `username`, and `password`.

+ Specify a `database.prefix` under which the schemas will be created.

+ Set up your data directory (`ephys_root_data_dir`) following the convention described below.
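
To verify the configuration, you can run a quick check from the repository root (a minimal sketch using the standard `datajoint` API; it only reflects back whatever values you entered above):

```python
# Minimal sketch: load dj_local_conf.json and test the database connection.
# Run from the repository root so DataJoint finds the local config file.
import datajoint as dj

dj.config.load("dj_local_conf.json")  # read the template filled in above
print(dj.config["custom"]["ephys_root_data_dir"])
dj.conn()  # raises an error if the credentials are wrong
```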


### Installation complete

+ At this point the setup of this workflow is complete.

## Directory structure and file naming convention

The workflow presented here is designed to work with the following directory structure and file naming convention:

+ The `ephys_root_data_dir` is configurable in `dj_local_conf.json`, via the `custom/ephys_root_data_dir` variable

+ The `subject` directory names must match the identifiers of your subjects in the [subjects.csv](./user_data/subjects.csv) file

+ The `session` directories can have any naming convention

+ Each session can have multiple probes; the `probe` directories must match the following naming convention:

`*[0-9]` (where `[0-9]` is a one-digit number specifying the probe number)

+ Each `probe` directory should contain:

+ One Neuropixels meta file, with the following naming convention:

`*[0-9].ap.meta`

+ Optionally, one Kilosort output folder

```
root_data_dir/
└───subject1/
│ └───session0/
│ │ └───imec0/
│ │ │ │ *imec0.ap.meta
│ │ │ └───ksdir/
│ │ │ │ spike_times.npy
│ │ │ │ templates.npy
│ │ │ │ ...
│ │ └───imec1/
│ │ │ *imec1.ap.meta
│ │ └───ksdir/
│ │ │ spike_times.npy
│ │ │ templates.npy
│ │ │ ...
│ └───session1/
│ │ │ ...
└───subject2/
│ │ ...
```
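
As an illustration, the convention above can be traversed with a few lines of standard-library Python (a sketch only; the glob pattern assumes the `subject/session/probe` nesting shown in the tree):

```python
# Sketch: find each probe's meta file under root_data_dir and report
# whether a Kilosort output (e.g. spike_times.npy) exists alongside it.
from pathlib import Path

root = Path("root_data_dir")
for meta in root.glob("*/*/*/*[0-9].ap.meta"):  # subject/session/probe/meta
    probe_dir = meta.parent
    ks_dirs = [d for d in probe_dir.iterdir()
               if d.is_dir() and (d / "spike_times.npy").exists()]
    print(meta, "->", ks_dirs or "no clustering output yet")
```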

We provide an example dataset to run through this workflow. Instructions for downloading the data are in the notebook [00-data-download](notebooks/00-data-download-optional.ipynb).


## Running this workflow

For new users, we recommend using the following two notebooks to run through the workflow.
+ [03-process](notebooks/03-process.ipynb)
+ [04-automate](notebooks/04-automate-optional.ipynb)

The general procedure is as follows: once you have your data directory configured with the above convention,
populating the pipeline with your data amounts to these 3 steps:

1. Insert meta information (e.g. subjects, sessions) - modify:
+ `user_data/subjects.csv`
+ `user_data/sessions.csv`

2. Import session data - run:
```
python workflow_array_ephys/ingest.py
```
3. Import clustering data and populate downstream analyses - run:
```
python workflow_array_ephys/populate.py
```
+ To insert new subjects, sessions, or analysis parameters, re-execute step 1.
+ Rerun steps 2 and 3 every time new sessions or clustering data become available.
+ In fact, steps 2 and 3 can be executed as scheduled jobs that will automatically process any data newly placed into the `ephys_root_data_dir` (a sketch follows below).
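
A minimal sketch of such a scheduled job, using only the two scripts above (the hourly polling interval is an arbitrary choice):

```python
# Sketch: rerun ingestion and population on a fixed schedule so newly
# deposited sessions in ephys_root_data_dir are picked up automatically.
import subprocess
import sys
import time

while True:
    subprocess.run([sys.executable, "workflow_array_ephys/ingest.py"])
    subprocess.run([sys.executable, "workflow_array_ephys/populate.py"])
    time.sleep(3600)  # poll hourly; adjust to your acquisition cadence
```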
## Interacting with the DataJoint pipeline and exploring data

For new users, we recommend using our notebook [05-explore](notebooks/05-explore.ipynb) to interact with the pipeline. In general:
+ Connect to database and import tables
```
from workflow_array_ephys.pipeline import *
```
+ View ingested/processed data
```
subject.Subject()
session.Session()
ephys.ProbeInsertion()
ephys.EphysRecording()
ephys.Clustering()
ephys.Clustering.Unit()
```
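
Standard DataJoint operators (restriction `&`, join `*`, and `fetch`) work on these tables. For example (the subject name in the restriction is illustrative):

```python
# Sketch: restrict sessions to one subject and fetch its sorted units.
from workflow_array_ephys.pipeline import session, ephys

subject_sessions = session.Session & 'subject = "subject1"'
units = (ephys.Clustering.Unit & subject_sessions).fetch(as_dict=True)
print(len(units), "units found")
```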
+ If you need to drop all schemas, drop them in the following dependency order (also refer to [06-drop](notebooks/06-drop-optional.ipynb)):
```
from workflow_array_ephys.pipeline import *
ephys.schema.drop()
probe.schema.drop()
session.schema.drop()
subject.schema.drop()
lab.schema.drop()
```
## Developer Guide
### Development mode installation
This method allows you to modify the source code for `workflow-array-ephys`, `element-array-ephys`, `element-animal`, `element-session`, and `element-lab`.
+ Launch a new terminal and change directory to where you want to clone the repositories
```
cd C:/Projects
```
+ Clone the repositories
```
git clone https://github.com/datajoint/element-lab
git clone https://github.com/datajoint/element-animal
git clone https://github.com/datajoint/element-session
git clone https://github.com/datajoint/element-array-ephys
git clone https://github.com/datajoint/workflow-array-ephys
```
+ Install each package with the `-e` option
```
pip install -e ./element-lab
pip install -e ./element-animal
pip install -e ./element-session
pip install -e ./element-array-ephys
pip install -e ./workflow-array-ephys
```
### Running tests
1. Download the test dataset to your local machine
(note the directory where the dataset is saved - e.g. `/tmp/testset`)
2. Create an `.env` file with the following content
(replace `/tmp/testset` with the directory containing the downloaded test dataset):
```
TEST_DATA_DIR=/tmp/testset
```
3. Run:
```
docker-compose -f docker-compose-test.yaml up --build
```

## Installation instructions

+ The installation instructions can be found at [datajoint-elements/install.md](https://github.com/datajoint/datajoint-elements/blob/main/install.md).

## Interacting with the DataJoint workflow

+ Please refer to the following workflow-specific [Jupyter notebooks](/notebooks) for an in-depth explanation of how to run the workflow ([03-process.ipynb](notebooks/03-process.ipynb)) and explore the data ([05-explore.ipynb](notebooks/05-explore.ipynb)).
