Repository containing the code necessary to reproduce the results of the Alpaca toolbox manuscript.
This repository must be cloned to a local folder. This can be done using the `git` CLI client:

```
git clone https://github.com/INM-6/alpaca_use_case.git
```
To run the analyses, the public experimental datasets available at https://doi.gin.g-node.org/10.12751/g-node.f83565 must be downloaded. The scripts use the datasets in the NIX format (including the 30 kHz neural signal), and the versioned files are accessible through the following links (an example download command is shown after the list):

- `i140703-001.nix`: https://gin.g-node.org/INT/multielectrode_grasp/raw/a6d508be099c41b4047778bc2de55ac216f4e673/datasets_nix/i140703-001.nix
- `l101210-001.nix`: https://gin.g-node.org/INT/multielectrode_grasp/raw/a6d508be099c41b4047778bc2de55ac216f4e673/datasets_nix/l101210-001.nix
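For example, assuming `wget` is available and that the raw GIN links above serve the file contents directly (for large annexed files the `gin` client, described next, may be needed instead), the files can be downloaded into the `data` folder at the root of this repository:

```
mkdir -p data
wget -O data/i140703-001.nix https://gin.g-node.org/INT/multielectrode_grasp/raw/a6d508be099c41b4047778bc2de55ac216f4e673/datasets_nix/i140703-001.nix
wget -O data/l101210-001.nix https://gin.g-node.org/INT/multielectrode_grasp/raw/a6d508be099c41b4047778bc2de55ac216f4e673/datasets_nix/l101210-001.nix
```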
You can also follow the instructions on the GIN repository to download the files to a local repository folder using the `gin` client.
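A minimal sketch of this route, assuming the `gin` client is installed and logged in (the commands follow the standard GIN workflow; refer to the GIN documentation for details):

```
# clone the GIN repository (large files are fetched as placeholders)
gin get INT/multielectrode_grasp
cd multielectrode_grasp
# download the content of the two NIX files used in the analyses
gin get-content datasets_nix/i140703-001.nix datasets_nix/l101210-001.nix
```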
The NIX files must be downloaded/copied into the folder `/data` with respect to the root of this repository. This allows running the analyses using the `bash` scripts that are provided with each Python script. If downloaded using the `gin` client, a symbolic link can be created to the path where the GIN repository was cloned in your system (subfolder `datasets_nix`):

```
ln -s /path/to/multielectrode_grasp/datasets_nix ./data
```
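Either way, you can optionally verify that both data files are reachable from the repository root before running any analysis:

```
ls -lh data/*.nix
```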
To run the example using MPI parallelization, OpenMPI must be installed in the system. This can be installed using `apt`:

```
sudo apt install libopenmpi-dev openmpi-bin
```
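To confirm that OpenMPI is available (optional):

```
mpirun --version
```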
The project requires Python 3.9 and the following packages:
- conda
- pip
- scipy
- numpy
- matplotlib
- nixio
- neo
- elephant
- odml
- alpaca-prov
The example using MPI parallelization requires `mpi4py`. The example using Snakemake requires `snakemake`.
The code was run using Ubuntu 18.04.6 LTS 64-bit and `conda` 22.9.0. The examples using MPI and Snakemake were run on an HPC cluster using Debian 5.10.179-2.
The required environments can be created with `conda`, using the templates in the `/environment` folder. For instructions on how to install `conda` on your system, please check the conda documentation.

For convenience, all necessary environments can be created using a `bash` script (this overwrites existing versions of the environments):

```
cd environment
./build_envs.sh
```
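Alternatively, a single environment can be created manually from its template. This is only a sketch: the template file names in `/environment` are assumed here to match the environment names, so check the folder for the actual names:

```
cd environment
# file name assumed; check the /environment folder for the actual template names
conda env create -f no_provenance.yml
```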
For visualization of the provenance graphs as GEXF files, Gephi 0.9.7 (build 202208031831) was used. Instructions for downloading and installing Gephi are found in the Installation section of the Alpaca documentation.
The code is organized into subfolders inside the `/code` folder:

- `no_provenance`: this is the original PSD analysis presented as the use case in the paper. The analysis is implemented in `psd_by_trial_type.py`. The flowchart presented in Figure 4 was constructed based on this script.
- `provenance`: `psd_by_trial_type.py` is the code in `no_provenance` modified to use Alpaca to track provenance. The `generate_gexf_from_prov.py` script generates several visualization graphs, with different levels of simplification, using the provenance information saved as a Turtle file.
- `provenance_mpi`: the `psd_by_trial_type.py` code in `provenance` modified to run the analysis with parallelization: each iteration of the main loop (processing one data file) runs in a separate process. At the end, the root process collects the PSD data generated by the other processes and plots it. The `generate_gexf_from_prov.py` script generates simplified visualization graphs from the provenance information saved as Turtle files.
- `provenance_snakemake`: the `psd_by_trial_type.py` code in `provenance` split into multiple scripts in `workflow`: `compute_psd_by_trial_type.py` reads a data file and computes the PSDs for each trial type, saving them into pickle files; `plot_psds.py` reads all pickle files and produces the final plot. The Snakemake workflow manager is used to orchestrate the execution. The workflow is defined in `workflow/Snakefile` and the workflow configuration in `configs/config.yaml`. The `generate_gexf_from_prov.py` script generates simplified visualization graphs from the provenance information saved as Turtle files.
- `smoothed_plot`: `psd_by_trial_type.py` contains a modification of the original analysis to produce a smoothed version of the PSD plot. The `generate_gexf_from_prov.py` script generates simplified visualization graphs from the provenance information saved as a Turtle file.
To run the code, the correct environment must be activated with `conda` and the scripts run using the provided `bash` scripts:

- `no_provenance`: to run the analysis:

  ```
  cd code/no_provenance
  conda activate no_provenance
  ./run.sh
  ```

- `provenance`: to run the analysis:

  ```
  cd code/provenance
  conda activate provenance
  ./run.sh
  ```

  After the analysis is run, generate the GEXF graphs to visualize the provenance with Gephi:

  ```
  ./visualize_provenance.sh
  ```

- `provenance_mpi`: to run the analysis and generate the GEXF graphs at the end:

  ```
  cd code/provenance_mpi
  conda activate provenance-mpi
  ./run.sh
  ```

- `provenance_snakemake`: to run the analysis and generate the GEXF graphs at the end (a sketch of a direct Snakemake invocation is shown after this list):

  ```
  cd code/provenance_snakemake
  conda activate provenance-snakemake
  ./run.sh
  ```

- `smoothed_plot`: to run the analysis and generate the GEXF graphs at the end:

  ```
  cd code/smoothed_plot
  conda activate provenance
  ./run.sh
  ```
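For the `provenance_snakemake` example, Snakemake can also be invoked directly instead of through `run.sh`. This is a hypothetical invocation, not the exact command used for the paper: it assumes that `--cores 2` is sufficient and that the `Snakefile` accepts the configuration via `--configfile`:

```
cd code/provenance_snakemake
conda activate provenance-snakemake
# hypothetical direct invocation; run.sh is the supported entry point
snakemake --cores 2 --snakefile workflow/Snakefile --configfile configs/config.yaml
```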
The outputs presented in the paper are available in the `/outputs` folder. The `bash` scripts write files to the `/outputs` folder, with respect to the root of this repository. There are subfolders with the same names as in `/code`, each with the respective outputs:
- `outputs/no_provenance`: the plot in `R2G_PSD_all_subjects.png` is Figure 3 in the paper.
- `outputs/provenance`: one of the outputs is the `R2G_PSD_all_subjects.ttl` file, which contains the provenance information used to generate the graphs presented in Figures 11 and 12 in the paper (specific details below). Several visualization graphs are provided as GEXF files. The plot in `R2G_PSD_all_subjects.png` is the analysis output, equivalent to the one generated by `/code/no_provenance`.
- `outputs/provenance_mpi`: the two Turtle files (`R2G_PSD_all_subjects_rank0.ttl` and `R2G_PSD_all_subjects_rank1.ttl`) contain the provenance information obtained from running each MPI process. They were used to generate the visualization graphs (`*.gexf`) in this folder.
- `outputs/provenance_snakemake`: the folder `psds_by_trial_type` contains the intermediate pickle files generated by each run of the `compute_psd_by_trial_type.py` script. For each run, a Turtle file named after the session (`i140703-001.ttl` and `l101210-001.ttl`) is saved with the provenance information corresponding to the NIX file processed. An additional Turtle file, `R2G_PSD_all_subjects.ttl`, contains the provenance information for the execution of the `plot_psds.py` script. The three Turtle files are used to generate the graph presented in Figure 14B (specific details below).
- `outputs/smoothed_plot`: the outputs are presented as Figure 13 in the paper. The output file `R2G_PSD_all_subjects.png` was used in Figure 13A. The `R2G_PSD_all_subjects.ttl` file was used to generate the visualization graph presented in Figure 13B (specific details below).
The specific GEXF graph outputs used for the figures in the paper are:
- Figure 11A: outputs/provenance/R2G_PSD_all_subjects_full.gexf
- Figure 11B (top): outputs/provenance/R2G_PSD_all_subjects_full.gexf
- Figure 11B (bottom): outputs/provenance/R2G_PSD_all_subjects.gexf
- Figure 11C: outputs/provenance/R2G_PSD_all_subjects.gexf
- Figure 11D: outputs/provenance/R2G_PSD_all_subjects.gexf
- Figure 11E: outputs/provenance/R2G_PSD_all_subjects.gexf
- Figure 12A: outputs/provenance/R2G_PSD_all_subjects_simplified_Q_shape_units_function.gexf
- Figure 12B: outputs/provenance/R2G_PSD_all_subjects_simplified_Q_units.gexf
- Figure 13B: outputs/smoothed_plot/R2G_PSD_all_subjects_simplified.gexf
- Figure 14B: outputs/provenance_snakemake/R2G_PSD_all_subjects_simplified.gexf
For each analysis script run, the respective folder in `/outputs` will contain text files to further document the execution:

- `environment.txt`: details on the Python and package version information;
- `psd_by_trial_type.out`: STDOUT and STDERR output of the script execution (for the `no_provenance`, `provenance`, `provenance_mpi`, and `smoothed_plot` runs);
- `snakemake.out`: STDOUT and STDERR output of the workflow execution, in the `provenance_snakemake` example.
This work was performed as part of the Helmholtz School for Data Science in Life, Earth and Energy (HDS-LEE) and received funding from the Helmholtz Association of German Research Centres. This project has received funding from the European Union’s Horizon 2020 Framework Programme for Research and Innovation under Specific Grant Agreements No. 785907 (Human Brain Project SGA2) and 945539 (Human Brain Project SGA3), and by the Helmholtz Association Initiative and Networking Fund under project number ZT-I-0003.
BSD 3-Clause License