
Alpaca Use Case

Repository containing the code necessary to reproduce the results of the Alpaca toolbox manuscript.

Prerequisites

Clone the repository to a local folder

This repository must be cloned to a local folder. This can be done using the git CLI client:

git clone https://github.com/INM-6/alpaca_use_case.git

Data

To run the analyses, the public experimental datasets available at https://doi.gin.g-node.org/10.12751/g-node.f83565 must be downloaded.

The scripts use the datasets in the NIX format (including the 30 kHz neural signal); the versioned files are accessible through the download links on the dataset's GIN page.

You can also follow the instructions on the GIN repository to download the files to a local repository folder using the gin client.

The NIX files must be downloaded or copied into the /data folder at the root of this repository. This allows running the analyses using the bash scripts provided with each Python script. If the files were downloaded using the gin client, a symbolic link can be created to the path where the GIN repository was cloned on your system (subfolder datasets_nix):

ln -s /path/to/multielectrode_grasp/datasets_nix ./data
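
If the dataset was obtained with the gin client mentioned above, the typical steps look like the following. This is a hedged sketch: verify the repository path and the on-demand content retrieval step against the instructions on the GIN dataset page.

gin login
gin get INT/multielectrode_grasp    # clones the data repository (large files as placeholders)
cd multielectrode_grasp
gin get-content datasets_nix        # downloads the actual content of the NIX files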

Install OpenMPI

To run the example using MPI parallelization, OpenMPI must be installed on the system. On Debian/Ubuntu, it can be installed using apt:

sudo apt install libopenmpi-dev openmpi-bin
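
The installation can be checked by querying the MPI launcher version (generic OpenMPI usage, not specific to this repository):

mpirun --version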

Requirements

The project requires Python 3.9 and the following packages:

  • conda
  • pip
  • scipy
  • numpy
  • matplotlib
  • nixio
  • neo
  • elephant
  • odml
  • alpaca-prov

The example using MPI parallelization requires mpi4py.

The example using Snakemake requires snakemake.

The code was run using Ubuntu 18.04.6 LTS 64-bit and conda 22.9.0. The examples using MPI and Snakemake were run on an HPC cluster using Debian 5.10.179-2.

Installation

The required environments can be created with conda from the templates in the /environment folder. For instructions on how to install conda on your system, please check the conda documentation.

For convenience, all necessary environments can be created using the provided bash script (note that this overwrites existing versions of the environments):

cd environment
./build_envs.sh
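
If you prefer to create a single environment manually instead of running the script, conda can build it directly from a template. The file and environment names below are placeholders; check the actual contents of the /environment folder:

cd environment
conda env create -f <template_file>.yml   # use one of the YAML templates in this folder
conda env list                            # the new environment should now be listed
conda activate <environment_name>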

For the visualization of provenance graphs as GEXF files, Gephi 0.9.7 (build 202208031831) was used. Instructions for downloading and installing it can be found in the Installation section of the Alpaca documentation.

Code repository

The code is organized into subfolders inside the /code folder:

  • no_provenance: this is the original PSD analysis presented as a use case in the paper. The analysis is implemented in psd_by_trial_type.py. The flowchart presented in Figure 4 was constructed based on this script.

  • provenance: psd_by_trial_type.py is the code in no_provenance modified to use Alpaca to track provenance (see the usage sketch after this list). The generate_gexf_from_prov.py script generates several visualization graphs, with different levels of simplification, from the provenance information saved as a Turtle file.

  • provenance_mpi: the psd_by_trial_type.py code in provenance is modified to run the analysis using parallelization: each iteration of the main loop (processing a data file) is run in a separate process. At the end, the root process collects the PSD data generated by the other processes and plots it (see the minimal MPI sketch after this list). The generate_gexf_from_prov.py script generates simplified visualization graphs from the provenance information saved as Turtle files.

  • provenance_snakemake: the psd_by_trial_type.py code in provenance is split into multiple scripts in the workflow folder:

    • compute_psd_by_trial_type.py reads a data file, computes the PSDs for each trial type, and saves them into pickle files.
    • plot_psds.py reads all pickle files and produces the final plot.

    The Snakemake workflow manager is used to orchestrate the execution. The workflow is defined in workflow/Snakefile and the workflow configuration in configs/config.yaml. The generate_gexf_from_prov.py script generates simplified visualization graphs from the provenance information saved as Turtle files.

  • smoothed_plot: psd_by_trial_type.py contains a modification of the original analysis to produce a smoothed version of the PSD plot. The generate_gexf_from_prov.py script generates simplified visualization graphs from the provenance information saved as a Turtle file.
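
For the provenance-tracking scripts above, the general Alpaca usage pattern is to decorate the analysis functions, activate tracking at the start of the script, and serialize the captured provenance to a Turtle file at the end. The sketch below only illustrates that pattern; the names Provenance, activate, and save_provenance are assumed from the Alpaca documentation, and the authoritative usage is the psd_by_trial_type.py script in /code/provenance.

import numpy as np
from alpaca import Provenance, activate, save_provenance

# Decorated functions have their inputs, outputs, and parameters captured.
@Provenance(inputs=['signal'])
def mean_power(signal):
    return np.mean(signal ** 2)

def main():
    activate()                                  # start capturing provenance in this script
    result = mean_power(np.random.rand(1000))   # tracked call
    save_provenance('example_provenance.ttl')   # write the captured graph as a Turtle file

if __name__ == '__main__':
    main()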
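
The parallelization scheme described for provenance_mpi (each rank processes one data file and rank 0 gathers the PSD results to plot them) follows the standard MPI gather pattern. Below is a minimal, generic mpi4py sketch of that pattern, not the actual repository code; the file names are the dataset sessions used in this repository.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank processes one data file; run with exactly two MPI processes.
data_files = ['i140703-001.nix', 'l101210-001.nix']
my_result = f"PSD computed by rank {rank} for {data_files[rank]}"

# Rank 0 collects the results of all ranks; in the analysis, it would then produce the final plot.
all_results = comm.gather(my_result, root=0)
if rank == 0:
    for item in all_results:
        print(item)

Launched with two processes (for example, mpirun -n 2 python script.py), this matches the two rank-specific Turtle files found in outputs/provenance_mpi.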

How to run

To run the code, activate the corresponding conda environment and execute the provided bash script for each example:

  • no_provenance:

    To run the analysis:

    cd code/no_provenance
    conda activate no_provenance
    ./run.sh
  • provenance:

    To run the analysis:

    cd code/provenance
    conda activate provenance
    ./run.sh

    After the analysis is run, generate the GEXF graphs to visualize provenance with Gephi:

    ./visualize_provenance.sh
  • provenance_mpi:

    To run the analysis and generate the GEXF graphs at the end:

    cd code/provenance_mpi
    conda activate provenance-mpi
    ./run.sh
  • provenance_snakemake:

    To run the analysis and generate the GEXF graphs at the end:

    cd code/provenance_snakemake
    conda activate provenance-snakemake
    ./run.sh
  • smoothed_plot:

    To run the analysis and generate the GEXF graphs at the end:

    cd code/smoothed_plot
    conda activate provenance
    ./run.sh

Outputs

The outputs presented in the paper are available in the /outputs folder.

The bash scripts write files to the /outputs folder at the root of this repository. It contains subfolders with the same names as in /code, each with the respective outputs:

  • outputs/no_provenance: the plot in R2G_PSD_all_subjects.png is Figure 3 in the paper.
  • outputs/provenance: one of the outputs is the R2G_PSD_all_subjects.ttl file, which contains the provenance information used to generate the graphs presented in Figures 11 and 12 in the paper (specific details below; a sketch for inspecting these Turtle files follows this list). Several visualization graphs as GEXF files are provided. The plot in R2G_PSD_all_subjects.png is the analysis output, equivalent to the one generated by /code/no_provenance.
  • outputs/provenance_mpi: the two Turtle files (R2G_PSD_all_subjects_rank0.ttl and R2G_PSD_all_subjects_rank1.ttl) contain the provenance information obtained from running each MPI process. They were used to generate the visualization graphs (*.gexf) in this folder.
  • outputs/provenance_snakemake: the folder psds_by_trial_type contains the intermediate pickle files generated by each run of the compute_psd_by_trial_type.py script. For each run, a Turtle file named after the session (i140703-001.ttl and l101210-001.ttl) is saved with the provenance information corresponding to the processed NIX file. An additional Turtle file, R2G_PSD_all_subjects.ttl, contains the provenance information for the execution of the plot_psds.py script. The three Turtle files are used to generate the graph presented in Figure 14B (specific details below).
  • outputs/smoothed_plot: the outputs are presented as Figure 13 in the paper. The output file R2G_PSD_all_subjects.png was used in Figure 13A. The R2G_PSD_all_subjects.ttl file was used to generate the visualization graph presented in Figure 13B (specific details below).
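
As noted above, the Turtle (.ttl) files are standard RDF and can be inspected with any RDF tool, in addition to the GEXF visualizations. A minimal sketch using rdflib (rdflib is not part of this repository's requirements, just one convenient option):

from rdflib import Graph

g = Graph()
g.parse("outputs/provenance/R2G_PSD_all_subjects.ttl", format="turtle")
print(f"{len(g)} triples in the provenance graph")

# List the distinct predicates (relationship types) recorded in the provenance
for predicate in sorted(set(g.predicates())):
    print(predicate)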

The specific GEXF graph outputs used for the figures in the paper are:

Logs

For each analysis run, the respective folder in /outputs also contains text files that further document the execution:

  • environment.txt: details on the Python and package versions;
  • psd_by_trial_type.out: STDOUT and STDERR output of the script execution (for the no_provenance, provenance, provenance_mpi, and smoothed_plot runs);
  • snakemake.out: STDOUT and STDERR output of the workflow execution, in the provenance_snakemake example.

Acknowledgments

This work was performed as part of the Helmholtz School for Data Science in Life, Earth and Energy (HDS-LEE) and received funding from the Helmholtz Association of German Research Centres. This project has received funding from the European Union’s Horizon 2020 Framework Programme for Research and Innovation under Specific Grant Agreements No. 785907 (Human Brain Project SGA2) and 945539 (Human Brain Project SGA3), and by the Helmholtz Association Initiative and Networking Fund under project number ZT-I-0003.

License

BSD 3-Clause License