From 70ef8f30dd002edc09e28f51dbbe3f8fce7b4293 Mon Sep 17 00:00:00 2001 From: Rachel Lee <65371136+ScientistRachel@users.noreply.github.com> Date: Tue, 5 Nov 2024 16:59:37 -0500 Subject: [PATCH] Add pipeline documentation --- .github/workflows/pages.yml | 2 +- docs/pipeline/bdv_save.md | 4 +- docs/pipeline/config.md | 99 ++++++++++++++++++++++++++++++++++++- docs/pipeline/install.md | 6 ++- docs/pipeline/pipeline.md | 44 +++++++++++++++-- 5 files changed, 146 insertions(+), 9 deletions(-) diff --git a/.github/workflows/pages.yml b/.github/workflows/pages.yml index b90e169..d8ae86e 100644 --- a/.github/workflows/pages.yml +++ b/.github/workflows/pages.yml @@ -8,7 +8,7 @@ name: Deploy Jekyll site to Pages on: push: - branches: ["main"] + branches: ["master"] paths: - "docs/**" diff --git a/docs/pipeline/bdv_save.md b/docs/pipeline/bdv_save.md index beec38c..c5ac36f 100644 --- a/docs/pipeline/bdv_save.md +++ b/docs/pipeline/bdv_save.md @@ -1,8 +1,8 @@ --- -title: BDV-compatible Saving +title: File Organization layout: default parent: Pipeline Usage nav_order: 3 --- -This page will describe filenaming conventions. \ No newline at end of file +This page will describe file naming conventions. \ No newline at end of file diff --git a/docs/pipeline/config.md b/docs/pipeline/config.md index fe04abb..53e2aae 100644 --- a/docs/pipeline/config.md +++ b/docs/pipeline/config.md @@ -5,4 +5,101 @@ parent: Pipeline Usage nav_order: 2 --- -This page will describe how to set up a configuration file. \ No newline at end of file +# Configuration File + +The configuration file controls how the pipeline runs all requested modules. JSON formatting rules (e.g., using `{ }` and `,` appropriately) apply. If you work with an appropriate IDE (e.g., [Visual Studio Code](https://code.visualstudio.com/)), it can help highlight any errors you might have made in your JSON syntax. + +One configuration file can be used to process multiple experiments. 
+For documentation purposes, it is often ideal to have one `config.json` file for a given project that is stored with the rest of the project's files.
+
+## Paths
+
+### _root_
+All configuration files need to include information on image paths. The `root` directory can contain multiple experiments to process. The pipeline starts by walking the `root` path. Any folder or subfolder that contains a `*Settings.txt` file will be processed by the pipeline.
+
+### _processed.json_
+Once a folder has been processed, it will be added to a new file, `processed.json`. This new JSON file is created in the `root` directory. Any directories listed in `processed.json` will be skipped during future processing. This is useful if, for example, you want to analyze the first experiment in a project but want to keep using the same `config.json` to process the rest of the project. All previously processed experiments will be skipped, saving time and money. However, if you make a mistake and want to re-run a given folder, you will need to open `processed.json` and manually delete the entry associated with that folder.
+
+### _psf_
+PSFs are required for running the deconvolution module. In prior versions of the pipeline, this section was always required, but the current version only checks for the `psf` section if deconvolution is requested. Inside the `psf` section, there are two required subsections: `dir` and `laser`. The directory `dir` is parsed as a subdirectory of `root` and should contain PSF images with their corresponding settings file. See [Deconvolution](https://aicjanelia.github.io/LLSM/decon/decon.html) for more about these files. The file names that correspond to each laser must be provided as name-value pairs in the `laser` subsection. The laser names must exactly match the values in the acquisition settings file (e.g., don't use 561 if the settings file uses 560).
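+
+Putting the pieces above together, a `paths` section might look like the following sketch (the directory and PSF file names here are hypothetical placeholders; see the full example at the end of this page):
+
+```json
+{
+    "paths": {
+        "root": "/path/to/project/",
+        "psf": {
+            "dir": "Calibration",
+            "laser": {
+                "488": "488_PSF.tif",
+                "560": "560_PSF.tif"
+            }
+        }
+    }
+}
+```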
+
+## BDV
+This optional section determines the naming convention for output files. If the section is not provided, it defaults to false. To learn more, see [File Organization](https://aicjanelia.github.io/LLSM/pipeline/bdv_save.html).
+
+## Individual Modules
+Individual modules are requested by adding their own section to the JSON file. The [example JSON file](#example-configjson) requests cropping, deskewing, deconvolution, and the generation of MIP files. More details on the parameters for each module are provided in the discussion of that module, but a high-level overview is provided here. If all modules are requested, they are run in the order `crop > deskew > decon`, with MIPs created at each stage as appropriate.
+
+### _crop_
+For each side of the image to be cropped, provide the number of pixels to remove from that side. For example, `"cropTop": 10` removes 10 pixels from the top of the image. Any sides that are not provided are assumed to be zero. Other optional parameters are described further in [Cropping]().
+
+### _deskew_
+Deskewing is based on the xy-resolution and the step size of the images. The step size is automatically parsed from the acquisition settings.txt file, but `xy-res` should be provided in μm in the configuration file. The value of `fill` determines the values added to the empty space created by the deskewing process, while `bit-depth` is 16 for our systems. If omitted, `angle` will default to the LLSM value of 31.8 degrees or the MOSAIC value of -32.45 degrees.
+
+### _decon_
+The value of `n` in `decon` is not related to the bsub command; rather, it is the number of Richardson-Lucy iterations. The `subtract` value is a camera offset subtracted from all images; it should generally be 100 for the AIC systems. Our systems have a `bit-depth` of 16.
+
+### _mip_
+The true/false values in `mip` determine whether projections will be made along the x, y, and/or z axes. Setting all values to true is recommended.
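+
+As a sketch, requesting all four modules described above adds one section per module (the values here are illustrative, using the parameter names discussed above):
+
+```json
+{
+    "crop": { "cropTop": 10, "cropBottom": 0 },
+    "deskew": { "xy-res": 0.104, "fill": 0.0, "bit-depth": 16 },
+    "decon": { "n": 5, "bit-depth": 16, "subtract": 100.0 },
+    "mip": { "x": true, "y": true, "z": true }
+}
+```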
+
+## Bsub
+The `bsub` section determines how jobs will be sent to the LSF cluster and is the section most specific to the Janelia setup. A job is created for each individual tiff file, so there is one job for each timepoint and each channel.
+
+### _job output_
+If no output path, `o`, is specified, an email will be generated for every individual job (i.e., for every individual tiff), and these emails can number in the thousands. This should be avoided! Setting `o` to `"/dev/null"` will result in no output being sent. If you are troubleshooting, you can instead specify a path to a file (e.g., `"/nrs/aic/instruments/llsm/pipeline-test/output.txt"`) that can be viewed as the processing progresses.
+
+### _run times_
+Two values, `We` and `W`, impact the run times of each individual job (i.e., each individual tiff file) and are specified in minutes. The estimated runtime, `We`, is a guess of how long the files will take to process. This is generally short (approximately 10 min) on the LLSM, but can vary widely for MOSAIC data. The hard runtime limit, `W`, will stop your job once that limit is reached, whether or not the job has completed. This keeps jobs with mistakes from running too long on the cluster and currently defaults to 8 hours. **If jobs are expected to run for longer than 8 hours, the value of `W` must be increased in the configuration file or jobs will not complete.** If you are confident that jobs will take less than 1 hour each, setting `W` to 59 will place jobs in a faster queue for cluster processing.
+
+### _slots_
+The value of `n` determines the number of slots requested on the cluster. The only purpose of this parameter is to guarantee the correct amount of memory for processing. If more slots than necessary are requested, jobs will actually slow down, not speed up. *The modules are not written to take advantage of parallel processing.*
+
+Each slot has 15 GB of memory. Memory usage is highest for the deconvolution module.
+To determine how much memory to request for deconvolution, calculate the total number of voxels in the final output image. If you are deconvolving after deskewing, this is the number of voxels in the deskewed image, not the raw input images. An empirical equation for memory usage (in GB) is `memory = (7.5E-8 * total voxels) - 1.39119`. Divide this memory value by 15 GB and round up to determine the number of slots. For example, a deskewed image with 5E8 total voxels needs roughly (7.5E-8 * 5E8) - 1.39 = 36.1 GB of memory, and 36.1 / 15 = 2.41, so request 3 slots. If the value is close to an integer (e.g., 1.95), a cautious approach is to add 1 to the rounded value (e.g., 2-->3) to avoid small fluctuations in memory usage causing errors.
+
+## Example config.json
+
+```json
+{
+    "paths": {
+        "root": "/nrs/aic/instruments/llsm/pipeline-test/",
+        "psf": {
+            "dir": "20210222/Calibration",
+            "laser": {
+                "560": "560_PSF.tif",
+                "488": "488_PSF.tif"
+            }
+        }
+    },
+    "bdv": {
+        "bdv_save": true
+    },
+    "crop": {
+        "cropTop": 10,
+        "cropBottom": 0,
+        "cropLeft": 15,
+        "cropRight": 5,
+        "cropFront": 100,
+        "cropBack": 50
+    },
+    "deskew": {
+        "xy-res": 0.104,
+        "fill": 0.0,
+        "bit-depth": 16
+    },
+    "decon": {
+        "n": 5,
+        "bit-depth": 16,
+        "subtract": 100.0
+    },
+    "mip": {
+        "x": true,
+        "y": true,
+        "z": true
+    },
+    "bsub": {
+        "o": "/dev/null",
+        "We": 10,
+        "n": 4,
+        "W": 480
+    }
+}
+```
\ No newline at end of file
diff --git a/docs/pipeline/install.md b/docs/pipeline/install.md
index a92f914..bd5e2f8 100644
--- a/docs/pipeline/install.md
+++ b/docs/pipeline/install.md
@@ -5,6 +5,10 @@ parent: Pipeline Usage
 nav_order: 1
 ---
 
+# Installation
+
+The main modules are all written in C++ and compiled to executable binaries. The overall pipeline wrapper is a Python script. If you are running this pipeline at Janelia, the only installation necessary is to ensure that these binaries and Python scripts are on your path. If you would like to rebuild the pipeline elsewhere, please see the additional information below.
+ # Dependencies * boost-program-options >= 1.73.0 @@ -15,7 +19,7 @@ nav_order: 1 The pipeline assumes that all LLSM settings files have been generated by *v4.04505.Development* of the LLSM control software. Settings files generated by different versions of the LLSM control software are not likely to be parsed correctly by our parsing routine. -# Installation +# Module Building The LLSM pipeline was purpose built to run on the Janelia cluster. For example, the pipeline assumes that it can directly submit jobs to an LSF cluster. We recommend using VCPKG to install the necessary dependencies. diff --git a/docs/pipeline/pipeline.md b/docs/pipeline/pipeline.md index 62cb67a..934bcb1 100644 --- a/docs/pipeline/pipeline.md +++ b/docs/pipeline/pipeline.md @@ -8,9 +8,33 @@ nav_order: 2 This pipeline was designed to be used with the high performance computing cluster at Janelia Research Campus. Using the overview pipeline commands requires that jobs can be submitted to an LSF cluster, but individual modules can always be used directly on the command line. Additionally, the commands need to be on your path to be used directly. (For new AIC members, this requires a one-time set up for your cluster account.) -The main input into either `llsm-pipeline` or `mosaic-pipeline` is a configuration JSON file. This configuration file is a structured way of informing the pipeline of which modules will be used, what the relevant file paths are, and any necessary parameters. +The main input into either `llsm-pipeline` or `mosaic-pipeline` is a configuration JSON file. This configuration file is a structured way of informing the pipeline of which modules will be used, what the relevant file paths are, and any necessary parameters. 
+An example `config.json` file is provided in the `example` directory of the [GitHub Repository](https://github.com/aicjanelia/LLSM) and further details about its organization are available in the [Configuration File](https://aicjanelia.github.io/LLSM/pipeline/config.html) documentation.
-## Usage
+
+### Use a dry run before submitting jobs
+We recommend using the optional `--dry-run` (or `-d`) argument before submitting jobs to the cluster. When this optional argument is passed to the pipeline command, the script will attempt to process the files without actually submitting any jobs to the cluster. The command line will display information about the requested processing that can be used to confirm that processing will proceed as desired.
+
+The following dry-run output confirms the path of the files to be processed, that the files will be saved with the bdv naming format (see [File Organization](https://aicjanelia.github.io/LLSM/pipeline/bdv_save.html)), and that there is one combination of cameras and channels, which corresponds to the 488 nm laser.
+
+```
+processing 'full/path/to/folder'
+parsing 'Scan_Settings.txt'...
+saving in bdv naming format...
+scan type is tile
+CamA_ch0=488
+Done
+```
+
+For most cases, the `llsm-pipeline` / `mosaic-pipeline` command should be all that is needed. However, the individual modules can be run directly on a single file if needed. See the sections for each module for information on running the modules separately from the pipeline.
+
+# Usage
+
+### Example command
+```
+llsm-pipeline -d /nrs/aic/instruments/llsm/pipeline-test/config.json
+```
+
+### LLSM Options
 ```
 usage: llsm-pipeline [-h] [--dry-run] [--verbose] input
 
 Batch processing script for LLSM images.
 
 positional arguments:
   input          path to configuration JSON file
 
 optional arguments:
   -h, --help     show this help message and exit
   --dry-run, -d  execute without submitting any bsub jobs
   --verbose, -v  print details (including commands to bsub)
 ```
 
-The pipeline is initiated by calling the `llsm-pipeline` command and providing it with a properly formatted `config.json` file. See the `example` directory of this repo for an example `config.json`.
We recommend using the `--dry-run` command to check your pipeline run before submitting jobs to the cluster. +### MOSAIC Options -For most cases, the `llsm-pipeline` command should be all that is needed. However, the individual modules can be run directly on a single file if needed. See the sections before for information on running the modules separate from the pipeline. \ No newline at end of file +``` +usage: mosaic-pipeline [-h] [--dry-run] [--verbose] input + +Batch processing script for MOSAIC images. + +positional arguments: + input path to configuration JSON file + +optional arguments: + -h, --help show this help message and exit + --dry-run, -d execute without submitting any bsub jobs + --verbose, -v print details (including commands to bsub) +``` \ No newline at end of file