Commit

Update README and CSV schema validation
dialvarezs committed Sep 5, 2024
1 parent 2261c5d commit b517f60
Showing 3 changed files with 20 additions and 19 deletions.
32 changes: 16 additions & 16 deletions README.md
@@ -15,35 +15,35 @@ Uses Dorado for basecalling and demultiplexing.
  git clone https://github.com/catg-umag/ont-basecalling-demultiplexing
  ```
  2. Demultiplexing setup (optional):
- - If demultiplexing is needed, create a samples.csv file containing at least the barcode and sample columns.
+ - If demultiplexing is needed, create a `samples.csv` file containing at least the `barcode` and `sample` columns.
  - Ensure the barcode column includes the barcode identifier (e.g., barcode01), and the sample column lists the sample name, which will be used in reports and as the FASTQ filename.
  3. Configure parameters:
  - Copy the example parameters file:
  ```bash
  cp params.example.yml my_params.yml
  ```
- - Modify my_params.yml according to your needs. Ensure that the sample_data parameter points to your samples.csv file if you are demultiplexing.
+ - Modify my_params.yml according to your needs. Ensure that the `sample_data` parameter points to your `samples.csv` file if you are demultiplexing.
  4. Run the pipeline:
  ```bash
  nextflow run ont-basecalling-demultiplexing/ -profile apptainer -params-file my_params.yml
  ```

  ## Pipeline Parameters

- | Parameter | Required | Default | Description |
- | -------------------------- | -------- | ---------------------------------- | ----------------------------------------------------------------------------------------------- |
- | `experiment_name` | No | - | Name of the experiment, used for final reports (title and filename). |
- | `data_dir` | Yes | - | Path to the directory containing POD5 files. |
- | `sample_data` | No | - | Path to the CSV file containing the sample data (if not provided, will not perform demux). |
- | `output_dir` | No | `results` | Directory for saving results. |
- | `fastq_output` | No | `true` | Generates FASTQ files if `true`; otherwise, generates UBAM files. |
- | `qscore_filter` | No | `10` | Minimum QScore threshold for "pass" data, used in demultiplexing. |
- | `dorado_basecalling_model` | No | `sup` | Model used for basecalling. Check Dorado help for available options. |
- | `dorado_basecalling_gpus` | No | `1` | Number of GPUs to allocate for basecalling. |
- | `dorado_demux_kit` | No | `EXP-NBD196` | Kit identifier used for demultiplexing. |
- | `dorado_demux_both_ends` | No | `false` | Demultiplexes using barcodes on both ends (5' and 3') if `true`. |
- | `use_dorado_container` | No | `true` | Uses Dorado via container if `true`; expects a local installation if `false`. |
- | `qc_tools` | No | `['fastqc', 'nanoq', 'toulligqc']` | Specifies which QC tools to run. Options: 'nanoq', 'nanoplot', 'fastqc', 'toulligqc', 'pycoqc'. |
+ | Parameter | Required | Default | Description |
+ | -------------------------- | -------- | ---------------------------------- | --------------------------------------------------------------------------------------------------- |
+ | `experiment_name` | No | - | Name of the experiment, used for reports (title and filename). |
+ | `data_dir` | Yes | - | Path to the directory containing POD5 files. |
+ | `sample_data` | No | - | Path to the CSV file containing the sample data (if not provided, will not perform demultiplexing). |
+ | `output_dir` | No | `results` | Directory for saving results. |
+ | `fastq_output` | No | `true` | Generates FASTQ files if `true`; otherwise, generates UBAM files. |
+ | `qscore_filter` | No | `10` | Minimum QScore threshold for "pass" data, used in demultiplexing. |
+ | `dorado_basecalling_model` | No | `sup` | Model used for basecalling. Check Dorado help for available options. |
+ | `dorado_basecalling_gpus` | No | `1` | Number of GPUs to allocate for basecalling. |
+ | `dorado_demux_kit` | No | `EXP-NBD196` | Kit identifier used for demultiplexing. |
+ | `dorado_demux_both_ends` | No | `false` | Demultiplexes using barcodes on both ends (5' and 3') if `true`. |
+ | `use_dorado_container` | No | `true` | Uses Dorado via container if `true`; expects a local installation if `false`. |
+ | `qc_tools` | No | `['fastqc', 'nanoq', 'toulligqc']` | Specifies which QC tools to run. Options: 'nanoq', 'nanoplot', 'fastqc', 'toulligqc', 'pycoqc'. |

  ## Considerations

3 changes: 2 additions & 1 deletion assets/samples_data_schema.json
@@ -6,6 +6,7 @@
  "properties": {
  "barcode": { "type": "string" },
  "sample": { "type": "string" }
- }
+ },
+ "required": ["barcode", "sample"]
  }
  }
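
Review note: the hunk above closes the `properties` object with a comma and adds `"required": ["barcode", "sample"]`, so a sample sheet that lacks either column should now be rejected. The following is a minimal Python sketch of that rule using only the standard library. It is not the pipeline's own validation code; the function name `find_sample_sheet_errors` is hypothetical, and treating an empty cell as an error is an assumption that goes beyond what the schema hunk states.

```python
import csv
import io

# Columns listed under "required" in the post-commit schema hunk.
REQUIRED_COLUMNS = ("barcode", "sample")


def find_sample_sheet_errors(csv_text):
    """Return a list of problems found; an empty list means the sheet passed.

    Hypothetical helper for illustration only, not part of the pipeline.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing required column(s): {', '.join(missing)}"]

    errors = []
    for row in reader:
        for col in REQUIRED_COLUMNS:
            value = row[col]  # None when the row has fewer fields than the header
            # Assumption: an empty cell counts as a missing value.
            if value is None or not value.strip():
                errors.append(f"line {reader.line_num}: empty value for '{col}'")
    return errors


good_sheet = "barcode,sample\nbarcode01,sampleA\nbarcode02,sampleB\n"
bad_sheet = "barcode,name\nbarcode01,sampleA\n"

print(find_sample_sheet_errors(good_sheet))
print(find_sample_sheet_errors(bad_sheet))
```

The first call reports no problems, while the second flags the missing `sample` column, which is the case the added `required` keyword is meant to catch.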
4 changes: 2 additions & 2 deletions nextflow_schema.json
@@ -7,7 +7,7 @@
  "properties": {
  "experiment_name": {
  "type": "string",
- "description": "Name of the experiment, used for final reports (title and filename)."
+ "description": "Name of the experiment, used for reports (title and filename)."
  },
  "data_dir": {
  "type": "string",
@@ -20,7 +20,7 @@
  "format": "file-path",
  "schema": "/assets/samples_data_schema.json",
  "mimetype": "text/csv",
- "description": "Path to the CSV file containing the sample data (if not provided, will not perform demux)."
+ "description": "Path to the CSV file containing the sample data (if not provided, will not perform demultiplexing)."
  },
  "output_dir": {
  "type": "string",
