This program processes an input file containing read length distribution data and prepares it for use with n50_simreads. It generates a string representation of the read length distribution and executes n50_simreads with the prepared data.
- Processes input files with read length distribution data from n50_binner
- Runs n50_simreads to generate reads based on the input data
n50_prepare -i INPUTFILE -o OUTDIR [-f FORMAT] [-s PATH] [-v]
-i INPUTFILE
: Path to the input file (required)
-o OUTDIR
: Output directory for n50_simreads results (required)
-f FORMAT
: Output format (optional; FASTQ by default, FASTA also supported)
-s PATH
: Path to the n50_simreads executable (optional)
-v
: Verbose mode; prints additional information
-h
: Display the help message
The input file should be a CSV file with the following format:
length,count
100,1000
200,500
300,250
...
The first line (header) is skipped during processing.
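
As an illustration of the format above, the following is a minimal sketch of how such a file could be read in C: it skips the header, warns about and skips rows that are not two integers, and accumulates the total number of reads and the maximum read length. The type `dist_summary` and the function `summarize_distribution` are hypothetical names chosen for this example and are not taken from n50_prepare.c, whose actual parsing may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical summary type for this sketch (not from n50_prepare.c). */
typedef struct {
    unsigned long long total_reads; /* sum of the count column */
    unsigned long max_length;       /* largest value in the length column */
    unsigned long skipped;          /* rows rejected as invalid */
} dist_summary;

/* Read "length,count" rows from `in`, ignoring the first (header) line. */
static dist_summary summarize_distribution(FILE *in)
{
    dist_summary s = {0, 0, 0};
    char line[256];

    assert(in != NULL);

    /* The first line is the header and is not parsed. */
    if (fgets(line, sizeof line, in) == NULL)
        return s;

    while (fgets(line, sizeof line, in) != NULL) {
        unsigned long length, count;

        /* A valid row has exactly two unsigned integers separated by a comma. */
        if (sscanf(line, "%lu,%lu", &length, &count) != 2) {
            fprintf(stderr, "Warning: skipping invalid line: %s", line);
            s.skipped++;
            continue;
        }
        s.total_reads += count;
        if (length > s.max_length)
            s.max_length = length;
    }
    return s;
}
```

For the three sample rows shown above (100,1000; 200,500; 300,250), this sketch would report 1750 total reads and a maximum read length of 300.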
To compile the program, use a C compiler such as gcc:
gcc -o n50_prepare n50_prepare.c
This will create an executable named n50_prepare.
- Basic usage:
./n50_prepare -i input_distribution.csv -o output_directory
- Specifying FASTA output format:
./n50_prepare -i input_distribution.csv -o output_directory -f FASTA
- Using a custom path for n50_simreads:
./n50_prepare -i input_distribution.csv -o output_directory -s /path/to/n50_simreads
- Running in verbose mode:
./n50_prepare -i input_distribution.csv -o output_directory -v
The n50_simreads output will be saved in the specified output directory.
The program calculates and displays:
- Total number of reads
- Maximum read length

n50_prepare requires n50_simreads to be available.

- The program assumes that n50_simreads is in the same directory as n50_prepare; if it is elsewhere, supply its path with the -s option.
- Make sure you have the necessary permissions to execute n50_simreads and write to the output directory.
- Invalid data in the input file will be skipped with a warning message.
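
One way a program could locate a companion executable in its own directory is to reuse the directory part of `argv[0]`. The sketch below illustrates that idea only; `sibling_simreads_path` is a hypothetical helper, and n50_prepare may resolve the default path differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from n50_prepare.c): write the path of an
 * n50_simreads binary located next to argv0 into `out`.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int sibling_simreads_path(const char *argv0, char *out, size_t out_size)
{
    const char *slash;
    int written;

    assert(argv0 != NULL && out != NULL);

    slash = strrchr(argv0, '/');
    if (slash == NULL) {
        /* Simplification: with no directory part, assume the current directory. */
        written = snprintf(out, out_size, "./n50_simreads");
    } else {
        /* Keep everything up to and including the last '/'. */
        written = snprintf(out, out_size, "%.*s%s",
                           (int)(slash - argv0 + 1), argv0, "n50_simreads");
    }
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

With `argv0` set to `/opt/tools/n50_prepare`, the helper produces `/opt/tools/n50_simreads`. A path given with -s would simply take precedence over this default.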
This program is provided under the MIT License. See the source code for full license text.
Andrea Telatin, 2023 Quadram Institute Bioscience