Rawasm is the first software tool that can construct genome assemblies from raw nanopore signals. It reuses most of miniasm's features and adds support for the FAST5, POD5 and SLOW5 formats. Rawasm can be pipelined with RawHash2, using the Rawsamble overlapping feature.
To install Rawasm, do the following:
- Clone the repository
git clone https://github.com/CMU-SAFARI/rawasm.git rawasm
- Install Rawasm
cd rawasm && make
The Rawasm Makefile downloads a fresh copy of miniasm and patches it. The default installation directory is rawasm/miniasm, which you can change in the Makefile. If you already have a local copy of miniasm, you can specify its location in the Makefile. In that case, to skip the miniasm download and installation, use:
make install
Rawasm is a self-contained implementation that can be downloaded and run as-is. It ships with a set of pre-compiled static libraries, but you may need to recompile them depending on your system. In that case, you have two options:
- Refer to the RawHash2 repo for instructions on compiling the SLOW5, POD5 and FAST5 libraries.
- Compile the libraries with gcc and ar from their original repos:
  - libhdf5.a: compile from the HDF Group repo.
  - libpod5_format.a, libarrow.a, libjemalloc_pic.a, libzstd.a: compile from the POD5 format repo.
  - libslow5.a: compile from the Slow5 Tools repo.
Finally, Rawasm requires libuuid. To build it as a static library, do the following:
- Download libuuid
git clone https://github.com/cloudbase/libuuid.git libuuid
- Build the .o files (run the following for each .c file, or write a Makefile/script to automate it)
gcc -c -g -Wall -O2 -Wno-all -Wno-write-strings -Wno-deprecated-declarations -Wcpp -I. file.c -o file.o
- Generate the library
ar rcs libuuid.a *.o
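The two libuuid build steps above can be automated with a small shell function. This is only a sketch: build_libuuid, and the CC and AR override variables, are illustrative names that are not part of Rawasm or libuuid. The compiler flags are copied from the command above, and the function assumes all .c files sit directly in the directory you pass in.

```sh
#!/bin/sh
# Sketch: compile every .c file in a directory and archive the objects.
# build_libuuid, CC and AR are illustrative names, not part of Rawasm.
build_libuuid() {
    src_dir=${1:-.}
    found=0
    for src in "$src_dir"/*.c; do
        [ -e "$src" ] || continue          # the glob matched nothing
        found=1
        "${CC:-gcc}" -c -g -Wall -O2 -Wno-all -Wno-write-strings \
            -Wno-deprecated-declarations -Wcpp -I"$src_dir" \
            "$src" -o "${src%.c}.o" || return 1
    done
    if [ "$found" -ne 1 ]; then
        echo "no .c files found in $src_dir" >&2
        return 1
    fi
    # Bundle all object files into the static library.
    "${AR:-ar}" rcs "$src_dir/libuuid.a" "$src_dir"/*.o
}
```

For example, running build_libuuid libuuid from the directory that contains the clone should leave libuuid/libuuid.a behind if every file compiles.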
All static libraries must be in the lib directory before running the make or make install command.
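Before running make, it may help to confirm that the libraries are actually in place. The sketch below lists the ones that are missing. The helper name missing_libs is hypothetical, and the list of seven files is an assumption based only on the libraries named in this README; your build may need a different set.

```sh
#!/bin/sh
# Sketch: print the static libraries named in this README that are
# absent from a directory (default: lib). missing_libs is an
# illustrative helper name, not a Rawasm command.
missing_libs() {
    lib_dir=${1:-lib}
    for lib in libhdf5.a libpod5_format.a libarrow.a libjemalloc_pic.a \
               libzstd.a libslow5.a libuuid.a; do
        [ -f "$lib_dir/$lib" ] || echo "$lib"
    done
}

# Possible use, from the rawasm directory:
#   missing=$(missing_libs lib)
#   if [ -z "$missing" ]; then make; else echo "missing: $missing"; fi
```

An empty result means every library in the list was found.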
By default, Rawasm supports all miniasm features. In addition, it introduces two new features:
- Processing single or multiple FAST5, POD5, S/BLOW5 input files
- Writing the assembly as FAST5, POD5, S/BLOW5 unitig files. For more information about unitigs, see miniasm.
Using Rawasm is straightforward.
./miniasm -f input_data[.fast5/pod5/slow5/blow5] overlaps.paf -H outdir > assembly.gfa
input_data can be either a single file (fast5, pod5, slow5, blow5) or a directory containing multiple files. If it is a directory, do not mix files of different types. overlaps.paf is the all-vs-all overlaps file produced by RawHash2. outdir is the output directory for the unitig files. Rawasm creates one unitig file for each distinct unitig in the assembly, in the same format as the input. assembly.gfa is the text output of the assembly.
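Since a directory input must not mix file types, a quick check before launching the assembly can catch mistakes early. The following is a minimal sketch, not part of Rawasm: input_format is a hypothetical helper, and it assumes that each of the four extensions counts as its own type (so slow5 and blow5 files in one directory are reported as mixed).

```sh
#!/bin/sh
# Sketch: print the single signal-file extension found in a directory,
# or fail if the directory holds no signal files or mixes types.
# input_format is an illustrative helper, not a Rawasm command.
input_format() {
    dir=$1
    seen=""
    for ext in fast5 pod5 slow5 blow5; do
        for f in "$dir"/*."$ext"; do
            [ -e "$f" ] || continue        # the glob matched nothing
            seen="$seen $ext"
            break                          # one match per type is enough
        done
    done
    set -- $seen                           # split the extensions into $1, $2, ...
    if [ $# -ne 1 ]; then
        echo "expected exactly one file type, found:${seen:- none}" >&2
        return 1
    fi
    echo "$1"
}
```

It could be combined with the command above, for instance as input_format reads_dir && ./miniasm -f reads_dir overlaps.paf -H outdir > assembly.gfa, where reads_dir is a placeholder for your own input directory.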
If you use Rawasm in your work, please consider citing the following papers:
TBD