ExpansionHunter v5.0.0 introduces substantial changes to accelerate analysis of large genome-wide STR catalogs, in addition to build system improvements, bug fixes and catalog updates.
Large Catalog Support
- ExpansionHunter genotyping can now be accelerated across multiple threads by using the new `--threads` option.
- To further support large catalog analysis, streaming mode memory requirements have been decreased by a factor of 20.
- Together with additional runtime optimizations, these changes allow a catalog of 240,441 STRs to be genotyped on a 35x human sample in under 31 minutes on 16 threads, using 25 GB of memory.
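As a minimal sketch, the Python helper below assembles an ExpansionHunter command line that enables multithreaded genotyping. Only `--threads` is introduced in these notes; the other flags (`--reads`, `--reference`, `--variant-catalog`, `--output-prefix`, `--analysis-mode`) are assumed from the ExpansionHunter usage documentation, and the helper name and file names are hypothetical. Confirm the exact options against the help output of your build.

```python
import shlex
from typing import List


def build_expansionhunter_command(
    reads: str,
    reference: str,
    catalog: str,
    output_prefix: str,
    threads: int = 16,
    analysis_mode: str = "streaming",
    binary: str = "ExpansionHunter",
) -> List[str]:
    """Assemble an ExpansionHunter command line as an argv list (hypothetical helper).

    Assumption: every flag except --threads is taken from the ExpansionHunter
    usage documentation, not from these release notes.
    """
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    if analysis_mode not in ("streaming", "seeking"):
        raise ValueError("analysis_mode must be 'streaming' or 'seeking'")
    return [
        binary,
        "--reads", reads,
        "--reference", reference,
        "--variant-catalog", catalog,
        "--output-prefix", output_prefix,
        "--analysis-mode", analysis_mode,
        "--threads", str(threads),  # new in v5.0.0: number of genotyping threads
    ]


if __name__ == "__main__":
    # Hypothetical input files, for illustration only.
    cmd = build_expansionhunter_command(
        reads="sample.cram",
        reference="genome.fa",
        catalog="variant_catalog.json",
        output_prefix="sample_str",
    )
    # Print the shell-quoted command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning an argv list rather than a single string lets the command be passed to `subprocess.run(cmd, check=True)` without any shell quoting issues.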
Comparison of v5.0.0 to v4.0.2 for a catalog of 37,413 STRs on a 35x human sample:
Version | Analysis Mode | Threads | Wall Time (mm:ss) | Peak RSS (GB) |
---|---|---|---|---|
v4.0.2 | streaming | 1 | 93:10 | 85.6 |
v5.0.0 | streaming | 16 | 4:35 | 3.5 |
v4.0.2 | seeking | 1 | 272:23 | 0.5 |
v5.0.0 | seeking | 16 | 19:00 | 1.1 |
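The relative gains in the table can be worked out directly from its values. The short Python sketch below converts the `mm:ss` wall times to seconds and computes per-mode speedups and peak RSS ratios; the function and variable names are illustrative only. Note that each row pair compares v4.0.2 on 1 thread with v5.0.0 on 16 threads, so the speedup reflects both multithreading and the other optimizations together.

```python
from typing import Dict, Tuple


def to_seconds(mm_ss: str) -> int:
    """Convert a 'mm:ss' wall-time string (minutes may exceed 59) to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


# (wall time, peak RSS in GB), copied from the comparison table.
RESULTS: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("v4.0.2", "streaming"): ("93:10", 85.6),
    ("v5.0.0", "streaming"): ("4:35", 3.5),
    ("v4.0.2", "seeking"): ("272:23", 0.5),
    ("v5.0.0", "seeking"): ("19:00", 1.1),
}


def compare(mode: str) -> Tuple[float, float]:
    """Return (wall-time speedup, v4.0.2/v5.0.0 peak RSS ratio) for one analysis mode."""
    old_time, old_rss = RESULTS[("v4.0.2", mode)]
    new_time, new_rss = RESULTS[("v5.0.0", mode)]
    speedup = to_seconds(old_time) / to_seconds(new_time)
    rss_ratio = old_rss / new_rss  # values below 1 mean v5.0.0 uses more memory
    return speedup, rss_ratio


if __name__ == "__main__":
    for mode in ("streaming", "seeking"):
        speedup, rss_ratio = compare(mode)
        print(f"{mode}: {speedup:.1f}x faster, peak RSS ratio (v4/v5) {rss_ratio:.2f}")
```

For streaming mode this gives roughly a 20x wall-time speedup and about 24x lower peak memory; for seeking mode, about a 14x speedup, while peak memory roughly doubles (0.5 GB to 1.1 GB).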
Additional Updates
- Added definitions for GIPC1 repeats and sorted all catalogs
- Updated depth calculation to include singleton reads
- BAM/CRAM input to streaming mode is no longer required to be sorted or indexed
- Bug fixes
- Reorganized source code and build system
- The build system has been redesigned to download most third-party libraries. An active internet connection is now required to build from source.
Contributors: @ctsa, @yjqiu, @egor-dolzhenko
ExpansionHunter-v5.0.0-linux_x86_64.tar.gz and ExpansionHunter-v5.0.0-macOS.tar.gz are binary distributions for 64-bit Linux and macOS respectively.