An ultrafast and highly sensitive Next-Generation Sequencing (NGS) read mapper and methylation extractor.

grev-uv/hpg-methyl

HPG-Methyl

To use the tool right away, download the prebuilt executable for Linux x86_64 systems, available as a compressed file on the releases page.

HPG-Methyl is an ultrafast and highly sensitive Next-Generation Sequencing (NGS) read mapper and methylation context extractor. Compared with other current mapping and methylation extraction tools, HPG-Methyl offers better sensitivity and shorter execution times even for long reads.

Since the files generated by HPG-Methyl are fully compatible with the files generated by other popular tools, it can be used as a drop-in replacement to accelerate existing methylation analysis pipelines.

This read-me file contains a short guide to get started quickly with HPG-Methyl. Check the manual pages for more information about building, debugging, extending and using the software.

Building

Note: on Ubuntu 18.04 or 20.04, read the Issues section before compiling.

If you are interested in modifying, extending or debugging the software, the following instructions show how to build HPG-Methyl. Building requires a working installation of GCC 4.9.2 or later and the following packages:

| Library         | Ubuntu / Debian     | Red Hat / Fedora / CentOS |
|-----------------|---------------------|---------------------------|
| GNU C Toolchain | build-essential     | make / automake / gcc     |
| SCons           | scons               | scons                     |
| ZLib            | zlib1g-dev          | zlib-devel                |
| Curl            | libcurl4-gnutls-dev | libcurl-devel             |
| libxml          | libxml2-dev         | libxml2-devel             |
| ncurses         | libncurses5-dev     | ncurses-devel             |
| GNU GSL         | libgsl0-dev         | gsl-devel                 |
| check           | check               | check-devel               |
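As a rough sketch of how the prerequisites could be checked on a Debian-based system, the snippet below compares the installed GCC version against the 4.9.2 minimum and prints an install command built from the Ubuntu / Debian column. The helper name `version_ge` and the variable names are illustrative, the comparison assumes GNU `sort -V`, and package names may differ on newer releases.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Package names copied from the Ubuntu / Debian column of the table.
DEB_PACKAGES="build-essential scons zlib1g-dev libcurl4-gnutls-dev libxml2-dev libncurses5-dev libgsl0-dev check"

# Report whether the installed GCC meets the 4.9.2 minimum.
if command -v gcc >/dev/null 2>&1; then
  GCC_VERSION="$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)"
  if version_ge "$GCC_VERSION" "4.9.2"; then
    echo "GCC $GCC_VERSION is recent enough"
  else
    echo "GCC $GCC_VERSION is older than 4.9.2"
  fi
else
  echo "gcc not found"
fi

# Print the install command for review rather than running it directly.
echo "sudo apt-get install $DEB_PACKAGES"
```

The install command is only printed so that it can be reviewed and adjusted before it is run with administrator rights.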

When all the packages are installed, build a release executable with:

$ scons

Or a debug executable with:

$ scons debug=1

The binary will be created in the /bin directory.
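If both build types are used regularly, a small wrapper can keep the two invocations in one place. This is only a sketch: `build_cmd` is a hypothetical helper, and the `-j` option is the standard SCons flag for parallel jobs, which is assumed (not confirmed) to work with this project's build script.

```shell
# Print the SCons command for a build mode ("release" or "debug").
# An optional second argument sets the number of parallel jobs (SCons -j).
build_cmd() {
  mode="$1"
  jobs="${2:-1}"
  case "$mode" in
    release) echo "scons -j $jobs" ;;
    debug)   echo "scons -j $jobs debug=1" ;;
    *)       echo "unknown build mode: $mode" >&2; return 1 ;;
  esac
}

# Show the commands; run the printed line in the source tree to build.
build_cmd release 4
build_cmd debug
```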

Running

To run HPG-Methyl, the BWT index must be created first. This needs to be done only once per reference genome, using a FASTA reference genome file:

$ hpg-methyl build-index -g <your-fasta-file> -i <index-output-directory> -r 10 --bs-index
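Because the index only has to be built once, a script that runs repeatedly may want to skip this step when an index is already present. The sketch below treats a non-empty index directory as "already built", which is a heuristic assumed here rather than something HPG-Methyl checks itself; `needs_index` and the paths are placeholders.

```shell
# True when the directory is missing or empty, i.e. no index has been written yet.
needs_index() {
  [ ! -d "$1" ] || [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

# Placeholder paths; replace them with your own files.
GENOME="reference.fa"
INDEX_DIR="index"

# Print the build command only when the index appears to be missing.
if needs_index "$INDEX_DIR"; then
  echo "hpg-methyl build-index -g $GENOME -i $INDEX_DIR -r 10 --bs-index"
else
  echo "index found in $INDEX_DIR, skipping build"
fi
```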

When the BWT index building has finished, HPG-Methyl can be used to map reads from a FASTQ file:

$ hpg-methyl bs -i <index-directory> -f <fastq-file-path> -o <output-directory> --cpu-threads <thread-count>

Or to map the reads and extract the methylation context status simultaneously:

$ hpg-methyl bs -i <index-directory> -f <fastq-file-path> -o <output-directory> --cpu-threads <thread-count> --write-mcontext

If the mapping tool hmapper is going to be used afterwards, the --write-mcontext option is highly recommended.
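The two mapping invocations differ only in the final flag, so they can be composed from shared settings. The following is a minimal sketch using only the options shown above; `map_cmd` and all paths and values are placeholders chosen for illustration.

```shell
# Placeholder settings; adjust them to your own data and machine.
INDEX_DIR="index"
READS="reads.fastq"
OUT_DIR="out"
THREADS=4

# Compose the mapping command. Pass "mcontext" as the first argument
# to also write the methylation context while mapping.
map_cmd() {
  cmd="hpg-methyl bs -i $INDEX_DIR -f $READS -o $OUT_DIR --cpu-threads $THREADS"
  if [ "$1" = "mcontext" ]; then
    cmd="$cmd --write-mcontext"
  fi
  echo "$cmd"
}

# Print both variants for review.
map_cmd
map_cmd mcontext
```

Printing the command first makes it easy to confirm the paths before starting a long mapping run.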

Example datasets

To test the application, the following public datasets are available on GREV's external SFTP server.

The login details are:

  • Username: anonymous
  • Password: anonymous
  • Hostname: clariano.uv.es

Reference genome

  • Homo sapiens GRCh37 reference genome: sftp://anonymous@clariano.uv.es/datasets/Homo_sapiens.GRCh37.68.dna.fa

Real bisulphite-treated sequences

  • SRR309230 (75 nt, 15 million samples): sftp://anonymous@clariano.uv.es/datasets/real/SRR309230_1_075nt_15M.fastq
  • SRR837425 (100 nt, 15 million samples): sftp://anonymous@clariano.uv.es/datasets/real/SRR837425_1_100nt_15M.fastq

Synthetic bisulphite-treated sequences

  • 100 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_100nt_n3_r010.bwa.read1.fastq_convert.fastq
  • 150 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_150nt_n3_r010.bwa.read1.fastq_convert.fastq
  • 400 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_400nt_n3_r010.bwa.read1.fastq_convert.fastq
  • 800 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_800nt_n3_r010.bwa.read1.fastq_convert.fastq
  • 1600 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_1600nt_n3_r010.bwa.read1.fastq_convert.fastq
  • 3200 nt, 4 million samples: sftp://anonymous@clariano.uv.es/datasets/synthethic/test_4M_3200nt_n3_r010.bwa.read1.fastq_convert.fastq
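As an illustration of how the login details and dataset paths fit together, the sketch below builds a dataset URL from the username and hostname listed above. `dataset_url` is a hypothetical helper; the commented download line assumes a curl build with SFTP support and that the server still accepts these credentials.

```shell
# Login details as listed in this section.
SFTP_USER="anonymous"
SFTP_HOST="clariano.uv.es"

# Build the full URL for a file; $1 is its path below /datasets/.
dataset_url() {
  echo "sftp://$SFTP_USER@$SFTP_HOST/datasets/$1"
}

# Print the URL of one of the real datasets.
dataset_url "real/SRR309230_1_075nt_15M.fastq"

# Possible download command (not run here):
#   curl -u "$SFTP_USER:anonymous" -O "$(dataset_url real/SRR309230_1_075nt_15M.fastq)"
```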

License

HPG-Methyl is free software and licensed under the GNU General Public License version 2. Check the COPYING file for more information.

Issues

First, check the distribution and release of the operating system:

$ lsb_release -a
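The same check can be scripted with the short-output flags of `lsb_release` (`-is` for the distributor ID, `-rs` for the release number). This is a sketch only: `needs_source_changes` is a hypothetical helper that covers just the two Ubuntu releases named in this README and says nothing about other releases.

```shell
# Succeeds only for the releases that this README lists as needing source changes.
needs_source_changes() {
  [ "$1" = "Ubuntu" ] && { [ "$2" = "18.04" ] || [ "$2" = "20.04" ]; }
}

# Query the running system when lsb_release is available.
if command -v lsb_release >/dev/null 2>&1; then
  DISTRO="$(lsb_release -is)"
  RELEASE="$(lsb_release -rs)"
  if needs_source_changes "$DISTRO" "$RELEASE"; then
    echo "$DISTRO $RELEASE: apply the source changes before building"
  else
    echo "$DISTRO $RELEASE: not one of the listed releases"
  fi
fi
```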

If the OS is Ubuntu 18.04 or 20.04, some changes are needed to compile properly:

In the file lib/third_party/config/libconfig.c, line 37 must be commented out:

37 //#include <xlocale.h>

In the file lib/c/src/aligners/bwt/BW_preprocess.h, lines 45, 46, 49 and 50 must be commented out:

45 //#ifdef __GNUC__
46 //#ifdef __NO_INLINE__
 ...
49 //#endif
50 //#endif

And in the file src/pair_server.h, lines 60, 61, 68 and 69 must be commented out:

60 //#ifdef __GNUC__
61 //#ifdef __NO_INLINE__
 ...
68 //#endif
69 //#endif
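These edits can also be applied from the command line. The sketch below uses GNU `sed -i` to prefix the listed lines with `//`, skipping lines that are already commented; `comment_out` is a hypothetical helper. The line numbers are taken from this section and are only valid for the source revision it describes, so inspect the files before running it.

```shell
# Usage: comment_out FILE LINE [LINE...]
# Prefixes each given line with "//" unless it already starts with "//".
# Assumes GNU sed (in-place editing with -i).
comment_out() {
  file="$1"
  shift
  for n in "$@"; do
    sed -i "${n}{/^\/\//!s|^|//|;}" "$file"
  done
}

# Apply the changes only when run from the root of the source tree.
if [ -f lib/third_party/config/libconfig.c ]; then
  comment_out lib/third_party/config/libconfig.c 37
  comment_out lib/c/src/aligners/bwt/BW_preprocess.h 45 46 49 50
  comment_out src/pair_server.h 60 61 68 69
fi
```

Because already-commented lines are skipped, running the snippet a second time leaves the files unchanged.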

Finally, compile by following the steps shown in the Building section.

If you find any other bugs or issues, want a specific feature added, or need help, feel free to open an issue or extend an existing one. Pull requests are welcome.

Contact

Contact any of the following developers for any enquiry: