HAWK_Q

Hitting associations with k-mers for quantitative phenotypes

HAWK_Q is an extension of HAWK.

Installation

The work is ongoing. This will be updated when the work is finished.

Prerequisites

JELLYFISH (modified version available in supplements)

EIGENSTRAT (modified version available in supplements)

R (with foreach and doParallel packages)

ABYSS

GNU sort with parallel support

Perl

Counting k-mers

The first step in the pipeline is to count the k-mers in each sample, find the total number of k-mers per sample, discard k-mers that appear only once in a sample, and sort the k-mers. The resulting k-mer file contains one line per k-mer present; each line holds an integer representing the k-mer and its count, separated by a space. The integer representation uses 0 for 'A', 1 for 'C', 2 for 'G' and 3 for 'T'.
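The integer representation described above can be illustrated with a minimal Python sketch. The base codes (A=0, C=1, G=2, T=3) come from this README; the digit order (base 4, with the leftmost base as the most significant digit) is an assumption and may differ from what the modified JELLYFISH actually writes. The function names `encode_kmer`, `decode_kmer` and `parse_count_line` are hypothetical helpers, not part of HAWK.

```python
# Base codes as stated in the README.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode_kmer(kmer):
    """Encode a k-mer as a base-4 integer.

    ASSUMPTION: the leftmost base is the most significant digit.
    """
    value = 0
    for base in kmer.upper():
        value = value * 4 + CODE[base]
    return value


def decode_kmer(value, k):
    """Invert encode_kmer for a k-mer of known length k."""
    bases = []
    for _ in range(k):
        bases.append(BASES[value % 4])
        value //= 4
    return "".join(reversed(bases))


def parse_count_line(line):
    """Split one line of a k-mer count file into (kmer_integer, count)."""
    kmer_int, count = line.split()
    return int(kmer_int), int(count)
```

The length k has to be passed to `decode_kmer` because leading 'A' bases are encoded as leading zeros and are not recoverable from the integer alone.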

k-mer counting can be done with the modified version of JELLYFISH provided in the 'supplements' folder with HAWK. All of the steps above can be performed by installing this version of JELLYFISH and then running the script 'q_countKmers' in 'supplements', with the necessary modifications. Note that Perl must be installed.

The version provided assumes that the reads from each sample are in a separate directory and that the names of all directories containing reads start with the prefix Reads. For example, reads from sample 1, sample 2, etc. could be in directories named Reads_sample1, Reads_sample2, etc. It also assumes that the read files are gzipped and have the extension fastq.gz. If the read files are not gzipped, change zcat *.fastq.gz to cat *.fastq in line 21 of countKmers.
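Before running the counting script, it can help to check that the read files follow the layout described above. The following is a minimal sketch, not part of HAWK: `group_reads_by_sample` is a hypothetical helper that keeps only paths whose parent directory name starts with Reads and whose file name ends in .fastq.gz.

```python
import os
from collections import defaultdict


def group_reads_by_sample(paths):
    """Group read files by parent directory, keeping only the assumed layout.

    A path is kept when its parent directory name starts with 'Reads'
    and the file name ends in '.fastq.gz'.
    """
    samples = defaultdict(list)
    for path in paths:
        directory = os.path.basename(os.path.dirname(path))
        if directory.startswith("Reads") and path.endswith(".fastq.gz"):
            samples[directory].append(path)
    # Return a plain dict with directories and files in sorted order.
    return {name: sorted(files) for name, files in sorted(samples.items())}
```

On a real data folder the function could be fed from the standard library, for example `group_reads_by_sample(glob.glob("Reads*/*"))` after `import glob`; any sample directory missing from the result has no gzipped FASTQ files.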

This will write the names of the sorted k-mer count files to 'sorted_files.txt' and the total k-mer count of each sample to 'total_kmer_counts.txt'.

Running HAWK

[This part will undergo change]

Copy 'sorted_files.txt' and 'total_kmer_counts.txt' for the samples into a folder. In the same folder, create a file named 'gwas_info.txt' with one line per sample and three tab-separated columns: the sample ID, the sex of the sample (M, F or U for male, female or unknown) and the Case/Control status of the sample. For example:

SRR3050845	U	Control
SRR3050846	U	Case
SRR3050847	U	Control
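A malformed 'gwas_info.txt' is easy to produce by hand, for instance with spaces instead of tabs. The sketch below checks the format as it is described in this README (three tab-separated columns, M/F/U, Case/Control). It is not part of HAWK, `parse_gwas_info` is a hypothetical helper, and the accepted values are an assumption that may change along with this section.

```python
VALID_SEX = {"M", "F", "U"}
VALID_STATUS = {"Case", "Control"}


def parse_gwas_info(text):
    """Parse the contents of gwas_info.txt into (sample_id, sex, status) tuples.

    Raises ValueError on the first line that does not match the
    three-column, tab-separated format described in the README.
    """
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 3:
            raise ValueError(
                f"line {number}: expected 3 tab-separated columns, got {len(fields)}"
            )
        sample_id, sex, status = fields
        if sex not in VALID_SEX:
            raise ValueError(f"line {number}: sex must be M, F or U, got {sex!r}")
        if status not in VALID_STATUS:
            raise ValueError(
                f"line {number}: status must be Case or Control, got {status!r}"
            )
        records.append((sample_id, sex, status))
    return records
```

A typical use would be `parse_gwas_info(open("gwas_info.txt").read())` before starting the run, so that formatting problems surface immediately rather than partway through the pipeline.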

Copy the scripts 'runHawk' and 'runAbyss' into the folder and run

./runHawk

The k-mers significantly associated with cases and controls will be in 'case_kmers.fasta' and 'control_kmers.fasta' respectively. These can then be assembled by running

./runAbyss

The assembled sequences will be in 'case_abyss.25_49.fasta' and 'control_abyss.25_49.fasta' respectively.
