Whole-genome re-sequencing to examine genetic changes in a population of Ithaca, NY honeybees using samples collected in 1977 and 2011

This repo contains the part of the analysis that was performed on the cluster. Downstream analysis in R and plotting in Python have not yet been added.

Genomes were sequenced on an Illumina HiSeq, using genomic libraries prepared without PCR. In addition to the Ithaca samples, bees were included from populations in Arizona and Chiapas (Africanized) and from Hawaii, Korea and Japan (non-Africanized).

Some of the steps are parallelized on an SGE cluster.

Workflow

The first step was to align the reads to the reference using bowtie2, and then to realign reads around indels using GATK.

SNP calling

major_split.py

create a file of limits for GATK, corresponding to the 16 major chromosomes
- this was piped to data/scaffolds_long.txt
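
A minimal sketch of how such a limits file could be built, not the actual major_split.py. It assumes the reference has a samtools-style .fai index (tab-separated name and length in the first two columns) and that scaffolds on the 16 major chromosomes are named like `Group1.1`; both the function name `fai_to_intervals` and the naming pattern are assumptions for illustration.

```python
import re


def fai_to_intervals(fai_text, name_pattern=r"^Group(\d+)\."):
    """Return GATK-style intervals (name:1-length) for scaffolds on chromosomes 1-16.

    fai_text is the content of a FASTA index: one line per sequence,
    tab-separated, with the name in column 1 and the length in column 2.
    name_pattern is an assumed naming scheme; its first group must capture
    the chromosome number.
    """
    pattern = re.compile(name_pattern)
    intervals = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        match = pattern.match(name)
        # Keep only scaffolds placed on one of the 16 major chromosomes.
        if match and 1 <= int(match.group(1)) <= 16:
            intervals.append(f"{name}:1-{int(length)}")
    return intervals


if __name__ == "__main__":
    example = "Group1.1\t1000\t10\t60\t61\nGroupUn.5\t500\t1100\t60\t61\n"
    print("\n".join(fai_to_intervals(example)))
```

Printing one interval per line gives output that can be redirected to a file such as data/scaffolds_long.txt.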

bqsr.sh

perform base quality recalibration using known SNP sites from NCBI and validated sites kindly provided by Greg Hunt

call.sh

starting with mapped fragments, call genotypes for all samples

vqsr.sh

perform variant quality score recalibration to filter low-quality SNPs

SNP frequency measurement using ANGSD

angsd.sh

compute minor allele frequencies for old and modern populations, and conduct likelihood ratio tests for significant changes
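
To illustrate the idea behind the likelihood ratio test, here is a simplified sketch that works on hard allele counts. ANGSD itself works from genotype likelihoods, so this is not what angsd.sh computes. The function names and the binomial model are assumptions made for illustration only.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood of k minor alleles out of n (constant term omitted)."""
    loglik = 0.0
    if k > 0:
        loglik += k * math.log(p)
    if n - k > 0:
        loglik += (n - k) * math.log(1.0 - p)
    return loglik


def allele_freq_lrt(k_old, n_old, k_new, n_new):
    """Test whether the old and modern samples share one allele frequency.

    Null: a single pooled frequency. Alternative: one frequency per sample.
    Returns the statistic 2 * (logL_alt - logL_null) and its p-value under
    a chi-square distribution with 1 degree of freedom.
    """
    p_old = k_old / n_old
    p_new = k_new / n_new
    p_pooled = (k_old + k_new) / (n_old + n_new)
    ll_alt = binom_loglik(k_old, n_old, p_old) + binom_loglik(k_new, n_new, p_new)
    ll_null = binom_loglik(k_old, n_old, p_pooled) + binom_loglik(k_new, n_new, p_pooled)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

With many sites tested genome-wide, the resulting p-values would still need correction for multiple testing.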

intersect_mafs.py

intersect minor allele frequency files for old and modern populations
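
The intersection step can be sketched as follows. This is not the actual intersect_mafs.py; it assumes tab-separated input with a header row containing `chromo` and `position` columns plus a frequency column (here `knownEM`, passed as a parameter), and the function names are hypothetical.

```python
import csv
import io


def read_mafs(text, freq_column="knownEM"):
    """Map (chromo, position) to the allele frequency in freq_column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        (row["chromo"], int(row["position"])): float(row[freq_column])
        for row in reader
    }


def intersect_mafs(old_text, new_text, freq_column="knownEM"):
    """Return (chromo, position, old_freq, new_freq) for sites present in both inputs."""
    old = read_mafs(old_text, freq_column)
    new = read_mafs(new_text, freq_column)
    shared = sorted(old.keys() & new.keys())
    return [(chrom, pos, old[(chrom, pos)], new[(chrom, pos)]) for chrom, pos in shared]
```

This sketch matches sites by position only. If the major and minor alleles were assigned separately in each population, they would also need to be checked before frequencies are compared.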

Imputation and association testing using BEAGLE

vcf2bgl.sh

convert GATK vcf to BEAGLE format

phase.sh

phase genotypes and impute missing values

assoc.sh

association testing on imputed haplotypes, looking for evidence of selection between old and modern populations
- This is a parallel analysis to likelihood ratio testing with ANGSD

c2h.py and c2h.sh

extract haplotypes from BEAGLE results

Differentiation between European and Africanized bees

ahb.sh

calculate Fst between populations with European and African ancestry using vcftools
- note: output files manually moved into the data directory
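
For readers unfamiliar with Fst, the sketch below computes it from allele frequencies in two populations. It uses Hudson's estimator, which was swapped in here because it is compact; it is not the Weir and Cockerham estimator that vcftools reports, so values will differ from the ahb.sh output. Function names and the input layout are assumptions.

```python
def hudson_fst_terms(p1, n1, p2, n2):
    """Numerator and denominator of Hudson's Fst for one biallelic site.

    p1, p2: allele frequencies in the two populations.
    n1, n2: number of sampled chromosomes in each population (each > 1).
    """
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1.0 - p1) / (n1 - 1)
        - p2 * (1.0 - p2) / (n2 - 1)
    )
    denominator = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return numerator, denominator


def genome_fst(sites):
    """Combine sites as a ratio of sums, skipping sites with no variation.

    sites: iterable of (p1, n1, p2, n2) tuples.
    """
    total_num = 0.0
    total_den = 0.0
    for p1, n1, p2, n2 in sites:
        num, den = hudson_fst_terms(p1, n1, p2, n2)
        if den > 0.0:
            total_num += num
            total_den += den
    if total_den == 0.0:
        raise ValueError("no variable sites")
    return total_num / total_den
```

Summing numerators and denominators before dividing avoids giving undue weight to sites with very low variation.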

angsd_ahb.sh

an attempt to compute Fst using ngsutils
- this approach has not worked, because the number of SNP calls differs between samples
- I have given up on this for now, focusing instead on the vcftools analysis

plotting differentiation between populations

angsd2bgl.sh

generate BEAGLE-formatted data from ngs count data

ngsAdmix.sh

use NGSadmix to infer ancestral population clusters

pca.sh

compute covariance matrix using posterior probabilities of genotypes computed by angsd.sh
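
One way to picture this step is to collapse each genotype posterior into an expected allele dosage and then compute a standardized covariance between individuals. The sketch below does that in plain Python. It is a simplification and an assumption about the general approach, not the method implemented in pca.sh, which may use the posteriors more fully.

```python
def covariance_matrix(posteriors):
    """Covariance between individuals from genotype posterior probabilities.

    posteriors[i][s] is a tuple (P(g=0), P(g=1), P(g=2)) for individual i
    at site s. Sites with an estimated frequency of 0 or 1 are skipped.
    """
    # Expected number of minor alleles per individual and site.
    dosages = [[p1 + 2.0 * p2 for (_p0, p1, p2) in ind] for ind in posteriors]
    n_ind = len(dosages)
    n_sites = len(dosages[0])
    standardized = [[] for _ in range(n_ind)]
    for s in range(n_sites):
        column = [dosages[i][s] for i in range(n_ind)]
        freq = sum(column) / (2.0 * n_ind)
        if freq <= 0.0 or freq >= 1.0:
            continue
        scale = (2.0 * freq * (1.0 - freq)) ** 0.5
        for i in range(n_ind):
            standardized[i].append((column[i] - 2.0 * freq) / scale)
    used = len(standardized[0])
    if used == 0:
        raise ValueError("no variable sites")
    return [
        [
            sum(a * b for a, b in zip(standardized[i], standardized[j])) / used
            for j in range(n_ind)
        ]
        for i in range(n_ind)
    ]
```

The eigenvectors of the resulting matrix give the principal components used to visualize population structure.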

Still left to do

intersect BEAGLE and ANGSD results
iEHH (using rehh package in R)
visualize data
look at genes in beagle haplotype blocks

mikheyev/ithaca-bees
