GitHub - arundurvasula/Maize-Background-Selection-Simulations: A pipeline for both forward and coalescent simulations of background selection during maize domestication on Slurm

###Purpose: To use approximate bayesian computation and a good rejection algorithm to create parameters that give summary statistics similar to the data found in Hufford et. al. 2012.

###Dependencies/Assumptions: To run this full pipeline you need the following:

The Sun Grid Engine
Working executables of:
- sfs_code
- convertSFS_CODE
- ms
- msstats
- ABCreg
- R
- Python

If those executables are not already in your path, you must add them:

create a file called ".bash_profile" in your home directory (if it doesn't exist)

add this, substituting the paths to your exectuable:

 PATH=$PATH:/home/user/sfs_code/bin/:/home/user/msstats-0.3.1/bin/
 export PATH

###Forward Simulations: This pipeline uses sfs_code to simulate maize domestication forward in time.

Directions to recreate simulations:

source submit_million_forward.sh
source cleanup.sh
source create_posterior.sh
Graph bneck.0.post in R

You can test out those parameters with Rscript generate_[simtype]_params.r 1 2 3. The output will be on the terminal with parameters saved as 'paramsFile.1.2.3'

There is also a pipeline that uses slim to simulate maize domestication forward in time.

To use the slim pipeline:

Make sure the folder slurm-log exists in /forward-sims/
Run simulations: source ./submit_multiple_slim_arrays.sh

This script submits the slim_pipeline.sh script multiple times, as specified in the the for loop. After the simulations finish, the files will be placed in slurm-log. There are 4 types of output files:
- bneck.[JOBID].txt => holds the size of the population after the bottleneck for the simulation
- paramsFile.[JOBID].txt => holds the all of the parameters used for simulation
- slim-pipeline-out.[ARRAYID].[JOBID].txt => holds one line of summary statistics for a population
- slim-pipeline-stderr.[ARRAYID].[JOBID].txt => contains any error messages generated during simulation as well as anything else sent to sdterr.
Analysis: to run the analysis using ABCreg, you need a data file containing the mean of the summary staticstics and a prior file containing all of the simulated summary statistics. NOTE: file names and paths need to be changed. 3. Create data file (uses Hufford statistics grabs the mean using awk):
```
 	awk '{print($48,"\t",$37,"\t",$39, "\t",$40,"\t",$41,"\t",$42, "\t",$43, "\t")}' ~/sfs_code_workflow2/Hufford_et_al._2012_10kb_statistics/Hufford_et_al._2012_10kb_statistics.txt | grep -v "NA" | awk '{for(i=1; i<=NF; i++){sum[i]+=$i}} END {for(i=1; i<=NF; i++){printf sum[i]/NR "\t"}}' > data_file.txt
```
1. Create prior file:
  1. Run slim-cleanup.sh to concatenate only valid stats and bottlenecks into one file.
  2. Remove stats labels from stats.txt
    
    awk '$1 ~ /[0-9]/ {print}' semifinal_stats.txt > final_stats.txt
  3. Grab only necessary stats:
    
    awk '{print($1,"\t",$2,"\t",$4, "\t",$5,"\t",$6,"\t",$8, "\t",$13, "\t")}' final_stats.txt > temp_sim_stats.txt
  4. Paste bottlenecks and sim stats together:
    
    paste bneck temp_sim_stats | grep -v 'nan' > prior_file.txt
2. Run ABCreg:
  
  ~/ABCreg/reg -T -P 1 -S 7 -p prior_file.txt -d data_file.txt -b bneck -t 0.001

###Coalescent Simulations: This pipeline uses ms to simulate maize domestication backwards in time. Directions to recreate simulations:

Rscript gen_tbs.r
Rscript generate_ms_params.r
source ../create_posterior_ms.sh
Graph bneck.0.post in R

Change parameters in generate_[simtype]_params.r

###"Oh man I can't believe I did that"

The submit million scripts do a for loop twice that qsubs the sim script. The sim script submits an array job from 1-50,000 (max that SGE allows). Each job in the sim script has its own for loop that runs generate_[simtype]_params.r and the rest of the pipeline 10 times. So we have:
```
  2 array jobs x 50,000 jobs per array x 10 simulations per job = 1,000,000 simulations
```

====

Tricks for analysis

data file contains summary stats from observed data. It should only be 1 line. (no headers):
```
  .0054 -0.434
```

(the stats above are Pi and TajD)

prior file contains parameters and summary stats from the simulations. Each line should have the parameter value and the summary stats associated with it (no headers):

  0.835507987765595       65.7986 0.241462
  0.550444400170818       29.6863 -0.276919
  0.212850250769407       42.6132 0.212022
  0.781859162030742       106.241 -0.0995736

(the stats above are theta, Pi and TajD)

make sure you're using the right data. If you're simulating with --annotate N then use stats from noncoding windows.
make sure your stats are actually comparable: msstats gives theta pi per locus. Your data might have theta pi per base pair.
if your graph is crappy, try reducing the number of values in your posterior.
use cut -f [column #] to get a column of a file.

stats currently being compared:

  sims        obs
  1-S*         48-S_rhoMZ
  2-n1*        37-singletonsMZ
  4-theta*     39-ThetaWMZ
  5-pi*        40-ThetaPiMZ
  6-thetaH*    41-ThetaHMZ
  8-tajd      42-TajDMZ
  13-rm       43-rmin

Guide to final_stats (for reference when changing create_posterior.sh):
1 S
2 n1
3 next
4 theta
5 pi
6 thetaH
7 Hprime
8 tajd
9 fulif
10 fulid
11 fulifs
12 fulids
13 rm
14 rmmg
15 nhaps
16 hdiv
17 wallb
18 wallq
19 rosasrf
20 rosasru
21 zns

Name		Name	Last commit message	Last commit date
Latest commit History 117 Commits
coalescent-sims		coalescent-sims
forward-sims		forward-sims
info		info
.gitignore		.gitignore
README.md		README.md
cleanup.sh		cleanup.sh
create_posterior.sh		create_posterior.sh
pi_dist.txt		pi_dist.txt
simple_rejection.py		simple_rejection.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Contributors 2

Languages

arundurvasula/Maize-Background-Selection-Simulations

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages