Ct_pop_gen_project

This repository contains the scripts and files used to perform the analyses and to make the figures and tables for the Caenorhabditis tropicalis population genetics manuscript.

  • The "data" folder contains the raw datasets from public databases (e.g. CaeNDR)
  • The "plots" folder contains all the figures generated by the scripts
  • The "processed_data" folder contains all the intermediate result files generated during the analysis
  • The "scripts" folder contains all the code used for generating figures and tables
  • The "tables" folder contains all the tables generated by the scripts




How to use the scripts in this repository to plot figures and generate tables in the manuscript

This section shows the names and order of the scripts used to plot figures and generate tables.
Notes: The "→" represents the order in which the scripts are run. The "+" means that all the scripts it joins should be run in that step.
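
As a rough illustration of the notation, the sketch below expands the Figure 4 entry listed further down ("Calculate_LD_per_Mb_all.sh + Calculate_LD_per_Mb_geo.sh → Plot_LD_geo_all.R") into shell commands. The helper `run_script`, the function `figure4`, and the `SCRIPTS_DIR` and `DRY_RUN` variables are hypothetical names introduced here. The sketch also assumes that the scripts take no arguments, that they are run from the repository root, and that scripts joined by "+" can run in any order; none of this is stated in this README, so check each script for hard-coded paths and required inputs before running it for real.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this repository.
# Assumption: scripts live in ./scripts and take no arguments.
SCRIPTS_DIR="${SCRIPTS_DIR:-scripts}"
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run_script() {
  local script="$1" cmd
  # Pick the interpreter from the file extension.
  case "$script" in
    *.sh) cmd="bash" ;;
    *.R)  cmd="Rscript" ;;
    *)    echo "unknown script type: $script" >&2; return 1 ;;
  esac
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd $SCRIPTS_DIR/$script"
  else
    "$cmd" "$SCRIPTS_DIR/$script"
  fi
}

# Figure 4: A + B → C
figure4() {
  # "+" step: both calculation scripts must finish before the next step.
  run_script Calculate_LD_per_Mb_all.sh &&
  run_script Calculate_LD_per_Mb_geo.sh &&
  # "→" step: the plotting script runs only after the step above succeeded.
  run_script Plot_LD_geo_all.R
}

figure4
```

With the default dry run, this prints the three commands in order without executing anything. Chaining with `&&` stops the pipeline at the first failing script, so a plot is never drawn from incomplete intermediate files.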


Main Figures & Tables

Figure 1. Global distribution of 518 C. tropicalis isotype reference strains

  • Figure 1 a-f: Plot_isotype_tropicalis_global_map.R
  • Figure 1 g-i: Plot_PCA_vcf.R

Figure 2. The ML tree of 518 C. tropicalis isotype reference strains generated from variants pruned with LD r2 values less than or equal to 0.9

  • Figure 2: Calculate_C_tro_vcf_to_pyh.sh + vcf2phylip.py →
    Calculate_C_tro_LD_pruned_0.9_pyh_to_tree.sh →
    Plot_tree_in_lat_with_label.R
    Note: vcf2phylip.py is available at https://github.com/edgardomortiz/vcf2phylip

Figure 3. Diversity statistics calculated across global C. tropicalis isotypes

  • Figure 3: Plot_tro_pi_theta_d.R

Figure 4. LD decay for all C. tropicalis isotype reference strains across all autosomes

  • Figure 4: Calculate_LD_per_Mb_all.sh + Calculate_LD_per_Mb_geo.sh →
    Plot_LD_geo_all.R

Figure 5. Hyper-variable regions (HVRs) in C. tropicalis

  • Figure 5 b, c: Calculate_HVRs_TajimasD_bed_file.R →
    Calculate_vcf_to_zarr_HVRs.sh →
    Calculate_zarr_to_pi_theta_d_HVRs.sh →
    Plot_HVRs_TajimasD.R

Figure 6. Relatedness and frequency of 518 C. tropicalis isotype reference strains at the Medea regions

  • Figure 6: Calculate_C_tro_Medea_vcf.sh →
    Plot_Medea_vcf2tree.R

Figure 7. Human Impact Index of nematode sampling sites

  • Figure 7: Calculate_download_and_unzip_tif_files.sh →
    Calculate_human_footprint_index.R →
    Plot_human_footprint_index.R

Table 1. Effective population size (Ne) for Caenorhabditis spp. samples from different studies

Table 2. Outcrossing rate of isotype reference strains from different sampling sites

  • Table 2:
    C. tropicalis: Calculate_vcf_to_zarr_geo.sh →
    Calculate_zarr_to_pi_theta_d_geo.sh + Calculate_LD_per_Mb_all.sh + Calculate_LD_per_Mb_geo.sh
    C. elegans: Calculate_Ce_vcf_to_zarr.sh →
    Calculate_Ce_zarr_to_pi_theta_d.sh + Calculate_Ce_LD_per_Mb.sh
    C. briggsae: trim-fq-nf workflow: https://github.com/andersenlab/trim-fq-nf
    alignment-nf workflow: https://github.com/andersenlab/alignment-nf
    wi-gatk workflow: https://github.com/andersenlab/wi-gatk
    Calculate_Cb_vcf_to_zarr.sh →
    Calculate_Cb_zarr_to_pi_theta_d.sh + Calculate_Cb_LD_per_Mb.sh
    Then use all of the outputs above with this R script to generate the table: Table_isotypes_Ne_Outcrossing rate.R


Supplemental Figures & Tables

Figure S1. Global distribution of 690 C. tropicalis strains

  • Figure S1: Plot_strains_tropicalis_global_map.R

Figure S2. Geographical distance between strains within an isotype

  • Figure S2 a: Plot_Vincenty_geo_Ce.R
  • Figure S2 b: Plot_Vincenty_geo_Ct.R

Figure S3-S5. The ML trees of 518 C. tropicalis isotype reference strains generated from LD-pruned variants with r2 values less than or equal to 0.6-0.8

  • Figure S3-S5: Calculate_C_tro_vcf_to_pyh.sh + vcf2phylip.py →
    Calculate_C_tro_LD_pruned_0.6_pyh_to_tree.sh + Calculate_C_tro_LD_pruned_0.7_pyh_to_tree.sh + Calculate_C_tro_LD_pruned_0.8_pyh_to_tree.sh →
    Plot_tree_in_lat_with_label.R
    Note: vcf2phylip.py is available at https://github.com/edgardomortiz/vcf2phylip

Figure S6. Unrooted maximum likelihood trees of 518 C. tropicalis isotype reference strains generated from LD-pruned variants

  • Figure S6: Plot_equal angle_unrooted_tree.R

Figure S7. Scatter plot showing a significant positive correlation between geographic distance and phylogenetic distance

  • Figure S7: Plot_phy_geo.R

Figure S8. Genetic distance (Dxy) correlation with geographic distance and regional variation

  • Figure S8: Calculate_C_tro_dxy_10kb_geo.sh →
    Plot_dxy_geo_change_in_dxy_per_km.R

Figure S9. Pairwise fixation index (Fst) among C. tropicalis from different geographic regions

  • Figure S9: Calculate_C_tro_fst_geo.sh →
    Plot_fst_geo.R

Figure S10. LD decay for all autosomes

  • Figure S10: Calculate_LD_per_Mb_all.sh + Calculate_LD_per_Mb_geo.sh →
    Plot_LD_geo_all_chromosome.R

Figure S11. Species-wide Dxy of C. tropicalis

  • Figure S11: Calculate_C_tro_dxy_10kb_chunk_I_to_X.sh + Calculate_C_tro_dxy_10kb_chunk_V_1.sh + Calculate_C_tro_dxy_10kb_chunk_V_2.sh →
    Calculate_C_tro_dxy_10kb_stat.sh →
    Plot_dxy_species_wide.R

Figure S12-S16. Relatedness of 518 C. tropicalis isotype reference strains at five different maternal effect regions

  • Figure S12-S16: Plot_TAs_vcf2tree.R

Table S4. Diversity statistics (π, θ, Tajima's D) for all isotype reference strains and for each geographic region

  • Table S4: Calculate_vcf_to_zarr_geo.sh →
    Calculate_zarr_to_pi_theta_d_geo.sh →
    Table_geo_p_theta.R

Table S5. Effective population size (Ne) for all isotype reference strains and for each geographic region except for the two under-sampled regions, Africa and Australia.

  • Table S5: Calculate_vcf_to_zarr_geo.sh →
    Calculate_zarr_to_pi_theta_d_geo.sh + Calculate_LD_per_Mb_all.sh + Calculate_LD_per_Mb_geo.sh →
    Table_isotypes_Ne_Outcrossing rate.R