This repository contains the scripts and files used to perform the analyses and generate the figures and tables for the Caenorhabditis tropicalis population genetics manuscript.
- The "data" folder contains the raw datasets from public databases (e.g., CaeNDR)
- The "plots" folder contains all the figures generated by the scripts
- The "processed_data" folder contains all the intermediate results files generated during the analysis
- The "scripts" folder contains all the code used to generate the figures and tables
- The "tables" folder contains all the tables generated by the scripts
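
As a quick sanity check after cloning, the folder layout described above can be verified with a short script. This is only a sketch: the helper name `missing_folders` is hypothetical and is not part of the repository, and it assumes the five folders sit directly under the repository root.

```python
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_FOLDERS = ("data", "plots", "processed_data", "scripts", "tables")


def missing_folders(repo_root):
    """Return the expected folders that do not exist under repo_root.

    Hypothetical helper for illustration; not part of the repository.
    """
    root = Path(repo_root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


# Check the current working directory (assumed to be the repository root).
missing = missing_folders(".")
print("missing folders:", ", ".join(missing) if missing else "none")
```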
This section shows the names and order of the scripts used to plot figures and generate tables.
Notes: The "→" represents the order in which the scripts are run. The "+" means that all of the scripts it joins should be used in that step.
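
To make the notation concrete, the sketch below splits one pipeline line into ordered steps, where each step lists the scripts that belong together. The function name `parse_pipeline` is hypothetical and only illustrates how to read the lines in this README; it does not run any script.

```python
def parse_pipeline(spec):
    """Split a pipeline string into ordered steps.

    "→" separates consecutive steps; "+" joins scripts used in the same step.
    Hypothetical helper for illustration; not part of the repository.
    """
    steps = []
    for step in spec.split("→"):
        scripts = [name.strip() for name in step.split("+") if name.strip()]
        if scripts:
            steps.append(scripts)
    return steps


# The Figure 4 entry from this README, written on one line.
figure_4 = "Calculate_LD_per_Mb_all.sh + Calculate_LD_per_Mb_geo.sh → Plot_LD_geo_all.R"

for number, scripts in enumerate(parse_pipeline(figure_4), start=1):
    print(f"step {number}: {', '.join(scripts)}")
```

For the Figure 4 entry this yields two steps: both `Calculate_LD_per_Mb_*.sh` scripts first, then `Plot_LD_geo_all.R`.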
- Figure 1 a-f: Plot_isotype_tropicalis_global_map.R
- Figure 1 g-i: Plot_PCA_vcf.R
Figure 2. The ML tree of 518 C. tropicalis isotype reference strains generated from variants pruned with LD r2 values less than or equal to 0.9
- Figure 2: Calculate_C_tro_vcf_to_pyh.sh + vcf2phylip.py →
Calculate_C_tro_LD_pruned_0.9_pyh_to_tree.sh →
Plot_tree_in_lat_with_label.R
Note: vcf2phylip.py can be found at https://github.com/edgardomortiz/vcf2phylip
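
For readers unfamiliar with LD pruning, the sketch below shows what an r2 threshold of 0.9 means in practice. It is not the code used in `Calculate_C_tro_vcf_to_pyh.sh`: it is a simplified greedy prune over all pairs of sites, whereas real tools typically work in sliding windows. The function names, the 0/2 genotype coding, and the toy genotypes are all invented for illustration.

```python
def r_squared(site_a, site_b):
    """Squared Pearson correlation between alternate-allele counts at two sites."""
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must have the same non-zero number of samples")
    mean_a = sum(site_a) / n
    mean_b = sum(site_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(site_a, site_b))
    var_a = sum((a - mean_a) ** 2 for a in site_a)
    var_b = sum((b - mean_b) ** 2 for b in site_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic site")
    return cov * cov / (var_a * var_b)


def prune(sites, threshold):
    """Greedy prune: keep a site only if its r2 with every kept site is <= threshold."""
    kept = []
    for name, genotypes in sites.items():
        if all(r_squared(genotypes, sites[other]) <= threshold for other in kept):
            kept.append(name)
    return kept


# Toy genotypes (invented): 0 = homozygous reference, 2 = homozygous alternate.
sites = {
    "snp1": [0, 0, 2, 2, 0, 2],
    "snp2": [0, 0, 2, 2, 0, 2],  # identical to snp1, so r2 = 1
    "snp3": [0, 2, 0, 2, 0, 2],  # only weakly correlated with snp1
}

print(prune(sites, threshold=0.9))
```

In this toy example `snp2` is removed because it is in complete LD with `snp1`, while `snp3` is retained.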
- Figure 3: Plot_tro_pi_theta_d.R
- Figure 4: Calculate_LD_per_Mb_all.sh + Calculate_LD_per_Mb_geo.sh →
Plot_LD_geo_all.R
- Figure 5 b, c: Calculate_HVRs_TajimasD_bed_file.R →
Calculate_vcf_to_zarr_HVRs.sh →
Calculate_zarr_to_pi_theta_d_HVRs.sh →
Plot_HVRs_TajimasD.R
Figure 6. Relatedness and frequency of 518 C. tropicalis isotype reference strains at the Medea regions
- Figure 6: Calculate_C_tro_Medea_vcf.sh →
Plot_Medea_vcf2tree.R
- Figure 7: Calculate_download_and_unzip_tif_files.sh →
Calculate_human_footprint_index.R →
Plot_human_footprint_index.R
- Table 1:
C. tropicalis: Calculate_vcf_to_zarr_geo.sh →
Calculate_zarr_to_pi_theta_d_geo.sh
C. elegans: Calculate_Ce_vcf_to_zarr.sh →
Calculate_Ce_zarr_to_pi_theta_d.sh
C. briggsae: trim-fq-nf workflow: https://github.com/andersenlab/trim-fq-nf →
alignment-nf workflow: https://github.com/andersenlab/alignment-nf →
wi-gatk workflow: https://github.com/andersenlab/wi-gatk →
Calculate_Cb_vcf_to_zarr.sh →
Calculate_Cb_zarr_to_pi_theta_d.sh
Then, use all of the outputs above with this R script to generate the table: Table_isotypes_Ne_Outcrossing rate.R
- Table 2:
C. tropicalis: Calculate_vcf_to_zarr_geo.sh →
Calculate_zarr_to_pi_theta_d_geo.sh + Calculate_LD_per_Mb_all.sh + Calculate_LD_per_Mb_geo.sh
C. elegans: Calculate_Ce_vcf_to_zarr.sh →
Calculate_Ce_zarr_to_pi_theta_d.sh + Calculate_Ce_LD_per_Mb.sh
C. briggsae: trim-fq-nf workflow: https://github.com/andersenlab/trim-fq-nf →
alignment-nf workflow: https://github.com/andersenlab/alignment-nf →
wi-gatk workflow: https://github.com/andersenlab/wi-gatk →
Calculate_Cb_vcf_to_zarr.sh →
Calculate_Cb_zarr_to_pi_theta_d.sh + Calculate_Cb_LD_per_Mb.sh
Then, use all of the outputs above with this R script to generate the table: Table_isotypes_Ne_Outcrossing rate.R
- Figure S1: Plot_strains_tropicalis_global_map.R
- Figure S2 a: Plot_Vincenty_geo_Ce.R
- Figure S2 b: Plot_Vincenty_geo_Ct.R
Figure S3-S5. The ML trees of 518 C. tropicalis isotype reference strains generated from LD-pruned variants with r2 values less than or equal to 0.6, 0.7, and 0.8
- Figure S3-S5: Calculate_C_tro_vcf_to_pyh.sh + vcf2phylip.py →
Calculate_C_tro_LD_pruned_0.6_pyh_to_tree.sh + Calculate_C_tro_LD_pruned_0.7_pyh_to_tree.sh + Calculate_C_tro_LD_pruned_0.8_pyh_to_tree.sh →
Plot_tree_in_lat_with_label.R
Note: vcf2phylip.py can be found at https://github.com/edgardomortiz/vcf2phylip
Figure S6. Unrooted maximum likelihood trees of 518 C. tropicalis isotype reference strains generated from LD-pruned variants
- Figure S6: Plot_equal angle_unrooted_tree.R
Figure S7. Scatter plot showing a significant positive correlation between geographic distance and phylogenetic distance
- Figure S7: Plot_phy_geo.R
- Figure S8: Calculate_C_tro_dxy_10kb_geo.sh →
Plot_dxy_geo_change_in_dxy_per_km.R
- Figure S9: Calculate_C_tro_fst_geo.sh →
Plot_fst_geo.R
- Figure S10: Calculate_LD_per_Mb_all.sh + Calculate_LD_per_Mb_geo.sh →
Plot_LD_geo_all_chromosome.R
- Figure S11: Calculate_C_tro_dxy_10kb_chunk_I_to_X.sh + Calculate_C_tro_dxy_10kb_chunk_V_1.sh + Calculate_C_tro_dxy_10kb_chunk_V_2.sh →
Calculate_C_tro_dxy_10kb_stat.sh →
Plot_dxy_species_wide.R
Figure S12-S16. Relatedness of 518 C. tropicalis isotype reference strains at five different maternal effect regions
- Figure S12-S16: Plot_TAs_vcf2tree.R
Table S4. Diversity statistics (π, θ, Tajima's D) for all isotype reference strains and for each geographic region
- Table S4: Calculate_vcf_to_zarr_geo.sh →
Calculate_zarr_to_pi_theta_d_geo.sh →
Table_geo_p_theta.R
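
For reference, the three statistics reported in Table S4 can be illustrated with the standard textbook formulas on a toy alignment. This sketch is not the code in `Calculate_zarr_to_pi_theta_d_geo.sh`, which works from zarr files and may differ in windowing and in how missing data are handled. The function name `diversity_stats`, the one-haplotype-per-strain coding, and the toy data are assumptions made for illustration.

```python
import math


def diversity_stats(sites):
    """Return (pi, theta_w, tajimas_d) for a list of biallelic sites.

    Each site is a list of 0/1 alleles, one per sampled haplotype.
    pi and theta_w are totals over the region, not per-base-pair rates.
    Tajima's D is returned as NaN when it is undefined (no segregating
    sites, or fewer than four samples).
    """
    if not sites:
        raise ValueError("need at least one site")
    n = len(sites[0])
    if n < 2 or any(len(site) != n for site in sites):
        raise ValueError("every site needs the same number (>= 2) of samples")

    # Alternate-allele count at each site.
    counts = [sum(site) for site in sites]
    # Number of segregating sites, S.
    seg = sum(1 for k in counts if 0 < k < n)
    # pi: mean number of pairwise differences between haplotypes.
    pi = sum(2 * k * (n - k) for k in counts) / (n * (n - 1))

    # Watterson's theta: S divided by the harmonic number a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = seg / a1

    if seg == 0 or n < 4:
        return pi, theta_w, float("nan")

    # Variance terms from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))
    return pi, theta_w, d


# Toy data (invented): three segregating sites in four haplotypes.
toy_sites = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
pi, theta_w, tajimas_d = diversity_stats(toy_sites)
print(f"pi={pi:.4f} theta_w={theta_w:.4f} D={tajimas_d:.4f}")
```

Dividing π and θ by the number of accessible sites in the region converts the totals into per-site values.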
Table S5. Effective population size (Ne) for all isotype reference strains and for each geographic region except for the two under-sampled regions, Africa and Australia.
- Table S5: Calculate_vcf_to_zarr_geo.sh →
Calculate_zarr_to_pi_theta_d_geo.sh + Calculate_LD_per_Mb_all.sh + Calculate_LD_per_Mb_geo.sh →
Table_isotypes_Ne_Outcrossing rate.R