Skip to content
forked from NovembreLab/feems

Forked repo with an initial goal to add features such as long-range migration.

License

Notifications You must be signed in to change notification settings

VivaswatS/feems

 
 

Repository files navigation

Binder

FEEMS

Fast Estimation of Effective Migration Surfaces (FEEMS) is a python package implementing a statistical method for inferring and visualizing gene-flow in spatial population genetic data, requiring only genotype data and locations of samples.

The FEEMS method and software was developed by Joe Marcus and Wooseok Ha and advised by Rina Foygel Barber and John Novembre. We also used code from Benjamin M. Peter to help construct the spatial graphs.

For details on the method see our publication.

FEEMSmix

This method is still currently under review, so there may be some changes along the way.

Fast Estimation of Effective Migration Surfaces + admixture (FEEMSmix) is built on top of FEEMS, and is a method for representing long-range gene flow on a background migration surface estimated by FEEMS.

The FEEMSmix method and software was developed by Vivaswat Shastry and John Novembre.

Jump to RUNNING FEEMS to get started right away!

IMPORTANT UPDATES (since v1.0)

Please read the following changes carefully as we have updated older functionality (all backward compatible), while introducing new features that will aid in interpretation and visualization.

Change in the default mode of FEEMS

In the original publication above, FEEMS only fit the edge weights on the graph and kept the node-specific parameters (proportional to heterozygosities) fixed at a constant value. However, in the latest version, we change the default functionality to be one in which we fit node-specific parameters under the model. This will increase both model fit and runtime, however, we've found this to be a worthy trade-off, especially when looking for long-range gene flow events on a baseline FEEMS grid. The old functionality can still be used by setting sp_graph.fit(..., optimize_q=None).

This new map of heterozygosity values (roughly proportional to effective population size) can also be visualized across the grid. See getting-started.ipynb for an example of how to plot this and cross-validation.ipynb for more techincal details on this mode.

Inclusion of a shape file with coarser resolution (grid_500.sh*)

If you have samples spanning continental scales, then a coarser grid may be of use to you, so we provide three resolutions (spanning from fine to coarse):

  1. grid_100 (corresponds to res = 6 in dgconstruct from package dggridR with cell area of approx. $6{,}200$ sq km and a cell spacing of $110$ km)
  2. grid_250 (corresponds to res = 5 in dgconstruct with a cell area of approx. $25{,}000$ sq km and a cell spacing of $220$ km)
  3. grid_500 (corresponds to res = 4 in dgconstruct and a cell area of $100{,}000$ sq km with a cell spacing of $440$ km)

You can also create your own shapefile with a custom resolution using this command in R:

dggs <- dgconstruct(
    res = 4,
    precision = 30,
    projection = "ISEA",
    aperture = 4,
    topology = "TRIANGLE"
)

dgearthgrid(dggs, savegrid="PATH/world_triangle_res4.shp")

Note: For finer resolutions (res > 8), this command will take a really long time to save the grid for the entire earth. In these cases, you can subset a region of the world and intersect this region with the triangular grid, see an example R script (thanks to @karolisr)

Rule of thumb: For the density of the grid, it is a balance between finer resolutions and runtime: a good place to start is a resolution in which most individuals sampled as part of a single sampling population/unit get assigned to a unique deme.

The outer extent/boundary of the grid can be constructed using this tool: https://www.keene.edu/campus/maps/tool/. First, draw a polyline around your samples in this tool, and then, just copy paste the latitude and longitudes from the right-hand panel into an outer.csv file.

Inclusion of a flag to avoid wrapping locations around America

Previously, FEEMS employed a trick to place points in Western Alaska on the 'other side' of the International Date Line, i.e., on the 'same side' of the map as the North American landmass (only an issue with some projections) using a translated flag. This threshold longitude was set to a default of $-40$ (to represent the North Ameirican landmass contiguously).

Now, we provide an option in prepare_graph_inputs(..., translated=True, wrap_longitude=-40) to turn the flag off AND choose a threshold longitude for flexibility.

Inclusion of a plotting script using ggplot2

We also now have a script that can plot the FEEMS map in R using ggplot2 (thanks to @LukeAndersonTrocme). First, you will need to export the relevant graph attributes in python (after FEEMS/FEEMSmix has been fit) using these commands:

# write the relevant edge weights out into a csv file
np.savetxt('edgew.csv', np.vstack((np.array(sp_graph.edges).T, sp_graph.w)).T, delimiter=',')

# write the deme coordinates + sample size (node attributes) out into a csv file
np.savetxt('nodepos.csv', np.vstack((sp_graph.node_pos.T, [sp_graph.nodes[n]['n_samples'] for n in range(len(sp_graph.nodes))])).T, delimiter=',')

# (if using FEEMSmix, print the MLE source & admix. prop.)
contour_df = sp_graph.calc_joint_contour(...) 
print(contour_df.iloc[np.argmax(contour_df['scaled log-lik'])])

and then you can use this script ggplot_feems.R to plot both the baseline FEEMS weights and the long-range edges from FEEMSmix in R. With the figure in ggplot2 format, you will also be able to add any extra features as you see fit (e.g., sample locations, ecological gradients, topographical features, etc.)

For example, see the figure produced from this script here.

Inclusion of spatial prediction as a feature

With FEEMSmix, we also provide the functionality of predicting the location of samples on a migration surface estimated with FEEMS. We observe comparable performance to another state-of-the-art deep learning method called Locator (Battey et al 2020) and believe it could be a useful tool for the spatial population genetics community. This functionality can be found in miscellaneous-functions.ipynb.

Visualization of admixture pies on FEEMS map

We provide a basic function to overlay admixture proportions from an admixture/STRUCTURE-like model as pie charts on an underlying FEEMS map. This has proven useful as a visualization tool when interpreting both FEEMS and FEEMSmix results (especially, the latter). See example code in miscellaneous-functions.ipynb.

Visualization of model fits with PCA and admixture (Alexander et al 2009)

We provide functions to visualize fits of two widely-used models (Principal Components Analysis and admixture) to the observed genetic data and plot the outliers on a geographic map (akin to a FEEMSmix analysis). For PCA, we compute the principal components in-house, whereas for admixture, we ask you provide the .P & .Q matrices for different $K$ values. This functionality can be found in miscellaneous-functions.ipynb.

INSTALLATION

Note: MS Windows users will struggle to install FEEMS directly in a Windows environment because at least one of the dependencies does not have a Windows port. A virtual Linux machine should be preferable if you are on a Windows machine.

Quick start using bioconda (not available for FEEMSmix)

Typically, the simplest way to get started with FEEMS is to install Anaconda or Miniconda or mamba, then install FEEMS using the Bioconda recipe:

conda install -c bioconda feems -c conda-forge

See the next section for alternative ways to install FEEMS, or if "conda install" worked for you, skip ahead to "Running FEEMS".

Alternative installation instructions (Python >=3.8)

As an alternative way to get started, setup a conda environment:

conda create -n=feems_e python=3.12
conda activate feems_e

Some of the plotting utilities in the FEEMS package require geos as a dependency which can be installed on a Mac with brew as follows:

brew install geos

If you are on a Windows machine, you can install geos using:

conda install -c conda-forge geos

Then, you can install further dependencies using:

conda install -c conda-forge scikit-sparse suitesparse 
conda install -c conda-forge cartopy

Jupyter and jupyterlab are also needed to explore the example notebooks but are not required for the functioning of the FEEMS/FEEMSmix package (you could also use brew, if you want it to be available outside the environment):

pip install notebook

Once the conda environment has been setup with these dependencies, we can install FEEMS/FEEMSmix:

pip install git+https://github.com/VivaswatS/feems

You can also install FEEMS/FEEMSmix locally by:

git clone https://github.com/VivaswatS/feems
cd feems/
pip install .

NOTE: Some users have reported a compatibility error arising at this step with the installation of shapely v1.7.1 (specificed in requirements.txt). If this arises, recreate the feems_e conda environment, and run pip install shapely --no-binary shapely before the pip install feems command above.

Another alternative using the .yml file

Another easy option is also to use the feems.yml file from this repo as a blueprint for the installation and setup of the appropriate conda environment.

You can download the file onto your computer and simply run in the appropriate directory:

conda env create -f feems.yml
conda activate feems_e

This will create an environment called feems_e which will contain all the dependencies and FEEMS/FEEMSmix.

RUNNING FEEMS

To help get your analysis started (and to verify appropriate installation), we provide an example workflow in the getting-started.ipynb notebook. The notebook analyzes empirical data from North American gray wolves populations published in Schweizer et al. 2015.

An example workflow using a λ value estimated from a cross-validation procedure is highlighted in cross-validation.ipynb. We recommend using this procedure in choosing an appropriate λ value for the fit.

NOTE: We have not tested working with plink == v2.0 format for input files.

RUNNING FEEMSmix

Since FEEMSmix is built on top of FEEMS, this analysis will start where the previous section left off (i.e., after the initial FEEMS fit). We will also use the data from North American gray wolves to illustrate the working of this method, scroll down the getting-started.ipynb to see example analyses.

About

Forked repo with an initial goal to add features such as long-range migration.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 87.6%
  • OpenEdge ABL 11.2%
  • Python 1.0%
  • R 0.1%
  • HiveQL 0.1%
  • Batchfile 0.0%