Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add macos 13 workflow #39

Merged
merged 6 commits into from
Aug 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/macos.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,3 +41,6 @@ jobs:
pip install -r requirements.txt
pip install -r requirements_tests.txt
pip install .
- name: Test import
run: |
python -c "import spaceprime"
46 changes: 46 additions & 0 deletions .github/workflows/macos13.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
on:
push:
branches:
- main
- master
pull_request:
branches:
- main
- master

name: macOS build
jobs:
test-macOS:
runs-on: ${{ matrix.os }}
name: ${{ matrix.os }} (${{ matrix.python-version}})

strategy:
fail-fast: false
matrix:
os: ["macos-13"]
python-version: ["3.11"]

steps:
- name: Checkout code
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version}}
- name: Install GDAL
run: |
brew install gdal
- name: Test GDAL installation
run: |
gdalinfo --version
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install --no-cache-dir Cython
pip install -r requirements.txt
pip install -r requirements_tests.txt
pip install .
- name: Test import
run: |
python -c "import spaceprime"
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@

This page provides an overview of the package and its main features. For more detailed information, please refer to the [documentation](https://connor-french.github.io/spaceprime).

**Note for R users**: *spaceprime* is coded in Python, yet many interested users may come from an R background. I have a [*spaceprime* for R users]() vignette (COMING SOON) that provides a brief introduction to the Python concepts necessary to use *spaceprime* in a practical walk-through of an example analysis. Additionally, it is possible to use Python code in R using the *reticulate* package. For more information on how to use Python code in R, see the [reticulate documentation](https://rstudio.github.io/reticulate/).
**Note for R users**: *spaceprime* is coded in Python, yet many interested users may come from an R background. I have a [*spaceprime* for R users](spaceprime-for-r-users.qmd) vignette that provides a brief introduction to the Python concepts necessary to use *spaceprime* in a practical walk-through of an example analysis. Additionally, it is possible to use Python code in R using the *reticulate* package. For more information on how to use Python code in R, see the [reticulate documentation](https://rstudio.github.io/reticulate/).

## Main features
*spaceprime* includes a number of features:
Expand Down
4 changes: 2 additions & 2 deletions docs/_quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,8 @@ website:
pinned: true
#logo: logo.png
left:
- text: "Guide"
file: overview.qmd
#- text: "Guide"
# file: overview.qmd
- text: "Reference"
file: reference/index.qmd
- text: Vignettes
Expand Down
27 changes: 16 additions & 11 deletions docs/index.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ format:

::: {.callout-note}
## Note for R users
*spaceprime* is implemented in Python, yet many interested users may come from an R background. I have a [*spaceprime* for R users]() vignette (COMING SOON) that provides a brief introduction to the Python concepts necessary to use *spaceprime* through a practical walk-through of an example analysis. If you still want to use R, it is possible to use Python code in R using the *reticulate* package. For more information on how to use Python code in R, see the [reticulate documentation](https://rstudio.github.io/reticulate/).
*spaceprime* is implemented in Python, yet many interested users may come from an R background. I have a [*spaceprime* for R users](spaceprime-for-r-users.qmd) vignette that provides a brief introduction to the Python concepts necessary to use *spaceprime* through a practical walk-through of an example analysis. If you still want to use R, it is possible to use Python code in R using the *reticulate* package. For more information on how to use Python code in R, see the [reticulate documentation](https://rstudio.github.io/reticulate/).
:::

## Main features
Expand Down Expand Up @@ -102,7 +102,7 @@ Make sure to install the relevant analysis dependencies with `pip install spacep
## Quickstart

::: {.callout-tip}
This quickstart guide assumes you have a basic understanding of Python. If you are an R user, please refer to the [spaceprime for R Users vignette](https://connor-french.github.io/spaceprime) for an overview of *spaceprime* with the necessary Python concepts explained.
This quickstart guide assumes you have a basic understanding of Python. If you are an R user, please refer to the [spaceprime for R users vignette](spaceprime-for-r-users.qmd) for an overview of *spaceprime* with the necessary Python concepts explained.
:::

### 1. Download data
Expand Down Expand Up @@ -137,7 +137,7 @@ plt.tight_layout()
plt.show()
```

The GeoJSON file contains geographic localities of *P. distincta* in the Brazilian Atlantic Forest, along with metadata about each locality. Here are the localities plotted on top of the present-day habitat suitability model:
The GeoJSON file contains geographic sampling localities of *P. distincta* in the Brazilian Atlantic Forest, along with metadata about each locality. Each row is a single individual/observation. *spaceprime* counts the number of observations with coordinates that overlap with a raster cell/deme and samples the calculated number for simulations and summary statistics. Here are the localities plotted on top of the present-day habitat suitability model:

```{python}
#| label: localities
Expand All @@ -161,22 +161,28 @@ plt.show()

### 2. Read in packages and data

Now that we have our data, let's read in the packages and data we'll be using. The [GeoPandas](https://geopandas.org/en/stable/) and [Rasterio](https://rasterio.readthedocs.io/en/stable/index.html) packages are dependencies of *spaceprime*, so you shouldn't need to install them separately. They are needed for reading in locality data and habitat suitability rasters, respectively. Make sure to replace the `projections.tif` file path with the path to the file on your system:
Now that we have our data, let's read in the packages and data we'll be using. The [GeoPandas](https://geopandas.org/en/stable/) and [Rasterio](https://rasterio.readthedocs.io/en/stable/index.html) packages are dependencies of *spaceprime*, so you shouldn't need to install them separately. They are needed for reading in locality data and habitat suitability rasters, respectively.

```{python}
#| label: import-data
#| label: import-packages
#| eval: false

import spaceprime as sp
import geopandas as gpd
import rasterio

```

Make sure to replace the `projections.tif` file path with the path to the file on your system. The GeoJSON file is read in from the web, so you don't need to download it.

```{python}
#| label: read-data
#| eval: false

r = rasterio.open("projections.tif")
locs = gpd.read_file("https://raw.githubusercontent.com/connor-french/spaceprime/main/spaceprime/data/localities.geojson")
```


### 3. Set up the demographic model

Next, we'll convert the habitat suitability values into deme sizes, so each cell in the raster will represent a deme in our model. We'll use a linear transformation to convert the suitability values to deme sizes, where the suitability value is multiplied by a constant to get the deme size. The constant is the maximum local deme size, which we set to 1000. For more on transformations, see the [suitability to deme size transformation functions vignette](transformation-functions.qmd).
Expand Down Expand Up @@ -239,7 +245,7 @@ sp.plot_model(demo, r, 0)
```


### 4. Simulate genetic data
### 5. Simulate genetic data

Before simulating this demography, we need to create a sample dictionary that translates the empirical sampling localities to the model's deme indices and maps those to the number of samples to take from each deme. By default, this function sets the number of individuals to sample from each deme to the number of empirical localities in that deme. The `coords_to_sample_dict()` function also returns two other dictionaries that are not used in this example, so we'll ignore them.

Expand All @@ -250,7 +256,7 @@ Before simulating this demography, we need to create a sample dictionary that tr
sample_dict, _, _ = sp.coords_to_sample_dict(r, locs)
```

Now we get to simulate! The first task is to simulate the ancestry of the samples using the coalescent. All of the hard work is done through `msprime`'s `sim_ancestry()` function, for which `spaceprime` provides a convenience wrapper. This function returns a [tskit TreeSequence](https://tskit.dev/tskit/docs/stable/python-api.html#trees-and-tree-sequences), which "represents a sequence of correlated evolutionary trees along a genome" and is and incredibly powerful and compact data representation for population genomic analyses. The minimum number of arguments required for this function are the sample dictionary and the demographic model. The sequence length is also necessary for overlaying mutations in the next step. Notice the lack of mutations in the table. We'll set `record_provenance` to False to decrease the memory overhead of storing a bunch of metadata about the simulation.
Now we get to simulate! The first task is to simulate the ancestry of the samples using the coalescent. All of the hard work is done through `msprime`'s `sim_ancestry()` function, for which `spaceprime` provides a convenience wrapper. This function returns a [tskit TreeSequence](https://tskit.dev/tskit/docs/stable/python-api.html#trees-and-tree-sequences), which "represents a sequence of correlated evolutionary trees along a genome" and is an incredibly powerful and compact data representation for population genomic analyses. The minimum number of arguments required for this function are the sample dictionary and the demographic model. If you need to overlay mutations, you need to supply the sequence length. Notice the lack of mutations in the table. We'll set `record_provenance` to False to decrease the memory overhead of storing a bunch of metadata about the simulation.

This step may take a minute or so to run.

Expand All @@ -264,7 +270,7 @@ print(sim)
```


We'll take a peak at a single tree from the TreeSequence object to see what it looks like. The `draw_tree()` function is a convenience function that uses `msprime`'s `draw_text()` function to plot a single tree from the TreeSequence object. Here, I removed the node labels because there are tons of nodes that crowd the plot and we're only interested in the tree structure.
We'll take a peak at a single tree from the TreeSequence object to see what it looks like. The `draw_svg()` method plots trees from the TreeSequence object. Here, I selected a single tree and removed the node labels because there are tons of nodes that crowd the plot and we're only interested in the tree structure.

```{python}
#| label: draw-tree
Expand All @@ -282,7 +288,6 @@ Overlaying mutations after simulating ancestry isn't necessary for calculating g
#| label: mutations
#| eval: true


sim = sp.sim_mutations(sim, rate=1e-10, random_seed=490)

print(sim)
Expand All @@ -305,7 +310,7 @@ first_tree_mut.draw_svg(y_axis=True, size=(600, 400), node_labels=node_labels)
From here, you have a few options. You can:

- Use the `analysis` module to calculate genetic summary statistics on the TreeSequence object. For more on the `analysis` module, see the [analysis module documentation]().
- Save the TreeSequence to use later or analyze on a platform like [tskit](https://tskit.dev/tskit/docs/stable/introduction.html) with `sim.dump(file/path/to/write/to)`.
- Save the TreeSequence to use later or analyze on a platform like [tskit](https://tskit.dev/tskit/docs/stable/introduction.html) with `sim.dump(file/path/to/write/to.trees)`.
- Convert the TreeSequence with mutations to a genotype matrix for use in a program like [scikit-allel](https://scikit-allel.readthedocs.io/en/stable/) with `sim.genotype_matrix()`. For more information on this function, see the [tskit documentation](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.genotype_matrix).
- Export the TreeSequence with mutations to a VCF file using `sim.write_vcf`. For more information on how to use this function, see the [tskit documentation](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.write_vcf).

Expand Down
Loading
Loading