create r doc and few index changes

connor-french · Aug 8, 2024 · 048274d · 048274d
1 parent 44dd925
commit 048274d
Show file tree

Hide file tree

Showing 2 changed files with 359 additions and 11 deletions.
diff --git a/docs/index.qmd b/docs/index.qmd
@@ -22,7 +22,7 @@ format:
 
 ::: {.callout-note}
 ## Note for R users
-*spaceprime* is implemented in Python, yet many interested users may come from an R background. I have a [*spaceprime* for R users]() vignette (COMING SOON) that provides a brief introduction to the Python concepts necessary to use *spaceprime* through a practical walk-through of an example analysis. If you still want to use R, it is possible to use Python code in R using the *reticulate* package. For more information on how to use Python code in R, see the [reticulate documentation](https://rstudio.github.io/reticulate/).  
+*spaceprime* is implemented in Python, yet many interested users may come from an R background. I have a [*spaceprime* for R users](spaceprime-for-r-users.qmd) vignette that provides a brief introduction to the Python concepts necessary to use *spaceprime* through a practical walk-through of an example analysis. If you still want to use R, it is possible to use Python code in R using the *reticulate* package. For more information on how to use Python code in R, see the [reticulate documentation](https://rstudio.github.io/reticulate/).  
 :::
 
 ## Main features
@@ -102,7 +102,7 @@ Make sure to install the relevant analysis dependencies with `pip install spacep
 ## Quickstart
 
 ::: {.callout-tip}
-This quickstart guide assumes you have a basic understanding of Python. If you are an R user, please refer to the [spaceprime for R Users vignette](https://connor-french.github.io/spaceprime) for an overview of *spaceprime* with the necessary Python concepts explained.
+This quickstart guide assumes you have a basic understanding of Python. If you are an R user, please refer to the [spaceprime for R users vignette](spaceprime-for-r-users.qmd) for an overview of *spaceprime* with the necessary Python concepts explained.
 :::
 
 ### 1. Download data
@@ -137,7 +137,7 @@ plt.tight_layout()
 plt.show()
 ```
 
-The GeoJSON file contains geographic localities of *P. distincta* in the Brazilian Atlantic Forest, along with metadata about each locality. Here are the localities plotted on top of the present-day habitat suitability model:  
+The GeoJSON file contains geographic sampling localities of *P. distincta* in the Brazilian Atlantic Forest, along with metadata about each locality. Each row is a single individual/observation. *spaceprime* counts the number of observations with coordinates that overlap with a raster cell/deme and samples the calculated number for simulations and summary statistics. Here are the localities plotted on top of the present-day habitat suitability model:  
 
 ```{python}
 #| label: localities
@@ -161,22 +161,28 @@ plt.show()
 
 ### 2. Read in packages and data
 
-Now that we have our data, let's read in the packages and data we'll be using. The [GeoPandas](https://geopandas.org/en/stable/) and [Rasterio](https://rasterio.readthedocs.io/en/stable/index.html) packages are dependencies of *spaceprime*, so you shouldn't need to install them separately. They are needed for reading in locality data and habitat suitability rasters, respectively. Make sure to replace the `projections.tif` file path with the path to the file on your system:  
+Now that we have our data, let's read in the packages and data we'll be using. The [GeoPandas](https://geopandas.org/en/stable/) and [Rasterio](https://rasterio.readthedocs.io/en/stable/index.html) packages are dependencies of *spaceprime*, so you shouldn't need to install them separately. They are needed for reading in locality data and habitat suitability rasters, respectively. 
 
 ```{python}
-#| label: import-data
+#| label: import-packages
 #| eval: false
 
 import spaceprime as sp
 import geopandas as gpd
 import rasterio
 
+```
+
+Make sure to replace the `projections.tif` file path with the path to the file on your system. The GeoJSON file is read in from the web, so you don't need to download it.   
+
+```{python}
+#| label: read-data
+#| eval: false
 
 r = rasterio.open("projections.tif")
 locs = gpd.read_file("https://raw.githubusercontent.com/connor-french/spaceprime/main/spaceprime/data/localities.geojson")
 ```
 
-
 ### 3. Set up the demographic model
 
 Next, we'll convert the habitat suitability values into deme sizes, so each cell in the raster will represent a deme in our model. We'll use a linear transformation to convert the suitability values to deme sizes, where the suitability value is multiplied by a constant to get the deme size. The constant is the maximum local deme size, which we set to 1000. For more on transformations, see the [suitability to deme size transformation functions vignette](transformation-functions.qmd).  
@@ -239,7 +245,7 @@ sp.plot_model(demo, r, 0)
 ```
 
 
-### 4. Simulate genetic data
+### 5. Simulate genetic data
 
 Before simulating this demography, we need to create a sample dictionary that translates the empirical sampling localities to the model's deme indices and maps those to the number of samples to take from each deme. By default, this function sets the number of individuals to sample from each deme to the number of empirical localities in that deme. The `coords_to_sample_dict()` function also returns two other dictionaries that are not used in this example, so we'll ignore them.  
 
@@ -250,7 +256,7 @@ Before simulating this demography, we need to create a sample dictionary that tr
 sample_dict, _, _ = sp.coords_to_sample_dict(r, locs)
 ```
 
-Now we get to simulate! The first task is to simulate the ancestry of the samples using the coalescent. All of the hard work is done through `msprime`'s `sim_ancestry()` function, for which `spaceprime` provides a convenience wrapper. This function returns a [tskit TreeSequence](https://tskit.dev/tskit/docs/stable/python-api.html#trees-and-tree-sequences), which "represents a sequence of correlated evolutionary trees along a genome" and is and incredibly powerful and compact data representation for population genomic analyses. The minimum number of arguments required for this function are the sample dictionary and the demographic model. The sequence length is also necessary for overlaying mutations in the next step. Notice the lack of mutations in the table. We'll set `record_provenance` to False to decrease the memory overhead of storing a bunch of metadata about the simulation.  
+Now we get to simulate! The first task is to simulate the ancestry of the samples using the coalescent. All of the hard work is done through `msprime`'s `sim_ancestry()` function, for which `spaceprime` provides a convenience wrapper. This function returns a [tskit TreeSequence](https://tskit.dev/tskit/docs/stable/python-api.html#trees-and-tree-sequences), which "represents a sequence of correlated evolutionary trees along a genome" and is an incredibly powerful and compact data representation for population genomic analyses. The minimum number of arguments required for this function are the sample dictionary and the demographic model. If you need to overlay mutations, you need to supply the sequence length. Notice the lack of mutations in the table. We'll set `record_provenance` to False to decrease the memory overhead of storing a bunch of metadata about the simulation.  
 
 This step may take a minute or so to run.  
 
@@ -264,7 +270,7 @@ print(sim)
 ```
 
 
-We'll take a peak at a single tree from the TreeSequence object to see what it looks like. The `draw_tree()` function is a convenience function that uses `msprime`'s `draw_text()` function to plot a single tree from the TreeSequence object. Here, I removed the node labels because there are tons of nodes that crowd the plot and we're only interested in the tree structure.  
+We'll take a peak at a single tree from the TreeSequence object to see what it looks like. The `draw_svg()` method plots trees from the TreeSequence object. Here, I selected a single tree and removed the node labels because there are tons of nodes that crowd the plot and we're only interested in the tree structure.  
 
 ```{python}
 #| label: draw-tree
@@ -282,7 +288,6 @@ Overlaying mutations after simulating ancestry isn't necessary for calculating g
 #| label: mutations
 #| eval: true
 
-
 sim = sp.sim_mutations(sim, rate=1e-10, random_seed=490)
 
 print(sim)
@@ -305,7 +310,7 @@ first_tree_mut.draw_svg(y_axis=True, size=(600, 400), node_labels=node_labels)
 From here, you have a few options. You can:  
 
 - Use the `analysis` module to calculate genetic summary statistics on the TreeSequence object. For more on the `analysis` module, see the [analysis module documentation]().  
-- Save the TreeSequence to use later or analyze on a platform like [tskit](https://tskit.dev/tskit/docs/stable/introduction.html) with `sim.dump(file/path/to/write/to)`.  
+- Save the TreeSequence to use later or analyze on a platform like [tskit](https://tskit.dev/tskit/docs/stable/introduction.html) with `sim.dump(file/path/to/write/to.trees)`.  
 - Convert the TreeSequence with mutations to a genotype matrix for use in a program like [scikit-allel](https://scikit-allel.readthedocs.io/en/stable/) with `sim.genotype_matrix()`. For more information on this function, see the [tskit documentation](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.genotype_matrix).  
 - Export the TreeSequence with mutations to a VCF file using `sim.write_vcf`. For more information on how to use this function, see the [tskit documentation](https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.write_vcf).