Add getting started for Python and R.

onnela-lab · Dec 16, 2023 · e6bcdb5
1 parent 9eaf53e

Showing 10 changed files with 190 additions and 15 deletions.
diff --git a/.github/workflows/main.yaml b/.github/workflows/main.yaml
@@ -17,6 +17,10 @@ jobs:
         run: docker build -t gptools .
       - name: Verify notebooks are up to date.
         run: docker run --rm -v `pwd`:/workdir gptools pytest -v
+      - name: Render the getting started notebook for Python.
+        run: docker run --rm -v `pwd`:/workdir gptools cook exec getting_started:run
+      - name: Render the getting started Rmarkdown for R.
+        run: docker run --rm -v `pwd`:/workdir gptools cook exec getting_started_R:run
       - name: Generate figures (using ./in-docker.sh but we can't use `-it` in the Action).
         run: docker run --rm -e FAST=true -v `pwd`:/workdir gptools cook exec figures
       - name: Upload figures and reports as artifacts.

diff --git a/Dockerfile b/Dockerfile
@@ -1,6 +1,14 @@
 FROM python:3.10
+# Install R and dependencies.
+RUN apt-get update && apt-get install -y \
+    r-base \
+    r-cran-devtools \
+    && rm -rf /var/lib/apt/lists/*
 WORKDIR /workdir
+COPY setup.R .
+RUN Rscript setup.R
+
+# Install Python dependencies and compile cmdstan.
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 RUN python -m cmdstanpy.install_cmdstan --verbose --version 2.33.0
-
diff --git a/README.md b/README.md
@@ -31,28 +31,37 @@ If you switch between containerized and local runtime, you may need to remove co
 
 To ensure the reproducibility of these materials, the results are also computed as the output of a GitHub Action workflow [![Reproduction Materials](https://github.com/onnela-lab/gptools-reproduction-material/actions/workflows/main.yaml/badge.svg)](https://github.com/onnela-lab/gptools-reproduction-material/actions/workflows/main.yaml) with the `FAST=true` flag. Figures can be obtained by selecting a workflow run and downloading the `figures-reports` artifact.
 
+## Getting started
+
+The `getting_started` folder contains a Python notebook and an Rmarkdown file that reproduce the results of the "Getting started" sections in the accompanying manuscript. HTML reports can be generated in the `getting_started` folder by running
+
+- `[./in-docker.sh] cook exec getting_started:run` for Python
+- `[./in-docker.sh] cook exec getting_started_R:run` for R
+
+However, the folder is probably best suited to interactive exploration and experimentation for getting familiar with the package.
+
 ## Running the experiments
 
 Figures in the manuscript were generated using the containerized runtime, and all runtime estimates below are based on a 2020 MacBook Pro with M1 chip and 16 GB of memory running macOS 13.4 (22F66). All figures can be reproduced by running `[./in-docker.sh] cook exec figures` (see below for details). Figures are generated by executing Jupyter notebooks stored in markdown format; the notebooks can be opened directly in a standard Jupyter environment using the [`jupytext`](https://jupytext.readthedocs.io/en/latest/) extension. If you prefer, the folder for each experiment also contains a corresponding `*.ipynb` file. To open and use the notebooks, please set up a local computing environment as described above.
 
 ### Applications
 
-The folders `trees` and `tube` contain code and data to reproduce the two applied examples in the manuscript. Each example takes about five minutes to run. The figures can be reproduced by running `[./in-docker.sh] cook exec tube:fig trees:fig` and will be saved in the corresponding folder as png and pdf files.
+The folders `trees` and `tube` contain code and data to reproduce the two applied examples in the manuscript. Each example takes about five minutes to run. The figures can be reproduced by running `[./in-docker.sh] cook exec tube:run trees:run` and will be saved in the corresponding folder as png and pdf files.
 
 ### Profiling experiments
 
-The folder `profile` contains code to reproduce profiling experiments, and running all experiments can take up to ten hours. The profiling figure can be reproduced by running `[./in-docker.sh] cook exec profile:fig`. If a reduced runtime (but more noisy results) are desired, run `FAST=true cook exec profile:fig` which takes about 90 minutes.
+The folder `profile` contains code to reproduce profiling experiments, and running all experiments can take up to ten hours. The profiling figure can be reproduced by running `[./in-docker.sh] cook exec profile:run`. If a shorter runtime (with noisier results) is desired, run `FAST=true cook exec profile:run`, which takes about 90 minutes.
 
 All experiments are seeded for reproducibility, but profiling experiments are subject to variability due to different hardware and competing processes running on the same machine. Despite seeding, results may also vary depending on the operating system and stdlib implementation.
 
 ### Kernel properties and effect of padding
 
-The folders `kernels` and `padding` contain code to reproduce figures on the properties of different kernels and their spectral properties as well as the effect of padding on Fourier methods, respectively. The figures can be reproduced by running `[./in-docker.sh] cook exec kernels:fig padding:fig`; the latter takes about ten minutes to generate.
+The folders `kernels` and `padding` contain code to reproduce figures on the properties of different kernels (including their spectral properties) and on the effect of padding on Fourier methods, respectively. The figures can be reproduced by running `[./in-docker.sh] cook exec kernels:run padding:run`; the latter takes about ten minutes to generate.
 
 ## Expected results
 
-- `[./in-docker.sh] cook exec kernels:fig` ![](kernels/kernels.png)
-- `[./in-docker.sh] cook exec padding:fig` ![](padding/padding.png)
-- `[./in-docker.sh] cook exec profile:fig` ![](profile/profile.png)
-- `[./in-docker.sh] cook exec trees:fig` ![](trees/trees.png)
-- `[./in-docker.sh] cook exec tube:fig` ![](tube/tube.png)
+- `[./in-docker.sh] cook exec kernels:run` ![](kernels/kernels.png)
+- `[./in-docker.sh] cook exec padding:run` ![](padding/padding.png)
+- `[./in-docker.sh] cook exec profile:run` ![](profile/profile.png)
+- `[./in-docker.sh] cook exec trees:run` ![](trees/trees.png)
+- `[./in-docker.sh] cook exec tube:run` ![](tube/tube.png)
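The bracketed `[./in-docker.sh]` prefix in the commands above is optional: include it to run inside the container, omit it to use a local environment. As an illustration only, the sketch below assembles these commands in Python. The helpers `cook_command` and `run_tasks` are hypothetical (not part of the repository), and the sketch assumes `cook` is installed and on the `PATH`. Whether `./in-docker.sh` forwards the `FAST` variable into the container is not stated here; the GitHub workflow instead passes it explicitly with `docker run -e FAST=true`.

```python
import os
import subprocess


def cook_command(*tasks: str, in_docker: bool = False) -> list[str]:
    """Build the argv for `[./in-docker.sh] cook exec <tasks>` (hypothetical helper)."""
    if not tasks:
        raise ValueError("at least one task name is required, e.g. 'kernels:run'")
    command = ["cook", "exec", *tasks]
    # The optional prefix runs the same command inside the container.
    return ["./in-docker.sh", *command] if in_docker else command


def run_tasks(*tasks: str, in_docker: bool = False, fast: bool = False) -> None:
    """Run the tasks, optionally setting FAST=true in the child environment."""
    env = dict(os.environ)
    if fast:
        env["FAST"] = "true"
    # check=True raises CalledProcessError if any task fails.
    subprocess.run(cook_command(*tasks, in_docker=in_docker), env=env, check=True)
```

For example, `run_tasks("profile:run", fast=True)` corresponds to `FAST=true cook exec profile:run` in a local environment.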
diff --git a/getting_started/.gitignore b/getting_started/.gitignore
@@ -0,0 +1 @@
+getting_started
diff --git a/getting_started/getting_started.Rmd b/getting_started/getting_started.Rmd
@@ -0,0 +1,32 @@
+Install the packages. We explicitly specify the repositories here so we are not prompted to choose a mirror when
+rendering the Rmarkdown file.
+
+```{r}
+install.packages(
+  "cmdstanr",
+  repos = c("https://mc-stan.org/r-packages/", "http://cran.us.r-project.org")
+)
+install.packages("gptoolsStan", repos=c("http://cran.us.r-project.org"))
+```
+
+Compile and run the model.
+
+```{r}
+library(cmdstanr)
+library(gptoolsStan)
+
+model <- cmdstan_model(
+  stan_file="getting_started.stan",
+  include_paths=gptools_include_path()
+)
+fit <- model$sample(
+  data=list(n=100, sigma=1, length_scale=0.1, period=1),
+  chains=1,
+  iter_warmup=500,
+  iter_sampling=50
+)
+f <- fit$draws("f")
+dim(f)
+```
+
+Expected output: `[1]  50   1 100`
diff --git a/getting_started/getting_started.ipynb b/getting_started/getting_started.ipynb
@@ -0,0 +1,44 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d2b9dac2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    ">>> import cmdstanpy\n",
+    ">>> from gptools.stan import get_include\n",
+    ">>>\n",
+    ">>> model = cmdstanpy.CmdStanModel(\n",
+    "...     stan_file=\"getting_started.stan\",\n",
+    "...     stanc_options={\"include-paths\": get_include()},\n",
+    "... )\n",
+    ">>> fit = model.sample(\n",
+    "...     data={\"n\": 100, \"sigma\": 1, \"length_scale\": 0.1, \"period\": 1},\n",
+    "...     chains=1,\n",
+    "...     iter_warmup=500,\n",
+    "...     iter_sampling=50,\n",
+    "... )\n",
+    ">>> fit.f.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "34ffcd17",
+   "metadata": {},
+   "source": [
+    "Expected output: `(50, 100)`"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/getting_started/getting_started.md b/getting_started/getting_started.md
@@ -0,0 +1,31 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.15.1
+kernelspec:
+  display_name: Python 3 (ipykernel)
+  language: python
+  name: python3
+---
+
+```{code-cell} ipython3
+>>> import cmdstanpy
+>>> from gptools.stan import get_include
+>>>
+>>> model = cmdstanpy.CmdStanModel(
+...     stan_file="getting_started.stan",
+...     stanc_options={"include-paths": get_include()},
+... )
+>>> fit = model.sample(
+...     data={"n": 100, "sigma": 1, "length_scale": 0.1, "period": 1},
+...     chains=1,
+...     iter_warmup=500,
+...     iter_sampling=50,
+... )
+>>> fit.f.shape
+```
+
+Expected output: `(50, 100)`
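The Python and R expected outputs differ only in how the draws are arranged. In R, `fit$draws("f")` returns an array indexed by iteration, chain, and variable (`50 1 100` here), whereas `fit.f` in cmdstanpy returns one row per draw (`(50, 100)`). The NumPy sketch below converts the former layout into the latter, using random numbers as a stand-in for real draws. `stack_chains` is a hypothetical helper, and stacking chains one after another is our assumption about cmdstanpy's ordering; with a single chain, as in this example, the ordering does not matter.

```python
import numpy as np


def stack_chains(draws: np.ndarray) -> np.ndarray:
    """Convert (iterations, chains, variables) to (iterations * chains, variables).

    Chains are placed one after another: all draws of chain 0, then chain 1, ...
    """
    n_iter, n_chains, n_vars = draws.shape
    # Move the chain axis to the front, then merge it with the iteration axis.
    return draws.transpose(1, 0, 2).reshape(n_iter * n_chains, n_vars)


rng = np.random.default_rng(0)
# Random stand-in for the R draws array of `f`: 50 iterations, 1 chain, 100 values.
draws_r = rng.normal(size=(50, 1, 100))
print(draws_r.shape)                # (50, 1, 100)
print(stack_chains(draws_r).shape)  # (50, 100)
```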
diff --git a/getting_started/getting_started.stan b/getting_started/getting_started.stan
@@ -0,0 +1,22 @@
+functions {
+    #include gptools/util.stan
+    #include gptools/fft.stan
+}
+
+data {
+    int n;
+    real<lower=0> sigma, length_scale, period;
+}
+
+transformed data {
+    vector [n %/% 2 + 1] cov_rfft =
+        gp_periodic_exp_quad_cov_rfft(n, sigma, length_scale, period) + 1e-9;
+}
+
+parameters {
+    vector [n] f;
+}
+
+model {
+    f ~ gp_rfft(zeros_vector(n), cov_rfft);
+}
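The model relies on the fact that a stationary covariance on a periodic grid is circulant. Its eigenvalues are therefore the Fourier transform of its first row, and for real signals only `n // 2 + 1` of them are distinct, which matches the declared size `vector [n %/% 2 + 1] cov_rfft`. The NumPy sketch below illustrates that idea; it is not gptools' implementation of `gp_periodic_exp_quad_cov_rfft`, and it does not attempt to reproduce that function's exact normalization. It assumes a squared exponential kernel made periodic by summing over shifted copies (`n_images` is a hypothetical truncation parameter), adds the same `1e-9` jitter as the model, and draws a sample by scaling the FFT of white noise.

```python
import numpy as np


def periodic_sqexp_row(n, sigma, length_scale, period, n_images=5):
    """First row of the covariance of a periodized squared exponential kernel.

    Summing the kernel over shifted copies (images) keeps it positive definite;
    n_images is a truncation choice for this sketch, not a gptools parameter.
    """
    x = np.arange(n) * period / n  # n equally spaced points on [0, period)
    shifts = np.arange(-n_images, n_images + 1) * period
    d = x[:, None] + shifts[None, :]
    return sigma ** 2 * np.exp(-0.5 * (d / length_scale) ** 2).sum(axis=1)


def cov_rfft(n, sigma, length_scale, period, jitter=1e-9):
    """Eigenvalues of the circulant covariance; n // 2 + 1 unique values."""
    row = periodic_sqexp_row(n, sigma, length_scale, period)
    # The row is symmetric (row[k] == row[n - k]), so its real FFT is real.
    return np.fft.rfft(row).real + jitter


def sample_gp(cov_rfft_values, n, rng):
    """Draw f ~ N(0, C) by applying C^(1/2) to white noise in Fourier space."""
    z = rng.standard_normal(n)
    return np.fft.irfft(np.sqrt(cov_rfft_values) * np.fft.rfft(z), n=n)


rng = np.random.default_rng(0)
lam = cov_rfft(100, sigma=1.0, length_scale=0.1, period=1.0)
f = sample_gp(lam, 100, rng)
print(lam.shape, f.shape)  # (51,) (100,)
```

The jitter plays the same role as `+ 1e-9` in the Stan model: high-frequency eigenvalues of a smooth kernel are numerically close to zero, and the small offset keeps them strictly positive.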
diff --git a/recipe.py b/recipe.py
@@ -91,28 +91,49 @@ def create_profile_task(
     create_profile_task("sample", "fourier_non_centered", 0, 10_000, timeout=300)
 
 
-# Run the notebooks to generate figures.
+# Run the notebooks to generate figures (the booleans indicate whether a figure should be generated).
 figures = []
-for example in ["kernels", "linear", "padding", "profile", "trees", "tube"]:
+examples = {
+    "getting_started": False,
+    "kernels": True,
+    "linear": False,
+    "padding": True,
+    "profile": True,
+    "trees": True,
+    "tube": True,
+}
+for example, has_figure in examples.items():
     ipynb = Path(example, f"{example}.ipynb")
     md = ipynb.with_suffix(".md")
     create_task(f"{example}:nb", dependencies=[md], targets=[ipynb],
                 action=f"jupytext --to notebook {md}")
     targets = [ipynb.with_suffix(".html")]
-    if example != "linear":
-        targets.append(ipynb.with_suffix(".png"))
+    if has_figure:
+        figure = ipynb.with_suffix(".png")
+        targets.append(figure)
+        figures.append(figure)
     task = create_task(
-        f"{example}:fig", dependencies=[ipynb], targets=targets,
+        f"{example}:run", dependencies=[ipynb], targets=targets,
         action=f"jupyter nbconvert --to=html --execute --ExecutePreprocessor.timeout=-1 {ipynb}"
     )
     if example == "profile":
         task.task_dependencies.append(profile_group.task)
-    figures.append(targets[0])
 
 
 # Task that reproduces all outputs.
 create_task("figures", dependencies=figures)
 
+# Add the R example for getting started.
+rmd = "getting_started/getting_started.Rmd"
+html = Path("getting_started/getting_started_R.html")
+action = [
+    "Rscript",
+    "-e",
+    f"rmarkdown::render('{rmd}', output_file = '{html.name}', output_dir = 'getting_started')"
+]
+create_task(name="getting_started_R:run", dependencies=[rmd], targets=[html],
+            action=action)
+
 
 def delete_compiled_stan_files(_: Task) -> None:
     # Find all Stan files and remove compiled versions if they exist.
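With the `examples` mapping, every `:run` task targets the HTML report, and only examples flagged `True` add a PNG to both their own targets and the `figures` list. Previously, `figures` collected each example's HTML report instead. The sketch below reproduces only that bookkeeping in plain Python, without `cook`, so the resulting targets can be inspected. `plan_targets` is a hypothetical name; the loop body mirrors the diff above.

```python
from pathlib import Path

# Same mapping as in recipe.py: does the example produce a figure?
examples = {
    "getting_started": False,
    "kernels": True,
    "linear": False,
    "padding": True,
    "profile": True,
    "trees": True,
    "tube": True,
}


def plan_targets(examples):
    """Return ({task name: targets}, figures), mirroring the loop in recipe.py."""
    plan, figures = {}, []
    for example, has_figure in examples.items():
        ipynb = Path(example, f"{example}.ipynb")
        targets = [ipynb.with_suffix(".html")]  # every notebook renders to HTML
        if has_figure:
            figure = ipynb.with_suffix(".png")
            targets.append(figure)
            figures.append(figure)  # only PNG figures feed the `figures` task
        plan[f"{example}:run"] = targets
    return plan, figures


plan, figures = plan_targets(examples)
print([f.as_posix() for f in figures])
# ['kernels/kernels.png', 'padding/padding.png', 'profile/profile.png', 'trees/trees.png', 'tube/tube.png']
```

As a consequence, `cook exec figures` no longer renders the `getting_started` or `linear` notebooks; they are run through their own `:run` tasks.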

diff --git a/setup.R b/setup.R
@@ -0,0 +1,3 @@
+repos <- c("https://mc-stan.org/r-packages/", "http://cran.us.r-project.org")
+install.packages("cmdstanr", repos=repos)
+devtools::install_github("onnela-lab/gptoolsStan")