Add getting started for Python and R.
tillahoffmann committed Dec 16, 2023
1 parent 9eaf53e commit e6bcdb5
Showing 10 changed files with 190 additions and 15 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/main.yaml
@@ -17,6 +17,10 @@ jobs:
run: docker build -t gptools .
- name: Verify notebooks are up to date.
run: docker run --rm -v `pwd`:/workdir gptools pytest -v
- name: Render the getting started notebook for Python.
run: docker run --rm -v `pwd`:/workdir gptools cook exec getting_started:run
- name: Render the getting started Rmarkdown for R.
run: docker run --rm -v `pwd`:/workdir gptools cook exec getting_started_R:run
- name: Generate figures (using ./in-docker.sh but we can't use `-it` in the Action).
run: docker run --rm -e FAST=true -v `pwd`:/workdir gptools cook exec figures
- name: Upload figures and reports as artifacts.
10 changes: 9 additions & 1 deletion Dockerfile
@@ -1,6 +1,14 @@
FROM python:3.10
# Install R and dependencies.
RUN apt-get update && apt-get install -y \
r-base \
r-cran-devtools \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /workdir
COPY setup.R .
RUN Rscript setup.R

# Install Python dependencies and compile cmdstan.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN python -m cmdstanpy.install_cmdstan --verbose --version 2.33.0

25 changes: 17 additions & 8 deletions README.md
@@ -31,28 +31,37 @@ If you switch between containerized and local runtime, you may need to remove co

To ensure the reproducibility of these materials, the results are also computed as the output of a GitHub Action workflow [![Reproduction Materials](https://github.com/onnela-lab/gptools-reproduction-material/actions/workflows/main.yaml/badge.svg)](https://github.com/onnela-lab/gptools-reproduction-material/actions/workflows/main.yaml) with the `FAST=true` flag. Figures can be obtained by selecting a workflow run and downloading the `figures-reports` artifact.

## Getting started

The `getting_started` folder contains a Python notebook and Rmarkdown file to reproduce the results of the "Getting started" sections in the accompanying manuscript. HTML reports can be generated in the `getting_started` folder by running

- `[./in-docker.sh] cook exec getting_started:run` for Python
- `[./in-docker.sh] cook exec getting_started_R:run` for R

However, the folder is likely most suitable for interactive exploration and experimentation to get familiar with the package.

## Running the experiments

Figures in the manuscript were generated using the containerized runtime, and all runtime estimates below are based on a 2020 MacBook Pro with an M1 chip and 16 GB of memory running macOS 13.4 (22F66). All figures can be reproduced by running `[./in-docker.sh] cook exec figures` (see below for details). Figures are generated by executing Jupyter notebooks stored in markdown format; the notebooks can be opened directly in a standard Jupyter environment using the [`jupytext`](https://jupytext.readthedocs.io/en/latest/) extension. If you prefer, the folder for each experiment also contains a corresponding `*.ipynb` file. To open and use the notebooks, please set up a local computing environment as described above.

### Applications

The folders `trees` and `tube` contain code and data to reproduce the two applied examples in the manuscript. Each example takes about five minutes to run. The figures can be reproduced by running `[./in-docker.sh] cook exec tube:fig trees:fig` and will be saved in the corresponding folder as png and pdf files.
The folders `trees` and `tube` contain code and data to reproduce the two applied examples in the manuscript. Each example takes about five minutes to run. The figures can be reproduced by running `[./in-docker.sh] cook exec tube:run trees:run` and will be saved in the corresponding folder as png and pdf files.

### Profiling experiments

The folder `profile` contains code to reproduce profiling experiments, and running all experiments can take up to ten hours. The profiling figure can be reproduced by running `[./in-docker.sh] cook exec profile:fig`. If a reduced runtime (at the cost of noisier results) is desired, run `FAST=true cook exec profile:fig`, which takes about 90 minutes.
The folder `profile` contains code to reproduce profiling experiments, and running all experiments can take up to ten hours. The profiling figure can be reproduced by running `[./in-docker.sh] cook exec profile:run`. If a reduced runtime (at the cost of noisier results) is desired, run `FAST=true cook exec profile:run`, which takes about 90 minutes.

All experiments are seeded for reproducibility, but profiling experiments are subject to variability due to different hardware and competing processes running on the same machine. Despite seeding, results may also vary depending on the operating system and stdlib implementation.

### Kernel properties and effect of padding

The folders `kernels` and `padding` contain code to reproduce figures on, respectively, the properties of different kernels (including their spectral properties) and the effect of padding on Fourier methods. The figures can be reproduced by running `[./in-docker.sh] cook exec kernels:fig padding:fig`; the latter takes about ten minutes to generate.
The folders `kernels` and `padding` contain code to reproduce figures on, respectively, the properties of different kernels (including their spectral properties) and the effect of padding on Fourier methods. The figures can be reproduced by running `[./in-docker.sh] cook exec kernels:run padding:run`; the latter takes about ten minutes to generate.

## Expected results

- `[./in-docker.sh] cook exec kernels:fig` ![](kernels/kernels.png)
- `[./in-docker.sh] cook exec padding:fig` ![](padding/padding.png)
- `[./in-docker.sh] cook exec profile:fig` ![](profile/profile.png)
- `[./in-docker.sh] cook exec trees:fig` ![](trees/trees.png)
- `[./in-docker.sh] cook exec tube:fig` ![](tube/tube.png)
- `[./in-docker.sh] cook exec kernels:run` ![](kernels/kernels.png)
- `[./in-docker.sh] cook exec padding:run` ![](padding/padding.png)
- `[./in-docker.sh] cook exec profile:run` ![](profile/profile.png)
- `[./in-docker.sh] cook exec trees:run` ![](trees/trees.png)
- `[./in-docker.sh] cook exec tube:run` ![](tube/tube.png)
1 change: 1 addition & 0 deletions getting_started/.gitignore
@@ -0,0 +1 @@
getting_started
32 changes: 32 additions & 0 deletions getting_started/getting_started.Rmd
@@ -0,0 +1,32 @@
Install the packages. We explicitly specify the repos here so we don't get asked for the mirror when
rendering the Rmarkdown.

```{r}
install.packages(
  "cmdstanr",
  repos = c("https://mc-stan.org/r-packages/", "http://cran.us.r-project.org")
)
install.packages("gptoolsStan", repos=c("http://cran.us.r-project.org"))
```

Compile and run the model.

```{r}
library(cmdstanr)
library(gptoolsStan)
model <- cmdstan_model(
  stan_file="getting_started.stan",
  include_paths=gptools_include_path()
)
fit <- model$sample(
  data=list(n=100, sigma=1, length_scale=0.1, period=1),
  chains=1,
  iter_warmup=500,
  iter_sampling=50
)
f <- fit$draws("f")
dim(f)
```

Expected output: `[1] 50 1 100`
44 changes: 44 additions & 0 deletions getting_started/getting_started.ipynb
@@ -0,0 +1,44 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "d2b9dac2",
"metadata": {},
"outputs": [],
"source": [
">>> import cmdstanpy\n",
">>> from gptools.stan import get_include\n",
">>>\n",
">>> model = cmdstanpy.CmdStanModel(\n",
"... stan_file=\"getting_started.stan\",\n",
"... stanc_options={\"include-paths\": get_include()},\n",
"... )\n",
">>> fit = model.sample(\n",
"... data = {\"n\": 100, \"sigma\": 1, \"length_scale\": 0.1, \"period\": 1},\n",
"... chains=1,\n",
"... iter_warmup=500,\n",
"... iter_sampling=50,\n",
"... )\n",
">>> fit.f.shape"
]
},
{
"cell_type": "markdown",
"id": "34ffcd17",
"metadata": {},
"source": [
"Expected output: `(50, 100)`"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
31 changes: 31 additions & 0 deletions getting_started/getting_started.md
@@ -0,0 +1,31 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.15.1
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

```{code-cell} ipython3
>>> import cmdstanpy
>>> from gptools.stan import get_include
>>>
>>> model = cmdstanpy.CmdStanModel(
... stan_file="getting_started.stan",
... stanc_options={"include-paths": get_include()},
... )
>>> fit = model.sample(
... data = {"n": 100, "sigma": 1, "length_scale": 0.1, "period": 1},
... chains=1,
... iter_warmup=500,
... iter_sampling=50,
... )
>>> fit.f.shape
```

Expected output: `(50, 100)`
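The R and Python versions report different shapes for the same 50 draws of the 100-dimensional `f`: the R version prints `[1] 50 1 100` and the Python version `(50, 100)`. As far as we understand, cmdstanr's `draws()` returns an array indexed by (iteration, chain, variable), whereas cmdstanpy's attribute access (`fit.f`, equivalent to `stan_variable("f")`) stacks the chains along the first axis. The NumPy sketch below illustrates how the two layouts relate under that assumption; it does not call either library, and the helper `stack_chains` is hypothetical.

```python
import numpy as np


def stack_chains(draws: np.ndarray) -> np.ndarray:
    """Hypothetical helper: convert an (iterations, chains, ...) array into a
    (chains * iterations, ...) array with all draws of the first chain first.

    Assumes chains are concatenated one after another, as we believe cmdstanpy does.
    """
    n_iter, n_chains = draws.shape[:2]
    # Move the chain axis to the front so that reshaping keeps each chain contiguous.
    return np.swapaxes(draws, 0, 1).reshape((n_chains * n_iter,) + draws.shape[2:])


# Same dimensions as the getting-started example: 50 sampling iterations,
# one chain, and n = 100 elements of f.
rng = np.random.default_rng(0)
r_layout = rng.normal(size=(50, 1, 100))
py_layout = stack_chains(r_layout)
print(r_layout.shape, py_layout.shape)  # (50, 1, 100) (50, 100)
```

With a single chain the conversion simply drops the chain axis; with several chains the order of the stacked draws matters when computing per-chain diagnostics.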
22 changes: 22 additions & 0 deletions getting_started/getting_started.stan
@@ -0,0 +1,22 @@
functions {
#include gptools/util.stan
#include gptools/fft.stan
}

data {
int n;
real<lower=0> sigma, length_scale, period;
}

transformed data {
vector [n %/% 2 + 1] cov_rfft =
gp_periodic_exp_quad_cov_rfft(n, sigma, length_scale, period) + 1e-9;
}

parameters {
vector [n] f;
}

model {
f ~ gp_rfft(zeros_vector(n), cov_rfft);
}
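As we read it, the model above places a Gaussian process prior on `f` with a periodic squared-exponential kernel, specified through the real FFT of the first row of its circulant covariance matrix, which has `n %/% 2 + 1` entries. The NumPy sketch below is not the gptools implementation. It assumes the kernel is the squared-exponential kernel summed over periodic images and checks that the real FFT of that kernel on the grid matches the closed form `n * sigma**2 * length_scale / period * sqrt(2 * pi) * exp(-2 * (pi * k * length_scale / period)**2)` for frequency index `k`, which follows from the Poisson summation formula when aliasing terms are negligible (they are for these settings).

```python
import numpy as np


def cov_rfft_closed_form(n, sigma, length_scale, period):
    """Fourier coefficients of a periodic squared-exponential kernel on n grid
    points, via Poisson summation (aliasing terms neglected)."""
    k = np.arange(n // 2 + 1)  # frequency indices, matching `n %/% 2 + 1` in Stan
    return (n * sigma ** 2 * length_scale / period * np.sqrt(2 * np.pi)
            * np.exp(-2 * (np.pi * k * length_scale / period) ** 2))


def cov_rfft_numeric(n, sigma, length_scale, period, n_images=3):
    """Real FFT of the first row of the circulant covariance matrix, where the
    kernel is the squared-exponential kernel summed over periodic images."""
    x = np.arange(n) * period / n  # equally spaced grid on [0, period)
    shifts = np.arange(-n_images, n_images + 1)[:, None] * period
    first_row = sigma ** 2 * np.exp(-(x - shifts) ** 2 / (2 * length_scale ** 2))
    # The summed first row is symmetric, so its FFT is real up to rounding error.
    return np.fft.rfft(first_row.sum(axis=0)).real


# Same settings as the data passed to the model in the getting-started files.
n, sigma, length_scale, period = 100, 1.0, 0.1, 1.0
closed = cov_rfft_closed_form(n, sigma, length_scale, period)
numeric = cov_rfft_numeric(n, sigma, length_scale, period)
print(closed.shape)  # (51,)
print(np.allclose(closed, numeric, rtol=0, atol=1e-8))  # True
```

The highest-frequency coefficients are extremely small here (the last one is below 1e-200), which is presumably why the model adds a small constant of 1e-9 to `cov_rfft`.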
33 changes: 27 additions & 6 deletions recipe.py
@@ -91,28 +91,49 @@ def create_profile_task(
create_profile_task("sample", "fourier_non_centered", 0, 10_000, timeout=300)


# Run the notebooks to generate figures.
# Run the notebooks to generate figures (the booleans indicate whether a figure should be generated).
figures = []
for example in ["kernels", "linear", "padding", "profile", "trees", "tube"]:
examples = {
    "getting_started": False,
    "kernels": True,
    "linear": False,
    "padding": True,
    "profile": True,
    "trees": True,
    "tube": True,
}
for example, has_figure in examples.items():
    ipynb = Path(example, f"{example}.ipynb")
    md = ipynb.with_suffix(".md")
    create_task(f"{example}:nb", dependencies=[md], targets=[ipynb],
                action=f"jupytext --to notebook {md}")
    targets = [ipynb.with_suffix(".html")]
    if example != "linear":
        targets.append(ipynb.with_suffix(".png"))
    if has_figure:
        figure = ipynb.with_suffix(".png")
        targets.append(figure)
        figures.append(figure)
    task = create_task(
        f"{example}:fig", dependencies=[ipynb], targets=targets,
        f"{example}:run", dependencies=[ipynb], targets=targets,
        action=f"jupyter nbconvert --to=html --execute --ExecutePreprocessor.timeout=-1 {ipynb}"
    )
    if example == "profile":
        task.task_dependencies.append(profile_group.task)
    figures.append(targets[0])


# Task that reproduces all outputs.
create_task("figures", dependencies=figures)

# Add the R example for getting started.
rmd = "getting_started/getting_started.Rmd"
html = Path("getting_started/getting_started_R.html")
action = [
    "Rscript",
    "-e",
    f"rmarkdown::render('{rmd}', output_file = '{html.name}', output_dir='getting_started')"
]
create_task(name="getting_started_R:run", dependencies=[rmd], targets=[html],
            action=action)


def delete_compiled_stan_files(_: Task) -> None:
    # Find all Stan files and remove compiled versions if they exist.
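The updated loop in `recipe.py` uses the `examples` mapping to decide whether each notebook also produces a PNG figure, and only those PNGs are collected for the `figures` task. The standalone sketch below reproduces just that target bookkeeping with `pathlib`, without `cook`; the helper name `notebook_targets` is hypothetical and not part of the recipe.

```python
from pathlib import Path

# Copied from recipe.py: True means executing the notebook also writes a PNG figure.
examples = {
    "getting_started": False,
    "kernels": True,
    "linear": False,
    "padding": True,
    "profile": True,
    "trees": True,
    "tube": True,
}


def notebook_targets(example: str, has_figure: bool) -> list[Path]:
    """Hypothetical helper mirroring the targets of the `<example>:run` task."""
    ipynb = Path(example, f"{example}.ipynb")
    targets = [ipynb.with_suffix(".html")]  # nbconvert writes an HTML report
    if has_figure:
        targets.append(ipynb.with_suffix(".png"))
    return targets


# Collect only the PNG targets, mirroring what the loop appends to `figures`.
figures = [
    target
    for example, has_figure in examples.items()
    for target in notebook_targets(example, has_figure)
    if target.suffix == ".png"
]
for figure in figures:
    print(figure.as_posix())
```

This lists the five PNG files shown under "Expected results" in the README; `getting_started` and `linear` contribute only HTML reports.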
3 changes: 3 additions & 0 deletions setup.R
@@ -0,0 +1,3 @@
repos <- c("https://mc-stan.org/r-packages/", "http://cran.us.r-project.org")
install.packages("cmdstanr", repos=repos)
devtools::install_github("onnela-lab/gptoolsStan")
