Draft: R notebook #230

Open
wants to merge 3 commits into
base: r-notebook
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
136 changes: 66 additions & 70 deletions examples/r_notebook/twomoons_r.Rmd
date: "2024-09-24"
output: html_document
---

## Reticulate Library

Using Python packages in R is relatively simple. The [reticulate](https://rstudio.github.io/reticulate/) package allows users to call Python packages directly from R; more detailed instructions on how it is used can be found at the GitHub link above. Before starting, BayesFlow needs to be installed following the installation instructions, and there should be a virtual environment within which BayesFlow is installed.
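If reticulate is not yet available, a minimal one-time setup might look like the following sketch (kept as `eval=FALSE`; run it manually once):

```{r eval=FALSE}
# one-time setup: install and load reticulate
install.packages("reticulate")
library(reticulate)
```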

Next, point reticulate to the correct virtual environment where BayesFlow is stored:

```{r}
# use_condaenv("/path_to_environment/environment_name")

# use_condaenv("/Users/marvin/miniforge3/envs/bayesflow_r")
use_condaenv("/Users/eodole/miniforge3/envs/bfv3")
```




Then import the bayesflow and keras modules into the R session and give them aliases:

```{r}
bf <- import("bayesflow")
keras <- import("keras")
```


Then we can set a local seed for reproducibility, although the results will not necessarily be the same as in Python.

```{r}
set.seed(2023)
```

One of the many quirks of using bayesflow with R is type matching. Many errors arise from variable types being passed from R to Python in an unexpected format. To try and avoid these errors it's important to familiarize yourself with the Python equivalents of R variable types.
Below is a handy conversion guide from Posit:

<!-- ![Link to posit](/Users/eodole/Desktop/HiWi Stelle/Software/R/rcheatsheet.png) -->

[Posit Quick Guide](https://raw.githubusercontent.com/rstudio/cheatsheets/main/reticulate.pdf)
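As a quick illustration (a minimal sketch, assuming reticulate is already loaded), here is how a few common R values are converted when passed to Python:

```{r eval=FALSE}
r_to_py(1L)              # R integer (length 1)  -> Python int
r_to_py(1)               # R double (length 1)   -> Python float
r_to_py(c(1, 2, 3))      # R numeric vector      -> Python list of floats
r_to_py(matrix(1:4, 2))  # R matrix              -> NumPy array
np_array(c(1, 2, 3), dtype = "float32")  # explicit NumPy array with a chosen dtype
```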

## Libraries
For your convenience, all additional R libraries are loaded here and then mentioned again where they are used during the tutorial.

```{r}
# additional R libraries used in this tutorial are loaded here
```
*Note: the Keras backend still needs to be set to TensorFlow.*
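One way to do this (an untested sketch; the exact setup may differ) is to set the `KERAS_BACKEND` environment variable before `keras` and `bayesflow` are imported:

```{r eval=FALSE}
# must run before import("keras") / import("bayesflow")
Sys.setenv(KERAS_BACKEND = "tensorflow")
```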


## Simulator

As in the Python example, this example will demonstrate amortized estimation of a Bayesian model whose posterior, evaluated at the origin $x = (0,0)$ of the "data", will resemble two crescent moons. The forward process is a noisy non-linear transformation on a 2D plane:

$$ x_1 = -|\theta_1 + \theta_2| / \sqrt{2} + r \cos(\alpha) + 0.25 $$
$$ x_2 = (-\theta_1 + \theta_2) / \sqrt{2} + r \sin(\alpha) $$
with $x=(x_1, x_2)$ playing the role of "observables" (data to be learned from), $\alpha \sim Uniform(-\pi/2, \pi/2)$ and $r \sim N(0.1, 0.01)$ being latent variables creating noise in the data, and $\theta = (\theta_1, \theta_2)$ being the parameters that we will later seek to infer from new $x$.

We set the priors over these parameters to

$$ \theta_1, \theta_2 \sim Uniform(-1, 1) $$


```{r}
theta_prior <- function(...) {
  # theta_1, theta_2 ~ Uniform(-1, 1)
  runif(2, min = -1, max = 1)
}

alpha_prior <- function(...) {
  # alpha ~ Uniform(-pi/2, pi/2)
  runif(1, min = -pi / 2, max = pi / 2)
}

r_prior <- function(...) {
  # r ~ Normal(0.1, 0.01)
  rnorm(1, mean = 0.1, sd = 0.01)
}
```

The following two chunks document earlier versions of the forward model that ran into type-conversion problems when passing R output to Python; they are kept for reference but are not evaluated.

```{r eval=FALSE}
# Original
forward_model <- function(...) {
  theta <- theta_prior()
  alpha <- alpha_prior()
  r <- r_prior()
  x_1 <- -abs(theta[1] + theta[2]) / sqrt(2) + r * cos(alpha) + 0.25
  x_2 <- (theta[2] - theta[1]) / sqrt(2) + r * sin(alpha)
  # return(list(x1 = x_1, x2 = x_2)) # working
  # in order for a variable to have more than one dimension we need to return it as a numpy array
  return(list(x = np_array(c(x_1, x_2)), theta = np_array(c(theta[1], theta[2])), a = alpha, r = r))
}
```

```{r eval=FALSE}
# Test 1
forward_model <- function(...) {
  # Need to explicitly make all output into floats
  theta <- as.float(theta_prior())
  alpha <- as.float(alpha_prior())
  r <- as.float(r_prior())
  x_1 <- as.float(-abs(theta[1] + theta[2]) / sqrt(2) + r * cos(alpha) + 0.25)
  x_2 <- as.float((theta[2] - theta[1]) / sqrt(2) + r * sin(alpha))
  # return(list(x1 = x_1, x2 = x_2)) # working
  # in order for a variable to have more than one dimension we need to return it as a numpy array
  return(list(x = np_array(c(x_1, x_2)), theta = np_array(c(theta[1], theta[2])), a = alpha, r = r))
}
```
This model is typically used for benchmarking simulation-based inference (SBI) methods (see https://arxiv.org/pdf/2101.04653), and any method for amortized Bayesian inference should be capable of recovering the two moons posterior without using a gazillion simulations. Note that this is a considerably harder task than modeling the common unconditional two moons data set often used in the context of normalizing flows.

BayesFlow offers many ways to define your data generating process, but here the sampling function is defined as a single `forward_model`, which allows all the R-related functions to stay in R without causing problems with type conversions to Python. Additionally, all the functions use the `...` argument, which allows additional arguments to be passed to them. This was done to avoid unused-argument errors: any unused arguments passed from Python are simply ignored.
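A minimal illustration of the `...` trick (a toy function, not part of the model):

```{r}
f <- function(...) {
  42
}
f(unused_argument = "ignored") # no "unused argument" error, thanks to ...
```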

```{r}
# Test 2
forward_model <- function(...) {
  theta <- theta_prior()
  alpha <- alpha_prior()
  r <- r_prior()
  x_1 <- -abs(theta[1] + theta[2]) / sqrt(2) + r * cos(alpha) + 0.25
  x_2 <- (theta[2] - theta[1]) / sqrt(2) + r * sin(alpha)
  # in order for a variable to have more than one dimension we need to return it as a numpy array
  # we also need to explicitly set each output to "float32", since R uses "float64" by default and TensorFlow expects "float32"
  return(list(x = np_array(c(x_1, x_2), dtype = "float32"),
              theta = np_array(c(theta[1], theta[2]), dtype = "float32"),
              a = np_array(alpha, dtype = "float32"),
              r = np_array(r, dtype = "float32")))
}
```

The `forward_model` should then return all variables of interest; here we've chosen to return all of them, although not all are used later.

Next, define our simulator using the previously defined sampling function and generate some data.

```{r}
simulator <- bf$simulators$LambdaSimulator(sample_fn = r_to_py(forward_model))
sample_data <- simulator$sample(r_to_py(c(4L))) # notice here we need to specify the batch size as an integer
```

Let's take a quick look at the simulated data:
```{r}
sample_data
```


## Adapter

The next step is to tell BayesFlow how to deal with all the simulated variables. You may also think of this as informing BayesFlow about the data flow, i.e., which variables go into which network and what transformations need to be performed prior to passing the simulator outputs into the networks. This is done via an adapter layer, which is implemented as a sequence of fixed, pseudo-invertible data transforms.

This adapter can be built either manually, as shown in the Python tutorial, or automatically, as shown here using the `build_adapter` method (a rough manual sketch is included further below for comparison).

For this example, we want to learn the posterior distribution $p(\theta|x)$ so we *infer* $\theta$, *conditioning* on $x$.


```{r error=TRUE}
adapter <- bf$ContinuousApproximator$build_adapter(
  inference_variables = list("theta"), # notice the keys need to be passed as strings inside a list
  inference_conditions = list("x")
)
```

This step can cause problems: if you get an error saying that the inference variables were not found, check that the variable names are passed as strings inside a list and that they match the names returned by `forward_model`.
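For comparison, a manually built adapter might look roughly like the sketch below; this assumes the `bf$Adapter` methods (`convert_dtype`, `concatenate`) mirror the Python tutorial and has not been tested from R:

```{r eval=FALSE}
adapter <- bf$Adapter()$
  convert_dtype("float64", "float32")$
  concatenate(list("theta"), into = "inference_variables")$
  concatenate(list("x"), into = "inference_conditions")
```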

## Dataset

For this example, we will sample our training data ahead of time and use offline training with a `bf$datasets$OfflineDataset`.

This makes the training process faster, since we avoid repeated sampling. If you want to use online training, you can use an `OnlineDataset` analogously, or just pass your simulator directly to `approximator$fit()`!
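For reference, an online-training dataset might be created roughly like this (the argument names are assumptions and untested from R):

```{r eval=FALSE}
online_dataset <- bf$datasets$OnlineDataset(
  simulator,
  batch_size = batch_size,
  num_batches = num_training_batches,
  adapter = adapter
)
```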

```{r}
# works
num_training_batches <- 512L
num_validation_batches <- 512L
batch_size <- 4L
epochs <- 30L
total_steps <- num_training_batches * epochs

```

```{r}
# works
training_samples <- simulator$sample(r_to_py(num_training_batches * batch_size))
validation_samples <- simulator$sample(r_to_py(num_validation_batches * batch_size))
```

```{r}
training_dataset <- bf$datasets$OfflineDataset(training_samples, batch_size = batch_size, adapter = adapter)
validation_dataset <- bf$datasets$OfflineDataset(validation_samples, batch_size = batch_size, adapter = adapter)
```


## Training a neural network to approximate all posteriors

The next step is to set up the neural network that will approximate the posterior $p(\theta | x)$.

We choose *Flow Matching* [1, 2] as the backbone architecture for this example, as it can deal well with the multimodal nature of the posteriors that some observables imply.

[1] Lipman, Y., Chen, R. T., Ben-Hamu, H., Nickel, M., & Le, M. Flow Matching for Generative Modeling. In The Eleventh International Conference on Learning Representations.

[2] Wildberger, J. B., Dax, M., Buchholz, S., Green, S. R., Macke, J. H., & Schölkopf, B. Flow Matching for Scalable Simulation-Based Inference. In Thirty-seventh Conference on Neural Information Processing Systems.


```{r}
inference_network <- bf$networks$FlowMatching(
subnet = "mlp",
  subnet_kwargs = list(widths = rep(256L, 6), dropout = 0) # six hidden layers of width 256 each
)
```

This inference network is just a general Flow Matching backbone, not yet adapted to the specific inference task at hand (i.e., posterior approximation). To achieve this adaptation, we combine the network with our adapter, which together form an approximator. In this case, we need a `ContinuousApproximator` since the target we want to approximate is the posterior of the continuous parameter vector $\theta$.


```{r}
approximator <- bf$ContinuousApproximator(
inference_network = inference_network,
  adapter = adapter
)
```

### Optimizer and Learning Rate

We find learning rate schedules, such as cosine decay, work well for a wide variety of approximation tasks.

```{r}
initial_learning_rate <- 1e-4
learning_rate <- keras$optimizers$schedules$CosineDecay(
initial_learning_rate = initial_learning_rate,
decay_steps = total_steps,
alpha = 1e-8
)
optimizer <- keras$optimizers$Adam(learning_rate = learning_rate)
```

We then compile the approximator with the optimizer:

```{r}
approximator$compile(optimizer = optimizer)
```


### Training
We are ready to train our deep posterior approximator on the two moons example. We pass the dataset object to the `fit` method and watch as BayesFlow trains.

```{r}
system.time(
history <- approximator$fit(
epochs = epochs,
dataset = training_dataset,
validation_data = validation_dataset
))
```