From e6b900e5356d00f474aa394089ad5a16b11cac1a Mon Sep 17 00:00:00 2001
From: Aki Vehtari <aki.vehtari@aalto.fi>
Date: Fri, 15 Nov 2024 18:33:55 +0200
Subject: [PATCH 1/2] Add sections on Pathfinder diagnostics and using
 Pathfinder for inits

---
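Note: the new "Using Pathfinder for initializing MCMC" section says that
resampling without replacement can be done outside of Stan. Below is a minimal
sketch of one way that step might look. It is an illustration, not Stan's
implementation. The function name `resample_without_replacement` and its
arguments are hypothetical. It assumes resampling has been turned off in
Pathfinder and that the per-draw log densities of the posterior and of the
approximation are available. It works on raw log ratios without Pareto
smoothing, and it uses the Gumbel-top-k trick as one way to draw distinct
indices with probability proportional to the ratios.

```python
import math
import random


def resample_without_replacement(lp_target, lp_approx, num_chains, seed=0):
    """Pick `num_chains` distinct draw indices, favouring large density ratios.

    lp_target[i]: log density of the (unnormalized) posterior at draw i.
    lp_approx[i]: log density of the Pathfinder approximation at draw i.
    Both inputs and the function name are assumptions for this sketch.
    """
    if len(lp_target) != len(lp_approx):
        raise ValueError("log-density sequences must have equal length")
    if num_chains > len(lp_target):
        raise ValueError("cannot select more distinct draws than are available")

    rng = random.Random(seed)
    keys = []
    for i, (lt, la) in enumerate(zip(lp_target, lp_approx)):
        log_ratio = lt - la
        # Gumbel(0, 1) noise equals -log(E) with E ~ Exponential(1);
        # the clamp guards against log(0) in the (very rare) case E == 0.
        e = max(rng.expovariate(1.0), 1e-300)
        keys.append((log_ratio - math.log(e), i))

    # Keeping the top-k perturbed keys is equivalent to sequentially sampling
    # without replacement with probability proportional to the ratios.
    keys.sort(reverse=True)
    return [i for _, i in keys[:num_chains]]
```

With four chains, `resample_without_replacement(lp, lp_approx, 4)` returns four
distinct indices into the Pathfinder draws, and the corresponding draws can
then be used as per-chain initial values. Only the ordering of the perturbed
log ratios matters, so the ratios never need to be exponentiated or
normalized, which avoids overflow when one ratio dominates.
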
 src/reference-manual/pathfinder.qmd | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/src/reference-manual/pathfinder.qmd b/src/reference-manual/pathfinder.qmd
index 39a07b6b4..1cc7f4533 100644
--- a/src/reference-manual/pathfinder.qmd
+++ b/src/reference-manual/pathfinder.qmd
@@ -25,3 +25,29 @@ evaluations, with greater reductions for more challenging posteriors.
 While the evaluations in @zhang_pathfinder:2022 found that
 single-path and multi-path Pathfinder outperform ADVI for most of the models in the PosteriorDB evaluation set,
 we recognize the need for further experiments on a wider range of models.
+
+## Diagnosing Pathfinder
+
+Pathfinder diagnoses the accuracy of the approximation by computing the density ratio of the true posterior and 
+the approximation and using Pareto-$\hat{k}$ diagnostic (Vehtari et al., 2024) to assess whether these ratios can
+be used to improve the approximation via resmapling. /, the
+normalization for the posterior can be  estimated reliably (Section 3, Vehtari et al., 2024), which is the
+first requirement for reliable resampling.  If estimated Pareto-$\hat{k}$ for the ratios is smaller than 0.7,
+there is still need to further diagnose importance sampling estimates by taking into account also the expetant
+function (Section 2.2, Vehtari et al., 2024). If estimated Pareto-$\hat{k}$ is larger than 0.7, then the 
+estimate for the normalization is unreliable and any Mote Carlo estimate may have a big error. The resampled draws
+can still contain some useful information about the location and shape of the posterior which can be used in early
+parts of Bayesian workflow (Gelman et al, 2020).
+
+## Using Pathfinder for initializing MCMC
+
+If estimated Pareto-$\hat{k}$ for the ratios is smaller than 0.7, the resampled posterior draws are almost as
+good for initializing MCMC as would indepepent draws from the posterior be. If estimated Pareto-$\hat{k}$ for the 
+ratios is larger than 0.7, the Pathfinder draws are not reliable for posterior inference directly, but they are still 
+very likely better for initializing MCMC than random draws from an arbitrary pre-defined distribution (e.g. uniform from 
+-2 to 2 used by Stan by default). If Pareto-$\hat{k}$ is larger than 0.7, it is likely that one of the ratios is much bigger
+than others and the default resampling with replacement would produce copies of one unique draw. For initializing several
+Markov chains, it is better to use resampling without replacement to guarantee unique initialization for each chain. At the
+moment Stan allows turning off the resampling completely, and then the resampling without replacement can be done outside of
+Stan.
+

From 6a2e556ad4fc0683e78193c52c7c84ad5ca9a760 Mon Sep 17 00:00:00 2001
From: Aki Vehtari <Aki.Vehtari@aalto.fi>
Date: Sat, 16 Nov 2024 17:07:40 +0200
Subject: [PATCH 2/2] Fix typos, use citation keys, add bib entries

---
 src/bibtex/all.bib                  | 23 +++++++++++++++++++++++
 src/reference-manual/pathfinder.qmd | 21 ++++++++++-----------
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/src/bibtex/all.bib b/src/bibtex/all.bib
index aa7eb4975..1aae02559 100644
--- a/src/bibtex/all.bib
+++ b/src/bibtex/all.bib
@@ -1845,3 +1845,26 @@ @article{Timonen+etal:2023:ODE-PSIS
   pages = {e614} 
 }
 
+@article{Vehtari+etal:2024:PSIS,
+  author  = {Aki Vehtari and Daniel Simpson and Andrew Gelman and Yuling Yao and Jonah Gabry},
+  title   = {Pareto smoothed importance sampling},
+  journal = {Journal of Machine Learning Research},
+  year    = {2024},
+  volume  = {25},
+  number  = {72},
+  pages   = {1--58}
+}
+
+@article{Gelman:etal:2020:workflow,
+  title={Bayesian workflow},
+  author={Gelman, Andrew and Vehtari, Aki and Simpson, Daniel and Margossian, Charles C and Carpenter, Bob and Yao, Yuling and Kennedy, Lauren and Gabry, Jonah and B{\"u}rkner, Paul-Christian and Modr{\'a}k, Martin},
+  journal={arXiv preprint arXiv:2011.01808},
+  year={2020}
+}
+
+@article{Magnusson+etal:2024:posteriordb,
+  title={posteriordb: Testing, benchmarking and developing {Bayesian} inference algorithms},
+  author={Magnusson, M{\aa}ns and Torgander, Jakob and B{\"u}rkner, Paul-Christian and Zhang, Lu and Carpenter, Bob and Vehtari, Aki},
+  journal={arXiv preprint arXiv:2407.04967},
+  year={2024}
+}
\ No newline at end of file
diff --git a/src/reference-manual/pathfinder.qmd b/src/reference-manual/pathfinder.qmd
index 1cc7f4533..436254623 100644
--- a/src/reference-manual/pathfinder.qmd
+++ b/src/reference-manual/pathfinder.qmd
@@ -4,7 +4,7 @@ pagetitle: Pathfinder
 
 # Pathfinder
 
-Stan supports the Pathfinder algorithm @zhang_pathfinder:2022.
+Stan supports the Pathfinder algorithm [@zhang_pathfinder:2022].
 Pathfinder is a variational method for approximately
 sampling from differentiable log densities.  Starting from a random
 initialization, Pathfinder locates normal approximations to the target
@@ -22,27 +22,26 @@ the problem of L-BFGS getting stuck at local optima or in saddle points on plate
 Compared to ADVI and short dynamic HMC runs, Pathfinder
 requires one to two orders of magnitude fewer log density and gradient
 evaluations, with greater reductions for more challenging posteriors.
-While the evaluations in @zhang_pathfinder:2022 found that
-single-path and multi-path Pathfinder outperform ADVI for most of the models in the PosteriorDB evaluation set,
+While the evaluations by @zhang_pathfinder:2022 found that
+single-path and multi-path Pathfinder outperform ADVI for most of the models in the PosteriorDB [@Magnusson+etal:2024:posteriordb] evaluation set,
 we recognize the need for further experiments on a wider range of models.
 
 ## Diagnosing Pathfinder
 
 Pathfinder diagnoses the accuracy of the approximation by computing the density ratio of the true posterior and 
-the approximation and using Pareto-$\hat{k}$ diagnostic (Vehtari et al., 2024) to assess whether these ratios can
-be used to improve the approximation via resmapling. /, the
-normalization for the posterior can be  estimated reliably (Section 3, Vehtari et al., 2024), which is the
+the approximation and using the Pareto-$\hat{k}$ diagnostic [@Vehtari+etal:2024:PSIS] to assess whether these ratios can
+be used to improve the approximation via resampling. The
+normalization for the posterior can be estimated reliably [@Vehtari+etal:2024:PSIS, Section 3], which is the
 first requirement for reliable resampling.  If estimated Pareto-$\hat{k}$ for the ratios is smaller than 0.7,
-there is still need to further diagnose importance sampling estimates by taking into account also the expetant
-function (Section 2.2, Vehtari et al., 2024). If estimated Pareto-$\hat{k}$ is larger than 0.7, then the 
-estimate for the normalization is unreliable and any Mote Carlo estimate may have a big error. The resampled draws
+there is still a need to further diagnose the reliability of the importance sampling estimates for all quantities of interest [@Vehtari+etal:2024:PSIS, Section 2.2]. If estimated Pareto-$\hat{k}$ is larger than 0.7, then the
+estimate for the normalization is unreliable and any Monte Carlo estimate may have a large error. The resampled draws
 can still contain some useful information about the location and shape of the posterior which can be used in early
-parts of Bayesian workflow (Gelman et al, 2020).
+parts of Bayesian workflow [@Gelman:etal:2020:workflow].
 
 ## Using Pathfinder for initializing MCMC
 
 If estimated Pareto-$\hat{k}$ for the ratios is smaller than 0.7, the resampled posterior draws are almost as
-good for initializing MCMC as would indepepent draws from the posterior be. If estimated Pareto-$\hat{k}$ for the 
+good for initializing MCMC as independent draws from the posterior would be. If estimated Pareto-$\hat{k}$ for the
 ratios is larger than 0.7, the Pathfinder draws are not reliable for posterior inference directly, but they are still 
 very likely better for initializing MCMC than random draws from an arbitrary pre-defined distribution (e.g. uniform from 
 -2 to 2 used by Stan by default). If Pareto-$\hat{k}$ is larger than 0.7, it is likely that one of the ratios is much bigger