diff --git a/tutorials/11-probabilistic-pca/index.qmd b/tutorials/11-probabilistic-pca/index.qmd
index 258177abb..cb25bc93c 100755
--- a/tutorials/11-probabilistic-pca/index.qmd
+++ b/tutorials/11-probabilistic-pca/index.qmd
@@ -278,12 +278,12 @@ Another way to put it: 2 dimensions is enough to capture the main structure of t
 A direct question arises from above practice is: how many principal components do we want to keep, in order to sufficiently represent the latent structure in the data?
 This is a very central question for all latent factor models, i.e. how many dimensions are needed to represent that data in the latent space.
 In the case of PCA, there exist a lot of heuristics to make that choice.
-For example, We can tune the number of principal components using empirical methods such as cross-validation based some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained [3].
+For example, we can tune the number of principal components using empirical methods such as cross-validation, based on criteria such as the MSE between the posterior-predicted data matrix (e.g. mean predictions) and the original data matrix, or the percentage of variation explained [^3].
 
 For p-PCA, this can be done in an elegant and principled way, using a technique called *Automatic Relevance Determination* (ARD).
-ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features [4].
+ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features [^4].
 Essentially, we are using a specific prior over the factor loadings $\mathbf{W}$ that allows us to prune away dimensions in the latent space. The prior is determined by a precision hyperparameter $\alpha$. Here, smaller values of $\alpha$ correspond to more important components.
-You can find more details about this in e.g. [5].
+You can find more details about this in, for example, Bishop (2006) [^5].
 
 ```{julia}
 @model function pPCA_ARD(X)
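
For reference, the ARD construction described in the hunk above (a per-dimension precision $\alpha$ over the columns of $\mathbf{W}$, with smaller values marking more important components) typically looks roughly like the following in Turing. This is a minimal sketch, not the tutorial's actual `pPCA_ARD` definition that the hunk truncates: the data orientation (features × observations), the `Gamma(1, 1)` hyperpriors, and the names `pPCA_ARD_sketch`, `σ_W`, and `τ` are assumptions made here for illustration.

```julia
using Turing

# A minimal sketch of an ARD prior for p-PCA. Assumptions (not taken from the
# tutorial's own pPCA_ARD): X is features × observations, Gamma(1, 1)
# hyperpriors, and the names pPCA_ARD_sketch, σ_W, τ are illustrative only.
@model function pPCA_ARD_sketch(X)
    D, N = size(X)                     # D observed features, N data points

    # One precision per latent dimension: a small α[d] leaves the d-th column
    # of W broad (component kept); a large α[d] shrinks it towards zero.
    α ~ filldist(Gamma(1.0, 1.0), D)

    # Factor loadings: every entry in column d of W shares the scale 1/sqrt(α[d]).
    σ_W = 1.0 ./ sqrt.(α)
    W ~ arraydist(Normal.(0.0, repeat(σ_W', D, 1)))

    # Latent coordinates and observation-noise precision.
    Z ~ filldist(Normal(), D, N)
    τ ~ Gamma(1.0, 1.0)

    # Likelihood: X ≈ W * Z with isotropic noise of standard deviation 1/sqrt(τ).
    return X ~ arraydist(Normal.(W * Z, 1.0 / sqrt(τ)))
end
```

After inference, latent dimensions whose posterior precision `α[d]` is large have their loadings shrunk towards zero and can be discarded; the dimensions that remain give the effective number of principal components.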