diff --git a/tutorials/11-probabilistic-pca/index.qmd b/tutorials/11-probabilistic-pca/index.qmd
index 258177abb..cb25bc93c 100755
--- a/tutorials/11-probabilistic-pca/index.qmd
+++ b/tutorials/11-probabilistic-pca/index.qmd
@@ -278,12 +278,12 @@ Another way to put it: 2 dimensions is enough to capture the main structure of t
 A direct question arises from above practice is: how many principal components do we want to keep, in order to sufficiently represent the latent structure in the data?
 This is a very central question for all latent factor models, i.e. how many dimensions are needed to represent that data in the latent space.
 In the case of PCA, there exist a lot of heuristics to make that choice.
-For example, We can tune the number of principal components using empirical methods such as cross-validation based some criteria such as MSE between the posterior predicted (e.g. mean predictions) data matrix and the original data matrix or the percentage of variation explained [3].
+For example, we can tune the number of principal components using empirical methods such as cross-validation, based on criteria such as the MSE between the posterior-predicted data matrix (e.g. mean predictions) and the original data matrix, or the percentage of variation explained [^3].
 
 For p-PCA, this can be done in an elegant and principled way, using a technique called *Automatic Relevance Determination* (ARD).
-ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features [4].
+ARD can help pick the correct number of principal directions by regularizing the solution space using a parameterized, data-dependent prior distribution that effectively prunes away redundant or superfluous features [^4].
 Essentially, we are using a specific prior over the factor loadings $\mathbf{W}$ that allows us to prune away dimensions in the latent space. The prior is determined by a precision hyperparameter $\alpha$. Here, smaller values of $\alpha$ correspond to more important components.
-You can find more details about this in e.g. [5].
+You can find more details about this in, for example, Bishop (2006) [^5].
 
 ```{julia}
 @model function pPCA_ARD(X)
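
For reference, the ARD construction described in the hunk above (a per-dimension precision $\alpha$ over the columns of $\mathbf{W}$, with smaller values marking more important components) typically looks roughly like the following in Turing. This is a minimal sketch, not the tutorial's actual `pPCA_ARD` definition that the hunk truncates: the data orientation (features × observations), the `Gamma(1, 1)` hyperpriors, and the names `pPCA_ARD_sketch`, `σ_W`, and `τ` are assumptions made here for illustration.

```julia
using Turing

# A minimal sketch of an ARD prior for p-PCA. Assumptions (not taken from the
# tutorial's own pPCA_ARD): X is features × observations, Gamma(1, 1)
# hyperpriors, and the names pPCA_ARD_sketch, σ_W, τ are illustrative only.
@model function pPCA_ARD_sketch(X)
    D, N = size(X)                     # D observed features, N data points

    # One precision per latent dimension: a small α[d] leaves the d-th column
    # of W broad (component kept); a large α[d] shrinks it towards zero.
    α ~ filldist(Gamma(1.0, 1.0), D)

    # Factor loadings: every entry in column d of W shares the scale 1/sqrt(α[d]).
    σ_W = 1.0 ./ sqrt.(α)
    W ~ arraydist(Normal.(0.0, repeat(σ_W', D, 1)))

    # Latent coordinates and observation-noise precision.
    Z ~ filldist(Normal(), D, N)
    τ ~ Gamma(1.0, 1.0)

    # Likelihood: X ≈ W * Z with isotropic noise of standard deviation 1/sqrt(τ).
    return X ~ arraydist(Normal.(W * Z, 1.0 / sqrt(τ)))
end
```

After inference, latent dimensions whose posterior precision `α[d]` is large have their loadings shrunk towards zero and can be discarded; the dimensions that remain give the effective number of principal components.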