Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend gpdfit to return marginal posterior of k #186 #188

Closed
wants to merge 3 commits into from
Closed

extend gpdfit to return marginal posterior of k #186 #188

wants to merge 3 commits into from

Conversation

ozanstats
Copy link

@ozanstats ozanstats commented Oct 10, 2021

Addresses #186: @jgabry @avehtari

Extended gpdfit() to return the marginal posterior density of k in addition to k-hat and sigma-hat. We can also create a separate function but this seemed like a better approach to me.

  1. Updated the profile log-likelihood calculation by using an outer product (%o%) instead of vapply, which made it more efficient.
# switched to
k <- matrixStats::rowMeans2(log1p(-theta %o% x))
N * (log(-theta / k) - k - 1)

# instead of
a <- -theta
k <- vapply(-a, FUN = function(a_i) mean(log1p(a_i * x)), FUN.VALUE = numeric(1))
N * (log(-a / k) - k - 1)
  1. Dropped the lx() function and moved its updated content to gpdfit() to avoid repeated calculations because the code example in the issue thread actually computes the k values twice:
# first (within the `lx()` function)
l_theta <- N * lx(theta, x) # profile log-lik

# second 
ks <- numeric(length=M)
for (i in 1:M) {
  ks[i] <- mean.default(log1p(-theta[i] * x))
}
  1. Following the notation of Zhang, J., and Stephens, M. A. (2009), suggested new names for the gpdfit() output components:
# switched to
nlist(k_hat, sigma_hat, k, w_theta)

# instead of
nlist(k, sigma)

This name change shouldn't really cause backward compatibility issues but it's not that important anyway. I will use whatever you believe is more appropriate.

Copy link
Collaborator

@avehtari avehtari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I'm ok with the name changes.

Based on the discussion in the issue thread, we still need to decide whether we want to return normalized quadrature weights or normalized densities.

@ozanstats
Copy link
Author

@avehtari - I can update the function with normalization term estimation using the trapezoidal rule once you decide what to return. Below are some options:

# normalization term estimation with the trapezoidal rule
c <- 2 / sum((w_theta[-M] + w_theta[-1]) * (k[-M] - k[-1]))

# normalized densities
d_theta <- c * w_theta

# some return options:
nlist(k_hat, sigma_hat, k, w_theta) # normalized quadrature weights
nlist(k_hat, sigma_hat, k, d_theta) # normalized densities
nlist(k_hat, sigma_hat, k, w_theta, c) # normalized quadrature weights + density normalization term
nlist(k_hat, sigma_hat, k, w_theta, d_theta) # normalized quadrature weights + normalized densities

@ozanstats
Copy link
Author

I noticed a small side effect of this recent commit. That is, when the estimation doesn't work, k-hat and sigma-hat are not necessarily Inf and NaN, respectively:

set.seed(147)
x <- exp(rnorm(10))

# CRAN version (without the commit)
gpdfit(x, sort_x = FALSE)
$k
[1] Inf

$sigma
[1] NaN

# GitHub version on master (with the commit)
$k
[1] NA

$sigma
[1] NA

Should we avoid NAs and return always Inf and NaN here? I can also suppress the NaNs produced warnings that occur in these cases.

@avehtari
Copy link
Collaborator

Should we avoid NAs and return always Inf and NaN here?

I would prefer khat=Inf when the failure is due to extremely long tail. At least in the cases I've seen Inf appearing, the tail was very long.

@jgabry
Copy link
Member

jgabry commented Oct 11, 2021

Thanks for the PR!

Should we avoid NAs and return always Inf and NaN here?

Yeah I think Inf for k and and NaN for sigma would be better, thanks!

Based on the discussion in the issue thread, we still need to decide whether we want to return normalized quadrature weights or normalized densities.

I haven't had a chance to think much about this. I'm ok with whatever @avehtari decides.

@avehtari
Copy link
Collaborator

Thinking for a moment, I think normalized densities would make more sense, and if someone would like to use normalized quadrature weights it's easier to obtain them from normalized densities than vice versa.

@ParadaCarleton
Copy link

Quick question -- is the gpd_fit posterior actually a good one? The estimator is good, but because the prior is derived from the data, the posterior will be overconfident. This doesn't matter much if you just need k-hat, but I think you'd have to test how much the empirical prior affects the overconfidence.

@avehtari
Copy link
Collaborator

I have tested. When using sample sizes that are typical gpd_fit use and when tail thicknesses are in the interesting region (<0.7), gpd_fit approximation is very good with no practical difference when compared to the marginals of the full posterior computed with Stan. I guess I should add this comment to the PSIS paper revision.

Autumn has been very busy, but I hope to be able to go through some PRs and issues in a a few weeks.

@avehtari
Copy link
Collaborator

avehtari commented Jan 6, 2022

Thanks to a question by @sethaxen, I realized that what is above named as densities are not yet densities. Originally, the algorithm computes quadrature weights and for those most natural scale would be normalized weights. To compute densities, it is not enough to compute the normalization term, as each quadrature point presents a different volume of the parameter space. I'll prepare a note with illustration and code. We can then discuss what we should return and how to document clearly what we are returning.

@avehtari
Copy link
Collaborator

Here is an illustration
image

The yellow contours present the likelihood, the red dashed contours present the posterior, blue line is the profile line, the blue dots present the quadrature points. w_theta are normalized likelihood values at the qudrature points and the effect of the prior comes from non-equidistant spacing of the quadrature points. To plot the marginal posterior of k, we need to evaluate the posterior density at the quadrature points (or in any other evaluation points).

The following code change returns the quadrature weights k_w that are specific to the quadrature points k, and posterior densities k_d evaluated at the quadrature points k.

  # quadrature weights for k are same as for theta
  k_w <- w_theta
  # quadrature weights are just the normalized likelihoods
  # we get the unnormalized posterior by multiplying these by the prior
  k_d <- k_w * dgpd(-theta, mu=-1/x[N], sigma=1/prior/xstar, xi=0.5)
  # normalize using the trapezoidal rule
  Z <- sum((k_d[-M] + k_d[-1]) * (k[-M] - k[-1])) / 2
  k_d <- k_d / Z

  nlist(k_hat, sigma_hat, k, k_w, k_d)

Here is the exact marginal posterior (red dashed) and the profile posterior (blue) corresponding to the above figure
image

The following plot shows comparisons with one data realization each using M=20, 100, 1000 and k=0.3, 0.5, 0.7 ,0.9.
image

The exact marginal posterior is wider than the profile posterior, as expected, but the profile posterior is quite close and thus useful for presenting the uncertainty. In my experiments I used extraDistr::dgpd, but to avoid the dependency we should implement that in loo. Computing the prior added maybe 10% or less computation time, so computing the marginal approximation could be moved also to another function.

@ParadaCarleton
Copy link

I have tested. When using sample sizes that are typical gpd_fit use and when tail thicknesses are in the interesting region (<0.7), gpd_fit approximation is very good with no practical difference when compared to the marginals of the full posterior computed with Stan. I guess I should add this comment to the PSIS paper revision.

Autumn has been very busy, but I hope to be able to go through some PRs and issues in a a few weeks.

I see; does that hold even though the samples aren't IID? That would also tend to make the posterior too narrow. Perhaps "correcting" the log-likelihood by multiplying it with r_eff would fix that problem well enough in practice, though I'm not sure such an approach would be fully justified.

I do also wonder how Zhang's method compares to Laplace approximation for finding the posterior in terms of performance.

@avehtari
Copy link
Collaborator

I see; does that hold even though the samples aren't IID? That would also tend to make the posterior too narrow.

True. I don't think this is an issue as the plan at the moment is just to provide some indication of the uncertainty. Of course, if you plan to use it for something which needs more accuracy, then follow the usual extreme value analysis procedure and thin the chains to have approximately independent draws. For Pareto-k diagnostic and PSIS, having too narrow posterior doesn't harm if there is no big change in the posterior mean, which seems to be true based on the experiments with dependencies typical for MCMC from Stan. If you are worried, then just thin.

I do also wonder how Zhang's method compares to Laplace approximation for finding the posterior in terms of performance.

Not a huge difference, but it doesn't matter as the posterior is much more skewed than normal and the posterior mode is then clearly different from the posterior mean.

@ozanstats ozanstats requested review from avehtari and jgabry January 27, 2022 09:35
@ozanstats
Copy link
Author

  • tried to update gpdfit() to reflect the improvements @avehtari mentioned above
  • implemented the full suite of dgpd/pgpd/qgpd/rgpd() functions
  • cleaned up and condensed some internal helpers

@jgabry; the new functions pass my local checks and the existing tests but it'd be great if you could also give them a try, perhaps with some fresh test cases.

@avehtari
Copy link
Collaborator

Sorry, this has been left hanging for such a long time. After considering, I think we shouldn't change the existing output variable names (ie k -> k_hat, sigma -> sigma_hat), as it is possible the function is already used (and at least I'm using it in different simulations), and that would break those.

I'm also thinking that maybe we should have a separate function for the marginal posterior. It is almost just a by product, but still it is adding a bit of computation and memory usage, and maybe it would be better to have the streamlined one for PSIS-LOO, and then have separate function for the cases when someone wants to investigate more. It would be then possible to add also computationally more intensive, but more accurate marginal posterior computation, too.

@avehtari
Copy link
Collaborator

Adding a comment that this will very likely go to posterior package, and there is no need to merge this to loo

@ozanstats ozanstats closed this by deleting the head repository May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants