Poisson variance deviation at high lambda #1515
Comments
I've experimentally implemented the PD algorithm (the one used by R) and replaced the rejection sampling method. It passed the […]. Additionally, I found that the test parameter […]. You can find my implementation here: Implementation Link
We cannot accept straight translations from R code, which is distributed under the GPLv2. But I assume this is the same algorithm as the one published here in Fortran?
But the values of λ used in the test are […].
My implementation follows the textual description of the PD algorithm in case A (μ ≥ 10) from the referenced paper (page 11 of the PDF). In fact, the R implementation uses some goto statements, which, even if permissible from a licensing perspective, are difficult to directly translate.
Locally, I enabled the 1e9 test, and it passed. However, due to the slow performance of the `gamma_lr` function (it took 70 seconds to run on my machine), I haven't pushed this part of the changes yet. Regarding the discrete KS test function, it may need to be generalized to accept a CDF function of the form […].
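For context, the Poisson CDF is P(X ≤ k) = Q(k + 1, λ), the regularized upper incomplete gamma function, which is presumably why `gamma_lr` appears here and why evaluating it for huge λ is expensive. A generalized discrete KS check that takes the CDF as a closure might look roughly like the sketch below; the function name, sample count, and the way the statistic is computed are illustrative assumptions, not the crate's actual test harness:

```rust
use rand::rngs::StdRng;
use rand::SeedableRng;
use rand_distr::Distribution;

// Rough sketch of a discrete KS statistic that takes the CDF as a closure, so it
// is not tied to a particular distribution. Hypothetical helper, not the existing
// `test_discrete` from the test suite.
fn ks_statistic_discrete<D, F>(seed: u64, dist: D, cdf: F, num_samples: usize) -> f64
where
    D: Distribution<u64>,
    F: Fn(u64) -> f64,
{
    let mut rng = StdRng::seed_from_u64(seed);
    let mut samples: Vec<u64> = (0..num_samples).map(|_| dist.sample(&mut rng)).collect();
    samples.sort_unstable();

    // D_n = sup_x |F_n(x) - F(x)|. Comparing the CDF against i/n and (i + 1)/n at
    // every sorted sample index covers the supremum, including tied values.
    let n = num_samples as f64;
    let mut sup: f64 = 0.0;
    for (i, &x) in samples.iter().enumerate() {
        let f = cdf(x);
        sup = sup.max((f - i as f64 / n).abs());
        sup = sup.max(((i + 1) as f64 / n - f).abs());
    }
    sup
}
```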
I guess it should be fine to test with […].
Further to this, we have a […]:

```rust
for (seed, lambda) in parameters.into_iter().enumerate() {
    let dist = Poisson::new(lambda).unwrap();
    test_discrete::<f64, Poisson<f64>, _>(seed as u64, dist, |k| cdf(k, lambda));
    test_discrete::<u64, Poisson<f64>, _>(seed as u64, dist, |k| cdf(k, lambda));
}
```
I recently conducted some challenging tests, and both the current implementation and the PD algorithm I implemented fail the KS test when lambda=1e12, but they pass when lambda is <=1e11, even though the variance obtained from the PD algorithm appears more reasonable. It seems that when the parameter is too large, the convergence speed of the gamma function becomes too slow, and each test takes more than an hour to run on my computer. These tests are purely experimental, and it's hard to imagine someone actually needing a Poisson distribution with such large parameters. Running these tests in a GitHub Action environment is also unrealistic. Although the PD algorithm does not pass the tests with extreme parameters, the current benchmarks show that its sampling speed is significantly faster than the current rejection sampling algorithm.
I am pretty sure the current approach cannot be fixed to work for high lambda, and it seems very plausible that it really is that much slower, so we should definitely replace it completely. At some point we could just use the normal approximation. Unfortunately, it is hard to quantify how much error that introduces: the best upper bound I know of is O(lambda^(-1/2)), but in practice it should be much better. If it passes the KS test, I guess the normal approximation should be fine.
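To illustrate the idea, a normal-approximation fallback for very large λ could look roughly like the sketch below. The cutoff value and the rounding scheme are assumptions chosen for illustration, not a tested design:

```rust
use rand::Rng;
use rand_distr::{Distribution, Normal, Poisson};

/// Sketch of a Poisson sampler that falls back to the normal approximation
/// (mean lambda, standard deviation sqrt(lambda)) above some cutoff.
/// `NORMAL_CUTOFF` is an assumed value for illustration, not something the
/// crate currently defines.
fn sample_poisson_with_normal_fallback<R: Rng + ?Sized>(rng: &mut R, lambda: f64) -> u64 {
    const NORMAL_CUTOFF: f64 = 1e9; // illustrative only

    if lambda < NORMAL_CUTOFF {
        // Exact sampler for moderate lambda.
        let poisson = Poisson::new(lambda).unwrap();
        let x: f64 = poisson.sample(rng);
        x as u64
    } else {
        // Normal approximation; the Kolmogorov distance to the true Poisson
        // distribution is O(lambda^(-1/2)).
        let normal = Normal::new(lambda, lambda.sqrt()).unwrap();
        let y: f64 = normal.sample(rng);
        y.round().max(0.0) as u64
    }
}
```

Rounding the normal deviate to the nearest integer acts as a simple continuity correction; whether the approximation is accurate enough would still have to be judged by the KS test discussed above.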
Summary
The Poisson distribution's sample variance deviates noticeably from the mean at high lambda values.
During a Kolmogorov-Smirnov (KS) test, the Poisson distribution fails when lambda approaches `MAX_LAMBDA`. I conducted a simple test to examine the sample mean and variance, and observed the following results: […] For samples from a Poisson distribution, the sample mean and variance should be very close to lambda.
For reference, I also checked the sample variance of Poisson generators in NumPy and R (I haven't conducted a full goodness-of-fit test yet).
- The NumPy implementation for `lam >= 10` is based on the paper "The transformed rejection method for generating Poisson random variables".
- The R implementation is based on the paper "Computer generation of Poisson deviates from modified normal distributions".
Code sample
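A minimal sketch of the kind of mean/variance check described above, assuming `rand_distr`'s `Poisson`; the sample size, lambda values, and seed are illustrative only:

```rust
use rand::rngs::StdRng;
use rand::SeedableRng;
use rand_distr::{Distribution, Poisson};

fn main() {
    // Illustrative parameters only; the original report may have used different
    // sample sizes and lambda values.
    let mut rng = StdRng::seed_from_u64(42);
    for &lambda in &[1e6_f64, 1e9, 1e12] {
        let dist = Poisson::new(lambda).unwrap();
        let n = 100_000;
        let samples: Vec<f64> = (0..n).map(|_| dist.sample(&mut rng)).collect();

        let mean = samples.iter().sum::<f64>() / n as f64;
        let var = samples
            .iter()
            .map(|x| (x - mean) * (x - mean))
            .sum::<f64>()
            / (n as f64 - 1.0);

        // For a true Poisson sample both statistics should be close to lambda.
        println!("lambda = {lambda:e}: mean = {mean:e}, variance = {var:e}");
    }
}
```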