Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SkewNormal distribution #1104

Merged
merged 13 commits into from
Aug 8, 2020
Merged

Conversation

azev77
Copy link
Contributor

@azev77 azev77 commented Apr 23, 2020

Add SkewNormal distribution based on:

@codecov-io
Copy link

codecov-io commented May 9, 2020

Codecov Report

Merging #1104 into master will decrease coverage by 0.32%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1104      +/-   ##
==========================================
- Coverage   79.54%   79.21%   -0.33%     
==========================================
  Files         113      114       +1     
  Lines        5787     5811      +24     
==========================================
  Hits         4603     4603              
- Misses       1184     1208      +24     
Impacted Files Coverage Δ
src/univariate/continuous/skewnormal.jl 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d7f203...3f40d9c. Read the comment docs.

@johnczito
Copy link
Member

Thanks! This would be a good add. My thoughts:

  • I'm not sure about the fact that this implementation comes from a discourse post. At the very least, the source code must include much more explicit attribution.
  • Needs tests. It would be good to include tests where you check against archived output from Azzalini's sn package. Note that the unit tests for the univariates have a fairly standardized format that you can check out here: Distributions.jl/test/ref/ and Distributions.jl/test/univariates.jl. If it's straightforward to ultimately cast the tests in that mold, that might be nice.
  • Since this is a well-studied univariate, it should have a cdf method as soon as possible. In the (medium-term?) that would require that someone merge an implementation of Owen's T function into StatsFuns.jl.

@azev77
Copy link
Contributor Author

azev77 commented May 10, 2020

@johnczito thanks for responding.
The PR is from wikipedia, @maximerischard's discourse post, @fairbrot's Multivariate pkg.
1 I opened issue #1101 weeks before the PR, where I gave a shoutout to @maximerischard & @fairbrot
2 @fairbrot encouraged this.
3 I haven't heard from @maximerischard. I emailed him on discourse.
4 I'm happy to add more explicit attribution
5 happy to add tests
Let me know anything else I can do.

@maximerischard
Copy link

Sorry I didn't see your discourse message. I'm very happy that you found my code snippet useful and for some of the code to find its way to Distributions.jl.

return d.ξ + d.ω * sign(u0) * u1
end

## Fitting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the commented lines should be removed, they seem to come from another file

@matbesancon
Copy link
Member

this looks like a cool addition to Distributions. +1 to add tests

@codecov-commenter
Copy link

codecov-commenter commented Jun 29, 2020

Codecov Report

Merging #1104 into master will increase coverage by 0.16%.
The diff coverage is 64.51%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1104      +/-   ##
==========================================
+ Coverage   79.54%   79.70%   +0.16%     
==========================================
  Files         113      115       +2     
  Lines        5787     5958     +171     
==========================================
+ Hits         4603     4749     +146     
- Misses       1184     1209      +25     
Impacted Files Coverage Δ
src/Distributions.jl 100.00% <ø> (ø)
src/univariates.jl 61.62% <ø> (-3.63%) ⬇️
src/univariate/continuous/skewnormal.jl 64.51% <64.51%> (ø)
src/multivariate/mvnormal.jl 70.94% <0.00%> (-0.65%) ⬇️
src/matrix/matrixreshaped.jl 90.47% <0.00%> (ø)
src/truncated/normal.jl 84.61% <0.00%> (+0.26%) ⬆️
src/mixtures/mixturemodel.jl 77.19% <0.00%> (+0.54%) ⬆️
src/multivariates.jl 79.41% <0.00%> (+1.63%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d7f203...8b9a254. Read the comment docs.

@azev77
Copy link
Contributor Author

azev77 commented Jun 29, 2020

@johnczito @matbesancon @fairbrot @maximerischard
I fine-tuned the code & added tests. Please have a look.

@azev77 azev77 mentioned this pull request Jun 29, 2020
68 tasks
@maximerischard
Copy link

Looks great! I noticed there weren't any tests for rand, which also appears to be lacking for most distributions in the package. As a sanity check I compared a histogram of a million samples to the PDF, and it all looks good.

using PyPlot: PyPlot; plt=PyPlot
using Distributions
d1 = SkewNormal(1, 2, 3)
samples = rand(d1, 10^6);
plt.hist(samples; density=true, bins=1000)
let
    xx = range(-3.0, stop=8.0, length=1000)
    plt.plot(xx, pdf.(d1, xx))
end

index

I then also compared the empirical to the theoretical moments:

using Statistics
@show mean(samples), mean(d1)
@show std(samples), std(d1)
@show skewness(samples), skewness(d1)
@show kurtosis(samples), kurtosis(d1)
xx = range(-3.0, stop=8.0, length=100000)
@show xx[argmax(pdf.(d1, xx))], mode(d1)
(mean(samples), mean(d1)) = (2.5142092281272057, 2.513879513212096)
(std(samples), std(d1)) = (1.3088404926822659, 1.3069693261422424)
(skewness(samples), skewness(d1)) = (0.6675144945558559, 0.6670235701524081)
(kurtosis(samples), kurtosis(d1)) = (0.5017438999403141, 0.5097701294494131)
(xx[argmax(pdf.(d1, xx))], mode(d1)) = (1.946749467494675, 2.1058875860443687)

Again, all looking good except the mode is slightly off since it is an approximation (as is made clear in the comments in the code).

@johnczito
Copy link
Member

johnczito commented Jun 29, 2020

I noticed there weren't any tests for rand, which also appears to be lacking for most distributions in the package.

Yeah, it's a tricky one, as discussed in #143 and #1142. Anything would have some positive failure rate, even if the code is error free. I include some KS testing in the unit tests for the matrix variates, along with crude checks for the sample mean and covariance, but they're really just regression tests. They happen not to fail for this seed. If they start failing, it's just something to look into.

@azev77
Copy link
Contributor Author

azev77 commented Jul 3, 2020

@johnczito @matbesancon
Are we ready to roll?

@johnczito
Copy link
Member

johnczito commented Jul 3, 2020

Looks good to me. Can you add tests checking that it agrees with Normal in the α = 0 case?

@azev77
Copy link
Contributor Author

azev77 commented Jul 3, 2020

@johnczito @matbesancon
Great idea. I added tests checking that Normal is nested in SkewNormal when α = 0.
Are we ready to roll?

@johnczito
Copy link
Member

Sorry, for the sake of coverage, can you check every method that Normal and SkewNormal share? So skewness, kurtosis, mgf, and cf as well?

@azev77
Copy link
Contributor Author

azev77 commented Jul 3, 2020

@johnczito done!
Ready to roll?

@azev77
Copy link
Contributor Author

azev77 commented Jul 9, 2020

@johnczito please let me know if there is anything else I can do

@johnczito
Copy link
Member

Looks good to me. I'm not sure what the path forward is on the cdf. Such as it is, this would be one of only two or three univariates that don't provide it, which means you can't really compute probabilities. I'm not sure how central folks think that functionality is.

@azev77
Copy link
Contributor Author

azev77 commented Jul 9, 2020

@johnczito thanks for approving!

  1. there are several other properties I would like to add to SkewNormal in the future & I will think more about the cdf & possibly contributing to Owen's T
  2. in mean time SkewDist.jl does
    cdf(dist::SkewNormal, x::Real)=quadgk(t->pdf(dist,t), -Inf, x)[1]
    quantile(dist::SkewNormal, β::Float64)=newton(x->cdf(dist,x) - β, dist.ξ)
  3. I'm new to PRs & Github, does this mean the current PR is approved & will be integrated in the package?

@johnczito
Copy link
Member

@mschauer thoughts on merging this without a cdf method?

@azev77
Copy link
Contributor Author

azev77 commented Jul 16, 2020

@mschauer do you think we can merge this PR?

var(d::SkewNormal) = abs2(d.ω)*(1-mean_z(d)^2)
std(d::SkewNormal) = √var(d)
skewness(d::SkewNormal) = (4-π)/2 * mean_z(d)^3 / (1-mean_z(d)^2)^(3/2)
kurtosis(d::SkewNormal) = 2*(π-3)*( (delta(d)* sqrt(2/π) )^4 / (1-2*(delta(d)^2)/π)^2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random whitespace

@test pdf(d3, 3.3) == Distributions.pdf(d4, 3.3)
@test pdf.(d3, 1:3) == Distributions.pdf.(d4, 1:3)
a = mean(d3), var(d3), std(d3)
#b = mean(d4), var(d4), std(d4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#b = mean(d4), var(d4), std(d4)

a = mean(d3), var(d3), std(d3)
#b = mean(d4), var(d4), std(d4)
b = Distributions.mean(d4), Distributions.var(d4), Distributions.std(d4)
@test a == b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@test a == b
@test a == b

@azev77
Copy link
Contributor Author

azev77 commented Jul 17, 2020

@mschauer
Thank you. Please have a look at the updated code.

@azev77
Copy link
Contributor Author

azev77 commented Aug 5, 2020

@mschauer are we ready to roll?

@matbesancon
Copy link
Member

seems travis is still failing

@mschauer mschauer merged commit 3f4856c into JuliaStats:master Aug 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants