Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple rand and eltype #1905

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Conversation

devmotion
Copy link
Member

@devmotion devmotion commented Sep 25, 2024

One of my take-aways from issues such as #1252, #1902, #894, #1783, #1041, and #1071 is that eltype is not only quite inconsistently implemented and unclearly documented currently but more generally it might be a bad design decision to use it for pre-allocating containers of samples: Setting it to a fixed type (historically Float64 for continuous and Int for discrete distributions) is too limiting, but trying to infer it from parameters is challenging and doomed to fail for more challenging scenarios that are not as clear as e.g. Normal.

This PR tries to decouple rand and eltype, to make it easier to possibly eventually deprecate and remove eltype.

Fixes #1041.
Fixes #1071.
Fixes #1082.
Fixes #1252.
Fixes #1783.
Fixes #1884.
Fixes #1902.
Fixes #1907.

@codecov-commenter
Copy link

codecov-commenter commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 98.18182% with 2 lines in your changes missing coverage. Please review.

Project coverage is 85.60%. Comparing base (a1010e4) to head (17154a2).

Files with missing lines Patch % Lines
src/multivariate/mvtdist.jl 83.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1905      +/-   ##
==========================================
- Coverage   86.08%   85.60%   -0.49%     
==========================================
  Files         144      144              
  Lines        8696     8758      +62     
==========================================
+ Hits         7486     7497      +11     
- Misses       1210     1261      +51     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andreasnoack
Copy link
Member

Given that, according to the docstring

"""
eltype(::Type{Sampleable})
The default element type of a sample. This is the type of elements of the samples generated
by the `rand` method. However, one can provide an array of different element types to
store the samples using `rand!`.
"""

the sole purpose of eltype is to return the type of rand, which I agree isn't really feasible, should we also deprecate the methods here as part of this PR?

@quildtide
Copy link
Contributor

#1907 is related.

@devmotion
Copy link
Member Author

#1907 is related.

17154a2 fixes it.

@quildtide
Copy link
Contributor

My personal opinion:
rand(d::Distribution) should remain a valid method (I don't think anyone is trying to change this), and its return type should be predictable by knowing what d is. It would also be useful to have a method that tells you what type rand(d::Distribution) returns without running rand(d::Distribution). eltype(d::Distribution) is logical for this.

This is orthogonal to the fact that Distributions should probably support rand(d::Distribution, T::Type).

In an ideal situation, I think we would define rand(d::Distribution, T::Type) for distributions (to allow for potential type-specific optimizations and evade unnecessary typecasts) and then dispatch rand(d::Distribution) = rand(d, eltype(d)).

@devmotion
Copy link
Member Author

its return type should be predictable by knowing what d is

I became convinced that in general this is not possible. It basically means trying to manually figure out type inference, a task that even the Julia compiler can fail at. Of course, for some number types and distributions it can be done (as also the compiler will succeed in many instances). But in general it's far from trivial. As in the initial stages of Distributions, one could be restrictive and limit samples to Int and Float but that's IMO not desirable either. Therefore I think it's best to remove eltype (which IMO has also been a weird name since I don't view distributions as containers or collections) from the API. IMO it's just too brittle and currently to inconsistent. Of course, that doesn't prevent users from trying to figure out the (element) type of samples with eg the help of the compiler, it's just not an official feature anymore.

singularitti added a commit to singularitti/ToyHamiltonians.jl that referenced this pull request Oct 30, 2024
@singularitti
Copy link

singularitti commented Nov 3, 2024

A few distributions still give me Float64 after this PR, while others work fine:

julia> dist = Semicircle(50.0f0)
Semicircle{Float32}(r=50.0f0)

julia> rand(dist, 5)
5-element Array{Float64, 1}:
  36.80671953487414
 -18.355635129900335
 -11.701855436648922
 -21.444118928985656
  -5.80120463505302

julia> dist = JohnsonSU(0.0f0, 1.0f0, 0.0f0, 1.0f0)
JohnsonSU{Float32}=0.0f0, λ=1.0f0, γ=0.0f0, δ=1.0f0)

julia> rand(dist, 5)
5-element Array{Float64, 1}:
  0.5311696707298299
  1.632313034117999
  0.04951771555318912
  0.4721610259428258
 -3.052321854866766

julia> dist = Chisq(5.0f0)
Chisq{Float32}=5.0f0)

julia> rand(dist, 5)
5-element Array{Float32, 1}:
 15.465032
 1.888659
 7.013455
 4.258529
 3.9611576

Can this be fixed?

@devmotion
Copy link
Member Author

The reason is that for quite a few of the older, probably less used and definitely less frequently updated distributions the rand implementation contains hardcoded Float64, either as literals or as calls of rand(rng) or randn(rng). I fixed the SemicircleandJohnsonSU`, but I'm not sure if it's possible (or even desirable) to address all of these in a single PR (it's already a quite massive undertaking). Maybe it would be best to open issues for others and address them in follow-up PRs (most (all?) other such changes in this PR are included to fix open issues).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment