Migrate to DifferentiationInterface #98
Conversation
@wsmoses Hi Bill, it seems Enzyme is failing again:

```
Got exception outside of a @test
AssertionError: sz == sizeof(Int)
Stacktrace:
[1] should_recurse(typ2::Any, arg_t::LLVM.IntegerType, byref::GPUCompiler.ArgumentCC, dl::LLVM.DataLayout)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/ZRw0g/src/absint.jl:188
[2] abs_typeof(arg::LLVM.LoadInst, partial::Bool)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/ZRw0g/src/absint.jl:441
[3] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/ZRw0g/src/compiler.jl:8023
[4] codegen
@ ~/.julia/packages/Enzyme/ZRw0g/src/compiler.jl:7026 [inlined]
[5] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
@ Enzyme.Compiler ~/.julia/packages/Enzyme/ZRw0g/src/compiler.jl:9292
[6] _thunk
@ ~/.julia/packages/Enzyme/ZRw0g/src/compiler.jl:9292 [inlined]
[7] cached_compilation
@ ~/.julia/packages/Enzyme/ZRw0g/src/compiler.jl:9333 [inlined]
[8] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{0x0000000000007b9b}, ::Type{Const{typeof(AdvancedVI.estimate_repgradelbo_ad_forward)}}, ::Type{Duplicated{Any}}, tt::Type{Tuple{Duplicated{Vector{Float64}}, Const{@NamedTuple{rng::StableRNGs.LehmerRNG, adtype::AutoEnzyme{ReverseMode{true, true, FFIABI, false, false}, Nothing}, obj::RepGradELBO{StickingTheLandingEntropy}, problem::TestNormal{Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}, PDMat{Float64, Matrix{Float64}}}, restructure::Optimisers.Restructure{MvLocationScale{LowerTriangular{Float64, Matrix{Float64}}, Normal{Float64}, Vector{Float64}, Float64}, @NamedTuple{location::Int64, scale::Int64}}, q_stop::MvLocationScale{LowerTriangular{Float64, Matrix{Float64}}, Normal{Float64}, Vector{Float64}, Float64}}}}}, ::Val{Enzyme.API.DEM_ReverseModeGradient}, ::Val{1}, ::Val{(false, false, false)}, ::Val{true}, ::Val{true}, ::Type{FFIABI}, ::Val{true}, ::Val{true})
@ Enzyme.Compiler ~/.julia/packages/Enzyme/ZRw0g/src/compiler.jl:9465
[9] #s2064#19047
@ ~/.julia/packages/Enzyme/ZRw0g/src/compiler.jl:9602 [inlined]
[10] var"#s2064#19047"(FA::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, ReturnPrimal::Any, ShadowInit::Any, World::Any, ABI::Any, ErrIfFuncWritten::Any, RuntimeActivity::Any, ::Any, ::Type, ::Type, ::Type, tt::Any, ::Type, ::Type, ::Type, ::Type, ::Type, ::Type, ::Type, ::Any)
@ Enzyme.Compiler ./none:0
[11] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
@ Core ./boot.jl:602
[12] autodiff
@ ~/.julia/packages/Enzyme/ZRw0g/src/Enzyme.jl:368 [inlined]
[13] autodiff
@ ~/.julia/packages/Enzyme/ZRw0g/src/Enzyme.jl:504 [inlined]
[14] macro expansion
@ ~/.julia/packages/Enzyme/ZRw0g/src/Enzyme.jl:1586 [inlined]
[15] gradient
@ ~/.julia/packages/Enzyme/ZRw0g/src/Enzyme.jl:1569 [inlined]
[16] value_and_gradient(f::typeof(AdvancedVI.estimate_repgradelbo_ad_forward), ::DifferentiationInterface.NoGradientPrep, backend::AutoEnzyme{ReverseMode{true, true, FFIABI, false, false}, Nothing}, x::Vector{Float64}, contexts::Constant{@NamedTuple{rng::StableRNGs.LehmerRNG, adtype::AutoEnzyme{ReverseMode{true, true, FFIABI, false, false}, Nothing}, obj::RepGradELBO{StickingTheLandingEntropy}, problem::TestNormal{Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}, PDMat{Float64, Matrix{Float64}}}, restructure::Optimisers.Restructure{MvLocationScale{LowerTriangular{Float64, Matrix{Float64}}, Normal{Float64}, Vector{Float64}, Float64}, @NamedTuple{location::Int64, scale::Int64}}, q_stop::MvLocationScale{LowerTriangular{Float64, Matrix{Float64}}, Normal{Float64}, Vector{Float64}, Float64}}})
@ DifferentiationInterfaceEnzymeExt ~/.julia/packages/DifferentiationInterface/yXBFa/ext/DifferentiationInterfaceEnzymeExt/reverse_onearg.jl:266
[17] value_and_gradient(f::typeof(AdvancedVI.estimate_repgradelbo_ad_forward), backend::AutoEnzyme{ReverseMode{true, true, FFIABI, false, false}, Nothing}, x::Vector{Float64}, contexts::Constant{@NamedTuple{rng::StableRNGs.LehmerRNG, adtype::AutoEnzyme{ReverseMode{true, true, FFIABI, false, false}, Nothing}, obj::RepGradELBO{StickingTheLandingEntropy}, problem::TestNormal{Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}, PDMat{Float64, Matrix{Float64}}}, restructure::Optimisers.Restructure{MvLocationScale{LowerTriangular{Float64, Matrix{Float64}}, Normal{Float64}, Vector{Float64}, Float64}, @NamedTuple{location::Int64, scale::Int64}}, q_stop::MvLocationScale{LowerTriangular{Float64, Matrix{Float64}}, Normal{Float64}, Vector{Float64}, Float64}}})
@ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/yXBFa/src/fallbacks/no_prep.jl:24
[18] estimate_gradient(rng::StableRNGs.LehmerRNG, obj::RepGradELBO{StickingTheLandingEntropy}, adtype::AutoEnzyme{ReverseMode{true, true, FFIABI, false, false}, Nothing}, prob::TestNormal{Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}, PDMat{Float64, Matrix{Float64}}}, params::Vector{Float64}, restructure::Optimisers.Restructure{MvLocationScale{LowerTriangular{Float64, Matrix{Float64}}, Normal{Float64}, Vector{Float64}, Float64}, @NamedTuple{location::Int64, scale::Int64}}, state::Nothing)
@ AdvancedVI ~/.julia/dev/AdvancedVI/src/objectives/elbo/repgradelbo.jl:122
[19] optimize(::StableRNGs.LehmerRNG, ::TestNormal{Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}, PDMat{Float64, Matrix{Float64}}}, ::RepGradELBO{StickingTheLandingEntropy}, ::MvLocationScale{LowerTriangular{Float64, Matrix{Float64}}, Normal{Float64}, Vector{Float64}, Float64}, ::Int64; adtype::AutoEnzyme{ReverseMode{true, true, FFIABI, false, false}, Nothing}, optimizer::Descent{Float64}, averager::NoAveraging, show_progress::Bool, state_init::@NamedTuple{}, callback::Nothing, prog::ProgressMeter.Progress)
@ AdvancedVI ~/.julia/dev/AdvancedVI/src/optimize.jl:76
[20] macro expansion
@ ~/.julia/dev/AdvancedVI/test/inference/repgradelbo_locationscale.jl:53 [inlined]
[21] macro expansion
@ /usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[22] macro expansion
@ ~/.julia/dev/AdvancedVI/test/inference/repgradelbo_locationscale.jl:52 [inlined]
[23] macro expansion
@ /usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1669 [inlined]
[24] macro expansion
@ ~/.julia/dev/AdvancedVI/test/inference/repgradelbo_locationscale.jl:17 [inlined]
[25] macro expansion
@ /usr/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
[26] top-level scope
@ ~/.julia/dev/AdvancedVI/test/inference/repgradelbo_locationscale.jl:17
[27] include(fname::String)
@ Base.MainInclude ./client.jl:489
[28] top-level scope
@ ~/.julia/dev/AdvancedVI/test/runtests.jl:67
[29] include(fname::String)
@ Base.MainInclude ./client.jl:489
[30] top-level scope
@ none:6
[31] eval
@ ./boot.jl:385 [inlined]
[32] exec_options(opts::Base.JLOptions)
@ Base ./client.jl:291
[33] _start()
@ Base ./client.jl:552
```

This also happens with our old AD machinery.
Benchmark Results
| Benchmark suite | Current: 79e5c62 | Previous: 90a865c | Ratio |
|---|---|---|---|
| normal + bijector/meanfield/ForwardDiff | 491433178 ns | 450838467.5 ns | 1.09 |
| normal + bijector/meanfield/ReverseDiff | 197406284 ns | 187788951 ns | 1.05 |
This comment was automatically generated by workflow using github-action-benchmark.
From what I can see, this looks good. It'll be great once more packages move towards depending on DifferentiationInterface, rather than other packages.
I’d probably recommend not removing the Enzyme extension for now, as routing through DI presently makes some code error or run significantly slower when it could be differentiated directly with Enzyme.
@Red-Portal can you open an MWE issue for the size mismatch?
@yebai @willtebbutt Any thoughts on @wsmoses's advice on leaving the Enzyme extension in place?
I'm happy to keep the Enzyme interface as is, but perhaps we should move its tests to a separate CI task.
As suggested, I reverted to the old AD interface, where we now indirect to DifferentiationInterface.
```diff
@@ -38,7 +39,14 @@ Evaluate the value and gradient of a function `f` at `x` using the automatic dif
 - `aux`: Auxiliary input passed to `f`.
 - `out::DiffResults.MutableDiffResult`: Buffer to contain the output gradient and function value.
 """
-function value_and_gradient! end
+function value_and_gradient!(
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be better as an extension, x-ref JuliaDiff/DifferentiationInterface.jl#509.
In this case, if I'm not mistaken, there is only a single differentiated argument? Can you give a concrete example of a significant slowdown or error?
Sure, here's a ~100x slowdown example with one arg:

```julia
using Enzyme, BenchmarkTools, ADTypes, DifferentiationInterface

struct T
    a::NTuple{10,Float64}
    b
end

a = T((ones(10)...,), nothing)
sumfirst(x) = sum(x.a)

@btime Enzyme.autodiff(Reverse, sumfirst, $(Active(a))) # 6.460 ns (0 allocations: 0 bytes)

function bench_di(backend, a)
    prep = DifferentiationInterface.prepare_gradient(sumfirst, backend, a)
    @btime DifferentiationInterface.gradient($sumfirst, $prep, $backend, $a)
    return nothing
end

bench_di(AutoEnzyme(), a) # 541.346 ns (13 allocations: 1.25 KiB)
```
Many of the slowdown cases you present are cases where `Enzyme.gradient` itself is already much slower than the corresponding `Enzyme.autodiff` call:

```julia
julia> @btime Enzyme.autodiff(Reverse, sumfirst, $(Active(a)));
  6.452 ns (0 allocations: 0 bytes)

julia> @btime Enzyme.gradient(Reverse, sumfirst, $a);
  805.476 ns (12 allocations: 1.23 KiB)
```

Maybe the conclusion is that…
Yeah, I just lightly modified the same two-arg example from Slack to make this. But sure, if you can do that, go ahead!
Have opened JuliaDiff/DifferentiationInterface.jl#512 to keep track; will stop derailing this PR ^^
```diff
+    ad::ADTypes.AbstractADType, f, x, aux, out::DiffResults.MutableDiffResult
+)
+    grad_buf = DiffResults.gradient(out)
+    y, _ = DifferentiationInterface.value_and_gradient!(f, grad_buf, ad, x, Constant(aux))
```
You would benefit from DI's preparation mechanism, especially for backends like ForwardDiff and ReverseDiff
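A minimal sketch of how preparation could slot into a wrapper like the one above, assuming the sizes of `x` and of the constant context stay fixed between calls and a backend that supports DI's `Constant` contexts (the toy objective, the `AutoForwardDiff` backend, and all variable names here are illustrative, not the package's actual code):

```julia
using ADTypes, DifferentiationInterface, DiffResults, ForwardDiff

# Toy objective with an auxiliary (non-differentiated) input.
f(x, aux) = sum(abs2, x) + aux.offset

ad = AutoForwardDiff()
x = randn(5)
aux = (; offset=1.0)
out = DiffResults.GradientResult(x)

# Prepare once: the prep object caches backend-specific buffers sized for `x` and the context.
prep = DifferentiationInterface.prepare_gradient(f, ad, x, Constant(aux))

# Reuse the prep object on every subsequent call with same-sized inputs.
y, _ = DifferentiationInterface.value_and_gradient!(
    f, DiffResults.gradient(out), prep, ad, x, Constant(aux)
)
```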
Also for Mooncake.
@gdalle @willtebbutt In our case, upcoming features (like subsampling) will result in `aux.prob` changing at every iteration. Is it still valid to re-use the same `prep` object? Or is the idea that I should call `DI.prepare_gradient` every time before calling `DI.value_and_gradient`?
Mooncake.jl ought to be fine provided that the type doesn't change. Am I correct in assuming that in the case of subsampling, you're just changing the indices of the data which are subsampled at each iteration?
But @gdalle, is that true in general? For example, I imagine ReverseDiff or the upcoming Reactant support would result in an invalid `prep` object, no?
If the size of `x` and the size of the contents of `aux.prob` don't change, you should be good to go (except for ReverseDiff, as explained above). That would require the minibatches to always have the same length, I guess?
For most backends, what preparation does is allocate caches that correspond to the various differentiated inputs and contexts. These caches have a specific size which determines whether preparation can be reused, but then inside your function you can create more objects with arbitrary sizes, that's none of my business.
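To make the size constraint concrete, here is a small illustrative sketch (hypothetical function, backend, and sizes, not code from this PR): a prep object can be reused while the input keeps its size, and a differently sized input gets its own preparation.

```julia
using ADTypes, DifferentiationInterface, ForwardDiff

g(x) = sum(abs2, x)
ad = AutoForwardDiff()

# Prepare for length-5 inputs and reuse across iterations with the same size.
prep5 = DifferentiationInterface.prepare_gradient(g, ad, randn(5))
for _ in 1:3
    DifferentiationInterface.gradient(g, prep5, ad, randn(5))
end

# A differently sized input calls for fresh preparation.
x6 = randn(6)
prep6 = DifferentiationInterface.prepare_gradient(g, ad, x6)
DifferentiationInterface.gradient(g, prep6, ad, x6)
```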
> that's none of my business.

Haha, fair enough.

> (except for ReverseDiff, as explained above). That would require the minibatches to always have the same length, I guess?

I think compiled tapes actually need everything to be the same except `x`, no? But in any case, I think the best course of action would be to expose control over whether or not to use preparation. But I feel that would be a little too complicated for this PR alone.
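Purely as an illustration of that idea, and not AdvancedVI's or DI's actual API, such an opt-in switch could look roughly like the following sketch (the helper names, the `use_preparation` keyword, and the toy objective are all made up):

```julia
using ADTypes, DifferentiationInterface, ForwardDiff

# Hypothetical helpers: decide once whether to prepare, then reuse (or skip) on each call.
maybe_prepare(ad, f, x, aux; use_preparation::Bool=true) =
    use_preparation ? DifferentiationInterface.prepare_gradient(f, ad, x, Constant(aux)) : nothing

function grad_step(prep, ad, f, x, aux)
    prep === nothing &&
        return DifferentiationInterface.value_and_gradient(f, ad, x, Constant(aux))
    return DifferentiationInterface.value_and_gradient(f, prep, ad, x, Constant(aux))
end

# Toy usage with a made-up objective and backend.
f(x, aux) = sum(abs2, x) + aux.offset
ad, x, aux = AutoForwardDiff(), randn(4), (; offset=0.5)

prep = maybe_prepare(ad, f, x, aux; use_preparation=true)
y, g = grad_step(prep, ad, f, x, aux)        # prepared path
y2, g2 = grad_step(nothing, ad, f, x, aux)   # unprepared path
```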
Oh yeah, I had forgotten constants. As soon as you use constants, DI recompiles the tape every time anyway.
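For what it's worth, one way to keep a compiled ReverseDiff tape reusable is to avoid DI contexts altogether by closing over the auxiliary data, under the assumption that it really is fixed; a small sketch with made-up names:

```julia
using ADTypes, DifferentiationInterface, ReverseDiff

f(x, aux) = sum(abs2, x) + aux.offset
aux = (; offset=1.0)          # assumed to be genuinely constant across calls
fclosed = x -> f(x, aux)      # no DI context, so the compiled tape can be reused

ad = AutoReverseDiff(; compile=true)
x = randn(5)
prep = DifferentiationInterface.prepare_gradient(fclosed, ad, x)
DifferentiationInterface.value_and_gradient(fclosed, prep, ad, x)
```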
Ah, I see. Okay, then I guess there's nothing preventing us from using `prepare_gradient`.

In light of the discussion, I will merge this PR and maybe submit a separate PR that exposes control over whether to use preparation.
This removes the local custom AD machinery in favor of DifferentiationInterface, which meets our needs as of v0.6, as noted in #85. It also does a few minor things on the DifferentiationInterface side and renames Tapir to Mooncake.