Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nested gradient with hessian #1264

Open
YichengDWu opened this issue Jul 17, 2022 · 16 comments · May be fixed by #1270
Open

nested gradient with hessian #1264

YichengDWu opened this issue Jul 17, 2022 · 16 comments · May be fixed by #1270
Labels
second order zygote over zygote, or otherwise

Comments

@YichengDWu
Copy link

YichengDWu commented Jul 17, 2022

Reverse on forward on reverse:

julia> function f1(x, ps)  # [edit: renamed not to clash]
       hess = Zygote.hessian(x->sum(x.^3), x)
       return hess * x .+ ps.bias
       end
f1 (generic function with 1 method)

julia> x = rand(3)
3-element Vector{Float64}:
 0.9750274052932691
 0.015723824729741764
 0.9792305251283961

julia> ps = (;bias = rand(3))
(bias = [0.6184575461887033, 0.24789621977449272, 0.5451996227584986],)

julia> f1(x,ps)
3-element Vector{Float64}:
 6.322528192626253
 0.24937965175928256
 6.298554150817905

julia> Zygote.gradient(p -> sum(f1(x,p)), ps)
ERROR: Mutating arrays is not supported -- called setindex!(Matrix{Float64}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/dev/limitations.html#Array-mutation-1

Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:33
  [2] _throw_mutation_error(f::Function, args::Matrix{Float64})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:70
  [3] (::Zygote.var"#444#445"{Matrix{Float64}})(#unused#::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:82
  [4] (::Zygote.var"#2496#back#446"{Zygote.var"#444#445"{Matrix{Float64}}})(Δ::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:67
  [5] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:31 [inlined]
  [6] (::typeof((forward_jacobian)))(Δ::Tuple{Nothing, Matrix{Float64}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
  [7] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:44 [inlined]
  [8] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:43 [inlined]
  [9] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:76 [inlined]
 [10] (::typeof((hessian_dual)))(Δ::Matrix{Float64})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [11] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:74 [inlined]
 [12] Pullback
    @ .\REPL[119]:2 [inlined]
 [13] (::typeof((f)))(Δ::FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [14] Pullback
    @ .\REPL[123]:1 [inlined]
 [15] (::typeof((#97)))(Δ::Float64)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [16] (::Zygote.var"#60#61"{typeof((#97))})(Δ::Float64)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:41
 [17] gradient(f::Function, args::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:76
 [18] top-level scope
    @ REPL[123]:1

Forward on forward on reverse

julia> Zygote.forward_jacobian(p -> sum(f1(x,p)), ps)
ERROR: MethodError: no method matching size(::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
Closest candidates are:
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:567
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}, ::Integer) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:566
  size(::Union{LinearAlgebra.Cholesky, LinearAlgebra.CholeskyPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\cholesky.jl:494
  ...
Stacktrace:
 [1] seed(x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}, ::Val{1}, offset::Int64) (repeats 2 times)
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:7
 [2] forward_jacobian(f::var"#99#100", x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}, #unused#::Val{1})
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:29
 [3] forward_jacobian(f::Function, x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}; chunk_threshold::Int64)
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:44
 [4] forward_jacobian(f::Function, x::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:43
 [5] top-level scope
   @ REPL[124]:1

It would be great to support the second mode. Looks like it won't take too much to achieve that. If I change ps to a vector it can work smoothly.

julia> function f2(x, bias)
       hess = Zygote.hessian(x->sum(x.^3), x)
       return hess * x .+ bias
       end
f2 (generic function with 1 method)

julia> Zygote.forward_jacobian(p -> sum(f2(x,p)), rand(3))
(13.320223875130782, [1.0; 1.0; 1.0;;])
@YichengDWu
Copy link
Author

It boils down to generalizing Zygote.seed to NamedTuple

julia> Zygote.seed(ps,Val(12))
ERROR: MethodError: no method matching size(::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
Closest candidates are:
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:567
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}, ::Integer) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:566
  size(::Union{LinearAlgebra.Cholesky, LinearAlgebra.CholeskyPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\cholesky.jl:494
  ...
Stacktrace:
 [1] seed(x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}, ::Val{12}, offset::Int64) (repeats 2 times)
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:7
 [2] top-level scope
   @ REPL[160]:1
 [3] top-level scope
   @ C:\Users\Luffy\.julia\packages\CUDA\tTK8Y\src\initialization.jl:52

@mcabbott mcabbott added the second order zygote over zygote, or otherwise label Jul 19, 2022
@YichengDWu

This comment was marked as duplicate.

@mcabbott
Copy link
Member

mcabbott commented Jul 19, 2022

Zygote's jacobian function isn't Zygote-differentiable. There's no major barrier to making it so, someone just has to do it. I think the most relevant issue for this is #953 . That's pure-Zygote, unlike hessian which involves ForwardDiff... maybe this issue can be about that alone?

@YichengDWu
Copy link
Author

Mathematically, I did not differentiate the jacobian itself. The Jacobian here should be treated as a constant. This could be a dumb question but why can't Zygote tell from the input of Zygote.jacobian to avoid redundant differentiation?

@mcabbott
Copy link
Member

Zygote does not know this, unfortunately. It works backwards from the final result, in complete ignorance of which branches of the expression tree ultimately lead to gradients you do or do not want.

@ToucheSir
Copy link
Member

ToucheSir commented Jul 19, 2022

More specifically, Zygote will run the AD transform on any function which doesn't have an existing rrule/@adjoint (N.B: @nograd and @non_differentiable generate these automatically), and generate an output which is used in the primal computation. For your particular case, ChainRulesCore provides a manual stop gradient operation via https://juliadiff.org/ChainRulesCore.jl/stable/api.html#ChainRulesCore.@ignore_derivatives and co.

@YichengDWu
Copy link
Author

Ok, I get it now. What also confuses me is the error message here. It Mutating arrays is not supported not no adjoint for Zygote.jacobian. Quite misleading to me.

@ToucheSir
Copy link
Member

ToucheSir commented Jul 19, 2022

Just because Zygote fails on a function due to it using mutation does not mean the solution is to write an adjoint for it. Alternatives may include rewriting the offending bits to not use mutation, using Buffer, marking the function or a function higher up the call stack @non_differentiable, writing a custom rule for a function higher up the stack, etc.

@YichengDWu
Copy link
Author

YichengDWu commented Jul 19, 2022

In essence, It looks like the same kind of failure as the reverse on forward on reverse given that they have similar error messages.

For forward on forward on reverse, after adding a method to Zygote.seed it is working fine SciML/ComponentArrays.jl#149

@mcabbott
Copy link
Member

no adjoint for Zygote.jacobian. Quite misleading

The reason for this is that looking inside unknown functions is literally what it does for a living. It doesn't stop at the call to Zygote.jacobian because this is nothing special; it hopes to successfully figure out what happens inside. And it fails because the function makes an array & writes into it.

Many failures thus have the same error message. Zygote over ForwardDiff is its own ball of problems besides mutation.

@YichengDWu
Copy link
Author

Ah you're right, just a little whining

@YichengDWu
Copy link
Author

I have a piece of code that involves multiple errors and it took me a long time to isolate each one 🥲

@mcabbott
Copy link
Member

Oh I know. My only advice is to start small and add things...

I've marked the jacobian posts "duplicate" since these are exactly #1268 now.

@YichengDWu
Copy link
Author

Should I change the title of this issue to "hessian is not differentiable"?

@mcabbott
Copy link
Member

There are two hessian functions, and they are very short:

hessian(f, x) = hessian_dual(f, x)
hessian_dual(f, x::AbstractArray) = forward_jacobian(x -> gradient(f, x)[1], x)[2]
hessian_dual(f, x::Number) = ForwardDiff.derivative(x -> gradient(f, x)[1], x)
"""
hessian_reverse(f, x)
This should be equivalent to [`hessian(f, x)`](@ref hessian),
but implemented using reverse over reverse mode, all Zygote.
(This is usually much slower, and more likely to find errors.)
"""
hessian_reverse(f, x::AbstractArray) = jacobian(x -> gradient(f, x)[1], x)[1]
hessian_reverse(f, x::Number) = gradient(x -> gradient(f, x)[1], x)[1]

The all-zygote one would in principal be differentiable, if #1268 were solved.

julia> function f3(x, ps)
         hess = Zygote.hessian_reverse(x->sum(abs2, x.^3), x)
         return hess * x .+ ps.bias
       end
f3 (generic function with 1 method)

julia> Zygote.gradient(p -> sum(f3(x,p)), ps)
ERROR: Mutating arrays is not supported -- called copyto!(SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, ...)

However, 3rd order derivatives using Zygote are unlikely to ever be a good idea. I think the tests contain one like this, and it is very very slow:

julia> @time @eval sin'''(1.0)
120.620770 seconds (93.61 M allocations: 5.603 GiB, 13.76% gc time, 99.81% compilation time)
-0.5403023058681398

@YichengDWu
Copy link
Author

YichengDWu commented Jul 19, 2022

Sure it's slow. Forward over forward over reverse is more reasonable. So this goes back to supporting Zygote.seed(x::NamedTuple,...)

@YichengDWu YichengDWu linked a pull request Jul 23, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
second order zygote over zygote, or otherwise
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants