Use ProjectTo in broadcasting & gradient #1044
Conversation
src/compiler/interface.jl
Outdated
@@ -73,7 +73,8 @@ julia> gradient([7, 11], 0, 1) do x, y, d
 """
 function gradient(f, args...)
   y, back = pullback(f, args...)
-  return back(sensitivity(y))
+  grad = back(sensitivity(y))
+  map(_project, args, grad)
Do we want this at the `gradient` or the `pullback` level?
My thinking was to start small! Applying it to `gradient` applies it to the user-facing calls, once. Applying it to `pullback` or `_pullback` inserts it into many more places internally... maybe it'll make `sin'''(1.0)` unhappy (see the note after the error output below).
One side-effect of this is that it makes this wrong answer into an error:

julia> gradient((x,y) -> sum(map(+,x,y)), [1,2], [3,4,5,6])  # before
([1, 1], [1, 1])

julia> gradient((x,y) -> sum(map(+,x,y)), [1,2], [3,4,5,6])  # after
ERROR: DimensionMismatch("variable with size(x) == (4,) cannot have a gradient with size(dx) == (2,)")
Stacktrace:
 [1] (::ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}}}}})(dx::Vector{Int64})
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/ySyqy/src/projection.jl:197
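A side note on the `sin'''(1.0)` example above, for readers following along: Zygote's `'` takes a derivative, so that expression nests differentiation three levels deep, exactly the kind of internal call a `pullback`-level projection would also hit. A quick sanity check, assuming stock Zygote:

julia> using Zygote

julia> sin'''(1.0)  # third derivative, computed by nested differentiation
-0.5403023058681398  # == -cos(1.0)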
That error seems awkward to me. Previously, the Julia behaviour of the function was the reason behind this gradient. Presumably the resulting gradient should be sized appropriately, not error.
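To spell out the reply above: Base's `map` over vectors stops at the shortest argument, like `zip` does, which is where the pre-PR gradient shape came from:

julia> map(+, [1, 2], [3, 4, 5, 6])  # extra elements of the longer vector are ignored
2-element Vector{Int64}:
 4
 6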
src/compiler/interface.jl
Outdated
@@ -95,11 +97,32 @@ true
 """
 function withgradient(f, args...)
   y, back = pullback(f, args...)
-  (val = y, grad = back(sensitivity(y)))
+  grad = back(sensitivity(y))
+  isnothing(grad) && return (val=y, grad=nothing)
Why is this check necessary?
You can't `map` over `nothing`.
@@ -45,18 +45,20 @@ function Base.reducedim_init(::typeof(identity), ::typeof(accum), A::AbstractArr
   Base.reducedim_initarray(A, region, nothing, Union{Nothing,eltype(A)})
 end

 trim(x, Δ) = reshape(Δ, ntuple(i -> size(Δ, i), Val(ndims(x))))
I think doing this makes `unbroadcast` less generic; we don't need to define projections here AFAICT. Let's retain the current definition.
What case exactly is not handled, if this is less generic?
It restricts it to what can be handled by `_project`, as opposed to simple sizes and lengths of arrays.
Those are broadly the same now, as of recent changes. `_project` will never throw a MethodError now.
Note that before the CRC changes, `_project` had extra methods to handle other cases.
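For context, a paraphrased sketch (not the exact Zygote source; names and details approximate) of the size-and-length-based reduction that `trim` supports in `unbroadcast`: sum the incoming gradient over the dimensions that broadcasting expanded, then reshape it back to the argument's dimensionality.

# Paraphrased sketch of a Zygote-style unbroadcast.
trim(x, Δ) = reshape(Δ, ntuple(i -> size(Δ, i), Val(ndims(x))))

function unbroadcast(x::AbstractArray, Δ)
    if size(x) == size(Δ)
        Δ                          # shapes already match
    elseif length(x) == length(Δ)
        trim(x, Δ)                 # same length, shape mismatch only
    else
        # sum over the dimensions broadcasting expanded (where size(x, i) == 1),
        # then cut the result back down to ndims(x)
        dims = ntuple(i -> size(x, i) == 1 ? i : ndims(Δ) + 1, Val(ndims(Δ)))
        trim(x, sum(Δ; dims = dims))
    end
end

The disagreement above is whether `_project` now covers these same size/length cases, which the recent ChainRulesCore changes are said to make true.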
src/compiler/chainrules.jl
Outdated
@inline function _project(x::Union{Numeric, Ref{<:Numeric}}, dx)
  wrap_chainrules_output(ProjectTo(x)(wrap_chainrules_input(dx)))
end
_project(x::AbstractArray, dx) = dx isa AbstractArray ? reshape(dx, axes(x)) : dx
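For reference, the `Numeric` branch delegates to ChainRulesCore's `ProjectTo`, which standardizes a cotangent onto the type of the primal. A small illustration (exact printing may vary by version):

julia> using ChainRulesCore

julia> ProjectTo([1.0, 2.0])(ComplexF64[3 + 4im, 5 + 6im])  # real primal: imaginary parts dropped
2-element Vector{Float64}:
 3.0
 5.0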
This can be broken down into a different method.
Can you write exactly what method you prefer? There are obviously always other ways to write things.
 ```
 """
 function gradient(f, args...)
   y, back = pullback(f, args...)
-  return back(sensitivity(y))
+  grad = back(sensitivity(y))
+  isnothing(grad) ? nothing : map(_project, args, grad)
 end
You can add a method to `_project` and avoid this change.
> You can add a method to `_project` and avoid this change

Can you write exactly what method that would be?
Something like `_project(x, ::Nothing) = nothing`, maybe.
This is easy to try:

julia> _project(x, ::Nothing) = nothing
_project (generic function with 1 method)

julia> map(_project, (1,2,3), nothing)
ERROR: MethodError: no method matching length(::Nothing)
Is there a reason not to pull all of broadcasting down into ChainRules.jl?
Why don't we give Zygote.Forward more love? It's better for neural networks.
One reason not to is that Zygote's un-fused broadcast might not be the last word here. Maybe you can write a fused forward broadcast in Diffractor, which would be hopelessly slow here in Zygote. I think there's a lot of exploring left to be done, unlike the basic rules in ChainRules, where we can write a pretty close to optimal rule once and let everything use it. Anyway, this PR has much more modest goals: in the linked Flux issues it comes pretty close to entirely removing the penalty for mixing up your eltypes, and it fixes a lot of Zygote issues about real/complex.
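To illustrate the eltype point (hedged: the REPL output here is illustrative, not taken from the PR), with projection a stray Float64 constant no longer promotes the gradient of a Float32 array:

julia> gradient(x -> sum(x .* 2.0), Float32[1, 2])  # 2.0 is a Float64
(Float32[2.0, 2.0],)  # the gradient keeps the argument's eltype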
Mixing eltypes is going to get really important as low-precision work picks up the pace. We shouldn't have to write custom passes for every operation related to 16-bit floats. Besides, it's good not to be opinionated and to guide users to be type stable. Wouldn't we expect complex numbers to have gradients with complex types? Changing that seems like a bug.
Yes, there have been rumours of mixed-precision training for ages. I don't see any obvious problem, though: it does not involve randomly mixing types and hoping that Julia's promotion will figure it out. Complex/real has been discussed at great length; this PR really isn't the place to argue it. If you think it's wrong you should open an issue on ChainRulesCore and make your case.
Err, they do? There would be a lot of broken tests if that were altered. I think you may have misunderstood what problem this projection solves. The first message has examples, and links to issues closed.
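A hedged example of what the projection actually does here (illustrative output): complex inputs keep complex gradients; it is real inputs whose complex cotangents get projected back to the reals.

julia> gradient(x -> abs2(x + im), 1.0)  # abs2(x + im) == x^2 + 1 for real x
(2.0,)  # real input, real gradient, not 2.0 + 2.0im

julia> gradient(abs2, 1.0 + im)  # complex input keeps a complex gradient
(2.0 + 2.0im,)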
Co-authored-by: Lyndon White <[email protected]>
I did respond on the slack where I'd mentioned wanting to take a look at it today.
This starts building ChainRulesCore's type projection story into how Zygote handles broadcasting, and into its user-facing functions. Projection is already called in some rules handled by ChainRules, but this applies it a bit more broadly.
After:
Before:
Replaces #965, or most of it.
~~Many tests will fail, including most of the FFT tests I think, since those tend to return a complex gradient for a real input.~~ FFT tests are unchanged.

Closes #342, closes #402. Fixes #917, fixes #431.
Closes FluxML/Flux.jl#886