Use recursion to fix inference failure #78
Conversation
I'm verifying (here) that this branch is solely responsible, but it seems to reduce GPU allocations for us by a factor of 15 in the build https://buildkite.com/clima/climaatmos-ci/builds/17105#018e004b-661b-404c-96e9-fafa28d875ed. This basically fixes runtime dispatch issues that we're seeing (mentioned here, but in a different context).
Here is a reproducer:
import ClimaCore;
import ClimaComms;
using Test
import ClimaCore.Fields
import ClimaCore.Operators
import ClimaCore.Spaces
import ClimaCore.Geometry
using BenchmarkTools;
@isdefined(TU) || include(joinpath(pkgdir(ClimaCore), "test", "TestUtilities", "TestUtilities.jl"));
import .TestUtilities as TU;
FT = Float64;
zelem=25
cspace = TU.CenterExtrudedFiniteDifferenceSpace(FT;zelem, helem=10);
fspace = Spaces.FaceExtrudedFiniteDifferenceSpace(cspace);
const CT3 = Geometry.Contravariant3Vector
const C12 = ClimaCore.Geometry.Covariant12Vector
@show ClimaComms.device(cspace)
ᶜx = fill((;uₕ=zero(C12{FT}), ρ=FT(0)), cspace);
ᶠx = fill((;ᶠuₕ³=zero(CT3{FT})), fspace);
const ᶠwinterp = Operators.WeightedInterpolateC2F(
bottom = Operators.Extrapolate(),
top = Operators.Extrapolate(),
)
function set_ᶠuₕ³!(ᶜx, ᶠx)
ᶜJ = Fields.local_geometry_field(ᶜx).J
@. ᶠx.ᶠuₕ³ = ᶠwinterp(ᶜx.ρ * ᶜJ, CT3(ᶜx.uₕ))
return nothing
end
set_ᶠuₕ³!(ᶜx, ᶠx) # compile
p_allocated = @allocated set_ᶠuₕ³!(ᶜx, ᶠx)
@show p_allocated
using CUDA
using JET
using Adapt
# @test_opt ignored_modules = (CUDA,) set_ᶠuₕ³!(ᶜx, ᶠx)
import Base.Broadcast: broadcasted
ᶜJ = Fields.local_geometry_field(ᶜx).J
bc = broadcasted(ᶠwinterp, broadcasted(*, ᶜx.ρ, ᶜJ), broadcasted(CT3, ᶜx.uₕ));
Adapt.adapt(CUDA.KernelAdaptor(), bc.args);
# Call chain:
@test_opt ignored_modules = (CUDA,) Adapt.adapt(CUDA.KernelAdaptor(), bc.args);
@test_opt ignored_modules = (CUDA,) Adapt.adapt_structure(CUDA.KernelAdaptor(), bc.args);
@test_opt map(Adapt.adapt(CUDA.KernelAdaptor()), bc.args);
You may need to have […]. Any chance we can get this merged? cc @maleadt
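For readers without a GPU, here is a minimal CPU-only analogue of that last check, mirroring the map(Adapt.adapt(to), args) call above. DummyAdaptor is a hypothetical stand-in for CUDA.KernelAdaptor and is not part of Adapt.jl or the reproducer:
import Adapt
using JET, Test
struct DummyAdaptor end                                     # hypothetical stand-in for CUDA.KernelAdaptor
Adapt.adapt_storage(::DummyAdaptor, x::Array) = copy(x)     # adapt arrays by copying
# A Tuple mixing arrays, NamedTuples, and plain leaves, similar in shape to bc.args
args = (ones(3), (a = 1.0, b = (2, 3)), 0.5);
@test_opt map(Adapt.adapt(DummyAdaptor()), args)            # fails if JET detects runtime dispatch in the Tuple path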
Ok, an update on this: this PR is about half responsible for the allocations mentioned above. So, I'd really appreciate it if we could merge this.
Bluntly applying […]
It's the call to […]
If that resolves the problem, I think that would be a cleaner solution. The other methods should normally all be inlined already.
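The exact call being discussed above was lost when the inline code spans were stripped, but purely as an illustration of what applying @inline to an adapt-style method looks like (hypothetical names, not the change merged in this PR):
# Illustration only: forcing inlining so the compiler specializes the Tuple
# recursion instead of giving up and falling back to runtime dispatch.
@inline adapt_tuple_sketch(to, xs::Tuple) = map(x -> adapt_tuple_sketch(to, x), xs)
@inline adapt_tuple_sketch(to, x) = x       # leaf fallback: return the value unchanged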
force-pushed from a55877e to 73d96dd
Ok, I've updated the PR. It turns out that […]
Co-authored-by: Tim Besard <[email protected]>
I don't understand why, but it seems that the compiler is suddenly having issues with […], but the following definition
adapt_structure(to, xs::NamedTuple{names}) where {names} =
    NamedTuple{names}(adapt_structure(to, Tuple(xs)))
fixes things (and it still infers in my original example). I've pushed this fix. Is this alright @maleadt?
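For context, the one-liner above relies on a NamedTuple roundtripping through a plain Tuple with its names preserved; a quick REPL illustration with arbitrary values:
nt = (a = 1.0, b = (2, 3))
t = Tuple(nt)                           # (1.0, (2, 3)): names dropped
NamedTuple{(:a, :b)}(t)                 # (a = 1.0, b = (2, 3)): names restored
nt == NamedTuple{keys(nt)}(Tuple(nt))   # true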
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@            Coverage Diff             @@
##           master      #78      +/-   ##
==========================================
+ Coverage   93.65%   94.02%   +0.37%
==========================================
  Files           6        6
  Lines          63       67       +4
==========================================
+ Hits           59       63       +4
  Misses          4        4
☔ View full report in Codecov by Sentry.
You used […]
force-pushed from 190fc80 to 9b91675
This PR inlines, and uses recursion, to fix cases where Adapt.adapt performs runtime dispatch due to either recursion limits or the inability to unroll map on Tuples. I have a reproducer and will post it shortly. Closes #79.
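As a rough sketch of the technique the description refers to (illustrative names, not the exact code merged in this PR), replacing a map over a heterogeneous Tuple with explicit first/tail recursion lets the compiler unroll and infer each element:
# map-based version: can hit inference/recursion limits on heterogeneous Tuples
adapt_args_map(to, xs::Tuple) = map(x -> adapt_args_map(to, x), xs)
adapt_args_map(to, x) = x
# recursion-based version: peels one element at a time, so each call sees a
# concrete element type and the whole traversal can be unrolled at compile time
adapt_args_rec(to, xs::Tuple{}) = ()
@inline adapt_args_rec(to, xs::Tuple) =
    (adapt_args_rec(to, first(xs)), adapt_args_rec(to, Base.tail(xs))...)
adapt_args_rec(to, x) = x
adapt_args_rec(nothing, (1, (2.0, "x"), 3))   # (1, (2.0, "x"), 3)
Under this sketch, @code_warntype on the recursive form shows a concrete return type even for mixed-type Tuples; whether the map form infers depends on the element types and the compiler's recursion heuristics.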