New deferred_codegen implementation #582

Open
wants to merge 2 commits into
base: master
codecov bot commented May 13, 2024

Codecov Report

Attention: Patch coverage is 0% with 139 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (d68a7fc) to head (927fc30).

Files            Patch %   Lines
src/jlgen.jl     0.00%     77 Missing ⚠️
src/driver.jl    0.00%     57 Missing ⚠️
src/irgen.jl     0.00%     5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #582    +/-   ##
=======================================
  Coverage    0.00%   0.00%            
=======================================
  Files          24      24            
  Lines        3064    3190   +126     
=======================================
- Misses       3064    3190   +126     


@vchuravy
Member Author

vchuravy commented May 15, 2024

Okay this looks like the right direction

Using:

@noinline child(i) = i
kernel(i) = GPUCompiler.var"gpuc.deferred"(child, i)

This gets refined from

GPUCompiler.code_typed(job, optimize=false)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = GPUCompiler.:(var"gpuc.deferred")::Core.Const(GPUCompiler.var"gpuc.deferred")
│   %2 = Main.child::Core.Const(child)
│   %3 = (%1)(%2, i)::Ptr{Nothing}
└──      return %3
) => Ptr{Nothing}

to

GPUCompiler.code_typed(job, optimize=true)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = (GPUCompiler.var"gpuc.lookup")(MethodInstance for child(::Int64), Main.child, i)::Ptr{Nothing}
└──      return %1
) => Ptr{Nothing}

But codegen doesn't like what we are doing: it generates a julia.call to ijl_apply_generic, which involves boxing the argument.

;  @ /home/vchuravy/src/GPUCompiler/deferred.jl:13 within `kernel`
define i64 @julia_kernel_4280(i64 signext %0) local_unnamed_addr {
top:
  %1 = call {}*** @julia.get_pgcstack()
  %2 = bitcast {}*** %1 to {}**
  %current_task = getelementptr inbounds {}*, {}** %2, i64 -14
  %3 = bitcast {}** %current_task to i64*
  %world_age = getelementptr inbounds i64, i64* %3, i64 15
  %4 = call fastcc nonnull {}* @ijl_box_int64(i64 signext %0)
  %5 = call nonnull {}* ({}* ({}*, {}**, i32)*, {}*, ...) @julia.call({}* ({}*, {}**, i32)* @ijl_apply_generic, {}* inttoptr (i64 125916715977696 to {}*), {}* inttoptr (i64 125918231668992 to {}*), {}* inttoptr (i64 125916747896912 to {}*), {}* %4)
  %6 = bitcast {}* %5 to i64*
  %unbox = load i64, i64* %6, align 8
  ret i64 %unbox
}

@vchuravy
Member Author

Okay, much nicer: instead of refining to a Julia function, we go straight to an llvmcall.

;  @ /home/vchuravy/src/GPUCompiler/deferred.jl:13 within `kernel`
define i64 @julia_kernel_451(i64 signext %0) local_unnamed_addr {
top:
  %1 = call {}*** @julia.get_pgcstack()
  %2 = bitcast {}*** %1 to {}**
  %current_task = getelementptr inbounds {}*, {}** %2, i64 -14
  %3 = bitcast {}** %current_task to i64*
  %world_age = getelementptr inbounds i64, i64* %3, i64 15
  %4 = call i64 @gpuc.lookup({}* inttoptr (i64 132498188785376 to {}*), {}* inttoptr (i64 132498210393720 to {}*), i64 %0)
  ret i64 %4
}

@maleadt An added benefit is that this should handle invalidations of child correctly xD

@vchuravy
Member Author

@maleadt I left the old implementation alive since Enzyme is using it.

We could add a token to declare who owns it...

The addition of AbstractGPUCompiler is so that Enzyme can inherit the implementation here, and maybe customize it.
But I haven't thought that interaction fully through.

Potentially Enzyme ought to have an enz.lookup function instead, but then we need to somehow make the processing here extensible.
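
One way such extensibility could look is ordinary Julia dispatch on the compiler type. The sketch below is only an illustration under that assumption: `AbstractToyCompiler`, `ToyGPUCompiler`, `ToyEnzymeCompiler`, `lookup_intrinsic`, and `owned_calls` are hypothetical names that exist in neither GPUCompiler nor Enzyme.

```julia
# Hypothetical sketch: none of these names exist in GPUCompiler or Enzyme.
abstract type AbstractToyCompiler end

struct ToyGPUCompiler <: AbstractToyCompiler end
struct ToyEnzymeCompiler <: AbstractToyCompiler end

# Default: every compiler resolves calls to the shared lookup intrinsic.
lookup_intrinsic(::AbstractToyCompiler) = "gpuc.lookup"

# A downstream package overrides the intrinsic name for its own compiler.
lookup_intrinsic(::ToyEnzymeCompiler) = "enz.lookup"

# Generic processing step: return the indices of the call sites this compiler owns.
function owned_calls(compiler::AbstractToyCompiler, callees::Vector{String})
    name = lookup_intrinsic(compiler)
    return [i for (i, callee) in enumerate(callees) if callee == name]
end
```

With this shape, the intrinsic name doubles as the ownership token: each compiler only rewrites the call sites that carry its own name.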

@vchuravy
Member Author

But this now successfully turns

; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-linux-gnu"

;  @ /home/vchuravy/src/GPUCompiler/deferred.jl:13 within `kernel`
define i64 @julia_kernel_400(i64 signext %0) local_unnamed_addr {
top:
  %1 = call i64 @gpuc.lookup({}* inttoptr (i64 139647662117264 to {}*), {}* inttoptr (i64 139646283783472 to {}*), i64 %0)
  ret i64 %1
}

declare i64 @gpuc.lookup({}*, {}*, i64) local_unnamed_addr

!llvm.module.flags = !{!0, !1}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}

into

; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-linux-gnu"

;  @ /home/vchuravy/src/GPUCompiler/deferred.jl:13 within `kernel`
define i64 @julia_kernel_400(i64 signext %0) local_unnamed_addr {
top:
  ret i64 ptrtoint (i64 (i64)* @julia_child_465 to i64)
}

;  @ /home/vchuravy/src/GPUCompiler/deferred.jl:12 within `child`
; Function Attrs: noinline
define i64 @julia_child_465(i64 signext %0) local_unnamed_addr #0 {
top:
  ret i64 %0
}

attributes #0 = { noinline }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}

The source for both modules is:

@noinline child(i) = i
kernel(i) = GPUCompiler.var"gpuc.deferred"(child, i)
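
As a rough CPU-side analogy for what the resolved kernel returns (the address of the compiled `child`), Base's `@cfunction` also yields a raw pointer to compiled code for a fixed signature. This sketch does not use GPUCompiler or `gpuc.deferred`; it only illustrates the idea of handing out a function pointer that can be called later.

```julia
# CPU-side analogy only; this does not use GPUCompiler.
@noinline child(i) = i

# @cfunction compiles `child` for the given signature and yields a raw pointer,
# comparable to the kernel above returning the address of @julia_child_465.
ptr = @cfunction(child, Int, (Int,))

# The pointer can be invoked later without going through dynamic dispatch.
result = ccall(ptr, Int, (Int,), 42)
```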

@maleadt
Member

maleadt commented Jul 4, 2024

> @maleadt I left the old implementation alive since Enzyme is using it.

Why does Enzyme always require horrible things... Can't this just be a breaking release?

@vchuravy
Member Author

vchuravy commented Jul 4, 2024

> Why does Enzyme always require horrible things... Can't this just be a breaking release?

I wonder that myself... Yeah we can make this a breaking release, but I will need some more time to figure out how the Enzyme part will work.

@vchuravy
Member Author

@maleadt The specfunc name is "julia_##child#234_3583",
but the name in the IR is "julia___child_237_18493".

Any idea where that sanitization is happening?

@vchuravy
Member Author

Ah it's probably

for val in [collect(globals(mod)); collect(functions(mod))]
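
To illustrate the kind of renaming in question, here is a minimal sketch that assumes every character outside letters, digits, and underscores is replaced by an underscore. `sanitize_name` is a hypothetical helper, not GPUCompiler's actual implementation. It reproduces only the underscore pattern and does not account for the differing numeric suffixes in the two names quoted above.

```julia
# Hypothetical helper, not GPUCompiler's actual implementation: replace every
# character that is not a letter, digit, or underscore with an underscore.
sanitize_name(name::AbstractString) = replace(name, r"[^A-Za-z0-9_]" => "_")

# "##" becomes "__" and "#" becomes "_", giving the "julia___child_..." shape.
sanitized = sanitize_name("julia_##child#234_3583")
```

A lookup keyed on the unsanitized specfunc name would therefore miss the function in the module unless the same transformation is applied to the key.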

maleadt force-pushed the vc/rework_deferred_codegen branch from d2d7a73 to e532812 on July 24, 2024, 08:23
maleadt left a comment

LGTM! Rebased and cleaned-up a little.

How is this looking from the Enzyme.jl side, do we need to keep the old implementation around for now?

vchuravy force-pushed the vc/rework_deferred_codegen branch 2 times, most recently from b37223e to 5281e86, on September 26, 2024, 07:38
Member Author

vchuravy commented Sep 26, 2024

vchuravy changed the title from "Experiment with new deferred_codegen implementation" to "New deferred_codegen implementation" on Sep 27, 2024
Comment on lines +472 to +477
minfo = info.info
results = minfo.results
if length(results.matches) != 1
    return nothing
end
match = only(results.matches)
Member Author

Needs to be updated for 1.12
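
The guard in the reviewed lines (proceed only when exactly one method matches) can be sketched with public reflection alone. This is an analogy, not the PR's code: the PR inspects inference call-info objects, whose layout this example does not model, and `unique_method` and `g` are hypothetical names introduced here.

```julia
# Sketch using public reflection only; the PR works on inference call-info
# objects instead, which this example does not model.
function unique_method(f, argtypes::Type{<:Tuple})
    ms = methods(f, argtypes)
    # Bail out unless exactly one method matches, mirroring the guard above.
    length(ms) == 1 || return nothing
    return only(ms)
end

@noinline child(i) = i

# Two methods that both match an abstract `Real` argument.
g(x::Int) = 1
g(x::Float64) = 2
```

Here `unique_method(child, Tuple{Int})` yields the single `Method`, while `unique_method(g, Tuple{Real})` yields `nothing` because two methods match.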

vchuravy mentioned this pull request on Oct 2, 2024