optimize `memorynew` intrinsic for constant length Memory #55913
@gbaraldi so with LLVM assertions enabled I'm getting
which is on the line that does |
I'd print everyone involved here with the way I showed you yesterday |
This now works! For simple examples like |
As an example of what is possible, allocopt was able to go from

```
define i64 @julia_f_769() #0 !dbg !5 {
top:
  %pgcstack = call ptr @julia.get_pgcstack()
  %current_task1 = getelementptr inbounds i8, ptr %pgcstack, i64 -112, !dbg !14
  %memoryref_mem = call dereferenceable(40) ptr addrspace(10) @julia.gc_alloc_obj(ptr nonnull %current_task1, i64 40, ptr addrspace(10) addrspacecast (ptr @"+Core.GenericMemory#771.jit" to ptr addrspace(10))), !dbg !14
  %0 = addrspacecast ptr addrspace(10) %memoryref_mem to ptr addrspace(11), !dbg !14
  %1 = getelementptr inbounds { i64, ptr }, ptr addrspace(11) %0, i64 0, i32 1, !dbg !14
  %2 = call nonnull ptr @julia.pointer_from_objref(ptr addrspace(11) %0) #4, !dbg !14
  %3 = getelementptr inbounds i8, ptr %2, i64 16, !dbg !14
  store ptr %3, ptr addrspace(11) %1, align 8, !dbg !14
  store i64 3, ptr addrspace(11) %0, align 8, !dbg !14
  %memoryref_data4 = call ptr addrspace(13) @julia.gc_loaded(ptr addrspace(10) %memoryref_mem, ptr %3), !dbg !15
  store i64 2, ptr addrspace(13) %memoryref_data4, align 8, !dbg !15, !tbaa !20, !alias.scope !24, !noalias !27
  %memoryref_data11 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 8, !dbg !32
  store i64 4, ptr addrspace(13) %memoryref_data11, align 8, !dbg !32, !tbaa !20, !alias.scope !24, !noalias !27
  %memoryref_data18 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 16, !dbg !34
  store i64 5, ptr addrspace(13) %memoryref_data18, align 8, !dbg !34, !tbaa !20, !alias.scope !24, !noalias !27
  ret i64 11, !dbg !36
}
```

to the version below, which removes the allocation (and would likely allow it to just return the 11):

```
define i64 @julia_f_769() #0 !dbg !5 {
top:
  %memoryref_mem = alloca [40 x i8], align 16
  %pgcstack = call ptr @julia.get_pgcstack()
  %current_task1 = getelementptr inbounds i8, ptr %pgcstack, i64 -112, !dbg !14
  call void @llvm.lifetime.start.p0(i64 40, ptr %memoryref_mem)
  %0 = freeze [40 x i8] undef, !dbg !14
  store [40 x i8] %0, ptr %memoryref_mem, align 1, !dbg !14
  %1 = getelementptr inbounds { i64, ptr }, ptr %memoryref_mem, i64 0, i32 1, !dbg !14
  %2 = getelementptr inbounds i8, ptr %memoryref_mem, i64 16, !dbg !14
  store ptr %2, ptr %1, align 8, !dbg !14
  store i64 3, ptr %memoryref_mem, align 8, !dbg !14
  %memoryref_data4 = call ptr addrspace(13) @julia.gc_loaded(ptr addrspace(10) null, ptr %2), !dbg !15
  store i64 2, ptr addrspace(13) %memoryref_data4, align 8, !dbg !15, !tbaa !20, !alias.scope !24, !noalias !27
  %memoryref_data11 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 8, !dbg !32
  store i64 4, ptr addrspace(13) %memoryref_data11, align 8, !dbg !32, !tbaa !20, !alias.scope !24, !noalias !27
  %memoryref_data18 = getelementptr inbounds i8, ptr addrspace(13) %memoryref_data4, i64 16, !dbg !34
  store i64 5, ptr addrspace(13) %memoryref_data18, align 8, !dbg !34, !tbaa !20, !alias.scope !24, !noalias !27
  ret i64 11, !dbg !36
}
```
Can you please add an LLVM pass test for #56030 (comment) (removing all memory for a simple case where the Memory object doesn't escape)?
Do you want an actual LLVM pass, or can I just write a test for 0 allocations?
I think an LLVM test would be more robust, but probably a simple zero-allocation test would do the job as well.
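For readers following along, a zero-allocation test of the kind discussed here could look roughly like the sketch below. It is not the test that was added to the PR: the function name `no_escape_sum` is hypothetical, and the values simply mirror the IR example earlier in the thread (stores of 2, 4 and 5, returning 11). `Memory` requires Julia 1.11 or later, and the allocation check is only expected to pass on a build that includes the stack-allocation optimization.

```julia
using Test

# Hypothetical test function: the Memory is created, filled and read
# entirely inside the function, so it never escapes.
function no_escape_sum()
    m = Memory{Int}(undef, 3)
    m[1] = 2
    m[2] = 4
    m[3] = 5
    return m[1] + m[2] + m[3]
end

no_escape_sum()  # warm up so that compilation is not measured

@test no_escape_sum() == 11
# Assumption: this only holds on a Julia build with the optimization enabled.
@test (@allocated no_escape_sum()) == 0
```

An LLVM-level test would be stricter, since it checks the IR produced by the pass rather than the allocation counter at run time.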
LOL. This test is so good it broke a doctest in performance tips. We're testing to show that you get allocations if you have "bad" code that allocates arrays, but now it doesn't allocate 😆
This is now on top of #55995 (to figure out why we weren't optimizing correctly), but other than that, I think this is good to go!
Maybe a test of no allocations in simple cases as discussed above? 🙂
@nsajko thanks for the example you gave! Turns out it simplifies to
(which will get added to the tests) |
tuple test fixed (we had invalid TBAA on |
@nanosoldier |
@maleadt @KristofferC any idea why my nanosoldier seems to be stalled? |
It's not. There's other jobs running.
The run also served to test the new |
The package evaluation job you requested has completed - possible new issues were detected. |
@nanosoldier |
the NearestNeighbors issue isn't reproducing locally which is a little scary. |
The package evaluation job you requested has completed - possible new issues were detected. |
so I've investigated 3 of these so far, and 2 were package bugs, and one doesn't reproduce locally. we definitely are getting close |
@nanosoldier |
The package evaluation job you requested has completed - possible new issues were detected. |
I don't know if squashing the whole commit history here was great. It's hard to see how the different bug fixes were added and whether they have corresponding tests now, etc.
A concerning number of these are failing from within type inference, which is highly unfortunate. I think we're likely constant folding code that we're miscompiling, but it's hard to know for sure.
There are also these precompile failures in some of the failed packages:
I wonder if there is some corruption that happens that causes the precompile process to not work properly. |
Co-authored-by: Jameson Nash <[email protected]> Co-authored-by: Jeff Bezanson <[email protected]> Co-authored-by: Gabriel Baraldi <[email protected]>
Attempt to split up #55913 into 2 pieces. This piece now only adds the `memorynew` intrinsic without any of the optimizations enabled by #55913. As such, this PR should be ready to merge now. (and will make #55913 smaller and simpler) --------- Co-authored-by: gbaraldi <[email protected]>
closing in favor of #56847
replaces #55913 (the rebase was more annoying than starting from scratch)

This allows the compiler to better understand what's going on for `memorynew` with compile-time constant length, allowing for LLVM level escape analysis in some cases. There is more room to grow this: currently this only optimizes for fairly small Memory, since bigger ones would require writing some more LLVM code, and we probably want a size limit on putting Memory on the stack to avoid stack overflow. For larger ones, we could potentially inline the free so the Memory doesn't have to be swept by the GC, etc.

```
julia> function g()
           m = Memory{Int}(undef, 2)
           for i in 1:2
               m[i] = i
           end
           m[1]+m[2]
       end

julia> @btime g()
  9.735 ns (1 allocation: 48 bytes) #before
  1.719 ns (0 allocations: 0 bytes) #after
```
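To illustrate what "escape" means for this optimization, here is a minimal sketch contrasting a Memory that escapes with one that stays local. The names `SINK`, `escapes` and `stays_local` are hypothetical and not part of the PR. No particular byte counts are claimed: the numbers depend on the Julia version and on whether the optimization is present.

```julia
# Hypothetical global used only to force the Memory to escape.
const SINK = Ref{Any}(nothing)

# The Memory is stored in a global, so it outlives the call and
# has to stay on the heap.
function escapes()
    m = Memory{Int}(undef, 2)
    m[1] = 1
    m[2] = 2
    SINK[] = m
    return m[1] + m[2]
end

# The Memory never leaves the function and has a small constant length,
# which makes it a candidate for stack allocation.
function stays_local()
    m = Memory{Int}(undef, 2)
    m[1] = 1
    m[2] = 2
    return m[1] + m[2]
end

escapes(); stays_local()  # warm up so that compilation is not measured

println("escaping: ", @allocated(escapes()), " bytes")
println("local:    ", @allocated(stays_local()), " bytes")
```

The escaping version should always report a non-zero allocation, since the Memory is still reachable from `SINK` after the call returns.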
This speeds up making new `Memory`s and allows the compiler to better understand what's going on, allowing for LLVM level escape analysis in some cases. There is more room to grow this: currently this only optimizes for fairly small `Memory`, since bigger ones would require writing some more LLVM code, and we probably want a size limit on putting `Memory` on the stack to avoid stack overflow. For larger ones, we could potentially inline the `free` so the `Memory` doesn't have to be swept by the GC, etc.

Benchmarks: