Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: ensure iterators are inlined in -opt=z #4512

Open
eliasnaur opened this issue Oct 9, 2024 · 6 comments
Open

compiler: ensure iterators are inlined in -opt=z #4512

eliasnaur opened this issue Oct 9, 2024 · 6 comments

Comments

@eliasnaur
Copy link
Contributor

$ cat iter_test.go
package inlineiter

import (
	"iter"
	"testing"
)

var slice = make([]int, 1000)

func TestIterAllocs(t *testing.T) {
	tests := []struct {
		name string
		body func() int
	}{
		{
			"for-range-func",
			func() int {
				sum := 0
				for v := range Iterator([]int{0, 1, 2}) {
					sum += v
				}
				return sum
			},
		},
		{
			"callback-func",
			func() int {
				sum := 0
				Iterator([]int{0, 1, 2})(func(v int) bool {
					sum += v
					return true
				})
				return sum
			},
		},
	}
	for _, test := range tests {
		t.Run(test.name, func(t *testing.T) {
			allocs := testing.Benchmark(func(b *testing.B) {
				for range b.N {
					sum := test.body()
					if sum != 3 {
						t.Errorf("sum %d, want 3", sum)
					}
				}
			}).AllocsPerOp()
			if allocs > 0 {
				t.Errorf("Layout allocates %d, expected %d", allocs, 0)
			}
		})
	}
}

func Iterator(ints []int) iter.Seq[int] {
	return func(yield func(int) bool) {
		for _, v := range ints {
			if !yield(v) {
				break
			}
		}
	}
}
$ go test
PASS
ok  	seedhammer.com/inlineiter	4.616s
$ tinygo test
--- FAIL: TestIterAllocs (3.08s)
    --- FAIL: TestIterAllocs/for-range-func (1.58s)
        Layout allocates 5, expected 0
    --- FAIL: TestIterAllocs/callback-func (1.50s)
        Layout allocates 3, expected 0
FAIL
FAIL	seedhammer.com/inlineiter	3.297s
$ tinygo test -opt 2
ok  	seedhammer.com/inlineiter	3.211s

The idiom for iterator constructors is to return a closure that captures the iterator arguments, if any. It's crucial for the performance of range-over-func that the constructor doesn't incur allocations. Therefore, it seems to me functions that (nearly) immediately return a closure should always be inlined, regardless of optimization level. Or, at the very least, any function that returns a function that can be ranged over.

@dgryski
Copy link
Member

dgryski commented Oct 21, 2024

Probably we'll need to figure out exactly which optimization pass(es) are responsible for this particular optimization and hope that turning just those ones on is sufficient to make this "always work".

@aykevl
Copy link
Member

aykevl commented Oct 22, 2024

While I agree it would be great to fix this, in general you can't really rely on the compiler optimizing anything (including heap allocs).

Also note that -opt=z (the default) optimizes for code size, and not much else. So inlining would have to reduce or at least not increase code size. If you want to optimize for performance, you need to explicitly use -opt=2.

@aykevl
Copy link
Member

aykevl commented Oct 22, 2024

I did some more investigation and it looks like everything is inlined as expected (even at -opt=z), but there's indeed one allocation left. What I suspect is happening is that while this allocation could be optimized away, the inlining happens after our heap-to-stack optimization pass.

I think the right way to fix this is by using the heap-to-stack transform that's part of the LLVM Attributor pass. Unfortunately that pass doesn't know that pointers returned by our runtime.alloc pass will never explicitly be freed. That means the heap-to-stack transform in the Attributor pass isn't as effective as it could be. I might write a patch to LLVM to fix this.

@aykevl
Copy link
Member

aykevl commented Oct 22, 2024

Wrote a patch: llvm/llvm-project#113299

@eliasnaur
Copy link
Contributor Author

While I agree it would be great to fix this, in general you can't really rely on the compiler optimizing anything (including heap allocs).

Also note that -opt=z (the default) optimizes for code size, and not much else. So inlining would have to reduce or at least not increase code size. If you want to optimize for performance, you need to explicitly use -opt=2.

"Performance" is actually not even that important for me. The deal breaker for my uses is the tendency of (typically small) frequent allocations to fragment the heap so much that out-of-memory panics occur when a larger allocation arrives. My current strategy is to eliminate allocations for frequently run code (GUI refreshes) which both avoids fragmentation and is faster.

A longer term strategy is to not fragment the heap so much, but as you wrote somewhere a moving collector is difficult and won't help performance.

Thank you for the LLVM patch. Do I have to do anything special on the TinyGo side to try it out, other than patching LLVM and adding the "nofree" attribute? Also, is the comment "for your own purpose, you can disable/filter out H2S AA explicitly when you construct the attributor" relevant to achieve better H2S in TinyGo?

@aykevl
Copy link
Member

aykevl commented Oct 23, 2024

Do I have to do anything special on the TinyGo side to try it out, other than patching LLVM and adding the "nofree" attribute?

Yes. I have not tested this patch with TinyGo yet, some more LLVM work might be needed (I just don't know).
Also, I think the Attributor is opt-in at the moment so it would need to be enabled specifically.

Also, is the comment "for your own purpose, you can disable/filter out H2S AA explicitly when you construct the attributor" relevant to achieve better H2S in TinyGo?

I actually don't know what that comment really means, or what it is referring to. I've asked for clarification.
(Yes I know it can be disabled, but why are they saying that?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants