compiler: ensure iterators are inlined in -opt=z #4512

eliasnaur · 2024-10-09T17:56:05Z

$ cat iter_test.go
package inlineiter

import (
	"iter"
	"testing"
)

var slice = make([]int, 1000)

func TestIterAllocs(t *testing.T) {
	tests := []struct {
		name string
		body func() int
	}{
		{
			"for-range-func",
			func() int {
				sum := 0
				for v := range Iterator([]int{0, 1, 2}) {
					sum += v
				}
				return sum
			},
		},
		{
			"callback-func",
			func() int {
				sum := 0
				Iterator([]int{0, 1, 2})(func(v int) bool {
					sum += v
					return true
				})
				return sum
			},
		},
	}
	for _, test := range tests {
		t.Run(test.name, func(t *testing.T) {
			allocs := testing.Benchmark(func(b *testing.B) {
				for range b.N {
					sum := test.body()
					if sum != 3 {
						t.Errorf("sum %d, want 3", sum)
					}
				}
			}).AllocsPerOp()
			if allocs > 0 {
				t.Errorf("Layout allocates %d, expected %d", allocs, 0)
			}
		})
	}
}

func Iterator(ints []int) iter.Seq[int] {
	return func(yield func(int) bool) {
		for _, v := range ints {
			if !yield(v) {
				break
			}
		}
	}
}
$ go test
PASS
ok  	seedhammer.com/inlineiter	4.616s
$ tinygo test
--- FAIL: TestIterAllocs (3.08s)
    --- FAIL: TestIterAllocs/for-range-func (1.58s)
        Layout allocates 5, expected 0
    --- FAIL: TestIterAllocs/callback-func (1.50s)
        Layout allocates 3, expected 0
FAIL
FAIL	seedhammer.com/inlineiter	3.297s
$ tinygo test -opt 2
ok  	seedhammer.com/inlineiter	3.211s

The idiom for iterator constructors is to return a closure that captures the iterator arguments, if any. It's crucial for the performance of range-over-func that the constructor doesn't incur allocations. Therefore, it seems to me functions that (nearly) immediately return a closure should always be inlined, regardless of optimization level. Or, at the very least, any function that returns a function that can be ranged over.

dgryski · 2024-10-21T19:56:40Z

Probably we'll need to figure out exactly which optimization pass(es) are responsible for this particular optimization and hope that turning just those ones on is sufficient to make this "always work".

aykevl · 2024-10-22T08:28:58Z

While I agree it would be great to fix this, in general you can't really rely on the compiler optimizing anything (including heap allocs).

Also note that -opt=z (the default) optimizes for code size, and not much else. So inlining would have to reduce or at least not increase code size. If you want to optimize for performance, you need to explicitly use -opt=2.

aykevl · 2024-10-22T09:50:35Z

I did some more investigation and it looks like everything is inlined as expected (even at -opt=z), but there's indeed one allocation left. What I suspect is happening is that while this allocation could be optimized away, the inlining happens after our heap-to-stack optimization pass.

I think the right way to fix this is by using the heap-to-stack transform that's part of the LLVM Attributor pass. Unfortunately that pass doesn't know that pointers returned by our runtime.alloc pass will never explicitly be freed. That means the heap-to-stack transform in the Attributor pass isn't as effective as it could be. I might write a patch to LLVM to fix this.

aykevl · 2024-10-22T11:33:13Z

Wrote a patch: llvm/llvm-project#113299

eliasnaur · 2024-10-23T15:46:37Z

While I agree it would be great to fix this, in general you can't really rely on the compiler optimizing anything (including heap allocs).

Also note that -opt=z (the default) optimizes for code size, and not much else. So inlining would have to reduce or at least not increase code size. If you want to optimize for performance, you need to explicitly use -opt=2.

"Performance" is actually not even that important for me. The deal breaker for my uses is the tendency of (typically small) frequent allocations to fragment the heap so much that out-of-memory panics occur when a larger allocation arrives. My current strategy is to eliminate allocations for frequently run code (GUI refreshes) which both avoids fragmentation and is faster.

A longer term strategy is to not fragment the heap so much, but as you wrote somewhere a moving collector is difficult and won't help performance.

Thank you for the LLVM patch. Do I have to do anything special on the TinyGo side to try it out, other than patching LLVM and adding the "nofree" attribute? Also, is the comment "for your own purpose, you can disable/filter out H2S AA explicitly when you construct the attributor" relevant to achieve better H2S in TinyGo?

aykevl · 2024-10-23T17:53:53Z

Do I have to do anything special on the TinyGo side to try it out, other than patching LLVM and adding the "nofree" attribute?

Yes. I have not tested this patch with TinyGo yet, some more LLVM work might be needed (I just don't know).
Also, I think the Attributor is opt-in at the moment so it would need to be enabled specifically.

Also, is the comment "for your own purpose, you can disable/filter out H2S AA explicitly when you construct the attributor" relevant to achieve better H2S in TinyGo?

I actually don't know what that comment really means, or what it is referring to. I've asked for clarification.
(Yes I know it can be disabled, but why are they saying that?)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compiler: ensure iterators are inlined in -opt=z #4512

compiler: ensure iterators are inlined in -opt=z #4512

eliasnaur commented Oct 9, 2024

dgryski commented Oct 21, 2024

aykevl commented Oct 22, 2024

aykevl commented Oct 22, 2024

aykevl commented Oct 22, 2024

eliasnaur commented Oct 23, 2024

aykevl commented Oct 23, 2024

compiler: ensure iterators are inlined in -opt=z #4512

compiler: ensure iterators are inlined in -opt=z #4512

Comments

eliasnaur commented Oct 9, 2024

dgryski commented Oct 21, 2024

aykevl commented Oct 22, 2024

aykevl commented Oct 22, 2024

aykevl commented Oct 22, 2024

eliasnaur commented Oct 23, 2024

aykevl commented Oct 23, 2024