Remove func blocks unifier indirections #774

tilk · 2024-12-10T16:18:41Z

This PR simplifies the announcement mechanism using the dependency system. At the same time, two layers of Collectors for accepting results were flattened to a single collector. The func_blocks_unifier module became trivial, and it might make sense to remove it later.

Benchmark results dropped slightly for some reason, ~~but device utilization also seems to be reduced~~.

github-actions · 2024-12-10T16:35:30Z

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ▼ 0.409 (-0.008) | ▼ 0.513 (-0.000) | ▼ 0.336 (-0.002) | ▼ 0.604 (-0.051) | ▼ 0.352 (-0.008) | ▼ 0.285 (-0.005) | ▼ 0.324 (-0.002) | ▼ 0.431 (-0.001) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 13517 (-726) | ▼ 4258 (-140) | 1456 (0) | 1164 (0) | ▼ 48 (-6) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 20438 (-3037) | ▼ 6873 (-140) | ▼ 1786 (-32) | 1216 (0) | ▼ 41 (-0) |

github-actions · 2024-12-11T10:14:29Z

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ▼ 0.408 (-0.008) | ▼ 0.525 (-0.000) | ▼ 0.368 (-0.002) | ▼ 0.589 (-0.042) | ▼ 0.350 (-0.009) | ▼ 0.287 (-0.004) | ▼ 0.326 (-0.002) | ▼ 0.438 (-0.001) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▲ 14379 (+55) | ▼ 4258 (-140) | 1456 (0) | 1164 (0) | ▼ 47 (-5) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▲ 24033 (+1866) | ▼ 6873 (-140) | ▲ 1818 (+32) | 1216 (0) | ▼ 36 (-9) |

piotro888 · 2024-12-13T22:05:57Z

test/backend/test_annoucement.py

@@ -116,30 +113,24 @@ async def producer(sim: TestbenchContext):

    async def consumer(self, sim: TestbenchContext):
        # TODO: this test doesn't do anything, fix it!


Is this comment up-to-date?

I believe so. The condition of the while loop looks to be false from the start. Maybe a negation was intended there?

After adding the negation, the test fails. Created issue #775 for this.

piotro888 · 2024-12-13T22:10:09Z

coreblocks/func_blocks/fu/common/rs_func_block.py

        self.insert.proxy(m, self.rs.insert)
        self.select.proxy(m, self.rs.select)
        self.update.proxy(m, self.rs.update)
-        self.get_result.proxy(m, collector.method)


self.get_result should be removed (and from the docstring too)

github-actions · 2024-12-14T10:47:18Z

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ▼ 0.408 (-0.008) | ▼ 0.525 (-0.000) | ▼ 0.368 (-0.002) | ▼ 0.589 (-0.042) | ▼ 0.350 (-0.009) | ▼ 0.287 (-0.004) | ▼ 0.326 (-0.002) | ▼ 0.438 (-0.001) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 14015 (-309) | ▼ 4258 (-140) | ▼ 1424 (-32) | 1164 (0) | ▲ 53 (+1) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 21828 (-339) | ▼ 6873 (-140) | ▲ 1818 (+32) | 1216 (0) | ▼ 36 (-9) |

github-actions · 2024-12-16T17:59:52Z

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ▼ 0.408 (-0.008) | ▼ 0.525 (-0.000) | ▼ 0.368 (-0.002) | ▼ 0.589 (-0.042) | ▼ 0.350 (-0.009) | ▼ 0.287 (-0.004) | ▼ 0.326 (-0.002) | ▼ 0.438 (-0.001) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 13297 (-2602) | ▼ 4258 (-140) | 1456 (0) | 1164 (0) | ▲ 53 (+0) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 24019 (-2116) | ▼ 6873 (-140) | ▲ 1818 (+32) | 1216 (0) | ▼ 38 (-3) |

github-actions · 2024-12-16T19:18:56Z

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ▼ 0.408 (-0.008) | ▼ 0.525 (-0.000) | ▼ 0.368 (-0.002) | ▼ 0.589 (-0.042) | ▼ 0.350 (-0.009) | ▼ 0.287 (-0.004) | ▼ 0.326 (-0.002) | ▼ 0.438 (-0.001) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▲ 17139 (+1240) | ▼ 4258 (-140) | ▼ 1424 (-32) | 1164 (0) | ▼ 47 (-7) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 21258 (-4877) | ▼ 6873 (-140) | 1786 (0) | 1216 (0) | ▼ 33 (-8) |

lekcyjna123 · 2024-12-25T15:58:38Z

The Fmax drop is a little bit worrying, and it looks like the FuncBlockUnifier is on the critical path now. I checked the synthesis results (https://github.com/kuznia-rdzeni/coreblocks/actions/runs/12359494095/job/34492356220) and it looks like:

- In sync->sync, most of the 30 ns critical path is spent travelling between different parts of the unifier.
- We should add registers on the wishbone input from GPIO. It takes 13 ns to route data from GPIO, and this impacts LSU scheduling.
- There is a path from wishbone GPIO through the LSU, FuncUnitResultKey_unifier, and RF to the CSR, all in one cycle.
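For reference, the relation between a reported Fmax and the length of the critical path can be checked with a quick calculation. This is a minimal sketch, assuming the Fmax column of the benchmark tables is in MHz (the tables do not state a unit); `period_ns` is a hypothetical helper name, not part of coreblocks.

```python
def period_ns(fmax_mhz: float) -> float:
    """Clock period in nanoseconds for a maximum frequency given in MHz."""
    return 1000.0 / fmax_mhz


# Fmax values reported by the "full" synthesis runs in this PR
# (assumed to be in MHz; the unit is not stated in the tables).
for fmax in (41, 38, 36, 33):
    print(f"{fmax} MHz -> {period_ns(fmax):.1f} ns")
```

Under that assumption, an Fmax of 33 corresponds to a period of roughly 30 ns, which is consistent with the critical path length quoted above.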

lekcyjna123 · 2024-12-25T16:14:35Z

coreblocks/func_blocks/interface/func_blocks_unifier.py

@@ -18,19 +17,10 @@ def __init__(
    ):
        self.rs_blocks = [(block.get_module(gen_params), block.get_optypes()) for block in blocks]

-        self.result_collector = Collector([block.get_result for block, _ in self.rs_blocks])


If I see correctly, removing that Collector causes all the FUs' get_result methods to be joined with the announcement methods (so with RS and RF), which makes scheduling more complex and the critical path longer. The Collector hides a Forwarder, which cuts the critical path on data.

lekcyjna123

LNGTM

The changes affect Fmax and make the Transactron network more complicated. Additionally, they remove buffers from the announcement path.

lekcyjna123 · 2024-12-25T16:19:06Z

coreblocks/interface/keys.py

@@ -52,6 +52,16 @@ class FetchResumeKey(UnifierKey, unifier=Collector):
    pass


+@dataclass(frozen=True)


Maybe we should start adding docstrings to our keys? In practice they are global variables, and we haven't documented them...
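As an illustration of what such documentation could look like, here is a minimal sketch. `ExampleResultKey` is a hypothetical stand-in: it is a plain frozen dataclass rather than a real `UnifierKey` subclass, and the docstring fields are only a suggested layout, not an existing project convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleResultKey:
    """Hypothetical key, used only to illustrate a possible docstring layout.

    Provided by: which modules register a method under this key.
    Consumed by: which module fetches the key and what it does with the method.
    Unifier: how several providers are combined, if more than one is allowed.
    """


# The docstring is available to tooling and in the REPL via help().
print(ExampleResultKey.__doc__)
```

Because keys act as global lookup points, stating the provider and the consumer in one place would make the wiring easier to follow without reading every module.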

lekcyjna123 · 2024-12-25T16:24:08Z

coreblocks/func_blocks/fu/common/rs_func_block.py

@@ -87,12 +85,9 @@ def elaborate(self, platform):
            m.submodules[f"func_unit_{n}"] = func_unit
            m.submodules[f"wakeup_select_{n}"] = wakeup_select

-        m.submodules.collector = collector = Collector([func_unit.accept for func_unit, _ in self.func_units])


This also complicates the Transactron network. It is probably connected with the observed IPC drop. Previously, when two results were ready in the same cycle, one was announced and the second was stored in the Forwarder for a cycle, which left the FU ready to process the next instruction. Now the FU has to stall until it can push out its result.
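The effect described above can be illustrated with a toy cycle-level model. This is only a sketch under stated assumptions: it is plain Python, not Transactron code; every unit finishes a result in cycle 0; the shared announcement port accepts one result per cycle; and the buffered variant models a one-entry buffer per unit that can pass a result through in the same cycle. The name `simulate` is hypothetical.

```python
def simulate(n_units: int, buffered: bool, max_cycles: int = 100) -> int:
    """Return the total number of cycles units spend blocked on a finished result."""
    holding = [True] * n_units    # every unit finishes a result in cycle 0 (assumption)
    buffers = [False] * n_units   # one-entry output buffer per unit (buffered mode only)
    stalls = 0
    for _ in range(max_cycles):
        if buffered:
            # A unit hands its result to its own empty buffer and is free again.
            for i in range(n_units):
                if holding[i] and not buffers[i]:
                    holding[i], buffers[i] = False, True
        # The announcement port drains either the buffers or the units directly.
        # `pending` aliases the chosen list, so clearing an entry updates it in place.
        pending = buffers if buffered else holding
        for i in range(n_units):
            if pending[i]:
                pending[i] = False  # one announcement per cycle, lowest index wins
                break
        # Units still holding a result at the end of the cycle were blocked.
        stalls += sum(holding)
        if not any(holding) and not any(buffers):
            break
    return stalls


print("direct:  ", simulate(2, buffered=False))
print("buffered:", simulate(2, buffered=True))
```

In this model, two units finishing together cost one blocked cycle in the direct case and none in the buffered case. The real IPC impact depends on how often results collide, which this sketch does not try to estimate.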

tilk added 4 commits December 10, 2024 16:29

Remove result collectors per RS block

c09fb33

Announcement key

3700089

Announce directly from func blocks

e0cc9bf

Lint and tests

4532478

tilk added the refactor and benchmark labels Dec 10, 2024

Merge branch 'master' into remove_func_blocks_unifier_indirections

1305549

piotro888 approved these changes Dec 13, 2024

tilk and others added 2 commits December 16, 2024 20:04

Remove remnants of get_result

576adff

LZA (Leading zeros anticipation) (kuznia-rdzeni#741)

df115b7

tilk force-pushed the remove_func_blocks_unifier_indirections branch from afc3583 to df115b7 Compare December 16, 2024 19:05

lekcyjna123 reviewed Dec 25, 2024

lekcyjna123 requested changes Dec 25, 2024
