You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As described in more detail in this gist not all types of loop carried dependencies (LCDs) can be detected. E.g., the carried dependency in this loop cannot be detected:
add eax, 1 ; A (deps: E-previous)
add eax, 1 ; B (deps: A)
add eax, 1 ; C (deps: B)
; swap eax, ebx (exploded version of xchg eax, ebx)
mov ecx, eax ; D (deps: C)
mov eax, ebx ; E (deps: F-previous)
mov ebx, ecx ; F (deps: D)
... since it takes two iterations for the loop to complete.
I think this can be fixed fairly simply - you just keep unrolling copies until for each node in the current copy the set of original nodes that feed into every node (i.e., the "reachable from" or "depends on" set) is the same as the previous iteration. So you'll need 3 copies rather than 2 in the common case (where there are no skipped a generation things like the above) - the 3rd copy just verifies that iteration is complete. In this case you'd need 4 copies. In general, the number of copies you need to create should be bounded by the size of the loop (although I don't have a proof of that), so small in practice.
The text was updated successfully, but these errors were encountered:
As described in more detail in this gist not all types of loop carried dependencies (LCDs) can be detected. E.g., the carried dependency in this loop cannot be detected:
... since it takes two iterations for the loop to complete.
I think this can be fixed fairly simply - you just keep unrolling copies until for each node in the current copy the set of original nodes that feed into every node (i.e., the "reachable from" or "depends on" set) is the same as the previous iteration. So you'll need 3 copies rather than 2 in the common case (where there are no skipped a generation things like the above) - the 3rd copy just verifies that iteration is complete. In this case you'd need 4 copies. In general, the number of copies you need to create should be bounded by the size of the loop (although I don't have a proof of that), so small in practice.
The text was updated successfully, but these errors were encountered: