Match CPU/GPU logic for start tangents #705

raphlinus · 2024-09-26T20:42:23Z

When encoding a start tangent for an end cap or closed segment, use logic designed to match the GPU computed tangent for the first segment.

Note: this changes the GPU calculation to write out the lerp calculation explicitly rather than use the mix intrinsic, so we can rely on the rounding behavior. In the presence of fastmath, the rounding behavior is not guaranteed, but it is verified to work on M1.

Fixes #704, unless we get bitten by fastmath.

When encoding a start tangent for an end cap or closed segment, use logic designed to match the GPU computed tangent for the first segment. Note: this changes the GPU calculation to write out the lerp calculation explicitly rather than use the mix intrinsic, so we can rely on the rounding behavior. In the presence of fastmath, the rounding behavior is not guaranteed, but it is verified to work on M1. Fixes #704, unless we get bitten by fastmath.

raphlinus · 2024-09-26T20:50:13Z

The current state of this is "works for me," and I'm feeling a bit uncertain whether it's fully portable. If we do get hit by fastmath issues, which is entirely plausible given the weak guarantees of the language, then I'm inclined to take this in a somewhat different direction: revert the encoding changes so that the final segment writes the unmodified point values, but change the GPU-side tangent calculation so that it's based on the encoded values rather than the cubic points after degree raising.

So I'm uploading this hoping to get more feedback. If it generally works, I think we can go forward.

DJMcNab · 2024-10-01T09:42:45Z

As I mentioned on Zulip, I still see some watertightness failures detected by #416 with this PR. Are those out-of-scope for this change?

I would also like to see a snapshot test for the longpathdash test scene added (ideally validated as showing the failure)

raphlinus · 2024-10-02T14:21:21Z

I'll dig into this. I do think fixing watertightness issues is in scope, but also would consider unblocking 0.3 if we have no known regressions.

Use the explicit `fma()` intrinsic to express affine transformations. This reduces the chance that the compiler will compile different instances of the same source expression with different association/rounding behavior, and in particular reduces the incidence of watertightness failures with a rotation transform.

raphlinus · 2024-10-02T21:19:37Z

I've done a fair amount of investigation.

I did implement the calculation of tangents before degree raising. It didn't make a significant difference, so I'm not sure it's worth preparing the patch. It is possible that on some GPU it would be an improvement.

The watertightness issues get significantly worse when there is a rotation transform. I have a patch (which I will upload as a commit to this PR) that changes transform_apply (in flatten.wgsl) to use fma() instead of multiplication and addition, and that behavior goes away. That is extremely strong evidence that we're being hit by fastmath, and that in particular the same calculation in the source is being compiled with different rounding behavior in different callsites.

I'm doing some more exploration, but I'm inclined to declare victory soon, as we have a state that is not a regression and is in fact considerably better (with rotations) than any previous state. To solve this completely systematically may require a profound rethink - generate path points once then use indirection to use the same point as both the line start of one segment and line end of another, which would guarantee watertightness by construction.

This adds a test. It's clipped to 250x250 otherwise it would hit the oversize file check, but I think it's good enough to catch serious regressions.

Also tweak threshold so it catches the failure.

raphlinus · 2024-10-02T22:15:55Z

As of the most recent commit, the new test is validated by failing on main.

DJMcNab

I can't meaningfully review the maths here, but I can confirm it does resolve the artifacts

vello_encoding/src/path.rs

Improve docstring Co-authored-by: Daniel McNab <[email protected]>

Adds #706 and #711 to the changelog, as they are somewhat major changes. I've not added #705, as it seemed to be a regression from 0.2.0 rather than a relevant change

PoignardAzur · 2024-10-05T10:51:18Z

Okay, so I'm a bit late to review this, but one nitpick:

This PR says it matches CPU and GPU logic. But the main change in shader code is:

    // Degree-raise
    if seg_type == PATH_TAG_LINETO {
        p3 = p1;
        p2 = p3 + (1.0 / 3.0) * (p0 - p3);
        p1 = p0 + (1.0 / 3.0) * (p3 - p0);
    } else if seg_type == PATH_TAG_QUADTO {
        p3 = p2;
        p2 = p1 + (1.0 / 3.0) * (p2 - p1);
        p1 = p1 + (1.0 / 3.0) * (p0 - p1);
    }

Whereas the equivalent code in the CPU shader in vello_shaders/src/cpu/flatten.rs is still:

    // Degree-raise
    if seg_type == PATH_TAG_LINETO {
        p3 = p1;
        p2 = p3.mix(p0, 1.0 / 3.0);
        p1 = p0.mix(p3, 1.0 / 3.0);
    } else if seg_type == PATH_TAG_QUADTO {
        p3 = p2;
        p2 = p1.mix(p2, 1.0 / 3.0);
        p1 = p1.mix(p0, 1.0 / 3.0);
    }

That is, same as the old GPU code.

Not sure what's going on here. Probably the CPU shader should be edited to match the change in the GPU shader?

DJMcNab · 2024-10-05T10:56:12Z

Those look the same to me? Rust doesn't have fast math.

Unless you're claiming that mix isn't the same operation as the CPU side implementation?

PoignardAzur · 2024-10-05T11:02:49Z

Well the GPU side went from using mix to explicitly using operators. I assumed there was a reason for that. Maybe I'm reading too much into it.

DJMcNab · 2024-10-08T08:09:03Z

Note: this changes the GPU calculation to write out the lerp calculation explicitly rather than use the mix intrinsic, so we can rely on the rounding behavior. In the presence of fastmath, the rounding behavior is not guaranteed, but it is verified to work on M1.

From the pr description

PoignardAzur · 2024-10-08T08:25:44Z

I should really read PR descriptions more often. I don't know when I got the habit of not doing that.

DJMcNab added this to the Vello 0.3 release milestone Sep 27, 2024

raphlinus added 2 commits October 2, 2024 14:55

Add longpathdash test

a0af67e

This adds a test. It's clipped to 250x250 otherwise it would hit the oversize file check, but I think it's good enough to catch serious regressions.

Change longpath test image size to include artifact

8853b28

Also tweak threshold so it catches the failure.

Additional tweaking to snapshot size and threshold

391b25d

DJMcNab approved these changes Oct 3, 2024

View reviewed changes

vello_encoding/src/path.rs Outdated Show resolved Hide resolved

Update vello_encoding/src/path.rs

bae8049

Improve docstring Co-authored-by: Daniel McNab <[email protected]>

raphlinus added this pull request to the merge queue Oct 3, 2024

Merged via the queue into main with commit 30c5e8e Oct 3, 2024
17 checks passed

raphlinus deleted the more_watertight branch October 3, 2024 16:10

DJMcNab mentioned this pull request Oct 4, 2024

Some final changelog items for 0.3.0 #714

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Match CPU/GPU logic for start tangents #705

Match CPU/GPU logic for start tangents #705

raphlinus commented Sep 26, 2024

raphlinus commented Sep 26, 2024

DJMcNab commented Oct 1, 2024

raphlinus commented Oct 2, 2024

raphlinus commented Oct 2, 2024

raphlinus commented Oct 2, 2024

DJMcNab left a comment

PoignardAzur commented Oct 5, 2024

DJMcNab commented Oct 5, 2024 •

edited

Loading

PoignardAzur commented Oct 5, 2024

DJMcNab commented Oct 8, 2024

PoignardAzur commented Oct 8, 2024

Match CPU/GPU logic for start tangents #705

Match CPU/GPU logic for start tangents #705

Conversation

raphlinus commented Sep 26, 2024

raphlinus commented Sep 26, 2024

DJMcNab commented Oct 1, 2024

raphlinus commented Oct 2, 2024

raphlinus commented Oct 2, 2024

raphlinus commented Oct 2, 2024

DJMcNab left a comment

Choose a reason for hiding this comment

PoignardAzur commented Oct 5, 2024

DJMcNab commented Oct 5, 2024 • edited Loading

PoignardAzur commented Oct 5, 2024

DJMcNab commented Oct 8, 2024

PoignardAzur commented Oct 8, 2024

DJMcNab commented Oct 5, 2024 •

edited

Loading