fix: Consistent image ID hashes across machines #711

AdrianGonz97 · 2024-04-01T17:33:53Z

This PR fixes an issue that we've been experiencing in huntabyte/shadcn-svelte#978 where the hash for the generated image ID is different when it's computed on different machines. The goal of that PR was to cache the images so that they can be utilized across our GH runners.

The source of the issue is the mtime stat from the image file, which seemingly differs from machine to machine, causing the hash to generate a different id, resulting in a cache MISS. Instead, mtime has been replaced for the size stat, which (in conjunction with the image config and file url) should provide sufficient uniqueness for the hash.

Quick Checklist

I have read the contributing guidelines
I have written new tests, as applicable (for bug fixes / features)
Docs have been added / updated (for bug fixes / features)
I have added a changeset, if applicable

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Bug fix
What is the new behavior (if this is a feature change)?
Implements consistent image ID hashing across machines
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this
PR?)
No
Other information:
I wasn't sure the best way to express this change in a test, so I've hardcoded the expected id string. I'm sure there's a better way to do it, so please feel free to modify it!

changeset-bot · 2024-04-01T17:33:57Z

🦋 Changeset detected

Latest commit: e43d13f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
vite-imagetools	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

benmccann · 2024-04-09T22:10:19Z

Hmm, that's really weird. I'm surprised the mtime would be different across machines. Are they different OSes? Or is the mtime set based upon when you check the repo out from git. I'd love to better understand the issue before we switch away from it

codecov · 2024-04-09T22:20:17Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.48%. Comparing base (8790098) to head (a4aec62).
Report is 2 commits behind head on main.

❗ Current head a4aec62 differs from pull request most recent head e43d13f. Consider uploading reports for the commit e43d13f to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #711      +/-   ##
==========================================
- Coverage   95.49%   95.48%   -0.01%     
==========================================
  Files          33       33              
  Lines        1288     1286       -2     
  Branches      226      226              
==========================================
- Hits         1230     1228       -2     
  Misses         58       58

Flag	Coverage Δ
imagetools-core	`95.48% <100.00%> (-0.01%)`	⬇️
vite-imagetools	`95.48% <100.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

AdrianGonz97 · 2024-04-09T22:26:15Z

Hmm, that's really weird. I'm surprised the mtime would be different across machines. Are they different OSes?

It's definitely strange 😄. The OSes are set to be the same as they're running from the same workflow, so I don't think it's that. Perhaps it's more related to timezones? 🤔

Or is the mtime set based upon when you check the repo out from git

I'd be happy to dig into this question when I get a chance!

AdrianGonz97 · 2024-04-10T01:15:11Z

Or is the mtime set based upon when you check the repo out from git

It seems like this assumption was correct, @benmccann! After checking out the repo on 2 separate instances (on the same machine), I'm ending up with two different mtime stats.

File stats 1:

{
  stats: Stats {
    dev: 2080,
    mode: 33188,
    nlink: 1,
    uid: 1000,
    gid: 1000,
    rdev: 0,
    blksize: 4096,
    ino: 768440,
    size: 159643,
    blocks: 312,
    atimeMs: 1712710759640.3115,
    mtimeMs: 1712710733650.3157,
    ctimeMs: 1712710733650.3157,
    birthtimeMs: 1712710733650.3157,
    atime: 2024-04-10T00:59:19.640Z,
    mtime: 2024-04-10T00:58:53.650Z,
    ctime: 2024-04-10T00:58:53.650Z,
    birthtime: 2024-04-10T00:58:53.650Z
  }
}

File stats 2:

{
  stats: Stats {
    dev: 2080,
    mode: 33188,
    nlink: 1,
    uid: 1000,
    gid: 1000,
    rdev: 0,
    blksize: 4096,
    ino: 1598355,
    size: 159643,
    blocks: 312,
    atimeMs: 1712711152420.249,
    mtimeMs: 1712711150750.2493,
    ctimeMs: 1712711150750.2493,
    birthtimeMs: 1712711150750.2493,
    atime: 2024-04-10T01:05:52.420Z,
    mtime: 2024-04-10T01:05:50.750Z,
    ctime: 2024-04-10T01:05:50.750Z,
    birthtime: 2024-04-10T01:05:50.750Z
  }
}

benmccann · 2024-04-28T21:05:36Z

Sorry for the long delay. I'm afraid I'd forgotten about this PR.

I'm nervous about using the file size because you could hit false positives with it. I'm not sure how expensive it would be to compute a hash for every file, but it seems safer.

I had an idea that we could use mtime first and then fallback to the hash if it's different, but that probably wouldn't work because it would require us storing two hashes.

AdrianGonz97 · 2024-04-29T19:59:13Z

I'm nervous about using the file size because you could hit false positives with it.

Fair enough!

I'm not sure how expensive it would be to compute a hash for every file, but it seems safer.

I agree. Testing locally, I modified it to use imageBuffer instead, which had no perceivable performance impact relative to the rest of build.

I also did some microbenchmarking for it (just in case), testing different image sizes to see their impact:

┌─────────┬───────────────────────────────────┬───────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                         │ ops/sec   │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼───────────────────────────────────┼───────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'hash with size - base (500kb)'   │ '231,842' │ 4313.26988522398   │ '±0.63%' │ 231843  │
│ 1       │ 'hash with buffer - base (500kb)' │ '5,979'   │ 167226.18963211094 │ '±0.23%' │ 5980    │
│ 2       │ 'hash with buffer - 11mb'         │ '165'     │ 6036291.090361429  │ '±0.43%' │ 166     │
│ 3       │ 'hash with buffer - 5mb'          │ '311'     │ 3215181.397435922  │ '±0.31%' │ 312     │
│ 4       │ 'hash with buffer - 2.5mb'        │ '1,099'   │ 909751.2799999929  │ '±0.20%' │ 1100    │
│ 5       │ 'hash with buffer - 1mb'          │ '2,599'   │ 384681.17346155015 │ '±0.40%' │ 2600    │
│ 6       │ 'hash with buffer - 500kb'        │ '5,875'   │ 170197.13359428805 │ '±0.28%' │ 5876    │
│ 7       │ 'hash with buffer - 350kb'        │ '6,425'   │ 155626.61889200658 │ '±0.25%' │ 6426    │
│ 8       │ 'hash with buffer - 100kb'        │ '16,379'  │ 61051.895238088415 │ '±0.44%' │ 16380   │
└─────────┴───────────────────────────────────┴───────────┴────────────────────┴──────────┴─────────┘

Here's the repo for it too: https://github.com/AdrianGonz97/imagetools-hashing-benchmarks

Hashing with the size (or mtime) stat is significantly faster, however, I would consider hashing with the image to be sufficiently fast enough for our needs. Especially since this operation doesn't execute hundreds of thousands of times per build.

So in practice, I think it's completely fine to hash with the image itself (even without the mtime optimization you described) as the performance impact is miniscule when compared to the rest of the build.

Side note: While writing the benchmark, I noticed that generateImageID is async when it doesn't need to be. ~~Do you have any idea why that's the case?~~ Scratch that, looking back at the history, it seems like it used to await a promise before the caching PR was introduced. Not a problem, I can fix that.

packages/vite/src/utils.ts

benmccann

thanks for all the changes and the thorough job!

AdrianGonz97 added 3 commits April 1, 2024 12:52

add failing test

003c66a

shouldn't depend on mtime

2791ba2

changeset

4e28950

AdrianGonz97 added 2 commits April 1, 2024 16:56

replace mtime stat for size

86b6e0e

update test

d572214

hash with the image instead

57f390a

remove async

a4aec62

benmccann reviewed Apr 30, 2024

View reviewed changes

packages/vite/src/utils.ts Outdated Show resolved Hide resolved

AdrianGonz97 added 2 commits April 30, 2024 16:14

hash the image buffer once

9a64cfd

oops, move it up

edda6c7

benmccann reviewed Apr 30, 2024

View reviewed changes

packages/vite/src/utils.ts Outdated Show resolved Hide resolved

remove url from hash

e43d13f

benmccann approved these changes Apr 30, 2024

View reviewed changes

benmccann merged commit f049b7f into JonasKruckenberg:main Apr 30, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Consistent image ID hashes across machines #711

fix: Consistent image ID hashes across machines #711

AdrianGonz97 commented Apr 1, 2024 •

edited

Loading

changeset-bot bot commented Apr 1, 2024 •

edited

Loading

benmccann commented Apr 9, 2024

codecov bot commented Apr 9, 2024 •

edited

Loading

AdrianGonz97 commented Apr 9, 2024 •

edited

Loading

AdrianGonz97 commented Apr 10, 2024 •

edited

Loading

benmccann commented Apr 28, 2024

AdrianGonz97 commented Apr 29, 2024 •

edited

Loading

benmccann left a comment

fix: Consistent image ID hashes across machines #711

fix: Consistent image ID hashes across machines #711

Conversation

AdrianGonz97 commented Apr 1, 2024 • edited Loading

changeset-bot bot commented Apr 1, 2024 • edited Loading

🦋 Changeset detected

benmccann commented Apr 9, 2024

codecov bot commented Apr 9, 2024 • edited Loading

Codecov Report

AdrianGonz97 commented Apr 9, 2024 • edited Loading

AdrianGonz97 commented Apr 10, 2024 • edited Loading

benmccann commented Apr 28, 2024

AdrianGonz97 commented Apr 29, 2024 • edited Loading

benmccann left a comment

Choose a reason for hiding this comment

AdrianGonz97 commented Apr 1, 2024 •

edited

Loading

changeset-bot bot commented Apr 1, 2024 •

edited

Loading

codecov bot commented Apr 9, 2024 •

edited

Loading

AdrianGonz97 commented Apr 9, 2024 •

edited

Loading

AdrianGonz97 commented Apr 10, 2024 •

edited

Loading

AdrianGonz97 commented Apr 29, 2024 •

edited

Loading