Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache regression tests in workflow #495

Merged
merged 11 commits into from
Nov 14, 2023

Conversation

JumiDeluxe
Copy link
Member

Targets #476

Replaced building test/external/riscv-tests directory contents in every run, it is now done on cache miss.
They are now stored using workflow cache instead of artifacts.

@JumiDeluxe JumiDeluxe added the infrastructure CI, testing, etc. label Oct 30, 2023
@JumiDeluxe JumiDeluxe marked this pull request as ready for review October 30, 2023 13:59
Copy link
Contributor

@lekcyjna123 lekcyjna123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Only one change is needed.

.github/workflows/main.yml Outdated Show resolved Hide resolved
Comment on lines 36 to 37
${{ runner.os }}-build-
${{ runner.os }}-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to restore files from caches that do not match env.cache-name?

If I understand correctly, it could load files from other workflow caches that would use similar naming scheme (Linux*). I don't think this is helpful, and potentially could break things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I've added this cache before looking into this repo existing caches, it's true that this could cause problem in this case, also for future cases.

For now restore keys are:

restore-keys: |
            ${{ env.cache-name }}-${{ runner.os }}-${{ job.container.id }}-
            ${{ env.cache-name }}-${{ runner.os }}-

It is not surprising that including only the first restore key entry caused jobs to fail, as test running jobs executed in container with different ID than the ones building/caching them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in the second action we should depend only on ${{ env.cache-name }}-${{ runner.os }}- restore-keys to download the test? Exact key could never be matched, because we don't know the tests file hash - it will always fall back to restore key - which is the intended behaviour -> download the latest test files from current branch or master.

In this case some always invalid key: could be used. Then fileHash would also not crash the CI.

The ${{ job.container.id }}- should not be used here, because it is independent docker image, that doesn't change the uploaded files.

What do you think?

Copy link
Member Author

@JumiDeluxe JumiDeluxe Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed fileHash to calculate contents of test/external/riscv-tests directory.

It was empty before because it just tried to calculate hash from files that didn't exist before build. And after the change it crashed in fetching phase because it tried to follow encoding.h, which is a symlink to a submodule that wasn't fetched in that job.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid it is still incorrect :(

Build action after compilation pushes cache with id: ...-(riscv-toolchain docker id)-(hash of directory with asm test)
Then Run action downloads it, and pushes it again to second cache ...-(verilator docker id)-(hash of directory with only asm tests). This is the id that would be checked next time in Run action.

I don't know if it is the intended design, in my opinion pushing two caches is not necessary; and in Run action only last/correct cache from Build needs to be downloaded with the same key (and all following problems avoided) (we don't care about ventilator docker - because it is not used to build tests and not outputting anything to cache)

In any case, the current solution would not work when riscv-toolchain changes, because the second cache would still hit (id is not changed in any way)!

I'm not entirely sure about it, but it looks like if tests are updated, then Run action is a cache miss, but according to current priority it would select ${{ env.cache-name }}-${{ runner.os }}-${{ job.container.id }}- prefix first - so the same previous Run cache with different file hash, and not update to new build files.

There is also a warning in logs: Run regression tests Unexpected input(s) 'submodules', valid inputs are ['path', 'key', 'restore-keys', 'upload-chunk-size', 'enableCrossOsArchive', 'fail-on-cache-miss', 'lookup-only']
submodules should be in checkout stage, that also explains why file hashes are different in both actions.

@JumiDeluxe JumiDeluxe force-pushed the ci-cache-regression-tests branch 3 times, most recently from 9659cfc to e5eeddd Compare October 30, 2023 18:40
@JumiDeluxe JumiDeluxe requested a review from piotro888 October 30, 2023 18:50
@piotro888
Copy link
Member

piotro888 commented Oct 30, 2023

This was not working correctly because Build regression tests and Run regression tests uses different docker images. Currently, in second workflow, cache id is partially matched and re-uploaded with second docker id. (EDIT: It works, but if we want to change that, using hash of first Dockerfile would be needed instead of image id)

Also the last part of cache id (hashFiles) doesn't seem to work at all: from logs Cache saved with key: cache-regression-tests-Linux-d2ea333ee3c65f5c8876f565d3f8c7a5f3a7669f4cc2a67cd8a23c1f1fd36c63-

@JumiDeluxe JumiDeluxe force-pushed the ci-cache-regression-tests branch 4 times, most recently from d0ce05b to df5f9e0 Compare October 30, 2023 20:19
@JumiDeluxe JumiDeluxe force-pushed the ci-cache-regression-tests branch from df5f9e0 to 8061367 Compare October 30, 2023 20:38
@lekcyjna123 lekcyjna123 self-requested a review November 2, 2023 17:02
@lekcyjna123
Copy link
Contributor

Please check JumiDeluxe#1 where I added proposition to fix problems reported by @piotro888

@tilk
Copy link
Member

tilk commented Nov 5, 2023

I'm sorry this is this late, but I have a little objection. Artifacts do have one feature that caches don't have: they can be downloaded. This is somewhat useful here, as one can get the compiled tests without the need to have the toolchain installed locally. Is there any way to keep the cache, but still be able to download the files if needed?

Copy link
Member

@piotro888 piotro888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on meeting, it still doesn't work correctly.
The hashFiles part of key is different in Build than Run action when fetching cache.

@lekcyjna123 found the probable reason for it - when fetching submodules .git directory produces different hash each time.

This behaviour causes Run action to firstly fetch the cache from partial match via restore key and then reupload it with its own id at end of action (because of cache miss). This could cause fetching the outdated Run cache (that was reuploaded with Run id) next time, instead of Build cache (that could change and be updated).

Solution for it would be to exclude .git directory form hasFiles and the keys should always fully match (using ** at end of path was suggested on meeting to ignore dot-dirs (to verify) )

.github/workflows/main.yml Outdated Show resolved Hide resolved
.github/workflows/main.yml Outdated Show resolved Hide resolved
@JumiDeluxe JumiDeluxe marked this pull request as draft November 12, 2023 20:03
@JumiDeluxe JumiDeluxe force-pushed the ci-cache-regression-tests branch 2 times, most recently from 9801315 to b67f78e Compare November 13, 2023 09:27
@JumiDeluxe JumiDeluxe force-pushed the ci-cache-regression-tests branch 6 times, most recently from 61cfe04 to 4eb22e9 Compare November 13, 2023 11:20
test artifacts with dummy container id
@JumiDeluxe JumiDeluxe force-pushed the ci-cache-regression-tests branch from 4eb22e9 to b46df40 Compare November 13, 2023 11:23
@JumiDeluxe JumiDeluxe marked this pull request as ready for review November 13, 2023 11:23
@JumiDeluxe
Copy link
Member Author

I'm sorry this is this late, but I have a little objection. Artifacts do have one feature that caches don't have: they can be downloaded. This is somewhat useful here, as one can get the compiled tests without the need to have the toolchain installed locally. Is there any way to keep the cache, but still be able to download the files if needed?

I've added conditional step that uploads artifacts, it is executed on cache miss, after build

      - if: ${{ steps.cache-regression.outputs.cache-hit != 'true' }}
        run: cd test/external/riscv-tests && make

      - if: ${{ steps.cache-regression.outputs.cache-hit != 'true' }}
        name: Upload riscv-tests
        uses: actions/upload-artifact@v3
        with:
          path: test/external/riscv-tests

@JumiDeluxe JumiDeluxe requested a review from piotro888 November 13, 2023 11:28
@JumiDeluxe
Copy link
Member Author

JumiDeluxe commented Nov 13, 2023

❯ git ls-remote https://github.com/riscv-software-src/riscv-tests.git HEAD
bd0a19c136927eaa3b7296a591a896c141affb6b        HEAD

@JumiDeluxe JumiDeluxe force-pushed the ci-cache-regression-tests branch 2 times, most recently from 477e688 to a52a0ba Compare November 13, 2023 14:09
@JumiDeluxe JumiDeluxe force-pushed the ci-cache-regression-tests branch from a52a0ba to 1596cf6 Compare November 13, 2023 14:22
@JumiDeluxe JumiDeluxe force-pushed the ci-cache-regression-tests branch from 1596cf6 to 10d8b4c Compare November 13, 2023 14:26
@JumiDeluxe JumiDeluxe requested a review from piotro888 November 13, 2023 14:41
@JumiDeluxe JumiDeluxe force-pushed the ci-cache-regression-tests branch from a94d7a8 to d5e7e0f Compare November 13, 2023 15:04
@tilk tilk merged commit 8454fb8 into kuznia-rdzeni:master Nov 14, 2023
5 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infrastructure CI, testing, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants