Use a template for Android test spec #7091
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7091
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit ac48848 with merge base 0a12e33.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot drci
Nevermind, I got the number wrong. I set up the job so that it retries 3 times when it fails (not finding the JSON results), so that 2-hour figure was the time between the first and last upload. Individually, each upload took only about 5 minutes. It's a false alarm.
Looking carefully at https://github.com/pytorch/executorch/actions/runs/12043268246/job/33579733419, it looks like the test finished successfully, but the teardown done by AWS failed and I didn't see any result JSON in the spec.
I think the test runs sort of ok on Samsung S24 https://github.com/pytorch/executorch/actions/runs/12053919612/job/33614078559. The large model is not a blocker anymore, which is the scope of this PR. I don't see any benchmark results JSON file in the output though. Maybe there is a problem in the benchmark app, which could be investigated separately after we land this PR. Note that the smaller stories model runs fine and returns the benchmark results as usual https://github.com/pytorch/executorch/actions/runs/12053946075, so I'm confident that this new spec works correctly.
.github/workflows/android-perf.yml
Outdated
```diff
 export-models:
   name: export-models
-  uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+  uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
```
What's new in _v2.yml?
It supports manylinux 2.28 (what PyTorch is moving to). For example, arm binaries are now using this new format #7080. During my testing, I saw an error on this job about the old vs. new wheel format, so I just moved it to v2.
This has been delayed on the PyTorch side until the next release, so it's not strictly needed right now. Let me put it back.
On the other hand, keeping v2 is also fine because manylinux 2.28 is backward compatible with the current older format.
```diff
@@ -45,50 +29,3 @@ jobs:
     models: stories110M
     devices: samsung_galaxy_s22
     delegates: xnnpack
-    test_spec: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/android-llm-device-farm-test-spec.yml

   upload-android-test-spec:
```
It seems the upload step is moved to android-perf.yml right after the test spec is specialized and instantiated, without validation. Given that, do we still need to keep a validation step here? How would it catch and prevent a bad test spec from being uploaded and used?
In this new setup, the specialized test spec will just be an artifact of the current job, uploaded to https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/${{ matrix.model }}_${{ matrix.delegate }}/android-llm-device-farm-test-spec.yml. If the spec is bad, the subsequent benchmark job will fail (and give a red signal on the PR).
The validation workflow here now acts just as a trigger to call the android-perf workflow when the spec template changes.
Finally, the upload step here is being deleted (not moved to android-perf) because there will no longer be a single shared spec file at s3://ossci-android/executorch/android-llm-device-farm-test-spec.yml. There couldn't be, because each specialized spec now has a different model+delegate S3 link, for example https://github.com/pytorch/executorch/actions/runs/12053946075/job/33613172395#step:10:46
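For reference, a minimal sketch of what the specialize-and-upload steps could look like in android-perf.yml. The step names, the `{{ model_artifact }}` placeholder, the template filename, and the use of the seemethere/upload-artifact-s3 action (used elsewhere in PyTorch CI) are all assumptions for illustration, not the exact code in this PR:

```yaml
# Hypothetical excerpt: fill in the spec template for each matrix entry,
# then upload it under a per-model/delegate prefix in the artifacts bucket.
- name: Specialize Android test spec
  shell: bash
  run: |
    # Replace the assumed {{ model_artifact }} placeholder with this
    # matrix entry's model artifact URL.
    MODEL_URL="https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/${{ matrix.model }}_${{ matrix.delegate }}/model.zip"
    sed "s|{{ model_artifact }}|${MODEL_URL}|g" \
      android-llm-device-farm-test-spec.yml.template \
      > android-llm-device-farm-test-spec.yml

- name: Upload the specialized spec
  uses: seemethere/upload-artifact-s3@v5
  with:
    s3-bucket: gha-artifacts
    s3-prefix: ${{ github.repository }}/${{ github.run_id }}/artifacts/${{ matrix.model }}_${{ matrix.delegate }}
    retention-days: 14
    path: android-llm-device-farm-test-spec.yml
```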
Maybe this can be simplified and merged into android-perf.yml by triggering a run when the test spec template is modified?
We could do that, but the job would need to use the default value in CRON_DEFAULT_MODELS, and it seems overkill to me to test them all just to validate the spec. But I think I could tweak it a bit to use a different value for pull_request and keep the full list for scheduled jobs.
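Something along these lines could do that. This is a hypothetical sketch; the job/step names and the trimmed PR model list are assumptions, and CRON_DEFAULT_MODELS is assumed to be a workflow-level env var:

```yaml
# Hypothetical sketch: trim the model matrix on pull_request, but keep the
# full cron default list for scheduled benchmark runs.
set-parameters:
  runs-on: ubuntu-22.04
  outputs:
    models: ${{ steps.set-models.outputs.models }}
  steps:
    - name: Pick the model list
      id: set-models
      shell: bash
      run: |
        if [[ "${{ github.event_name }}" == "schedule" ]]; then
          # Scheduled runs keep the full benchmark coverage.
          echo "models=${CRON_DEFAULT_MODELS}" >> "${GITHUB_OUTPUT}"
        else
          # One small model is enough to validate the spec template on PRs.
          echo "models=stories110M" >> "${GITHUB_OUTPUT}"
        fi
```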
Here is the example run without upload-android-test-specs.yml: https://github.com/pytorch/executorch/actions/runs/12150009281
> job would need to use the default value in CRON_DEFAULT_MODELS, it seems overkill to me to test them all just to validate the spec.

I see, this is a fair point. Maybe we could add new defaults, or change the defaults for test-spec validation, in android-perf.yml?
I don't have a strong opinion on merging it into android-perf.yml; I was thinking in that direction because upload-android-test-specs.yml now essentially just validates android-perf.yml, and the specialized test spec generated on the fly is already part of it.
Left comments for clarification, other than that looks great!
This setup helps address the 4GB file size limit enforced by AWS and also fixes the usage of .ci/scripts/test_llama.sh after #6870. This is the first PR for Android. I also have another PR #7098 to enable google/gemma-2b, which @guangy10 added a while back; I use it to test this one with a large model.

Testing

Downloaded the exported model.zip from S3 successfully: https://github.com/pytorch/executorch/actions/runs/12040976164. I also attempted to export google/gemma-2b, but it looks like it's better to do that in a separate PR #7098 to avoid bloating this one.
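To make the templating idea concrete, here is a trimmed-down, hypothetical sketch of what a templated Device Farm test spec could look like. The `{{ model_artifact }}` placeholder, the commands, and the package/activity names are assumptions for illustration, not the actual spec in this PR; only the overall `version`/`phases` layout follows the Device Farm custom test spec format:

```yaml
# Hypothetical Device Farm test spec template. The workflow substitutes a
# per-model artifact URL for the placeholder before uploading the spec, so
# the large model is fetched on the device instead of being bundled with
# the upload (sidestepping the 4GB AWS file size limit mentioned above).
version: 0.1
phases:
  install:
    commands:
      # Replaced at workflow time with the model's S3 URL, e.g.
      # .../artifacts/<model>_<delegate>/model.zip
      - curl -s -L -o model.zip "{{ model_artifact }}"
      - unzip -o model.zip -d artifacts/
  test:
    commands:
      # Push the model to the device and launch the benchmark app
      # (package/activity names are assumptions).
      - adb push artifacts/model.pte /data/local/tmp/
      - adb shell am start -n org.pytorch.executorch.benchmark/.BenchmarkActivity
```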