
Use a template for Android test spec #7091

Merged
merged 13 commits into main from address-model-size-limit
Dec 4, 2024
Conversation

Contributor

@huydhn huydhn commented Nov 26, 2024

This setup helps address the 4 GB file size limit enforced by AWS and also fixes the usage of .ci/scripts/test_llama.sh after #6870.

This is the first PR for Android. I also have another PR #7098 to enable google/gemma-2b, which @guangy10 added a while back and which I used to test this one with a large model.

Testing

The exported model.zip downloads successfully from S3: https://github.com/pytorch/executorch/actions/runs/12040976164. I also attempted to export google/gemma-2b, but it's better to do that in a separate PR #7098 to avoid bloating this one.
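The templating approach, as I understand it from this PR, boils down to substituting a per-model S3 URL into a shared spec template before each device-farm run. A minimal sketch (the template token, file names, and URL values here are assumptions for illustration, not the actual spec contents):

```shell
#!/bin/sh
# Sketch only: specialize a test-spec template per model+delegate so each job
# downloads its own (possibly multi-GB) model from S3, instead of baking one
# shared URL into a single uploaded spec. All names below are placeholders.
MODEL="llama" DELEGATE="xnnpack"
MODEL_URL="https://gha-artifacts.s3.amazonaws.com/pytorch/executorch/12345/artifacts/${MODEL}_${DELEGATE}/model.zip"

# A hypothetical template containing a literal token to be replaced.
cat > android-llm-device-farm-test-spec.yml.j2 <<'EOF'
phases:
  install:
    commands:
      - curl -O {{ MODEL_URL }}
EOF

# Specialize the template for this matrix entry ('|' delimiter avoids
# clashing with the slashes in the URL).
sed "s|{{ MODEL_URL }}|${MODEL_URL}|" \
  android-llm-device-farm-test-spec.yml.j2 > android-llm-device-farm-test-spec.yml
```

The specialized file is then what gets uploaded as a per-job artifact, rather than one shared spec in a fixed S3 location.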


pytorch-bot bot commented Nov 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7091

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ac48848 with merge base 0a12e33:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 26, 2024
@huydhn huydhn added the module: benchmark Features or issues related to benchmark infra, including the workflow, CI and benchmark apps label Nov 26, 2024
@huydhn huydhn had a problem deploying to upload-benchmark-results November 26, 2024 22:24 — with GitHub Actions Failure
@huydhn huydhn had a problem deploying to upload-benchmark-results November 26, 2024 23:26 — with GitHub Actions Failure
@huydhn huydhn temporarily deployed to upload-benchmark-results November 26, 2024 23:37 — with GitHub Actions Inactive
Contributor Author

huydhn commented Nov 27, 2024

@pytorchbot drci

@huydhn huydhn mentioned this pull request Nov 27, 2024
@huydhn huydhn temporarily deployed to upload-benchmark-results November 27, 2024 01:19 — with GitHub Actions Inactive
@huydhn huydhn marked this pull request as ready for review November 27, 2024 02:44
Contributor Author

huydhn commented Nov 27, 2024

google/gemma-2b tests started fine, but it took nearly 2 hours to finish copying the 10 GB model to the device.

2024-11-27T05:05:06.5661289Z [DeviceFarm] adb -s $DEVICEFARM_DEVICE_UDID push *.pte /sdcard
2024-11-27T05:05:06.5661948Z [  0%] /sdcard/google-gemma-2b_xnnpack_fp32.pte
...
2024-11-27T06:06:41.0614924Z [ 99%] /sdcard/google-gemma-2b_xnnpack_fp32.pte
2024-11-27T06:06:41.0615084Z [100%] /sdcard/google-gemma-2b_xnnpack_fp32.pte
2024-11-27T06:06:41.0615419Z google-gemma-2b_xnnpack_fp32.pte: 1 file pushed. 63.5 MB/s (10025204816 bytes in 150.591s)
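A quick cross-check of adb's own numbers in the log above confirms the push itself only took about 2.5 minutes (adb reports MB/s in MiB), consistent with the false-alarm conclusion:

```shell
# Sanity-check the adb push log: 10025204816 bytes at a reported 63.5 MB/s.
awk 'BEGIN {
  bytes = 10025204816
  rate  = 63.5 * 1024 * 1024   # adb reports "MB/s" using MiB
  printf "push took %.1f seconds\n", bytes / rate
}'
# prints: push took 150.6 seconds
```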

@guangy10 @kirklandsign Do you know if that makes sense? Here is the example run https://github.com/pytorch/executorch/actions/runs/12043268246/job/33579733419. The unzipped model size is about 10 GB.

Never mind, I got the numbers wrong. I set up the job to retry 3 times when it fails (not finding the JSON results), so that 2-hour figure spans the first and last attempt. Individually, each push took only about 5 minutes. It's a false alarm.
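The retry behavior described above can be sketched as a simple wrapper that re-runs a step up to 3 times (an assumed shape for illustration; the actual CI job uses its own retry mechanism):

```shell
#!/bin/sh
# Sketch (assumption): retry a flaky step, e.g. fetching the result JSON,
# up to 3 attempts before giving up. Not the actual CI implementation.
retry3() {
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge 3 ]; then
      echo "failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "retrying (attempt ${attempt}): $*" >&2
  done
}
```

With a wrapper like this, the job's wall-clock time includes every failed attempt, which is how a ~5-minute push can show up as a ~2-hour window in the logs.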

Contributor Author

huydhn commented Nov 27, 2024

Looking carefully at https://github.com/pytorch/executorch/actions/runs/12043268246/job/33579733419, it looks like the test finished successfully, but the teardown done by AWS failed, and I didn't see any result JSON from the spec.

@huydhn huydhn temporarily deployed to upload-benchmark-results November 27, 2024 16:45 — with GitHub Actions Inactive
Contributor Author

huydhn commented Nov 27, 2024

I think the test runs more or less OK on a Samsung S24: https://github.com/pytorch/executorch/actions/runs/12053919612/job/33614078559. The large model is no longer a blocker, which was the scope of this PR.

I don't see any benchmark results JSON file in the output, though. Maybe there is a problem in the benchmark app, which could be investigated separately after we land this PR. Note that the smaller stories model runs fine and returns benchmark results as usual (https://github.com/pytorch/executorch/actions/runs/12053946075), so I'm confident that this new spec works correctly.

export-models:
    name: export-models
-   uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+   uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
Contributor


What's new in _v2.yml?

Contributor Author

@huydhn huydhn Dec 3, 2024


It supports manywheel 2.28 (what PyTorch is moving to). For example, arm binaries are now using this new format #7080. During my testing, I saw an error on this job about the old vs. new wheel format, so I just moved it to v2.

This has been delayed on the PyTorch side until the next release, so it's not strictly needed right now. Let me put it back.

On the other hand, keeping v2 is OK too, because manywheel 2.28 is backward compatible with the current older format.

@@ -45,50 +29,3 @@ jobs:
models: stories110M
devices: samsung_galaxy_s22
delegates: xnnpack
test_spec: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/android-llm-device-farm-test-spec.yml

upload-android-test-spec:
Contributor


It seems the upload step is moved to android-perf.yml right after the test-spec is specialized and instantiated, without validation. Given that, do we still need to keep a validation step here? How would it catch and prevent a bad test-spec from being uploaded and used?

Contributor Author

@huydhn huydhn Dec 3, 2024


In this new setup, the specialized test spec is just an artifact of the current job, uploaded to https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/${{ matrix.model }}_${{ matrix.delegate }}/android-llm-device-farm-test-spec.yml. If the spec is bad, the subsequent benchmark job will fail (and give a red signal on the PR).

The validation workflow here now acts just as a trigger to call the android-perf workflow when the spec template changes.

Finally, the upload step here is being deleted (not moved to android-perf) because there will no longer be a single shared spec file at s3://ossci-android/executorch/android-llm-device-farm-test-spec.yml. There can't be, because each specialized spec now has a different model+delegate S3 link, for example https://github.com/pytorch/executorch/actions/runs/12053946075/job/33613172395#step:10:46
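To illustrate why no single shared location can exist: the artifact path embeds the model and delegate, so every matrix combination resolves to a distinct URL. A small sketch (the run ID and model names below are made up for illustration):

```shell
#!/bin/sh
# Sketch: each model+delegate pair in the matrix gets its own specialized
# spec under the run's artifact prefix. Values here are illustrative only.
RUN_ID="12053946075"

spec_url() {
  # $1 = model, $2 = delegate
  echo "https://gha-artifacts.s3.amazonaws.com/pytorch/executorch/${RUN_ID}/artifacts/${1}_${2}/android-llm-device-farm-test-spec.yml"
}

spec_url stories110M xnnpack
spec_url gemma-2b xnnpack
```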

Contributor


Maybe this can be simplified and merged into android-perf.yml by triggering a run when the test spec template is modified?

Contributor Author


We could do that, but the job would need to use the default value in CRON_DEFAULT_MODELS, and it seems overkill to me to test them all just to validate the spec. That said, I think I could tweak it a bit to use a different value for pull_request and keep the full list for scheduled jobs.

Contributor Author


Here is the example run https://github.com/pytorch/executorch/actions/runs/12150009281 without upload-android-test-specs.yml

Contributor

@guangy10 guangy10 Dec 3, 2024


job would need to use the default value in CRON_DEFAULT_MODELS, it seems overkill to me to test them all just to validate the spec.

I see, this is a fair point. Maybe we could add new defaults, or change the defaults for test-spec validation, in android-perf.yml?
I don't have a strong opinion about merging it into android-perf.yml; I was thinking in that direction because the upload-android-test-specs.yml job now essentially validates android-perf.yml, and the specialized test-spec generated on the fly is already part of it.

Contributor

@guangy10 guangy10 left a comment


Left comments for clarification, other than that looks great!

@huydhn huydhn temporarily deployed to upload-benchmark-results December 3, 2024 05:28 — with GitHub Actions Inactive
@huydhn huydhn requested a review from guangy10 December 3, 2024 17:21
@huydhn huydhn temporarily deployed to upload-benchmark-results December 4, 2024 00:15 — with GitHub Actions Inactive
@huydhn huydhn merged commit 3a088ce into main Dec 4, 2024
47 checks passed
@huydhn huydhn deleted the address-model-size-limit branch December 4, 2024 00:52
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: benchmark Features or issues related to benchmark infra, including the workflow, CI and benchmark apps topic: not user facing