
Use a template for Android test spec #7091

Merged
merged 13 commits into main from address-model-size-limit
Dec 4, 2024
Conversation

Contributor

@huydhn huydhn commented Nov 26, 2024

This setup helps address the 4 GB file size limit enforced by AWS and also fixes the usage of .ci/scripts/test_llama.sh after #6870.

This is the first PR for Android. I also have another PR #7098 to enable google/gemma-2b, which @guangy10 added a while back and which I used to test this one with a large model.

Testing

The exported model.zip downloads successfully from S3: https://github.com/pytorch/executorch/actions/runs/12040976164. I also attempted to export google/gemma-2b, but it's better to do that in a separate PR #7098 to avoid bloating this one.
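The templating approach, as I understand it from this PR, boils down to substituting a per-model S3 URL into a shared spec template before each device-farm run. A minimal sketch (the template token, file names, and URL values here are assumptions for illustration, not the actual spec contents):

```shell
#!/bin/sh
# Sketch only: specialize a test-spec template per model+delegate so each job
# downloads its own (possibly multi-GB) model from S3, instead of baking one
# shared URL into a single uploaded spec. All names below are placeholders.
MODEL="llama" DELEGATE="xnnpack"
MODEL_URL="https://gha-artifacts.s3.amazonaws.com/pytorch/executorch/12345/artifacts/${MODEL}_${DELEGATE}/model.zip"

# A hypothetical template containing a literal token to be replaced.
cat > android-llm-device-farm-test-spec.yml.j2 <<'EOF'
phases:
  install:
    commands:
      - curl -O {{ MODEL_URL }}
EOF

# Specialize the template for this matrix entry ('|' delimiter avoids
# clashing with the slashes in the URL).
sed "s|{{ MODEL_URL }}|${MODEL_URL}|" \
  android-llm-device-farm-test-spec.yml.j2 > android-llm-device-farm-test-spec.yml
```

The specialized file is then what gets uploaded as a per-job artifact, rather than one shared spec in a fixed S3 location.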


pytorch-bot bot commented Nov 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7091

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ac48848 with merge base 0a12e33:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 26, 2024
@huydhn huydhn added the module: benchmark Features or issues related to benchmark infra, including the workflow, CI and benchmark apps label Nov 26, 2024
@huydhn huydhn had a problem deploying to upload-benchmark-results November 26, 2024 22:24 — with GitHub Actions Failure
@huydhn huydhn had a problem deploying to upload-benchmark-results November 26, 2024 23:26 — with GitHub Actions Failure
@huydhn huydhn temporarily deployed to upload-benchmark-results November 26, 2024 23:37 — with GitHub Actions Inactive
Contributor Author

huydhn commented Nov 27, 2024

@pytorchbot drci

@huydhn huydhn mentioned this pull request Nov 27, 2024
@huydhn huydhn temporarily deployed to upload-benchmark-results November 27, 2024 01:19 — with GitHub Actions Inactive
@huydhn huydhn marked this pull request as ready for review November 27, 2024 02:44
Contributor Author

huydhn commented Nov 27, 2024

google/gemma-2b tests started fine, but it took nearly 2 hours to finish copying the 10 GB model to the device.

2024-11-27T05:05:06.5661289Z [DeviceFarm] adb -s $DEVICEFARM_DEVICE_UDID push *.pte /sdcard
2024-11-27T05:05:06.5661948Z [  0%] /sdcard/google-gemma-2b_xnnpack_fp32.pte
...
2024-11-27T06:06:41.0614924Z [ 99%] /sdcard/google-gemma-2b_xnnpack_fp32.pte
2024-11-27T06:06:41.0615084Z [100%] /sdcard/google-gemma-2b_xnnpack_fp32.pte
2024-11-27T06:06:41.0615419Z google-gemma-2b_xnnpack_fp32.pte: 1 file pushed. 63.5 MB/s (10025204816 bytes in 150.591s)
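A quick cross-check of adb's own numbers in the log above confirms the push itself only took about 2.5 minutes (adb reports MB/s in MiB), consistent with the false-alarm conclusion:

```shell
# Sanity-check the adb push log: 10025204816 bytes at a reported 63.5 MB/s.
awk 'BEGIN {
  bytes = 10025204816
  rate  = 63.5 * 1024 * 1024   # adb reports "MB/s" using MiB
  printf "push took %.1f seconds\n", bytes / rate
}'
# prints: push took 150.6 seconds
```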

@guangy10 @kirklandsign Do you know if that makes sense? Here is the example run https://github.com/pytorch/executorch/actions/runs/12043268246/job/33579733419. The unzipped model size is about 10 GB.

Never mind, I got the numbers wrong. I set up the job to retry 3 times when it fails (not finding the JSON results), so that 2-hour figure spans the first and last attempt. Individually, each push took only about 5 minutes. It's a false alarm.
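The retry behavior described above can be sketched as a simple wrapper that re-runs a step up to 3 times (an assumed shape for illustration; the actual CI job uses its own retry mechanism):

```shell
#!/bin/sh
# Sketch (assumption): retry a flaky step, e.g. fetching the result JSON,
# up to 3 attempts before giving up. Not the actual CI implementation.
retry3() {
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge 3 ]; then
      echo "failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "retrying (attempt ${attempt}): $*" >&2
  done
}
```

With a wrapper like this, the job's wall-clock time includes every failed attempt, which is how a ~5-minute push can show up as a ~2-hour window in the logs.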

Contributor Author

huydhn commented Nov 27, 2024

Looking carefully at https://github.com/pytorch/executorch/actions/runs/12043268246/job/33579733419, it looks like the test finished successfully, but the teardown done by AWS failed, and I didn't see any result JSON from the spec.

@huydhn huydhn temporarily deployed to upload-benchmark-results November 27, 2024 16:45 — with GitHub Actions Inactive
Contributor Author

huydhn commented Nov 27, 2024

I think the test runs more or less OK on a Samsung S24: https://github.com/pytorch/executorch/actions/runs/12053919612/job/33614078559. The large model is no longer a blocker, which was the scope of this PR.

I don't see any benchmark results JSON file in the output, though. Maybe there is a problem in the benchmark app, which could be investigated separately after we land this PR. Note that the smaller stories model runs fine and returns benchmark results as usual (https://github.com/pytorch/executorch/actions/runs/12053946075), so I'm confident that this new spec works correctly.

export-models:
    name: export-models
-   uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
+   uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
Contributor


What's new in _v2.yml?

Contributor Author

@huydhn huydhn Dec 3, 2024


It supports manywheel 2.28 (what PyTorch is moving to). For example, arm binaries are now using this new format #7080. During my testing, I saw an error on this job about the old vs. new wheel format, so I just moved it to v2.

This has been delayed on the PyTorch side until the next release, so it's not strictly needed right now. Let me put it back.

On the other hand, keeping v2 is OK too, because manywheel 2.28 is backward compatible with the current older format.

@@ -45,50 +29,3 @@ jobs:
models: stories110M
devices: samsung_galaxy_s22
delegates: xnnpack
test_spec: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/android-llm-device-farm-test-spec.yml

upload-android-test-spec:
Contributor


It seems the upload step is moved to android-perf.yml right after the test-spec is specialized and instantiated, without validation. Given that, do we still need to keep a validation step here? How would it catch and prevent a bad test-spec from being uploaded and used?

Contributor Author

@huydhn huydhn Dec 3, 2024


In this new setup, the specialized test spec is just an artifact of the current job, uploaded to https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifacts/${{ matrix.model }}_${{ matrix.delegate }}/android-llm-device-farm-test-spec.yml. If the spec is bad, the subsequent benchmark job will fail (and give a red signal on the PR).

The validation workflow here now acts just as a trigger to call the android-perf workflow when the spec template changes.

Finally, the upload step here is being deleted (not moved to android-perf) because there will no longer be a single shared spec file at s3://ossci-android/executorch/android-llm-device-farm-test-spec.yml. There can't be, because each specialized spec now has a different model+delegate S3 link, for example https://github.com/pytorch/executorch/actions/runs/12053946075/job/33613172395#step:10:46
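To illustrate why no single shared location can exist: the artifact path embeds the model and delegate, so every matrix combination resolves to a distinct URL. A small sketch (the run ID and model names below are made up for illustration):

```shell
#!/bin/sh
# Sketch: each model+delegate pair in the matrix gets its own specialized
# spec under the run's artifact prefix. Values here are illustrative only.
RUN_ID="12053946075"

spec_url() {
  # $1 = model, $2 = delegate
  echo "https://gha-artifacts.s3.amazonaws.com/pytorch/executorch/${RUN_ID}/artifacts/${1}_${2}/android-llm-device-farm-test-spec.yml"
}

spec_url stories110M xnnpack
spec_url gemma-2b xnnpack
```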

Contributor


Maybe this can be simplified and merged into android-perf.yml by triggering a run when the test spec template is modified?

Contributor Author


We could do that, but the job would need to use the default value in CRON_DEFAULT_MODELS, and it seems overkill to me to test them all just to validate the spec. That said, I think I could tweak it a bit to use a different value for pull_request and keep the full list for scheduled jobs.

Contributor Author


Here is the example run https://github.com/pytorch/executorch/actions/runs/12150009281 without upload-android-test-specs.yml

Contributor

@guangy10 guangy10 Dec 3, 2024


job would need to use the default value in CRON_DEFAULT_MODELS, it seems overkill to me to test them all just to validate the spec.

I see, this is a fair point. Maybe we could add new defaults, or change the defaults for test-spec validation, in android-perf.yml?
I don't have a strong opinion about merging it into android-perf.yml; I was thinking in that direction because the upload-android-test-specs.yml job now essentially validates android-perf.yml, and the specialized test-spec generated on the fly is already part of it.

Contributor

@guangy10 guangy10 left a comment


Left comments for clarification, other than that looks great!

@huydhn huydhn temporarily deployed to upload-benchmark-results December 3, 2024 05:28 — with GitHub Actions Inactive
@huydhn huydhn requested a review from guangy10 December 3, 2024 17:21
@huydhn huydhn temporarily deployed to upload-benchmark-results December 4, 2024 00:15 — with GitHub Actions Inactive
@huydhn huydhn merged commit 3a088ce into main Dec 4, 2024
47 checks passed
@huydhn huydhn deleted the address-model-size-limit branch December 4, 2024 00:52
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: benchmark Features or issues related to benchmark infra, including the workflow, CI and benchmark apps topic: not user facing