macOS nightly wheel builds failing since 2024-11-19 #7019

Open
swolchok opened this issue Nov 21, 2024 · 24 comments

@swolchok (Contributor)

🐛 Describe the bug

Status page: https://github.com/pytorch/executorch/actions/workflows/build-wheels-m1.yml
Note that the Python 3.9 build always fails, so even though the runs are red, they were successful through 2024-11-18.

Linking is failing with ld: invalid use of ADRP in '_init_f32_vcopysign_config' to '_xnn_f32_vcopysign_ukernel__neon_u8'.

Versions

N/A

@swolchok (Contributor Author)

Inspection of the PRs landed between the last good build and the first bad build flagged a few candidate PRs.

A trial revert of #6837 in #7013 still failed the job; trialing a revert of the other two PRs together.

swolchok added a commit that referenced this issue Nov 21, 2024
This reverts commit 5b4d9bb. Attempting to debug/fix #7019.
@swolchok (Contributor Author)

A trial revert of #6522 in #7020 did not fix the job.

@swolchok (Contributor Author)

A trial revert of #6892 in #7021 did not fix the job.

I am also not able to repro this locally, and I've inspected git diff 8526d0a2d798658b6a6e3a42ec935b8093f355ef..04f6fcd4b3920eaf1be9905d12b449f301f89ca7 without finding anything else suspicious, so I wonder if the runners broke somehow.
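
For reference, a sketch of listing the commits in that range locally (the hashes are the ones quoted above; assumes an up-to-date executorch clone):

# Sketch: list the commits between the last good and first bad nightly builds
git fetch origin
git log --oneline 8526d0a2d798658b6a6e3a42ec935b8093f355ef..04f6fcd4b3920eaf1be9905d12b449f301f89ca7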

@swolchok (Contributor Author)

> I wonder if the runners broke somehow

I reran the last good workflow run; builds succeeded (there were some failures due to an unrelated issue).

@swolchok swolchok self-assigned this Nov 21, 2024
@larryliu0820 (Contributor)

Found a failure with the same error message in a different job (test-llama-runner-mac): https://github.com/pytorch/executorch/actions/runs/11959891658/job/33342737621?pr=7010

@swolchok (Contributor Author)

> Found a failure with the same error message in a different job (test-llama-runner-mac): https://github.com/pytorch/executorch/actions/runs/11959891658/job/33342737621?pr=7010

That job is green on trunk runs, though! https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=llama-runner-mac%20(fp32%2C%20mps

@kimishpatel (Contributor)

I am late to this, so I'm not sure my comments will help, but was there any change related to an XNNPACK upgrade? The failing job looks XNNPACK-related.

@swolchok (Contributor Author)

@larryliu0820 suggested maybe the runner toolchain changed.

It looks like we're using macos-m1-stable runners for test-llama-runner-mac: https://github.com/pytorch/executorch/blob/main/.github/workflows/trunk.yml#L236. I'm not sure what runner the wheel build uses.

I don't know a whole lot about this runner type, but I see that 1) it seems to be in-house: pytorch/pytorch#127490, and 2) I don't see recent activity in https://github.com/pytorch-labs/pytorch-gha-infra/ suggesting that there was a recent update.

@swolchok (Contributor Author)

> any change related to xnnpack upgrade

As I mentioned above, I inspected all the commits (there aren't many) in the range of commit hashes flagged in the nightly builds.

@larryliu0820 (Contributor)

An example of trunk job passing:

https://github.com/pytorch/executorch/actions/runs/11962683652/job/33351640398

An example of PR job failing:

https://github.com/pytorch/executorch/actions/runs/11959891658/job/33342745520?pr=7010

I don't see an obvious difference between these two regarding environment setup.

@huydhn anything obvious to you?

@swolchok (Contributor Author)

Another example: PR jobs are failing on #7044; TBD whether they fail consistently.

@swolchok (Contributor Author)

Interesting that a large block of jobs all failed on the same PR. That points to some piece of shared state being the cause, either the repo state itself or sccache.

@swolchok (Contributor Author)

@wdvr, is it a potential problem that our Mac builds are still on sccache 0.4.1? I see that you updated the Ubuntu build to 0.8.2 in #6837.

@swolchok (Contributor Author)

I am now able to repro! gh pr checkout 7040; ./install_requirements.sh --pybind xnnpack

@swolchok (Contributor Author)

Reverting backends/xnnpack/third-party/XNNPACK to ad0e62d69815946be92134a56ed3ff688e2549e8 (updated in #6101) does not fix it.
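
For context, a rough sketch of how that kind of pin-back can be done locally, assuming XNNPACK is vendored as a git submodule at that path (the hash is the one quoted above):

# Sketch: pin the vendored XNNPACK back to the earlier revision before rebuilding
cd backends/xnnpack/third-party/XNNPACK
git fetch origin
git checkout ad0e62d69815946be92134a56ed3ff688e2549e8
cd -  # back to the repo root, then rerun ./install_requirements.sh --pybind xnnpack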

@swolchok (Contributor Author)

Removing --pybind xnnpack from the install_requirements.sh line does fix it (duh), so perhaps we couldn't repro with setup.py because we weren't doing whatever magic is needed to build XNNPACK.

@swolchok (Contributor Author)

Just reconfirmed that ./install_requirements.py --pybind xnnpack does not repro on main; you must gh pr checkout 7040 first.

@huydhn (Contributor) commented Nov 23, 2024

> @wdvr is it a potential problem that our Mac builds are still on sccache 0.4.1? I see that you updated the ubuntu build to 0.8.2 in #6837

sccache uses the file path, the compiler name, and its flags in the cache key, so there shouldn't be any issue from the 0.8.2 update on Ubuntu; the caches are well isolated.
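
A minimal sketch of what one might run on the Mac runner to check the sccache side of things (assumes sccache is on PATH in the build environment):

# Check which sccache version the runner actually has, and its hit/miss stats
sccache --version
sccache --show-stats
# SCCACHE_RECACHE=1 forces recompilation and overwrites cached entries,
# which helps rule out a stale-cache explanation for the link failure
SCCACHE_RECACHE=1 python setup.py bdist_wheel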

@larryliu0820 (Contributor)

Oh, it could be coming from PyTorch. #7010 only bumps the PyTorch pin and its jobs are failing. It seems #7044 is also bumping the pin?

@malfet (Contributor) commented Nov 25, 2024

There was a recent XNNPACK update in PyTorch. If ET directly depends on XNNPACK but pins an older version, that can easily create a problem, since macOS, unlike Linux, does not have -fvisibility=hidden set by default.
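
If that's the cause, one possible mitigation is building the locally vendored XNNPACK (and the pybind extension) with hidden symbol visibility so its symbols can't be resolved against the copy bundled in the torch wheel. A rough, unverified sketch, reusing the CMAKE_ARGS mechanism from the repro steps further down; whether these flags are honored by the executorch/XNNPACK CMake setup is an assumption:

# Sketch (unverified): hide symbols in the locally built native code so they
# cannot collide with the XNNPACK objects shipped inside the torch wheel
export CMAKE_ARGS='-DEXECUTORCH_BUILD_XNNPACK=ON -DCMAKE_C_VISIBILITY_PRESET=hidden -DCMAKE_CXX_VISIBILITY_PRESET=hidden -DCMAKE_VISIBILITY_INLINES_HIDDEN=ON'
export EXECUTORCH_BUILD_PYBIND=1
python setup.py bdist_wheel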

@swolchok (Contributor Author)

> recent xnnpack update

For clarity, the update is pytorch/pytorch#139913 and it landed on 11/18, the day before the nightlies started failing, so it's very suspicious. I've asked @digantdesai / @mcr229 about this internally; tagging them here as well for visibility.

@huydhn (Contributor) commented Nov 28, 2024

I think #6538 doesn't fix the issue, as it's still showing up on the latest nightly with the change in place: https://github.com/pytorch/executorch/actions/runs/12060458350/job/33630916538. I should have added ciflow/binaries to run the build on the PR; then we would have had a clear signal there.

@larryliu0820 (Contributor)

Should we revisit the commits between the 11/18 nightly and the 11/19 nightly? 8526d0a...04f6fcd

@larryliu0820 (Contributor)

Repro steps:

pip install torch --pre --index-url https://download.pytorch.org/whl/nightly/cpu 
export CMAKE_ARGS=' -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_COREML=ON -DEXECUTORCH_BUILD_MPS=ON'
export EXECUTORCH_BUILD_PYBIND=1
python setup.py bdist_wheel
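
A quick way to confirm a local build is hitting the same failure (a sketch; assumes the linker error shows up in the setup.py output just as it does in the CI logs):

# Capture the build output and look for the same ld error reported above
python setup.py bdist_wheel 2>&1 | tee /tmp/et_wheel_build.log
grep -n "invalid use of ADRP" /tmp/et_wheel_build.log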
