Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dynamic linking issues with prebuilt pip packages #3049

Merged
merged 4 commits into from
Apr 23, 2024

Conversation

dbort
Copy link
Contributor

@dbort dbort commented Apr 16, 2024

We can now prebuild pip packages, but we're hitting issues when importing pybindings and torch at runtime.

More context in T185860350

Test Plan

  • Ensure that the PR is against release/0.2
  • PR must touch one of the files in the pull_request.paths entry for a given build-wheels workflow:
    pull_request:
    paths:
    - build/packaging/**
    - .github/workflows/build-wheels-m1.yml
    • This is necessary to run the build-wheel jobs in the test channel (using the pytorch release pip package) instead of the nightly channel
  • Add the label ciflow/binaries/all to the PR so that it builds for all versions of python and not just 3.8 (which doesn't work for executorch)
  • After the artifacts are built, follow the instructions in the comment below: Fix dynamic linking issues with prebuilt pip packages #3049 (comment)
  • Also, this PR adds a smoke test that will test the wheels during the build job

Copy link

pytorch-bot bot commented Apr 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3049

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit 358a03f with merge base d3326a2 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 16, 2024
@dbort dbort force-pushed the sandbox/dbort/pip-linking branch from 516ba48 to ae1a658 Compare April 16, 2024 20:31
@dbort dbort force-pushed the sandbox/dbort/pip-linking branch from de14f34 to ae1a658 Compare April 16, 2024 22:15
@dbort dbort added ciflow/binaries/all Release PRs with this label will build wheels for all python versions and removed ciflow/binaries labels Apr 17, 2024
@dbort dbort force-pushed the sandbox/dbort/pip-linking branch from ba3b0a4 to f99f0cf Compare April 17, 2024 18:46
@dbort dbort force-pushed the sandbox/dbort/pip-linking branch 2 times, most recently from aa9965d to 0a7566d Compare April 18, 2024 23:25
@dbort
Copy link
Contributor Author

dbort commented Apr 18, 2024

To test out a prebuilt wheel on Linux or macOS:

  • Ensure that you have conda installed
  • Click a link to download a zipped wheel file:
  • Download the test script: curl -o /tmp/test_wheel_install.sh https://raw.githubusercontent.com/pytorch/executorch/1f3c8ff6a6eae81edcb7d192dbe982b59165e3ff/build/test_wheel_install.sh
  • cd to the root of a relatively recent executorch repo. Exact version doesn't matter as long as it can run the xnnpack examples.
  • Run the script against the zip file you downloaded: bash /tmp/test_wheel_install.sh /path/to/pytorch_executorch__xyz.zip | tee /tmp/wheel-install-logs
    • NOTE: This will create a new conda environment called executorch-temp that you can delete with conda remove -y --name executorch-temp --all

If you see PASS at the end, it worked! If not, please create a paste of /tmp/wheel-install-logs and let me know.

+ python /tmp/test-wheel-install-lWkozN8xUl/test_xnnpack_e2e.py
I 00:00:00.003848 executorch:program.cpp:129] InternalConsistency verification requested but not available
E 00:00:00.023344 executorch:method.cpp:936] Overriding output data pointer allocated by memory plan is not allowed.
I 00:00:00.023371 executorch:pybindings.cpp:196] Cannot set_output_data_ptr(): this likely means the outputs were MemoryPlanned inspect the error code to know for sure, but likely this is not an issue. 0x2
PASS <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
+ rm -rf /tmp/test-wheel-install-lWkozN8xUl

@dbort dbort force-pushed the sandbox/dbort/pip-linking branch 2 times, most recently from 52afdb2 to aca550b Compare April 19, 2024 20:56
@kirklandsign
Copy link
Contributor

PASS
Apple M1 Pro
Darwin Kernel Version 23.4.0

@dbort dbort force-pushed the sandbox/dbort/pip-linking branch from aca550b to 5cf96f5 Compare April 20, 2024 01:07
@larryliu0820
Copy link
Contributor

larryliu0820 commented Apr 21, 2024

I ran into some issues on my Mac m1: https://gist.github.com/larryliu0820/7f95f711c2a40835c3e39680a7a6e44d


From @dbort:
@larryliu0820 tracked this down to an issue in the test script. python3 was falling back to the system python, which didn't have torch installed. Running python was able to find the installed torch package.

@dbort dbort force-pushed the sandbox/dbort/pip-linking branch from 5cf96f5 to 2a58198 Compare April 22, 2024 17:52
@dbort
Copy link
Contributor Author

dbort commented Apr 22, 2024

Added a smoke test script to the wheel-build job: succeeding for the expected-compatible python versions, like 3.10 for linux and 3.10 for mac

dbort added 4 commits April 22, 2024 14:57
libtorch.so builds with the old glibc ABI, so we need to as well,
for any source files that include torch headers.
pip wheels will need to be able to find the torch libraries. On Linux,
the .so has non-absolute dependencies on libs like "libtorch.so" without
paths; as long as we `import torch` first, those dependencies will work.

But Apple dylibs do not support non-absolute dependencies, so we need
to tell the loader where to look for its libraries. The LC_LOAD_DYLIB
entries for the torch libraries will look like "@rpath/libtorch.dylib",
so we can add an LC_RPATH entry to look in a directory relative to the
installed location of our _portable_lib.so file.

To see these LC_* values, run `otool -l _portable_lib*.so`.
The executorch build system will ensure that .dylib/.so files have
LC_LOAD_DYLIB and LC_RPATH entries that will work when they're
installed.

Delocating (i.e., making copies of the .dylibs that ET's libs depend on)
will break any libs that depend on the torch libraries if users ever
import both `torch` and the executorch library. Both import paths must
load exactly the same file, not just a copy of it.
This script is run by CI after building the executorch wheel. Before
running this, the job will install the matching torch package as well as
the newly-built executorch package and its dependencies.

For now we test the export of a simple model, and try executing it using
the runtime pybindings.

Test Plan:
```
./install_requirements.sh
python build/packaging/smoke_test.py
```
@dbort dbort force-pushed the sandbox/dbort/pip-linking branch from 9eed1f6 to 358a03f Compare April 22, 2024 21:57
Copy link
Contributor

@larryliu0820 larryliu0820 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test

@dbort
Copy link
Contributor Author

dbort commented Apr 22, 2024

Successfully ran the test script under WSL running Ubuntu 20.04.4 LTS on Windows 10 22H2, AMD Ryzen 7 2700 Eight-Core Processor 3.20 GHz, 32GB RAM

@dbort dbort mentioned this pull request Apr 22, 2024
@guangy10
Copy link
Contributor

@dbort We expect the prebuilt works for 3.10 only right?

@dbort
Copy link
Contributor Author

dbort commented Apr 23, 2024

@dbort We expect the prebuilt works for 3.10 only right?

That's the main version that we need to work. Currently there's no way to disable the python versions that we don't expect to work (like 3.8 and 3.12).

@guangy10 guangy10 merged commit 1fd5562 into release/0.2 Apr 23, 2024
97 of 106 checks passed
@dbort dbort deleted the sandbox/dbort/pip-linking branch April 23, 2024 20:26
@mergennachin mergennachin mentioned this pull request Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/binaries/all Release PRs with this label will build wheels for all python versions ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants