Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert to pytorch installation with uv in CI #3826

Closed

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented May 13, 2024

Summary

Revert to install torch with uv

Also link to #4063, pip install torch is taking up 70% of the total dep install time

Revert to pytorch installation with uv in CI,

error: Failed to download distributions
  Caused by: Failed to fetch wheel: torch==2.2.1
  Caused by: Failed to extract archive
  Caused by: error decoding response body
  Caused by: request or response body error
  Caused by: error reading a body from connection
  Caused by: end of file before message length reached

@DanielYang59 DanielYang59 changed the title Revert to pytorch installation with uv Revert to pytorch installation with uv in CI May 13, 2024
@DanielYang59 DanielYang59 marked this pull request as ready for review May 13, 2024 08:26
@janosh janosh added housekeeping Moving around or cleaning up old code/files ci Continuous integration labels May 13, 2024
Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @DanielYang59! 👍

@janosh janosh enabled auto-merge (squash) May 13, 2024 11:51
@DanielYang59
Copy link
Contributor Author

DanielYang59 commented May 13, 2024

Looks like we still have the issue where uv cannot install torch randomly. I would report this to uv soon.

To fix this, we might need to install torch and explicitly skip torch for uv.

Before this PR, even after torch is already successfully installed by pip, calling uv pip install torch again would still fail (therefore the random errors we have seen).

@janosh
Copy link
Member

janosh commented May 13, 2024

we're getting

pytest: command not found

which is not a torch related error message?

strange that it only happens on the 1st runner. maybe a race condition? we could try python -m pytest ...

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented May 13, 2024

we're getting

pytest: command not found

which is not a torch related error message?

If you look more closely into the log, the failure actually comes from the "Install pymatgen and dependencies" stage (for some reason it still passed).

image

strange that it only happens on the 1st runner. maybe a race condition? we could try python -m pytest ...

Not sure. I thought it just fails randomly...No idea why. I don't think it has anything to do with the split itself though.

Failed at split 1

Failed at split 5

Failed at split 8

@DanielYang59 DanielYang59 marked this pull request as draft May 13, 2024 12:07
@janosh
Copy link
Member

janosh commented May 13, 2024

at least it's no longer failing due to timeouts. looks like now we're hitting this issue astral-sh/uv#2586

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented May 13, 2024

at least it's no longer failing due to timeouts. looks like now we're hitting this issue astral-sh/uv#2586

Look like this one. I guess we would need to still install torch with pip, and somehow prevent uv from trying to install torch later (not sure how to achieve this though, as torch is not a direct dependency of pymatgen, but chgnet).

Maybe we could try to install everything but [optional] with uv, and then use uv to install [optional] only? I think it's really bad, as the package itself would be installed twice, there should be a better way that I'm not aware of.

Or, just fall back to slow but reliable pip altogether?

@janosh janosh marked this pull request as ready for review June 7, 2024 12:53
@DanielYang59 DanielYang59 marked this pull request as draft June 9, 2024 03:08
@janosh janosh force-pushed the master branch 2 times, most recently from e3fbc67 to 41e6d99 Compare August 3, 2024 19:01
@DanielYang59

This comment was marked as duplicate.

@@ -74,15 +74,16 @@ jobs:
micromamba install -n pmg -c conda-forge bader enumlib openff-toolkit packmol pygraphviz tblite --yes

- name: Install pymatgen and dependencies via uv
env:
UV_HTTP_TIMEOUT: 300 # torch could easily timeout
Copy link
Contributor Author

@DanielYang59 DanielYang59 Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems pip install torch is quite flaky as well, and i don't think it's to be fixed anytime soon:

Occasional timeouts will happen.

but they suggesting increasing the timeout for pip instead, perhaps we could do the same to uv: https://docs.astral.sh/uv/configuration/environment/

HTTP_TIMEOUT (or UV_HTTP_TIMEOUT): If set, uv will use this value (in seconds) as the timeout for HTTP reads (default: 30 s).

@DanielYang59 DanielYang59 marked this pull request as ready for review October 30, 2024 10:53
@DanielYang59
Copy link
Contributor Author

Closing this one in #4100, reason: currently chgnet and matgl doesn't support Python 3.13, so torch would not be required in CI for now.

@DanielYang59 DanielYang59 deleted the uv-install-torch branch November 21, 2024 03:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous integration housekeeping Moving around or cleaning up old code/files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants