bump macos to m1 #1725
Tried to run the API tests locally on an M3 chip and had no issues. P.S. Here is an interesting project: https://github.com/mxschmitt/action-tmate
Commenting out the LLM imports makes the segfaults go away, but that isn't a realistic option.
It's a bit of a weird machine. The memory issues aside, I am also seeing:

FAILED tests/test_convert_lit_checkpoint.py::test_against_original_gemma_2[device0-dtype0-gemma-2-27b] - AssertionError: Tensor-likes are not close!
Mismatched elements: 305 / 5120000 (0.0%)
Greatest absolute difference: 1.7881393432617188e-05 at index (0, 13, 73469) (up to 1e-05 allowed)

It works perfectly fine locally on both of my Macs (M1 and M3). I am also using macOS 14.

Email reply to Sebastian Raschka's comment of 16 September 2024, 19:42:14 CEST:

Thank you for looking into this. I wouldn't worry much about that; it can happen because of different RNG seeds, test order, etc. It's probably OK to just increase the tolerance to 3e-5 or so.
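The failure message above has the format produced by `torch.testing.assert_close`, which treats an element as close when |actual - expected| <= atol + rtol * |expected| (float32 defaults: atol=1e-5, rtol=1.3e-6). A minimal plain-Python sketch of that rule shows why a greatest difference of about 1.79e-5 fails by default but passes with atol=3e-5. The helper names and the four sample values are hypothetical; only the reported difference comes from the log.

```python
def close_mask(actual, expected, atol=1e-5, rtol=1.3e-6):
    """Elementwise |a - e| <= atol + rtol * |e| (the closeness rule assert_close documents)."""
    return [abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)]


def count_mismatches(actual, expected, **tolerances):
    """Return how many elements fall outside the tolerance."""
    return close_mask(actual, expected, **tolerances).count(False)


# Hypothetical values; the third element differs by the amount reported in the CI log
expected = [0.5, -1.25, 0.0, 2.0]
actual = [0.5, -1.25, 1.7881393432617188e-05, 2.0]

default_mismatches = count_mismatches(actual, expected)  # float32 defaults
relaxed_mismatches = count_mismatches(actual, expected, atol=3e-5, rtol=1.3e-6)
print(default_mismatches, relaxed_mismatches)  # 1 0
```

In `torch.testing.assert_close` itself, `atol` and `rtol` have to be passed together, so a relaxed call would look like `assert_close(ours, theirs, atol=3e-5, rtol=1.3e-6)`.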
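I was able to isolate it. It's this test that segfaults on the CI:

```python
from pathlib import Path

import torch

from litgpt.api import LLM
from litgpt.scripts.download import download_from_hub


def test_llm_load_random_init(tmp_path):
    # Only the tokenizer is downloaded; the model weights are randomly initialized
    download_from_hub(repo_id="EleutherAI/pythia-14m", tokenizer_only=True, checkpoint_dir=tmp_path)

    torch.manual_seed(123)
    llm = LLM.load(
        model="pythia-14m",
        init="random",
        tokenizer_dir=Path(tmp_path / "EleutherAI/pythia-14m"),
    )
```

It works fine locally, though.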
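One way to isolate a crash like this without taking down the whole pytest session is to run the suspect code in a fresh interpreter and check whether the child was killed by SIGSEGV. The sketch below is a hypothetical helper (POSIX only), not what was used in this thread; `-X faulthandler` makes the child print its Python traceback on the fatal signal.

```python
import signal
import subprocess
import sys


def dies_with_segfault(code: str, timeout: float = 600.0) -> bool:
    """Run `code` in a new Python process; return True if it was killed by SIGSEGV (POSIX only)."""
    proc = subprocess.run(
        # -X faulthandler dumps the Python traceback to stderr when a fatal signal arrives
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code means the child was terminated by that signal
    if proc.returncode == -signal.SIGSEGV:
        print(proc.stderr)  # traceback showing where the crash happened
        return True
    return False


# Sanity check: a process that sends itself SIGSEGV is detected
print(dies_with_segfault("import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"))
```

Passing a hypothetical snippet such as `"from litgpt.api import LLM"` would then show whether the import alone crashes, which fits the observation that commenting out the LLM imports makes the segfaults disappear.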
To narrow it down further: it only happens with the default settings, not when … (see lines 215 to 239 in a686b40).
Perhaps MPS support on the CI machine has some issues, since it works fine locally.
Ah, so it does seem to be MPS-related: changing … to … fixes those tests on the CI. My guess is that it's something particular about the CI machine, because it works fine locally on two of my Macs (and also on Andrei's Mac). Maybe the drivers are outdated. So let's just skip MPS-related tests on that machine.
OK, I left MPS disabled for the macOS runner since it seems to have issues. It could be Fabric-related, driver-related, or LitGPT-related (although it works fine locally). Let's merge this for now and revisit in a few weeks or months, when macos-15 machines are more readily available in workflows. Maybe their drivers are just old.
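The thread doesn't show how MPS was disabled for the runner. A minimal sketch of one way to gate it, assuming the CI can be recognized by the `CI=true` variable that GitHub Actions sets: the hypothetical helper `pick_test_device` takes its inputs as arguments so it stays testable, and in a real suite you would pass `torch.backends.mps.is_available()`, `os.environ` and `sys.platform`.

```python
from typing import Mapping


def pick_test_device(mps_available: bool, env: Mapping[str, str], platform: str) -> str:
    """Hypothetical helper: use MPS only when it is available and we are not on the macOS CI runner."""
    # Assumption: GitHub Actions exports CI=true on its runners
    on_macos_ci = platform == "darwin" and env.get("CI", "").lower() == "true"
    if mps_available and not on_macos_ci:
        return "mps"
    return "cpu"


# Local Mac with MPS: tests use the GPU
print(pick_test_device(True, {}, "darwin"))  # mps
# macOS CI runner: fall back to CPU even though MPS reports as available
print(pick_test_device(True, {"CI": "true"}, "darwin"))  # cpu
```

Keeping the check in one helper makes it easy to drop later, for example once macos-15 runners are adopted and MPS is re-enabled.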
Some discussion is on #1724.
It seems to run into segfaults. Obviously, if anyone with a MacBook could take over sorting things out, that would be super good.