Fix command line argument #229

Merged
merged 12 commits into from
Aug 13, 2024

@wiederm (Member) commented Aug 13, 2024

Description

There were two separate bugs affecting command-line arguments:

  • the device argument can be an int or a List[int], but the argparse type check did not handle this correctly
  • command-line arguments should override parameters provided in the TOML file, but the read_config function was doing the opposite (its docstring described the correct behavior)
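The precedence rule from the second bullet can be sketched as follows. This is a minimal sketch, not the project's read_config: it assumes the TOML file has already been loaded into a flat dict (for example with tomllib), and the option names accelerator and number_of_epochs, as well as the helpers build_parser and apply_cli_overrides, are hypothetical. Every CLI option defaults to None, so only flags the user actually passed override the TOML values.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of options; default=None means "not given on the command line".
    parser = argparse.ArgumentParser()
    parser.add_argument("--accelerator", type=str, default=None)
    parser.add_argument("--number_of_epochs", type=int, default=None)
    return parser


def apply_cli_overrides(config: Dict[str, Any], args: argparse.Namespace) -> Dict[str, Any]:
    """Return a copy of the TOML-derived config in which explicitly passed CLI values win."""
    merged = dict(config)  # copy, so the loaded TOML dict is left untouched
    for key, value in vars(args).items():
        if value is not None:  # only override when the flag was actually passed
            merged[key] = value
    return merged


toml_config = {"accelerator": "cpu", "number_of_epochs": 10}
args = build_parser().parse_args(["--accelerator", "gpu"])
print(apply_cli_overrides(toml_config, args))  # {'accelerator': 'gpu', 'number_of_epochs': 10}
```

Using None as the "not passed" sentinel is what lets the TOML value survive when a flag is omitted; giving the CLI options real defaults would silently clobber the file.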

Status

  • Ready to go

@wiederm added the bug (Something isn't working) label Aug 13, 2024
@wiederm self-assigned this Aug 13, 2024
@wiederm changed the title from "Fix device command line argument" to "Fix command line argument" Aug 13, 2024
wiederm and others added 7 commits August 13, 2024 13:04
…il) often on the CI. I'm planning on refactoring the code to remove this functionality, as we do not need a specialized function for this, but can just provide more info than querying the API which seems problematic at times.
…form list evaluation, so input can be of form "[0,1]", "(0,1)" or just "0,1"
@chrisiacovella (Member) commented

This looks good. Good catch on parsing the devices: I initially had this as just an int when I checked, then realized it could be a list and apparently didn't test it properly (I incorrectly assumed the input would be coerced correctly). I pushed a minor change that looks for the presence of a comma instead of square brackets, so the input can be "[0,1]", "(0,1)", or "0,1".

@wiederm (Member, Author) commented Aug 13, 2024

Unfortunately, I don't think that will work! This parameter behaves counterintuitively: passing an int means "grab that number of GPUs", while passing a List[int] means "grab those specific GPU devices". A common use case is therefore to pass [1] to grab GPU:1 on a multi-GPU node, and checking for a comma will miss that case.
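The distinction above has to survive the conversion from the raw string argparse receives. A minimal sketch, assuming a hypothetical --devices flag and a hypothetical parse_devices type function: it keys on square brackets rather than commas, so "[1]" stays a list (GPU index 1) while "1" stays a count. It uses ast.literal_eval rather than eval so the argument is never executed as code.

```python
import argparse
import ast
from typing import List, Union


def parse_devices(value: str) -> Union[int, List[int]]:
    """argparse type for a hypothetical --devices flag.

    "2"      -> 2       (an int: use that many devices)
    "[1]"    -> [1]     (a list: use exactly these device indices)
    "[0, 1]" -> [0, 1]
    """
    text = value.strip()
    if text.startswith("[") and text.endswith("]"):
        try:
            parsed = ast.literal_eval(text)  # parses literals only, never runs code
        except (ValueError, SyntaxError):
            raise argparse.ArgumentTypeError(f"invalid device list: {value!r}") from None
        # Require a non-empty list of plain ints (the type() check also rejects bools).
        if (
            not isinstance(parsed, list)
            or not parsed
            or any(type(d) is not int for d in parsed)
        ):
            raise argparse.ArgumentTypeError(f"invalid device list: {value!r}")
        return parsed
    try:
        return int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid device count: {value!r}") from None


parser = argparse.ArgumentParser()
parser.add_argument("--devices", type=parse_devices, default=None)
print(parser.parse_args(["--devices", "2"]).devices)    # 2
print(parser.parse_args(["--devices", "[1]"]).devices)  # [1]
```

Under this sketch a tuple such as "(0,1)" or a bare "0,1" is rejected; accepting those forms would need an extra rule that still keeps "[1]" a list.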

@chrisiacovella (Member) commented

Hmm, I'll switch it back; I guess I misunderstood what this does. Maybe we should eventually change the input syntax to make it clearer, for example with two separate arguments: number_of_devices (the devices are picked automatically) and selected_devices (you pick which ones). Even if we don't distinguish between the two internally, it would make it clearer what is being done.
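The two-argument idea could look roughly like this. It is only a sketch: the flags --number_of_devices and --selected_devices come from the suggestion above, and the helpers build_device_parser and to_lightning_devices are hypothetical; none of them are existing options of this project. An argparse mutually exclusive group prevents passing both, and the helper collapses them back into the single int-or-list value that PyTorch Lightning's Trainer accepts for devices.

```python
import argparse
from typing import List, Optional, Union


def build_device_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    # Hypothetical flag: how many devices to use; the framework picks which ones.
    group.add_argument("--number_of_devices", type=int, default=None)
    # Hypothetical flag: explicit indices, e.g. "--selected_devices 1" or "--selected_devices 0 1".
    group.add_argument("--selected_devices", type=int, nargs="+", default=None)
    return parser


def to_lightning_devices(args: argparse.Namespace) -> Optional[Union[int, List[int]]]:
    """Map the two explicit flags onto one int-or-list `devices` value."""
    if args.selected_devices is not None:
        return list(args.selected_devices)  # a list means "use exactly these indices"
    return args.number_of_devices  # an int means "use this many"; None means "not given"


parser = build_device_parser()
print(to_lightning_devices(parser.parse_args(["--selected_devices", "1"])))   # [1]
print(to_lightning_devices(parser.parse_args(["--number_of_devices", "2"])))  # 2
```

With separate flags the user never has to remember that 1 and [1] mean different things: the choice of flag carries that meaning.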

@wiederm (Member, Author) commented Aug 13, 2024

Yeah, this is confusing! That wasn't my idea, but it is what PyTorch Lightning expects.

@chrisiacovella (Member) commented

I also commented out the figshare download test, as reading from their API seems to randomly time out or fail on some of the CI nodes (the failures are inconsistent). I plan to refactor the code, and the figshare download function will be removed.

@wiederm merged commit 4711e7e into main Aug 13, 2024
5 checks passed
@wiederm deleted the fix-device-command-line-argument branch August 13, 2024 18:38