Polling for Available GPUs when Batch Running SLEAP #777
We can't do fractional GPU allocations at the level that SLEAP or TensorFlow operate at, since that's a driver-level thing. One thing SLEAP does by default, though, is disable preallocation of GPU memory: this prevents TensorFlow from claiming the entire GPU's memory up front and instead lets it grow its usage as needed. That allows you to run other jobs on the same GPU at the same time, as long as sufficient memory is available. As for auto-selecting a good GPU to use -- that's a bit trickier and probably a more involved feature enhancement. The easiest way to do this would probably be to parse the output of
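As a rough sketch of that parsing approach: the tool whose output you would typically parse here is an assumption on my part, but `nvidia-smi` (which ships with the NVIDIA driver) supports a machine-readable CSV query for free memory per card. The thresholds and function names below are illustrative, not SLEAP API.

```python
import subprocess
from typing import List, Optional


def parse_free_memory(csv_output: str) -> List[int]:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits`
    into a list of free MiB per GPU, indexed by GPU number."""
    return [int(line.strip()) for line in csv_output.strip().splitlines() if line.strip()]


def best_gpu(free_mib: List[int], min_free_mib: int = 2000) -> Optional[int]:
    """Return the index of the GPU with the most free memory,
    or None if no card clears the (arbitrary) minimum threshold."""
    if not free_mib:
        return None
    idx = max(range(len(free_mib)), key=lambda i: free_mib[i])
    return idx if free_mib[idx] >= min_free_mib else None


def query_free_memory() -> List[int]:
    # Assumes nvidia-smi is on PATH; the query flags above are standard.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_free_memory(out)
```

The chosen index could then be exported as `CUDA_VISIBLE_DEVICES` before TensorFlow initializes, so the job only ever sees its assigned card.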
A good way to get started might be to do step 1 and add it as a function to
Hello again SLEAP Devs!
After chatting with a few members of the lab who are trying to run many videos at once, I was wondering how to poll whether a GPU is available on the machine and, when one frees up, start running SLEAP instances on it in parallel. Some of our computers here have 4 or even 5 cards, and it would be pretty cool to add this sort of functionality to SLEAP for teams/labs that don't have an orchestrator for spawning jobs on their cluster. An even cooler enhancement would be to run fractional GPU jobs on top of this when a card has enough VRAM available, although I don't know if that's easy to configure without tools like Run:AI.
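To make the idea concrete, here is a minimal sketch of such a poll-and-dispatch loop. This is not SLEAP functionality: the `free_gpus` callable is a placeholder you'd build on top of an `nvidia-smi` query, `launch` is injected so the loop is testable, and the model path is hypothetical. `sleap-track` is SLEAP's real inference CLI, and pinning via `CUDA_VISIBLE_DEVICES` keeps each job on its assigned card.

```python
import os
import subprocess
import time
from typing import Callable, Dict, List, Tuple


def assign_jobs(
    pending: List[str], busy: Dict[int, object], free_gpus: List[int]
) -> List[Tuple[int, str]]:
    """Pure scheduling step: pair waiting videos with idle, unclaimed GPUs."""
    assignments = []
    for gpu in free_gpus:
        if not pending:
            break
        if gpu not in busy:  # Skip cards we already launched a job on.
            assignments.append((gpu, pending.pop(0)))
    return assignments


def launch_sleap(gpu: int, video: str):
    # Hypothetical launcher: replace the model path with your trained model.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return subprocess.Popen(["sleap-track", video, "-m", "models/my_model"], env=env)


def run_batch(videos: List[str],
              free_gpus: Callable[[], List[int]],
              launch: Callable[[int, str], object] = launch_sleap,
              poll_s: float = 30.0) -> None:
    """Poll until every video has been processed, one job per free GPU."""
    pending = list(videos)
    running: Dict[int, object] = {}  # GPU index -> Popen-like handle
    while pending or running:
        # Reap finished jobs so their GPUs become reusable.
        for gpu, proc in list(running.items()):
            if proc.poll() is not None:
                del running[gpu]
        for gpu, video in assign_jobs(pending, running, free_gpus()):
            running[gpu] = launch(gpu, video)
        if pending or running:
            time.sleep(poll_s)
```

Injecting `free_gpus` and `launch` keeps the loop itself free of driver and CLI dependencies, which also makes it easy to dry-run the scheduling logic without any GPUs attached.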
@talmo sent me a section of code with examples that could help a Python script do this, but I figured I would post about it here and ask how I could contribute a feature like that to SLEAP one day.