This README aims to help you work out where growth in memory, CPU time, and GPU memory is likely to come from in the Python code you submit as a job on the HEC:
- When tagging data, using a machine learning model to infer/predict on data, or training a machine learning model, the time and memory requirements depend on the batch size, which is the amount of data you tag or train on in one go. The larger the batch size, the more memory is required, but the quicker the model should run/train. Note that if you are processing or training on data that varies in size, e.g. text, each batch may contain, for instance, 32 sentences, but those 32 sentences are very likely to vary in length, and the longer the sentences are, the more memory is required to process them. This only really applies to deep learning methods that represent sentences as sequences of varying length; with a bag of words model/representation this problem can be ignored.
- For a Bag Of Words (BOW) model in Natural Language Processing (NLP), I suspect the main memory requirement (and probably the time requirement too, though I am less sure) depends on the size of your BOW vector, e.g. whether you represent every word or just the top 100 words. The more words you represent, the more memory you will require.
- For deep learning models, e.g. RNNs, Transformers, etc., the memory required for training depends on the number of parameters in your model, which is mainly determined by its hidden sizes. If the batch size you want does not fit into memory, you have to accumulate gradients over smaller batches, which takes longer to train. Knowing some of the differences between the models is useful, e.g. the computation of a transformer self-attention layer grows quadratically with the number of words, whereas that of an RNN layer grows quadratically with the hidden size; see Table 1 of Attention Is All You Need, Vaswani et al. 2017. A good guide for understanding transformers is by Jay Alammar, Stanford has a good overview of RNNs and CNNs, and Christopher Olah has another good overview of RNNs, specifically LSTMs.
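The three points above can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only illustrative: the function names and example numbers are made up for this README, it assumes every sentence in a batch is padded to the longest one (common, but not universal), it assumes a dense BOW matrix of 4-byte values, and the per-layer counts are the orders of growth from Table 1 of Vaswani et al. (2017) with constant factors ignored.

```python
def padded_batch_tokens(sentence_lengths):
    """Token slots a batch occupies when every sentence is padded to the longest one."""
    return len(sentence_lengths) * max(sentence_lengths)


def per_layer_operations(sequence_length, hidden_size):
    """Order-of-growth operation counts per layer (constant factors ignored)."""
    return {
        "self_attention": sequence_length ** 2 * hidden_size,
        "recurrent": sequence_length * hidden_size ** 2,
    }


def dense_bow_bytes(num_documents, vocabulary_size, bytes_per_value=4):
    """Memory for a dense bag-of-words matrix (assumes 4-byte values by default)."""
    return num_documents * vocabulary_size * bytes_per_value


# Two batches of 4 sentences each: same batch size, very different footprint.
print(padded_batch_tokens([5, 6, 7, 8]))   # 32
print(padded_batch_tokens([5, 6, 7, 80]))  # 320

# Doubling the sentence length quadruples the self-attention count
# but only doubles the recurrent count.
print(per_layer_operations(sequence_length=50, hidden_size=512))
print(per_layer_operations(sequence_length=100, hidden_size=512))

# Top 100 words versus a 50,000-word vocabulary, for 1,000 documents.
print(dense_bow_bytes(1_000, 100))     # 400000
print(dense_bow_bytes(1_000, 50_000))  # 200000000
```

These numbers are not a substitute for measuring your actual job, but they show which knob (batch size, sentence length, hidden size, vocabulary size) is likely to dominate.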
Below are some tools for measuring memory use, CPU time, and data transfer. For memory, in the majority of cases we are mainly interested in the peak amount used, as this determines how much memory to request for the job:
- On Linux and Mac you can use the resource module from the Python standard library, which uses the getrusage system call from the underlying OS; the man page can be found here. The most useful memory statistic is the peak memory used, reported in KB on Linux and bytes on Mac (this may change over time, so check the man page for getrusage on your OS). To get it, run the following at the end of your script:
from resource import getrusage, RUSAGE_SELF
# Peak memory used: KB on Linux, bytes on Mac
print(getrusage(RUSAGE_SELF).ru_maxrss)
- On Windows you can use the psutil library, installed via pip (pip install psutil) or conda (conda install psutil). Run the following at the end of your script:
import psutil
print(psutil.Process().memory_info().peak_wset)
- If you want to know how long a single batch takes to run, wrapping it in calls to a timer should be all you need, as shown here:
import time
import spacy
nlp = spacy.load("en_core_web_sm", disable=[ "tagger", "parser"])
data = ['a batch', 'of data']
start_time = time.perf_counter()
for processed_data in nlp.pipe(data):
    continue
end_time = time.perf_counter()
print(end_time - start_time)
- For more information on how to incorporate timing functions into your code see the real python blog post. To get a more accurate measure of how long your code takes to run you could use the timeit interface, which runs a block of code N times (default N = 1,000,000) and reports how long the N runs took in total. An example of how to use the timeit function with repeat is shown below. With repeat the code is run N * M times, where N=100 and M=3 in this case, and it reports how long the N runs took for each of the M repeats. The Python documentation states that you should use the minimum reported time as the time to report. For a good guide on how to use the timeit function see this blog:
import timeit
setup_code = '''
import spacy
nlp = spacy.load("en_core_web_sm", disable=[ "tagger", "parser"])
data = ["a batch", "of data"]
'''
code_to_run = '''
for processed_data in nlp.pipe(data):
    continue
'''
print(timeit.repeat(stmt=code_to_run, setup=setup_code, number=100, repeat=3))
# In my case this prints:
# [0.1814169869903708, 0.18344234900723677, 0.17951916699530557]
- On all OSs (for Windows, its documentation states that it runs on Ubuntu under WSL2), another tool is a profiler, of which the Scalene library has been recommended. It can be installed through pip, e.g. pip install scalene. A profiler analyses your code line by line to show you which lines use the most memory, CPU time, and data transfer, which can then help you improve your code. An example of how to use the scalene profiler, and its output, can be found in ./scalene_example/README.md.
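Returning to the peak-memory bullets above: because the resource module reports different units on Linux and Mac, and Windows needs psutil instead, it can be convenient to wrap the calls in one helper. The following is a minimal sketch; peak_memory_mb is a name made up for this README, and treating every non-Mac Unix system as reporting kilobytes is an assumption you should check against the getrusage man page on your OS.

```python
import sys


def peak_memory_mb():
    """Return this process's peak memory use in megabytes."""
    if sys.platform == "win32":
        # psutil is a third-party package; peak_wset is reported in bytes.
        import psutil
        return psutil.Process().memory_info().peak_wset / 1024 ** 2
    from resource import getrusage, RUSAGE_SELF
    peak = getrusage(RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        # Mac reports bytes.
        return peak / 1024 ** 2
    # Linux reports kilobytes (assumed here for other Unix systems too).
    return peak / 1024


# Allocate something so the peak is not trivial, then report it.
data = [0] * 1_000_000
print(f"Peak memory: {peak_memory_mb():.1f} MB")
```

Calling it at the very end of the script gives the figure to base your memory request on.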
Some of these tools are also useful when running your main program, rather than only a test program used to estimate how much memory or time it will take. You shouldn't run a profiler like scalene on your main code as it will just slow down your program. It would, however, be useful to log every N batches how long those N batches took, whether for tagging data or training a machine learning model, so that you get a better idea of when your code is going to finish.
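One way to log the time taken every N batches is sketched below. The helper name run_with_progress, its parameters, and the stand-in workload are all hypothetical; in a real job process_batch would be your tagging call or training step, and you may prefer the logging module over print.

```python
import time


def run_with_progress(batches, process_batch, log_every=10):
    """Process each batch, reporting the elapsed time every `log_every` batches."""
    window_times = []
    window_start = time.perf_counter()
    for batch_number, batch in enumerate(batches, start=1):
        process_batch(batch)
        if batch_number % log_every == 0:
            now = time.perf_counter()
            elapsed = now - window_start
            window_times.append(elapsed)
            first_batch = batch_number - log_every + 1
            print(f"Batches {first_batch}-{batch_number}: {elapsed:.3f}s "
                  f"({elapsed / log_every:.4f}s per batch)")
            window_start = now
    return window_times


# Stand-in workload: replace the lambda with tagging or a training step.
batches = [["a batch", "of data"]] * 30
window_times = run_with_progress(
    batches, lambda batch: [text.upper() for text in batch]
)
```

If you know the total number of batches, multiplying the latest per-batch time by the batches remaining gives a rough finish estimate.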
- To keep track of GPU usage there are various tools, and the fastai docs have a great guide on them. The one tool to highlight for Python is the pynvml library, as it can quickly query the GPU without having to call nvidia-smi. At the moment I do not know of a tool that outputs the peak memory usage of the GPU. PyTorch has the functions max_memory_allocated and max_memory_reserved, but these do not take into account the memory required to run PyTorch itself on the GPU, which can be 0.5GB.
- Use the resource library (psutil on Windows) to get an accurate measure of peak memory used, rather than relying on the Scalene library alone. I find all other features of Scalene, such as Net (MB), to be accurate.
- Use a tool like pynvml for GPU memory monitoring rather than the GPU usage information that the HEC generates for you; see the Comparing Results section in the GPU example for more details on why.
- An example of how to use the resource library and the Python time library to find the maximum amount of memory required for a tagging task, and the average time it takes to process a batch, can be found at ./resource_and_time_example/README.md.
- An example of how to use Scalene can be found at ./scalene_example/README.md.
- An example of how to use pynvml can be found at ./gpu_example/README.md. In this example we run Stanza on the GPU and show how you can find peak GPU memory usage and other GPU memory information. Further, we show that you cannot estimate GPU memory usage from the RAM usage of a CPU version of the model.
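Since pynvml only reports the memory in use at the moment you query it, one way to approximate a peak is to poll it in a background thread while the job runs. The sketch below is not the code from ./gpu_example; PeakTracker and make_gpu_reader are hypothetical names, and the approach has two caveats: a spike shorter than the polling interval can be missed, and the figure covers the whole device, including any other processes using the GPU.

```python
import threading


class PeakTracker:
    """Poll a zero-argument `read_used_bytes` callable in a background thread
    and remember the largest value seen."""

    def __init__(self, read_used_bytes, interval_seconds=0.1):
        self.read_used_bytes = read_used_bytes
        self.interval_seconds = interval_seconds
        self.peak_bytes = 0
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._poll, daemon=True)

    def _poll(self):
        while not self._stop_event.is_set():
            self.peak_bytes = max(self.peak_bytes, self.read_used_bytes())
            self._stop_event.wait(self.interval_seconds)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop_event.set()
        self._thread.join()
        # One final reading so that a very short job is still sampled.
        self.peak_bytes = max(self.peak_bytes, self.read_used_bytes())


def make_gpu_reader(device_index=0):
    """Build a reader for one GPU using pynvml (needs an NVIDIA GPU and driver)."""
    import pynvml
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    return lambda: pynvml.nvmlDeviceGetMemoryInfo(handle).used


# Usage on a machine with a GPU (run_model_on_gpu is a placeholder for your code):
# with PeakTracker(make_gpu_reader()) as tracker:
#     run_model_on_gpu()
# print(f"Peak GPU memory: {tracker.peak_bytes / 1024 ** 2:.0f} MB")
```

Keeping the reader as a plain callable means the tracker itself can be tried out on a machine without a GPU by passing in any function that returns a number.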