StatAnalysis Memory and Thresholding Issues #1076
-
Hi wonderful MET help team! I told you it wouldn't be long before you heard from me! I hope you're all doing well. I'm running into a few issues with StatAnalysis, which I'm running on an HPC via Singularity, starting from the DTC Docker image. I am using version 10.0.0.
The first issue is memory use. Without the grouping, the job runs in about 1 minute and 20 seconds (7,000 lines). If I add -by FCST_VAR, OBS_SID, the job consumes all 128 GB of RAM available on the compute node before crashing. Do you know why this would occur? I am following the example in the NRL tutorial StatAnalysis presentation (slide 14) that uses multiple -by statements. I tried turning on debugging (-v 4) and I don't get any related messages.

The second issue shows up in the debug output:

"DEBUG 4: ClimoCDFInfo::set_cdf_ta() -> For "cdf_bins" (20) and "center_bins" (false), defined climatology CDF thresholds: >=0.00000,>=0.05000,>=0.10000,>=0.15000,>=0.20000,>=0.25000,>=0.30000,>=0.35000,>=0.40000,>=0.45000,>=0.50000,>=0.55000,>=0.60000,>=0.65000,>=0.70000,>=0.75000,>=0.80000,>=0.85000,>=0.90000,>=0.95000,>=1.00000"

This is cumulative, so when running over 5,000 matched pairs, the last one has 100,000 thresholds attached to it. That makes the job so complex that it never finishes. I was able to work around the problem by adding the flag -out_bin_size 1, but I figured you would still want to know about it. Thank you in advance for your help!

Best,
-
Hi Lindsay, these sound like performance issues that we should investigate. Thank you for bringing this to our attention. I think @JohnHalleyGotway would be best suited to look into it; however, he is out on vacation this week. I will let him know about these issues so he can look into them when he returns. To help us recreate them, could you provide the data you are using and each command that causes these slow run times? If the files aren't too big, you could attach a zip or tar file to this discussion for easy access. If they are very large, you could upload them to FTP instead. Thanks,
-
@lindsayrblank, good news. I found a bug with a simple one-line fix that'll solve this excessive memory use problem.

The void ClimoCDFInfo::set_cdf_ta(int n_bin, bool &center) function fails to initialize the cdf_ta array. Each time we call it, that array grows by 21 elements. Your job calls it 22,530 times, once for each combination of station id and variable name, so by the end it has length 473,130 and we have 22,530 copies of it. That's what's hogging all the memory. Thanks for finding this issue!

Testing with a patch, Stat-Analysis consumes around 2 GB for this job, and technically it could consume less than a quarter of that. The remaining overhead comes from the NumArray class: it allocates memory in blocks of 1,000 elements, but each of your cases has fewer than 50 elements, so a lot of the allocated memory goes unused. We could consider reimplementing NumArray to consume less.

I'll write up a GitHub issue describing the problem and commit a simple bugfix to the main_v10.0 branch. Once it's merged in, DockerHub will rebuild the main_v10.0 image and then you should be able to pull that via Singularity.
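To make the failure mode concrete, here is a minimal standalone sketch. This is not MET's actual ClimoCDFInfo code; the class, member types, and bin math are simplified assumptions. It only illustrates the pattern of an array that keeps growing because it is never reset before being repopulated, and the one-line fix of clearing it.

```cpp
// Minimal standalone sketch of the failure mode described above -- NOT MET's
// actual ClimoCDFInfo code; the class, types, and bin math are simplified.
#include <iostream>
#include <vector>

struct CDFInfo {
   std::vector<double> cdf_ta;   // stands in for the threshold array member

   void set_cdf_ta(int n_bin, bool center) {
      // The one-line fix: reset the array before repopulating it.  Without
      // this clear(), every call appends another n_bin + 1 thresholds, so
      // the array grows by 21 elements per call for the default 20 bins.
      cdf_ta.clear();

      const double step  = 1.0 / n_bin;
      const double start = center ? step / 2.0 : 0.0;
      for (int i = 0; i <= n_bin; i++) {
         cdf_ta.push_back(start + i * step);
      }
   }
};

int main() {
   CDFInfo info;

   // Simulate a few of the per-case calls, one for each
   // station-id/variable-name combination.
   for (int call = 1; call <= 3; call++) {
      info.set_cdf_ta(20, false);
      std::cout << "call " << call << ": cdf_ta length = "
                << info.cdf_ta.size() << "\n";
   }

   // With the clear() in place the length stays at 21 on every call; with it
   // removed, the lengths grow 21, 42, 63, ... which is the same pattern
   // that reaches 473,130 elements after 22,530 calls in the reported job.
   return 0;
}
```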