-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/memcomp #4599
Fix/memcomp #4599
Conversation
CIME/baselines/performance.py
Outdated
tolerance = case.get_value("TEST_MEMLEAK_TOLERANCE") | ||
|
||
if tolerance is None: | ||
tolerance = 0.1 | ||
|
||
baseline_file = os.path.join(baseline_dir, "cpl-mem.log") | ||
if not os.path.exists(baseline_file): | ||
return tolerance, "MEMCOMP: No baseline memory file present" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgfouca I would like this string to appear in the TestStatus file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jedwards4b , I think you need to return a boolean here, right?
return True, "MEMCOMP: No baseline memory file present"
I'm a little worried that this will hide errors. I would not have noticed that our perf test was not running long enough to create a cpl-mem file if it was passing. @jasonb5 , thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should be boolean. The options for TEST_*_STATUS are PASS, FAIL and PEND - should I add a TEST_SKIP_STATUS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jedwards4b , were the tests that were failing performance tests or just regular tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just regular tests and I am only dealing with this in system_tests_common.py so I don't think there will be a problem for your performance tests. Now have:
PASS SMS_Ln9_Vnuopc.ne30_ne30_mg17.FW2000climo.derecho_intel.cam-outfrq9s MEMCOMP: No baseline memory file present
in TestStatus file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a script. It's users looking at TestStatus and seeing that cpl-mem.log is missing. Then they start contacting me asking what went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just remembered, a lot of people use cs.status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what dies the output of cs.status look like when cpl-mem.log is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s (Overall: PASS) details:
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s CREATE_NEWCASE
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s XML
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s SETUP
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s SHAREDLIB_BUILD time=143
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s NLCOMP
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s MODEL_BUILD time=238
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s SUBMIT
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s RUN time=159
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s COMPARE_base_rest
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s GENERATE cesm2_3_alpha17c
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s BASELINE cesm2_3_alpha17b:
FAIL ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s MEMCOMP [Errno 2] No such file or directory: '/glade/campaign/cesm/cesmdata/cesm_baselines/cesm2_3_alpha17b/ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s/cpl-mem.log'
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s TPUTCOMP
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s MEMLEAK insufficient data for memleak test
PASS ERP_D_Ln9_Vnuopc.f19_f19_mg17.QPC6.derecho_intel.cam-outfrq9s SHORT_TERM_ARCHIVER
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to what happens when the baseline comparison fails, because of missing baselines. So I'm OK with having it marked as FAIL. So far I've been ignoring the MEM and TPUTCOMP tests because we found we couldn't count on them. So I'm glad they are being worked on. It would be nice to be able to use them. So thanks for working on this.
I'll look into refactoring this as a change in cs.status ... |
Actually that won't solve the problem. People are still going to manually look at the TestStatus file and see that the Test indicates fail. |
Do we know about how long the run has to be for a memtest to be valid? Just skip all the mem testing if the test length is to short. Force it on with a command-line argument. |
Or...only run the memtest portion if that file is present. If something about the machine keeps the file from being created, like disk issues, then other things will also go wrong. |
@jedwards4b as I say above I'm OK with it marked as fail if it says it's because of missing baseline files. Personally I then do some grepping to make sure the FAIL's are only because of that kind of problem. But, I do like @rljacob idea of not doing these tests for cases that are too short. Marking them as such would also be fine as well. All -- I know we can change the thresholds for these tests, but is there a way to signal in the testlist*.xml file that you want them turned off? For example for tests that are too short or you just don't want to check it for particular tests? Still just glad this is being worked on! Thanks @jedwards4b |
I thought that this was going to be an easy fix - but it's not, I'm going to set it aside for now. |
Superceeded by #4691 |
Currently if a test is too short for the memcomp test it fails because the test baseline is missing. It should just skip the test in this case...
This PR causes the test to be skipped and pass, but I would like it to be more informative.
Test suite:
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes:
User interface changes?:
Update gh-pages html (Y/N)?: