-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
License list leads to segmentation fault in old PrePARE versions. #376
Comments
Hi @glevava, I've tested this with CMOR v3.6.1 and I'm seeing segfaults if there is more than one entry in the list (I thought I'd tested this before the change to the CVs file was merged). We have tried this with the one of the testing builds on the PCMDI conda channel, cmor=3.6.1.2022.03.14.18.32.ga7f960d with python 3.8 (see here), and it worked fine, but we haven't been able to understand the change that leads to this behaving itself. CMOR v3.7.0 is fairly close as I understand it -- would updating to use this, once released, be an option? (I appreciate that this will probably require updates across all active ESGF nodes). |
I'm not sure this is due to the number of entries in the list as I have now two items in the list and it works like a charm. In the future (CMIP7?), we will change our workflow and the new release of CMOR could be an option yes. Nevertheless, I must admit we didn't use CMOR at IPSL nor at CNRM, our GCMs directly produce standardized CMIP data. So we only need CMOR to run PrePARE which is quite overkill. It would be appreciated to have PrePARE as standalone tool maybe in full Python. This would be also in line with a controlled vocabulary service we are thinking about in the ESGF in order to check and provide metadata at the different steps of the publication process. |
@glevava just an FYI, we (@matthew-mizielinski, @sashakames, @taylor13) have started thinking about how to cleanup what we currently have so to move forward in a more modular (and non-duplicated) way, a sketch of the potential path can be found in WCRP-CMIP/CMIP6Plus_CVs#1 (comment) - happy to hear your views on this (and ask away if the vagueness of the info requires more detail) |
@durack1 The link in your post above doesn't work. |
I've just had a slightly deeper dig into this as I think it is affecting ESGF publication at CEDA too, and I may have a possible mechanism for why this is failing at v3.6.1 (not sure why it is working with the nightly builds, unless there is a change in one of the limits); I've had a look at the way in which PrePARE is failing and by trial and error found that if I shorten the license strings such that the total number of characters across all four entries in the list is less than 1008 then PrePARE works fine (I'm guessing that there is a 1024 byte limit on the list holding the licenses and there is a ~5 byte cost for the separation of entries in the list -- adding an extra entry to the list seems to segfault at 1003 characters total). While I think that having the multiple patterns is by far the most elegant solution, but we could squeeze the four patterns we have into one more general one. The pattern below (line breaks inserted for readability) would cover the four cases, but it does leave the license text more open to typos.
A quick google suggests that the regex C library being used here is very limited in its capabilities, so I haven't been able to work out how to construct a suitable pattern with the different options rather than using Would it be worth making a change to work within (what I assume to be -- I could be wrong) the character limit for the license text in this repository now to ensure that ESGF publication chains can pick this up without having to wait for the next version of CMOR? We can then put the more elegant solution into post CMIP6 work. |
@sashakames, if you have a moment could you have a look and see whether the same issue is affecting LLNL publication? |
thanks @matthew-mizielinski for your efforts to diagnose and provide a likely explanation for the problem. If there is some urgency to fix this problem, I would do as you say. I wouldn't think treating the license string as a special case that could exceed 1024 characters would be that difficult, but @mauzey1 could advise. He can tell us whether we should wait for the more elegant solution or implement the quicker fix you suggested. |
@matthew-mizielinski thanks for this, in an offline discussion with @mauzey1 last night we came to the same conclusion that updating the license check string, from 4 to 1 should solve this problem, and ensure that folks using the latest version of the tables can use these with CMOR 3.6.1 and previous. What you have proposed above #376 (comment) (768 chars) is where I was headed. The nightly builds have tweaks in place that deal with this license change, @mauzey1 can elaborate if required. I believe this example is a primary reason why we need to move toward a CMOR4, to solve the string limits, but not break anything with CMIP6 that has been successfully using CMOR3.x |
The issue with using the 4 license templates with older versions of CMOR/PrePARE is that whenever a license is invalid, such as when using older licenses that don't match any of the 4 templates, CMOR will try creating an error message stating that the license doesn't match any of the listed values. The message contains a regex string from all of the values found in the registered data for the license attribute concatenated together. Since the templates are all >800 characters long, combining all 4 created a string too long for the C string arrays that were only allocated to 1024 chars. Note: Strings should actually be 1023 characters long since the 1024th char needs to be null ( |
@mauzey1 can you go ahead and implement the simplified license check - the license check string proposed above #376 (comment) (768 chars) please? I believe this will solve the two standing PrePARE and E3SM-2-0 issue |
Okay. So we want to get rid of the current 4 templates and replace them with the more flexible one proposed by @matthew-mizielinski? I agree with this. |
@mauzey1 exactly, hopefully this will resolve both issues at once |
And I think a non-conforming license should raise a warning, not an error. I don't think that in general (and I know for CMIP specifically) there is no mandate to force folks to adopt one of our recommended licenses. |
Hey @matthew-mizielinski I could take a look, but would need some example data that would break of which we have already replicated. We don't check replicas on the presumption that the original data had already passed prepare and the files are identical. That said I'm aware that the Canadians had an issue recently with PrePARE crashing so I could reach out to them on this issue. |
@mauzey1 I second the comment of @taylor13 that a non-conforming license should raise a warning, rather than an error and stopping execution. Also, to attempt to capture the shorten CC license identifier, can we use the template:
The tweak from |
Among the 4 possibles license values in CMIP6_CV.json, 2 lines lead to a "segmentation fault" when used by old PrePARE version (relying on CMOR version < 3).
The corresponding lines are:
cmip6-cmor-tables/Tables/CMIP6_CV.json
Line 3530 in 3591446
cmip6-cmor-tables/Tables/CMIP6_CV.json
Line 3532 in 3591446
By the way, the JSON syntax seems to be valid.
Error appears when PrePARE try to use the loaded CV with cmip6_cv methods like
cmip6_cv.check_requiredattributes(table)
, nevertheless, the CV loading few line code above works fine and returns 0:table = cmip6_cv.load_table(cmip6_table)
So, I guess there is/are some characteres in the two licenses that are mis-parsed by CMOR in some way.
Removing the two lines from the license list makes PrePARE working perfectly fine.
The text was updated successfully, but these errors were encountered: