-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes to enable custom structmaps to be more expressive #1404
Conversation
d3bc39a
to
5476b50
Compare
247309f
to
51b3793
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks good! A few questions inline.
I have one overall concern. If I understand ADR-0001 correctly, we'd like to (over time) remove most of the XML generation code from create_mets_v2
, and replace it with metsrw
usage. It seems like this PR does not advance that goal. It doesn't make that task too much harder, but it seems like if we're touching this script in a significant way, it's a good opportunity to move in that direction. I do understand though that metsrw
has some serious issues, that may prevent it from being used at all here, so maybe we need to just understand that while we have that goal, we can't advance it here (I don't know the urgency of this change, etc).
return | ||
if not os.path.isfile(mets_xsd): | ||
raise (VerifyMETSException("METS asset is unavailable")) | ||
xmlschema = etree.XMLSchema(etree.parse(mets_xsd)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, does metsrw
not yet work for this case?
@@ -1,7 +1,9 @@ | |||
# -*- coding: utf8 | |||
from __future__ import unicode_literals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
return sum([len(files) for _, dir_, files in os.walk(path)]) | ||
|
||
@staticmethod | ||
def validate_mets(mets_xsd, mets_structmap): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, just to confirm, metsrw
can't be used here?
that they don't have to anticipate the Archivematica normalization process. | ||
""" | ||
return os.path.join( | ||
"", *[sanitizeName(name.encode("utf8")) for name in path.split(os.path.sep)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is that these files will already have been sanitized, so we're just matching up to the current state on the filesystem/database, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry i missed this @cole, yep! That's right. We'll have to keep this in mind when you complete your work on that microservice job.
tree.getroot() | ||
) # TDOD - not root to return, but sub element structMap | ||
# print etree.tostring(root) | ||
root = tree.getroot() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, does metsrw not yet work for this case?
Hi @cole thanks for the quick input. For In prepping the way for
I approached it as taking one step forward. |
@ross-spencer fair enough! Moving create_mets_v2 to metsrw is a big job, maybe it's wishful thinking that we can tackle it incrementally. Thanks for the detailed issues you've created (archivematica/Issues#636 and related). |
@cole yeah, but I'm looking forward to making it happen though 🤓 . If there's an appetite to have |
Previously a user would need to submit a custom mets:structMap in Archivematica using a FILEID attribute for a file path which would be replaced during METS generation. The FILEID is only capable of storing a limited set of values and so here we replace that capability and ask users to use the CONTENTIDS attribute of a custom structMap which can encode a larger range of values to ensure that the final METS document remains valid. Unit tests have been created to support maintenance of this feature on an ongoing basis. Validation of structMap input is completed during transfer and ingest and this validation was previously been done by an external command called xmllint. Given the opportunity to make the validation of METS documents in Archivematica more consistent these calls in Archivematica jobs to xmllint have been replaced with more idiomatic calls to Python lxml validation methods. Validation of mets:structMap files is now completed against METS v1.12. Other small fixes in this commit include the ability for the custom structMap identifier to be updated dynamically where previously it would always be structMap_2 which would be incompatible if other automatically generated structMaps were included in the METS. Users should note that they can also label a custom structMap with something more meaningful to them using their own label and this will not be replaced.
51b3793
to
2a539c5
Compare
Fixes to enable custom structmaps to be more expressive
Previously a user would need to submit a custom mets:structMap in
Archivematica using a FILEID attribute for a file path which would
be replaced during METS generation. The FILEID is only capable of
storing a limited set of values and so here we replace that
capability and ask users to use the CONTENTIDS attribute of a custom
structMap which can encode a larger range of values to ensure that
the final METS document remains valid.
Unit tests have been created to support maintenance of this feature
on an ongoing basis.
Validation of structMap input is completed during transfer and ingest
and this validation was previously been done by an external command
called xmllint. Given the opportunity to make the validation of METS
documents in Archivematica more consistent these calls in
Archivematica jobs to xmllint have been replaced with more idiomatic
calls to Python lxml validation methods.
Validation of mets:structMap files is now completed against
METS v1.12.
Other small fixes in this commit include the ability for the custom
structMap identifier to be updated dynamically where previously it
would always be structMap_2 which would be incompatible if other
automatically generated structMaps were included in the METS. Users
should note that they can also label a custom structMap with
something more meaningful to them using their own label and this will
not be replaced.
From the original commit history
This PR is separated into multiple commits and it is recommended to tackle them one by one for the code reviewer:
create_mets_v2.py
code to support mode dynamic testing and add those tests.embed_custom_structmap
to enable more complex structures.create_mets_v2.py
to separate responsibilities somewhat.Feature: Draft documentation
Sample-data for testing can be found here: artefactual/archivematica-sampledata#52
Connected to archivematica/Issues#283
Connected to archivematica/Issues#633
Connected to archivematica/Issues#245
Connected to IISH#5
Connected to IISH#4