Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev/issue 8894 premis parsing #20

Closed
wants to merge 4 commits into from
Closed

Dev/issue 8894 premis parsing #20

wants to merge 4 commits into from

Conversation

mcantelon
Copy link
Member

Added PREMIS object/event parsers.

@Hwesta
Copy link
Contributor

Hwesta commented Feb 6, 2017

Note: Previous iterations of this feature found in #12 #13 and #14

Copy link
Contributor

@Hwesta Hwesta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the PREMIS work, but this might not be the direction we want to take PREMIS parsing.

Since this is the METS reader-writer, and METS can contain a variety of other metadata standards, it makes sense to leave the parsing of xmlData contents to something that users can easily add - some sort of plugin system. We could ship a couple of plugins for specific metadata standards ourselves (the ones we use, eg PREMIS & DublinCore), and this is a good candidate for a premis plugin, and so is https://github.com/uchicago-library/uchicagoldr-premiswork

I think this should wait to be merged until a longer term solution for how to handle parsing a variety of metadata format has been discussed, since that's out of scope for this PR.


def __init__(self, title=None, creator=None, subject=None, description=None, publisher=None, contributor=None, date=None, format=None, identifier=None, source=None, relation=None, language=None, coverage=None, rights=None):
for element in self.DC_ELEMENTS:
setattr(self, element, locals()[element])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great use case for kwargs instead of accessing locals() directly. We could replace the list of parameters with **kwargs, which would allow us to accept an arbitrary number of inputs. kwargs is a dictionary. Then the loop could be setattr(self, element, kwargs[element])

:raises exceptions.ParseError: If the root is not xmlData or doesn't contain a dublincore element.
"""
if root.tag != utils.lxmlns('mets') + 'xmlData':
raise exceptions.ParseError('DublinCoreXmlData can only parse xmlData elements with mets namespace.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If DC is a plugin and a therefore a separate module, This should be a different, DC-specific exception, possibly that subclasses ParseError.


def serialize(self):
nsmap = {'mets': utils.NAMESPACES['mets'], 'xsi': utils.NAMESPACES['xsi'], 'xlink': utils.NAMESPACES['xlink']}
root = etree.Element(utils.lxmlns('mets') + 'xmlData', nsmap=nsmap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MDWrap already creates the xmlData element when serializing, and it makes sense to have DublinCore only handle elements in its own namespace. This could be merged with _serialize_dublincore below.

parser = etree.XMLParser(remove_blank_text=True)
if isinstance(document, six.string_types):
self.document = etree.fromstring(document, parser=parser)
elif isinstance(document, etree._Element):
self.document = document
self.mdtype = mdtype
self.data = data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than creating a new attribute, this could use document to store the child document - whether that's a string, ElementTree, or plugin class of the appropriate type.

@@ -291,13 +345,16 @@ class MDWrap(object):
:param str mdtype: The MDTYPE of XML document being enclosed. Examples
include "PREMIS:OBJECT" and "PREMIS:EVENT".
"""
def __init__(self, document, mdtype):
MDTYPE_CLASSES = {'DC': DublinCoreXmlData}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this! It will have to be improved to talk to a plugin interface though.

raise exceptions.ConstructError('event_type argument is required.')

if event_datetime is None:
raise exceptions.ConstructError('event_datetime argument is required.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels unpythonic - these parameters are required by the function, and if they're None, it will raise an error later.


event_type_el = etree.Element(utils.lxmlns('premis') + 'eventType')
event_type_el.text = self.event_type
root.append(event_type_el)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip: These three lines can be etree.SubElement(root, utils.lxmlns('premis') + 'eventType').text = self.event_type

@jrwdunham
Copy link
Contributor

This should be closed in favour of #34, which adds these changes to current master and makes them compatible with the recently-introduced plugin/ dependency injection pattern. Once the comments/ CR from this PR are transferred to PR 34, this can be closed.

@jrwdunham
Copy link
Contributor

Closed in favour of #34

@jrwdunham jrwdunham closed this Nov 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants