Expected behaviour
It may be desirable (or at least convenient) to load any METS document into mets-reader-writer in order to validate it against the METS schema.
Current behaviour
mets-reader-writer limits what it can import: during its load process it checks for the existence of various properties within the METS and raises an error when they are missing. The reader-writer could potentially be more general purpose.
As an example:
As mets-reader-writer loads XML from a file, it calls fromtree, which then calls _parse_tree.
In _parse_tree we seek the existence of a physical structMap and raise an error if one isn't found: raise exceptions.ParseError("No physical structMap found.")
A structmap however isn't a mandatory element of a METS file. And here (1.12) looking specifically for a physical structMap is also an additional stipulation affecting our ability to load any particular METS.
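The check described above can be sketched with the standard library alone. This is a minimal illustration, not the code in mets-reader-writer: the names find_structmap, parse_strict and parse_lenient are hypothetical, the local ParseError class stands in for exceptions.ParseError, and the exact-match comparison on TYPE is an assumption.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}


class ParseError(Exception):
    """Local stand-in for the library's exceptions.ParseError."""


def find_structmap(root, structmap_type):
    """Return the first structMap whose TYPE matches, or None."""
    for structmap in root.findall("mets:structMap", METS_NS):
        if structmap.get("TYPE") == structmap_type:
            return structmap
    return None


def parse_strict(root):
    """Mimic the behaviour reported here: a physical structMap is required."""
    structmap = find_structmap(root, "physical")
    if structmap is None:
        raise ParseError("No physical structMap found.")
    return structmap


def parse_lenient(root):
    """Hypothetical alternative: tolerate a missing physical structMap."""
    return find_structmap(root, "physical")


# A trimmed document that only carries a logical structMap.
LOGICAL_ONLY = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="logical">
    <mets:div TYPE="book" LABEL="How to create a hierarchical book"/>
  </mets:structMap>
</mets:mets>
"""

if __name__ == "__main__":
    root = ET.fromstring(LOGICAL_ONLY)
    print("lenient result:", parse_lenient(root))
    try:
        parse_strict(root)
    except ParseError as err:
        print("strict result:", err)
```

With the lenient variant the caller decides what to do when no physical structMap exists, which would let a document like this one reach schema validation instead of failing at load time.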
Steps to reproduce
A sample structmap that will fail validation is as follows:
<?xml version="1.0" encoding="utf-8"?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="logical">
    <mets:div TYPE="book" LABEL="How to create a hierarchical book">
      <mets:div TYPE="page" LABEL="Cover">
        <mets:fptr FILEID="cover.jpg"/>
      </mets:div>
      <mets:div TYPE="page" LABEL="Inside cover">
        <mets:fptr FILEID="inside_cover.jpg"/>
      </mets:div>
      <mets:div TYPE="chapter" LABEL="Chapter 1">
        <mets:div TYPE="page" LABEL="Page 1">
          <mets:fptr FILEID="page_01.jpg"/>
        </mets:div>
        <mets:div TYPE="subchapter" LABEL="Subchapter 1.1">
          <mets:div TYPE="page" LABEL="Page 2">
            <mets:fptr FILEID="page_02.jpg"/>
          </mets:div>
          <mets:div TYPE="page" LABEL="Page 3">
            <mets:fptr FILEID="page_03.jpg"/>
          </mets:div>
          <mets:div TYPE="page" LABEL="Page 4">
            <mets:fptr FILEID="page_04.jpg"/>
          </mets:div>
          <mets:div TYPE="subchapter" LABEL="Subchapter 1.2">
            <mets:div TYPE="page" LABEL="Page 5">
              <mets:fptr FILEID="page_05.jpg"/>
            </mets:div>
            <mets:div TYPE="page" LABEL="Page 6">
              <mets:fptr FILEID="page_06.jpg"/>
            </mets:div>
            <mets:div TYPE="page" LABEL="Page 7">
              <mets:fptr FILEID="page_07.jpg"/>
            </mets:div>
          </mets:div>
          <!-- Subchapter 1.2 -->
        </mets:div>
        <!-- Subchapter 1.1 -->
      </mets:div>
      <!-- Chapter 1 -->
      <!-- Chapters 2 and 3, each with their own subchapters as in Chapter 1, omitted from this example. -->
      <mets:div TYPE="afterword" LABEL="Afterword">
        <mets:div TYPE="page" LABEL="Page 20">
          <mets:fptr FILEID="page_20.jpg"/>
        </mets:div>
      </mets:div>
      <!-- afterword -->
      <mets:div TYPE="index" LABEL="Index">
        <mets:div TYPE="page" LABEL="Index, page 1">
          <mets:fptr FILEID="index_01.jpg"/>
        </mets:div>
        <mets:div TYPE="page" LABEL="Index, page 2">
          <mets:fptr FILEID="index_02.jpg"/>
        </mets:div>
      </mets:div>
      <!-- index -->
      <mets:div TYPE="page" LABEL="Back cover">
        <mets:fptr FILEID="back_cover.jpg"/>
      </mets:div>
      <!-- back cover -->
    </mets:div>
    <!-- book -->
  </mets:structMap>
</mets:mets>
An attempt to load this will result in the following stack trace:
ross-spencer changed the title from "Problem: mets-reader-writer places its own restrictions on METS profiles it can validate" to "Problem: mets-reader-writer places its own restrictions on METS profiles can read and then validate" (Apr 15, 2019)
ross-spencer changed the title from "Problem: mets-reader-writer places its own restrictions on METS profiles can read and then validate" to "Problem: mets-reader-writer places its own restrictions on METS profiles can read and then validated" (Apr 16, 2019)
ross-spencer changed the title from "Problem: mets-reader-writer places its own restrictions on METS profiles can read and then validated" to "Problem: mets-reader-writer places its own restrictions on METS profiles is can read and then validate" (Apr 16, 2019)
ross-spencer changed the title from "Problem: mets-reader-writer places its own restrictions on METS profiles is can read and then validate" to "Problem: mets-reader-writer places its own restrictions on METS profiles it can read and then validate" (Apr 16, 2019)
ross-spencer changed the title from "Problem: mets-reader-writer places its own restrictions on METS profiles it can read and then validate" to "Problem: mets-reader-writer places its own restrictions on METS profiles it can read and then validate (metsrw)" (Jan 16, 2020)
Your environment (version of Archivematica, OS version, etc)
metsrw-0.3.7.
Additional context
Validation could be done via mets-rw for custom structmaps rather than via xmllint in archivematicaVerifyMETS.sh.
For Artefactual use:
Please make sure these steps are taken before moving this issue from Review to Verified in Waffle: