Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: It is possible to create invalid pointer files relatively easily as they use restricted character sets for transfer/package IDs #380

Closed
ross-spencer opened this issue Dec 13, 2018 · 4 comments
Labels
Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result.

Comments

@ross-spencer
Copy link
Contributor

ross-spencer commented Dec 13, 2018

Expected behaviour

It shouldn't be possible to create invalid pointer files.

Current behaviour

It is possible to create invalid pointer files. Invalid pointer files do not result in a failure in 1.7.x or 1.8.x because the result of the validation does not get returned to the MCP Server.

The affected fields are mets:file ID and mets:fptr FILEID and both of these have very specific data types that are required to validate against the xs:NCName type.

There is a good discussion about how restrictive NCName fields are here: https://stackoverflow.com/questions/1631396/what-is-an-xsncname-type-and-when-should-it-be-used

I have completed a worked example/demo below.

If a user inputs a value into the transfer name that is not a valid xs:NCName the pointer file will be created invalid.

Steps to reproduce

image

Create a transfer with the transfer name 34750.zip (the package must be set to be compressed) and monitor the storage service logs. The output will be as follows:

XMLSchema (xsd) Error(s):
ERROR ON LINE 0: Element '{http://www.loc.gov/METS/}file', attribute 'ID': '34750.zip-8f4e0ab3-2eb0-4baf-a6fb-6f9502f6b7d5' is not a valid value of the atomic type 'xs:ID'.
ERROR ON LINE 0: Element '{http://www.loc.gov/METS/}fptr', attribute 'FILEID': '34750.zip-8f4e0ab3-2eb0-4baf-a6fb-6f9502f6b7d5' is not a valid value of the atomic type 'xs:IDREF'.

This can be seen when run against xmllint as well nb. investigation needs to be done into the other errors:

xmllint --schema mets.xsd pointer.ffae19d9-e0f7-47f6-a2d1-b1c30ec1480e.xml --noout
pointer.ffae19d9-e0f7-47f6-a2d1-b1c30ec1480e.xml:8: element object: Schemas validity error : Element '{info:lc/xmlns/premis-v2}object', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The QName value '{info:lc/xmlns/premis-v2}file' of the xsi:type attribute does not resolve to a type definition.
pointer.ffae19d9-e0f7-47f6-a2d1-b1c30ec1480e.xml:8: element object: Schemas validity error : Element '{info:lc/xmlns/premis-v2}object': The type definition is absent.
pointer.ffae19d9-e0f7-47f6-a2d1-b1c30ec1480e.xml:144: element file: Schemas validity error : Element '{http://www.loc.gov/METS/}file', attribute 'ID': '34750.zip-ffae19d9-e0f7-47f6-a2d1-b1c30ec1480e' is not a valid value of the atomic type 'xs:ID'.
pointer.ffae19d9-e0f7-47f6-a2d1-b1c30ec1480e.xml:152: element fptr: Schemas validity error : Element '{http://www.loc.gov/METS/}fptr', attribute 'FILEID': '34750.zip-ffae19d9-e0f7-47f6-a2d1-b1c30ec1480e' is not a valid value of the atomic type 'xs:IDREF'.
pointer.ffae19d9-e0f7-47f6-a2d1-b1c30ec1480e.xml fails to validate

Your environment (version of Archivematica, OS version, etc)

Archivematica 1.7 and qa/1.x

Worked example

The following example demonstrates valid and invalid combinations that can be used in the transfer name field. I demonstrate this using the xs:types from the pointer file (xs:ID, xs:IDREF), and the global xs:NCName type which they both inherit. It is possible to see their equivalence this way. For each test I use xmllint. The templates below might be useful for understanding how restrictive this field is, how we may either validate, restrict, or modify the data input in the the transfer name field.

XML:

<?xml version="1.0" encoding="UTF-8"?>
<transfername>
  <id-name>transfer</id-name>
  <id-name>-transfer</id-name>
  <id-name>.transfer</id-name>
  <id-name>1-transfer</id-name>
  <id-name>_transfer</id-name>
  <id-name>transfer.zip</id-name>
  <id-name>1transfer</id-name>
  <id-name>2600.zip</id-name>
  <id-name>āēīōū</id-name>
  <id-name>a@</id-name>
  <id-name>has#</id-name>
  <id-name>handle/1234</id-name>
  <!---->
  <idref-name>transfer</idref-name>
  <idref-name>-transfer</idref-name>
  <idref-name>.transfer</idref-name>
  <idref-name>1-transfer</idref-name>
  <idref-name>_transfer</idref-name>
  <idref-name>transfer.zip</idref-name>
  <idref-name>1transfer</idref-name>
  <idref-name>2600.zip</idref-name>
  <idref-name>āēīōū</idref-name>
  <idref-name>a@</idref-name>
  <idref-name>has#</idref-name>
  <idref-name>handle/1234</idref-name>
  <!---->
  <idnc-name>transfer</idnc-name>
  <idnc-name>-transfer</idnc-name>
  <idnc-name>.transfer</idnc-name>
  <idnc-name>1-transfer</idnc-name>
  <idnc-name>_transfer</idnc-name>
  <idnc-name>transfer.zip</idnc-name>
  <idnc-name>1transfer</idnc-name>
  <idnc-name>2600.zip</idnc-name>
  <idnc-name>āēīōū</idnc-name>
  <idnc-name>a@</idnc-name>
  <idnc-name>has#</idnc-name>
  <idnc-name>handle/1234</idnc-name>
</transfername>

XSD:

<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
	<xs:element name="transfername">
		<xs:complexType>
			<xs:sequence>
				<!--xs:ID https://www.oreilly.com/library/view/xml-schema/0596002521/re74.html-->
				<!--xs:IDREF https://www.oreilly.com/library/view/xml-schema/0596002521/re75.html -->
				<!--xs:NCName https://www.oreilly.com/library/view/xml-schema/0596002521/re82.html -->
				<xs:element name="id-name" type="xs:ID" minOccurs="1" maxOccurs="unbounded" />
				<xs:element name="idref-name" type="xs:IDREF" minOccurs="1" maxOccurs="unbounded" />
				<xs:element name="idnc-name" type="xs:NCName" minOccurs="1" maxOccurs="unbounded" />
			</xs:sequence>
		</xs:complexType>
	</xs:element>
</xs:schema>

Output:

ross-spencer@artefactual:~/Desktop/temp/transfername$ xmllint --schema transfername.xsd transfername.xml --noout
transfername.xml:4: element id-name: Schemas validity error : Element 'id-name': '-transfer' is not a valid value of the atomic type 'xs:ID'.
transfername.xml:5: element id-name: Schemas validity error : Element 'id-name': '.transfer' is not a valid value of the atomic type 'xs:ID'.
transfername.xml:6: element id-name: Schemas validity error : Element 'id-name': '1-transfer' is not a valid value of the atomic type 'xs:ID'.
transfername.xml:9: element id-name: Schemas validity error : Element 'id-name': '1transfer' is not a valid value of the atomic type 'xs:ID'.
transfername.xml:10: element id-name: Schemas validity error : Element 'id-name': '2600.zip' is not a valid value of the atomic type 'xs:ID'.
transfername.xml:12: element id-name: Schemas validity error : Element 'id-name': 'a@' is not a valid value of the atomic type 'xs:ID'.
transfername.xml:13: element id-name: Schemas validity error : Element 'id-name': 'has#' is not a valid value of the atomic type 'xs:ID'.
transfername.xml:14: element id-name: Schemas validity error : Element 'id-name': 'handle/1234' is not a valid value of the atomic type 'xs:ID'.
transfername.xml:17: element idref-name: Schemas validity error : Element 'idref-name': '-transfer' is not a valid value of the atomic type 'xs:IDREF'.
transfername.xml:18: element idref-name: Schemas validity error : Element 'idref-name': '.transfer' is not a valid value of the atomic type 'xs:IDREF'.
transfername.xml:19: element idref-name: Schemas validity error : Element 'idref-name': '1-transfer' is not a valid value of the atomic type 'xs:IDREF'.
transfername.xml:22: element idref-name: Schemas validity error : Element 'idref-name': '1transfer' is not a valid value of the atomic type 'xs:IDREF'.
transfername.xml:23: element idref-name: Schemas validity error : Element 'idref-name': '2600.zip' is not a valid value of the atomic type 'xs:IDREF'.
transfername.xml:25: element idref-name: Schemas validity error : Element 'idref-name': 'a@' is not a valid value of the atomic type 'xs:IDREF'.
transfername.xml:26: element idref-name: Schemas validity error : Element 'idref-name': 'has#' is not a valid value of the atomic type 'xs:IDREF'.
transfername.xml:27: element idref-name: Schemas validity error : Element 'idref-name': 'handle/1234' is not a valid value of the atomic type 'xs:IDREF'.
transfername.xml:30: element idnc-name: Schemas validity error : Element 'idnc-name': '-transfer' is not a valid value of the atomic type 'xs:NCName'.
transfername.xml:31: element idnc-name: Schemas validity error : Element 'idnc-name': '.transfer' is not a valid value of the atomic type 'xs:NCName'.
transfername.xml:32: element idnc-name: Schemas validity error : Element 'idnc-name': '1-transfer' is not a valid value of the atomic type 'xs:NCName'.
transfername.xml:35: element idnc-name: Schemas validity error : Element 'idnc-name': '1transfer' is not a valid value of the atomic type 'xs:NCName'.
transfername.xml:36: element idnc-name: Schemas validity error : Element 'idnc-name': '2600.zip' is not a valid value of the atomic type 'xs:NCName'.
transfername.xml:38: element idnc-name: Schemas validity error : Element 'idnc-name': 'a@' is not a valid value of the atomic type 'xs:NCName'.
transfername.xml:39: element idnc-name: Schemas validity error : Element 'idnc-name': 'has#' is not a valid value of the atomic type 'xs:NCName'.
transfername.xml:40: element idnc-name: Schemas validity error : Element 'idnc-name': 'handle/1234' is not a valid value of the atomic type 'xs:NCName'.
transfername.xml fails to validate

For Artefactual use:
Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:

  • All PRs related to this issue are properly linked 👍
  • All PRs related to this issue have been merged 👍
  • Test plan for this issue has been implemented and passed 👍
  • Documentation regarding this issue has been written and it has been added to the release notes, if needed 👍
@ross-spencer
Copy link
Contributor Author

@ross-spencer ross-spencer added Jisc RDSS University of Melbourne University of Melbourne Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result. labels Dec 13, 2018
@ross-spencer ross-spencer changed the title Problem: It is too easy to create invalid pointer files as they require more restricted character sets for transfer/package names Problem: It is possible to create invalid pointer files relatively easily as they use restricted character sets for transfer/package IDs Dec 13, 2018
@ross-spencer
Copy link
Contributor Author

This issue might also impact the approach to: artefactual/archivematica-storage-service#324

@evelynPM
Copy link

See proposed fix for #660.

@ross-spencer
Copy link
Contributor Author

Closing in favor of #660. The fix there does resolve this, What I missed in my analysis above is that between the beginning of the transfer and the end of the transfer, the transfer name is 'cleaned up'. So many of these characters will be removed making it okay to wrap the transfer name in the METS file.

@ross-spencer ross-spencer removed Jisc RDSS Severity: medium An inconvenient situation where the software is usable but inconvenient or slow. triage-release-1.11 labels May 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result.
Projects
None yet
Development

No branches or pull requests

3 participants