%TAG prefix does not accept all characters in ns-uri-char production #253

gkellogg opened this issue Aug 8, 2022 · 2 comments

gkellogg commented Aug 8, 2022

As noted in yaml/yaml-spec#268 (comment), Psych does not accept a %TAG prefix including a #, which seems to be due to the following code:

libyaml/src/scanner.c

Lines 2603 to 2627 in f8f760f

/*
* The set of characters that may appear in URI is as follows:
*
* '0'-'9', 'A'-'Z', 'a'-'z', '_', '-', ';', '/', '?', ':', '@', '&',
* '=', '+', '$', '.', '!', '~', '*', '\'', '(', ')', '%'.
*
* If we are inside a verbatim tag <...> (parameter uri_char is true)
* then also the following flow indicators are allowed:
* ',', '[', ']'
*/
while (IS_ALPHA(parser->buffer) || CHECK(parser->buffer, ';')
|| CHECK(parser->buffer, '/') || CHECK(parser->buffer, '?')
|| CHECK(parser->buffer, ':') || CHECK(parser->buffer, '@')
|| CHECK(parser->buffer, '&') || CHECK(parser->buffer, '=')
|| CHECK(parser->buffer, '+') || CHECK(parser->buffer, '$')
|| CHECK(parser->buffer, '.') || CHECK(parser->buffer, '%')
|| CHECK(parser->buffer, '!') || CHECK(parser->buffer, '~')
|| CHECK(parser->buffer, '*') || CHECK(parser->buffer, '\'')
|| CHECK(parser->buffer, '(') || CHECK(parser->buffer, ')')
|| (uri_char && (
CHECK(parser->buffer, ',')
|| CHECK(parser->buffer, '[') || CHECK(parser->buffer, ']')
)
))

According to the YAML 1.2 Spec, the ns-uri-char production does include #, which is missing from the scanner.

[39] ns-uri-char ::=
    (
      '%'
      [ns-hex-digit](https://yaml.org/spec/1.2.2/#rule-ns-hex-digit){2}
    )
  | [ns-word-char](https://yaml.org/spec/1.2.2/#rule-ns-word-char)
  | '#'
  | ';'
  | '/'
  | '?'
  | ':'
  | '@'
  | '&'
  | '='
  | '+'
  | '$'
  | ','
  | '_'
  | '.'
  | '!'
  | '~'
  | '*'
  | "'"
  | '('
  | ')'
  | '['
  | ']'

This prevents creating a TAG line such as the following:

%TAG ! http://www.w3.org/2001/XMLSchema#
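For comparison, here is a minimal sketch of the single-character part of the ns-uri-char production quoted above, written as a standalone predicate. The name `is_ns_uri_char` is hypothetical and is not part of libyaml. It accepts `%` on its own and assumes the caller checks the two hex digits that must follow it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of libyaml: returns true when `c` is a
 * single character permitted by ns-uri-char [39] in the YAML 1.2.2 spec.
 * '%' is accepted on its own here; validating the two hex digits that
 * must follow it is left to the caller. */
static bool is_ns_uri_char(unsigned char c)
{
    /* ns-word-char: ASCII decimal digits, ASCII letters, and '-' */
    if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z') || c == '-')
        return true;

    /* Remaining characters from the production, including '#'.
     * The NUL guard is needed because strchr() also matches the
     * terminator of the set string. */
    return c != '\0' && strchr("%#;/?:@&=+$,_.!~*'()[]", c) != NULL;
}
```

With a check like this, every character of `http://www.w3.org/2001/XMLSchema#` is accepted, including the trailing `#`. Unlike the scanner code quoted above, this sketch does not distinguish between verbatim tags and other contexts.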

gkellogg commented

As a workaround, %TAG ! http://www.w3.org/2001/XMLSchema%23 works, but is not ideal, and shouldn't be required based on the grammar.

gkellogg commented

The scanning issue extends to inline tags as well. If you parse the following:

%TAG !xsd! http://www.w3.org/2001/XMLSchema%23
---
date: !xsd!date 2022-08-08

and re-serialize without the %TAG directive, you'll get the following:

date: !<http://www.w3.org/2001/XMLSchema%23date> 2022-08-08

Per the grammar, you should also be able to parse the following:

date: !<http://www.w3.org/2001/XMLSchema#date> 2022-08-08

However, this fails in the same way as the %TAG case. Here the relevant production is c-verbatim-tag, which is defined in terms of ns-uri-char+, and the scanner again rejects the #.

Working around this requires a pre-parsing step to replace these characters as appropriate before parsing and after serializing.
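A rough sketch of what that replacement step could look like for a single tag URI is below. The names `escape_hash` and `unescape_hash` are hypothetical and are not part of libyaml or Psych. The sketch assumes it is applied only to the URI portion of a `%TAG` directive or verbatim tag, never to a whole document, where `#` also starts comments.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `uri` into `out`, writing "%23" for each '#'.
 * Returns 0 on success, or -1 if `out` (of size `out_size`) is too small. */
static int escape_hash(const char *uri, char *out, size_t out_size)
{
    size_t j = 0;
    for (size_t i = 0; uri[i] != '\0'; i++) {
        size_t need = (uri[i] == '#') ? 3 : 1;
        if (j + need >= out_size)      /* keep room for the terminator */
            return -1;
        if (uri[i] == '#')
            memcpy(out + j, "%23", 3);
        else
            out[j] = uri[i];
        j += need;
    }
    if (j >= out_size)                 /* covers out_size == 0 */
        return -1;
    out[j] = '\0';
    return 0;
}

/* Hypothetical helper: reverse the step above in place after serializing.
 * Caveat: a URI that legitimately contained "%23" before escaping is
 * indistinguishable afterwards and will also be rewritten to '#'. */
static void unescape_hash(char *uri)
{
    char *dst = uri;
    const char *src = uri;
    while (*src != '\0') {
        if (src[0] == '%' && src[1] == '2' && src[2] == '3') {
            *dst++ = '#';
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Escaping `http://www.w3.org/2001/XMLSchema#` this way yields the `%23` form that the scanner accepts, and unescaping the serialized tag restores the original URI. The round trip is lossy for URIs that already contain `%23`, which is one more reason the workaround is not ideal.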

This was tested using Ruby Psych version 4.0.4, which wraps libyaml, and the issues seem to be entirely within the library.
