Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema validation seems to ignore formats #474

Open
tschmidtb51 opened this issue Sep 29, 2023 · 12 comments · Fixed by #475 · May be fixed by #517
Open

Schema validation seems to ignore formats #474

tschmidtb51 opened this issue Sep 29, 2023 · 12 comments · Fixed by #475 · May be fixed by #517
Assignees
Labels

Comments

@tschmidtb51
Copy link
Collaborator

It looks like the current version (3.0.0-beta1; but it might have existed in previous version), does not evaluated the format contraints. This was discovered when checking https://github.com/cisagov/CSAF/blob/971b6976d1ce52a710025aef920dba72d3a0675e/csaf_files/OT/white/2023/icsa-23-159-02.json#L269 resulted in a different amoung of errors while using the csaf_validator with the csaf_validator-service preset of mandatory (0) vs basic (1).

We need to find out, why the uri wasn't checked for the format.

@s-l-teichmann
Copy link
Contributor

Turns out that the Schema Compiler from the validation library we use has a flag to check the formats.
It is off by default.

PR #475 turns this flag on.

Without the PR:

./csaf_validator icsa-23-159-02.json
"icsa-23-159-02.json" passes the schema validation.

With the PR:

./csaf_validator icsa-23-159-02.json
schema validation errors of "icsa-23-159-02.json"
  * https://docs.oasis-open.org/csaf/csaf/v2.0/csaf_json_schema.json#: doesn't validate with https://docs.oasis-open.org/csaf/csaf/v2.0/csaf_json_schema.json#
  * /vulnerabilities/0/remediations/3/url: 'www.illustracameras.com' is not valid 'uri'

I'm not totally sure what the overall consequences of always turning on this flag are but I think we should go with it.

@s-l-teichmann s-l-teichmann removed the investigation_needed This item needs investigation label Sep 30, 2023
@bernhardreiter bernhardreiter added this to the Release 3.0.0-beta2 milestone Oct 12, 2023
@tschmidtb51
Copy link
Collaborator Author

This seem not to be resolved for the following file:

{
  "document": {
    "category": "csaf_base",
    "csaf_version": "2.0",
    "publisher": {
      "category": "other",
      "name": "Test",
      "namespace": "https://www.example.com"
    },
    "references": [
      {
        "summary": "Invalid URL",
        "url": "https://security.business.xerox.com/wp-content/uploads/2022/11/Xerox-Security-Bulletin-XRX22-026-FreeFlow®-Print-Server-v7.pdf"
      }
    ],
    "title": "Testfile with invalid URI",
    "tracking": {
      "current_release_date": "2023-11-30T12:25:34.622Z",
      "generator": {
        "date": "2023-11-30T12:27:08.481Z",
        "engine": {
          "name": "Secvisogram",
          "version": ".2.2.15"
        }
      },
      "id": "Test-2023-11-30",
      "initial_release_date": "2023-11-30T12:25:34.622Z",
      "revision_history": [
        {
          "date": "2023-11-30T12:25:34.622Z",
          "number": "1",
          "summary": "Initial version."
        }
      ],
      "status": "draft",
      "version": "1"
    }
  }
}

@tschmidtb51 tschmidtb51 reopened this Nov 30, 2023
@tschmidtb51 tschmidtb51 added bug Something isn't working service+dev labels Nov 30, 2023
@s-l-teichmann
Copy link
Contributor

Saved as test-2023-11-30.json

./csaf_validator -o short --validator=http://0.0.0.0:8082 --validator_preset=basic test-2023-11-30.json
"test-2023-11-30.json" passes the schema validation.
isValid: false
errors:
  message: must match format "uri"
  instance path(s):
    /document/references/0/url
"test-2023-11-30.json" does not pass remote validation.

@tschmidtb51
Copy link
Collaborator Author

@s-l-teichmann Thanks for adding the output. To be precise: I would expect that the file also fails the Go schema validation.

@s-l-teichmann
Copy link
Contributor

uri format validation is table driven in https://github.com/santhosh-tekuri/jsonschema/blob/9e31c47bfa9dfdb5ee04dec04fab51117e43c45a/format.go#L30 with the implementation
in https://github.com/santhosh-tekuri/jsonschema/blob/9e31c47bfa9dfdb5ee04dec04fab51117e43c45a/format.go#L390

Checked out https://github.com/santhosh-tekuri/jsonschema
Modified format_test.go:

diff --git a/format_test.go b/format_test.go
index b8fdded..018890a 100644
--- a/format_test.go
+++ b/format_test.go
@@ -238,6 +238,7 @@ func TestIsIPV6(t *testing.T) {
 
 func TestIsURI(t *testing.T) {
        tests := []test{
+               {"https://foobar®.pdf", false},
                {"http://foo.bar/?baz=qux#quux", true},
                {"//foo.bar/?baz=qux#quux", false}, // an invalid protocol-relative URI Reference
                {"\\\\WINDOWS\\fileshare", false},  // an invalid URI

Run

go test -run TestIsURI                                                         master!
--- FAIL: TestIsURI (0.00s)
    format_test.go:249: #0: "https://foobar®.pdf", valid false, got valid true
FAIL
exit status 1
FAIL	github.com/santhosh-tekuri/jsonschema/v5	0.007s

So the isUri does not do the expected.

@s-l-teichmann
Copy link
Contributor

As this essentially call url.Parse from the Go standard library I did the same:

https://go.dev/play/p/VPCRVKC9olw

package main

import (
	"fmt"
	"net/url"
)

func main() {
	fmt.Println(url.Parse("https://foobar®.pdf"))
}

Which results in:

https://foobar%C2%AE.pdf <nil>

So the ® is replaced by a valid URL encoding.

@s-l-teichmann
Copy link
Contributor

s-l-teichmann commented Nov 30, 2023

From here are more than one way to proceed:

  • Ask in the Go project why they think that https://foobar®.pdf is a valid URL.
  • As the Formats table has public access in the schema validation package we could register our own more restrict version of an URI validator.

Side note: Using url.ParseRequestURI might be the better way to check an URI but that also does the replacement (btdt).

@s-l-teichmann
Copy link
Contributor

s-l-teichmann commented Nov 30, 2023

BTW: Even Node and Java think this correct:

$ node
> console.log(new URL("https://security.business.xerox.com/wp-content/uploads/2022/11/Xerox-Security-Bulletin-XRX22-026-FreeFlow®-Print-Server-v7.pdf"))
URL {
  href: 'https://security.business.xerox.com/wp-content/uploads/2022/11/Xerox-Security-Bulletin-XRX22-026-FreeFlow%C2%AE-Print-Server-v7.pdf',
  origin: 'https://security.business.xerox.com',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'security.business.xerox.com',
  hostname: 'security.business.xerox.com',
  port: '',
  pathname: '/wp-content/uploads/2022/11/Xerox-Security-Bulletin-XRX22-026-FreeFlow%C2%AE-Print-Server-v7.pdf',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
$ cat Uri.java
import java.net.URI;

public class Uri {
    public static void main(String[] args) throws Exception {
        URI u = new URI("https://foobar®.pdf");
        System.out.println(u);
    }
}
$ javac Uri.java
$ java Uri
https://foobar®.pdf

If i have tracked it correctly the validator service gets its idea of a valid URI from here: https://github.com/ajv-validator/ajv-formats/blob/4dd65447575b35d0187c6b125383366969e6267e/src/formats.ts#L116
I'm not sure if this is good idea. Currently I'm a bit tempted to label this as Not our bug.

@tschmidtb51
Copy link
Collaborator Author

Well, I did a quick read through the JSON Validation Specification. The Resource Identifiers section which includes the format uri refers to RFC 3986. The important part is section 3.3 in connexion with section 2.3 and section 2.2. So the Regex doesn't look that bad to me...

Also: If you look at node: They do percent encoding (and that is the correct way). The issue is that the JSON value should be percent encoded but it is not.

@tschmidtb51
Copy link
Collaborator Author

So the function isURI is implemented incorrectly. Instead of checking whether the value is a valid URI it checks whether it can be parsed into a valid URL...

@s-l-teichmann
Copy link
Contributor

Well, I did a quick read through the JSON Validation Specification. The Resource Identifiers section which includes the format uri refers to RFC 3986. The important part is section 3.3 in connexion with section 2.3 and section 2.2. So the Regex doesn't look that bad to me...

The problematic part is obviously the '®'. This character is captured by the last [^\s] part
of the regular expression. Without further limiting the input to the ASCII range
that would be accepted as valid as it is not white space.

PR #517 emulates the behavior of the validator service.

So the function isURI is implemented incorrectly.
Instead of checking whether the value is a valid URI it checks whether it can be
parsed into a valid URL...

@tschmidtb51 you may discuss this with maintainers of https://github.com/santhosh-tekuri/jsonschema

I'm not convinced that the behavior of the validator service is right ... anyway with the PR
applied here is the output of the local schema validation of csaf_validator:

./csaf_validator test-2023-11-30.json
schema validation errors of "test-2023-11-30.json"
  * https://docs.oasis-open.org/csaf/csaf/v2.0/csaf_json_schema.json#: doesn't validate with https://docs.oasis-open.org/csaf/csaf/v2.0/csaf_json_schema.json#
  * /document/references: doesn't validate with '/$defs/references_t'
  * /document/references/0/url: 'https://security.business.xerox.com/wp-content/uploads/2022/11/Xerox-Security-Bulletin-XRX22-026-FreeFlow®-Print-Server-v7.pdf' is not valid 'uri'

@bernhardreiter
Copy link
Member

We believe it should be changed upstream and we will split the issue up and summarize.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment