mvnix-update creates multiple erroneous entries after hitting 404 pages #41

Open
evanjs opened this issue May 6, 2020 · 5 comments

@evanjs

evanjs commented May 6, 2020

I am currently trying to build a project with many dependencies using mavenix.

Note that several dependencies are available only from repositories outside of Maven Central, and some are exclusive to our own internal Nexus server.


Upon reaching the Sanity check, the following error is thrown:

  Sanity check...

parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 1967, column 3301
Lockfile has been saved to /tmp/mavenix.lock

The last line is a result of my own modification of mavenix.
The mavenix.lock file is typically thrown away, so I added a shim to help me debug this issue:

Saving the mavenix.lock file
diff --git a/mvnix-update b/mvnix-update
index 7bc403e..4407567 100755
--- a/mvnix-update
+++ b/mvnix-update
@@ -25,7 +25,8 @@ EOF
 # Setup work dir and cleanup
 WORK_DIR=$(mktemp -d --tmpdir mvnix-update.XXXXXX)
 cleanup() { { chmod -R +w "$WORK_DIR"; rm -rf "$WORK_DIR"; } || true; }
-trap 'trap - EXIT; cleanup; kill -- $$' EXIT
+backup() { { tlock="/tmp/mavenix.lock"; cp "$tmp_lock_file" "$tlock"; echo "Lockfile has been saved to $tlock"; } }
+trap 'trap - EXIT; backup; cleanup; kill -- $$' EXIT

 # Default values
 tmp_repo="$WORK_DIR/m2-repo"

After removing the duplicate entries, the resulting mavenix.lock file seems to work fine.

Here is an example duplicate entry:
{
      "path": "systems/uom/systems-parent/2.1-SNAPSHOT",
      "content": "<!DOCTYPE HTML>
<html lang=\"en\">
    <head>
        <title>License4J - 404 Not Found</title>
        <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />
        <meta name=\"Description\" content=\"License Manager and License Server. Java Software Product Licensing Solutions. Generation and validation of license text, license keys and floating licenses.\" />
        <meta name=\"Keywords\" content=\"java license manager, software protection, licensing library, license validator, activation, deactivation, license key, license text, floating license server\" />

        <link rel=\"icon\" href=\"/favicon.ico\" type=\"image/x-icon\" /> 
        <link rel=\"shortcut icon\" href=\"/favicon.ico\" type=\"image/x-icon\" />

        <link rel=\"stylesheet\" type=\"text/css\" href=\"/_css/main.css\" />
    </head>
    <body>
        <div class=\"mainblock\">
            <div class=\"headerblock\">
                <img class=\"l4jimgcls\" src=\"/_images/license4j.png\" width=\"400\" alt=\"License4J Java Software Licensing, License Manager\"/><br/>

                <div class=\"linksblock\">
                    <a href=\"/\" title=\"License Manager Home\">Home</a>
                    <a href=\"/features/\" title=\"License Manager Features\">Features</a>
                    <a href=\"/documents/\" title=\"License Manager Documentation\">Documents</a>
                    <a href=\"/download/\" title=\"License4j Download\">Download</a>
                    <a href=\"/buy/\" title=\"Buy License4J, License Manager Store\">Buy</a>
                </div>
            </div>

            <div style=\"position:relative;top:-90px;left:720px;font-size:8pt;width:150px;color:red;\">
                - Latest version <b>4.7.3</b> -
            </div>

            <div class=\"bodyblock\" style=\"text-align:center;padding:75px 0px 0px 0px;color:red;\">

                <h1 style=\"display:inline;\">404 Not Found</h1>

                <br /><br /><br /><br />
                
                <h2 style=\"display:inline;\">Sorry, the page you are looking for could not be found.</h2>
            </div>

            <div class=\"footerblock\">
    <a href=\"/zprivacy.php\" rel=\"nofollow\">Privacy</a>
    <a href=\"/ztermsofuse.php\" rel=\"nofollow\">Terms of Use</a>
    <a href=\"/contactus/\" rel=\"nofollow\">Contact Us</a>
    <!--
    <br /><br />
    <a href=\"https://www.facebook.com/License4J\"><img style=\"border:0px;\" src=\"/_images/facebook-icon.png\" alt=\"License4J Facebook Page\"/></a>
    <a href=\"https://twitter.com/License4J\"><img style=\"border:0px;\" src=\"/_images/twitter-icon.png\" alt=\"License4J Twitter Page\"/></a>
    -->
</div>
     
        </div>
        <script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-8359507-13', 'auto');
  ga('send', 'pageview');

</script>
    </body>
</html>"
}
This is preceded by a proper entry for the artifact:
 {
      "path": "systems/uom/systems-parent/2.1-SNAPSHOT",
      "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
	  <metadata modelVersion=\"1.1.0\">
		<groupId>systems.uom</groupId>
		<artifactId>systems-parent</artifactId>
		<version>2.1-SNAPSHOT</version>
		<versioning>
			<snapshot>
				<timestamp>20200405.213628</timestamp>
				<buildNumber>4</buildNumber>
			</snapshot>
			<lastUpdated>20200405213629</lastUpdated>
			<snapshotVersions>
				<snapshotVersion>
					<extension>pom</extension>
					<value>2.1-20200405.213628-4</value>
					<updated>20200405213628</updated>
				</snapshotVersion>
			</snapshotVersions>
		</versioning>
		</metadata>"
}

One thing to note is that <title>License4J - 404 Not Found</title> seems to appear in every one of the duplicate entries under metas.
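
Because the saved lockfile is not valid JSON at this point, jq cannot be used to inspect it. A line-oriented search is one way to locate the bad entries. The following is a minimal sketch, assuming the lockfile was saved to /tmp/mavenix.lock as in the shim above; `find_html_entries` and `count_html_entries` are hypothetical helper names, not part of mavenix.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mavenix): locate lockfile entries whose
# "content" value starts with an HTML doctype instead of an XML declaration.

# Print matching lines with their line numbers.
# $1: path to the saved lockfile, e.g. /tmp/mavenix.lock
find_html_entries() {
  # -n prints line numbers, -i ignores case (<!DOCTYPE html> vs <!DOCTYPE HTML>)
  grep -n -i '"content": *"<!DOCTYPE HTML' "$1"
}

# Print the number of matching entries (0 when there are none).
count_html_entries() {
  grep -c -i '"content": *"<!DOCTYPE HTML' "$1"
}
```

Running `find_html_entries /tmp/mavenix.lock` lists the line where each HTML entry begins, which makes the duplicates easier to remove by hand.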

@evanjs
Author

evanjs commented May 6, 2020

After digging a little deeper, it looks like these might be exclusively from the license4j server.

The remote for license4j is not defined in default.nix, and is only defined in a package of this project.

However, adding it to the remotes specified in default.nix seemed to have no effect.

License4J error
[WARNING] Checksum validation failed, expected <!DOCTYPE but is 3a08b17598cc540c37049370768c2d42c88ece7e from license4j-runtime-library for http://www.license4j.com/maven/javax/annotation/javax.annotation-api/1.2/javax.annotation-api 1.2.pom

@evanjs
Author

evanjs commented May 7, 2020

Okay, so I've been going in and out of working configurations, trying to stabilize/find the right setup.

What I have found is that, during mvnix-update, various 404s will be cached instead of artifact definitions. I noticed this after I started using mvnix-update -s ~/.m2/repository.

I've been able to find some working settings, but I haven't been committing my default.nix and related files, so I'm currently trying to find the working versions for this particular project 😪

@evanjs
Author

evanjs commented May 7, 2020

This seems to be a combination of misconfigured remotes/repos, and jq being given bad JSON (something like jqlang/jq#1049 (comment), perhaps).

In failing to resolve various artifacts, mvnix-update will enter the sanity check phase with several metas entries with a 404 page (HTML) for content instead of the expected XML definitions.

The various sed calls in mvnix-update are not set up to handle this, and thus, jq is given bad JSON, and dies while trying to process it.

It seems like we either need to escape the resulting HTML properly, or fail gracefully with a more helpful error message, ideally one indicating that the user's configuration might not be correct.

Escaping the resulting HTML might not be the best solution, as we won't be able to do anything with it, anyway...

We could also provide a way to exit or fail faster, possibly before the sanity check phase, as soon as all the dependencies have been "resolved".
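
The fail-fast idea could look roughly like the sketch below. It is only an illustration: `looks_like_maven_xml` and `check_fetched_file` are hypothetical names, not functions in mvnix-update, and the check assumes that a valid metadata or POM file starts with an XML declaration, a `<metadata>` element, or a `<project>` element. A file that begins with something else (for example a leading comment) would be rejected by this simple test.

```shell
#!/bin/sh
# Hypothetical check (not part of mvnix-update): succeed only if the first
# non-blank line of the file looks like Maven XML rather than an HTML page.
looks_like_maven_xml() {
  # Print the first line that contains a non-whitespace character, then stop.
  first_line=$(sed -n '/[^[:space:]]/{p;q;}' "$1") || return 1
  case "$first_line" in
    *'<!DOCTYPE'*|*'<html'*|*'<HTML'*) return 1 ;;     # HTML error page
    *'<?xml'*|*'<metadata'*|*'<project'*) return 0 ;;  # metadata or POM
    *) return 1 ;;                                     # empty or unknown content
  esac
}

# Hypothetical wrapper: report a readable error instead of letting jq fail
# later on unescaped HTML.
check_fetched_file() {
  if ! looks_like_maven_xml "$1"; then
    echo "error: $1 does not look like Maven XML (HTML error page?);" \
         "check the configured remotes or mirror" >&2
    return 1
  fi
}
```

Calling something like `check_fetched_file` on each resolved file before the sanity check would surface the misconfigured remote directly, rather than as a jq parse error.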

@evanjs
Author

evanjs commented May 15, 2020

This seemed to solve the issue in my case:

Contents of settings.xml
<settings>
  <mirrors>
    <mirror>
      <id>nexus</id>
      <mirrorOf>*</mirrorOf>
      <name>Local Nexus Repository</name>
      <url>http://172.16.0.208:8080/nexus/content/groups/public</url>
    </mirror>
  </mirrors>
</settings>

I still feel like this could be better handled, and either fail or notify the user when something like this (i.e. 404 pages being saved in place of pom files, etc) occurs.

@addict3d

This seemed to solve the issue in my case: [using a] <mirror>

I still feel like this could be better handled, and either fail or notify the user when something like this (i.e. 404 pages being saved in place of pom files, etc) occurs.

(quote edited for brevity)

How should the mirror directive interact with remote URLs in mavenix?

We use a <mirrorOf>*</mirrorOf> directive to route all repositories to a combined 'repository cache' inside our repository manager.

Looking at a project that includes some remotes in its pom, it appears that the remote URLs go straight through and end up in the dependencies' derivations. So there is no mirror URL substitution at that level.

If all of those remotes were well behaved, then the build would succeed. As is, the project build fails from mismatched hashes in the dependencies.

mvnix-update fetches the correct hashes for each dep, but the builder for the dependency derivations attempts to fetch from the remote repo URLs in order, and a specific remote near the front of the list sometimes returns bogus content. It seems to return an HTTP 200 response with an HTML document that says '404' (see OP), so the whole process fails due to a hash mismatch.

Does obtaining the correct hash for a jar/pom dependency rely on the failure tolerances in maven itself, or does it depend on routing through the mirror?

If it is the second, should mavenix implement the same logic that maven uses? Go through the mirrors, matching remotes to mirrorOf rules, and rewrite the remote URL to the first matching mirror URL (if any)?
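
To make the question concrete, here is a simplified sketch of what such matching could look like. It is an assumption-laden illustration, not mavenix code: `mirror_matches` and `effective_url` are hypothetical names, only a single mirror is considered, and of Maven's mirrorOf syntax only `*`, explicit repository ids, comma-separated lists, and `!id` exclusions are handled (forms such as `external:*` are ignored).

```shell
#!/bin/sh
# Simplified sketch of Maven-style mirrorOf matching (assumption: only "*",
# explicit ids, comma lists, and "!id" exclusions are supported).
# Succeeds if repository id $1 is matched by mirrorOf pattern $2.
mirror_matches() {
  repo_id=$1
  matched=1                       # 1 = no match so far (shell "false")
  old_ifs=$IFS; IFS=,             # split the pattern on commas
  set -f                          # keep "*" from being glob-expanded
  for rule in $2; do
    case "$rule" in
      "!$repo_id") IFS=$old_ifs; set +f; return 1 ;;  # exclusion wins
      "*"|"$repo_id") matched=0 ;;
    esac
  done
  IFS=$old_ifs; set +f
  return $matched
}

# Print the URL to fetch from: the mirror URL if the rule matches, otherwise
# the remote's own URL. All arguments are hypothetical inputs.
# $1 repo id, $2 repo url, $3 mirrorOf pattern, $4 mirror url
effective_url() {
  if mirror_matches "$1" "$3"; then echo "$4"; else echo "$2"; fi
}
```

With a `<mirrorOf>*</mirrorOf>` rule, every remote would be rewritten to the mirror URL, so the misbehaving remote would never be contacted directly.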
