EntityResolver implementation to address Xerces issue in http->https redirection #116

Open
jenriquesoriano opened this issue Aug 4, 2022 · 0 comments
Labels
EIP-approved EIP approved by the Steering Group

Comments

jenriquesoriano commented Aug 4, 2022

Background and Motivation:

ETF v2.1-RC includes the library xercesimpl-2.12.1.
This Java library does not follow redirects that switch to a different protocol.
The issue that motivates this EIP is therefore the redirect from HTTP to HTTPS: even when the HTTPS location mirrors the HTTP one, it is still a different protocol.
Because this behaviour cannot be disabled, the unresolved resources surface as errors later on in the report.
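
The refusal to follow a cross-protocol redirect can be reproduced with the JDK alone. The following is a minimal sketch, assuming the resource is fetched through `java.net.HttpURLConnection` (the class used by the resolver proposed in this issue); the class name `CrossProtocolRedirectDemo`, the local throwaway server and the `/schema.xsd` path are illustrative only and not part of the ETF.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

// Hypothetical demo class, not part of the ETF or Xerces.
final class CrossProtocolRedirectDemo {

    private CrossProtocolRedirectDemo() {}

    /**
     * Starts a local HTTP server that answers every request with a 301 pointing
     * to an https:// URL, then returns the status code the client ends up with.
     */
    static int statusSeenByClient() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                // Placeholder target; it is never contacted because the redirect is not followed.
                exchange.getResponseHeaders().add("Location", "https://localhost/schema.xsd");
                exchange.sendResponseHeaders(301, -1); // -1 means no response body
                exchange.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL url = new URL("http://127.0.0.1:" + port + "/schema.xsd");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
                conn.setInstanceFollowRedirects(true); // already the default, stated for clarity
                try {
                    // The redirect switches protocol, so the 3xx status is returned as-is.
                    return conn.getResponseCode();
                } finally {
                    conn.disconnect();
                }
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Even with redirect following enabled, the client is left with the 3xx response rather than the redirected resource, which is the situation the proposed change has to handle.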

Proposed change

After some discussion of the possible alternatives, the proposed solution is a change at ETF level: implementing the SAX EntityResolver interface in the Xerces classes that handle URLs.

This interface makes it possible to customise resource resolution, enabling specific behaviours and solving the redirection issue by replacing the URL with the redirect target before it is used.

An EntityResolver implementation would be similar to this:

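import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RedirectEntityResolver implements EntityResolver {

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {

        // Returning null hands anything that is not an HTTP(S) URL back to the parser's default resolution.
        if (systemId == null || !(systemId.startsWith("http://") || systemId.startsWith("https://"))) {
            return null;
        }

        URL url = new URL(systemId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        int status = conn.getResponseCode();
        if (status == HttpURLConnection.HTTP_MOVED_TEMP
                || status == HttpURLConnection.HTTP_MOVED_PERM
                || status == HttpURLConnection.HTTP_SEE_OTHER) {

            // manage https redirection: a 3xx status here means the redirect was not followed automatically
            String location = conn.getHeaderField("Location");
            if (location != null) {
                url = new URL(url, location); // also resolves a relative Location header
                conn = (HttpURLConnection) url.openConnection();
            }
        }

        InputSource source = new InputSource(conn.getInputStream());
        source.setSystemId(url.toString()); // base for relative references inside the resource
        return source;
    }

}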

The resolver class implements the SAX EntityResolver interface. In the specific Xerces classes that handle URLs, an instance should then be registered with the SAX parser using the parser's setEntityResolver method.
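
Registration could look roughly as follows. This is a sketch using the JAXP SAX parser that ships with the JDK, not the ETF's actual wiring; the class `ResolverRegistrationSketch`, its methods and the `http://example.org/sample.dtd` system ID are hypothetical, and a stub resolver that serves an empty DTD stands in for the real one so that the example needs no network access.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch class, not part of the ETF.
final class ResolverRegistrationSketch {

    private ResolverRegistrationSketch() {}

    /** Parses the given XML text with the resolver registered on the SAX parser. */
    static void parseWithResolver(String xml, EntityResolver resolver)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver); // external entities are now requested through the resolver
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
    }

    /** Runs a small document through the parser and returns the system IDs the resolver was asked for. */
    static List<String> recordedSystemIds() {
        List<String> requested = new ArrayList<>();
        // Stub resolver: records the request and answers with an empty DTD.
        EntityResolver recording = (publicId, systemId) -> {
            requested.add(systemId);
            InputSource empty = new InputSource(new StringReader(""));
            empty.setSystemId(systemId);
            return empty;
        };
        String xml = "<!DOCTYPE root SYSTEM \"http://example.org/sample.dtd\"><root/>";
        try {
            parseWithResolver(xml, recording);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return requested;
    }
}
```

In the ETF, the stub would be replaced by an instance of the resolver class described in this proposal.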
The entity resolver will check that the redirect changes only the protocol, from HTTP to HTTPS, and that the rest of the URL remains the same.
It will be configurable at application level, for the download of schemas, through the ETF config properties.
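
One possible shape for that check is sketched below. The helper `RedirectPolicy.isHttpToHttpsUpgrade` is a hypothetical name, and the exact comparison rules are assumptions rather than agreed behaviour: host compared case-insensitively, path and query compared exactly, and only default ports accepted on both sides.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Objects;

// Hypothetical helper, not part of the ETF or Xerces.
final class RedirectPolicy {

    private RedirectPolicy() {}

    /**
     * Returns true only if 'redirected' is the same location as 'original'
     * with the scheme changed from http to https.
     */
    static boolean isHttpToHttpsUpgrade(String original, String redirected) {
        if (original == null || redirected == null) {
            return false;
        }
        final URI from;
        final URI to;
        try {
            from = new URI(original);
            to = new URI(redirected);
        } catch (URISyntaxException e) {
            return false;
        }
        // The scheme must go from http to https, nothing else.
        if (!"http".equalsIgnoreCase(from.getScheme()) || !"https".equalsIgnoreCase(to.getScheme())) {
            return false;
        }
        String fromHost = from.getHost();
        String toHost = to.getHost();
        if (fromHost == null || toHost == null || !fromHost.equalsIgnoreCase(toHost)) {
            return false;
        }
        // Assumption: only default ports (80 and 443) count as "the same URL".
        return isDefaultPort(from.getPort(), 80)
                && isDefaultPort(to.getPort(), 443)
                && Objects.equals(from.getRawPath(), to.getRawPath())
                && Objects.equals(from.getRawQuery(), to.getRawQuery());
    }

    private static boolean isDefaultPort(int port, int schemeDefault) {
        return port == -1 || port == schemeDefault; // -1 means no port was given in the URI
    }
}
```

A resolver could call this with the original system ID and the value of the `Location` header, and open the second connection only when it returns true. A relative `Location` value would first have to be resolved against the original URL, since the helper rejects anything without an explicit https scheme.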

Alternatives

An alternative to this solution would be to build a cache on the server from a list of HTTP locations, physically downloading each file to the server. Any HTTP request could then be intercepted and redirected to the server cache, which would return the file to the ETF.
This solution would require registering all URLs in a reverse proxy, creating a virtual host for each of them.
As a disadvantage, it complicates the architecture and maintenance of the system.
However, it would provide a cache for these files.

Funding

Funding is provided by JRC through the INSPIRE validator team.

Additional information

This error seems likely to occur in other places where redirecting from one protocol to another is not possible. In the future, a more general approach may therefore be needed, one that resolves any redirection regardless of how the library in use reacts to it.

@cportele cportele added the EIP-approved EIP approved by the Steering Group label Aug 5, 2022