[WIP] [E4 Xpath] Replace apache.commons.jxpath by javax.xml.xpath #2290

Draft: wants to merge 2 commits into base: master

HannesWell
Member

As said in #423 (comment), this is the current state of my stalled work to migrate E4 XPath off the old and unmaintained apache.commons.jxpath library.
The basic idea is to provide an org.w3c.dom.Element view/wrapper for an EObject so that a javax.xml.xpath.XPath can operate on it.
As mentioned, this is very much a work in progress: it is not yet functional, and a lot has to be cleaned up before it can be used (it contains a lot of experimental code).

@ptziegler if you or anybody else would like to take this over and complete it, please feel free. I also find this an interesting topic, but I currently have no time to work on it. If you don't have time either, I might continue this myself in the future.

I have also extracted some minor improvements that can already be applied now into #2289.

Contributor

github-actions bot commented Sep 17, 2024

Test Results

 1 815 files  ±0   1 815 suites  ±0   1h 40m 58s ⏱️ + 11m 11s
 7 699 tests ±0   7 468 ✅  - 2  228 💤 ±0  1 ❌ ±0  2 🔥 +2 
24 258 runs  ±0  23 504 ✅  - 6  747 💤 ±0  3 ❌ +2  4 🔥 +4 

For more details on these failures and errors, see this check.

Results for commit a8e0384. ± Comparison against base commit 95cf53a.

♻️ This comment has been updated with latest results.

@ptziegler
Contributor

I played around with this draft and I think the initial approach can work, with the main obstacles being:

  1. jxpath has the application as root, while xpath has the document. This leads to a weird situation where the XPath "/" has to be translated to "/application/".
  2. The handling of the parent context is something that needs to be extensively tested. Also whether e.g. "/" is now the root of the parent context. I'm also not sure how this would be encoded in XML. I.e. whether both documents need to be merged or whether the "child" application needs to be appended to the "parent" application or if it's something completely different.

How should we proceed here? I don't think I can push directly to your branch. So should I create a separate branch where I do my own development?

@HannesWell
Member Author

I played around with this draft and I think the initial approach can work,

Awesome!

with the main obstacles being:

  1. jxpath has the application as root, while xpath has the document. This leads to a weird situation where the XPath "/" has to be translated to "/application/".

Couldn't this be fixed by adding a placeholder/virtual/dummy document?

  2. The handling of the parent context is something that needs to be extensively tested. Also whether e.g. "/" is now the root of the parent context. I'm also not sure how this would be encoded in XML. I.e. whether both documents need to be merged or whether the "child" application needs to be appended to the "parent" application or if it's something completely different.

I cannot say much about this at the moment.

How should we proceed here? I don't think I can push directly to your branch. So should I create a separate branch where I do my own development?

Yes, you have to create your own branch and PR, but you could add a link to this one. Once you have created it, this PR can be closed.

@laeubi
Contributor

laeubi commented Sep 24, 2024

I'm also not sure how this would be encoded in XML. I.e. whether both documents need to be merged or whether the "child" application needs to be appended to the "parent" application or if it's something completely different.

Can you explain what exactly is the problem/question?

jxpath has the application as root, while xpath has the document. This leads to a weird situation where the XPath "/" has to be translated to "/application/"

I would expect that application is the document element, or do I misunderstand the problem?

@ptziegler
Contributor

Can you explain what exactly is the problem/question?

I'm simply not sure how the parent context is handled in jxpath. But until we can properly read the current context, this doesn't have a very high priority on my side.

I would expect that application is the document element, or do I misunderstand the problem?

Given the following XML document:

<foo>
    <bar/>
    <bar/>
    <bar/>
</foo>

When converted to a Java document, you get the following object structure:

- Document
  - Element (foo)
    - Element (bar)
    - Element (bar)
    - Element (bar)

Evaluating the XPath "/" on any node returns Document and not Element(foo).
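
The behaviour described above can be reproduced with the JDK's built-in javax.xml.xpath support. The following is a minimal sketch rather than code from this PR: the class name `RootNodeDemo` and the helper `unwrapDocument` are made up for illustration, and mapping the Document node back to its document element is only one conceivable workaround, not an agreed design.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class RootNodeDemo {

    /** Parses an XML string into an org.w3c.dom.Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Evaluates an XPath expression with the given node as context and returns the first hit. */
    static Node evaluate(String expression, Node context) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Hypothetical helper (an assumption, not part of the PR): treats the
     * Document node as if it were its document element, which mimics a
     * "/" that resolves to the model root.
     */
    static Node unwrapDocument(Node node) {
        return node instanceof Document ? ((Document) node).getDocumentElement() : node;
    }

    public static void main(String[] args) {
        // No whitespace between the tags, so the first child of <foo> is a <bar> element.
        Document doc = parse("<foo><bar/><bar/><bar/></foo>");
        Node firstBar = doc.getDocumentElement().getFirstChild();

        Node root = evaluate("/", firstBar);
        System.out.println(root.getNodeType() == Node.DOCUMENT_NODE); // "/" is the Document node
        System.out.println(evaluate("/*", firstBar).getNodeName());   // "/*" is the document element
        System.out.println(unwrapDocument(root).getNodeName());       // Document mapped to its element
    }
}
```

Here "/" evaluated on a `bar` element yields the Document node, whereas "/*" yields the `foo` element.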

@laeubi
Contributor

laeubi commented Sep 24, 2024

If I understand correctly, we already implement the DOM API here (maybe something better placed in EMF directly? @merks?), so can't Application implement both Document and Element, and simply return itself as the document and as the root element? (It might be a bit counterintuitive, but it probably works.)

@merks
Contributor

merks commented Sep 24, 2024

Stop to ask, why are there so many alternatives to DOM? (Because it's horrible?!)

Goodness knows why folks could not have just used EMF's support for paths?

  • org.eclipse.emf.ecore.resource.impl.ResourceImpl.getEObject(List)
  • org.eclipse.emf.ecore.InternalEObject.eObjectForURIFragmentSegment(String)

Probably wasn't pretty enough? Not standard enough? Not powerful enough? Best to hide EMFness?

In any case, no one ever asked me for advice or suggestions, so I have no clue how it was necessary to have the full power of XPath available to reference an object when there are far simpler mechanisms available for doing just that.

I definitely don't want to push this problem down into EMF. People have asked for many things, but never this thing.

@laeubi
Contributor

laeubi commented Sep 24, 2024

Stop to ask, why are there so many alternatives to DOM? (Because it's horrible?!)

EMF is a DOM as well, it just doesn't implement the (XML) DOM API ;-)

Probably wasn't pretty enough? Not standard enough? Not powerful enough? Best to hide EMFness?

I have no clue, but I can only assume it is because the e4 xmi is actually an XML document and XPath is the standard for XML. Anyway, XPath itself does not mandate the use of DOM; it supports other (XML) representations as well. That's why I previously mentioned that we probably just need to copy the parser part: in the end we only need to parse an XPath expression and map it to the (EMF) DOM, which is what is actually done today.

Sadly, I have found little to no documentation on this feature, so it's quite hard to guess what must be supported, how exactly it is mapped, or what the reasons for a design decision were. Also, the UI for this is really bare...

@mickaelistria
Contributor

Probably wasn't pretty enough? Not standard enough? Not powerful enough? Best to hide EMFness?

I'm pretty sure it's just because XPath is standard and popular enough to assume most developers will feel comfortable enough with it for this case.
I'm wondering whether the full power of XPath is required here. Looking at all found instances on GitHub https://github.com/search?q=path%253A*.e4xmi+xpath&type=code&ref=advsearch , we can see only a few basic patterns: xpath:/ (root, 95% of occurrences), xpath://mainMenu (select all), xpath:/mainMenu/child[1], xpath://*[@elementId='fragment.contributedMenu1' or @elementId='fragment.contributedMenu2'] (select by attribute). We could consider just sticking to supporting that list.
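
To make the listed patterns concrete, here is a rough sketch that runs them through the JDK's javax.xml.xpath against a tiny hand-written DOM. Everything about the sample tree is an assumption for illustration (the element names and ids are invented and do not reflect how an e4xmi model is actually exposed), and stripping the `xpath:` prefix is simply the obvious guess at how such strings would be preprocessed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathPatternsDemo {

    // Hypothetical sample tree; names and ids are made up for this sketch.
    static final String SAMPLE =
            "<application elementId='app'>"
          + "<children elementId='window.main'>"
          + "<mainMenu elementId='menu.main'>"
          + "<children elementId='fragment.contributedMenu1'/>"
          + "<children elementId='fragment.contributedMenu2'/>"
          + "</mainMenu>"
          + "</children>"
          + "</application>";

    /** Removes the "xpath:" scheme prefix if present (assumed preprocessing step). */
    static String stripPrefix(String expression) {
        return expression.startsWith("xpath:") ? expression.substring("xpath:".length()) : expression;
    }

    /** Counts the nodes the expression selects in the sample tree, starting at the Document. */
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(stripPrefix(expression), doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] patterns = {
            "xpath:/",
            "xpath://mainMenu",
            "xpath:/mainMenu/child[1]",
            "xpath://*[@elementId='fragment.contributedMenu1' or @elementId='fragment.contributedMenu2']",
        };
        for (String pattern : patterns) {
            System.out.println(pattern + " selects " + count(pattern) + " node(s)");
        }
    }
}
```

On this sample, `xpath:/mainMenu/child[1]` selects nothing, because the standard XPath root is the Document and its only child element is `application`. That is the root mismatch discussed earlier in this thread.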

If EMF already has good support for an XPath-like syntax to select nodes, and this syntax is close enough to XPath that most users wouldn't need to change their extensions to get the same node selected, we could consider just dropping XPath and adopting the EMF way.
If there is another close but not directly compatible syntax supported natively in EMF, we could consider converting XPath to this syntax.
@merks What do you think is possible/best here to rely on more native EMF features?

@merks
Contributor

merks commented Sep 24, 2024

The XPath library being used has the benefit that it operates on any DOM-like structure. The built-in XPath support works only on org.w3c.dom. That's simply nasty, because one must serialize the model to a DOM and keep a mapping to work one's way back. I haven't looked at the details of the prototype. It's not clear to me that cloning jxpath and deleting the unused content would not be the easier approach. Either way, there is a whole whack of complex crap that needs to be maintained...
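
For illustration, the "serialize to a DOM and keep a mapping back" route could look roughly like the sketch below. It is not the approach of this PR, which tries to provide a live Element view instead of a copy. `ModelNode` is a made-up stand-in for a model object, not an EMF type, and the back-mapping relies on the DOM Level 3 `setUserData`/`getUserData` calls.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class ModelToDomSketch {

    /** Hypothetical model object: a type name, an id and child objects. */
    static final class ModelNode {
        final String type;
        final String elementId;
        final List<ModelNode> children;

        ModelNode(String type, String elementId, ModelNode... children) {
            this.type = type;
            this.elementId = elementId;
            this.children = Arrays.asList(children);
        }
    }

    // Key under which the originating model object is attached to each DOM element.
    static final String MODEL_KEY = "model";

    /** Copies the model subtree into DOM elements and tags each element with its source object. */
    static Element toDom(Document doc, ModelNode model) {
        Element element = doc.createElement(model.type);
        element.setAttribute("elementId", model.elementId);
        element.setUserData(MODEL_KEY, model, null);
        for (ModelNode child : model.children) {
            element.appendChild(toDom(doc, child));
        }
        return element;
    }

    /** Evaluates the expression on a DOM copy and maps the first hit back to the model object. */
    static ModelNode select(ModelNode root, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(toDom(doc, root));
            Node hit = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
            // The Document node itself carries no model object, so "/" maps to null here.
            return hit == null ? null : (ModelNode) hit.getUserData(MODEL_KEY);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ModelNode menu = new ModelNode("mainMenu", "menu.main");
        ModelNode window = new ModelNode("children", "window.main", menu);
        ModelNode app = new ModelNode("application", "app", window);
        System.out.println(select(app, "//mainMenu").elementId);
    }
}
```

The obvious cost is that the copy has to be rebuilt, or kept in sync, whenever the model changes.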

I think at this point, we are stuck needing to support XPath expressions exactly as they are currently used, so we must parse them and evaluate them somehow. Alternative approaches are water under the bridge that can't be pushed back upstream. (I bring it up merely because I do not want EMF, i.e., me personally, to be burdened with this, but I'm happy to help the Platform wherever I can.)

@ptziegler
Contributor

It's not clear to me that cloning jxpath and deleting the unused content would not be the easier approach. Either way, there is a whole whack of complex crap that needs to be maintained...

I believe both approaches are feasible, but at least in the long term, we should try to remove the reference to JXPath. Given that this will take quite a lot of effort, it also makes sense to simply fork the JXPath project until then.
Perhaps we will also come to the conclusion that the XPath approach doesn't work out as well as we had hoped, in which case we still have the fork as an alternative to fall back on. I'll try to draft a PR for the fork, separately from the PR for using XPath.

@merks
Contributor

merks commented Sep 24, 2024

FYI, in Orbit I build axis1 (horrible but BIRT uses it) from source and publish it to repo.eclipse.org so that we can use BND to create an OSGi build from it as if it were published to Maven central:

https://github.com/eclipse-orbit/orbit-simrel/blob/main/maven-deploy/MavenAxis.jenkinsfile

We could do that with jxpath, or a fork of jxpath, perhaps a fork where only the "CVE" functionality is disabled so that there really isn't much to maintain at all, and it could be rebased on newer versions of jxpath in the future.

Just a thought...

@ptziegler
Contributor

We could do that with jxpath, or a fork of jxpath, perhaps a fork where only the "CVE" functionality is disabled so that there really isn't much to maintain at all, and it could be rebased on newer versions of jxpath in the future.

That's effectively the case with org.apache.commons.jxpath v1.3.0.v200911051830, which was used previously. Because this plugin doesn't import e.g. the javax.servlet packages, all of the "remote execution" CVEs are effectively irrelevant, as the application would already fail with an exception when trying to initialize the servlets.

@laeubi
Contributor

laeubi commented Sep 24, 2024

Because this plugin doesn't import e.g. the javax.servlet packages, all of the "remote execution" CVEs are effectively irrelevant, as the application would already fail with an exception, when trying to initialize the servlets.

As jxpath does not run inside a servlet / EE container, they are effectively irrelevant for where we use it as well ... ;-)

In any case, embedding the code seems more suitable than building something that is similar to, and named the same as, an official artifact.

@HannesWell
Member Author

Today I stumbled upon the jaxen library, which says:

The Jaxen XPath Engine for Java
[...]
It is also possible to write adapters that treat non-XML trees such as compiled Java byte code
or Java beans as XML, thus enabling you to query these trees with XPath too.

It sounds like this could be an alternative to JXPath. It isn't very active either, but its latest release is only two years old.
And the good thing is, it's already in simrel-Orbit and seems to have no extra dependencies https://github.com/eclipse-orbit/orbit-simrel/blob/75cc23701b2417f530efd3ce51763aef09c5a206/maven-osgi/tp/Maven.target#L678-L680

@ptziegler
Contributor

ptziegler commented Oct 8, 2024

Today I stumbled upon the jaxen library, which says:

The Jaxen XPath Engine for Java
[...]
It is also possible to write adapters that treat non-XML trees such as compiled Java byte code
or Java beans as XML, thus enabling you to query these trees with XPath too.

It sounds like this could be an alternative to JXPath. It isn't very active either, but its latest release is only two years old. And the good thing is, it's already in simrel-Orbit and seems to have no extra dependencies https://github.com/eclipse-orbit/orbit-simrel/blob/75cc23701b2417f530efd3ce51763aef09c5a206/maven-osgi/tp/Maven.target#L678-L680

I gave it a quick try, but I don't believe it works as well as it should... For example, you can't "skip" nodes, so expressions like "children/mainMenu" work, but "//mainMenu" doesn't. Getting the current object via "/" also doesn't work...
