[3.6 M3] Support for XPath, XSLT and XQuery v3.0

marianogonzalez edited this page Nov 4, 2014 · 5 revisions

Motivation / Context

The ESB currently has very outdated and inconsistent support for the XML standards XSLT, XPath and XQuery (from now on the “XML Specs”). As a result:

  • We have only partial support for XQuery 1.0 and XSLT 2.0.
  • The library we’re using to support XQuery and XSLT is 6 years old.
  • We support different versions and implementations of XPath depending on the context:
      • When processing an XSLT template, we support XPath 2.0 through Saxon 9.1 (6 years old).
      • When using the xpath() MEL function or the xpath: expression evaluator, we only support XPath 1.0 through jaxen 1.1, which is not only 7 years old but also an abandoned project.
  • The xpath() function and expression evaluator are a box of chocolates: you never know what you’re gonna get. The return type changes depending on how many results the query finds and on whether each result is a simple type or a node.
  • The jxpath-filter and jxpath-extractor-transformer elements, which are supposed to process only POJOs, fall back to an actual XPath 1.0 expression through dom4j if the message payload is an XML document.

The goal is to:

  • Provide state of the art, 100% compliance support for XPath 2.0, XSLT 2.0, and XQuery 1.0
  • Provide basic support for version 3.0 of the XML specs
  • Reuse the existing XSLT and XQuery elements and functions we have (xpath-filter, xslt-transformer, xquery-transformer, etc) so that they can be used regardless of the targeted spec version
  • Deprecate our current XPath support and provide a new, more usable and consistent solution that allows using either XPath 2.0 or 3.0
  • Deprecate all JXPath support in favor of simple MEL expressions.

Use cases

  • As a developer, I want to have 100% compliant support for XSLT and XPath 2.0
  • As a developer, I want to have 100% compliant support for XQuery 1.0
  • As a developer, I want to have basic support for version 3.0 of the XML specs
  • As a developer, I want to be able to reuse the current transformers, processors and filters currently available for XSLT support. I’ll select the spec version I want through the version attribute in the stylesheet.
  • As a developer, I want to be able to reuse the current transformers, processors and filters currently available for XQuery support. I’ll select the spec version I want through the version attribute in the query.
  • As a developer, I want a new MEL function to perform XPath 3.0 queries, with a predictable return type
  • As a developer, I want to keep backwards compatibility in my current uses of XPath, XSLT and XQuery

Before we Begin

A few words on the 3.0 spec

The 3.0 versions of the XML specs are still recommendations not yet approved by the W3C committee. However, they’re on “last call” status, which means that they’re highly unlikely to receive any substantial changes.

About XPath 3.0

XPath 3.0 is backwards compatible with 2.0. However, it’s not fully compatible with version 1.0. Although a compatibility mode exists, it doesn’t cover all cases. This is one of the main reasons why, although we’ll provide a new API for XPath processing, we’ll still support the xpath() function, which currently works with XPath 1.0.

What does basic support mean?

In this document’s prologue we use the phrase “basic support” when referring to the 3.0 specs. By basic support we mean all features which don’t rely on:

  • Schema awareness
  • Higher-order functions
  • Streaming

Dependencies

Currently there are two known implementations of the 3.0 specs: Saxon 9.6 and the latest version of Xerces. We can’t upgrade Xerces for backwards compatibility reasons, so we’re going with the latest Saxon, which is backwards compatible with the version we currently have thanks to its vanilla implementation of the JAXP and XQJ APIs.

We’ll use Saxon 9.6.0.1-HE, which is the open source version of that library.

Other libraries also updated as a side effect are:

  • stax2-api: from version 3.1.1 to 3.1.4
  • woodstox-core-asl: from version 4.1.4 to 4.4.1

Behaviour

XSLT

The same XSLT transformer we have today will remain current and unaltered, supporting all known versions of XSLT.

<mulexml:xslt-transformer xsl-file="with-xml-node-param.xsl"
                          maxIdleTransformers="2"
                          maxActiveTransformers="5"
                          uriResolver-ref="testResolver">
    <mulexml:context-property key="foo" value="#[flowVars['foo']]"/>
</mulexml:xslt-transformer>

The version of XSLT to be used will be taken from the version attribute in the xsl template:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <users>
            <xsl:copy-of select="." />
        </users>
    </xsl:template>
</xsl:stylesheet>

Because of that, this remains backwards compatible. For more information on the XSLT transformer, look at our current docs.

XQuery

Pretty much the same as with XSLT. We maintain the current XML support and API. The version of XQuery to be used can be specified through the version declaration at the top of the query.

<mxml:xquery-transformer name="xquery">
    <mxml:context-property key="title" value="#[flowVars['ListTitle']]"/>
    <mxml:context-property key="rating" value="#[flowVars['ListRating']]"/>

    <mxml:xquery-text>
        <![CDATA[
            xquery version "3.0";
            declare variable $document external;
            declare variable $title external;
            declare variable $rating external;

            <cd-listings title='{$title}' rating='{$rating}'> {
                for $cd in $document/catalog/cd
                return <cd-title>{data($cd/title)}</cd-title>
            } </cd-listings>
        ]]>
    </mxml:xquery-text>
</mxml:xquery-transformer>

If version is not specified then it will default to 3.0, since per the spec, all 1.0 queries are valid in 3.0 and must return the same result.

Context Parameters enhancement

We’ll take the opportunity to also enhance how parameters are passed to the XQuery transformer. Currently, it only supports simple types. Now it will also support instances of DOM Document and Node. This is especially useful in use cases in which you want one XQuery to work on several input documents. For example, consider a simple query which just combines two documents.

    <flow name="multipleInputsByParam">
        <mxml:xquery-transformer>
            <mxml:context-property key="books" value="#[flowVars['books']]" />
            <mxml:context-property key="cities" value="#[flowVars['cities']]" />
            <mxml:xquery-text>
                <![CDATA[
                    xquery version "3.0";
                    declare variable $document external;
                    declare variable $cities external;
                    declare variable $books external;
                    <mixes>
                    {
                        for $b in $books/BOOKLIST/BOOKS/ITEM,
                            $c in $cities/cities/city

                        return <mix title="{$b/TITLE/text()}" city="{$c/@name}" />
                    }
                    </mixes>
                ]]>
            </mxml:xquery-text>
        </mxml:xquery-transformer>
    </flow>

The $cities and $books variables hold documents or nodes that were passed as context properties.

Also, because we now support XQuery 3.0, the same can be achieved by providing the path to the actual XML document and letting the engine load the document itself:

    <flow name="multipleInputsByPath">
        <mxml:xquery-transformer>
            <mxml:context-property key="books" value="#[flowVars['books']]" />
            <mxml:context-property key="cities" value="#[flowVars['cities']]" />
            <mxml:xquery-text>
                <![CDATA[
                    xquery version "3.0";
                    declare variable $document external;
                    declare variable $cities external;
                    declare variable $books external;
                    <mixes>
                    {
                        for $b in fn:doc($books)/BOOKLIST/BOOKS/ITEM,
                            $c in fn:doc($cities)/cities/city

                        return <mix title="{$b/TITLE/text()}" city="{$c/@name}" />
                    }
                    </mixes>
                ]]>
            </mxml:xquery-text>
        </mxml:xquery-transformer>
    </flow>

In this case the flowVars only contain the paths to the XML documents on disk, and the fn:doc function inside the query takes care of the parsing.

MULE-8001

While developing this feature, the bug MULE-8001 was found. This upgrade will include the fix for that bug, which slightly changes behaviour. This is not a compatibility issue but a bug for which we had no test case and which we just ran into. The issue description is as follows:

By default, the XQuery transformer only returns the first result, unless an array is specified in the returnClass attribute, in which case it returns all the matches in an Object[] (even if the return type was set to X[]). This means that by default, the transformer does not return all results. If the user does specify a return class but no results are found, it returns NullPayload. If only one result is found, it returns that single element, even if an array was requested.

Although this is clearly a bug and a usability pain, fixing it could break applications that rely on this behaviour. Thus, the result will be:

  • By default, the XQuery transformer will return a List.
  • That list will contain all the results, even if only one was found.
  • If no results are found, the list will be empty.
  • If the user did specify a return class, we will fall back to the current behaviour (array, one element, or null).
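The result rules for the MULE-8001 fix can be sketched in plain Java. This is only our reading of the description, not the transformer’s actual code: the class and method names are hypothetical, a null returnClass stands for “no return class configured”, a null return value stands in for NullPayload, and returning the first match for a non-array return class with several results is an assumption drawn from the legacy “first result only” behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the result shaping described for MULE-8001.
class XQueryResultShaping {

    // results: every match produced by the query.
    // returnClass: the configured return class, or null when none was set.
    static Object shape(List<?> results, Class<?> returnClass) {
        if (returnClass == null) {
            // New default: always a List holding all matches (possibly empty).
            return new ArrayList<Object>(results);
        }
        // A return class was set: fall back to the legacy behaviour.
        if (results.isEmpty()) {
            return null; // stands in for NullPayload
        }
        if (results.size() == 1) {
            return results.get(0); // single element, even if an array was requested
        }
        // Several matches: Object[] for array return classes, first match otherwise (assumed).
        return returnClass.isArray() ? results.toArray() : results.get(0);
    }
}
```

With no return class, callers can always iterate the result without first checking whether it is a single value, an array or null.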

XPath

XPath is probably the part of this spec which adds the most value, but it is also the most complex one. Unfortunately, we can’t provide an upgrade experience as straightforward as with its companions. As previously stated, the reasons are that we have an inconsistent and unusable mixture of XPath 1.0 and 2.0, and that XPath 3.0 is not backwards compatible with 1.0.

So the plan here is that the following will be deprecated:

  • xpath: expression evaluator
  • xpath2: expression evaluator
  • bean: expression evaluator
  • jxpath filter
  • jxpath extractor transformer
  • jaxen-filter

Things to take note of:

  • Because XPath 3.0 is completely backwards compatible with 2.0, the new function described below will also serve those wanting to use 2.0 expressions.
  • This doesn’t guarantee support for XPath 1.0 expressions. The simpler ones will work, but the incompatible ones will not. Since XPath 1.0 dates all the way back to 1999, we consider it deprecated and don’t officially support it. Compatibility mode will be disabled.
  • Because we want this function to have predictable return types, we need to create a new xpath3() function. We considered adding a compatibility flag to the current function, but our analysis indicated that the impact was too great for that to make sense. Therefore, a new xpath3() function is created and the existing xpath() one is deprecated.

The new xpath3() function will behave as follows:

Arguments

The function will now accept the following arguments (in order):

expression (required String)

The XPath expression to be evaluated. Cannot be null or blank.

input (optional Object, defaults to the message payload)

The input data on which the expression is going to be evaluated. If not provided, it defaults to the message payload.

Input types

This function supports the following input types:

  • org.w3c.dom.Document
  • org.w3c.dom.Node
  • org.xml.sax.InputSource
  • OutputHandler
  • byte[]
  • InputStream
  • String
  • XMLStreamReader
  • DelayedResult

If the input is not of any of these types, then we’ll attempt to use a registered transformer to transform the input into a DOM Document or Node. If no such transformer can be found, then an IllegalArgumentException is thrown.
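To make the input handling more concrete, here is a minimal sketch using only JDK classes. It is not Mule’s implementation: the class and method names are hypothetical, a String is assumed to hold XML content rather than a path, the Mule and StAX types (OutputHandler, DelayedResult, XMLStreamReader) are left out, and the registered-transformer lookup is skipped, so unsupported types fail immediately.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: normalizes some of the supported input types into a DOM Node.
class XPathInputs {

    static Node toNode(Object input) {
        if (input instanceof Node) {
            return (Node) input; // org.w3c.dom.Document is itself a Node
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            if (input instanceof InputSource) {
                return builder.parse((InputSource) input);
            }
            if (input instanceof InputStream) {
                return builder.parse((InputStream) input); // consumes the stream
            }
            if (input instanceof byte[]) {
                return builder.parse(new ByteArrayInputStream((byte[]) input));
            }
            if (input instanceof String) {
                // Assumption: the String holds the XML content itself.
                return builder.parse(new InputSource(new StringReader((String) input)));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
        // The real function would try a registered transformer before giving up.
        throw new IllegalArgumentException("Unsupported input type: "
                + (input == null ? "null" : input.getClass().getName()));
    }
}
```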

Consumable inputs

This function will check whether the input is a consumable type (streams, readers, etc.). Evaluating the expression over a consumable input exhausts that source. Therefore, when the input value is the actual message payload (whether given explicitly or by default), we will update the output message payload with the result obtained from consuming the input.

Output type (optional String, defaults to ‘STRING’)

When executing an XPath expression, a developer might have very different intents. Sometimes you want to retrieve actual data, sometimes you just want to verify if a node exists.

Also, the JAXP API (JSR-206) defines the standard way for a Java application to handle XML and, therefore, how to execute XPath expressions. This API accounts for the different intents a developer might have and allows choosing from a list of possible output types. We consider this to be a really useful feature of JAXP, and we believe that many Java developers who are familiar with this API would appreciate that Mule accounts for it while hiding the rest of the API’s complexity.

That is why there’s a third parameter (optional, String), which will allow specifying one of the following:

  • BOOLEAN: returns the effective boolean value of the expression, as a java.lang.Boolean. This is the same as wrapping the expression in a call of the XPath boolean() function.
  • STRING: returns the result of the expression converted to a string, as a java.lang.String. This is the same as wrapping the expression in a call of the XPath string() function.
  • NUMBER: returns the result of the expression converted to a double as a java.lang.Double. This is the same as wrapping the expression in a call of the XPath number() function.
  • NODE: returns the result as a node object.
  • NODESET: returns a DOM NodeList object. Components like the foreach, splitter, etc, will also be updated to support iterating that type.
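For readers less familiar with JAXP, the sketch below shows how the five output types map onto the javax.xml.xpath.XPathConstants values. It uses the XPath engine bundled with the JDK (XPath 1.0), not Saxon or the proposed xpath3() function, and the class name and sample catalog are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: evaluates expressions against a fixed sample document.
class XPathOutputTypes {

    // Sample payload invented for this example.
    static final String CATALOG =
        "<catalog><cd><title>Blue</title></cd><cd><title>Kind of Blue</title></cd></catalog>";

    // Parses CATALOG and evaluates the expression with the requested JAXP return type.
    static Object evaluate(String expression, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(CATALOG)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    static boolean asBoolean(String expression) {   // BOOLEAN
        return (Boolean) evaluate(expression, XPathConstants.BOOLEAN);
    }

    static String asString(String expression) {     // STRING
        return (String) evaluate(expression, XPathConstants.STRING);
    }

    static double asNumber(String expression) {     // NUMBER
        return (Double) evaluate(expression, XPathConstants.NUMBER);
    }

    static Node asNode(String expression) {         // NODE
        return (Node) evaluate(expression, XPathConstants.NODE);
    }

    static NodeList asNodeSet(String expression) {  // NODESET
        return (NodeList) evaluate(expression, XPathConstants.NODESET);
    }
}
```

For example, asBoolean("/catalog/cd") only tells you whether any cd element exists, while asNodeSet("/catalog/cd/title") gives you every title node to iterate over.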

Parameters

Another XPath feature that will now be supported is the ability to pass parameters into the query. For example, consider the following query, which returns all the LINE elements that contain a given word:

//LINE[contains(., $word)]

The $ sign is used to mark the parameter. As for the binding, the function will automatically resolve that variable against the current message’s flow variables. So, if you want to return all the occurrences of the word ‘handkerchief’, all you have to do is:

<set-variable variableName="word" value="handkerchief" />
<expression-transformer>
xpath3('//LINE[contains(., $word)]', payload, 'NODESET')
</expression-transformer>
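In JAXP terms, this kind of binding corresponds to an XPathVariableResolver. The sketch below is an illustration with the JDK’s built-in engine, not the xpath3() implementation: a plain Map stands in for the flow variables, and the class name, method name and sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: resolves $variables in an XPath expression from a Map.
class XPathParameters {

    // Returns how many nodes the expression selects; flowVars stands in for Mule flow variables.
    static int countMatches(String xml, String expression, Map<String, Object> flowVars) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Each $name in the expression is looked up by its local name.
            xpath.setXPathVariableResolver(name -> flowVars.get(name.getLocalPart()));
            NodeList matches = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return matches.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }
}
```

Because the word is passed as a variable rather than concatenated into the expression, values containing quotes do not need escaping.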

Examples

  • xpath3('catalog/cd/title'): executes the given expression over the message payload, returning the result as a String (the default output type)
  • xpath3('catalog/cd/title', flowVars['catalog']): executes the given expression taking a flow variable as input, returning the result as a String
  • xpath3('catalog/cd/title', payload, 'STRING'): executes the given expression over the message payload, with the output type given explicitly, returning the result as a String

NamespaceManager

Unlike its deprecated predecessor, the xpath3 function will be namespace-manager aware, which means that all namespaces configured through a namespace-manager component will be available in the xpath3 function through a JAXP NamespaceContext.

Because we aim for consistency, this also affects the xquery-filter element, which means that some applications might have issues if they were using expressions with custom namespaces without specifying the namespace manager correctly. That can be fixed by either declaring the manager or using wildcard expressions (e.g.: use *:/title instead of book:/title).
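As background on what a JAXP NamespaceContext does, here is a small sketch with the JDK’s built-in XPath engine. It does not show Mule’s namespace-manager wiring: the prefix-to-URI Map merely plays the role of the configured namespaces, and the class name and sample URI are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustration only: binds XPath prefixes through a Map-backed NamespaceContext.
class NamespaceAwareXPath {

    static NamespaceContext contextFor(final Map<String, String> prefixToUri) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                String uri = prefixToUri.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceUri) {
                Iterator<String> prefixes = getPrefixes(namespaceUri);
                return prefixes.hasNext() ? prefixes.next() : null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceUri) {
                List<String> prefixes = new ArrayList<>();
                for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
                    if (entry.getValue().equals(namespaceUri)) {
                        prefixes.add(entry.getKey());
                    }
                }
                return prefixes.iterator();
            }
        };
    }

    // Evaluates the expression as a String, resolving prefixes through the given map.
    static String evaluate(String xml, String expression, Map<String, String> prefixToUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace matching
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(contextFor(prefixToUri));
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }
}
```

The prefix used in the expression only has to match the context, not the prefix used in the document, since matching is done on the namespace URI.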

Risks

This requires upgrading libraries, which might cause issues in some applications. This seems unlikely but possible. Affected users will have to use a loader.override configuration or upgrade their apps.

Impact

Studio Impact

Studio would have to deprecate the current elements and add auto completion support for the new MEL function.

DevKit Impact

No impact

MMC Impact

No impact

CH Impact

No impact

API Manager impact

No impact.

Migration Impact

  • Current XPath and JXPath support gets deprecated
  • Dependency changes

Documentation Impact

  • Update docs reflecting these new features and any examples if needed
  • Update docs deprecating the mentioned components
  • Update training documents