The XPath data format parser parses different formats into metric fields using XPath expressions.
For the supported XPath functions, check the underlying XPath library.
NOTE: The type of a field is specified using XPath functions. The only exception is integer fields, which need to be specified in a fields_int section.
name | data_format setting | comment |
---|---|---|
Extensible Markup Language (XML) | "xml" | |
JSON | "xpath_json" | |
MessagePack | "xpath_msgpack" | |
Protocol buffers | "xpath_protobuf" | see additional parameters |
To use the protocol buffer format, you need to specify a protocol buffer definition file (.proto) in xpath_protobuf_file. Furthermore, you need to specify which message type to use via xpath_protobuf_type.
In this configuration mode, you explicitly specify the fields and tags you want to scrape out of your data.
[[inputs.file]]
files = ["example.xml"]
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "xml"
## PROTOCOL BUFFER definitions
## Protocol buffer definition file
# xpath_protobuf_file = "sparkplug_b.proto"
## Name of the protocol buffer message type to use in a fully qualified form.
# xpath_protobuf_type = "org.eclipse.tahu.protobuf.Payload"
## Print the internal XML document when in debug logging mode.
## This is especially useful when using the parser with non-XML formats like protocol buffers
## to get an idea on the expression necessary to derive fields etc.
# xpath_print_document = false
## Multiple parsing sections are allowed
[[inputs.file.xpath]]
## Optional: XPath-query to select a subset of nodes from the XML document.
# metric_selection = "/Bus/child::Sensor"
## Optional: XPath-query to set the metric (measurement) name.
# metric_name = "string('example')"
## Optional: Query to extract metric timestamp.
## If not specified the time of execution is used.
# timestamp = "/Gateway/Timestamp"
## Optional: Format of the timestamp determined by the query above.
## This can be any of "unix", "unix_ms", "unix_us", "unix_ns" or a valid Golang
## time format. If not specified, a "unix" timestamp (in seconds) is expected.
# timestamp_format = "2006-01-02T15:04:05Z"
## Tag definitions using the given XPath queries.
[inputs.file.xpath.tags]
name = "substring-after(Sensor/@name, ' ')"
device = "string('the ultimate sensor')"
## Integer field definitions using XPath queries.
[inputs.file.xpath.fields_int]
consumers = "Variable/@consumers"
## Non-integer field definitions using XPath queries.
## The field type is defined using XPath expressions such as number(), boolean() or string(). If no conversion is performed the field will be of type string.
[inputs.file.xpath.fields]
temperature = "number(Variable/@temperature)"
power = "number(Variable/@power)"
frequency = "number(Variable/@frequency)"
ok = "Mode != 'error'"
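To make the typing rules of the three sections concrete, here is a rough Python analogue built only on the standard library's xml.etree.ElementTree, which supports just a small XPath subset. It is not Telegraf's implementation: the XPath expressions are hand-translated, the sketch evaluates them against a single Sensor node (so @name stands in for Sensor/@name), and the helper names attr and evaluate_explicit are hypothetical.

```python
import xml.etree.ElementTree as ET

SENSOR = """
<Sensor name="Sensor Facility A">
  <Variable temperature="20.0"/>
  <Variable power="123.4"/>
  <Variable consumers="3"/>
  <Mode>busy</Mode>
</Sensor>
"""

def attr(node, name):
    """Emulate 'Variable/@name': the attribute of the first Variable that has it."""
    var = node.find(f"Variable[@{name}]")
    return var.get(name) if var is not None else None

def evaluate_explicit(node):
    """Hypothetical helper mirroring the tags, fields_int and fields sections."""
    tags = {
        # substring-after(@name, ' '): text after the first space ("" if none)
        "name": node.get("name").partition(" ")[2],
        # string('the ultimate sensor'): a constant string tag
        "device": "the ultimate sensor",
    }
    fields = {
        # fields_int section: the query result is converted to an integer
        "consumers": int(attr(node, "consumers")),
        # fields section with number(...): floating-point values
        "temperature": float(attr(node, "temperature")),
        "power": float(attr(node, "power")),
    }
    return tags, fields

tags, fields = evaluate_explicit(ET.fromstring(SENSOR))
print(tags, fields)
```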
A configuration can contain multiple xpath subsections, e.g. to let the file plugin process the XML string multiple times. Consult the XPath syntax and the underlying library's functions for details and help regarding XPath queries. Consider using an XPath tester such as xpather.com or Code Beautify's XPath Tester for help developing and debugging your query.
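Conceptually, each [[inputs.file.xpath]] subsection is an independent pass over the same parsed document, and the metrics of all passes are collected together. The Python sketch below models that with two hypothetical section functions applied to one ElementTree document. The wrapper <Doc> root is an assumption of the sketch, needed because ElementTree rejects input with more than one top-level element.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<Doc>
  <Gateway><Name>Main Gateway</Name><Sequence>12</Sequence></Gateway>
  <Bus><Sensor name="Sensor Facility A"/><Sensor name="Sensor Facility B"/></Bus>
</Doc>
""")

def gateway_section(doc):
    # Hypothetical first subsection: one metric for the whole document.
    return [{"seqnr": int(doc.findtext("Gateway/Sequence"))}]

def sensor_section(doc):
    # Hypothetical second subsection: one metric per selected Sensor node.
    return [{"name": s.get("name")} for s in doc.findall("Bus/Sensor")]

# Every subsection sees the same document; the resulting metrics are concatenated.
metrics = [m for section in (gateway_section, sensor_section) for m in section(DOC)]
print(metrics)
```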
As an alternative to the configuration above, fields can also be specified in batch. Instead of listing each field in a section, you define a selector for the field nodes, plus optional name and value queries used to determine the name and value of each field in the metric.
[[inputs.file]]
files = ["example.xml"]
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "xml"
## Name of the protocol buffer type to use.
## This is only relevant when parsing protocol buffers and must contain the fully qualified
## name of the type e.g. "org.eclipse.tahu.protobuf.Payload".
# xpath_protobuf_type = ""
## Print the internal XML document when in debug logging mode.
## This is especially useful when using the parser with non-XML formats like protocol buffers
## to get an idea on the expression necessary to derive fields etc.
# xpath_print_document = false
## Multiple parsing sections are allowed
[[inputs.file.xpath]]
## Optional: XPath-query to select a subset of nodes from the XML document.
metric_selection = "/Bus/child::Sensor"
## Optional: XPath-query to set the metric (measurement) name.
# metric_name = "string('example')"
## Optional: Query to extract metric timestamp.
## If not specified the time of execution is used.
# timestamp = "/Gateway/Timestamp"
## Optional: Format of the timestamp determined by the query above.
## This can be any of "unix", "unix_ms", "unix_us", "unix_ns" or a valid Golang
## time format. If not specified, a "unix" timestamp (in seconds) is expected.
# timestamp_format = "2006-01-02T15:04:05Z"
## Field specifications using a selector.
field_selection = "child::*"
## Optional: Queries to specify field name and value.
## These options are only to be used in combination with 'field_selection'!
## By default the node name and node content is used if a field-selection
## is specified.
# field_name = "name()"
# field_value = "."
## Optional: Expand field names relative to the selected node
## This allows flattening out nodes with non-unique names in the subtree
# field_name_expansion = false
## Tag definitions using the given XPath queries.
[inputs.file.xpath.tags]
name = "substring-after(Sensor/@name, ' ')"
device = "string('the ultimate sensor')"
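Please note: With the default field_value query, the resulting fields are always of type string! Use a conversion function such as number() in field_value to obtain other types.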
It is also possible to specify a mixture of the two alternative ways of specifying fields.
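With field_selection = "child::*" and the default field_name and field_value queries, each selected child contributes one field named after the element and holding its text content as a string. A rough ElementTree analogue of that default behaviour, not Telegraf's code:

```python
import xml.etree.ElementTree as ET

GATEWAY = ET.fromstring("""
<Gateway>
  <Name>Main Gateway</Name>
  <Sequence>12</Sequence>
  <Status>ok</Status>
</Gateway>
""")

# field_selection = "child::*" selects every child element; the defaults
# field_name = "name()" and field_value = "." give its tag and its text.
fields = {child.tag: child.text for child in GATEWAY.findall("*")}
print(fields)
```

Note that Sequence ends up as the string "12", not as an integer.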
You can specify an XPath query in metric_selection to select a subset of nodes from the XML document; each selected node is used to generate a new metric with the specified fields, tags etc.
Relative paths in subsequent queries are evaluated relative to the nodes selected by metric_selection. To specify absolute paths, start the query with a slash (/).
Specifying metric_selection is optional. If it is not specified, all relative queries are relative to the root node of the XML document.
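The difference between relative and absolute queries can be sketched with ElementTree as below. ElementTree refuses paths that start with "/" on an element, so the sketch emulates an absolute query by querying from the document root instead; the wrapper <Doc> root is likewise an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<Doc>
  <Gateway><Timestamp>2020-08-01T15:04:03Z</Timestamp></Gateway>
  <Bus>
    <Sensor name="Sensor Facility A"><Mode>busy</Mode></Sensor>
    <Sensor name="Sensor Facility B"><Mode>standby</Mode></Sensor>
  </Bus>
</Doc>
""")

rows = []
# Stand-in for metric_selection = "/Bus/child::Sensor".
for sensor in DOC.findall("Bus/Sensor"):
    rows.append((
        sensor.get("name"),                 # relative (@name): resolved from the Sensor
        sensor.findtext("Mode"),            # relative (Mode): resolved from the Sensor
        DOC.findtext("Gateway/Timestamp"),  # absolute (/Gateway/Timestamp): from the root
    ))
print(rows)
```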
By specifying metric_name you can override the metric/measurement name with the result of the given XPath query. If not specified, the default metric name is used.
By default the current time is used for all created metrics. To set the time from values in the XML document, specify an XPath query in timestamp and set the format in timestamp_format.
The timestamp_format can be set to unix, unix_ms, unix_us, unix_ns, or an accepted Go "reference time". Consult the Go time package for details and additional examples on how to set the time format.
If timestamp_format is omitted, the result of the timestamp query is assumed to be in unix format.
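The following Python sketch shows what the timestamp formats mean numerically, converting a query result to nanoseconds since the epoch. It is a simplified model, not Telegraf's parser: it assumes integer epoch values for the unix variants and supports only the single Go layout "2006-01-02T15:04:05Z", hand-translated to a strptime pattern (the trailing Z is a literal and the value is read as UTC). The function name to_unix_ns is hypothetical.

```python
from datetime import datetime, timezone

# Nanoseconds per unit for the epoch-based formats.
UNIX_SCALE = {"unix": 10**9, "unix_ms": 10**6, "unix_us": 10**3, "unix_ns": 1}

def to_unix_ns(value, timestamp_format="unix"):
    """Convert a timestamp query result to nanoseconds since the epoch."""
    if timestamp_format in UNIX_SCALE:
        # Assumption: integer epoch values only; fractions are not handled here.
        return int(value) * UNIX_SCALE[timestamp_format]
    if timestamp_format == "2006-01-02T15:04:05Z":
        # Go reference layout translated by hand to the equivalent strptime pattern.
        dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        return int(dt.timestamp()) * 10**9
    raise ValueError(f"layout not supported by this sketch: {timestamp_format}")

print(to_unix_ns("2020-08-01T15:04:03Z", "2006-01-02T15:04:05Z"))
print(to_unix_ns("1596294243"))  # format omitted: seconds are assumed
print(to_unix_ns("1596294243000", "unix_ms"))
```

All three calls describe the same instant, 1596294243000000000 ns, which is also the timestamp shown in the example outputs for /Gateway/Timestamp.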
The tags section contains XPath queries in the name = query format to add tags to the metrics. The specified path can be absolute (starting with /) or relative. Relative paths use the currently selected node as reference.
NOTE: Results of tag queries are always converted to strings.
The fields_int section contains XPath queries in the name = query format to add integer-typed fields to the metrics. The specified path can be absolute (starting with /) or relative. Relative paths use the currently selected node as reference.
NOTE: Results of fields_int queries are always converted to int64. The conversion fails if the query result is not convertible!
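A minimal sketch of such an int64 conversion, assuming the query result arrives as text that must be a plain base-10 integer within the int64 range; this is a model of the behaviour described above, not Telegraf's code, and the name to_int64 is hypothetical.

```python
import re

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def to_int64(text):
    """Convert base-10 integer text to an int64-range value, or raise."""
    # Strict check: an optional sign followed by ASCII digits only.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError(f"not an integer: {text!r}")
    value = int(text)
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError(f"{value} does not fit into int64")
    return value

print(to_int64("3"))  # the consumers attribute of Sensor Facility A
```

Text such as "3.5" or "busy" raises an error instead of yielding a value, which is the failure case the note refers to.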
The fields section contains XPath queries in the name = query format to add non-integer fields to the metrics. The specified path can be absolute (starting with /) or relative. Relative paths use the currently selected node as reference.
The type of the field is specified in the XPath query using XPath's type conversion functions such as number(), boolean() or string(). If no conversion is performed in the query, the field will be of type string.
NOTE: XPath conversion functions always succeed, even if you convert text to a float! As defined by XPath 1.0, number() returns NaN for a string that is not a valid number.
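Why conversions never fail can be seen from the XPath 1.0 rule for number() on strings: text matching the Number grammar (optional whitespace, optional minus sign, digits with an optional fraction) is converted, and any other string becomes NaN. The sketch below emulates that rule in Python; xpath_number is a hypothetical name, and the underlying XPath library may differ in edge cases.

```python
import math
import re

# XPath 1.0 Number form: digits with optional fraction, or a bare fraction,
# with an optional leading minus sign and surrounding whitespace.
_NUMBER = re.compile(r"\s*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)\s*")

def xpath_number(text):
    """Emulate XPath 1.0 number() on a string: never raises, returns NaN instead."""
    return float(text) if _NUMBER.fullmatch(text) else math.nan

print(xpath_number("20.0"), xpath_number("busy"))
```

Note that XPath 1.0 numbers have no exponent notation, so "1e3" also becomes NaN.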
You can specify an XPath query in field_selection to select a set of nodes forming the fields of the metric. The specified path can be absolute (starting with /) or relative to the currently selected node. Each node selected by field_selection forms a new field within the metric.
The name and the value of each field can be specified using the optional field_name and field_value queries. These queries are relative to the selected field node unless they start with /. If not specified, the field's name defaults to the node name and the field's value defaults to the content of the selected field node.
NOTE: field_name and field_value queries are only evaluated if a field_selection is specified.
Specifying field_selection is optional. It is an alternative way to specify fields, especially for documents where the node names are not known a priori or where a large number of fields needs to be specified. These options can also be combined with the field specifications above.
NOTE: XPath conversion functions always succeed, even if you convert text to a float!
When field_name_expansion is true, field names selected with field_selection are expanded to a path relative to the selected node. This is necessary if, e.g., all leaf nodes are selected as fields and those leaf nodes do not have unique names. In other words, if the fields you select contain duplicate names, you should set this option to true.
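The effect of name expansion can be illustrated with a small Python sketch that collects leaf elements as fields, similar to selecting leaves with a query such as descendant::*[not(*)]. The document, the helper leaf_fields and the "_" separator used to join path segments are all assumptions of this sketch; check Telegraf's actual output for the exact naming scheme.

```python
import xml.etree.ElementTree as ET

SENSOR = ET.fromstring("""
<Sensor>
  <Inlet><Temperature>20.0</Temperature></Inlet>
  <Outlet><Temperature>23.1</Temperature></Outlet>
</Sensor>
""")

def leaf_fields(node, expand, prefix=""):
    """Collect leaf elements as fields, optionally prefixing their relative path."""
    fields = {}
    for child in node:
        name = prefix + child.tag if expand else child.tag
        if len(child):  # element has children: descend further
            fields.update(leaf_fields(child, expand, name + "_"))
        else:
            # Without expansion, equal names collide; here the later one wins.
            fields[name] = child.text
    return fields

print(leaf_fields(SENSOR, expand=False))
print(leaf_fields(SENSOR, expand=True))
```

Without expansion, both leaves are called Temperature and one value is lost; with expansion, each field name carries the path that made it unique.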
This example.xml file is used in the configuration examples below:
<?xml version="1.0"?>
<Gateway>
<Name>Main Gateway</Name>
<Timestamp>2020-08-01T15:04:03Z</Timestamp>
<Sequence>12</Sequence>
<Status>ok</Status>
</Gateway>
<Bus>
<Sensor name="Sensor Facility A">
<Variable temperature="20.0"/>
<Variable power="123.4"/>
<Variable frequency="49.78"/>
<Variable consumers="3"/>
<Mode>busy</Mode>
</Sensor>
<Sensor name="Sensor Facility B">
<Variable temperature="23.1"/>
<Variable power="14.3"/>
<Variable frequency="49.78"/>
<Variable consumers="1"/>
<Mode>standby</Mode>
</Sensor>
<Sensor name="Sensor Facility C">
<Variable temperature="19.7"/>
<Variable power="0.02"/>
<Variable frequency="49.78"/>
<Variable consumers="0"/>
<Mode>error</Mode>
</Sensor>
</Bus>
This example shows the basic usage of the xml parser.
Config:
[[inputs.file]]
files = ["example.xml"]
data_format = "xml"
[[inputs.file.xpath]]
[inputs.file.xpath.tags]
gateway = "substring-before(/Gateway/Name, ' ')"
[inputs.file.xpath.fields_int]
seqnr = "/Gateway/Sequence"
[inputs.file.xpath.fields]
ok = "/Gateway/Status = 'ok'"
Output:
file,gateway=Main,host=Hugin seqnr=12i,ok=true 1598610830000000000
In the tags definition, the XPath function substring-before() is used to extract only the substring before the space. To get the integer value of /Gateway/Sequence, we have to use the fields_int section, as there is no XPath expression to convert node values to integers (only to float).
The ok field is filled with a boolean by specifying a query that compares the result of /Gateway/Status with the string ok. Use the type conversions available in the XPath syntax to specify field types.
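For comparison, the three queries of this basic example can be hand-translated to Python with ElementTree. This is only an illustration of what each query computes, not how Telegraf evaluates it; substring_before is a hypothetical helper following the XPath definition, which returns an empty string when the separator does not occur.

```python
import xml.etree.ElementTree as ET

GATEWAY = ET.fromstring("""
<Gateway>
  <Name>Main Gateway</Name>
  <Timestamp>2020-08-01T15:04:03Z</Timestamp>
  <Sequence>12</Sequence>
  <Status>ok</Status>
</Gateway>
""")

def substring_before(s, sep):
    """XPath substring-before(): text before the first sep, "" if sep is absent."""
    return s.split(sep, 1)[0] if sep in s else ""

# Gateway is the root element here, so /Gateway/Name becomes "Name".
tags = {"gateway": substring_before(GATEWAY.findtext("Name"), " ")}
fields = {
    "seqnr": int(GATEWAY.findtext("Sequence")),   # fields_int: integer
    "ok": GATEWAY.findtext("Status") == "ok",     # comparison: boolean
}
print(tags, fields)
```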
This example takes the time and the name of the metric from the XML document itself.
Config:
[[inputs.file]]
files = ["example.xml"]
data_format = "xml"
[[inputs.file.xpath]]
metric_name = "name(/Gateway/Status)"
timestamp = "/Gateway/Timestamp"
timestamp_format = "2006-01-02T15:04:05Z"
[inputs.file.xpath.tags]
gateway = "substring-before(/Gateway/Name, ' ')"
[inputs.file.xpath.fields]
ok = "/Gateway/Status = 'ok'"
Output:
Status,gateway=Main,host=Hugin ok=true 1596294243000000000
In addition to the basic parsing example, the metric name is defined as the name of the /Gateway/Status node, and the timestamp is derived from the XML document instead of using the execution time.
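The key point of name(/Gateway/Status) is that it yields the element's name, not its content. A short Python illustration of both derived values, with the Go layout "2006-01-02T15:04:05Z" hand-translated to a strptime pattern (an assumption that holds for this particular layout only):

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

GATEWAY = ET.fromstring(
    "<Gateway><Timestamp>2020-08-01T15:04:03Z</Timestamp><Status>ok</Status></Gateway>"
)

# name(/Gateway/Status): the tag name "Status", not the text "ok"
metric_name = GATEWAY.find("Status").tag
# The trailing Z of the layout is a literal; the value is interpreted as UTC.
ts = datetime.strptime(GATEWAY.findtext("Timestamp"), "%Y-%m-%dT%H:%M:%SZ")
timestamp_ns = int(ts.replace(tzinfo=timezone.utc).timestamp()) * 10**9
print(metric_name, timestamp_ns)
```

The two values match the measurement name and timestamp in the output line above.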
For XML documents containing metrics for, e.g., multiple devices (like the Sensor nodes in example.xml), multiple metrics can be generated using node selection. This example shows how to generate a metric for each Sensor in the example.
Config:
[[inputs.file]]
files = ["example.xml"]
data_format = "xml"
[[inputs.file.xpath]]
metric_selection = "/Bus/child::Sensor"
metric_name = "string('sensors')"
timestamp = "/Gateway/Timestamp"
timestamp_format = "2006-01-02T15:04:05Z"
[inputs.file.xpath.tags]
name = "substring-after(@name, ' ')"
[inputs.file.xpath.fields_int]
consumers = "Variable/@consumers"
[inputs.file.xpath.fields]
temperature = "number(Variable/@temperature)"
power = "number(Variable/@power)"
frequency = "number(Variable/@frequency)"
ok = "Mode != 'error'"
Output:
sensors,host=Hugin,name=Facility\ A consumers=3i,frequency=49.78,ok=true,power=123.4,temperature=20 1596294243000000000
sensors,host=Hugin,name=Facility\ B consumers=1i,frequency=49.78,ok=true,power=14.3,temperature=23.1 1596294243000000000
sensors,host=Hugin,name=Facility\ C consumers=0i,frequency=49.78,ok=false,power=0.02,temperature=19.7 1596294243000000000
Using the metric_selection option, we select all Sensor nodes in the XML document. Please note that all field and tag definitions are relative to these selected nodes. An exception is the timestamp definition, which is relative to the root node of the XML document.
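To connect the configuration with the output lines, the Python sketch below rebuilds them for two of the sensors. It is an approximation under stated assumptions: the host tag that Telegraf adds is omitted, only spaces in tag values are escaped, and the float formatting (dropping a trailing ".0") merely imitates the output shown. The helper names attr and format_value are hypothetical.

```python
import xml.etree.ElementTree as ET

BUS = ET.fromstring("""
<Bus>
  <Sensor name="Sensor Facility A">
    <Variable temperature="20.0"/><Variable power="123.4"/>
    <Variable frequency="49.78"/><Variable consumers="3"/><Mode>busy</Mode>
  </Sensor>
  <Sensor name="Sensor Facility C">
    <Variable temperature="19.7"/><Variable power="0.02"/>
    <Variable frequency="49.78"/><Variable consumers="0"/><Mode>error</Mode>
  </Sensor>
</Bus>
""")
TIMESTAMP_NS = 1596294243000000000  # /Gateway/Timestamp expressed in nanoseconds

def attr(node, name):
    """Emulate 'Variable/@name' relative to the selected Sensor."""
    return node.find(f"Variable[@{name}]").get(name)

def format_value(value):
    """Roughly imitate line protocol: 'i' suffix for ints, lowercase booleans."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    text = repr(value)
    return text[:-2] if text.endswith(".0") else text

lines = []
for sensor in BUS.findall("Sensor"):  # metric_selection = "/Bus/child::Sensor"
    # substring-after(@name, ' '), with spaces escaped for the tag value
    name = sensor.get("name").partition(" ")[2].replace(" ", "\\ ")
    fields = {
        "consumers": int(attr(sensor, "consumers")),
        "frequency": float(attr(sensor, "frequency")),
        "ok": sensor.findtext("Mode") != "error",
        "power": float(attr(sensor, "power")),
        "temperature": float(attr(sensor, "temperature")),
    }
    field_str = ",".join(f"{k}={format_value(v)}" for k, v in sorted(fields.items()))
    lines.append(f"sensors,name={name} {field_str} {TIMESTAMP_NS}")
print("\n".join(lines))
```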
For XML documents containing metrics with a large number of fields, or where the fields are not known in advance (e.g. an unknown set of Variable nodes in example.xml), field selectors can be used. This example shows how to generate a metric for each Sensor in the example with fields derived from the Variable nodes.
Config:
[[inputs.file]]
files = ["example.xml"]
data_format = "xml"
[[inputs.file.xpath]]
metric_selection = "/Bus/child::Sensor"
metric_name = "string('sensors')"
timestamp = "/Gateway/Timestamp"
timestamp_format = "2006-01-02T15:04:05Z"
field_selection = "child::Variable"
field_name = "name(@*[1])"
field_value = "number(@*[1])"
[inputs.file.xpath.tags]
name = "substring-after(@name, ' ')"
Output:
sensors,host=Hugin,name=Facility\ A consumers=3,frequency=49.78,power=123.4,temperature=20 1596294243000000000
sensors,host=Hugin,name=Facility\ B consumers=1,frequency=49.78,power=14.3,temperature=23.1 1596294243000000000
sensors,host=Hugin,name=Facility\ C consumers=0,frequency=49.78,power=0.02,temperature=19.7 1596294243000000000
Using the metric_selection option, we select all Sensor nodes in the XML document. For each Sensor, we then use field_selection to select all Variable child nodes of the sensor as field nodes. Please note that the field selection is relative to the selected nodes.
For each selected field node, we use field_name and field_value to determine the field's name and value, respectively. The field_name query derives the name of the first attribute of the node, while field_value derives the value of the first attribute and converts the result to a number.
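A rough ElementTree analogue of these field queries for a single Sensor is sketched below. Because each Variable in example.xml carries exactly one attribute, taking the only entry of var.attrib stands in for @*[1]; that shortcut is an assumption that would not generalize to nodes with several attributes.

```python
import xml.etree.ElementTree as ET

SENSOR = ET.fromstring("""
<Sensor name="Sensor Facility B">
  <Variable temperature="23.1"/>
  <Variable power="14.3"/>
  <Variable frequency="49.78"/>
  <Variable consumers="1"/>
  <Mode>standby</Mode>
</Sensor>
""")

fields = {}
for var in SENSOR.findall("Variable"):          # field_selection = "child::Variable"
    attr_name, attr_value = next(iter(var.attrib.items()))  # stands in for @*[1]
    fields[attr_name] = float(attr_value)       # name(@*[1]) and number(@*[1])
print(fields)
```

Since number() always produces a float, consumers becomes 1.0 rather than an integer, which is why the output above shows consumers=1 without the i suffix.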