From 4b9961927c14fa913260fc735261643bf35521a1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 5 Jun 2024 08:50:19 -0600 Subject: [PATCH] Add JSON processor documentation (#5982) * Add JSON processor documentation Signed-off-by: Melissa Vagi * Add pipeline examples Signed-off-by: Melissa Vagi * Add parameters Signed-off-by: Melissa Vagi * Update parameters Signed-off-by: Melissa Vagi * Update json.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Update json.md Signed-off-by: Melissa Vagi Signed-off-by: Melissa Vagi * Address tech review feedback Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/json.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/json.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/json.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/json.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/json.md Signed-off-by: Melissa Vagi * Update _ingest-pipelines/processors/json.md Signed-off-by: Melissa Vagi --------- Signed-off-by: Melissa Vagi Co-authored-by: Nathan Bower --- _ingest-pipelines/processors/json.md | 199 +++++++++++++++++++++++++++ 1 file changed, 199 insertions(+) create mode 100644 _ingest-pipelines/processors/json.md diff --git a/_ingest-pipelines/processors/json.md b/_ingest-pipelines/processors/json.md new file mode 100644 index 0000000000..d251533e27 --- /dev/null +++ b/_ingest-pipelines/processors/json.md @@ -0,0 +1,199 @@ +--- +layout: default +title: JSON +parent: Ingest processors +nav_order: 170 +--- + +# JSON processor + +The `json` processor serializes a string value field into a map of maps, which can be useful for various data processing and enrichment tasks. + +The following is the syntax for the `json` processor: + +```json +{ + "processor": { + "json": { + "field": "", + "target_field": "", + "add_to_root": + } + } +} +``` +{% include copy-curl.html %} + +## Configuration parameters + +The following table lists the required and optional parameters for the `json` processor. + +Parameter | Required/Optional | Description | +|-----------|-----------|-----------| +`field` | Required | The name of the field containing the JSON-formatted string to be deserialized. +`target_field` | Optional | The name of the field in which the deserialized JSON data is stored. When not provided, the data is stored in the `field` field. If `target_field` exists, its existing value is overwritten with the new JSON data. +`add_to_root` | Optional | A Boolean flag that determines whether the deserialized JSON data should be added to the root of the document (`true`) or stored in the target_field (`false`). If `add_to_root` is `true`, then `target-field` is invalid. Default value is `false`. +`description` | Optional | A description of the processor's purpose or configuration. +`if` | Optional | Specifies to conditionally execute the processor. +`ignore_failure` | Optional | Specifies to ignore processor failures. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/). +`on_failure`| Optional | Specifies a list of processors to run if the processor fails during execution. These processors are executed in the order they are specified. +`tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type. + +## Using the processor + +Follow these steps to use the processor in a pipeline. + +### Step 1: Create a pipeline + +The following query creates a pipeline named `my-json-pipeline` that uses the `json` processor to process JSON data and enrich the documents with additional information: + +```json +PUT _ingest/pipeline/my-json-pipeline +{ + "description": "Example pipeline using the JsonProcessor", + "processors": [ + { + "json": { + "field": "raw_data", + "target_field": "parsed_data" + "on_failure": [ + { + "set": { + "field": "error_message", + "value": "Failed to parse JSON data" + } + }, + { + "fail": { + "message": "Failed to process JSON data" + } + } + ] + } + }, + { + "set": { + "field": "processed_timestamp", + "value": "{{_ingest.timestamp}}" + } + } + ] +} +``` +{% include copy-curl.html %} + +### Step 2 (Optional): Test the pipeline + +It is recommended that you test your pipeline before you ingest documents. +{: .tip} + +To test the pipeline, run the following query: + +```json +POST _ingest/pipeline/my-json-pipeline/_simulate +{ + "docs": [ + { + "_source": { + "raw_data": "{\"name\":\"John\",\"age\":30,\"city\":\"New York\"}" + } + }, + { + "_source": { + "raw_data": "{\"name\":\"Jane\",\"age\":25,\"city\":\"Los Angeles\"}" + } + } + ] +} +``` +{% include copy-curl.html %} + +#### Response + +The following example response confirms that the pipeline is working as expected: + +```json +{ + "docs": [ + { + "doc": { + "_index": "_index", + "_id": "_id", + "_source": { + "processed_timestamp": "2024-05-30T15:24:48.064472090Z", + "raw_data": """{"name":"John","age":30,"city":"New York"}""", + "parsed_data": { + "name": "John", + "city": "New York", + "age": 30 + } + }, + "_ingest": { + "timestamp": "2024-05-30T15:24:48.06447209Z" + } + } + }, + { + "doc": { + "_index": "_index", + "_id": "_id", + "_source": { + "processed_timestamp": "2024-05-30T15:24:48.064543006Z", + "raw_data": """{"name":"Jane","age":25,"city":"Los Angeles"}""", + "parsed_data": { + "name": "Jane", + "city": "Los Angeles", + "age": 25 + } + }, + "_ingest": { + "timestamp": "2024-05-30T15:24:48.064543006Z" + } + } + } + ] +} +``` +{% include copy-curl.html %} + +### Step 3: Ingest a document + +The following query ingests a document into an index named `my-index`: + +```json +POST my-index/_doc?pipeline=my-json-pipeline +{ + "raw_data": "{\"name\":\"John\",\"age\":30,\"city\":\"New York\"}" +} +``` +{% include copy-curl.html %} + +#### Response + +The response confirms that the document containing the JSON data from the `raw_data` field was successfully indexed: + +```json +{ + "_index": "my-index", + "_id": "mo8yyo8BwFahnwl9WpxG", + "_version": 1, + "result": "created", + "_shards": { + "total": 2, + "successful": 1, + "failed": 0 + }, + "_seq_no": 3, + "_primary_term": 2 +} +``` +{% include copy-curl.html %} + +### Step 4 (Optional): Retrieve the document + +To retrieve the document, run the following query: + +```json +GET my-index/_doc/1 +``` +{% include copy-curl.html %}