[RFC] JSON-to-JSON Transformer #12795
Labels
enhancement
Enhancement or improvement to existing feature or request
Other
RFC
Issues requesting major changes
Is your feature request related to a problem? Please describe
The JSON-to-JSON transformer is a standalone utility within the Core package. It lets users configure transformations from one or more JSON formats to another, such as converting input JSON objects (e.g., search results from a previous flow step) into a different JSON format like a prompt template. It offers three approaches for data transformation: Painless script (P0 item), string manipulation with JSONPath (P0 item), and automated transformation based on specified inputs and outputs (P1 item). The utility should be standalone so that it can be integrated into any processor, either before or after the processor execution flow, as a data transformation step.
Describe the solution you'd like
Provide a public utility method in the core package that can be used by any processor. Depending on future requirements, we can expose this utility method through a REST API, or even as a processor.
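As a rough sketch of what such an entry point could look like, the following models the three inputs (dataset, approach, source) and dispatches on the approach. Everything except the `DataTransformApproach` name and its two values is an assumption made for illustration: the `JsonTransformer` class, the `register` method, and the use of `Map<String, Object>` as a dependency-free stand-in for `JsonNode` are hypothetical and not part of the proposal.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

// The two approaches named in this RFC.
enum DataTransformApproach { PAINLESS, JSONPATH }

// Hypothetical entry point; the class name and handler registry are assumptions.
final class JsonTransformer {
    // One handler per approach: (dataset, source) -> transformed output.
    private final Map<DataTransformApproach,
            BiFunction<List<Map<String, Object>>, List<String>, Object>> handlers =
            new EnumMap<>(DataTransformApproach.class);

    void register(DataTransformApproach approach,
                  BiFunction<List<Map<String, Object>>, List<String>, Object> handler) {
        handlers.put(Objects.requireNonNull(approach), Objects.requireNonNull(handler));
    }

    // dataset: documents to transform (Map used here in place of JsonNode).
    // source: Painless script source or JSONPath mapping instructions.
    Object transform(List<Map<String, Object>> dataset,
                     DataTransformApproach approach,
                     List<String> source) {
        Objects.requireNonNull(dataset, "dataset");
        Objects.requireNonNull(approach, "approach");
        Objects.requireNonNull(source, "source");
        BiFunction<List<Map<String, Object>>, List<String>, Object> handler = handlers.get(approach);
        if (handler == null) {
            throw new UnsupportedOperationException("No handler registered for " + approach);
        }
        return handler.apply(dataset, source);
    }
}
```

Keeping a single method keyed by the enum means a processor only needs to pass the approach and the source strings; whether the real utility is static or instance-based is left open here.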
The method takes the following inputs:
List<JsonNode>: the dataset to transform. Usually it is a list of SearchHits objects.
DataTransformApproach approach: an enum, either PAINLESS or JSONPATH, naming the approach the customer would like to use to transform the dataset.
List<String> source: the Painless script source, or the JSONPath field mapping instructions.
Supported Transform Approach 1. Painless Script
Painless is a performant, secure scripting language that provides numerous capabilities. Writing Painless scripts can be challenging for customers, and we aim to remove that difficulty. However, we still want to keep this method as the default approach, so that customers can achieve their objectives when JSONPath string manipulation is not enough.
Supported Transform Approach 2. String Manipulation (JSONPath)
JSONPath is a query language designed for navigating and extracting parts of a JSON document. With JSONPath, you can specify and navigate to different parts of a JSON structure, making it easier to retrieve specific data elements without needing to process the entire structure manually in code.
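To make the navigation idea concrete, here is a minimal sketch of a path evaluator over plain Java maps and lists. It is not a JSONPath implementation: it supports only `$`, `.name`, `[index]`, and `[*]`, always returns a list of matches, and the `MiniJsonPath` class and the sample field names are hypothetical. A real integration would rely on an actual JSONPath library instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for a JSONPath engine (assumption, for illustration only).
final class MiniJsonPath {
    // One path step: .name (group 1), [index] (group 2), or [*].
    private static final Pattern STEP =
            Pattern.compile("\\.([A-Za-z_][A-Za-z0-9_]*)|\\[(\\d+)\\]|\\[\\*\\]");

    static List<Object> read(Object root, String path) {
        if (!path.startsWith("$")) {
            throw new IllegalArgumentException("path must start with $");
        }
        List<Object> current = new ArrayList<>();
        current.add(root);
        Matcher m = STEP.matcher(path);
        int pos = 1; // skip the leading '$'
        while (pos < path.length()) {
            // Each step must start exactly where the previous one ended.
            if (!m.find(pos) || m.start() != pos) {
                throw new IllegalArgumentException("unsupported path syntax at offset " + pos);
            }
            List<Object> next = new ArrayList<>();
            for (Object node : current) {
                if (m.group(1) != null) {            // .name: descend into an object field
                    if (node instanceof Map && ((Map<?, ?>) node).containsKey(m.group(1))) {
                        next.add(((Map<?, ?>) node).get(m.group(1)));
                    }
                } else if (m.group(2) != null) {     // [index]: pick one array element
                    int i = Integer.parseInt(m.group(2));
                    if (node instanceof List && i < ((List<?>) node).size()) {
                        next.add(((List<?>) node).get(i));
                    }
                } else if (node instanceof List) {   // [*]: fan out over all array elements
                    next.addAll((List<?>) node);
                }
            }
            current = next;
            pos = m.end();
        }
        return current;
    }
}
```

With a document shaped like `{"hits": [{"title": "first"}, {"title": "second"}]}`, calling `MiniJsonPath.read(doc, "$.hits[*].title")` yields the two titles without any hand-written traversal code, which is the convenience the paragraph above describes.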
JSONPath has had AppSec clearance for use in ml-commons since 2.12. We will initiate another AppSec review for this use case.
2.1. N-1 Transform: Merge multiple JSONs into one JSON or other format of data
In some cases, the transform has to be applied in a “many-to-one” mode, transforming multiple objects, such as search results, into a single JSON output. For instance, a re-ranker may require the incoming search results (hits.fields) to be collapsed into a single array of strings as its input (e.g., Cohere ReRank).
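The many-to-one collapse can be illustrated with a small, dependency-free sketch. The hits, the `fields.text` field name, and the `ManyToOneSketch` class are hypothetical and only serve to show the shape of the operation; they are not the payloads or the API from this proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper showing an N-1 collapse over search hits.
final class ManyToOneSketch {
    // Walks a dotted field path (e.g. "fields.text") in each hit and gathers
    // every value found into one flat list of strings.
    static List<String> collapse(List<Map<String, Object>> hits, String dottedField) {
        List<String> out = new ArrayList<>();
        for (Map<String, Object> hit : hits) {
            Object node = hit;
            for (String key : dottedField.split("\\.")) {
                node = (node instanceof Map) ? ((Map<?, ?>) node).get(key) : null;
                if (node == null) {
                    break; // this hit does not carry the field; skip it
                }
            }
            if (node instanceof List) {
                // Field values that are arrays are flattened into the output.
                for (Object item : (List<?>) node) {
                    out.add(String.valueOf(item));
                }
            } else if (node != null) {
                out.add(String.valueOf(node));
            }
        }
        return out;
    }
}
```

Given three hits where two carry `fields.text` and one does not, `collapse(hits, "fields.text")` returns a single list with the two text values, which is the kind of array a re-ranker input would expect.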
For example, when the customer has the following:
The customer will need to provide the following JSONPath transform instruction:
The output would be:
2.2. 1-1 Transform: Map a specific field in one JSON to another JSON
A 1-1 Transform is essentially the same as an N-1 Transform, the only distinction being that N equals 1. Therefore, we don't need a separate DataTransformApproach enum value to differentiate between 1-1 and N-1 Transforms. However, in an N-1 Transform scenario JSONPath may not be sufficient, in which case customers would need to use a Painless script.
Related component
Other
Describe alternatives you've considered
No response
Additional context
No response