[Enhancement] Add schema validation and placeholders to index mappings #3240
base: main
…ad of string constants Signed-off-by: Pavan Yekbote <[email protected]>
…ums rather than use their own mappings
…holders
…h_index_mappings_from_files
…cter issue
Please add the 2.x backport label.
common/src/main/java/org/opensearch/ml/common/utils/StringUtils.java
Addressed the comments, please re-review. Thanks!
}

return mapping;
}

/*
minor: this doesn't seem to be a standard Javadoc comment.
good catch, thanks!
I made it into a proper comment for now in the latest commit. I will convert it to the standard Javadoc format (with leading *) in a later PR, since I don't want to dismiss the previous approval with another push, if that's okay.
@@ -54,6 +63,8 @@ public class StringUtils {
}
public static final String TO_STRING_FUNCTION_NAME = ".toString()";

private static final ObjectMapper MAPPER = new ObjectMapper();
Is it thread-safe to define it as a singleton?
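Yes. The instance is never reconfigured anywhere; only its read/write methods are called. Jackson's ObjectMapper is thread-safe as long as all configuration happens before it is shared, and it is used the same way in RestActionUtils.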
public static String getMappingFromFile(String path) throws IOException {
URL url = IndexUtils.class.getClassLoader().getResource(path);
if (url == null) {
throw new IOException("Resource not found: " + path);
}

String mapping = Resources.toString(url, Charsets.UTF_8).trim();
if (mapping.isEmpty() || !StringUtils.isJson(mapping)) {
throw new IllegalArgumentException("Invalid or non-JSON mapping at: " + path);
if (mapping.isEmpty()) {
Do we need this check here, given that we also check mapping.isBlank() in the replacePlaceholders method? It seems redundant.
I added this check because this method can also be used on its own in other places, so I believe it is safer to keep the additional check.
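To illustrate why a standalone loader should validate its own input, here is a minimal JDK-only sketch. It assumes the mapping is a classpath resource and swaps Guava's Resources.toString for getResourceAsStream plus readAllBytes; the class name MappingLoader is hypothetical, and the sketch omits the StringUtils.isJson check from the PR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the ml-commons IndexUtils implementation.
final class MappingLoader {
    private MappingLoader() {}

    // Loads a classpath resource as UTF-8 text and rejects a missing or blank resource,
    // so callers get a clear error even if they never go through placeholder replacement.
    static String loadMapping(String path) throws IOException {
        // try-with-resources tolerates a null resource: close() is simply not called.
        try (InputStream in = MappingLoader.class.getClassLoader().getResourceAsStream(path)) {
            if (in == null) {
                throw new IOException("Resource not found: " + path);
            }
            String mapping = new String(in.readAllBytes(), StandardCharsets.UTF_8).trim();
            if (mapping.isEmpty()) {
                throw new IllegalArgumentException("Empty mapping at: " + path);
            }
            return mapping;
        }
    }
}
```

Because the check lives in the loader itself, any future caller that skips replacePlaceholders still fails fast on a missing or empty file.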
}

String placeholderMapping = Resources.toString(url, Charsets.UTF_8);
mapping = mapping.replace(placeholder.getKey(), placeholderMapping);
Here we create a new string on every replace. Maybe we could use a StringBuilder for in-place replacement?
Maybe something like this:
public static String replacePlaceholders(String mapping) throws IOException {
    if (mapping == null || mapping.isBlank()) {
        throw new IllegalArgumentException("Mapping cannot be null or empty");
    }
    // Preload resources into memory to avoid redundant I/O
    Map<String, String> loadedPlaceholders = new HashMap<>();
    for (Map.Entry<String, String> placeholder : MAPPING_PLACEHOLDERS.entrySet()) {
        URL url = IndexUtils.class.getClassLoader().getResource(placeholder.getValue());
        if (url == null) {
            throw new IOException("Resource not found: " + placeholder.getValue());
        }
        // Load and cache the content
        loadedPlaceholders.put(placeholder.getKey(), Resources.toString(url, Charsets.UTF_8));
    }
    // Use StringBuilder for efficient in-place replacements
    StringBuilder result = new StringBuilder(mapping);
    for (Map.Entry<String, String> entry : loadedPlaceholders.entrySet()) {
        String placeholder = entry.getKey();
        String replacement = entry.getValue();
        // Replace all occurrences, resuming the search after each inserted replacement
        // so a replacement that contains the placeholder cannot cause an infinite loop
        int index = result.indexOf(placeholder);
        while (index != -1) {
            result.replace(index, index + placeholder.length(), replacement);
            index = result.indexOf(placeholder, index + replacement.length());
        }
    }
    return result.toString();
}
Agreed, thanks for the suggestion! It saves on I/O and is more efficient.
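As an alternative to the StringBuilder loop in the suggestion, a single regex pass can replace every placeholder while scanning the mapping only once, and inserted fragments are never rescanned. This is a sketch of a different technique (a java.util.regex alternation with Matcher.appendReplacement), not the code merged in this PR; the class name PlaceholderReplacer and the in-memory map, which stands in for the preloaded MAPPING_PLACEHOLDERS contents, are assumptions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; placeholder keys and fragments come from the caller.
final class PlaceholderReplacer {
    private PlaceholderReplacer() {}

    static String replaceAll(String mapping, Map<String, String> placeholders) {
        if (mapping == null || mapping.isBlank()) {
            throw new IllegalArgumentException("Mapping cannot be null or empty");
        }
        if (placeholders.isEmpty()) {
            return mapping;
        }
        // One alternation of all literal keys, longest first, so a key that is a
        // prefix of another key cannot win the match.
        String alternation = placeholders.keySet().stream()
            .sorted((a, b) -> Integer.compare(b.length(), a.length()))
            .map(Pattern::quote)
            .collect(Collectors.joining("|"));
        Matcher matcher = Pattern.compile(alternation).matcher(mapping);
        StringBuilder result = new StringBuilder(mapping.length());
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' inside the JSON fragment literal.
            matcher.appendReplacement(result, Matcher.quoteReplacement(placeholders.get(matcher.group())));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

One trade-off of this design: a placeholder that appears inside an inserted fragment is left as-is, so nested placeholders would need a second pass.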
@@ -336,4 +347,25 @@ public static JsonObject getJsonObjectFromString(String jsonString) {
return JsonParser.parseString(jsonString).getAsJsonObject();
}

public static void validateSchema(String schemaString, String instanceString) throws IOException {
Should we add a try-catch block to catch exceptions, for example:
catch (JsonProcessingException e) {
throw new IllegalArgumentException("Invalid JSON format: " + e.getMessage(), e);
} catch (Exception e) {
throw new OpenSearchParseException("Schema validation failed: " + e.getMessage(), e);
}

// Validate JSON node against the schema
Set<ValidationMessage> errors = schema.validate(jsonNode);
if (!errors.isEmpty()) {
What do you think about this:
if (!errors.isEmpty()) {
    String errorMessage = errors.stream()
        .map(ValidationMessage::getMessage)
        .collect(Collectors.joining(", "));
    throw new OpenSearchParseException(
        "Validation failed: " + errorMessage +
        " for instance: " + instanceString +
        " with schema: " + schemaString
    );
}
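The suggestion joins messages in the Set's iteration order. If the Set returned by schema.validate does not guarantee an order, the same errors could appear in a different order from run to run, which makes exception messages harder to compare in tests and logs. A minimal sketch that sorts before joining; the class ValidationErrors is hypothetical, and plain strings stand in for ValidationMessage::getMessage.

```java
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; input strings stand in for ValidationMessage::getMessage.
final class ValidationErrors {
    private ValidationErrors() {}

    // Sorting makes the joined text independent of the Set's iteration order,
    // so the same set of errors always produces the same message.
    static String join(Set<String> messages) {
        return messages.stream().sorted().collect(Collectors.joining(", "));
    }
}
```

In the suggested code this would replace the stream pipeline after mapping each ValidationMessage to its text.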
@@ -1,6 +1,6 @@
 {
   "_meta": {
-    "schema_version": "1"
+    "schema_version": 1
Usually, I see that JSON files in the repository are named using snake_case. Let's follow the same convention.
…tion handling
@dhrubo-os I addressed the review comments, please review!
Description
Added schema validation and placeholders to index mappings
Related Issues
Resolves #2951
Check List
--signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.