-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added performance improvement for datetime field parsing #9567
Added performance improvement for datetime field parsing #9567
Conversation
Perf Results: Perf results on SO dataset with new datetime default format. No perf degradation was observed in this case
Perf results on modified SO dataset ( Change: milliseconds were removed from SO dataset to test the changes)
|
Gradle Check (Jenkins) Run Completed with:
|
Compatibility status:Checks if related components are compatible with change d9f6aa7 Incompatible componentsSkipped componentsCompatible componentsCompatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git] |
|
@dblock if I understood the change, the "cache" in scope of this pull request is really reoder: the list of the parsers is iterated over and in case of successful parsing, this parser moves to the begging of the parsers list (so next parsing attempt will start with it). But we don't actually "cache" or construct the parsers (@CaptainDredge please correct me if I am wrong) |
Ok, you're right, I misunderstood! So my next dumb question is: doesn't this potentially cause the wrong parser to be applied by assuming that the data is all the same format? If for doc 1 parser 1 fails, and doc 2 succeeds, you could very well had row 2 parser 1 succeed, but with the cache it will fail. I do get it that it's rare to put data with different formats in multiple documents, but it's entirely possible. If I am correct (I am sure you'll show me how I misunderstood this again :) then the only 100% reliable approach is that the caller can hint to the parser to use. |
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9567-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 2965e69aff83b702d46d1d630998cbf3ef7ebca5
# Push it to GitHub
git push --set-upstream origin backport/backport-9567-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x Then, create a pull request where the |
@CaptainDredge I think there are problem with last minute changes, please correct me if I am wrong here (commented on problematic places). |
@reta yes I've tried to address your concerns, see if the responses are satisfactory otherwise I can raise separate PR for any suggested changes by you asap |
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> (cherry picked from commit 2965e69) Signed-off-by: Prabhat Sharma <[email protected]>
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> (cherry picked from commit 2965e69) Signed-off-by: Prabhat Sharma <[email protected]>
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> (cherry picked from commit 2965e69) Signed-off-by: Prabhat Sharma <[email protected]>
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> (cherry picked from commit 2965e69) Signed-off-by: Prabhat Sharma <[email protected]> (cherry picked from commit 5a459ba)
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> (cherry picked from commit 2965e69) Signed-off-by: Prabhat Sharma <[email protected]>
I am seeing multiple tests failure showing up while parsing the date time data value. Also these are happening for non-concurrent path as well. It seems to be related to this PR, can you please take a look ?
|
@sohami as @reta mentioned these should've been fixed by #10385 but I'll try reproducing these flaky tests locally and will try to resolve any issues but just wondering if you've any idea on how frequently these tests are failing because I don't see other PRs referring above issues to get an idea on number of failures |
Not sure about the frequency, but it started showing up last week only across multiple test cases. Earlier the thought was it has probably something to do with concurrent search. On looking further, found correlation with this change hence raised it here. You can check the run time in the shared CI links and see if your change was already merged in by then. Or try running locally in loop and see if it repros. |
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]>
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]>
@CaptainDredge Any luck with your manual local run. Are these still failing ? if not, can you please close these issues ? |
@sohami I tried reproing locally by running these in loops for few iterations but wasn't lucky enough to hit failure. I'll close these issue outs and if other PRs reports failure, then will put more effort on these |
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> (cherry picked from commit 2965e69) Signed-off-by: Prabhat Sharma <[email protected]>
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> (cherry picked from commit 2965e69) Signed-off-by: Prabhat Sharma <[email protected]>
…10448) * Added performance improvement for datetime field parsing (#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> (cherry picked from commit 2965e69) Signed-off-by: Prabhat Sharma <[email protected]> * Race condition fix for datetime optimization (#10385) * Race condition fix for datetime optimization Signed-off-by: Prabhat Sharma <[email protected]> * Changed JavaDateTimeFormatter caching of parser from MRU(most recently used) to a simple last used formatter Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]>
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> (cherry picked from commit 2965e69) Signed-off-by: Prabhat Sharma <[email protected]>
…project#9567) * Added performance improvement for datetime field parsing This adds caching of formatters in case of no explicit format specified for the datetime field in mapping. This also adds `strict_date_time_no_millis` as additional formatter in default date time formats Signed-off-by: Prabhat Sharma <[email protected]> * Refactor DateTimeFormatter Access under featireflag Signed-off-by: Prabhat Sharma <[email protected]> --------- Signed-off-by: Prabhat Sharma <[email protected]> Co-authored-by: Prabhat Sharma <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
Signed-off-by: Prabhat Sharma [email protected]
Description
This PR adds caching of last used datetime field parser on a shard level in case of no explicit format specified for the datetime field in mapping i.e. for default datetime mapping. This also adds
strict_date_time_no_millis
as additional formatter in default date time formats to improve the efficiency of formatting for most common log date format ofyyyy-MM-dd'T'HH:mm:ssZ
This PR also adds
printFormat
to control format for string conversion of epochs back to datetime during search requestsRelated Issues
Resolves #4558
Also addresses issue #10118
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.