Added performance improvement for datetime field parsing #9567

Merged
merged 2 commits into from
Oct 5, 2023

Conversation

CaptainDredge
Contributor

@CaptainDredge CaptainDredge commented Aug 28, 2023

Signed-off-by: Prabhat Sharma [email protected]

Description

This PR caches the last used datetime field parser at the shard level when no explicit format is specified for the datetime field in the mapping, i.e. for the default datetime mapping. It also adds strict_date_time_no_millis as an additional formatter in the default date-time formats to make parsing more efficient for the most common log date format, yyyy-MM-dd'T'HH:mm:ssZ.

This PR also adds printFormat to control the format used when converting epochs back to datetime strings during search requests.
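
To make the idea in the description concrete, here is a minimal sketch of a "last used formatter" lookup written against plain `java.time`. It is not the code from this PR: the class `LastUsedDateParser`, its `attempts` counter, and the choice of formatters are assumptions made for illustration only.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.List;

/** Hypothetical helper (not OpenSearch code): remembers the formatter that parsed the last value. */
final class LastUsedDateParser {
    private final List<DateTimeFormatter> formatters;
    // volatile: a formatter stored by one thread becomes visible to the others
    private volatile DateTimeFormatter lastUsed;
    // Demo-only counter of parse attempts; it is not thread-safe.
    int attempts;

    LastUsedDateParser(List<DateTimeFormatter> formatters) {
        this.formatters = List.copyOf(formatters);
    }

    TemporalAccessor parse(String input) {
        DateTimeFormatter cached = lastUsed;
        if (cached != null) {
            try {
                attempts++;
                return cached.parse(input);
            } catch (DateTimeParseException e) {
                // The remembered formatter did not match; fall back to the full list.
            }
        }
        for (DateTimeFormatter candidate : formatters) {
            if (candidate == cached) {
                continue; // already tried above
            }
            try {
                attempts++;
                TemporalAccessor parsed = candidate.parse(input);
                lastUsed = candidate; // remember the winner for the next value
                return parsed;
            } catch (DateTimeParseException e) {
                // Try the next formatter.
            }
        }
        throw new IllegalArgumentException("failed to parse date field [" + input + "]");
    }
}
```

In this sketch a value in a different format still parses, because a miss on the remembered formatter falls back to the full list; it only costs one extra attempt.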

Related Issues

Resolves #4558
Also addresses issue #10118

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@CaptainDredge
Contributor Author

CaptainDredge commented Aug 28, 2023

Perf Results:

Perf results on the SO dataset with the new default datetime format. No performance degradation was observed in this case.
Columns: Metric, Task (blank for cluster-level metrics), three runs without the optimization (Value_no_op), three runs with the optimization (Value)
Cumulative indexing time of primary shards 158.3718 159.1368333 151.9491667 124.5224333 121.8936333 124.60485
Min cumulative indexing time across primary shards 0 0 0 0 0 0
Median cumulative indexing time across primary shards 31.47405 31.62925 29.98943333 24.50805 24.12626667 24.43531667
Max cumulative indexing time across primary shards 32.43993333 32.34871667 31.07395 26.12665 24.6366 25.56873333
Cumulative indexing throttle time of primary shards 0 0 0 0 0 0
Min cumulative indexing throttle time across primary shards 0 0 0 0 0 0
Median cumulative indexing throttle time across primary shards 0 0 0 0 0 0
Max cumulative indexing throttle time across primary shards 0 0 0 0 0 0
Cumulative merge time of primary shards 95.57346667 57.327 87.58745 66.1147 59.8178 71.12356667
Cumulative merge count of primary shards 41 36 39 36 38 37
Min cumulative merge time across primary shards 0 0 0 0 0 0
Median cumulative merge time across primary shards 18.90066667 9.931266667 17.25311667 8.502616667 8.4724 13.72081667
Max cumulative merge time across primary shards 19.82798333 16.54443333 17.96195 17.02511667 17.36268333 17.03018333
Cumulative merge throttle time of primary shards 39.13806667 20.62775 38.94758333 30.01636667 28.51205 33.26633333
Min cumulative merge throttle time across primary shards 0 0 0 0 0 0
Median cumulative merge throttle time across primary shards 7.645233333 3.078216667 7.757483333 3.0886 3.538016667 5.87745
Max cumulative merge throttle time across primary shards 8.3052 7.54865 8.100966667 8.360116667 9.151816667 8.512166667
Cumulative refresh time of primary shards 9.200733333 8.069333333 7.820066667 6.555233333 6.10715 6.29345
Cumulative refresh count of primary shards 135 125 141 131 131 135
Min cumulative refresh time across primary shards 0 0 0 0 0 0
Median cumulative refresh time across primary shards 1.7453 1.574566667 1.5345 1.289966667 1.207233333 1.242766667
Max cumulative refresh time across primary shards 2.0735 1.69375 1.6082 1.370716667 1.29255 1.3228
Cumulative flush time of primary shards 40.52408333 36.16191667 35.90536667 29.20646667 27.72203333 27.98448333
Cumulative flush count of primary shards 84 78 84 85 86 83
Min cumulative flush time across primary shards 0 0 0 0 0 0
Median cumulative flush time across primary shards 7.919166667 7.1781 7.059133333 5.827366667 5.416233333 5.529716667
Max cumulative flush time across primary shards 8.3798 x 7.408816667 6.180833333 5.7379 5.851333333
Total Young Gen GC time 12.244 7.611 8.418 5.435 4.073 6.851
Total Young Gen GC count 546 215 270 395 167 189
Total Old Gen GC time 0 0 0 0 0 0
Total Old Gen GC count 0 0 0 0 0 0
Store size 40.84163455 35.67850496 35.54589905 30.67675517 36.44220873 35.1891273
Translog size 3.59E-07 3.59E-07 3.59E-07 3.59E-07 3.59E-07 3.59E-07
Heap used for segments 0 0 0 0 0 0
Heap used for doc values 0 0 0 0 0 0
Heap used for terms 0 0 0 0 0 0
Heap used for norms 0 0 0 0 0 0
Heap used for points 0 0 0 0 0 0
Heap used for stored fields 0 0 0 0 0 0
Segment count 110 129 112 133 132 134
Min Throughput index-append 24551.64 24691.88 25518.3 33067.56 33415.98 32672.32
Mean Throughput index-append 27525.93 28294.83 28562.63 36544.85 37026.45 36550.81
Median Throughput index-append 27579.52 28355.53 28587.78 36416.03 37017.04 36458.89
Max Throughput index-append 31075.19 32945.87 32302.52 42010.55 42022.23 42678.38
50th percentile latency index-append 1155.089368 1125.468426 1122.983612 921.8961308 905.3256814 939.0141056
90th percentile latency index-append 2407.577756 2162.263612 2179.204289 1487.419373 1406.00183 1490.707261
99th percentile latency index-append 9572.885646 9519.778251 8682.590078 7699.925619 7728.17741 7172.447614
99.9th percentile latency index-append 31323.80176 30075.54338 25929.53038 23825.75035 21592.98976 23663.97538
100th percentile latency index-append 44919.77966 38591.04387 31915.99929 28385.39693 30053.03134 27668.44527
50th percentile service time index-append 1155.089368 1125.468426 1122.983612 921.8961308 905.3256814 939.0141056
90th percentile service time index-append 2407.577756 2162.263612 2179.204289 1487.419373 1406.00183 1490.707261
99th percentile service time index-append 9572.885646 9519.778251 8682.590078 7699.925619 7728.17741 7172.447614
99.9th percentile service time index-append 31323.80176 30075.54338 25929.53038 23825.75035 21592.98976 23663.97538
100th percentile service time index-append 44919.77966 38591.04387 31915.99929 28385.39693 30053.03134 27668.44527
error rate index-append 0 1.38 0 0 0.11 1.5
Min Throughput wait-until-merges-finish 0 0 0 0 0 0
Mean Throughput wait-until-merges-finish 0 0 0 0 0 0
Median Throughput wait-until-merges-finish 0 0 0 0 0 0
Max Throughput wait-until-merges-finish 0 0 0 0 0 0
100th percentile latency wait-until-merges-finish 484389.3422 364687.8157 507630.8088 471247.3052 532352.968 484628.3004
100th percentile service time wait-until-merges-finish 484389.3422 364687.8157 507630.8088 471247.3052 532352.968 484628.3004
error rate wait-until-merges-finish 0 0 0 0 0 0
Perf results on a modified SO dataset (change: milliseconds were removed from the SO dataset to test the changes)
Columns: Metric, Task (blank for cluster-level metrics), seven runs without the optimization, mean without the optimization, mean with the optimization, seven runs with the optimization
Cumulative indexing time of primary shards 119.4644 125.4285 118.984017 122.400967 123.558633 128.749667 126.507833 123.5848596 111.4348524 112.7521 114.14205 111.572067 111.148433 110.768933 109.691017 109.969367
Min cumulative indexing time across primary shards 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Median cumulative indexing time across primary shards 23.61195 24.4752667 23.7198167 24.3540333 24.6725667 25.31325 24.96075 24.44394763 22.05133811 22.29515 22.3909 22.0575 22.0847167 22.2035667 21.6587667 21.6687667
Max cumulative indexing time across primary shards 24.36956667 26.02525 24.4382833 25.46585 25.5333 26.20105 25.92245 25.42225 22.79463331 23.1831333 23.2842 23.07075 22.9071833 22.2634333 22.5541833 22.29955
Cumulative indexing throttle time of primary shards 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Min cumulative indexing throttle time across primary shards 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Median cumulative indexing throttle time across primary shards 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Max cumulative indexing throttle time across primary shards 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Cumulative merge time of primary shards 48.00231667 57.8046333 70.64315 54.85575 55.2994667 74.4616833 77.1996333 62.60951904 65.40815 71.7817 68.0835833 72.7356333 46.0652667 80.6974 66.3752 52.1182667
Cumulative merge count of primary shards 37 39 39 36 36 41 42 38.57142857 36.42857143 40 35 36 33 38 38 35
Min cumulative merge time across primary shards 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Median cumulative merge time across primary shards 8.304733333 8.85816667 15.2551667 7.5924 8.0758 15.9518333 14.9575667 11.28509524 12.49072381 14.24825 13.8843167 15.8058 7.48463333 14.7868 13.4614 7.76386667
Max cumulative merge time across primary shards 15.90916667 16.1955667 16.2358833 16.23505 16.4953667 16.44595 17.0359333 16.36470238 16.11048096 16.7908167 15.7627 16.4554667 15.4942333 17.7337333 15.6829667 14.85345
Cumulative merge throttle time of primary shards 23.3134 28.3469667 33.4246333 25.0928 25.7987667 34.0979167 32.9500333 29.00350239 30.1351619 32.3708833 30.4251333 32.0131167 20.8474 38.7960833 32.25885 24.2346667
Min cumulative merge throttle time across primary shards 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Median cumulative merge throttle time across primary shards 3.7553 4.10676667 7.00946667 2.87521667 3.12463333 7.27018333 6.11863333 4.894314286 5.414890476 5.8824 5.48725 6.89768333 3.21295 6.90478333 5.9665 3.55266667
Max cumulative merge throttle time across primary shards 8.287566667 8.3756 7.67606667 8.08678333 8.0511 7.7461 8.0407 8.037702381 8.004171427 8.3137 8.01783333 7.56891667 7.95713333 8.69965 8.25008333 7.22188333
Cumulative refresh time of primary shards 6.5835 6.91985 5.97388333 6.58016667 6.6773 6.82043333 7.3509 6.700861904 6.270442859 6.74915 6.15006667 6.19041667 6.41876667 6.30578333 6.0392 6.03971667
Cumulative refresh count of primary shards 156 158 160 153 157 163 167 159.1428571 151.4285714 159 151 153 143 156 152 146
Min cumulative refresh time across primary shards 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Median cumulative refresh time across primary shards 1.291866667 1.35181667 1.185 1.30336667 1.3299 1.28743333 1.43268333 1.31172381 1.197585714 1.26403333 1.20095 1.17986667 1.20415 1.25848333 1.1685 1.10711667
Max cumulative refresh time across primary shards 1.419333333 1.47403333 1.20855 1.38968333 1.39411667 1.45866667 1.53251667 1.410985715 1.359423809 1.48183333 1.31231667 1.34398333 1.4911 1.32875 1.25515 1.30283333
Cumulative flush time of primary shards 28.34575 27.8718833 27.8219333 28.42885 29.65445 30.1351667 31.2186 29.06809047 27.60923333 29.05165 27.0747667 27.69735 27.1701333 27.9657333 27.1503333 27.1546667
Cumulative flush count of primary shards 83 79 84 81 84 81 85 82.42857143 82.85714286 86 80 83 81 83 84 83
Min cumulative flush time across primary shards 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Median cumulative flush time across primary shards 5.5964 5.54656667 5.51655 5.58206667 5.71998333 5.87446667 6.05851667 5.69922143 5.413278573 5.68353333 5.43721667 5.34511667 5.43851667 5.28201667 5.13746667 5.56908333
Max cumulative flush time across primary shards 5.918933333 5.80915 5.7209 5.9442 6.33565 6.33365 6.57468333 6.091023809 5.817090476 6.12903333 5.62025 5.94835 5.7427 5.9115 5.72288333 5.64491667
Total Young Gen GC time 6.428 5.397 3.812 5.844 9.199 5.217 4.63 5.789571429 5.777 6.105 6.293 5.947 5.818 4.55 5.676 6.05
Total Young Gen GC count 436 165 171 348 626 184 182 301.7142857 357.1428571 434 442 173 401 229 389 432
Total Old Gen GC time 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Total Old Gen GC count 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Store size 35.70914218 35.6330158 35.2318302 35.6018434 30.393478 35.3645242 35.4382837 34.76744535 33.38280891 35.6010569 36.3583308 29.9805695 35.4828869 30.3519081 30.4047239 35.5001863
Translog size 3.59E-07 3.59E-07 3.59E-07 3.59E-07 3.59E-07 3.59E-07 3.59E-07 3.58937E-07 0.000000359 3.59E-07 3.59E-07 3.59E-07 3.59E-07 3.59E-07 3.59E-07 3.59E-07
Heap used for segments 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Heap used for doc values 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Heap used for terms 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Heap used for norms 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Heap used for points 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Heap used for stored fields 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Segment count 137 133 114 131 137 118 111 125.8571429 124 111 122 114 152 113 117 139
Min Throughput index-append 33903.31 31928.82 33516.75 33090.19 32681.27 30899 32202.71 32603.15 35287.14286 34751.68 35125.59 34317.64 36226.83 35761.22 35212.71 35614.33
Mean Throughput index-append 36795.64 35318.27 36979.57 36635.96 36226.12 35132.89 36024.94 36159.05571 39248.17571 38860.52 38627.77 39167.56 39778.35 39330.72 39212.98 39759.33
Median Throughput index-append 36818.28 35200.96 37132.7 36651.61 35794.86 35038.51 36148.02 36112.13429 39295.11429 38956.02 38669.74 39326.49 39536.31 39571.47 39529.27 39476.5
Max Throughput index-append 39826.39 39093.23 41461.05 40990.97 41622.86 41898.59 41170.39 40866.21143 44037.66857 43146.62 43796.69 44478.1 44230.65 43980 44338.72 44292.9
50th percentile latency index-append 884.6166171 931.054405 877.422431 915.254234 909.982619 944.893907 955.427582 916.9502564 812.7845763 832.956027 822.156546 829.621169 791.785785 795.84113 822.119696 795.011681
90th percentile latency index-append 1330.464894 1307.41193 1395.8527 1362.18196 1439.38677 1538.177 1472.59726 1406.581788 1303.51111 1271.78029 1319.04292 1369.82525 1342.25575 1272.78296 1270.93468 1277.95592
99th percentile latency index-append 6661.154797 7613.90564 7191.48228 7323.44836 8087.56312 9381.54527 7579.1587 7691.179738 7626.424103 8218.9514 7701.8918 7704.59999 7594.26998 7977.22126 7008.53384 7179.50045
99.9th percentile latency index-append 21985.01806 23013.7004 22862.1766 24025.8882 22477.3652 27623.1024 23035.6135 23574.69491 21882.54059 21855.8508 22840.6311 21333.0898 22104.3067 22341.6623 21163.6954 21538.548
100th percentile latency index-append 28404.38619 25723.5164 27559.3387 27691.6661 30598.8981 35564.2117 33187.6533 29818.52436 29176.18114 29527.9861 27246.7852 32822.2265 30723.841 25603.2436 27667.8283 30641.3573
50th percentile service time index-append 884.6166171 931.054405 877.422431 915.254234 909.982619 944.893907 955.427582 916.9502564 812.7845763 832.956027 822.156546 829.621169 791.785785 795.84113 822.119696 795.011681
90th percentile service time index-append 1330.464894 1307.41193 1395.8527 1362.18196 1439.38677 1538.177 1472.59726 1406.581788 1303.51111 1271.78029 1319.04292 1369.82525 1342.25575 1272.78296 1270.93468 1277.95592
99th percentile service time index-append 6661.154797 7613.90564 7191.48228 7323.44836 8087.56312 9381.54527 7579.1587 7691.179738 7626.424103 8218.9514 7701.8918 7704.59999 7594.26998 7977.22126 7008.53384 7179.50045
99.9th percentile service time index-append 21985.01806 23013.7004 22862.1766 24025.8882 22477.3652 27623.1024 23035.6135 23574.69491 21882.54059 21855.8508 22840.6311 21333.0898 22104.3067 22341.6623 21163.6954 21538.548
100th percentile service time index-append 28404.38619 25723.5164 27559.3387 27691.6661 30598.8981 35564.2117 33187.6533 29818.52436 29176.18114 29527.9861 27246.7852 32822.2265 30723.841 25603.2436 27667.8283 30641.3573
error rate index-append 0 1.75 0.63 0.27 0 0 0.18 0.404285714 0.218571429 0 0 1.53 0 0 0 0
Min Throughput wait-until-merges-finish 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Mean Throughput wait-until-merges-finish 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Median Throughput wait-until-merges-finish 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Max Throughput wait-until-merges-finish 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
100th percentile latency wait-until-merges-finish 417625.0436 480505.289 434937.904 402814.153 476785.878 479417.753 452387.216 449210.4624 455649.6034 489132.515 542712.908 425169.871 453631.349 502062.851 447131.268 329706.462
100th percentile service time wait-until-merges-finish 417625.0436 480505.289 434937.904 402814.153 476785.878 479417.753 452387.216 449210.4624 455649.6034 489132.515 542712.908 425169.871 453631.349 502062.851 447131.268 329706.462
error rate wait-until-merges-finish 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
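
As a reading aid for the second table, the two mean columns can be turned into relative changes. The sketch below assumes that the eighth and ninth value columns are the per-configuration means (without and with the optimization); the class `PerfDelta` is a hypothetical name, and the four numbers are copied from the "Mean Throughput index-append" and "Cumulative indexing time of primary shards" rows.

```java
/** Hypothetical helper for reading the benchmark table; not part of the PR. */
final class PerfDelta {
    /** Relative change from baseline to candidate, in percent. */
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Means copied from the table: without optimization, then with optimization.
        double throughput = percentChange(36159.05571, 39248.17571);
        double indexingTime = percentChange(123.5848596, 111.4348524);
        System.out.printf("mean throughput index-append: %+.2f%%%n", throughput);
        System.out.printf("cumulative indexing time: %+.2f%%%n", indexingTime);
    }
}
```

A positive change is an improvement for throughput, while a negative change is an improvement for indexing time.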

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Aug 28, 2023

Compatibility status:

Checks if related components are compatible with change d9f6aa7

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

@dblock
Member

dblock commented Aug 28, 2023

  1. Are there any real scenarios where canCacheLastParsedFormatter would be set to false? If not, then it should not be exposed in the function arguments and the feature flag would toggle formatter cache on/off.
  2. Does it make sense to cache multiple (all possible) formatters in a concurrent map with a max size instead of just the last one? It feels like there's only N possible formats ever used and if the lookup is always cheaper than the new formatter construction then we can end up with predictably better performance regardless of the mix in the data formats.
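
For reference, the alternative raised in the second question (a concurrent map of formatters with a maximum size) could look roughly like the sketch below. It illustrates the suggestion only and is not what this PR implements; `BoundedFormatterCache` and its methods are hypothetical names, and the size limit is only approximate under concurrent inserts.

```java
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical size-capped cache of formatters keyed by pattern string. */
final class BoundedFormatterCache {
    private final int maxSize;
    private final ConcurrentHashMap<String, DateTimeFormatter> cache = new ConcurrentHashMap<>();

    BoundedFormatterCache(int maxSize) {
        this.maxSize = maxSize;
    }

    DateTimeFormatter get(String pattern) {
        DateTimeFormatter cached = cache.get(pattern);
        if (cached != null) {
            return cached;
        }
        DateTimeFormatter created = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        if (cache.size() >= maxSize) {
            // Cache is full: hand out an uncached formatter instead of evicting.
            // Concurrent callers may briefly push the map slightly past maxSize.
            return created;
        }
        DateTimeFormatter previous = cache.putIfAbsent(pattern, created);
        return previous != null ? previous : created;
    }

    int size() {
        return cache.size();
    }
}
```

The sketch skips eviction on purpose: with only a handful of distinct formats in use, refusing new entries once the cap is reached keeps the code simple.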

@reta
Collaborator

reta commented Aug 28, 2023

2. Does it make sense to cache multiple (all possible) formatters in a concurrent map with a max size instead of just the last one? I

@dblock if I understood the change, the "cache" in the scope of this pull request is really a reorder: the list of parsers is iterated over and, when a parser succeeds, it moves to the beginning of the list (so the next parsing attempt starts with it). But we don't actually "cache" or construct the parsers (@CaptainDredge please correct me if I am wrong)
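
The reordering described above can be sketched as a move-to-front list. This is a single-threaded illustration under assumed names (`MoveToFrontParsers`, `first()`), not the code under review, and it deliberately ignores concurrency.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, single-threaded move-to-front list of parsers. */
final class MoveToFrontParsers {
    private final List<DateTimeFormatter> parsers;

    MoveToFrontParsers(List<DateTimeFormatter> initialOrder) {
        this.parsers = new ArrayList<>(initialOrder);
    }

    TemporalAccessor parse(String input) {
        for (int i = 0; i < parsers.size(); i++) {
            DateTimeFormatter parser = parsers.get(i);
            try {
                TemporalAccessor parsed = parser.parse(input);
                if (i > 0) {
                    // Move the successful parser to the head so it is tried first next time.
                    parsers.remove(i);
                    parsers.add(0, parser);
                }
                return parsed;
            } catch (DateTimeParseException e) {
                // Try the next parser in the list.
            }
        }
        throw new IllegalArgumentException("failed to parse date field [" + input + "]");
    }

    /** The parser that will be tried first on the next call. */
    DateTimeFormatter first() {
        return parsers.get(0);
    }
}
```

No parser is ever dropped from the list, so only the order of attempts changes between calls.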

@dblock
Member

dblock commented Aug 28, 2023

  2. Does it make sense to cache multiple (all possible) formatters in a concurrent map with a max size instead of just the last one? I

@dblock if I understood the change, the "cache" in the scope of this pull request is really a reorder: the list of parsers is iterated over and, when a parser succeeds, it moves to the beginning of the list (so the next parsing attempt starts with it). But we don't actually "cache" or construct the parsers (@CaptainDredge please correct me if I am wrong)

Ok, you're right, I misunderstood! So my next dumb question is: doesn't this potentially cause the wrong parser to be applied by assuming that the data is all in the same format? If for doc 1 parser 1 fails, and doc 2 succeeds, you could very well have had doc 2 succeed with parser 1, but with the cache it will fail. I do get that it's rare to put data with different formats in multiple documents, but it's entirely possible.

If I am correct (I am sure you'll show me how I misunderstood this again :) then the only 100% reliable approach is for the caller to hint which parser to use.

@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-9567-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 2965e69aff83b702d46d1d630998cbf3ef7ebca5
# Push it to GitHub
git push --set-upstream origin backport/backport-9567-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-9567-to-2.x.

@reta
Collaborator

reta commented Oct 5, 2023

@reta I needed to add synchronisation while updating the list to maintain data consistency, e.g. there could have been a race condition where the formatter is removed but not yet re-added and, in the meantime, another thread tries to parse using a list that doesn't contain that formatter. Can we merge this into main now?

@CaptainDredge I think there are problems with the last-minute changes, please correct me if I am wrong here (commented on the problematic places).
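
One way to avoid the "removed but not yet re-added" window described in the quoted comment is to never mutate the shared list in place: build the reordered list on the side and publish it through a single volatile write. The sketch below shows that copy-on-write approach under assumed names (`SnapshotReorderParsers`, `promote`); it is neither the synchronisation added in this PR nor the follow-up fix.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical copy-on-write reorder: readers only ever see complete lists. */
final class SnapshotReorderParsers {
    // Always points at an immutable list that contains every parser.
    private volatile List<DateTimeFormatter> parsers;

    SnapshotReorderParsers(List<DateTimeFormatter> initialOrder) {
        this.parsers = List.copyOf(initialOrder);
    }

    TemporalAccessor parse(String input) {
        List<DateTimeFormatter> snapshot = parsers; // single read; iteration is stable
        for (DateTimeFormatter parser : snapshot) {
            try {
                TemporalAccessor parsed = parser.parse(input);
                if (parser != snapshot.get(0)) {
                    promote(parser);
                }
                return parsed;
            } catch (DateTimeParseException e) {
                // Try the next parser in this snapshot.
            }
        }
        throw new IllegalArgumentException("failed to parse date field [" + input + "]");
    }

    // Writers are serialized; the new order is built off to the side and swapped in at once.
    private synchronized void promote(DateTimeFormatter winner) {
        List<DateTimeFormatter> current = parsers;
        if (current.get(0) == winner) {
            return; // another thread already promoted it
        }
        List<DateTimeFormatter> reordered = new ArrayList<>(current.size());
        reordered.add(winner);
        for (DateTimeFormatter parser : current) {
            if (parser != winner) {
                reordered.add(parser);
            }
        }
        parsers = List.copyOf(reordered);
    }

    int size() {
        return parsers.size();
    }
}
```

The trade-off in this sketch is one small list allocation per reorder, in exchange for readers that take no lock and cannot observe a list with a parser missing.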

@CaptainDredge
Contributor Author

@reta yes, I've tried to address your concerns. See if the responses are satisfactory; otherwise I can raise a separate PR for any changes you suggest asap.

CaptainDredge added a commit to CaptainDredge/OpenSearch that referenced this pull request Oct 5, 2023
…project#9567)

* Added performance improvement for datetime field parsing

This adds caching of formatters in case of no explicit format specified for the datetime field in mapping.
This also adds `strict_date_time_no_millis` as additional formatter in default date time formats

Signed-off-by: Prabhat Sharma <[email protected]>

* Refactor DateTimeFormatter Access under featireflag

Signed-off-by: Prabhat Sharma <[email protected]>

---------

Signed-off-by: Prabhat Sharma <[email protected]>
Co-authored-by: Prabhat Sharma <[email protected]>
(cherry picked from commit 2965e69)
Signed-off-by: Prabhat Sharma <[email protected]>
CaptainDredge added a commit to CaptainDredge/OpenSearch that referenced this pull request Oct 6, 2023
CaptainDredge added a commit to CaptainDredge/OpenSearch that referenced this pull request Oct 6, 2023
CaptainDredge added a commit to CaptainDredge/OpenSearch that referenced this pull request Oct 6, 2023
(cherry picked from commit 5a459ba)
CaptainDredge added a commit to CaptainDredge/OpenSearch that referenced this pull request Oct 6, 2023
@reta
Collaborator

reta commented Oct 7, 2023

I am seeing multiple test failures showing up while parsing date-time values. These are also happening on the non-concurrent path. It seems to be related to this PR, can you please take a look?

@sohami yes, this change has introduced the issue that was fixed shortly after by #10385

@CaptainDredge
Contributor Author

@sohami as @reta mentioned, these should have been fixed by #10385, but I'll try reproducing these flaky tests locally and resolve any issues. Do you have any idea how frequently these tests are failing? I don't see other PRs referring to the above issues, so I can't gauge the number of failures.

@sohami
Collaborator

sohami commented Oct 9, 2023

@sohami as @reta mentioned, these should have been fixed by #10385, but I'll try reproducing these flaky tests locally and resolve any issues. Do you have any idea how frequently these tests are failing? I don't see other PRs referring to the above issues, so I can't gauge the number of failures.

Not sure about the frequency, but it only started showing up last week, across multiple test cases. Earlier, the thought was that it probably had something to do with concurrent search. On looking further, I found a correlation with this change and hence raised it here. You can check the run time in the shared CI links and see if your change was already merged by then. Or try running locally in a loop and see if it repros.

deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Oct 9, 2023
vikasvb90 pushed a commit to vikasvb90/OpenSearch that referenced this pull request Oct 10, 2023
@sohami
Collaborator

sohami commented Oct 13, 2023

@sohami as @reta mentioned, these should have been fixed by #10385, but I'll try reproducing these flaky tests locally and resolve any issues. Do you have any idea how frequently these tests are failing? I don't see other PRs referring to the above issues, so I can't gauge the number of failures.

Not sure about the frequency, but it only started showing up last week, across multiple test cases. Earlier, the thought was that it probably had something to do with concurrent search. On looking further, I found a correlation with this change and hence raised it here. You can check the run time in the shared CI links and see if your change was already merged by then. Or try running locally in a loop and see if it repros.

@CaptainDredge Any luck with your manual local run? Are these still failing? If not, can you please close these issues?

@CaptainDredge
Contributor Author

@sohami I tried to reproduce locally by running these in a loop for a few iterations but wasn't lucky enough to hit a failure. I'll close these issues, and if other PRs report failures, I will put more effort into these.

CaptainDredge added a commit to CaptainDredge/OpenSearch that referenced this pull request Oct 16, 2023
CaptainDredge added a commit to CaptainDredge/OpenSearch that referenced this pull request Oct 19, 2023
reta pushed a commit that referenced this pull request Oct 19, 2023
…10448)

* Added performance improvement for datetime field parsing (#9567)

* Added performance improvement for datetime field parsing

This adds caching of formatters in case of no explicit format specified for the datetime field in mapping.
This also adds `strict_date_time_no_millis` as additional formatter in default date time formats

Signed-off-by: Prabhat Sharma <[email protected]>

* Refactor DateTimeFormatter Access under featireflag

Signed-off-by: Prabhat Sharma <[email protected]>

---------

Signed-off-by: Prabhat Sharma <[email protected]>
Co-authored-by: Prabhat Sharma <[email protected]>
(cherry picked from commit 2965e69)
Signed-off-by: Prabhat Sharma <[email protected]>

* Race condition fix for datetime optimization (#10385)

* Race condition fix for datetime optimization

Signed-off-by: Prabhat Sharma <[email protected]>

* Changed JavaDateTimeFormatter caching of parser from MRU(most recently used) to a simple last used formatter

Signed-off-by: Prabhat Sharma <[email protected]>

---------

Signed-off-by: Prabhat Sharma <[email protected]>
Co-authored-by: Prabhat Sharma <[email protected]>
Signed-off-by: Prabhat Sharma <[email protected]>

---------

Signed-off-by: Prabhat Sharma <[email protected]>
Co-authored-by: Prabhat Sharma <[email protected]>
CaptainDredge added a commit to CaptainDredge/OpenSearch that referenced this pull request Jan 31, 2024
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Signed-off-by: Shivansh Arora <[email protected]>
Labels
backport 2.x Backport to 2.x branch backport-failed enhancement Enhancement or improvement to existing feature or request
Development

Successfully merging this pull request may close these issues.

Infer and cache date field format instead of re-parsing it for every document
5 participants