
[BUG] Sorting on a field of type basic_date_time_no_millis gives a java.time.DateTimeException #11138

Closed
varfrog opened this issue Nov 8, 2023 · 9 comments
Labels
bug Something isn't working good first issue Good for newcomers Search Search query, autocomplete ...etc

Comments

@varfrog

varfrog commented Nov 8, 2023

Describe the bug

Background:

  1. Index with a field as follows:
            "last_seen": {
                "format": "basic_date_time_no_millis",
                "type": "date"
            }
  2. Some documents have no value for the date field (last_seen). This is required for the exception to occur.

Problem: when sorting by this field with size equal to the number of search results, OpenSearch fails the request with a java.time.DateTimeException (HTTP 500). The full queries are in the "To Reproduce" section.

{
    "error": {
        "root_cause": [
            {
                "type": "response_handler_failure_transport_exception",
                "reason": "java.time.DateTimeException: Field Year cannot be printed as the value -292275055 exceeds the maximum print width of 4"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "test_es_issue_81960",
                "node": "MqBH-lbkRyyOCTK7hWJmow",
                "reason": {
                    "type": "response_handler_failure_transport_exception",
                    "reason": "java.time.DateTimeException: Field Year cannot be printed as the value -292275055 exceeds the maximum print width of 4",
                    "caused_by": {
                        "type": "date_time_exception",
                        "reason": "Field Year cannot be printed as the value -292275055 exceeds the maximum print width of 4"
                    }
                }
            }
        ]
    },
    "status": 500
}

To Reproduce
Create an index with a date field.

PUT {{host}}/{{index}}

{
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "username": {
                "type": "keyword"
            },
            "last_seen": {
                "format": "basic_date_time_no_millis",
                "type": "date"
            }
        }
    }
}

Index some documents: one with a date value and one without.

POST {{host}}/_bulk

{ "index": { "_index": "test_es_issue_81960", "_id": "1" } }
{ "username" : "ann" }
{ "index": { "_index": "test_es_issue_81960", "_id": "2" } }
{ "username" : "bob", "last_seen" : "20231015T144500Z" }

Search query "Q1":

{
    "query": {
        "match_all": {}
    },
    "size": 2,
    "sort": [
        {
            "last_seen": {
                "order": "desc"
            }
        }
    ]
}

The query fails when size is 2 (the number of documents in the index); sizes 1 and 3 work. Adding missing: 0 to the sort clause seems to be a workaround:

            "last_seen": {
                "order": "desc",
                "missing": 0
            }
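The failure can be reproduced outside OpenSearch with plain java.time. This is a minimal sketch that assumes basic_date_time_no_millis prints the year with a fixed width of exactly four digits; the builder below only approximates that format and is not OpenSearch's actual definition. Formatting bob's stored value and the value used by the missing: 0 workaround (the epoch) succeeds, while formatting Long.MIN_VALUE, the value a missing date appears to be sorted with, throws the same DateTimeException as the error response.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;
import java.util.Locale;

class BasicFormatRepro {
    // Assumed approximation of basic_date_time_no_millis (yyyyMMdd'T'HHmmssZ)
    // with the year printed in exactly four digits.
    static final DateTimeFormatter BASIC_NO_MILLIS = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4, 4, SignStyle.NORMAL)
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .appendLiteral('T')
            .appendValue(ChronoField.HOUR_OF_DAY, 2)
            .appendValue(ChronoField.MINUTE_OF_HOUR, 2)
            .appendValue(ChronoField.SECOND_OF_MINUTE, 2)
            .appendOffset("+HHMM", "Z")
            .toFormatter(Locale.ROOT)
            .withZone(ZoneOffset.UTC);

    // Formats an epoch-millis sort value with the fixed-width formatter.
    static String format(long epochMillis) {
        return BASIC_NO_MILLIS.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(1697381100000L)); // bob's last_seen
        System.out.println(format(0L));             // value used by "missing": 0
        try {
            format(Long.MIN_VALUE);                 // placeholder for a missing last_seen
        } catch (DateTimeException e) {
            System.out.println("DateTimeException: " + e.getMessage());
        }
    }
}
```

The year of Long.MIN_VALUE milliseconds is -292275055, which has nine digits and cannot fit a four-digit field, so the formatter throws before printing anything.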

Expected behavior
Query "Q1" (above) should return the following result:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": null,
        "hits": [
            {
                "_index": "test_es_issue_81960",
                "_id": "2",
                "_score": null,
                "_source": {
                    "username": "bob",
                    "last_seen": "20231015T144500Z"
                },
                "sort": [
                    1697381100000
                ]
            },
            {
                "_index": "test_es_issue_81960",
                "_id": "1",
                "_score": null,
                "_source": {
                    "username": "ann"
                },
                "sort": [
                    0
                ]
            }
        ]
    }
}

Plugins
None.


Additional context

Tried and reproduced on OpenSearch 2.5.0 and 2.11.0.

This has also been reported for Elasticsearch: elastic/elasticsearch#81960

@varfrog varfrog added bug Something isn't working untriaged labels Nov 8, 2023
@mch2 mch2 added the Search Search query, autocomplete ...etc label Dec 8, 2023
@msfroh msfroh added good first issue Good for newcomers and removed untriaged labels Dec 13, 2023
@mkhludnev
Contributor

FYI, the full stack trace is:

Caused by: java.time.DateTimeException: Field Year cannot be printed as the value -292275055 exceeds the maximum print width of 4
        at java.time.format.DateTimeFormatterBuilder$NumberPrinterParser.format(DateTimeFormatterBuilder.java:2771) ~[?:?]
        at java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2402) ~[?:?]
        at java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2402) ~[?:?]
        at java.time.format.DateTimeFormatterBuilder$CompositePrinterParser.format(DateTimeFormatterBuilder.java:2402) ~[?:?]
        at java.time.format.DateTimeFormatter.formatTo(DateTimeFormatter.java:1849) ~[?:?]
        at java.time.format.DateTimeFormatter.format(DateTimeFormatter.java:1823) ~[?:?]
        at org.opensearch.common.time.JavaDateFormatter.format(JavaDateFormatter.java:282) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.search.DocValueFormat$DateTime.format(DocValueFormat.java:306) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.search.DocValueFormat$DateTime.format(DocValueFormat.java:232) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.search.SearchSortValuesAndFormats.<init>(SearchSortValuesAndFormats.java:65) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.search.BottomSortValuesCollector.consumeTopDocs(BottomSortValuesCollector.java:89) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:159) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:292) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:59) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:44) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:99) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:52) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:70) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:746) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.transport.TransportService$6.handleResponse(TransportService.java:897) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1516) ~[opensearch-2.11.1.jar:2.11.1]
        at org.opensearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1599) ~[opensearch-2.11.1.jar:2.11.1]

@mkhludnev
Contributor

mkhludnev commented Dec 19, 2023

Also, with size=3 the query succeeds but returns what I think is an incorrect sort value:

        "sort": [
          -9223372036854775808
        ]

Note: if we sort by an integer field and a doc has no value, we get Integer.MIN_VALUE:

   "sort": [
          -2147483648
        ]

Assuming that the handling of absent integers is correct, how should we handle formatting Long.MIN_VALUE as a date? Should we cap years at -9999? Or do we need to change all missing-value handling to pass null/empty values?
WDYT?
I think we can find the min and max values the formatter can still handle and use them as boundaries.
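A minimal sketch of the boundary idea, assuming the boundaries are the first and last instants whose year still fits in four digits (years 0000 through 9999, UTC). The class, constants, and clamp method are hypothetical illustrations, not OpenSearch APIs.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

class SortValueClamp {
    // Hypothetical bounds: earliest and latest epoch millis whose year
    // fits in a four-digit field (0000-01-01T00:00:00Z .. 9999-12-31T23:59:59.999Z).
    static final long MIN_PRINTABLE_MILLIS =
            LocalDateTime.of(0, 1, 1, 0, 0).toInstant(ZoneOffset.UTC).toEpochMilli();
    static final long MAX_PRINTABLE_MILLIS =
            LocalDateTime.of(9999, 12, 31, 23, 59, 59, 999_000_000)
                    .toInstant(ZoneOffset.UTC).toEpochMilli();

    // Clamps a raw sort value (e.g. Long.MIN_VALUE for a missing date)
    // into the range a four-digit-year formatter can print.
    static long clamp(long epochMillis) {
        return Math.max(MIN_PRINTABLE_MILLIS, Math.min(MAX_PRINTABLE_MILLIS, epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(clamp(Long.MIN_VALUE));  // clamped to the year-0000 bound
        System.out.println(clamp(Long.MAX_VALUE));  // clamped to the year-9999 bound
        System.out.println(clamp(1697381100000L));  // real date, unchanged
    }
}
```

One trade-off of clamping: the relative order of real and missing values is preserved, but a real document dated exactly at a bound would compare equal to a missing one.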

mkhludnev pushed a commit to mkhludnev/OpenSearch that referenced this issue Dec 19, 2023
When sorting by a date column, missing values are transferred as Long.MIN_VALUE/Long.MAX_VALUE (see IndexFieldData.XFieldComparatorSource.missingObject()).
When this value is formatted for merging on the coordinator, it hits the error.
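The commit message points at IndexFieldData.XFieldComparatorSource.missingObject(). The sketch below is a simplified, hypothetical re-implementation of that selection for long-backed sort fields (dates are sorted as epoch millis), written from the behavior described in this thread rather than copied from OpenSearch. With the default missing value _last and a desc order, it yields Long.MIN_VALUE, which matches the -9223372036854775808 sort value reported for size=3.

```java
class MissingSortValue {
    // Hypothetical simplification: pick the placeholder sort value for a
    // document that has no value for a long-backed sort field.
    static long missingLong(String missing, boolean reversed) {
        switch (missing) {
            case "_first":
            case "_last": {
                boolean first = missing.equals("_first");
                // Ascending: _first -> MIN, _last -> MAX; a desc sort (reversed) flips this.
                boolean useMin = first ^ reversed;
                return useMin ? Long.MIN_VALUE : Long.MAX_VALUE;
            }
            default:
                // An explicit value such as "missing": 0 is used as-is.
                return Long.parseLong(missing);
        }
    }

    public static void main(String[] args) {
        System.out.println(missingLong("_last", true));  // desc, missing last (default)
        System.out.println(missingLong("_last", false)); // asc, missing last
        System.out.println(missingLong("0", true));      // "missing": 0 workaround
    }
}
```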
@mkhludnev
Contributor

mkhludnev commented Dec 19, 2023

Sharing some observations:
basic_* formats are fixed-width, while strict_* formats are delimited, which lets the latter represent years of up to ten digits (however meaningless such years are).
Absent values come back from the shards as Long.MAX_VALUE/Long.MIN_VALUE, and sorting them requires formatting these longs as strings.
That's why basic_* formats fail to represent these meaningless years, while strict_* formats handle them fine.
Specifying missing works, as mentioned above; we could also suggest switching to a strict_* format or specifying null_value.
Unfortunately, we can't override the date format per query yet.
It's also worth documenting this difference between formats for this edge case.

I need a suggestion to continue work on this.

see https://opensearch.org/docs/latest/field-types/supported-field-types/date/#built-in-formats
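To illustrate the difference between the two format families, this sketch prints only the year of the same instants with a fixed four-digit year field (as in an undelimited basic_* format, where the parser needs a fixed width to know where the year ends) and with a signed year field of 4 to 10 digits (as assumed above for a delimited strict_* format). Both builder configurations are approximations, not OpenSearch's actual formatter definitions.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;
import java.util.Locale;

class YearWidthComparison {
    // Fixed-width year: exactly 4 digits (basic_*-style assumption).
    static final DateTimeFormatter FIXED_YEAR = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4, 4, SignStyle.NORMAL)
            .toFormatter(Locale.ROOT)
            .withZone(ZoneOffset.UTC);

    // Variable-width year: 4 to 10 digits, signed when negative or wider
    // than 4 digits (strict_*-style assumption).
    static final DateTimeFormatter WIDE_YEAR = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4, 10, SignStyle.EXCEEDS_PAD)
            .toFormatter(Locale.ROOT)
            .withZone(ZoneOffset.UTC);

    static String fixed(long epochMillis) {
        return FIXED_YEAR.format(Instant.ofEpochMilli(epochMillis));
    }

    static String wide(long epochMillis) {
        return WIDE_YEAR.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(wide(Long.MIN_VALUE)); // nine-digit negative year fits
        try {
            fixed(Long.MIN_VALUE);
        } catch (DateTimeException e) {
            System.out.println("fixed width failed: " + e.getMessage());
        }
    }
}
```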

@reta
Collaborator

reta commented Dec 19, 2023

This is interesting: we recently merged #11196, which is related to missing values; however, AFAICT it was merged after this issue was reported.

@mkhludnev
Contributor

mkhludnev commented Dec 20, 2023

@reta, I suspect the #11196 test would fail if we added dates with a basic_* format there.
And... here we go: cb0fdd4#diff-09160f52b60c093338a00e225edf277f9104073fe949c0beea458a63f7080fc1

@getsaurabh02
Member

@gashutos Could you take a stab and share your thoughts on this?

@gashutos
Contributor

> This interesting, we have a change merged recently #11196 related to missingValues however AFAICT it was merged after this issue had been reported.

This should be fixed by the above PR. @mkhludnev, could you test this on the latest version if you get a chance?

@gashutos
Contributor

> @reta, I suppose #11196 test would fail if we add dates with some basic_* format there.
> and ... here we go cb0fdd4#diff-09160f52b60c093338a00e225edf277f9104073fe949c0beea458a63f7080fc1

You already tested it, cool. Let me see what more is required there.

@getsaurabh02
Member

@gashutos, closing this out based on the search community meeting discussion. Feel free to reopen if you find anything more on this.

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Apr 17, 2024
Projects
Archived in project

7 participants