[Reporting] Fix incorrect number of hits being exported from CSV #112406

Closed
jloleysens wants to merge 9 commits

@jloleysens jloleysens commented Sep 16, 2021

Summary

Fix #112164

The following was reported for successive runs of a large CSV export on CI. TL;DR: the CSV row count varied from run to run, but was always below the expected total of 4675.

Data
Run scan outputs:
------------------
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 171
   │ proc [kibana] results 0
------------------
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 171
   │ proc [kibana] results 0
------------------
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 171
   │ proc [kibana] results 0
-------------------
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 146
   │ proc [kibana] results 0
-------------------
   │ proc [kibana] searchBody {
   │ proc [kibana]   "fields": [
   │ proc [kibana]     {
   │ proc [kibana]       "field": "*",
   │ proc [kibana]       "include_unmapped": "true"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "customer_birth_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "order_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "products.created_on",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "sort": [
   │ proc [kibana]     {
   │ proc [kibana]       "order_date": {
   │ proc [kibana]         "order": "desc",
   │ proc [kibana]         "unmapped_type": "boolean"
   │ proc [kibana]       }
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "track_total_hits": true,
   │ proc [kibana]   "script_fields": {},
   │ proc [kibana]   "stored_fields": [
   │ proc [kibana]     "*"
   │ proc [kibana]   ],
   │ proc [kibana]   "runtime_mappings": {},
   │ proc [kibana]   "_source": false,
   │ proc [kibana]   "query": {
   │ proc [kibana]     "bool": {
   │ proc [kibana]       "must": [],
   │ proc [kibana]       "filter": [
   │ proc [kibana]         {
   │ proc [kibana]           "range": {
   │ proc [kibana]             "order_date": {
   │ proc [kibana]               "format": "strict_date_optional_time",
   │ proc [kibana]               "gte": "2019-04-27T23:56:51.374Z",
   │ proc [kibana]               "lte": "2019-08-23T16:18:51.821Z"
   │ proc [kibana]             }
   │ proc [kibana]           }
   │ proc [kibana]         }
   │ proc [kibana]       ],
   │ proc [kibana]       "should": [],
   │ proc [kibana]       "must_not": []
   │ proc [kibana]     }
   │ proc [kibana]   }
   │ proc [kibana] }
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 147
   │ proc [kibana] results 0
   │ proc [kibana] this.csvRowCount 4647
-------------------
   │ proc [kibana] searchBody {
   │ proc [kibana]   "fields": [
   │ proc [kibana]     {
   │ proc [kibana]       "field": "*",
   │ proc [kibana]       "include_unmapped": "true"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "customer_birth_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "order_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "products.created_on",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "sort": [
   │ proc [kibana]     {
   │ proc [kibana]       "order_date": {
   │ proc [kibana]         "order": "desc",
   │ proc [kibana]         "unmapped_type": "boolean"
   │ proc [kibana]       }
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "track_total_hits": true,
   │ proc [kibana]   "script_fields": {},
   │ proc [kibana]   "stored_fields": [
   │ proc [kibana]     "*"
   │ proc [kibana]   ],
   │ proc [kibana]   "runtime_mappings": {},
   │ proc [kibana]   "_source": false,
   │ proc [kibana]   "query": {
   │ proc [kibana]     "bool": {
   │ proc [kibana]       "must": [],
   │ proc [kibana]       "filter": [
   │ proc [kibana]         {
   │ proc [kibana]           "range": {
   │ proc [kibana]             "order_date": {
   │ proc [kibana]               "format": "strict_date_optional_time",
   │ proc [kibana]               "gte": "2019-04-27T23:56:51.374Z",
   │ proc [kibana]               "lte": "2019-08-23T16:18:51.821Z"
   │ proc [kibana]             }
   │ proc [kibana]           }
   │ proc [kibana]         }
   │ proc [kibana]       ],
   │ proc [kibana]       "should": [],
   │ proc [kibana]       "must_not": []
   │ proc [kibana]     }
   │ proc [kibana]   }
   │ proc [kibana] }
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 139
   │ proc [kibana] results 0
----------------
   │ proc [kibana] searchBody {
   │ proc [kibana]   "fields": [
   │ proc [kibana]     {
   │ proc [kibana]       "field": "*",
   │ proc [kibana]       "include_unmapped": "true"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "customer_birth_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "order_date",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     },
   │ proc [kibana]     {
   │ proc [kibana]       "field": "products.created_on",
   │ proc [kibana]       "format": "strict_date_optional_time"
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "sort": [
   │ proc [kibana]     {
   │ proc [kibana]       "order_date": {
   │ proc [kibana]         "order": "desc",
   │ proc [kibana]         "unmapped_type": "boolean"
   │ proc [kibana]       }
   │ proc [kibana]     }
   │ proc [kibana]   ],
   │ proc [kibana]   "track_total_hits": true,
   │ proc [kibana]   "script_fields": {},
   │ proc [kibana]   "stored_fields": [
   │ proc [kibana]     "*"
   │ proc [kibana]   ],
   │ proc [kibana]   "runtime_mappings": {},
   │ proc [kibana]   "_source": false,
   │ proc [kibana]   "query": {
   │ proc [kibana]     "bool": {
   │ proc [kibana]       "must": [],
   │ proc [kibana]       "filter": [
   │ proc [kibana]         {
   │ proc [kibana]           "range": {
   │ proc [kibana]             "order_date": {
   │ proc [kibana]               "format": "strict_date_optional_time",
   │ proc [kibana]               "gte": "2019-04-27T23:56:51.374Z",
   │ proc [kibana]               "lte": "2019-08-23T16:18:51.821Z"
   │ proc [kibana]             }
   │ proc [kibana]           }
   │ proc [kibana]         }
   │ proc [kibana]       ],
   │ proc [kibana]       "should": [],
   │ proc [kibana]       "must_not": []
   │ proc [kibana]     }
   │ proc [kibana]   }
   │ proc [kibana] }
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 42
   │ proc [kibana] results 0
   │ proc [kibana] this.csvRowCount 454
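
As a quick cross-check of the scan outputs above, the per-run totals can be recomputed from the page sizes. This is only an illustrative tally: the last-page sizes are copied from the "results N" lines, and the variable names are made up for this sketch. Note that the final run's pages sum to 4542, while the log line shows 454, which may be a truncated paste.

```python
# Tally the scroll runs from the logs: each run returned nine full pages
# of 500 hits followed by one short page.
EXPECTED_TOTAL = 4675  # "total hits" reported by ES for this query
PAGE_SIZE = 500
FULL_PAGES = 9

# Size of the final non-empty page in each of the seven logged runs.
last_pages = [171, 171, 171, 146, 147, 139, 42]

totals = [FULL_PAGES * PAGE_SIZE + last for last in last_pages]
shortfalls = [EXPECTED_TOTAL - total for total in totals]

for run, (total, missing) in enumerate(zip(totals, shortfalls), start=1):
    print(f"run {run}: {total} rows, {missing} missing")
```

Every run falls short of 4675, and the shortfall is not constant, which matches the "random, but always below" description.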

This appears to be reproducible only when using the _scroll endpoint. After switching to point in time (PIT), as the docs recommend, we get consistent CSV row counts again:

   │ proc [kibana] total hits 4675
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 500
   │ proc [kibana] results 175
   │ proc [kibana] this.csvRowCount 4675
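
For readers unfamiliar with the pattern, here is a minimal sketch of point-in-time paging with a search_after cursor. It is not the code in this PR: `FakeClient`, `open_pit`, `search`, `close_pit`, and `export_rows` are hypothetical names, and the client is an in-memory stand-in so the example runs without Elasticsearch. The idea it illustrates is that every page is read from the same frozen view, and the sort values of the last hit become the cursor for the next page.

```python
from typing import Any, Dict, Iterator, List, Optional


class FakeClient:
    """In-memory stand-in for a search backend (hypothetical, illustration only)."""

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self._docs = docs
        self._snapshots: Dict[str, List[Dict[str, Any]]] = {}

    def open_pit(self) -> str:
        pit_id = f"pit-{len(self._snapshots)}"
        # Freeze a view sorted by order_date desc, with id as a tiebreaker.
        self._snapshots[pit_id] = sorted(
            self._docs, key=lambda d: (d["order_date"], d["id"]), reverse=True
        )
        return pit_id

    def search(
        self, pit_id: str, size: int, search_after: Optional[List[Any]] = None
    ) -> List[Dict[str, Any]]:
        view = self._snapshots[pit_id]
        if search_after is not None:
            cursor = tuple(search_after)
            # Descending sort: the next page holds keys strictly below the cursor.
            view = [d for d in view if (d["order_date"], d["id"]) < cursor]
        return [{"doc": d, "sort": [d["order_date"], d["id"]]} for d in view[:size]]

    def close_pit(self, pit_id: str) -> None:
        self._snapshots.pop(pit_id, None)


def export_rows(client: FakeClient, page_size: int = 500) -> Iterator[Dict[str, Any]]:
    """Yield every document exactly once by paging over one point in time."""
    pit_id = client.open_pit()
    try:
        search_after = None
        while True:
            hits = client.search(pit_id, page_size, search_after)
            if not hits:
                break
            for hit in hits:
                yield hit["doc"]
            # The last hit's sort values are the cursor for the next page.
            search_after = hits[-1]["sort"]
    finally:
        client.close_pit(pit_id)


# 4675 documents with many duplicate order_date values, as in a real export.
docs = [{"id": i, "order_date": i % 100} for i in range(4675)]
rows = list(export_rows(FakeClient(docs)))
print(len(rows))
```

The tiebreaker in the sort key matters: without a unique sort key, documents sharing an `order_date` at a page boundary could be skipped or repeated.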

The docs recommend against using scroll to page through more than 10,000 docs, but in this case we were spanning fewer than half that. We should analyze how far back this was introduced, as it is likely the result of a change in ES (still investigating).

How to test locally

(This can be automated by running the functional test "generates a report from a new search with data: default".)

  1. Set up ES with the data archive from x-pack/test/functional/es_archives/reporting/ecommerce
  2. Start Kibana
  3. Issue the following request:
curl 'http://localhost:5620/api/reporting/generate/csv_searchsource' \
  -H 'Connection: keep-alive' \
  -H 'sec-ch-ua: "Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"' \
  -H 'Content-Type: application/json' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36' \
  -H 'kbn-version: 7.16.0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'Accept: */*' \
  -H 'Origin: http://localhost:5620' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Referer: http://localhost:5620/app/discover?_t=1631716621016' \
  -H 'Accept-Language: en-GB,en-US;q=0.9,en;q=0.8' \
  -u elastic:changeme \
  --data-raw $'{"jobParams":"(browserTimezone:UTC,columns:\u0021(),objectType:search,searchSource:(fields:\u0021((field:\'*\',include_unmapped:true)),filter:\u0021((meta:(field:order_date,index:\'5193f870-d861-11e9-a311-0fa548c5f953\',params:()),range:(order_date:(format:strict_date_optional_time,gte:\'2019-04-04T23:56:51.374Z\',lte:\'2019-08-29T16:18:51.821Z\')))),index:\'5193f870-d861-11e9-a311-0fa548c5f953\',parent:(filter:\u0021(),index:\'5193f870-d861-11e9-a311-0fa548c5f953\',query:(language:kuery,query:\'\')),sort:\u0021((order_date:desc)),trackTotalHits:\u0021t),title:\'Discover search [2021-09-15T14:48:11.140+00:00]\',version:\'7.16.0\')"}' \
  --compressed
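
Once the report has been generated and downloaded, its data rows can be counted and compared with the expected 4675. The helper below is a minimal sketch under assumptions not stated in this PR: that the CSV has exactly one header row and that its contents have already been read into a string. `count_data_rows` is a hypothetical name. It uses a CSV parser rather than counting lines, because quoted field values may contain newlines.

```python
import csv
import io


def count_data_rows(csv_text: str) -> int:
    """Count CSV records after the header, treating quoted newlines as data."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    # Skip empty records, such as a trailing blank line.
    records = sum(1 for row in reader if row)
    return max(records - 1, 0)


# Tiny sample with a quoted field that spans two lines.
sample = 'order_id,customer\n1,"Smith\nJr."\n2,Doe\n'
print(count_data_rows(sample))
```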

Release note

Does this need a public release note?

Checklist

@jloleysens jloleysens added release_note:fix (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead v8.0.0 Team:AppServices v7.16.0 v7.15.1 labels Sep 16, 2021
@jloleysens
Contributor Author

@elasticmachine merge upstream

@jloleysens jloleysens added release_note:skip Skip the PR/issue when compiling release notes and removed release_note:fix labels Sep 16, 2021
@tsullivan
Member

> After switching to using point in time per the recommendation in the docs, we are getting consistent CSV row counts again:

I looked back at the PR that added the current CSV export implementation: #88303. In the first commit (the original implementation), the plan was to use PIT instead of the ES _scroll API. Unfortunately, that plan was scrapped when we had test failures when exporting non-time-based data and unsorted data. I think we should come up with a plan to identify when PIT should be used to export the CSV. My guess is that 99% of the time, using PIT is the "right way" to do it.

@jloleysens
Contributor Author

@elasticmachine merge upstream

@jloleysens
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@jloleysens jloleysens closed this Sep 23, 2021
Labels
(Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead release_note:skip Skip the PR/issue when compiling release notes v7.15.1 v7.16.0 v8.0.0
3 participants