Elasticsearch schema Raw PSC data nested type #225

tiredpixel · 2023-11-23T15:52:47Z

Within Elasticsearch, the raw data for PSC uses the ukpsc_company_records and ukpsc_roe_company_records2 indexes. In each of these, documents have a top-level data mapping with the nested field type.
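To make the shape concrete, here is a minimal sketch of the two mapping variants under discussion, written as plain Python dicts. The inner field company_number is a hypothetical placeholder (the real PSC field names aren't listed here), and the mappings in the actual indexes will contain far more than this.

```python
import json

# Hypothetical inner field, for illustration only; not taken from the real index.
INNER_PROPERTIES = {"company_number": {"type": "keyword"}}


def build_mapping(data_type):
    """Return an index mapping whose top-level `data` field has the given type."""
    if data_type not in ("nested", "object"):
        raise ValueError(f"unsupported type: {data_type}")
    return {
        "mappings": {
            "properties": {
                "data": {"type": data_type, "properties": INNER_PROPERTIES}
            }
        }
    }


# Roughly the current shape: `data` is mapped as nested.
current_mapping = build_mapping("nested")
# The alternative: `data` is a plain object, so inner fields are flattened.
alternative_mapping = build_mapping("object")

if __name__ == "__main__":
    print(json.dumps(current_mapping, indent=2))
```

The only difference between the two is the type of the data field; everything below it can keep its own types either way.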

I'm not convinced that this is an optimal schema for this data: it prevents inner object flattening, and it makes nested queries necessary for searching the fields contained within the inner object. Whilst there are valid use cases for nested field types, a brief look through the data suggests that data doesn't hold an array of objects or similar, only a single object which itself contains other objects (which isn't directly relevant here, since those can have their own types).
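As a rough illustration of the query overhead, the sketch below builds the same term search twice: once as it could be written if data were a plain object, and once wrapped in the nested query that the current mapping requires. The field data.company_number and the value are hypothetical; only the query DSL shape is the point.

```python
import json


def flat_term_query(field, value):
    """Term query that works when the field lives in a plain (flattened) object."""
    return {"query": {"term": {field: value}}}


def nested_term_query(path, field, value):
    """The same term query wrapped in a nested query, as a nested mapping requires."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"term": {field: value}},
            }
        }
    }


# Hypothetical field and value, for illustration only.
flat = flat_term_query("data.company_number", "00000001")
nested = nested_term_query("data", "data.company_number", "00000001")

if __name__ == "__main__":
    print(json.dumps(flat, indent=2))
    print(json.dumps(nested, indent=2))
```

Every search on a field under data needs the extra nested wrapper with the current mapping, even though there is only ever one inner object to match against.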

Note that this is different from the raw data for DK, which uses the dk_deltagerperson_records index without a top-level data key or top-level nested field type. It is also different from the raw data for SK, which uses the sk_records index without a top-level data key or top-level nested field type (but with numerous nested field types below, which are similarly likely to be non-optimal, in contrast to the DK schema).

It's not immediately clear to me why this schema was chosen, or whether there are any advantages to it or issues that it's working around. But it should likely be investigated, since it prevents certain types of queries and more efficient exploration of the data, and it might well be simply an oversight.
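One way such an investigation might start is by checking whether data ever actually holds an array of objects, which is the main case where a nested type earns its keep. The sketch below classifies document sources that have already been fetched (for example from a scroll or an export); the sample documents and the helper names are made up for illustration.

```python
from collections import Counter


def classify_data_field(source):
    """Classify the top-level `data` value of one document source."""
    if "data" not in source:
        return "missing"
    value = source["data"]
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "other"


def summarise(sources):
    """Count how many document sources fall into each `data` shape."""
    return Counter(classify_data_field(source) for source in sources)


# Made-up sample sources; real ones would come from the PSC indexes.
SAMPLE_SOURCES = [
    {"data": {"company_number": "00000001"}},
    {"data": {"company_number": "00000002", "address": {"country": "GB"}}},
    {"data": [{"company_number": "00000003"}, {"company_number": "00000004"}]},
    {"etag": "abc"},
]

summary = summarise(SAMPLE_SOURCES)

if __name__ == "__main__":
    for shape, count in sorted(summary.items()):
        print(f"{shape}: {count}")
```

If a run over a representative sample reports no array entries, that would support the impression that a plain object mapping is sufficient for data.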

References #173, during which investigation this was found.

tiredpixel mentioned this issue Dec 6, 2023

Elasticsearch schema BODS data nested type #230
