Array representations within FHIRPath queries #1759

johngrimes · 2023-10-03T00:52:37Z

This issue represents the work to refactor the FHIRPath query engine to use arrays and array operations to represent FHIRPath expressions. This will have a number of benefits, including the elimination of joins and the simplification of query plans.

This work was initiated within #1438, but we would like to integrate this work back into main without the SQL on FHIR views implementation itself. This will allow us to evaluate this change separately, and control the magnitude of the change.

Outstanding tasks:

Create a new branch (off fhir-views) and a PR for this change
Do a partial implementation on aggregate to determine overall feasibility
Complete implementation of FHIRPath functions, operators and other syntax elements
Reactivate aggregate and extract executors and refactor to work with new implementation
Reactivate test suite and refactor as necessary
Update benchmarking and evaluate relative to main

The text was updated successfully, but these errors were encountered:

johngrimes · 2024-01-11T01:03:52Z

I have found an issue with the implementation on issue/1759, given the following query:

observations = data.view(
    "Observation",
    select=[
        {
            "column": [
                {
                    "description": "Observation ID",
                    "path": "getResourceKey()",
                    "name": "id",
                },
                {
                    "description": "Patient ID",
                    "path": "subject.getReferenceKey()",
                    "name": "patient_id",
                },
                {
                    "description": "Observation date",
                    "path": "effectiveDateTime",
                    "name": "date",
                },
            ],
            "select": [
                {
                    "forEach": "code.coding",
                    "column": [
                        {
                            "description": "Observation code",
                            "path": "code",
                            "name": "code",
                        },
                    ],
                },
                {
                    "forEach": "value.ofType(Quantity)",
                    "column": [
                        {
                            "description": "Total cholesterol unit",
                            "path": "unit",
                            "name": "unit",
                        },
                        {
                            "description": "Total cholesterol value",
                            "path": "value",
                            "name": "value",
                        },
                    ],
                },
            ],
        }
    ],
    where=[
        {
            "description": "Total cholesterol > 240 mg/dL",
            "path": "where(code.coding.where(system = 'http://loinc.org').exists(code = '2093-3')).value.ofType(Quantity) > 240 'mg/dL'",
        }
    ],
)

observations.explain()

I get the following plan:

{'resource': 'Observation', 'select': [{'column': [{'description': 'Observation ID', 'path': 'getResourceKey()', 'name': 'id'}, {'description': 'Patient ID', 'path': 'subject.getReferenceKey()', 'name': 'patient_id'}, {'description': 'Observation date', 'path': 'effectiveDateTime', 'name': 'date'}], 'select': [{'forEach': 'code.coding', 'column': [{'description': 'Observation code', 'path': 'code', 'name': 'code'}]}, {'forEach': 'value.ofType(Quantity)', 'column': [{'description': 'Total cholesterol unit', 'path': 'unit', 'name': 'unit'}, {'description': 'Total cholesterol value', 'path': 'value', 'name': 'value'}]}]}], 'where': [{'description': 'Total cholesterol > 240 mg/dL', 'path': "where(code.coding.where(system = 'http://loinc.org').exists(code = '2093-3')).value.ofType(Quantity) > 240 'mg/dL'"}]}
== Physical Plan ==
Project [Observation#974.id_versioned AS id#832, Observation#974.subject.reference AS patient_id#834, Observation#974.effectiveDateTime AS date#836, @8ehd09#982.code AS code#838, Observation#974.valueQuantity.unit AS unit#840, Observation#974.valueQuantity.value AS value#842]
+- Filter isnotnull(@8ehd09#982)
   +- Generate explode(Observation#974.code.coding), [Observation#974], false, [@8ehd09#982]
      +- Project [struct(id, id#883, id_versioned, id_versioned#884, meta, meta#885, implicitRules, implicitRules#886, language, language#887, text, text#888, identifier, identifier#889, basedOn, basedOn#890, partOf, partOf#891, status, status#892, category, category#893, code, code#894, ... 62 more fields) AS Observation#974]
         +- Filter (isnotnull(valueQuantity#909) AND CASE WHEN CASE WHEN NOT CASE WHEN isnotnull(filter(filter(code#894.coding, lambdafunction((lambda x_0#1015.system = http://loinc.org), lambda x_0#1015, false)), lambdafunction((lambda x_2#1023.code = 2093-3), lambda x_2#1023, false))) THEN (size(filter(filter(code#894.coding, lambdafunction((lambda x_0#1015.system = http://loinc.org), lambda x_0#1015, false)), lambdafunction((lambda x_2#1023.code = 2093-3), lambda x_2#1023, false)), true) = 0) ELSE true END THEN (valueQuantity#909._code_canonicalized = g.m-3) ELSE false END THEN UDF(CASE WHEN NOT CASE WHEN isnotnull(filter(filter(code#894.coding, lambdafunction((lambda x_1#1016.system = http://loinc.org), lambda x_1#1016, false)), lambdafunction((lambda x_3#1024.code = 2093-3), lambda x_3#1024, false))) THEN (size(filter(filter(code#894.coding, lambdafunction((lambda x_1#1016.system = http://loinc.org), lambda x_1#1016, false)), lambdafunction((lambda x_3#1024.code = 2093-3), lambda x_3#1024, false)), true) = 0) ELSE true END THEN valueQuantity#909._value_canonicalized END, [2400,0]) ELSE false END)
            +- FileScan parquet [id#883,id_versioned#884,meta#885,implicitRules#886,language#887,text#888,identifier#889,basedOn#890,partOf#891,status#892,category#893,code#894,subject#895,focus#896,encounter#897,effectiveDateTime#898,effectiveInstant#899,effectivePeriod#900,effectiveTiming#901,issued#902,performer#903,valueBoolean#904,valueCodeableConcept#905,valueDateTime#906,... 19 more fields] Batched: false, DataFilters: [isnotnull(valueQuantity#909), CASE WHEN CASE WHEN NOT CASE WHEN isnotnull(filter(filter(code#894..., Format: Parquet, Location: PreparedDeltaFileIndex(1 paths)[file:/Users/gri306/Library/CloudStorage/OneDrive-CSIRO/Data/synth..., PartitionFilters: [], PushedFilters: [IsNotNull(valueQuantity)], ReadSchema: struct<id:string,id_versioned:string,meta:struct<id:string,versionId:string,versionId_versioned:s...

It would be better if we did not have to retrieve all of the columns for this query, only the ones that are required.

johngrimes · 2024-01-11T01:05:03Z

I have merged main into issue/1759, and updated the version to 6.5.0-SNAPSHOT.

johngrimes · 2024-01-11T01:06:04Z

@piotrszul Would you mind creating a pull request for this change, so that it is easier to follow activity on the branch?

johngrimes added the refactoring Improving the design of existing code label Oct 3, 2023

johngrimes assigned piotrszul Oct 3, 2023

johngrimes added this to Pathling Oct 3, 2023

github-project-automation bot moved this to Backlog in Pathling Oct 3, 2023

johngrimes moved this from Backlog to In progress in Pathling Oct 3, 2023

piotrszul mentioned this issue Jan 16, 2024

SQL on FHIR view runner #1775

Open

johngrimes assigned johngrimes and unassigned piotrszul Jan 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Array representations within FHIRPath queries #1759

Array representations within FHIRPath queries #1759

johngrimes commented Oct 3, 2023 •

edited

Loading

johngrimes commented Jan 11, 2024

johngrimes commented Jan 11, 2024

johngrimes commented Jan 11, 2024

Array representations within FHIRPath queries #1759

Array representations within FHIRPath queries #1759

Comments

johngrimes commented Oct 3, 2023 • edited Loading

johngrimes commented Jan 11, 2024

johngrimes commented Jan 11, 2024

johngrimes commented Jan 11, 2024

johngrimes commented Oct 3, 2023 •

edited

Loading