
Cannot get expression from valueQuantity #1770

Open
lakime opened this issue Oct 20, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@lakime

lakime commented Oct 20, 2023

Describe the bug
While trying to fetch valueQuantity from Observation resources generated with Synthea, using Databricks, I receive the error shown below.
Libraries:
au.csiro.pathling:library-api:6.3.1
latest pathling Python package installed from PyPI

IllegalArgumentException: requirement failed: All input types must be the same except nullable, containsNull, valueContainsNull flags. The expression is: if ((NOT instanceof(assertnotnull(input[0, org.hl7.fhir.r4.model.Observation, true]).getValue, class org.hl7.fhir.r4.model.Quantity) OR isnull(objectcast(assertnotnull(input[0, org.hl7.fhir.r4.model.Observation, true]).getValue, ObjectType(class org.hl7.fhir.r4.model.Quantity))))) null else named_struct(value, staticinvoke(class org.apache.spark.sql.types.Decimal, DecimalType(32,6), apply, if (instanceof(assertnotnull(input[0, org.hl7.fhir.r4.model.Observation, true]).getValue, class org.hl7.fhir.r4.model.Quantity)) objectcast(assertnotnull(input[0, org.hl7.fhir.r4.model.Observation, true]).getValue, ObjectType(class org.hl7.fhir.r4.model.Quantity)) else null.getValueElement.getValue, true, true, true)). The input types found are
StructType(StructField(id,StringType,true),StructField(value,DecimalType(32,6),true),StructField(value_scale,IntegerType,true),StructField(comparator,StringType,true),StructField(unit,StringType,true),StructField(system,StringType,true),StructField(code,StringType,true),StructField(_value_canonicalized,StructType(StructField(value,DecimalType(38,0),true),StructField(scale,IntegerType,true)),true),StructField(_code_canonicalized,StringType,true))
StructType(StructField(value,DecimalType(32,6),true)).

If I remove the "valueQuantity" expression, it works as expected.
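The error lists the two input types Spark found. The first is the full valueQuantity struct with nine fields. The second is a struct holding only value, which looks like what would remain after nested schema pruning trims the struct to the single field this query requests.

The pure-Python toy model below makes the mismatch concrete. It is an illustration only, not Spark's code, and `prune` and `all_same_type` are hypothetical names. It copies the field lists from the error text and shows why a check that ignores only nullability flags rejects the pair:

```python
# Toy model (not Spark's implementation): a struct type is a tuple of
# (field_name, type_name) pairs. Nullability is omitted because the check in
# the error message ignores nullable/containsNull/valueContainsNull flags.
FULL_QUANTITY = (
    ("id", "string"),
    ("value", "decimal(32,6)"),
    ("value_scale", "int"),
    ("comparator", "string"),
    ("unit", "string"),
    ("system", "string"),
    ("code", "string"),
    ("_value_canonicalized", "struct<value:decimal(38,0),scale:int>"),
    ("_code_canonicalized", "string"),
)


def prune(struct_type, wanted):
    """Keep only the requested top-level fields (a stand-in for schema pruning)."""
    return tuple(field for field in struct_type if field[0] in wanted)


def all_same_type(*struct_types):
    """Mimic the requirement 'All input types must be the same'."""
    return all(t == struct_types[0] for t in struct_types[1:])


# Only valueQuantity.value is requested, so the pruned struct keeps one field.
PRUNED_QUANTITY = prune(FULL_QUANTITY, {"value"})

print(PRUNED_QUANTITY)                                # (('value', 'decimal(32,6)'),)
print(all_same_type(FULL_QUANTITY, PRUNED_QUANTITY))  # False
```

This model is also consistent with the report: with valueQuantity removed from the query, no struct is produced and the comparison never happens.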

To Reproduce

# Observation: quantities to be checked
# (json_resources, exp, col, lit and today are defined in the setup code posted in a later comment;
# display() is the Databricks notebook helper)
observationfhir = json_resources.extract(
    "Observation",
    columns=[
        exp("id", "Identifier"),
        exp("status", "status"),
        exp("category.first().coding.first().code", "category"),
        exp("code.coding.code", "Observation_Code"),
        exp("code.coding.display", "Observation_Name"),
        exp("code.text", "Observation_Text"),
        exp("subject.reference", "Subject_Reference"),
        exp("encounter.reference", "Encounter_Reference"),
        exp("valueQuantity.value", "Value_Quantity"),
    ],
)

observationfhir = (
    observationfhir
    .withColumn("source", lit("payorq"))
    .withColumn("sourceFile", lit(today))
    .withColumn("Value_Quantity", col("Value_Quantity").cast("string"))
)
display(observationfhir)

Expected behavior
The values of valueQuantity.value from the FHIR files are returned.

github-project-automation bot moved this to Backlog in Pathling Oct 23, 2023
johngrimes moved this from Backlog to In progress in Pathling Oct 23, 2023
@johngrimes
Member

Thanks @lakime, we have reproduced the issue and are working on a fix.

@lakime
Author

lakime commented Oct 23, 2023

Perhaps my construct is wrong. Here is my setup:

from datetime import date

from pathling import PathlingContext, Expression as exp
# pyspark.sql.functions has no "cast" function (importing it fails);
# casting is done with Column.cast(), as in the code above.
from pyspark.sql.functions import split, explode, col, lit, expr

today = date.today()
pc = PathlingContext.create()
ndjson_dir = 'dbfs:/mnt/hda/raw/payorq/landing/'
json_resources = pc.read.ndjson(ndjson_dir)

I am able to fetch the data using an SQL query:

%sql SELECT valueQuantity.value AS Value_Quantity_Value, valueQuantity.unit AS Value_Quantity_Unit FROM bronzeraw.observation;

@johngrimes
Member

johngrimes commented Oct 27, 2023

Hi @lakime,

We have done a bit of work to figure out what is happening here.

This is essentially caused by a bug in Spark, or an inability of Spark to deal with the expressions that we generate in certain scenarios. We're working on creating a bug report for this.

This behaviour is specific to reading data directly from a raw FHIR source, such as NDJSON or Bundles.

There are two workarounds. The first is to cache the datasets involved in the query before running extract:

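from pathling import PathlingContext

pc = PathlingContext.create()
ndjson_dir = 'dbfs:/mnt/hda/raw/payorq/landing/'
json_resources = pc.read.ndjson(ndjson_dir)

# Cache the Observation dataset before running extract (workaround for the Spark issue)
json_resources.read('Observation').cache()

observationfhir = json_resources.extract(  #...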
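The workaround above caches the datasets involved in the query. When a query involves more than one resource type, a small helper can cache all of them in one place.

This is a hypothetical sketch: `cache_resources` is not part of Pathling. It relies only on the read(...) and .cache() calls used in the workaround above.

```python
def cache_resources(data_source, resource_types):
    """Mark each resource type used by a query for caching before calling extract.

    Assumes `data_source` behaves like the Pathling data source returned by
    pc.read.ndjson(...), i.e. data_source.read(resource_type) returns a Spark
    DataFrame. DataFrame.cache() is lazy: the data is materialised by the
    first action that reads it.

    Returns a dict of the cached DataFrames so they can be unpersisted later.
    """
    cached = {}
    for resource_type in resource_types:
        cached[resource_type] = data_source.read(resource_type).cache()
    return cached
```

For the query in this issue, the call would be `cached = cache_resources(json_resources, ["Observation"])`, placed before `json_resources.extract(...)`. A query that also touches other resource types (for example through resolve()) would list those as well. Calling `unpersist()` on each returned DataFrame releases the cache afterwards.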

The other workaround is to set the configuration parameter spark.sql.optimizer.nestedSchemaPruning.enabled to false:

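from pathling import PathlingContext
from pyspark.sql import SparkSession

# Disable nested schema pruning (workaround for the Spark issue)
spark = (
    SparkSession.builder
    .config("spark.sql.optimizer.nestedSchemaPruning.enabled", "false")
    .getOrCreate()
)

pc = PathlingContext.create(spark)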
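As an alternative to the builder, the same setting can be changed on an existing session at runtime through spark.conf. A Databricks notebook already provides one as `spark`. This makes it possible to scope the workaround to a single query.

The sketch below is a hypothetical helper; the name `nested_schema_pruning_disabled` is not part of Spark or Pathling. It assumes PySpark's `spark.conf.get(key, None)` returns None when the key was never set explicitly.

```python
from contextlib import contextmanager

# Spark SQL setting named in the workaround above.
PRUNING_KEY = "spark.sql.optimizer.nestedSchemaPruning.enabled"


@contextmanager
def nested_schema_pruning_disabled(spark):
    """Temporarily disable nested schema pruning on an existing SparkSession.

    On exit, restores the previous explicit value, or unsets the key if it was
    not explicitly set, so the rest of the session keeps Spark's default.
    """
    previous = spark.conf.get(PRUNING_KEY, None)
    spark.conf.set(PRUNING_KEY, "false")
    try:
        yield spark
    finally:
        if previous is None:
            spark.conf.unset(PRUNING_KEY)
        else:
            spark.conf.set(PRUNING_KEY, previous)
```

Spark plans queries lazily, so the action that triggers optimisation must run inside the with block, for example `with nested_schema_pruning_disabled(spark): display(json_resources.extract(...))`. If only the extract call sits inside the block and display or a write comes after it, the setting would already be restored by the time the query is optimised.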

Perhaps you could try this and let us know if this solves your problem?

@piotrszul
Collaborator

Reported to Spark as a bug: SPARK-45766.

johngrimes moved this from In progress to Backlog in Pathling Nov 7, 2023
johngrimes added the bug Something isn't working label Dec 7, 2023
Projects
Status: Backlog
Development

No branches or pull requests

3 participants