Avro field names not getting parsed, just values #311
-
Hi, and thank you for sharing this code! So far it looks very promising and should remove a lot of complexity from our pipelines. I have managed to work out how to use it from PySpark (pyspark => 3.3.0, scala => 2.12), and this is my submit command:
This is my conversion code (I put it in a separate file, so I pass in the SparkContext):

My issue is that it doesn't seem to parse out the field names, just the values, with the exception of a single field name (Entry_ID):
Example of the matching schema entries:
Am I missing something simple, or is this some kind of compatibility issue? Thanks for any help/guidance you can offer!
-
Hi @cjlyons81
Thanks for your interest in our library. I am not sure I understand your issue correctly. What command do you use to arrive at the output?
The field names are converted to DataFrame column names. You can see them by inspecting the schema, e.g. `df2.printSchema`. Also, quite often it is convenient to select the fields directly instead of keeping the wrapper struct (`parsed` in your case). E.g. you can write `val df2 = df.select(from_avro(col("value"), from_avro_abris_settings, sc) as 'data).select("data.*")`