Avro backward compatibility #1053

Open
unit7-0 opened this issue May 19, 2021 · 0 comments
Contributor

unit7-0 commented May 19, 2021

Is your feature request related to a problem? Please describe.
This question concerns backward compatibility support when changing Avro schemas.

Is your feature request related to a specific runtime of cloudflow or applicable for all runtimes?
All runtimes.

Describe the solution you'd like
I found that the Cloudflow serialization/deserialization engine does not support backward compatibility for Avro messages. I looked through the code and found that the codec is created with only one schema, the one supplied with the class generated at compile time (https://github.com/lightbend/cloudflow/blob/master/core/cloudflow-streamlets/src/main/scala/cloudflow/streamlets/avro/AvroCodec.scala#L30). However, to maintain backward compatibility, the Java Avro deserialization mechanism needs two schemas passed to the SpecificDatumReader: the writer's schema and the reader's schema (https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificDatumReader.java#L49).
In other words, to change a schema today, the consumer streamlet must first read all the messages already in the topic, and only then can we update the deployed pipeline version. Are there any plans to support backward compatibility for Avro messages in future releases? Perhaps this overlaps with the plans to introduce a schema registry and will be implemented as part of that feature? (#1010, #996)
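
To illustrate the two-schema point, here is a minimal sketch of Avro schema resolution using the generic API (`GenericDatumReader` instead of `SpecificDatumReader`, so no generated classes are needed). The `Event` record, its `source` field, and the object and method names are made up for illustration; this is not Cloudflow code and not a proposed patch.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// Hypothetical example: names and schemas are assumptions, not taken from Cloudflow.
object SchemaResolutionSketch {
  // Old schema, used by the producer that wrote the message.
  val writerSchema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"Event","fields":[
      |  {"name":"id","type":"string"}
      |]}""".stripMargin)

  // New schema, backward compatible because the added field has a default.
  val readerSchema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"Event","fields":[
      |  {"name":"id","type":"string"},
      |  {"name":"source","type":"string","default":"unknown"}
      |]}""".stripMargin)

  // Binary-encode a record with the given schema (no schema is embedded in the bytes).
  def encode(record: GenericRecord, schema: Schema): Array[Byte] = {
    val out     = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
    encoder.flush()
    out.toByteArray
  }

  // Decode with both schemas so Avro can resolve the differences between them.
  def decode(bytes: Array[Byte], writer: Schema, reader: Schema): GenericRecord = {
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    new GenericDatumReader[GenericRecord](writer, reader).read(null, decoder)
  }

  def main(args: Array[String]): Unit = {
    val old = new GenericData.Record(writerSchema)
    old.put("id", "abc")
    val bytes   = encode(old, writerSchema)
    val decoded = decode(bytes, writerSchema, readerSchema)
    // The field missing from the old message is filled from the reader schema's default.
    println(decoded)
  }
}
```

Since the binary encoding carries no schema, the reader has to learn the writer's schema from somewhere else, which is presumably where a schema registry would come in.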

Currently, deserializing a message written with the old schema using the new (compatible) schema fails with this error:

Caused by: com.twitter.bijection.InversionFailure: Failed to invert: [B@46271dd6
	at com.twitter.bijection.InversionFailure$$anonfun$partialFailure$1.applyOrElse(InversionFailure.scala:43)
	at com.twitter.bijection.InversionFailure$$anonfun$partialFailure$1.applyOrElse(InversionFailure.scala:42)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
	at scala.util.Failure.recoverWith(Try.scala:236)
	at com.twitter.bijection.Inversion$.attempt(Inversion.scala:32)
	at com.twitter.bijection.avro.BinaryAvroCodec.invert(AvroCodecs.scala:330)
	at com.twitter.bijection.avro.BinaryAvroCodec.invert(AvroCodecs.scala:320)
	at cloudflow.streamlets.avro.AvroSerde.$anonfun$inverted$1(AvroCodec.scala:39)
	at cloudflow.streamlets.avro.AvroSerde.$anonfun$decode$1(AvroCodec.scala:45)
	at scala.util.Try$.apply(Try.scala:213)
	at cloudflow.streamlets.avro.AvroSerde.decode(AvroCodec.scala:45)
	at cloudflow.streamlets.avro.AvroCodec.decode(AvroCodec.scala:34)