[NU-1823] Kafka source without topic schema #7066
Conversation
17b9df4 to 6ec935e (compare)
I left a few comments.
Let's start with a test using the scenario runner that shows the source works in the desired configuration.
After that we can figure out whether we can render a "content type" field instead of the "version" field for non-schemed topics.
cd3c797 to 4ef3b59 (compare)
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (14)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalSchemaPayloadDeserializer.scala (1)
85-89: Document the plain content type handling behavior
The special handling for plain content type should be documented to explain when raw bytes are returned versus when deserialization occurs. This aligns with the PR objective of supporting topics without schemas.
```diff
+ /**
+  * Deserializes the payload based on the schema type:
+  * - For plain content type (no schema): returns raw bytes without transformation
+  * - For other JSON schemas: uses the schema's deserializer
+  */
  override def deserialize(
```
engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala (3)
73-74: Remove unused variable
The `longJsonInHex` variable is defined but never used in the test.
```diff
- val longJsonInHex =
-   "227b226669727374223a2022546f6d656b222c20226d6964646c65223a20224a616e222c20226c617374223a20224b6f77616c736b69227d22"
```
76-82: Remove debug code
Remove the debug-related code including the unused `BigInteger` conversion and print statements.
```diff
- val big = new BigInteger(shortJsonInHex, 16).toByteArray
-
- val str = new String(byteString)
- println(str)
- println(byteString.mkString("Array(", ", ", ")"))
- println(big.mkString("Array(", ", ", ")"))
```
66-114: Consider adding validation for message content
While the test verifies the byte-level equality, it would be valuable to also validate that the message content is a valid JSON string as expected.
```diff
  run(process) {
    val outputRecord = kafkaClient
      .createConsumer()
      .consumeWithConsumerRecord(outputTopic)
      .take(1)
      .head
    outputRecord.value() shouldBe byteString
+   // Verify the content is valid JSON
+   val outputJson = parser.parse(new String(outputRecord.value()))
+   outputJson.isRight shouldBe true
  }
```
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala (3)
75-101: Reduce code duplication in schema handling
The JSON and plain cases have similar structure. Consider extracting the common logic into a helper method:
```scala
private def createRuntimeSchemaData(
    contentType: ContentTypes,
    typingResult: TypingResult
): (Option[RuntimeSchemaData[ParsedSchema]], TypingResult) = {
  val schema = contentType match {
    case ContentTypes.JSON  => ContentTypesSchemas.schemaForJson
    case ContentTypes.PLAIN => ContentTypesSchemas.schemaForPlain
  }
  (
    Some(
      RuntimeSchemaData[ParsedSchema](
        new NkSerializableParsedSchema[ParsedSchema](schema),
        Some(SchemaId.fromString(contentType.toString))
      )
    ),
    typingResult
  )
}
```
97-98: Address TODO comment about Array[Byte] handling
The comment indicates incomplete implementation for plain content type handling. Please clarify the implementation timeline or requirements.
Would you like me to help implement the Array[Byte] handling or create a GitHub issue to track this task?
80-80: Avoid using content type strings as schema IDs
Using content type strings as schema IDs could lead to conflicts if the schema registry is used in the future. Consider using dedicated schema identifiers; a short sketch of the prefix approach follows the list below.
Consider:
- Using a dedicated schema ID generation mechanism
- Adding a prefix to distinguish these special cases (e.g., "internal-json-schema")
- Documenting the schema ID allocation strategy
Also applies to: 93-93
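As a rough illustration of the prefix suggestion above (a hedged sketch with hypothetical names, assuming the existing SchemaId type is in scope; this is not code from the PR):
```scala
// Hypothetical sketch: derive clearly-marked internal schema IDs for schemaless topics
// so they cannot collide with IDs assigned by a real schema registry.
object InternalSchemaIds {
  private val Prefix = "internal-content-type-"

  // contentTypeName would be e.g. "JSON" or "PLAIN" (ContentTypes.JSON.toString)
  def forContentType(contentTypeName: String): SchemaId =
    SchemaId.fromString(Prefix + contentTypeName.toLowerCase)
}
```
A single helper like this would also give one obvious place to document the allocation strategy.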
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala (2)
56-56: Improve Error Message with Actual Content Type
When throwing the `IllegalStateException`, it's helpful to include the actual content type that was received. This provides clearer context for debugging. Apply this diff to enhance the error message:
```diff
- throw new IllegalStateException("Topic without schema should have ContentType Json or Plain, was neither")
+ throw new IllegalStateException(s"Topic without schema should have ContentType JSON or PLAIN, but was [$writerSchemaId]")
```
Line range hint 61-68: Add Tests for Mismatched Schema Types
The current code throws a `MismatchReaderWriterSchemaException` when the reader and writer schema types do not match. However, there is a TODO comment indicating that this case needs testing, especially when supporting JSON schema.
Consider adding unit tests to cover scenarios where the reader and writer schema types differ to ensure the deserializer handles these cases correctly. Would you like assistance in creating these test cases?
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala (1)
120-139: Consider adding unit tests for schemaless topic handling
It's important to ensure that the new logic for handling schemaless topics is thoroughly tested. Adding unit tests covering scenarios for JSON and plain content types can help prevent future regressions.
Would you like assistance in creating unit tests for these cases?
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/sink/UniversalKafkaSinkFactory.scala (4)
171-175: Use named parameters in `copy` method to enhance clarity
In line 174, when calling the `copy` method on `jsonSchema`, it's advisable to use named parameters to explicitly indicate which field is being modified. This improves code readability and reduces potential errors if the class structure changes in the future. Apply this diff to use named parameters:
```diff
- jsonSchema.copy(new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForPlain))
+ jsonSchema.copy(
+   schema = new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForPlain)
+ )
```
174-174: Consider defining a separate `plainSchema` variable for better clarity
Reusing `jsonSchema.copy` to represent a plain schema can be confusing. Defining a separate variable `plainSchema` enhances code clarity and maintainability.
Define `plainSchema`:
```scala
private val plainSchema = RuntimeSchemaData[ParsedSchema](
  new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForPlain),
  None
)
```
Update the code to use `plainSchema`:
```diff
  val runtimeSchemaData = if (contentType.equals(ContentTypes.JSON.toString)) {
    jsonSchema
  } else {
-   jsonSchema.copy(
-     schema = new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForPlain)
-   )
+   plainSchema
  }
```
171-172: Use case-insensitive comparison for content type
When comparing strings like `contentType`, it's safer to perform a case-insensitive comparison to handle input variations gracefully. Apply this diff:
```diff
- val runtimeSchemaData = if (contentType.equals(ContentTypes.JSON.toString)) {
+ val runtimeSchemaData = if (contentType.equalsIgnoreCase(ContentTypes.JSON.toString)) {
```
254-255: Use case-insensitive comparison for `contentType` in `valueEditorParamStep`
Ensure that the comparison for `contentType` is case-insensitive to handle different input cases. Apply this diff:
```diff
- (`contentTypeParamName`, DefinedEagerParameter(contentType: String, _)) ::
+ (`contentTypeParamName`, DefinedEagerParameter(contentTypeRaw: String, _)) ::
  ...
- if (contentType.equals(ContentTypes.JSON.toString)) {
+ val contentType = contentTypeRaw.toUpperCase
+ if (contentType == ContentTypes.JSON.toString) {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
- engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala (1 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala (6 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/ContentTypes.scala (1 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/SchemaRegistryClient.scala (2 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala (2 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala (2 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/RecordFormatterSupport.scala (2 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala (2 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalSchemaPayloadDeserializer.scala (2 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/sink/UniversalKafkaSinkFactory.scala (6 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala (2 hunks)
🔇 Additional comments (17)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/ContentTypes.scala (3)
1-4
: LGTM: Package structure and imports are well-organized
The package hierarchy and import statement are appropriate for schema registry functionality.
11-14
: Add documentation and consider schema validation implications
The current implementation uses extremely permissive schemas (empty object/string) which effectively bypass schema validation. While this aligns with the PR's goal of handling schema-less topics, it has several implications:
- There's no documentation explaining when these schemas should be used
- The empty schemas might mask potential data quality issues
- It's unclear how this relates to the metadata handling mentioned in the PR objectives
Let's check if these schemas are used appropriately:
#!/bin/bash
# Search for usages of these schemas to understand the context
rg -l "ContentTypesSchemas" | xargs rg "schemaFor(Json|Plain)"
Consider:
- Adding comprehensive documentation explaining the use cases and implications
- Implementing basic structural validation even for schema-less topics
- Adding logging when these permissive schemas are used (a brief sketch of this follows below)
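As a minimal sketch of that logging suggestion (the object and method names are assumptions for illustration, with `ContentTypesSchemas` and `OpenAPIJsonSchema` assumed to be in scope; this is not code from the PR):
```scala
import com.typesafe.scalalogging.LazyLogging

// Sketch only: log whenever a permissive "catch-all" schema is substituted for a real one,
// so that schemaless topics stay visible in the logs.
object PermissiveSchemaUsage extends LazyLogging {

  def schemaForContentType(contentTypeName: String): OpenAPIJsonSchema = {
    logger.debug(s"Using permissive $contentTypeName schema for a topic without a registered schema")
    if (contentTypeName.equalsIgnoreCase("JSON")) ContentTypesSchemas.schemaForJson
    else ContentTypesSchemas.schemaForPlain
  }
}
```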
1-14
: Verify handling of hermes metadata
The PR objectives mention issues with hermes metadata handling, but this implementation doesn't explicitly address how metadata fields will be handled with these content types.
Let's check how metadata is handled:
✅ Verification successful
Let me gather more specific information about metadata handling.
Let me check one more specific aspect of the implementation to understand how the content types interact with message handling.
Based on the gathered information, I can now provide the final response:
Implementation correctly handles metadata through content type distinction
The implementation properly addresses metadata handling through the content type system:
- Plain and JSON content types are handled separately in UniversalKafkaDeserializer with appropriate schema assignments
- The deserializer implementation maintains metadata through SchemaWithMetadata wrapper
- The system correctly handles both schema-based and schema-less messages with proper content type detection
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for hermes metadata handling in the codebase
rg -A 5 "hermes.*metadata"
# Look for any message transformation logic
ast-grep --pattern 'def transform$_($$$) {
$$$
}'
Length of output: 2630
Script:
#!/bin/bash
# Look for metadata handling in UniversalKafkaDeserializer
rg -A 10 "SchemaWithMetadata" utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala
# Check for any specific hermes metadata handling
rg -A 5 "hermes" utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/
# Look for metadata-related schema definitions
ast-grep --pattern 'case class $_Metadata($$$)'
Length of output: 3538
Script:
#!/bin/bash
# Check UniversalSchemaPayloadDeserializer for message handling
rg -A 10 "deserialize" utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalSchemaPayloadDeserializer.scala
# Look for any schema-less topic handling
rg -A 5 "schemaless" utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/
Length of output: 2433
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/SchemaRegistryClient.scala (1)
43-46
: 🛠️ Refactor suggestion
Consider improving error handling and performance
The implementation has several areas that could be enhanced:
- Error Handling: The method silently handles potential errors from `strategy.getTopics`. Consider propagating errors to allow proper handling by callers.
- Performance: Fetching all topics for each check could be inefficient. Consider caching results if this method is called frequently (a small caching sketch follows the suggested implementation below).
- Input Validation: Add parameter validation for null safety.
Consider this improved implementation:
- def isTopicWithSchema(topic: String, strategy: TopicSelectionStrategy): Boolean = {
- val topicsWithSchema = strategy.getTopics(this)
- topicsWithSchema.exists(_.map(_.name).contains(topic))
- }
+ def isTopicWithSchema(topic: String, strategy: TopicSelectionStrategy): Validated[SchemaRegistryError, Boolean] = {
+ require(topic != null, "Topic cannot be null")
+ require(strategy != null, "Strategy cannot be null")
+ strategy.getTopics(this).map(topics => topics.exists(_.name == topic))
+ }
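On the performance point, a minimal caching wrapper could look like the following (a sketch under assumed wiring, with `SchemaRegistryClient` and `TopicSelectionStrategy` in scope; a real implementation would also need invalidation or a TTL, since topics and schemas can change over time):
```scala
import scala.collection.concurrent.TrieMap

// Sketch only: memoise the "does this topic have a schema" answer so repeated checks
// do not re-fetch the full topic list from the schema registry on every call.
class CachingTopicSchemaCheck(client: SchemaRegistryClient, strategy: TopicSelectionStrategy) {

  private val cache = TrieMap.empty[String, Boolean]

  def isTopicWithSchema(topic: String): Boolean =
    cache.getOrElseUpdate(
      topic,
      strategy.getTopics(client).exists(_.map(_.name).contains(topic))
    )
}
```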
Let's verify the usage of this method to understand the impact:
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalSchemaPayloadDeserializer.scala (2)
10-10
: LGTM: Import statement is correctly placed
The new import for `ContentTypesSchemas` is appropriately placed with other schema-related imports and is necessary for the plain content type handling.
85-89
: Verify plain content type handling across the codebase
The introduction of special handling for plain content type may affect other components that consume the deserialized data. We should verify that consumers can handle raw byte arrays.
✅ Verification successful
The previous search didn't yield results because we were looking for the wrong class name. Let's broaden our search to understand the content type handling and byte array processing in the codebase.
Based on the search results, I can now make a determination about the plain content type handling. The codebase shows that:
- The plain content type handling is properly integrated in the UniversalSchemaPayloadDeserializer with `ContentTypesSchemas.schemaForPlain`
- The byte array handling is widely supported throughout the codebase:
  - Kafka sources and sinks work with `Array[Byte]` as their base type
  - The deserialization/serialization infrastructure has proper byte array support
  - The Lite engine and other components handle raw byte arrays correctly

Raw byte array handling is properly supported across the codebase
The special handling for plain content type is consistent with the overall architecture where raw byte arrays are a fundamental data type. Consumers are already equipped to handle raw byte arrays as this is the base type for Kafka integration.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for places where JsonSchemaPayloadDeserializer.deserialize is used
rg -l "JsonSchemaPayloadDeserializer.*deserialize" | while read -r file; do
echo "=== $file ==="
rg -A 5 "JsonSchemaPayloadDeserializer.*deserialize" "$file"
done
Length of output: 67
Script:
#!/bin/bash
# Let's search for any usage of UniversalSchemaPayloadDeserializer and content type handling
rg -A 5 "ContentTypesSchemas.schemaForPlain|UniversalSchemaPayloadDeserializer"
# Also search for byte array handling in the codebase
rg -A 5 "Array\[Byte\]|deserialize.*bytes"
Length of output: 235407
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/RecordFormatterSupport.scala (2)
8-8
: LGTM: Import statement is correctly updated
The addition of `ContentTypesSchemas` to the import statement is necessary for the schema-less topic handling functionality.
53-59
: Verify string content validation requirements
The implementation allows any string content when no schema is provided. Based on the PR objectives, this aligns with the requirement to treat input as `Unknown`. However, please verify if any basic validation or sanitization is needed for security or data quality purposes.
Run the following script to check for any validation patterns in the codebase:
✅ Verification successful
Let me analyze the findings and generate additional verification scripts to better understand the validation context.
Let me generate one final verification to check the schema validation context:
Based on the analysis of the codebase, I can now generate the final response:
String content validation is appropriately handled through schema validation
The implementation is correct as-is. When no schema is provided, treating input as raw bytes is the intended behavior. For cases with schemas:
- JSON content is validated against `ContentTypesSchemas.schemaForJson`
- Plain text content is validated against `ContentTypesSchemas.schemaForPlain`
- Schema validation is enforced through `UniversalSchemaValidator` and type-specific validators (AvroSchemaValidator, JsonSchemaValidator)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for string validation patterns in Kafka-related files
rg -t scala "validate.*String|sanitize.*String"
# Search for similar message handling patterns
ast-grep --pattern 'case j if j.isString => $$$'
Length of output: 12377
Script:
#!/bin/bash
# Check for schema validation patterns in the relevant files
rg -A 5 "schemaOpt match" utils/schemed-kafka-components-utils/
# Look for any string content validation in the schema registry related files
rg -t scala "validate.*content|validate.*message" utils/schemed-kafka-components-utils/
# Check the parent class/trait for any validation requirements
ast-grep --pattern 'trait RecordFormatterSupport {
$$$
}'
Length of output: 2530
Script:
#!/bin/bash
# Check how schema validation is handled in the broader context
rg -t scala -B 5 -A 10 "ParsedSchema" utils/schemed-kafka-components-utils/
# Look for any content validation in the parent directories
rg -t scala "validate.*ParsedSchema|validate.*Schema" utils/
Length of output: 294921
engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala (1)
20-64
: LGTM! Clean and focused test implementation
The JSON round-trip test is well-structured and follows best practices:
- Clear test data setup
- Proper topic naming
- Explicit content type specification
- Robust error handling in JSON parsing
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala (2)
34-34
: LGTM!
The import is necessary for the Try-based error handling in the formValueEncoder implementation.
161-176: Several issues need to be addressed in the formValueEncoder implementation.
- The code uses unsafe tuple accessors (_1, _2) which should be replaced with pattern matching as suggested in previous reviews.
- The type casting with asInstanceOf is unsafe and should include better error handling.
- Using "Failure" as a value is incorrect - it should be a Failure instance.
- The special case handling for "Value" key needs better documentation.
Here's a suggested refactoring that addresses these issues:
(value: Any) => {
- // In ad-hoc test without schema we create object `{ "Value" = userInputInAdHoc }`, so if present we should just take the input
- Try {
- val temp = value.asInstanceOf[Map[String, Map[String, Any]]].head
- val key = temp._1
- // Any try to create a variable with value temp._2 fails
- if (key.equals("Value")) {
- temp._2
- } else Failure
- } match {
- // For normal usage
- case Failure(_) => encoder.encodeOrError(value, rawSchema)
- // If source with topic without schema
- case Success(objectInside) => encoder.encodeOrError(objectInside, rawSchema)
- }
+ /**
+ * Special handling for ad-hoc testing without schema:
+ * When testing topics without schema, the input is wrapped in {"Value": actualMessage}.
+ * This unwraps such messages to ensure consistent behavior with direct topic publishing.
+ */
+ value match {
+ case map: Map[String, Map[String, Any]] =>
+ map.headOption.flatMap {
+ case ("Value", innerMap) => Some(innerMap)
+ case _ => None
+ }.map(encoder.encodeOrError(_, rawSchema))
+ .getOrElse(encoder.encodeOrError(value, rawSchema))
+ case _ => encoder.encodeOrError(value, rawSchema)
+ }
}
The refactored version:
- Uses pattern matching instead of unsafe casting
- Provides clear documentation for the special case
- Handles the "Value" key case more safely
- Removes the incorrect usage of Failure as a value
- Simplifies the control flow
Let's verify if this pattern is used consistently across the codebase:
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala (1)
9-9
: LGTM: Import for serializable schema wrapper
The added import for `NkSerializableParsedSchema` is appropriate for making schemas serializable, which is necessary for Flink's distributed execution.
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala (2)
115-119
: Logic for handling topics with schemas looks good
The implementation correctly processes topics that have associated schemas.
120-139
: Proper handling of schemaless topics based on content type
The code appropriately handles schemaless topics by checking the content type and processing JSON and plain text messages accordingly.
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala (3)
39-39
: Addition of `contentTypeParamName` aligns with existing conventions
The introduction of `contentTypeParamName` as a constant maintains consistency with the parameter naming strategy used throughout the transformer, facilitating clear and maintainable code.
117-146
: Effective handling of schema-less topics in getVersionOrContentTypeParam
The `getVersionOrContentTypeParam` method correctly differentiates between topics with and without schemas. By providing content type options when a schema is absent, it enhances the component's flexibility and user experience.
229-235
: Seamless integration of content type parameter in schemaParamStep
The updates in the `schemaParamStep` method efficiently incorporate `getVersionOrContentTypeParam`, ensuring that the appropriate parameters are generated based on the topic's schema presence without disrupting the existing flow.
created: #7205
📝 Walkthrough
The pull request introduces extensive modifications across several files to enhance the integration of Nussknacker with Kafka and Schema Registries. Key changes include a revised documentation file (docs/integration/KafkaIntegration.md). In the testing framework, new integration tests for JSON and plain text message processing are introduced, while existing topic selection strategies are updated to include checks for subject existence. Additionally, enhancements are made to the schemed-kafka source, sink, deserializer, and transformer components.
Possibly related PRs
Suggested labels
Suggested reviewers
📜 Recent review details
Configuration used: CodeRabbit UI
📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (4)
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (15)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/ContentTypes.scala (1)
11-14
: Add documentation and consider validation strategy
While the empty schemas align with handling schemaless topics, please consider:
- Adding scaladoc to explain the purpose and usage of these schemas
- Documenting the implications of using empty schemas
- Clarifying if any validation should be performed even for schemaless topics
Here's a suggested improvement:
```diff
 object ContentTypesSchemas {
+  /** Schema for JSON content type.
+    * Used when no schema is available in the registry.
+    * Note: This allows any valid JSON without validation.
+    */
   val schemaForJson: OpenAPIJsonSchema = OpenAPIJsonSchema("{}")
+
+  /** Schema for plain text content type.
+    * Used when no schema is available in the registry.
+    * Note: This allows any string content without validation.
+    */
   val schemaForPlain: OpenAPIJsonSchema = OpenAPIJsonSchema("")
 }
```
engine/flink/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/source/flink/TopicSelectionStrategySpec.scala (1)
56-57
: Add documentation for schema handling behavior
The example demonstrates strategy override but lacks documentation about schema handling behavior.
Consider adding comments to explain:
- How topics without schemas are handled
- When to use this strategy vs TopicsWithExistingSubjectSelectionStrategy
- Impact of showTopicsWithoutSchema configuration
```diff
 test("show how to override topic selection strategy") {
+  // Example: Configure source factory to handle topics with pattern "test-.*"
+  // Topics without schemas will be treated as Unknown type based on showTopicsWithoutSchema setting
   new UniversalKafkaSourceFactory(
     schemaRegistryClientFactory,
     UniversalSchemaBasedSerdeProvider.create(schemaRegistryClientFactory),
     testModelDependencies,
     new FlinkKafkaSourceImplFactory(None)
   ) {
     override def topicSelectionStrategy =
       new TopicsMatchingPatternWithExistingSubjectsSelectionStrategy(Pattern.compile("test-.*"))
   }
 }
```
engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala (3)
18-22
: Consider adding more diverse test cases
While the current test data is valid, consider adding test cases that cover the points below (a small sketch of such records follows the list):
- Nested JSON structures
- Arrays
- Different data types (numbers, booleans, nulls)
- Special characters in strings
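For example, a handful of hypothetical records along these lines would cover the cases above (a sketch assuming circe, which the test already uses for parsing; the field names are illustrative only):
```scala
import io.circe.Json

// Illustrative test data only — not taken from the PR.
val nestedRecord: Json = Json.obj(
  "user" -> Json.obj(
    "name"     -> Json.fromString("Tomek \"Junior\" Kowalski"), // special characters
    "age"      -> Json.fromInt(42),                             // number
    "active"   -> Json.fromBoolean(true),                       // boolean
    "nickname" -> Json.Null,                                    // null
    "tags"     -> Json.arr(Json.fromString("a"), Json.fromString("b")) // array
  )
)
```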
51-51
: Document the rationale for using lax validation mode
The choice of `ValidationMode.lax` should be documented to explain why strict validation isn't used and what implications this has for the test.
97-105
: Enhance error handling in test assertions
The test should handle parsing errors more gracefully and provide better error messages. Consider:
- Adding timeout to the consumer
- Adding descriptive error messages for parsing failures
```diff
 run(process) {
-  val outputRecord = kafkaClient.createConsumer().consumeWithConsumerRecord(outputTopic).take(1).head
+  val outputRecord = kafkaClient.createConsumer()
+    .consumeWithConsumerRecordWithin(outputTopic, timeout = 10.seconds)
+    .headOption
+    .getOrElse(fail("No message received within timeout"))
   val parsedOutput = parser
     .parse(new String(outputRecord.value(), StandardCharsets.UTF_8))
-    .fold(throw _, identity)
+    .fold(
+      error => fail(s"Failed to parse output as JSON: $error"),
+      identity
+    )
   parsedOutput shouldBe jsonRecord
 }
```
utils/kafka-utils/src/main/scala/pl/touk/nussknacker/engine/kafka/KafkaConfig.scala (2)
32-33
: Add ScalaDoc for the new configuration parameter.
The new `showTopicsWithoutSchema` parameter would benefit from documentation explaining its purpose and impact. Consider adding ScalaDoc to clarify when this should be enabled/disabled. Add documentation above the parameter:
```diff
  sinkDeliveryGuarantee: Option[SinkDeliveryGuarantee.Value] = None,
+ /** Controls visibility of Kafka topics that don't have an associated schema in the schema registry.
+   * When true (default), all non-hidden topics are shown.
+   * When false, only topics with existing schemas are displayed.
+   */
  showTopicsWithoutSchema: Boolean = true
```
32-33
: Add changelog entry for the new configuration.
Since this introduces a new configuration capability, it should be documented in the changelog to help users understand and adopt the feature. Consider adding an entry like:
Consider adding an entry like:
### Added - New configuration option `showTopicsWithoutSchema` in `KafkaConfig` to control visibility of Kafka topics without schema registry entriesutils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala (1)
159-176
: Document the ad-hoc testing behaviorThe code includes special handling for ad-hoc testing scenarios, but this behavior is only documented in an inline comment.
Add a proper ScalaDoc comment explaining this special case:
/** Encodes the input value according to the given schema. * * For ad-hoc testing without schema, the input is expected to be in the format: * {{{ * Map("Value" -> actualValue) * }}} * In such cases, only the inner value is encoded. Otherwise, the entire value * is encoded normally. * * @param schema The schema to encode against * @param mode The validation mode to use * @return A function that encodes values according to the schema */docs/integration/KafkaIntegration.md (1)
207-207
: LGTM! Consider adding migration guide entry.The new configuration option
showTopicsWithoutSchema
is well documented and aligns perfectly with the PR objective of allowing users to select Kafka topics without schema. The default value oftrue
ensures backward compatibility.Consider adding an entry to the migration guide since this introduces a behavioral change in topic visibility. This will help users understand:
- What changed in topic visibility
- How to restore previous behavior by setting `showTopicsWithoutSchema: false`
- Impact on existing scenarios
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/TopicSelectionStrategy.scala (1)
44-59
: Consider logging the caught `KafkaException` for better observability
Currently, when a `KafkaException` occurs while listing topics from Kafka, the exception is swallowed, and an empty list is returned. Logging this exception can aid in debugging and provide visibility into potential issues during runtime. You might consider modifying the catch block to log the exception:
```diff
  } catch {
    // In some tests we pass dummy kafka address, so when we try to get topics from kafka it fails
-   case _: KafkaException =>
+   case e: KafkaException =>
+     logger.warn("Failed to list topics from Kafka", e)
      List.empty
  }
```
56-56
: Enhance Exception Message for Better DebuggingThe exception message on line 56 can be made more informative by including the unexpected content type value. This will aid in debugging by specifying the actual value encountered.
Apply this diff to improve the exception message:
```diff
- throw new IllegalStateException("Topic without schema should have ContentType Json or Plain, was neither")
+ throw new IllegalStateException(s"Topic without schema should have ContentType Json or Plain, but was '${value}'")
```
136-138
: Enhance exception messages with detailed contextConsider enhancing the exception messages to include details about the unexpected content type or schema ID. This additional information can aid in debugging by providing more context about the error.
Apply this diff to improve the exception messages:
- throw new IllegalStateException("Schemaless topic should have json or plain content type, got neither") + throw new IllegalStateException(s"Schemaless topic should have JSON or plain content type, got: $$contentType")and
- throw new IllegalStateException("Schemaless topic should have json or plain content type, got neither") + throw new IllegalStateException(s"Schemaless topic should have valid content type, got: $${record.valueSchemaId.getOrElse("None")}")utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala (1)
103-134
: Address the TODO: EnablePLAIN
content type when readyThere is a TODO comment indicating that the
PLAIN
content type is not working correctly. It's important to resolve this before merging or create a follow-up task to enable thePLAIN
content type when it's fully supported. Including commented-out code can lead to confusion and should be avoided in the main codebase.Would you like assistance in resolving the issue with the
PLAIN
content type or in creating a task to track this work?utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/sink/UniversalKafkaSinkFactory.scala (2)
365-366
: UseequalsIgnoreCase
forcontentType
comparisonsCurrently, the comparison of
contentType
strings is case-sensitive. To improve robustness and handle different cases of input, consider usingequalsIgnoreCase
.Apply this diff:
private def runtimeSchemaDataForContentType(contentType: String): RuntimeSchemaData[ParsedSchema] = { - if (contentType.equals(ContentTypes.JSON.toString)) { jsonSchema } - else if (contentType.equals(ContentTypes.PLAIN.toString)) { plainSchema } + if (contentType.equalsIgnoreCase(ContentTypes.JSON.toString)) { jsonSchema } + else if (contentType.equalsIgnoreCase(ContentTypes.PLAIN.toString)) { plainSchema } else { throw new IllegalStateException("Content Type should be JSON or PLAIN, is neither") } }
367-367
: ThrowIllegalArgumentException
for invalidcontentType
When an invalid argument is provided, it's more appropriate to throw an
IllegalArgumentException
instead of anIllegalStateException
. Additionally, including the invalid value in the exception message can aid in debugging.Apply this diff:
else { - throw new IllegalStateException("Content Type should be JSON or PLAIN, is neither") + throw new IllegalArgumentException(s"Invalid content type: $contentType. Content type should be JSON or PLAIN.") }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (13)
docs/integration/KafkaIntegration.md
(1 hunks)engine/flink/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/source/flink/TopicSelectionStrategySpec.scala
(4 hunks)engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala
(1 hunks)utils/kafka-utils/src/main/scala/pl/touk/nussknacker/engine/kafka/KafkaConfig.scala
(1 hunks)utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala
(6 hunks)utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/TopicSelectionStrategy.scala
(1 hunks)utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/ContentTypes.scala
(1 hunks)utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/SchemaRegistryClient.scala
(2 hunks)utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala
(2 hunks)utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala
(1 hunks)utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala
(2 hunks)utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/sink/UniversalKafkaSinkFactory.scala
(7 hunks)utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala
(3 hunks)
🔇 Additional comments (19)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/ContentTypes.scala (2)
1-4
: LGTM!
The package structure and import are well-organized and specific.
5-9
: Previous review comment about sealed trait pattern is still applicable.
Let's verify the usage of ContentTypes to understand the impact of the suggested refactor:
#!/bin/bash
# Search for usages of ContentTypes to assess refactoring impact
rg -A 2 "ContentTypes\."
engine/flink/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/source/flink/TopicSelectionStrategySpec.scala (1)
9-12
: Verify complete migration to new topic selection strategies
The import changes suggest a shift towards strategies that explicitly handle existing subjects. This aligns with the PR objective of supporting topics without schemas.
✅ Verification successful
Migration to new topic selection strategies is complete
The verification shows that the old strategy classes (AllTopicsSelectionStrategy
and TopicPatternSelectionStrategy
) have been completely removed from the codebase, and the new strategies (TopicsWithExistingSubjectSelectionStrategy
and TopicsMatchingPatternWithExistingSubjectsSelectionStrategy
) are consistently used across all relevant files. The migration appears to be thorough and complete.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that old strategy classes are fully replaced
# Expected: No occurrences of old strategy classes
# Search for old strategy class usages
rg "AllTopicsSelectionStrategy|TopicPatternSelectionStrategy" --type scala
# Search for new strategy class usages to confirm migration
rg "TopicsWithExistingSubjectSelectionStrategy|TopicsMatchingPatternWithExistingSubjectsSelectionStrategy" --type scala
Length of output: 3770
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/SchemaRegistryClient.scala (1)
5-10
: LGTM: Import changes are well-organized
The new imports are properly organized and necessary for the added functionality.
utils/kafka-utils/src/main/scala/pl/touk/nussknacker/engine/kafka/KafkaConfig.scala (1)
32-33
: Verify the parameter usage across the codebase.
The addition of this parameter needs to be reflected in configuration files and documentation.
✅ Verification successful
Let me analyze the search results to verify the parameter usage. I can see that the parameter showTopicsWithoutSchema
is part of KafkaConfig
class and is used in the codebase. Let me search for its actual usage to determine if configuration files need updating.
Based on the search results, I can now provide a final response:
No configuration updates needed for this parameter
The showTopicsWithoutSchema
parameter is already well-integrated in the codebase:
- It has a default value of
true
in theKafkaConfig
class - It's documented in
docs/integration/KafkaIntegration.md
with clear description - The parameter is used in
KafkaUniversalComponentTransformer
to control topic selection strategy - It's an optional configuration that works with the default value, and users can override it if needed through the standard Kafka configuration mechanism
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for potential configuration files and documentation that might need updating
echo "Searching for configuration files..."
fd -e conf -e yaml -e properties --exec rg -l "kafka"
echo "Searching for documentation files..."
fd -e md --exec rg -l "kafka|schema"
echo "Searching for test configurations..."
rg "KafkaConfig" -A 5
Length of output: 105643
Script:
#!/bin/bash
# Search for usage of showTopicsWithoutSchema parameter
rg "showTopicsWithoutSchema" -A 3
# Search for kafka configuration in config files
rg "kafka" -A 5 ./nussknacker-dist/src/universal/conf/application.conf
Length of output: 2788
docs/integration/KafkaIntegration.md (1)
Line range hint 1-207
: Documentation is comprehensive and well-structured!
The documentation provides excellent coverage of Kafka integration, including:
- Clear prerequisites
- Detailed concepts
- Configuration options
- Authentication setup
- Schema handling
- Exception handling
The new changes integrate seamlessly with the existing content.
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala (2)
8-17
: Imports are Correct and Necessary
The added imports are appropriate for the new logic introduced and ensure all dependencies are properly referenced.
48-57
: Duplicate Issue: Incorrect Pattern Matching on writerSchemaId
As previously noted, pattern matching on writerSchemaId.value
is incorrect because writerSchemaId.value
is likely a String
, whereas StringSchemaId
is a case class expecting to match against writerSchemaId
itself. This mismatch will result in a match failure at runtime.
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala (2)
115-122
: Correct handling of topics with schemas
The code correctly checks if the topic has a schema and processes the value accordingly. The use of schemaRegistryClient.isTopicWithSchema
ensures that messages are serialized properly when a schema is present.
124-143
: Robust handling of schemaless topics with JSON or PLAIN content types
The added logic accurately handles topics without schemas by checking the content type and processing JSON and plain text formats appropriately. This ensures that messages from schemaless topics are correctly serialized.
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala (5)
29-29
: Consistent addition of contentTypeParamName
constant
The addition of contentTypeParamName
aligns with the existing parameter naming conventions and enhances readability.
50-54
: Ensure showTopicsWithoutSchema
is properly documented
The use of kafkaConfig.showTopicsWithoutSchema
to determine the topic selection strategy is appropriate. Please ensure that the showTopicsWithoutSchema
configuration parameter is added to KafkaConfig
and thoroughly documented in the configuration documentation.
70-70
: Appropriate usage of kafkaConfig
in getTopics
method
Passing kafkaConfig
to topicSelectionStrategy.getTopics
ensures that the strategy has access to necessary configuration parameters. This modification enhances flexibility and is correctly implemented.
217-223
: Correct integration of versionOrContentTypeParam
in schemaParamStep
The integration of versionOrContentTypeParam
into the parameters and error handling ensures that the component accurately accounts for both version and content type based on the topic's schema presence. This implementation is sound and aligns with the intended functionality.
238-239
: Addition of contentTypeParamName
maintains consistency
Adding contentTypeParamName
as a protected lazy val maintains consistency with other parameter names in the class, enhancing maintainability.
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala (4)
9-9
: Import is necessary and appropriate
The import of NkSerializableParsedSchema
is required for the creation of RuntimeSchemaData
instances and is correctly added.
47-47
: Constructor modification is appropriate
The addition of implProvider
to the constructor parameters is appropriate, as it is utilized within the implementation
method to create the Kafka source.
74-74
:
Avoid potential NullPointerException by swapping operands in equals
Using contentType.equals(...)
can lead to a NullPointerException
if contentType
is null
. Swap the operands to prevent this issue.
This concern was previously raised; please address it to ensure robustness.
74-100
: 🛠️ Refactor suggestion
Refactor duplicated schema creation logic
The code for creating RuntimeSchemaData
is duplicated between the JSON and PLAIN cases. Consider extracting this logic into a helper method to improve maintainability and reduce duplication.
This suggestion was previously provided and is still applicable. Refactoring will enhance the code's clarity and ease future modifications.
.../scala/pl/touk/nussknacker/engine/schemedkafka/source/flink/TopicSelectionStrategySpec.scala
Show resolved
Hide resolved
.../scala/pl/touk/nussknacker/engine/schemedkafka/source/flink/TopicSelectionStrategySpec.scala
Show resolved
Hide resolved
...main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/SchemaRegistryClient.scala
Show resolved
Hide resolved
engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala
Show resolved
Hide resolved
engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala
Show resolved
Hide resolved
...a/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala
Outdated
Show resolved
Hide resolved
...a/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala
Outdated
Show resolved
Hide resolved
.../main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala
Show resolved
Hide resolved
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/TopicSelectionStrategy.scala (3)
21-30
: Consider documenting unused parameter
The `kafkaConfig` parameter is not used in this implementation. Consider adding a comment explaining why, or if this is intentional.
```diff
 class TopicsWithExistingSubjectSelectionStrategy extends TopicSelectionStrategy {
+  // kafkaConfig not used as this strategy only fetches topics from schema registry
   override def getTopics(
       schemaRegistryClient: SchemaRegistryClient,
       kafkaConfig: KafkaConfig
   ): Validated[SchemaRegistryError, List[UnspecializedTopicName]] = {
     schemaRegistryClient.getAllTopics
   }
 }
```
32-60
: Good implementation with a few suggestions for improvement
The implementation successfully combines topics from both sources and handles errors gracefully. Consider these improvements:
- The comment about dummy kafka address should be moved to ScalaDoc
- Consider extracting the hidden topic prefix "_" to a constant
- Consider adding logging for Kafka connection failures
```diff
 class AllNonHiddenTopicsSelectionStrategy extends TopicSelectionStrategy {
+  private val HiddenTopicPrefix = "_"
+
+  /** Retrieves topics from both schema registry and Kafka.
+    * Note: In test environments with dummy Kafka addresses, Kafka topic retrieval
+    * may fail silently and return empty list
+    */
   override def getTopics(
       schemaRegistryClient: SchemaRegistryClient,
       kafkaConfig: KafkaConfig
   ): Validated[SchemaRegistryError, List[UnspecializedTopicName]] = {
     val topicsFromSchemaRegistry = schemaRegistryClient.getAllTopics
     val validatorConfig          = kafkaConfig.topicsExistenceValidationConfig.validatorConfig
     val schemaLessTopics: List[UnspecializedTopicName] = {
       try {
         KafkaUtils.usingAdminClient(kafkaConfig) {
           _.listTopics(new ListTopicsOptions().timeoutMs(validatorConfig.adminClientTimeout.toMillis.toInt))
             .names()
             .get()
             .asScala
             .toSet
             .map(UnspecializedTopicName.apply)
-            .filterNot(topic => topic.name.startsWith("_"))
+            .filterNot(topic => topic.name.startsWith(HiddenTopicPrefix))
             .toList
         }
       } catch {
-        // In some tests we pass dummy kafka address, so when we try to get topics from kafka it fails
         case e: KafkaException =>
+          logger.warn("Failed to retrieve topics from Kafka", e)
           List.empty
       }
     }
```
64-72
: Consider documenting unused parameter
Similar to TopicsWithExistingSubjectSelectionStrategy, the `kafkaConfig` parameter is not used in this implementation.
```diff
 class TopicsMatchingPatternWithExistingSubjectsSelectionStrategy(val topicPattern: Pattern)
     extends TopicSelectionStrategy {
+  // kafkaConfig not used as this strategy only filters topics from schema registry
   override def getTopics(
       schemaRegistryClient: SchemaRegistryClient,
       kafkaConfig: KafkaConfig
   ): Validated[SchemaRegistryError, List[UnspecializedTopicName]] =
     schemaRegistryClient.getAllTopics.map(_.filter(topic => topicPattern.matcher(topic.name).matches()))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/TopicSelectionStrategy.scala
(1 hunks)
🔇 Additional comments (2)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/TopicSelectionStrategy.scala (2)
4-6
: LGTM: Required imports added for Kafka admin functionality
The new imports support the enhanced topic selection capabilities with Kafka admin client integration.
14-17
: Verify impact of the breaking change in trait signature
Adding the `kafkaConfig` parameter is a breaking change that will affect all implementations of this trait.
✅ Verification successful
All implementations of TopicSelectionStrategy have been updated with the new parameter
The two implementations found in the codebase:
TopicsWithExistingSubjectSelectionStrategy
AllNonHiddenTopicsSelectionStrategy
Both have been properly updated to include the new `kafkaConfig` parameter in their `getTopics` method signature, matching the trait's interface. The breaking change has been handled correctly.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Find all implementations of TopicSelectionStrategy to ensure they're updated
ast-grep --pattern 'class $_ extends TopicSelectionStrategy'
Length of output: 6594
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureSchemaBasedSerdeProviderIntegrationTest.scala (1)
32-37
: Consider making the test environment more CI-friendly.
The test relies on Azure Event Hubs namespace which might not be available in all CI environments. Consider the points below (a small sketch of the tagging approach follows the list):
- Adding clear documentation about required environment variables
- Providing a mock implementation for CI
- Marking the test with appropriate tags for conditional execution
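A minimal sketch of the third option, assuming ScalaTest and a hypothetical environment variable name (neither is taken from the repository):

import org.scalatest.funsuite.AnyFunSuite

class AzureSerdeProviderConditionalSpec extends AnyFunSuite {

  test("round-trips a record through the Azure schema registry") {
    // assume() cancels the test instead of failing it when the precondition is not met,
    // so CI environments without Azure credentials stay green
    assume(
      sys.env.contains("AZURE_EVENT_HUBS_NAMESPACE"),
      "Azure Event Hubs namespace not configured"
    )
    // ... the actual serde round-trip assertions would go here ...
  }
}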
utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureTestsFromFileIntegrationTest.scala (1)
42-42
: LGTM! Consider adding a comment explaining the flag's purpose.
The addition of showTopicsWithoutSchema = false is appropriate for this test class as it focuses on schema registry functionality. This ensures the test remains focused on verifying schema-based topic behavior.
Consider adding a comment explaining why this flag is set to false here:

+  // Explicitly disable schema-less topics as this test focuses on schema registry functionality
   private val kafkaConfig = KafkaConfig(Some(schemaRegistryConfigMap), None, showTopicsWithoutSchema = false)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureSchemaBasedSerdeProviderIntegrationTest.scala (1 hunks)
utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureTestsFromFileIntegrationTest.scala (1 hunks)
🔇 Additional comments (2)
utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureSchemaBasedSerdeProviderIntegrationTest.scala (1)
32-37
: LGTM on the KafkaConfig parameter addition.
The addition of showTopicsWithoutSchema = false
is appropriate for this schema-based test case.
utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureTestsFromFileIntegrationTest.scala (1)
42-42
: Verify test coverage for schema-less topics.
While this test appropriately covers schema registry functionality, we should ensure there are corresponding test cases for the new schema-less topic support introduced in this PR.
Let's check for test coverage of schema-less topics:
✅ Verification successful
Schema-less topic test coverage is present in the codebase
The codebase has adequate test coverage for schema-less topics in multiple test files:
- KafkaJsonItSpec contains dedicated tests for topics without schema, testing both JSON and plain content types
- UniversalRecordFormatterSpec includes specific test cases for JSON record formatting without schema
- UniversalKafkaDeserializerTest covers deserialization scenarios for messages without schema IDs
The showTopicsWithoutSchema = false
setting in the reviewed test file is appropriate as it's specifically testing Azure Schema Registry functionality, while schema-less scenarios are covered elsewhere.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for test files that might contain schema-less topic tests
rg -t scala "showTopicsWithoutSchema\s*=\s*true" --type scala
# Look for potential test class names related to schema-less topics
rg -t scala "class.*Test.*Schema(Less|Free|Optional|Without)" --type scala
Length of output: 140
Script:
#!/bin/bash
# Let's first check for any test files that mention schema-less or related terms
rg -t scala -i "(schema.?less|without.?schema|no.?schema)" --type scala
# Let's also check for test files that use showTopicsWithoutSchema in any form
rg -t scala "showTopicsWithoutSchema" --type scala
# Let's look at the actual test file content to understand the test coverage
cat utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureTestsFromFileIntegrationTest.scala
Length of output: 14915
.../engine/schemedkafka/schemaregistry/azure/AzureSchemaBasedSerdeProviderIntegrationTest.scala
TODO: ad-hoc test and schema in sink
Added comments where not sure
…andling when couldn't list kafka topics (previously it wasn't needed)
…compiles Also a workaround in the same file for test, it will now fetch all topics from schema registry as before and from kafka, so tests should pass
When topic without schema is selected instead of version, user can now choose ContentType (which for now doesn't change anything, need to implement handling bytes array)
- created a function for repeated code - removed use of avro in test for json - used more constants where applicable
- commented out plain for now - all kafka topics are now filtered using topicSelectionStrategy - todo: add feature flag for showing topics without schema
…ema - false for now
- `showTopicsWithoutSchema` flag is now in KafkaConfig - TopicSelectionStrategies changed to include topics without schema
901d727
to
26d42b1
Compare
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (8)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala (1)
44-59
: Consider extensibility for future content types
The current implementation hardcodes JSON and Plain content types. Consider making this extensible for future content types.
Example approach:
trait ContentTypeHandler {
  def canHandle(contentType: String): Boolean
  def getSchemaWithMetadata(contentType: String): SchemaWithMetadata
}

object JsonContentTypeHandler extends ContentTypeHandler {
  def canHandle(contentType: String): Boolean = contentType.equals(ContentTypes.JSON.toString)
  def getSchemaWithMetadata(contentType: String): SchemaWithMetadata =
    SchemaWithMetadata(ContentTypesSchemas.schemaForJson, SchemaId.fromString(ContentTypes.JSON.toString))
}

// Similar implementation for PlainContentTypeHandler

utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala (2)
124-140
: Simplify content type handling logic
The current implementation has nested conditions and duplicate error messages. Consider using pattern matching for cleaner content type handling.
- val schema = record.valueSchemaId.flatMap {
-   case StringSchemaId(contentType) =>
-     if (contentType.equals(ContentTypes.JSON.toString)) {
-       Some(
-         SchemaWithMetadata(
-           ContentTypesSchemas.schemaForJson,
-           SchemaId.fromString(ContentTypes.JSON.toString)
-         ).schema
-       )
-     } else if (contentType.equals(ContentTypes.PLAIN.toString)) {
-       None
-     } else
-       throw new IllegalStateException("Schemaless topic should have json or plain content type, got neither")
-   case _ =>
-     throw new IllegalStateException("Schemaless topic should have json or plain content type, got neither")
+ val schema = record.valueSchemaId.flatMap {
+   case StringSchemaId(contentType) => contentType match {
+     case ct if ct == ContentTypes.JSON.toString =>
+       Some(SchemaWithMetadata(
+         ContentTypesSchemas.schemaForJson,
+         SchemaId.fromString(ContentTypes.JSON.toString)
+       ).schema)
+     case ct if ct == ContentTypes.PLAIN.toString => None
+     case _ => throw new IllegalStateException(s"Schemaless topic should have json or plain content type, got: $contentType")
+   }
+   case other => throw new IllegalStateException(s"Schemaless topic should have string schema id, got: $other")
+ }
114-144
: Document the schemaless topic handling
The new functionality for handling schemaless topics should be documented with examples and limitations.
Consider adding scaladoc comments explaining (a sketch of possible wording follows this list):
- The behavior for different content types
- Expected JSON structure for schemaless topics
- Handling of metadata from hermes (as mentioned in PR objectives)
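One possible shape for such a scaladoc note; the wording below is illustrative only and not taken from the codebase:

/**
 * Handling of topics without a registered schema:
 *  - Content type JSON: the payload is read as a JSON document and exposed to the scenario as Unknown;
 *    no validation against a registry schema is performed.
 *  - Content type PLAIN: the raw bytes are passed through unchanged (full byte-array handling is still a TODO).
 *  - Metadata added by producers such as hermes is not stripped here; scenarios have to navigate it
 *    themselves, e.g. via dynamic access.
 */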
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala (1)
114-118
: Consider improving error messages for better user experience.
The error message from schema registry is directly passed to the user. Consider providing more context-specific error messages that guide users on how to resolve the issue.
  case Invalid(e) =>
    Writer[List[ProcessCompilationError], List[Integer]](
-     List(CustomNodeError(e.getMessage, Some(topicParamName))),
+     List(CustomNodeError(s"Failed to retrieve schema versions: ${e.getMessage}. Please ensure the topic exists and has schemas registered.", Some(topicParamName))),
      Nil
    )

utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala (1)
68-73
: Consider adding validation for content type parameter
The content type parameter is accepted without validation, which could lead to runtime errors if an invalid value is provided.
Consider adding a dedicated parameter with a fixed set of allowed values:
+ protected val contentTypeParam = Parameter(contentTypeParamName)
+   .copy(editor = Some(FixedValuesParameterEditor(ContentTypes.values.map(_.toString).toList)))
+
  override def contextTransformation(context: ValidationContext, dependencies: List[NodeDependencyValue])(
      implicit nodeId: NodeId
  ): ContextTransformationDefinition =
    topicParamStep orElse
+     Parameter.initial(contentTypeParam) orElse
      schemaParamStep(paramsDeterminedAfterSchema) orElse
      nextSteps(context, dependencies)

utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/sink/UniversalKafkaSinkFactory.scala (2)
85-86
: Consider caching schema instances more efficiently
The lazy vals for schemas are a good start, but consider moving them to a companion object to ensure they're truly singleton instances across all factory instances.
object UniversalKafkaSinkFactory {
+  private lazy val jsonSchema =
+    RuntimeSchemaData(new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForJson), None)
+  private lazy val plainSchema =
+    RuntimeSchemaData(new NkSerializableParsedSchema[ParsedSchema](ContentTypesSchemas.schemaForPlain), None)

   private val paramsDeterminedAfterSchema = List(
     // existing code
   )

Also applies to: 89-92
365-369
: Enhance error handling in runtimeSchemaDataForContentType
The error message could be more informative. Consider including the actual content type in the error message and using pattern matching for better type safety.
  private def runtimeSchemaDataForContentType(contentType: String): RuntimeSchemaData[ParsedSchema] = {
-   if (contentType.equals(ContentTypes.JSON.toString)) { jsonSchema }
-   else if (contentType.equals(ContentTypes.PLAIN.toString)) { plainSchema }
-   else { throw new IllegalStateException("Content Type should be JSON or PLAIN, is neither") }
+   contentType match {
+     case ct if ct.equals(ContentTypes.JSON.toString)  => jsonSchema
+     case ct if ct.equals(ContentTypes.PLAIN.toString) => plainSchema
+     case unsupported => throw new IllegalStateException(
+       s"Content Type should be JSON or PLAIN, got: $unsupported"
+     )
+   }
  }

docs/integration/KafkaIntegration.md (1)
207-207
: Configuration entry looks good but could use more context.
The configuration entry for showTopicsWithoutSchema is clear and well-documented. However, consider adding a note about its relationship with schema-less topics and how it affects the user experience when working with topics that don't have associated schemas.
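For illustration only, the flag is also exposed on KafkaConfig, as the test snippets quoted earlier in this review show. A sketch mirroring that constructor shape (the map contents below are placeholder values; remaining KafkaConfig fields keep their defaults):

  val schemaRegistryConfigMap = Map("schema.registry.url" -> "http://localhost:8081") // placeholder
  // enable listing of topics that have no subject registered in the schema registry
  val kafkaConfig = KafkaConfig(Some(schemaRegistryConfigMap), None, showTopicsWithoutSchema = true)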
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (15)
- docs/integration/KafkaIntegration.md (1 hunks)
- engine/flink/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/source/flink/TopicSelectionStrategySpec.scala (4 hunks)
- engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala (1 hunks)
- utils/kafka-utils/src/main/scala/pl/touk/nussknacker/engine/kafka/KafkaConfig.scala (1 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala (6 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/TopicSelectionStrategy.scala (1 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/ContentTypes.scala (1 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/SchemaRegistryClient.scala (2 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala (2 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala (1 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala (2 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/sink/UniversalKafkaSinkFactory.scala (7 hunks)
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/source/UniversalKafkaSourceFactory.scala (3 hunks)
- utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureSchemaBasedSerdeProviderIntegrationTest.scala (1 hunks)
- utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureTestsFromFileIntegrationTest.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- engine/flink/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/source/flink/TopicSelectionStrategySpec.scala
- engine/flink/tests/src/test/scala/pl/touk/nussknacker/defaultmodel/KafkaJsonItSpec.scala
- utils/kafka-utils/src/main/scala/pl/touk/nussknacker/engine/kafka/KafkaConfig.scala
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/TopicSelectionStrategy.scala
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/ContentTypes.scala
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/SchemaRegistryClient.scala
- utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/ParsedSchemaSupport.scala
- utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureSchemaBasedSerdeProviderIntegrationTest.scala
- utils/schemed-kafka-components-utils/src/test/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/azure/AzureTestsFromFileIntegrationTest.scala
🔇 Additional comments (13)
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/universal/UniversalKafkaDeserializer.scala (2)
8-8
: LGTM: New imports support schema-less topics functionality
The added imports are well-organized and necessary for implementing the new feature to handle topics without schema.
Also applies to: 12-17
44-59
: Verify content type validation for schema-less topics
The current implementation assumes the message format matches the content type without validation. This could lead to runtime errors if the actual message format doesn't match the assumed content type.
Consider adding message format validation for schema-less topics. For example:
- For JSON content type: Verify the message can be parsed as valid JSON
- For Plain content type: Define clear expectations for the message format
Would you like me to propose a validation implementation?
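If such a check were added, a minimal sketch could look like the following; it assumes circe (already used in the project for JSON) and a hypothetical helper name, and is not the project's actual deserializer:

import java.nio.charset.StandardCharsets
import io.circe.Json
import io.circe.parser.parse

// Hypothetical pre-check for schemaless topics declared as JSON: reject payloads that do not parse.
// For the PLAIN content type no parsing is attempted and the raw bytes are accepted as-is.
def validateJsonPayload(payload: Array[Byte]): Either[String, Json] =
  parse(new String(payload, StandardCharsets.UTF_8))
    .left.map(failure => s"Value declared as JSON is not valid JSON: ${failure.message}")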
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala (2)
11-19
: LGTM: New imports support schemaless topic functionality
The added imports are necessary and well-organized to support the new feature for handling topics without schemas.
141-142
: Verify JSON structure validation
For schemaless topics with JSON content type, ensure that the JSON structure is properly validated. The current implementation might accept any JSON value (string, number, array, etc.) without validation.
✅ Verification successful
Based on the code analysis, I can now provide a final response regarding the JSON validation concern:
JSON validation is properly handled through multiple layers
For schemaless topics with JSON content type, the validation is handled appropriately through:
- The JsonPayloadRecordFormatterSupport handles JSON message formatting and reading
- When schema is provided, validation is performed using JsonSchemaOutputValidator with proper validation modes (strict/lax)
- For schemaless JSON topics, basic structural validation is performed during serialization/deserialization
- The implementation correctly handles both schema-based and schemaless JSON messages through the universal formatter infrastructure
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for JSON validation logic in value parsing
ast-grep --pattern 'readValueMessage($schema, $topic, $jsonObj) {
$$$
}'
Length of output: 76
Script:
#!/bin/bash
# Let's first check the actual implementation of readValueMessage
rg -A 10 "readValueMessage" --type scala
# Also search for any JSON validation related code
rg -A 5 "JSON|json" --type scala
# Let's also check the entire file content to understand the context better
cat "utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/schemaregistry/formatter/AbstractSchemaBasedRecordFormatter.scala"
Length of output: 111078
Script:
#!/bin/bash
# Let's check the implementation of JsonPayloadRecordFormatterSupport which handles JSON validation
rg -A 10 "JsonPayloadRecordFormatterSupport" --type scala
# Also check for any JSON validation related code in the universal package
rg -A 5 "validate.*json|json.*validate" --type scala --ignore-case
# Let's also check the specific implementation of readMessage in JsonPayloadRecordFormatterSupport
ast-grep --pattern 'readMessage($topic, $schemaOpt, $jsonObj) {
$$$
}'
Length of output: 51680
Script:
#!/bin/bash
# Let's check the implementation of JsonPayloadRecordFormatterSupport more specifically
rg -A 15 "object JsonPayloadRecordFormatterSupport" --type scala
# Also check for any JSON schema validation in the universal package
rg -A 10 "JsonSchemaOutputValidator" --type scala
# Let's also check the specific implementation of readMessage in RecordFormatterSupport
rg -A 10 "private def readMessage" --type scala
Length of output: 27730
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/KafkaUniversalComponentTransformer.scala (5)
29-29
: LGTM: Content type parameter constant added.
The new constant follows the existing naming convention and is appropriately placed with other parameter name constants.
50-54
: LGTM: Topic selection strategy based on configuration.
The implementation cleanly switches between strategies based on the configuration flag, following the suggested approach from previous reviews.
70-70
: LGTM: Topic selection strategy now receives KafkaConfig.
The change properly implements the suggested refactoring to pass KafkaConfig to the topic selection strategy.
123-125
: Verify the status of PLAIN content type support.
There's a TODO comment about removing the PLAIN content type once it's working correctly. Please clarify:
- What issues exist with PLAIN content type?
- What's the timeline for enabling it?
- Should this TODO be tracked in an issue?
Would you like me to create a GitHub issue to track this TODO?
238-239
: LGTM: Content type parameter name constant follows existing patterns.
The implementation is consistent with other parameter name constants and correctly marked as @transient.
utils/schemed-kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/schemedkafka/sink/UniversalKafkaSinkFactory.scala (4)
22-27
: LGTM: Clean import organization for content type support
The new imports are well-organized and properly scoped for the content type functionality.
99-108
: LGTM: Clean pattern matching for raw editor parameters
The pattern matching for content type parameters is well-structured and maintains consistency with the existing schema-based approach.
162-195
: 🛠️ Refactor suggestion
Consider extracting duplicate validation logic
The validation logic in this block is nearly identical to the schema-based validation at lines 124-161. Consider extracting this into a shared method to improve maintainability.
+ private def validateWithSchema(
+ context: ValidationContext,
+ runtimeSchemaData: RuntimeSchemaData[ParsedSchema],
+ mode: String,
+ value: BaseDefinedParameter
+ )(implicit nodeId: NodeId): TransformationStepResult = {
+ schemaSupportDispatcher
+ .forSchemaType(runtimeSchemaData.schema.schemaType())
+ .extractParameter(
+ runtimeSchemaData.schema,
+ rawMode = true,
+ validationMode = extractValidationMode(mode),
+ rawParameter = rawValueParamDeclaration.createParameter(),
+ restrictedParamNames
+ )
+ .map { extractedSinkParameter =>
+ val validationAgainstSchemaErrors = extractedSinkParameter
+ .validateParams(Map(sinkValueParamName -> value))
+ .swap
+ .map(_.toList)
+ .getOrElse(List.empty)
+ FinalResults(
+ context,
+ validationAgainstSchemaErrors,
+ Some(TransformationState(runtimeSchemaData, extractedSinkParameter))
+ )
+ }
+ .valueOr { errors =>
+ FinalResults(context, errors.toList)
+ }
+ }
249-278
: Consider handling empty schemas more explicitly
The comment about empty schemas leading to infinite loops deserves more attention. Consider adding explicit validation or documentation for this edge case.
✅ Verification successful
Let me gather more information about the schema handling and parameter transformation logic to better understand the potential for infinite loops.
Based on the analysis of the codebase, I can now provide a final response regarding the empty schema concern:
Empty schema handling is properly implemented with safeguards
The code already handles empty schemas appropriately:
- It explicitly checks for empty parameters using valueParam.toParameters.isEmpty
- It returns FinalResults with the current state when empty, preventing infinite loops
- The validation mode system (ValidationMode.lax vs ValidationMode.strict) provides additional control over schema validation
- Similar pattern is consistently used across the codebase in other schema-handling components
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for empty schema handling across the codebase
rg -A 3 "empty\s+schema"
# Look for potential infinite loop scenarios in schema handling
ast-grep --pattern 'while ($condition) {
$$$
if ($condition) {
$$$
}
}'
Length of output: 4037
Script:
#!/bin/bash
# Look for schema validation and parameter transformation logic
ast-grep --pattern 'def extractParameter($$$) {
$$$
}'
# Search for other instances where toParameters is used
rg -A 3 "toParameters"
# Look for schema validation implementations
rg -A 5 "ValidationMode"
Length of output: 108045
…dSchemaSupport doesn't know about it, it will be checked only in tests
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
engine/lite/components/kafka/src/main/scala/pl/touk/nussknacker/engine/lite/components/LiteKafkaSourceImplFactory.scala (1)
89-94
: Good fix for the {"Value": message} wrapping issue.
The implementation correctly addresses the PR objective of fixing the ad-hoc JSON object formatting. However, consider these improvements for robustness:
- The "Value" string comparison is case-sensitive. Consider using case-insensitive comparison or defining it as a constant.
- Consider adding validation for the extracted inner value to ensure it's of the expected type.
Here's a suggested improvement:
+ private val ValueKey = "Value" // Define constant for better maintainability
+
  override def parametersToTestData(params: Map[ParameterName, AnyRef]): ConsumerRecord[Array[Byte], Array[Byte]] = {
    val unflattenedParams = TestingParametersSupport.unflattenParameters(params)
    val removedValue = if (unflattenedParams.size == 1) {
      unflattenedParams.headOption match {
-       case Some(("Value", inner)) => inner
+       case Some((key, inner)) if key.equalsIgnoreCase(ValueKey) =>
+         require(inner != null, "Value cannot be null")
+         inner
        case _ => unflattenedParams
      }
    } else unflattenedParams

engine/flink/kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/kafka/source/flink/FlinkKafkaSource.scala (1)
128-137
: Consider metadata handling requirements
The PR objectives mention uncertainty about handling metadata from hermes. While these changes address the basic value handling, we might need to consider:
- How metadata will be handled in test scenarios
- Whether metadata should be included in test data generation
- If additional parameters for metadata testing are needed
Consider adding support for metadata handling in test scenarios if required. This could involve:
- Adding metadata-related test parameters
- Implementing metadata extraction in test data generation (a rough sketch follows below)
- Documenting metadata handling behavior
Would you like me to help design the metadata handling solution?
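A very rough sketch of the second point above; the metadata key and helper name are hypothetical (not hermes' actual envelope) and only illustrate separating producer metadata from the business message in test data:

  // split hypothetical producer metadata (e.g. a hermes-style envelope field) from the message fields
  def splitMetadata(params: Map[String, AnyRef]): (Map[String, AnyRef], Map[String, AnyRef]) =
    params.partition { case (key, _) => key.startsWith("__metadata") }

  // usage: val (metadata, messageFields) = splitMetadata(unflattenedParams)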
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- engine/flink/kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/kafka/source/flink/FlinkKafkaSource.scala (1 hunks)
- engine/lite/components/kafka/src/main/scala/pl/touk/nussknacker/engine/lite/components/LiteKafkaSourceImplFactory.scala (1 hunks)
🔇 Additional comments (3)
engine/lite/components/kafka/src/main/scala/pl/touk/nussknacker/engine/lite/components/LiteKafkaSourceImplFactory.scala (1)
Line range hint 52-54
: Consider addressing the TODO comment about kafkaConfig properties.
The PR objectives mention uncertainty about handling metadata from hermes. This TODO comment about kafkaConfig properties might be related. Consider whether additional kafkaConfig properties should be used here for metadata handling.
Let's check for any related configuration handling:
engine/flink/kafka-components-utils/src/main/scala/pl/touk/nussknacker/engine/kafka/source/flink/FlinkKafkaSource.scala (2)
137-137
: Verify test record creation with the new parameter handling
The change to use removedValue in createTestRecord looks correct, but we should ensure it handles all test scenarios properly.
Let's verify the test record creation:
✅ Verification successful
Based on the search results, I can now generate the final response:
Test record creation with removedValue is correctly implemented
The usage of createTestRecord with removedValue is consistent with the Kafka source test pattern across the codebase. The KafkaTestParametersInfo case class defines a type-safe function createTestRecord: Any => TestRecord, and its usage in both Flink and Lite Kafka sources follows the same pattern.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check test record creation implementations
# Search for implementations of createTestRecord
ast-grep --pattern 'def createTestRecord($params) = {
$$$
}'
# Search for test cases using createTestRecord
rg -A 5 "createTestRecord"
Length of output: 23073
128-133
: Improved handling of test data parameters
The changes correctly address the PR objective by removing the ad-hoc JSON object formatting when there's only a "Value" parameter. This aligns with the goal of handling topics without schema more naturally.
Let's verify the test data handling in the codebase:
✅ Verification successful
The verification results show that the parametersToTestData change in FlinkKafkaSource is mirrored in LiteKafkaSourceImplFactory, both implementing the same logic for handling single "Value" parameters. This is a consistent pattern across Kafka-related components. The change is well-contained within these Kafka source implementations, and other source implementations have their own specific parameter handling logic.
Parameter handling change is consistent and properly scoped
The special handling of single "Value" parameter is correctly implemented in both Kafka-related sources while other sources maintain their specific parameter handling logic. This confirms the change is properly scoped to Kafka components and aligns with the goal of handling topics without schema.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for test cases that might be affected by the parameter handling change
# Search for test cases using parametersToTestData
rg -l "parametersToTestData" | while read -r file; do
echo "=== $file ==="
rg -A 5 "parametersToTestData" "$file"
done
# Search for test cases with single "Value" parameter
ast-grep --pattern 'Map($_, $v)' | while read -r match; do
rg -C 5 "Value.*=>" "$(echo "$match" | cut -d: -f1)"
done
Length of output: 9050
Describe your changes
We want to enable users to choose and use topics without schema present on schema registry.
We won't validate the input; it will be passed as Unknown.
Problems
Currently, ad-hoc testing creates the JSON object {"Value": message}, so it behaves differently from when message is put on the topic directly.
Similarly, I'm not sure if we should handle metadata from hermes here, or whether that is managed in the cloud. If we create a topic locally and send messages through hermes, we get the message wrapped together with hermes metadata, when we only need message -> this can be worked around for now by using dynamic access.
Checklist before merge
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests