Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[admin-tool] Add support to dump UPDATE records and RMD records in Kafka Topic Dumper #831

Merged
merged 4 commits into from
Jan 29, 2024

Conversation

sixpluszero
Copy link
Contributor

@sixpluszero sixpluszero commented Jan 25, 2024

[admin-tool] Add support to dump UPDATE records and RMD records in Kafka Topic Dumper

This PR adds new mode to print data record (PUT/DELETE/UPDATE) as well as its corresponding RMD record to console, on top of its existing functionality of logging Kafka message metadata to console. It is very useful for debugging feature correctness.

In the future, we can consider adding a key filter to log specific key only, to better focus on certain key.

How was this PR tested?

Added new unit test to cover the message build up

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

ZacAttack
ZacAttack previously approved these changes Jan 27, 2024
Copy link
Contributor

@ZacAttack ZacAttack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@ZacAttack
Copy link
Contributor

ZacAttack commented Jan 27, 2024

One thing occurs to me though. This will dump the content of fields. That is useful I suppose, but I wonder if we should by default add some screening to the output to avoid data privacy violations.

Like, it's probably ok to display nulls or NoOps, but anything else we should probably screen with ###### with an optional flag to display the data if it's truly necessary.

@sixpluszero
Copy link
Contributor Author

One thing occurs to me though. This will dump the content of fields. That is useful I suppose, but I wonder if we should by default add some screening to the output to avoid data privacy violations.

Like, it's probably ok to display nulls or NoOps, but anything else we should probably screen with ###### with an optional flag to display the data if it's truly necessary.

Hmm, my understanding is first the existing mode already supports dumping the whole data into files, what I added here is only to extend the functionality to be easier to use. So I mean I don't make it more or less safe compared to master branch.
Also, when we are using the dumper, we are already guarded by ACL. The optional flag you suggest, I think it does not add extra safety unless the option is also guarded with permission. It really depends on the purpose of the user. We could think of ways to make it more secure, but I think..it could happen in a future PR I suppose? To me for now it looks more like query.sh script...

@sixpluszero sixpluszero merged commit 08e5581 into linkedin:main Jan 29, 2024
32 checks passed
@sixpluszero sixpluszero deleted the topicDumpRMD branch April 2, 2024 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants