Add HDFS StorageBackend implementation #583

Open: wants to merge 4 commits into base: main
tigrulya-exe

This PR adds an HDFS implementation of StorageBackend. It also supports Kerberos authentication with a user-provided keytab, and asynchronous metric collection based on the HDFS client's file system statistics.

Users can provide HDFS client configuration in two ways: either by using traditional XML files, specifying their location in the hdfs.core-site.path and hdfs.hdfs-site.path options, or by passing the configuration options as regular Kafka options with the hdfs.conf. prefix.
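To illustrate the second option, here is a minimal sketch of how options carrying the `hdfs.conf.` prefix could be separated from the rest of the plugin configuration and turned into plain Hadoop client properties. It is not the code from this PR: the class `HdfsConfSketch` and the method `extractHadoopOptions` are hypothetical names, the file paths and host name are placeholders, and any outer prefix Kafka may require for plugin options is left out. `fs.defaultFS` and `dfs.replication` are ordinary Hadoop property names used only as sample values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from the PR.
final class HdfsConfSketch {
    // Prefix named in the PR description.
    static final String HDFS_CONF_PREFIX = "hdfs.conf.";

    // Collects every option that starts with "hdfs.conf." and strips the prefix,
    // leaving plain Hadoop property names mapped to string values.
    static Map<String, String> extractHadoopOptions(Map<String, ?> pluginConfig) {
        Map<String, String> hadoopOptions = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : pluginConfig.entrySet()) {
            String key = entry.getKey();
            boolean hasPrefix = key.startsWith(HDFS_CONF_PREFIX)
                    && key.length() > HDFS_CONF_PREFIX.length();
            if (hasPrefix && entry.getValue() != null) {
                String hadoopKey = key.substring(HDFS_CONF_PREFIX.length());
                hadoopOptions.put(hadoopKey, String.valueOf(entry.getValue()));
            }
        }
        return hadoopOptions;
    }

    public static void main(String[] args) {
        Map<String, Object> pluginConfig = new LinkedHashMap<>();
        // Option 1: point at the traditional XML files (placeholder paths).
        pluginConfig.put("hdfs.core-site.path", "/etc/hadoop/conf/core-site.xml");
        pluginConfig.put("hdfs.hdfs-site.path", "/etc/hadoop/conf/hdfs-site.xml");
        // Option 2: pass Hadoop properties inline with the hdfs.conf. prefix.
        pluginConfig.put("hdfs.conf.fs.defaultFS", "hdfs://namenode.example:8020");
        pluginConfig.put("hdfs.conf.dfs.replication", 3);

        // Only the two prefixed entries survive, with the prefix removed.
        System.out.println(extractHadoopOptions(pluginConfig));
    }
}
```

In a real backend, the resulting map would presumably be applied on top of a Hadoop `Configuration` object after the XML files are loaded. How the two sources are ordered is a design choice this sketch does not settle.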

tigrulya-exe requested a review from a team as a code owner on August 27, 2024, 14:47
jeqo (Contributor) commented Sep 13, 2024

Thanks @tigrulya-exe! This looks like a great addition, with quite complete coverage of the storage back-end. However, I'm hesitant to move forward with the review, as I lack the HDFS experience to be useful on anything apart from the API usage.
I'd like to leave this PR open in the meantime to gather feedback and let others chime in on how to proceed with adding a new back-end.

There is also still some work on the project that we would like to prioritize before on-boarding a new back-end: preparing for Tiered Storage becoming prod-ready in 3.9 or later, adding docs and a release process, etc.

One alternative while this is open for discussion is to point to your fork (or to a separate repo with just HDFS) from our README, to let users know there's an HDFS implementation.

Let me know wdyt, and thanks again for your contribution!

tigrulya-exe (Author)

@jeqo Hi! Thank you for the feedback! I think it's a nice idea to point to our fork with the HDFS storage implementation from your README while this PR is open for discussion :) I don't think we need to create a separate repository just for HDFS, as it could complicate porting future features from the main repository.

2 participants