This repository has been archived by the owner on Apr 4, 2019. It is now read-only.

Do I have to set up HDFS in order to use streamX? #60

Open
iShiBin opened this issue Jul 25, 2018 · 1 comment

Comments


iShiBin commented Jul 25, 2018

I noticed that I have to edit the Hadoop config files, like core-site.xml and hdfs-site.xml, to configure S3, but I could not find the mentioned config/hadoop-conf directory in my installation (Kafka 0.10.2.0). So do I have to use HDFS in order to use streamX?

What I am trying to do is transform messages in JSON format to Parquet and then store them in S3.

Using Spark could achieve this, but it would require a long-running cluster; alternatively, I could use checkpointing to run a basic per-day ETL.


OneCricketeer commented Feb 11, 2019

> I could not find the mentioned config/hadoop-conf directory in my installation (Kafka 0.10.2.0).

Kafka is not a Hadoop project, which is why you will not find that directory there; you have to create the folder yourself. An EMR instance, or another EC2 machine provisioned with Hadoop, would already have it.
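For reference, a minimal core-site.xml inside that directory might look like the sketch below, assuming the Hadoop S3A connector is available on your classpath; the property names are standard Hadoop S3A settings, and the credential values are placeholders.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Placeholder S3A credentials; prefer an IAM role or a
       credential provider over hard-coded keys where possible -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_AWS_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_AWS_SECRET_KEY</value>
  </property>
</configuration>
```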

> So do I have to use HDFS in order to use streamX?

Not exactly, but you do need a Hadoop-compatible filesystem (which S3 is).

Since this project uses the Hadoop FileSystem API, you just need to point it at a configuration directory containing those XML files.
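As a sketch of what that looks like, here is a connector properties file pointing at such a directory. The connector class and property names (s3.url, hadoop.conf.dir) follow the streamX README, but verify them against your version; the topic, bucket, and paths are placeholders.

```properties
name=s3-sink
connector.class=com.qubole.streamx.s3.S3SinkConnector
format.class=com.qubole.streamx.SourceFormat
tasks.max=1
topics=json-events
flush.size=1000
s3.url=s3://your-bucket/topics
hadoop.conf.dir=/etc/streamx/hadoop-conf
```

For the JSON-to-Parquet goal, you would swap format.class for a Parquet format if your build exposes one; the upstream HDFS connector ships io.confluent.connect.hdfs.parquet.ParquetFormat, which requires records with schemas, so schemaless JSON would need a converter that supplies a schema.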

> Using Spark could achieve this, but it would require a long-running cluster

Kafka Connect consumers are also typically long-running, operating as part of a cluster / consumer group.
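For example, a Connect cluster is normally started with the distributed-mode script that ships with Kafka, and connectors are then submitted over its REST API (port 8083 by default); s3-sink.json below is a hypothetical file wrapping a connector config in JSON.

```bash
# Start a distributed Connect worker (run one per node of the cluster)
bin/connect-distributed.sh config/connect-distributed.properties

# Submit the sink connector to the running cluster via the REST API;
# s3-sink.json is a placeholder containing {"name": ..., "config": {...}}
curl -X POST -H "Content-Type: application/json" \
     --data @s3-sink.json http://localhost:8083/connectors
```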
