Use example with Hadoop 2.0.0-cdh4.2.1 #5

Open

WimOckham opened this issue Oct 16, 2013 · 0 comments
We tried the example with the following software:

  • Hadoop 2.0.0-cdh4.2.1
  • Hive 0.10.0-cdh4.2.1
  • Flume 1.3.0-cdh4.2.1
  • Oozie 3.3.0-cdh4.2.1

The description of the example stresses the use of MySQL. By default, Hadoop 2.0.0-cdh4.2.1 is installed with PostgreSQL for the Hive metastore and Derby for Oozie, and both work with no problem for this example.

We didn’t need to install Flume manually either. In Cloudera Manager you can add Flume as a service. On the service’s page you can paste the content of flume.conf under Configuration – Agent (Base), and on the same page you can set the agent name to TwitterAgent. When you put flume-sources-1.0-SNAPSHOT.jar in /usr/share/cmf/lib/plugins/, the jar is added to FLUME_CLASSPATH in /var/run/cloudera-scm-agent/process/-flume-AGENT/flume-env.sh when the service is started.
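
For reference, this is roughly the flume.conf content we pasted into Agent (Base), following the layout of the example’s flume.conf (the OAuth credentials, keywords and HDFS path below are placeholders, not our real values):

    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS

    # custom source from flume-sources-1.0-SNAPSHOT.jar
    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = <consumer key>
    TwitterAgent.sources.Twitter.consumerSecret = <consumer secret>
    TwitterAgent.sources.Twitter.accessToken = <access token>
    TwitterAgent.sources.Twitter.accessTokenSecret = <access token secret>
    TwitterAgent.sources.Twitter.keywords = hadoop, bigdata, analytics

    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100

    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://<namenode>:8020/user/flume/tweets/%Y/%m/%d/%H/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000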

However, one issue prevented us from using this service for the example. You have to add com.cloudera.flume.source.TwitterSource to flume.plugin.classes in flume-site.xml, otherwise you get a ClassNotFound error. We haven’t found a way to do this via Cloudera Manager. When the service is started, a directory /var/run/cloudera-scm-agent/process/-flume-AGENT is created which includes flume-site.xml; every restart via Cloudera Manager creates a new directory with a different number. But after changing flume-site.xml in such a directory you can use it to start Flume from the command line.
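
For completeness, the property we added to flume-site.xml in that process directory looks like this:

    <configuration>
      <property>
        <name>flume.plugin.classes</name>
        <value>com.cloudera.flume.source.TwitterSource</value>
      </property>
    </configuration>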

Concerning the custom Flume source, it’s probably best to build it with the right values for hadoop.version (in our case 2.0.0-cdh4.2.1) and flume.version (1.3.0-cdh4.2.1) in pom.xml.
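
Assuming the flume-sources pom.xml declares these versions as Maven properties, the change is roughly:

    <properties>
      <hadoop.version>2.0.0-cdh4.2.1</hadoop.version>
      <flume.version>1.3.0-cdh4.2.1</flume.version>
    </properties>

After that, mvn package in the flume-sources directory rebuilds flume-sources-1.0-SNAPSHOT.jar against the CDH versions.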

We had some trouble with the time zone. In our case the timezone in coord-app.xml in oozie-workflows had to be changed to "Europe/Amsterdam", and tzOffset in job.properties had to be changed to 1; otherwise we got a mismatch between the directory in the WFINPUT parameter and the DATEHOUR parameter in the Action Configuration of Oozie (viewed via the Oozie Web Console).
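
Concretely, the two changes look like this (the coordinator name, frequency, start and end below are placeholders):

    <coordinator-app name="tweets-coord" frequency="${coord:hours(1)}"
                     start="${jobStart}" end="${jobEnd}"
                     timezone="Europe/Amsterdam"
                     xmlns="uri:oozie:coordinator:0.1">
      ...
    </coordinator-app>

and in job.properties:

    tzOffset=1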

We didn’t need to install the Oozie ShareLib in HDFS.

We used the Hue File Browser to create the necessary directories in HDFS.
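
The same directories can also be created from the command line; the paths and ownership below are just an example and should match the HDFS path in flume.conf and the location of the Hive table:

    sudo -u hdfs hadoop fs -mkdir /user/flume
    sudo -u hdfs hadoop fs -mkdir /user/flume/tweets
    sudo -u hdfs hadoop fs -chown -R flume:flume /user/flume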

It turned out that each time a Hive session is started, the ADD JAR command from the example has to be executed again.
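
One way to avoid retyping it is to put the ADD JAR line in a .hiverc file, which the Hive CLI runs at startup; the jar path below is just an example and should point to wherever you put the example’s SerDe jar:

    -- ~/.hiverc
    ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;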
