Starting an HDFS cluster on EC2 with Provisionr and Rundeck
We've made a short video illustrating this capability; you can watch it on YouTube.
A few background steps are needed to make the integration work. First, before creating an AWS pool through Provisionr, you need to set your credentials. There are two ways to do this. The first is through the console, as illustrated in the main readme:
$ ./bin/provisionr
provisionr [0.0.1-SNAPSHOT] $ config:edit com.axemblr.provisionr.amazon
provisionr [0.0.1-SNAPSHOT] $ config:proplist
service.pid = com.axemblr.provisionr.amazon
secretKey = secret
felix.fileinstall.filename = file:[...]/etc/com.axemblr.provisionr.amazon.cfg
region = us-east-1
accessKey = access
provisionr [0.0.1-SNAPSHOT] $ config:propset accessKey "XXXXXXX"
provisionr [0.0.1-SNAPSHOT] $ config:propset secretKey "XXXXXXX"
provisionr [0.0.1-SNAPSHOT] $ config:update
provisionr [0.0.1-SNAPSHOT] $ config:list "(service.pid=com.axemblr.provisionr.amazon)"
Alternatively, edit etc/com.axemblr.provisionr.amazon.cfg directly in the unzipped distribution directory after starting the service.
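As a rough sketch of the file-based route, the snippet below writes the three settings shown in the config:proplist output above. The PROVISIONR_HOME variable and its default location are hypothetical, the key values are placeholders, and the snippet overwrites the file rather than merging into it, so adapt it to your own setup.

```shell
# Hypothetical install location; point this at your unzipped distribution.
PROVISIONR_HOME="${PROVISIONR_HOME:-$PWD/provisionr}"
CFG="$PROVISIONR_HOME/etc/com.axemblr.provisionr.amazon.cfg"

mkdir -p "$(dirname "$CFG")"

# Keys mirror the config:proplist output above; values are placeholders.
# Note: this replaces the whole file.
cat > "$CFG" <<'EOF'
accessKey = XXXXXXX
secretKey = XXXXXXX
region = us-east-1
EOF

# The file holds credentials, so keep it readable by the owner only.
chmod 600 "$CFG"
```

You can then run config:list "(service.pid=com.axemblr.provisionr.amazon)" in the console, as in the session above, to check that the values were picked up.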
The machines in the pool can be automatically provisioned according to a template of your choice. For this demo, we used CDH4 packages defined in this template.
In Rundeck, you need a project with its endpoint set to the dedicated URL exposed by Provisionr: http://localhost:8181/rundeck/machines.xml. The configuration job takes a parameter named namenodePrivateHostname, set to any hostname in the pool; that machine will be set up as the namenode. The job has only one workflow step: a bash script that runs the actual configuration on each node. You can see the source for the script in this gist.
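To illustrate how a single script can configure every node, here is a minimal sketch of the role decision only. It is not the script from the gist. It assumes that Rundeck exposes the job option to the script step as the environment variable RD_OPTION_NAMENODEPRIVATEHOSTNAME and that the option value matches what `hostname -f` reports on the chosen machine. The node_role function is a hypothetical helper.

```shell
# Sketch only: the real configuration script lives in the gist.

# Print the role for a node.
#   $1 = hostname chosen as the namenode
#   $2 = hostname of the node running this script
node_role() {
  if [ "$2" = "$1" ]; then
    echo "namenode"
  else
    echo "datanode"
  fi
}

# Assumption: Rundeck passes the job option in this environment variable.
NAMENODE="${RD_OPTION_NAMENODEPRIVATEHOSTNAME:-}"
if [ -z "$NAMENODE" ]; then
  echo "warning: namenodePrivateHostname is not set" >&2
fi

# Fall back to "localhost" if the fully qualified name cannot be resolved.
THIS_HOST="$(hostname -f 2>/dev/null || echo localhost)"
ROLE="$(node_role "$NAMENODE" "$THIS_HOST")"
echo "configuring $THIS_HOST as $ROLE (namenode: $NAMENODE)"
```

Because every node runs the same script and only the comparison differs, the job needs no per-node configuration beyond the one parameter.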
If you want, you can download a prebuilt project that does all this from here.