From f54b32daaa76a3b7787b9b7e4b2c28e252149322 Mon Sep 17 00:00:00 2001
From: James Cancilla
Date: Fri, 19 Jun 2015 15:29:47 -0400
Subject: [PATCH] Doc update

---
 com.ibm.streamsx.hdfs/info.xml | 52 ++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/com.ibm.streamsx.hdfs/info.xml b/com.ibm.streamsx.hdfs/info.xml
index d63f8fa..54d56f4 100644
--- a/com.ibm.streamsx.hdfs/info.xml
+++ b/com.ibm.streamsx.hdfs/info.xml
@@ -13,7 +13,7 @@ The HDFS Toolkit provides operators that can read and write data from Hadoop Dis
 The operators in this toolkit use Hadoop Java APIs to access HDFS and GPFS.
 The operators support the following versions of Hadoop distributions:
     * Apache Hadoop versions 2.x
-    * InfoSphere BigInsights 2.1.2, 3.0.0.x
+    * InfoSphere BigInsights 2.1.2, 3.0.0.x, 4.0.0.0
     * Cloudera distribution including Apache Hadoop version 4 (CDH4) and version 5 (CDH 5)
     * Hortonworks Data Platform (HDP) 2.2
@@ -47,15 +47,20 @@ Alternatively, you can fully qualify the operators that are provided by toolkit
 # Procedure
+# Procedure
+
 1. If InfoSphere Streams has access to the location where Hadoop is installed, set the following environment variables:
     * For Apache HDFS, Cloudera, or Hortonworks Data Platform:
         * Set **HADOOP_HOME** to `Hadoop_Install_Directory`. For example, `/usr/lib/hadoop`.
         * Set **JAVA_HOME** to the location where Java is installed.
-    * For IBM InfoSphere BigInsights:
+    * For IBM InfoSphere BigInsights 3.x:
         * Set **BIGINSIGHTS_HOME** to `BigInsights_Install_Directory`. For example, `/opt/ibm/biginsights`.
         * Set **HADOOP_HOME** to `BigInsights_Install_Directory/IHC`. For example, `/opt/ibm/biginsights/IHC`.
         * Set **JAVA_HOME** to the location where Java is installed.
+    * For IBM InfoSphere BigInsights 4.x:
+        * Set **HADOOP_HOME** to `BigInsights_Install_Directory/hadoop`. For example, `/usr/iop/4.0.0.0/hadoop`.
+        * Set **JAVA_HOME** to the location where Java is installed.
 2. If InfoSphere Streams does not have access to the location where Hadoop is installed, copy the Hadoop library files to a location that is accessible to InfoSphere Streams and set the appropriate environment variables.
@@ -69,13 +74,19 @@ Alternatively, you can fully qualify the operators that are provided by toolkit
             cp -Lr /usr/lib/hadoop /usr/lib/hadoop-hdfs /path-on-cluster
         2. Copy `/usr/lib/hadoop-hdfs` to the InfoSphere Streams cluster and place it in a directory on the cluster, which is accessible to InfoSphere Streams.
-    * For IBM InfoSphere BigInsights:
+    * For IBM InfoSphere BigInsights 3.x:
         1. Copy `BigInsights_Install_Directory/IHC` to the InfoSphere Streams cluster and place it under a directory
            on the cluster, which is accessible to InfoSphere Streams. For example, `/home/Streams/BigInsights_Install_Directory/IHC`.
         2. Copy the`BigInsights_Install_Directory/hadoop-conf` directory to the InfoSphere Streams cluster
            and place it under a directory on the cluster, which is accessible to InfoSphere Streams.
           For example, `/home/Streams/BigInsights_Install_Directory/hadoop-conf`
-    * For IBM InfoSphere BigInsights installed on GPFS:
+    * For IBM InfoSphere BigInsights 4.x:
+        1. Copy `BigInsights_Install_Directory/hadoop` to the InfoSphere Streams cluster and place it under a directory
+           on the cluster, which is accessible to InfoSphere Streams. For example, `/home/Streams/BigInsights_Install_Directory/hadoop`.
+        2. Copy the `BigInsights_Install_Directory/hadoop-hdfs` directory to the InfoSphere Streams cluster
+           and place it under a directory on the cluster, which is accessible to InfoSphere Streams.
+           For example, `/home/Streams/BigInsights_Install_Directory/hadoop-hdfs`
+    * For IBM InfoSphere BigInsights 3.x installed on GPFS:
         * **Important**: If IBM InfoSphere BigInsights is installed on GPFS, you do not need to install InfoSphere Streams on an IBM InfoSphere BigInsights data node.
           Use the `webhdfs://hdfshost:webhdfsport` schema in the URI that you use to connect to GPFS.
@@ -87,21 +98,25 @@ Alternatively, you can fully qualify the operators that are provided by toolkit
           For example, `/home/Streams/BigInsights_Install_Directory/hadoop-conf`
         3. Copy `BigInsights_Install_Directory/lib/biginsights-gpfs.jar` to the InfoSphere Streams cluster
           and place it under a directory on the cluster, which is accessible to InfoSphere Streams.
-          For example, `/home/Streams/BigInsights_Install_Directory`.
+          For example, `/home/Streams/BigInsights_Install_Directory`.
+    * For IBM InfoSphere BigInsights 4.0.0.0 installed on GPFS:
+        * The com.ibm.streamsx.hdfs toolkit does not support remote connections to a BigInsights 4.0.0.0 GPFS cluster.
 
     The following list describes the environment variables to set when Hadoop and IBM InfoSphere BigInsights libraries are copied to location that is accessible to InfoSphere Streams:
     * For Apache HDFS, Cloudera, or Hortonworks Data Platform:
-        * Set **HADOOP_HOME** to `/home/Streams/hadoop`.
+        * Set **HADOOP_HOME** to `/home/Streams/hadoop`.
         * Set **JAVA_HOME** to the location where Java is installed.
-    * For IBM InfoSphere BigInsights:
+    * For IBM InfoSphere BigInsights 3.x:
         * Set **HADOOP_HOME** to `/home/Streams/biginsights/IHC`.
         * Set **BIGINSIGHTS_HOME** to `/home/Streams/biginsights`.
         * Set **JAVA_HOME** to the location where Java is installed.
-    * IBM InfoSphere BigInsights installed on GPFS:
+    * For IBM InfoSphere BigInsights 4.x:
+        * Set **HADOOP_HOME** to `/home/Streams/biginsights/hadoop`.
+        * Set **JAVA_HOME** to the location where Java is installed.
+    * For IBM InfoSphere BigInsights 3.x installed on GPFS:
         * Set **HADOOP_HOME** to `/opt/ibm/biginsights/IHC/`.
         * Set **BIGINSIGHTS_HOME** to `/opt/ibm/biginsights`.
-        * Set **JAVA_HOME** to the location where Java is installed.
-
+        * Set **JAVA_HOME** to the location where Java is installed.
 3. Configure the SPL compiler to find the toolkit root directory.
    Use one of the following methods:
     * Set the **STREAMS_SPLPATH** environment variable to the root directory of a toolkit or multiple toolkits (with : as a separator). For example:
@@ -116,11 +131,11 @@ Alternatively, you can fully qualify the operators that are provided by toolkit
         use com.ibm.streamsx.hdfs::*;
     You can also specify a use clause for individual operators by replacing the asterisk (\*) with the operator name. For example:
         use com.ibm.streamsx.hdfs::HDFS2FileSink;
-5. If IBM InfoSphere BigInsights is installed on GPFS:
+5. If IBM InfoSphere BigInsights 3.x or 4.x is installed on GPFS:
     * To access GPFS locally, set the `fs.defaultFS` option in the `core-site.xml` configuration file to `gpfs:///`.
-    * To access GPFS remotely, modify the `core-site.xml` that you have copied over from the remote system.
+    * To access GPFS remotely (**only applies to BigInsights 3.x**), modify the `core-site.xml` that you have copied over from the remote system.
       Set the `fs.default.FS` option in the `core-site.xml` configuration file to `webhdfs://hdfshost:webhdfsport`.
-      For example, `webhdfs://myhdfshost:14000`.
+      For example, `webhdfs://myhdfshost:14000`.
       Ensure that the user is set up to access the file system by using the webhdfs schema.
 6. To read and write to HDFS, specify a uniform resource identifier (URI) to connect to HDFS.
    You can specify the URI in one of the following ways:
@@ -129,16 +144,17 @@ Alternatively, you can fully qualify the operators that are provided by toolkit
         * `$HADOOP_HOME/../hadoop-conf`
         * `$HADOOP_HOME/etc/hadoop`
         * `$HADOOP_HOME/conf`
-        * `$HADOOP_HOME/share/hadoop/hdfs/*`
-        * `$HADOOP_HOME/share/hadoop/common/*`
-        * `$HADOOP_HOME/share/hadoop/common/lib/*`
-        * `$HADOOP_HOME/lib/*`
-        * `$HADOOP_HOME/*`
+        * `$HADOOP_HOME/share/hadoop/hdfs/\*`
+        * `$HADOOP_HOME/share/hadoop/common/\*`
+        * `$HADOOP_HOME/share/hadoop/common/lib/\*`
+        * `$HADOOP_HOME/lib/\*`
+        * `$HADOOP_HOME/\*`
       Tip: To specify a different location for the HDFS configuration files, set the **configPath** operator parameter.
     * Specify a value for the **hdfsUri** operator parameter.
 7. Build your application. You can use the **sc** command or Streams Studio.
 8. Start the InfoSphere Streams instance.
 9. Run the application. You can submit the application as a job by using the **streamtool submitjob** command or by using Streams Studio.
+
 2.0.0 4.0.0.0
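This patch adds BigInsights 4.x settings to step 1 of the procedure. The following POSIX shell sketch shows one way to apply them under a few assumptions:

* **HADOOP_HOME** defaults to the example path from the patch, `/usr/iop/4.0.0.0/hadoop`. Adjust it for your installation.
* The script does not choose a value for **JAVA_HOME**. It only warns when the variable is unset.
* The loop reports which of the configuration directories listed in step 6 exist. It does not reproduce how the operators choose among those directories.

```shell
#!/bin/sh
# Sketch only: environment for a local BigInsights 4.x installation.
# Assumption: the default below is the example path from the patch; override
# HADOOP_HOME before running if your installation lives elsewhere.
HADOOP_HOME="${HADOOP_HOME:-/usr/iop/4.0.0.0/hadoop}"
export HADOOP_HOME

# The patch only says to set JAVA_HOME to the Java install location,
# so this sketch warns instead of guessing a path.
if [ -z "${JAVA_HOME:-}" ]; then
  echo "warning: JAVA_HOME is not set; set it to the location where Java is installed" >&2
fi

# Report which of the configuration directories listed in step 6 exist.
# This does not model the order in which the operators search them.
for dir in "$HADOOP_HOME/../hadoop-conf" "$HADOOP_HOME/etc/hadoop" "$HADOOP_HOME/conf"; do
  if [ -d "$dir" ]; then
    echo "found:   $dir"
  else
    echo "missing: $dir"
  fi
done
```

Source the script rather than executing it, so that the exported variable stays set in the shell from which you run **sc** or **streamtool submitjob**. For example, use `. ./hdfs-env.sh`, where the file name is hypothetical.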