Doc update
James Cancilla committed Jun 19, 2015
1 parent 1753326 commit f54b32d
Showing 1 changed file with 34 additions and 18 deletions.
52 changes: 34 additions & 18 deletions com.ibm.streamsx.hdfs/info.xml
@@ -13,7 +13,7 @@ The HDFS Toolkit provides operators that can read and write data from Hadoop Dis

The operators in this toolkit use Hadoop Java APIs to access HDFS and GPFS. The operators support the following versions of Hadoop distributions:
* Apache Hadoop versions 2.x
* InfoSphere BigInsights 2.1.2, 3.0.0.x
* InfoSphere BigInsights 2.1.2, 3.0.0.x, 4.0.0.0
* Cloudera distribution including Apache Hadoop version 4 (CDH4) and version 5 (CDH 5)
* Hortonworks Data Platform (HDP) 2.2

@@ -47,15 +47,20 @@ Alternatively, you can fully qualify the operators that are provided by toolkit

# Procedure

# Procedure

1. If InfoSphere Streams has access to the location where Hadoop is installed,
set the following environment variables:
* For Apache HDFS, Cloudera, or Hortonworks Data Platform:
* Set **HADOOP_HOME** to `Hadoop_Install_Directory`. For example, `/usr/lib/hadoop`.
* Set **JAVA_HOME** to the location where Java is installed.
* For IBM InfoSphere BigInsights:
* For IBM InfoSphere BigInsights 3.x:
* Set **BIGINSIGHTS_HOME** to `BigInsights_Install_Directory`. For example, `/opt/ibm/biginsights`.
* Set **HADOOP_HOME** to `BigInsights_Install_Directory/IHC`. For example, `/opt/ibm/biginsights/IHC`.
* Set **JAVA_HOME** to the location where Java is installed.
* For IBM InfoSphere BigInsights 4.x:
* Set **HADOOP_HOME** to `BigInsights_Install_Directory/hadoop`. For example, `/usr/iop/4.0.0.0/hadoop`.
* Set **JAVA_HOME** to the location where Java is installed.
2. If InfoSphere Streams does not have access to the location where Hadoop is installed,
copy the Hadoop library files to a location that is accessible to InfoSphere Streams
and set the appropriate environment variables.
@@ -69,13 +74,19 @@ Alternatively, you can fully qualify the operators that are provided by toolkit
cp -Lr /usr/lib/hadoop /usr/lib/hadoop-hdfs /path-on-cluster
2. Copy `/usr/lib/hadoop-hdfs` to the InfoSphere Streams cluster and place it in a directory on the cluster,
which is accessible to InfoSphere Streams.
* For IBM InfoSphere BigInsights:
* For IBM InfoSphere BigInsights 3.x:
1. Copy `BigInsights_Install_Directory/IHC` to the InfoSphere Streams cluster and place it under a directory
on the cluster, which is accessible to InfoSphere Streams. For example, `/home/Streams/BigInsights_Install_Directory/IHC`.
2. Copy the `BigInsights_Install_Directory/hadoop-conf` directory to the InfoSphere Streams cluster
and place it under a directory on the cluster, which is accessible to InfoSphere Streams.
For example, `/home/Streams/BigInsights_Install_Directory/hadoop-conf`
* For IBM InfoSphere BigInsights installed on GPFS:
* For IBM InfoSphere BigInsights 4.x:
1. Copy `BigInsights_Install_Directory/hadoop` to the InfoSphere Streams cluster and place it under a directory
on the cluster, which is accessible to InfoSphere Streams. For example, `/home/Streams/BigInsights_Install_Directory/hadoop`.
2. Copy the `BigInsights_Install_Directory/hadoop-hdfs` directory to the InfoSphere Streams cluster
and place it under a directory on the cluster, which is accessible to InfoSphere Streams.
For example, `/home/Streams/BigInsights_Install_Directory/hadoop-hdfs`
* For IBM InfoSphere BigInsights 3.x installed on GPFS:
* **Important**: If IBM InfoSphere BigInsights is installed on GPFS, you do not need to install InfoSphere Streams
on an IBM InfoSphere BigInsights data node. Use the `webhdfs://hdfshost:webhdfsport` scheme in the URI
that you use to connect to GPFS.
@@ -87,21 +98,25 @@ Alternatively, you can fully qualify the operators that are provided by toolkit
For example, `/home/Streams/BigInsights_Install_Directory/hadoop-conf`
3. Copy `BigInsights_Install_Directory/lib/biginsights-gpfs.jar` to the InfoSphere Streams cluster
and place it under a directory on the cluster, which is accessible to InfoSphere Streams.
For example, `/home/Streams/BigInsights_Install_Directory`.
For example, `/home/Streams/BigInsights_Install_Directory`.
* For IBM InfoSphere BigInsights 4.0.0.0 installed on GPFS:
* The com.ibm.streamsx.hdfs toolkit does not support remote connections to a BigInsights 4.0.0.0 GPFS cluster.
The following list describes the environment variables to set when Hadoop and IBM InfoSphere BigInsights
libraries are copied to a location that is accessible to InfoSphere Streams:
* For Apache HDFS, Cloudera, or Hortonworks Data Platform:
* Set **HADOOP_HOME** to `/home/Streams/hadoop`.
* Set **HADOOP_HOME** to `/home/Streams/hadoop`.
* Set **JAVA_HOME** to the location where Java is installed.
* For IBM InfoSphere BigInsights:
* For IBM InfoSphere BigInsights 3.x:
* Set **HADOOP_HOME** to `/home/Streams/biginsights/IHC`.
* Set **BIGINSIGHTS_HOME** to `/home/Streams/biginsights`.
* Set **JAVA_HOME** to the location where Java is installed.
* IBM InfoSphere BigInsights installed on GPFS:
* For IBM InfoSphere BigInsights 4.x:
* Set **HADOOP_HOME** to `/home/Streams/biginsights/hadoop`.
* Set **JAVA_HOME** to the location where Java is installed.
* IBM InfoSphere BigInsights 3.x installed on GPFS:
* Set **HADOOP_HOME** to `/opt/ibm/biginsights/IHC/`.
* Set **BIGINSIGHTS_HOME** to `/opt/ibm/biginsights`.
* Set **JAVA_HOME** to the location where Java is installed.

* Set **JAVA_HOME** to the location where Java is installed.
3. Configure the SPL compiler to find the toolkit root directory. Use one of the following methods:
* Set the **STREAMS_SPLPATH** environment variable to the root directory of a toolkit
or multiple toolkits (with : as a separator). For example:
@@ -116,11 +131,11 @@ Alternatively, you can fully qualify the operators that are provided by toolkit
use com.ibm.streamsx.hdfs::*;
You can also specify a use clause for individual operators by replacing the asterisk (\*) with the operator name. For example:
use com.ibm.streamsx.hdfs::HDFS2FileSink;
5. If IBM InfoSphere BigInsights is installed on GPFS:
5. If IBM InfoSphere BigInsights 3.x or 4.x is installed on GPFS:
* To access GPFS locally, set the `fs.defaultFS` option in the `core-site.xml` configuration file to `gpfs:///`.
* To access GPFS remotely, modify the `core-site.xml` that you have copied over from the remote system.
* To access GPFS remotely (**only applies to BigInsights 3.x**), modify the `core-site.xml` that you have copied over from the remote system.
Set the `fs.defaultFS` option in the `core-site.xml` configuration file to `webhdfs://hdfshost:webhdfsport`.
For example, `webhdfs://myhdfshost:14000`.
For example, `webhdfs://myhdfshost:14000`.
Ensure that the user is set up to access the file system by using the webhdfs scheme.
6. To read and write to HDFS, specify a uniform resource identifier (URI) to connect to HDFS.
You can specify the URI in one of the following ways:
@@ -129,16 +144,17 @@ Alternatively, you can fully qualify the operators that are provided by toolkit
* `$HADOOP_HOME/../hadoop-conf`
* `$HADOOP_HOME/etc/hadoop`
* `$HADOOP_HOME/conf`
* `$HADOOP_HOME/share/hadoop/hdfs/*`
* `$HADOOP_HOME/share/hadoop/common/*`
* `$HADOOP_HOME/share/hadoop/common/lib/*`
* `$HADOOP_HOME/lib/*`
* `$HADOOP_HOME/*`
* `$HADOOP_HOME/share/hadoop/hdfs/\*`
* `$HADOOP_HOME/share/hadoop/common/\*`
* `$HADOOP_HOME/share/hadoop/common/lib/\*`
* `$HADOOP_HOME/lib/\*`
* `$HADOOP_HOME/\*`
Tip: To specify a different location for the HDFS configuration files, set the **configPath** operator parameter.
* Specify a value for the **hdfsUri** operator parameter.
7. Build your application. You can use the **sc** command or Streams Studio.
8. Start the InfoSphere Streams instance.
9. Run the application. You can submit the application as a job by using the **streamtool submitjob** command or by using Streams Studio.

</description>
<version>2.0.0</version>
<requiredProductVersion>4.0.0.0</requiredProductVersion>
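The updated procedure sets **HADOOP_HOME** differently for BigInsights 3.x (`BigInsights_Install_Directory/IHC`) and 4.x (`BigInsights_Install_Directory/hadoop`). A minimal shell sketch of that choice might look like the following. The helper name `hadoop_home_for` is hypothetical and not part of the toolkit, and the `JAVA_HOME` fallback path is only a placeholder; the example install directories are the ones quoted in the documentation text.

```shell
#!/bin/sh
# Hypothetical helper (not part of the toolkit): print the HADOOP_HOME value
# that the updated documentation describes for a BigInsights major version.
hadoop_home_for() {
    version="$1"      # BigInsights major version: 3 or 4
    install_dir="$2"  # BigInsights_Install_Directory
    case "$version" in
        3) printf '%s\n' "$install_dir/IHC" ;;
        4) printf '%s\n' "$install_dir/hadoop" ;;
        *) echo "unsupported BigInsights version: $version" >&2; return 1 ;;
    esac
}

# BigInsights 4.x, using the example install directory from the docs.
HADOOP_HOME="$(hadoop_home_for 4 /usr/iop/4.0.0.0)"
export HADOOP_HOME

# JAVA_HOME must point at a real Java installation; this default is a placeholder.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java}"

echo "HADOOP_HOME=$HADOOP_HOME"
```

For BigInsights 3.x the documentation additionally asks for **BIGINSIGHTS_HOME**, which would be exported alongside the value returned by `hadoop_home_for 3 /opt/ibm/biginsights`.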
