Releases: IBMStreams/streamsx.hdfs
HDFS for Bluemix Toolkit v2.1.0
This is the first release of the HDFS for Bluemix toolkit and has been tested with the Analytics for Apache Hadoop service and the Streaming Analytics (Streams 4.1) service on Bluemix. It does not include support for information governance.
HDFS Toolkit v3.0.0
This is an official release of the HDFS v3.0.0 toolkit to support IBM Streams v4.1.
Highlights of the release include:
- Support for data governance
- Support for IBM BigInsights 4.1
Bluemix Support via WebHdfs v0.3.2
This provides support for HDFS access on Bluemix. It includes:
- WebHdfsRead and WebHdfsReadFiles for reading files
- WebHdfsWrite for writing files
- WebHdfsDirectoryScan for scanning directories
This release brings:
- samples that can be imported into Streams Studio, plus other sample cleanup
- composite operators use partitionColocation to fuse their operators together
Alpha version of HDFS Toolkit, v2.1
This alpha release adds the ability to read text and sequence files in parallel via the HadoopReader operator. If you are looking for a production release, the latest one is v2.0.0.
Unlike the HDFS2FileSource, which reads either lines or binary blobs, the HadoopReader reads key-value pairs. (For text files, the key is the position in the file.)
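To make the key-value shape concrete for text files, here is a minimal Python sketch. It is not the operator's code; it assumes that "position in the file" means the byte offset at which each line starts (the convention used by Hadoop's TextInputFormat), and the function name `read_key_value_pairs` is hypothetical.

```python
import io


def read_key_value_pairs(stream):
    """Yield (byte_offset, line) pairs from a binary text stream.

    Assumption: the key is the byte offset of the start of each line.
    """
    offset = 0
    for raw in stream:
        # Strip the line terminator from the value, but count it in the offset.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


# Illustrative in-memory "file" with three lines.
pairs = list(read_key_value_pairs(io.BytesIO(b"hello world\nfoo\nbar baz\n")))
for key, value in pairs:
    print(key, value)
```

A line-oriented source such as HDFS2FileSource would emit only the values; the key is the extra piece of information carried by each tuple here.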
When in a parallel region, the HadoopReader reads a portion of the file as determined by its channel. Note that files compressed with unsplittable compression cannot be read in parallel; for such files, only channel 0 will produce any tuples. However, sequence files, text files, and text files compressed with splittable compression (i.e., with bz2) are read in parallel.
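The per-channel behavior described above can be sketched as follows. This is only an illustration under stated assumptions, not the toolkit's actual split logic: it divides the file into equal byte ranges, whereas real Hadoop input splits are typically block-based and aligned to record boundaries. The function name `channel_byte_range` and its parameters are hypothetical.

```python
def channel_byte_range(file_size, channel, max_channels, splittable=True):
    """Return the half-open (start, end) byte range a channel would read.

    Assumption: an even division of the file by channel index. For
    unsplittable input, channel 0 gets the whole file and every other
    channel gets an empty range.
    """
    if max_channels < 1 or not 0 <= channel < max_channels:
        raise ValueError("channel must be in [0, max_channels)")
    if not splittable:
        return (0, file_size) if channel == 0 else (0, 0)
    # Integer arithmetic keeps the ranges contiguous and non-overlapping.
    start = file_size * channel // max_channels
    end = file_size * (channel + 1) // max_channels
    return start, end


# A 100-byte file read by a parallel region of width 3.
ranges = [channel_byte_range(100, c, 3) for c in range(3)]
print(ranges)

# The same file with unsplittable compression: only channel 0 has work.
unsplit = [channel_byte_range(100, c, 3, splittable=False) for c in range(3)]
print(unsplit)
```

In the unsplittable case the empty ranges for channels 1 and 2 correspond to those channels producing no tuples.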
Some limitations of the operator are given here.
The demos/WordCount directory gives an example of using this operator to do word count.
Note that this is a pre-release. The operator interface (and even the name) may change, and there is no guarantee that this will be in the official HDFS toolkit v2.1.0, the next product version, or any future version. The code is in the SequenceFile branch, not the master branch.
HDFS Toolkit v2.0.0
This is an official release of the HDFS v2.0 toolkit to support InfoSphere Streams v4.0.
Highlights of the release include:
- Updates to all operators to support Application Bundle
- Support for consistent region
- Support for InfoSphere BigInsights v3.0.0.2
- Support for Cloudera CDH 5
- Support for Hortonworks HDP 2.2.0
- Support for reading and writing data into HDFS in binary format
- Support for dynamic filenames for the HDFS2FileSink operator
v1.2.0 of HDFS Toolkit for Streams 3.2.1
In this release, we have the following changes:
- HDFS operators are renamed back to HDFS2 operators
- Fixing #31
v1.0 HDFS Toolkit
This is a prerelease of the HDFS toolkit. This release contains:
- a snapshot of the HDFS2* operators from Streams 3.2.1 release
- a fix for issue #15: HDFSFileSink does not flush buffer on job cancel