diff --git a/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2DirectoryScan/HDFS2DirectoryScan.xml b/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2DirectoryScan/HDFS2DirectoryScan.xml
index d3cb81d..0ebe7fb 100644
--- a/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2DirectoryScan/HDFS2DirectoryScan.xml
+++ b/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2DirectoryScan/HDFS2DirectoryScan.xml
@@ -10,18 +10,19 @@ The `HDFS2DirectoryScan` is similar to the `DirectoryScan` operator.
 The `HDFS2DirectoryScan` operator repeatedly scans an HDFS directory and writes the names of new or modified files that are found in the directory to the output port.
 The operator sleeps between scans.
 
-# Consistent Region Behavior
+# Behavior in a consistent region
 
- * The operator can participate in a consistent.
- * The operator can be at the start of a consistent region if there is no input port.
- * The operator supports periodic and operator-driven consistent region policies.
- * If consistent region policy is set as operator driven, the operator initiates a drain after
- each tuple is submitted. This allows for a consistent state to be established after a file is fully processed.
- * If consistent region policy is set as periodic, the operator respects the period setting
- and establishes consistent states accordingly.
- This means that multiple files can be processed before a consistent state is established.
- * At checkpoint, the operator saves the last submitted filename and its modification timestamp to the checkpoint.
- * Upon application failures, the operator resubmits all files that are newer than the last submitted file at checkpoint.
+ The `HDFS2DirectoryScan` operator can participate in a consistent region.
+ The operator can be at the start of a consistent region if there is no input port.
+ The operator supports periodic and operator-driven consistent region policies.
+
+ If the consistent region policy is set as operator driven, the operator initiates a drain after each tuple is submitted.
+ This allows for a consistent state to be established after a file is fully processed.
+ If the consistent region policy is set as periodic, the operator respects the period setting and establishes consistent states accordingly.
+ This means that multiple files can be processed before a consistent state is established.
+
+ At checkpoint, the operator saves the name and modification timestamp of the last submitted file.
+ Upon application failure, the operator resubmits all files that are newer than the last submitted file recorded at checkpoint.
 
 # Exceptions
 
diff --git a/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2FileSink/HDFS2FileSink.xml b/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2FileSink/HDFS2FileSink.xml
index 6747117..2f5a0b4 100644
--- a/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2FileSink/HDFS2FileSink.xml
+++ b/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2FileSink/HDFS2FileSink.xml
@@ -12,9 +12,9 @@ You can optionally control whether the operator closes the current output file a
 of the file in bytes, the number of tuples that are written to the file, or the time in seconds that the file is open for writing, or when the operator receives a punctuation marker.
 
-# Consistent Region Behavior
+# Behavior in a consistent region
 
-The `HDFS2FileSink` operator supports consistent region.
+The `HDFS2FileSink` operator can participate in a consistent region.
 The operator can be part of a consistent region, but cannot be at the start of a consistent region.
 The operator guarantees that tuples are written to a file in HDFS at least once, but duplicated tuples can be written to the file if application failure occurs.
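The at-least-once guarantee described for `HDFS2FileSink` can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the operator's code: the class `ToyRegion` and its methods are hypothetical, and the model assumes that tuples the sink wrote after the last consistent state stay in the file when the region resets. Replaying those tuples writes them a second time, while every tuple still reaches the file at least once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRegion:
    """Toy model of a consistent region: a replayable source feeding a sink
    that appends to a file. Hypothetical; not the toolkit's implementation."""

    tuples: list[str]                                  # what the source can submit, in order
    written: list[str] = field(default_factory=list)   # contents of the sink's output file
    next_index: int = 0                                # next tuple the source will submit
    checkpoint_index: int = 0                          # source position at the last consistent state

    def process(self, count: int) -> None:
        # The source submits `count` tuples and the sink appends each one to the file.
        for _ in range(count):
            self.written.append(self.tuples[self.next_index])
            self.next_index += 1

    def establish_consistent_state(self) -> None:
        # Drain and checkpoint: remember how far the source has progressed.
        self.checkpoint_index = self.next_index

    def fail_and_reset(self) -> None:
        # The region resets to the last consistent state. Assumption: tuples that the
        # sink already wrote after that state stay in the file, so replaying them
        # writes them a second time.
        self.next_index = self.checkpoint_index


region = ToyRegion(tuples=["t1", "t2", "t3", "t4"])
region.process(2)                    # file: t1, t2
region.establish_consistent_state()  # consistent state after t2
region.process(1)                    # file: t1, t2, t3
region.fail_and_reset()              # failure before the next consistent state
region.process(2)                    # source replays t3, then continues with t4
print(region.written)                # ['t1', 't2', 't3', 't3', 't4']
```

In this run, `t3` appears twice because it was written after the consistent state and then replayed.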
diff --git a/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2FileSource/HDFS2FileSource.xml b/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2FileSource/HDFS2FileSource.xml
index ade315f..b9c28fa 100644
--- a/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2FileSource/HDFS2FileSource.xml
+++ b/com.ibm.streamsx.hdfs/com.ibm.streamsx.hdfs/HDFS2FileSource/HDFS2FileSource.xml
@@ -11,22 +11,22 @@ The operator opens a file on HDFS and sends out its contents in tuple format on
 
 If the optional input port is not specified, the operator reads the HDFS file that is specified in the **file** parameter and provides the file contents on the output port.
 If the optional input port is configured, the operator reads the files that are named by the attribute in the tuples that arrive on its input port and places a punctuation marker between each file.
+
+# Behavior in a consistent region
 
-# Consistent Region Behavior
+The `HDFS2FileSource` operator can participate in a consistent region.
+The operator can be at the start of a consistent region if there is no input port.
 
- * The operator can participate in a consistent.
- * The operator can be at the start of a consistent region if there is no input port.
- * The operator supports periodic and operator-driven consistent region policies.
- * If consistent region policy is set as operator driven,
- the operator initiates a drain after a file is fully read.
- * If consistent region policy is set as periodic, the operator respects the period setting
- and establishes consistent states accordingly.
- This means that multiple consistent states can be established before a file is fully read.
- * At checkpoint, the operator saves the current file name and file cursor location.
- * If the operator does not have an input port, upon application failures, the operator resets
- the file cursor back to the checkpointed location, and starts replaying tuples from the cursor location.
- * If the operator has an input port and is in a consistent region, the operator relies on its upstream operators
- to properly replay the filenames for it to re-read the files from the beginning.
+The operator supports periodic and operator-driven consistent region policies.
+If the consistent region policy is set as operator driven, the operator initiates a drain after a file is fully read.
+If the consistent region policy is set as periodic, the operator respects the period setting and establishes consistent states accordingly.
+This means that multiple consistent states can be established before a file is fully read.
+
+At checkpoint, the operator saves the current file name and file cursor location.
+If the operator does not have an input port, then upon application failure the operator resets
+the file cursor to the checkpointed location and replays tuples from that location.
+If the operator has an input port and is in a consistent region, the operator relies on its upstream operators
+to replay the file names so that it can re-read the files from the beginning.
 
 # Exceptions
 
@@ -129,7 +129,7 @@ The following example shows how the operator accesses GPFS remotely and reads a
 file
-This parameter specifies the name of file that the operator opens and reads.
+This parameter specifies the name of the file that the operator opens and reads.
 This parameter must be specified when the optional input port is not configured.
 If the optional input port is used and the file name is specified, the operator generates an error.
 true
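The checkpoint and replay behavior described for `HDFS2FileSource` without an input port can be sketched with a toy model. This is a hypothetical illustration, not the operator's implementation. `ToyFileSource` and `Checkpoint` are illustrative names, an in-memory `io.BytesIO` stands in for the HDFS file, each line is treated as one tuple, and the resets of downstream operators are not modeled. The checkpoint holds the file name and cursor location, and a reset seeks back to that cursor, so tuples read after the checkpoint are submitted again.

```python
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkpoint:
    file_name: str
    cursor: int  # byte offset of the next unread line


class ToyFileSource:
    """Toy model of a file source without an input port.
    Hypothetical; not the toolkit's implementation."""

    def __init__(self, file_name: str, data: bytes) -> None:
        self.file_name = file_name
        self.stream = io.BytesIO(data)  # stands in for the HDFS file

    def read_tuple(self) -> Optional[str]:
        # One tuple per line; None at end of file.
        line = self.stream.readline()
        return line.decode().rstrip("\n") if line else None

    def checkpoint(self) -> Checkpoint:
        # Save the current file name and cursor location.
        return Checkpoint(self.file_name, self.stream.tell())

    def reset(self, saved: Checkpoint) -> None:
        # Move the cursor back to the checkpointed location.
        if saved.file_name != self.file_name:
            raise ValueError("checkpoint refers to a different file")
        self.stream.seek(saved.cursor)


source = ToyFileSource("input.txt", b"a\nb\nc\n")
submitted = [source.read_tuple()]        # "a"
saved = source.checkpoint()              # cursor now points at "b"
submitted.append(source.read_tuple())    # "b", not yet covered by a consistent state
source.reset(saved)                      # failure: return to the checkpoint
while (t := source.read_tuple()) is not None:
    submitted.append(t)                  # replays "b", then reads "c"
print(submitted)                         # ['a', 'b', 'b', 'c']
```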