Merge pull request #693 from AbsaOSS/release/2.7.3
Release Cobrix v2.7.3
yruslan authored Jul 17, 2024
2 parents 6368a8b + c233e9b commit 7696102
Showing 16 changed files with 55 additions and 40 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/build.yml
@@ -18,16 +18,16 @@ jobs:
strategy:
fail-fast: false
matrix:
-scala: [ 2.11.12, 2.12.19, 2.13.13 ]
-spark: [ 2.4.8, 3.4.2, 3.5.1 ]
+scala: [ 2.11.12, 2.12.19, 2.13.14 ]
+spark: [ 2.4.8, 3.4.3, 3.5.1 ]
exclude:
- scala: 2.11.12
-spark: 3.4.2
+spark: 3.4.3
- scala: 2.11.12
spark: 3.5.1
- scala: 2.12.19
spark: 2.4.8
-  - scala: 2.13.13
+  - scala: 2.13.14
spark: 2.4.8
name: Spark ${{matrix.spark}} on Scala ${{matrix.scala}}
steps:
42 changes: 27 additions & 15 deletions README.md
@@ -74,13 +74,13 @@ You can link against this library in your program at the following coordinates:
</tr>
<tr>
<td>
-<pre>groupId: za.co.absa.cobrix<br>artifactId: spark-cobol_2.11<br>version: 2.7.2</pre>
+<pre>groupId: za.co.absa.cobrix<br>artifactId: spark-cobol_2.11<br>version: 2.7.3</pre>
</td>
<td>
-<pre>groupId: za.co.absa.cobrix<br>artifactId: spark-cobol_2.12<br>version: 2.7.2</pre>
+<pre>groupId: za.co.absa.cobrix<br>artifactId: spark-cobol_2.12<br>version: 2.7.3</pre>
</td>
<td>
-<pre>groupId: za.co.absa.cobrix<br>artifactId: spark-cobol_2.13<br>version: 2.7.2</pre>
+<pre>groupId: za.co.absa.cobrix<br>artifactId: spark-cobol_2.13<br>version: 2.7.3</pre>
</td>
</tr>
</table>
@@ -91,17 +91,17 @@ This package can be added to Spark using the `--packages` command line option. F

### Spark compiled with Scala 2.11
```
-$SPARK_HOME/bin/spark-shell --packages za.co.absa.cobrix:spark-cobol_2.11:2.7.2
+$SPARK_HOME/bin/spark-shell --packages za.co.absa.cobrix:spark-cobol_2.11:2.7.3
```

### Spark compiled with Scala 2.12
```
-$SPARK_HOME/bin/spark-shell --packages za.co.absa.cobrix:spark-cobol_2.12:2.7.2
+$SPARK_HOME/bin/spark-shell --packages za.co.absa.cobrix:spark-cobol_2.12:2.7.3
```

### Spark compiled with Scala 2.13
```
-$SPARK_HOME/bin/spark-shell --packages za.co.absa.cobrix:spark-cobol_2.13:2.7.2
+$SPARK_HOME/bin/spark-shell --packages za.co.absa.cobrix:spark-cobol_2.13:2.7.3
```

## Usage
@@ -238,18 +238,18 @@ to decode various binary formats.

The jars that you need to get are:

-* spark-cobol_2.12-2.7.2.jar
-* cobol-parser_2.12-2.7.2.jar
+* spark-cobol_2.12-2.7.3.jar
+* cobol-parser_2.12-2.7.3.jar
* scodec-core_2.12-1.10.3.jar
* scodec-bits_2.12-1.1.4.jar

> Versions older than 2.7.1 also need `antlr4-runtime-4.8.jar`.
After that, you can specify these jars on the `spark-shell` command line. Here is an example:
```
-$ spark-shell --packages za.co.absa.cobrix:spark-cobol_2.12:2.7.2
+$ spark-shell --packages za.co.absa.cobrix:spark-cobol_2.12:2.7.3
or
-$ spark-shell --master yarn --deploy-mode client --driver-cores 4 --driver-memory 4G --jars spark-cobol_2.12-2.7.2.jar,cobol-parser_2.12-2.7.2.jar,scodec-core_2.12-1.10.3.jar,scodec-bits_2.12-1.1.4.jar
+$ spark-shell --master yarn --deploy-mode client --driver-cores 4 --driver-memory 4G --jars spark-cobol_2.12-2.7.3.jar,cobol-parser_2.12-2.7.3.jar,scodec-core_2.12-1.10.3.jar,scodec-bits_2.12-1.1.4.jar
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
@@ -310,17 +310,17 @@ Creating an uber jar for Cobrix is very easy. Steps to build:
sbt -DSPARK_VERSION="3.4.0" ++2.12.19 assembly

# For Scala 2.13
-sbt -DSPARK_VERSION="3.3.2" ++2.13.13 assembly
-sbt -DSPARK_VERSION="3.4.0" ++2.13.13 assembly
+sbt -DSPARK_VERSION="3.3.2" ++2.13.14 assembly
+sbt -DSPARK_VERSION="3.4.0" ++2.13.14 assembly
```

You can collect the uber jar of `spark-cobol` either at
`spark-cobol/target/scala-2.11/` or in `spark-cobol/target/scala-2.12/` depending on the Scala version you used.
-The fat jar will have '-bundle' suffix. You can also download pre-built bundles from https://github.com/AbsaOSS/cobrix/releases/tag/v2.7.2
+The fat jar will have '-bundle' suffix. You can also download pre-built bundles from https://github.com/AbsaOSS/cobrix/releases/tag/v2.7.3

Then run `spark-shell` or `spark-submit`, passing the fat jar via the `--jars` option.
```sh
-$ spark-shell --jars spark-cobol_2.12_3.3-2.7.3-SNAPSHOT-bundle.jar
+$ spark-shell --jars spark-cobol_2.12_3.3-2.7.4-SNAPSHOT-bundle.jar
```

> <b>A note for building and running tests on Windows</b>
@@ -332,7 +332,7 @@ $ spark-shell --jars spark-cobol_2.12_3.3-2.7.3-SNAPSHOT-bundle.jar
> ```sh
> sbt ++2.11.12 assembly
> sbt ++2.12.19 assembly
-> sbt ++2.13.13 assembly
+> sbt ++2.13.14 assembly
> ```

## Other Features
@@ -1770,6 +1770,18 @@ at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:608)
A: Update hadoop dll to version 3.2.2 or newer.

## Changelog
- #### 2.7.3 released 17 July 2024.
- [#678](https://github.com/AbsaOSS/cobrix/issues/678) Add the ability to generate Spark schema based on strict integral precision:
```scala
// `decimal(n,0)` will be used instead of `integer` and `long`
.option("strict_integral_precision", "true")
```
- [#689](https://github.com/AbsaOSS/cobrix/issues/689) Add support for '_' for hierarchical key generation at leaf level:
```scala
.option("segment_id_level0", "SEG0") // Root segment
.option("segment_id_level1", "_") // Leaf segment (use 'all other' segment IDs)
```
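Putting the two new 2.7.3 options together, a minimal end-to-end read might look like the following sketch. The application name, copybook path, and data path are hypothetical placeholders; the `format("cobol")` source name and the `copybook` option follow the standard spark-cobol usage shown earlier in the README.

```scala
import org.apache.spark.sql.SparkSession

object Cobrix273Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cobrix-2.7.3-sketch") // hypothetical app name
      .getOrCreate()

    val df = spark.read
      .format("cobol")
      .option("copybook", "/path/to/copybook.cob")  // hypothetical path
      .option("strict_integral_precision", "true")  // integral fields map to decimal(n,0)
      .option("segment_id_level0", "SEG0")          // root segment id
      .option("segment_id_level1", "_")             // leaf level: all other segment ids
      .load("/path/to/data")                        // hypothetical path

    // With strict integral precision enabled, no integer/long columns
    // should appear for COBOL integral fields.
    df.printSchema()
  }
}
```

Note this is a usage sketch, not a runnable test: it requires a Spark runtime, the spark-cobol package on the classpath, and a real copybook and data file.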

- #### 2.7.2 released 7 June 2024.
- [#684](https://github.com/AbsaOSS/cobrix/issues/684) Fixed a failure to read a data file with certain combinations of options.
- [#685](https://github.com/AbsaOSS/cobrix/issues/685) Added methods that flatten the schema of a dataframe more efficiently than `flattenSchema()`, but do not flatten arrays:
2 changes: 1 addition & 1 deletion build.sbt
@@ -21,7 +21,7 @@ import com.github.sbt.jacoco.report.JacocoReportSettings

lazy val scala211 = "2.11.12"
lazy val scala212 = "2.12.19"
-lazy val scala213 = "2.13.13"
+lazy val scala213 = "2.13.14"

ThisBuild / organization := "za.co.absa.cobrix"

2 changes: 1 addition & 1 deletion cobol-converters/pom.xml
@@ -22,7 +22,7 @@
<parent>
<groupId>za.co.absa.cobrix</groupId>
<artifactId>cobrix_2.12</artifactId>
-<version>2.7.3-SNAPSHOT</version>
+<version>2.7.4-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion cobol-parser/pom.xml
@@ -22,7 +22,7 @@
<parent>
<groupId>za.co.absa.cobrix</groupId>
<artifactId>cobrix_2.12</artifactId>
-<version>2.7.3-SNAPSHOT</version>
+<version>2.7.4-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion examples/examples-collection/pom.xml
@@ -31,7 +31,7 @@
<scala.compat.version>2.11</scala.compat.version>
<spark.version>2.4.8</spark.version>
<specs.version>2.4.16</specs.version>
-<spark.cobol.version>2.7.2</spark.cobol.version>
+<spark.cobol.version>2.7.3</spark.cobol.version>
</properties>

<dependencies>
12 changes: 6 additions & 6 deletions examples/spark-cobol-app/build.sbt
@@ -19,8 +19,8 @@ ThisBuild / name := "spark-cobol-app"
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "2.12.17"

-val sparkVersion = "3.3.2"
-val sparkCobolVersion = "2.7.2"
+val sparkVersion = "3.5.1"
+val sparkCobolVersion = "2.7.3"
val scalatestVersion = "3.2.14"

ThisBuild / libraryDependencies ++= Seq(
@@ -30,17 +30,17 @@ ThisBuild / libraryDependencies ++= Seq(
)

// Do not run tests in parallel
-parallelExecution in Test := false
+Test / parallelExecution := false

// Do not run tests on assembly
-test in assembly := {}
+assembly / test := {}

// Do not include Scala in the fat jar
-assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
+assembly / assemblyOption := (assemblyOption in assembly).value.copy(includeScala = false)

// This merge strategy retains service entries for all services in manifest.
// It allows custom Spark data sources to be used together, e.g. 'spark-xml' and 'spark-cobol'.
-assemblyMergeStrategy in assembly := {
+assembly / assemblyMergeStrategy := {
case PathList("META-INF", xs @ _*) =>
xs map {_.toLowerCase} match {
case "manifest.mf" :: Nil =>
4 changes: 2 additions & 2 deletions examples/spark-cobol-app/pom.xml
@@ -30,8 +30,8 @@
<scala.version>2.12.17</scala.version>
<scala.compat.version>2.12</scala.compat.version>
<scalatest.version>3.2.14</scalatest.version>
-<spark.version>3.3.2</spark.version>
-<spark.cobol.version>2.7.2</spark.cobol.version>
+<spark.version>3.5.1</spark.version>
+<spark.cobol.version>2.7.3</spark.cobol.version>
</properties>

<dependencies>
2 changes: 1 addition & 1 deletion examples/spark-cobol-app/project/build.properties
@@ -1 +1 @@
-sbt.version=1.8.0
+sbt.version=1.9.9
@@ -38,7 +38,7 @@ class CustomRecordHeadersParser extends Serializable with RecordHeaderParser {
* @param recordNum A sequential record number
* @return A parsed record metadata
*/
-  override def getRecordMetadata(header: Array[Byte], fileOffset: Long, fileSize: Long, recordNum: Long): RecordMetadata = {
+  override def getRecordMetadata(header: Array[Byte], fileOffset: Long, maxOffset: Long, fileSize: Long, recordNum: Long): RecordMetadata = {
val rdwHeaderBlock = getHeaderLength
if (header.length < rdwHeaderBlock) {
RecordMetadata(-1, isValid = false)
@@ -52,7 +52,10 @@ class CustomRecordHeadersParser extends Serializable with RecordHeaderParser {
val rdwHeaders = header.map(_ & 0xFF).mkString(",")
throw new IllegalStateException(s"Custom RDW headers too big (length = $recordLength > ${Constants.maxRdWRecordSize}). Headers = $rdwHeaders at $fileOffset.")
}
-      RecordMetadata(recordLength, isValid)
+      if (maxOffset - fileOffset >= recordLength)
+        RecordMetadata(recordLength, isValid)
+      else
+        RecordMetadata(-1, isValid = false)
} else {
val rdwHeaders = header.map(_ & 0xFF).mkString(",")
throw new IllegalStateException(s"Custom RDW headers should never be zero ($rdwHeaders). Found zero size record at $fileOffset.")
2 changes: 1 addition & 1 deletion examples/spark-cobol-s3-standalone/pom.xml
@@ -32,7 +32,7 @@
<scala.compat.version>2.11</scala.compat.version>
<scalatest.version>3.2.3</scalatest.version>
<spark.version>2.4.8</spark.version>
-<spark.cobol.version>2.7.2</spark.cobol.version>
+<spark.cobol.version>2.7.3</spark.cobol.version>
<hadoop.version>3.2.4</hadoop.version>
</properties>

2 changes: 1 addition & 1 deletion examples/spark-cobol-s3/pom.xml
@@ -32,7 +32,7 @@
<scala.compat.version>2.11</scala.compat.version>
<scalatest.version>3.2.14</scalatest.version>
<spark.version>2.4.8</spark.version>
-<spark.cobol.version>2.7.2</spark.cobol.version>
+<spark.cobol.version>2.7.3</spark.cobol.version>
</properties>

<dependencies>
2 changes: 1 addition & 1 deletion pom.xml
@@ -22,7 +22,7 @@
<groupId>za.co.absa.cobrix</groupId>
<artifactId>cobrix_2.12</artifactId>

-<version>2.7.3-SNAPSHOT</version>
+<version>2.7.4-SNAPSHOT</version>

<packaging>pom</packaging>

2 changes: 1 addition & 1 deletion project/Dependencies.scala
@@ -28,7 +28,7 @@ object Dependencies {
private val mockitoVersion = "4.11.0"

private val defaultSparkVersionForScala211 = "2.4.8"
-  private val defaultSparkVersionForScala212 = "3.4.2"
+  private val defaultSparkVersionForScala212 = "3.4.3"
private val defaultSparkVersionForScala213 = "3.5.1"

def sparkFallbackVersion(scalaVersion: String): String = {
2 changes: 1 addition & 1 deletion spark-cobol/pom.xml
@@ -22,7 +22,7 @@
<parent>
<groupId>za.co.absa.cobrix</groupId>
<artifactId>cobrix_2.12</artifactId>
-<version>2.7.3-SNAPSHOT</version>
+<version>2.7.4-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion version.sbt
@@ -1 +1 @@
-ThisBuild / version := "2.7.3-SNAPSHOT"
+ThisBuild / version := "2.7.4-SNAPSHOT"
