Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Cache Argument Method for Spark 2.4 #100

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 19 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,28 +60,30 @@ The Almaren Framework provides a simplified consistent minimalistic layer over A
To add Almaren Framework dependency to your sbt build:

```
libraryDependencies += "com.github.music-of-the-ainur" %% "almaren-framework" % "0.9.10-2.4"
libraryDependencies += "com.github.music-of-the-ainur" %% "almaren-framework" % "0.9.11-2.4"
```

To run in spark-shell:

```
spark-shell --packages "com.github.music-of-the-ainur:almaren-framework_2.12:0.9.10-2.4"
spark-shell --packages "com.github.music-of-the-ainur:almaren-framework_2.12:0.9.11-2.4"
```

Almaren connector is available in
[Maven Central](https://mvnrepository.com/artifact/com.github.music-of-the-ainur) repository.

| version | Connector Artifact |
|----------------------------|------------------------------------------------------------------|
| Spark 3.4.x and scala 2.13 | `com.github.music-of-the-ainur:almaren-framework_2.13:0.9.10-3.4` |
| Spark 3.4.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.10-3.4` |
| Spark 3.3.x and scala 2.13 | `com.github.music-of-the-ainur:almaren-framework_2.13:0.9.10-3.3` |
| Spark 3.3.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.10-3.3` |
| Spark 3.2.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.10-3.2` |
| Spark 3.1.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.10-3.1` |
| Spark 2.4.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.10-2.4` |
| Spark 2.4.x and scala 2.11 | `com.github.music-of-the-ainur:almaren-framework_2.11:0.9.10-2.4` |
| version | Connector Artifact |
|----------------------------|-------------------------------------------------------------------|
| Spark 3.5.x and scala 2.13 | `com.github.music-of-the-ainur:almaren-framework_2.13:0.9.11-3.5` |
| Spark 3.5.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.11-3.5` |
| Spark 3.4.x and scala 2.13 | `com.github.music-of-the-ainur:almaren-framework_2.13:0.9.11-3.4` |
| Spark 3.4.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.11-3.4` |
| Spark 3.3.x and scala 2.13 | `com.github.music-of-the-ainur:almaren-framework_2.13:0.9.11-3.3` |
| Spark 3.3.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.11-3.3` |
| Spark 3.2.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.11-3.2` |
| Spark 3.1.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.11-3.1` |
| Spark 2.4.x and scala 2.12 | `com.github.music-of-the-ainur:almaren-framework_2.12:0.9.11-2.4` |
| Spark 2.4.x and scala 2.11 | `com.github.music-of-the-ainur:almaren-framework_2.11:0.9.11-2.4` |

### Batch Example
```scala
Expand Down Expand Up @@ -350,6 +352,11 @@ Cache/Uncache both DataFrame or Table
```scala
cache(true)
```
Cache Dataframe with Storage Level

```scala
cache(true,storageLevel = MEMORY_AND_DISK)
```

#### Coalesce

Expand Down
4 changes: 2 additions & 2 deletions build.sbt
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,8 @@ libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-avro" % sparkVersion,
"com.databricks" %% "spark-xml" % "0.13.0",
"com.github.music-of-the-ainur" %% "quenya-dsl" % s"1.2.3-${majorVersion}",
"org.scalatest" %% "scalatest" % "3.2.14" % "test",
"org.postgresql" % "postgresql" % "42.2.8" % "test"
"org.scalatest" %% "scalatest" % "3.2.17" % "test",
"org.postgresql" % "postgresql" % "42.6.0" % "test"
)

enablePlugins(GitVersioning)
Expand Down
2 changes: 1 addition & 1 deletion project/build.properties
Original file line number Diff line number Diff line change
@@ -1 +1 @@
sbt.version=1.3.7
sbt.version=1.9.6
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ import com.github.music.of.the.ainur.almaren.Tree
import com.github.music.of.the.ainur.almaren.builder.Core
import com.github.music.of.the.ainur.almaren.state.core._
import org.apache.spark.sql.Column
import org.apache.spark.storage.StorageLevel

private[almaren] trait Main extends Core {
def sql(sql: String): Option[Tree] =
Expand All @@ -12,8 +13,8 @@ private[almaren] trait Main extends Core {
def alias(alias:String): Option[Tree] =
Alias(alias)

def cache(opType:Boolean = true,tableName:Option[String] = None): Option[Tree] =
Cache(opType, tableName)
def cache(opType:Boolean = true,tableName:Option[String] = None,storageLevel: Option[StorageLevel] = None): Option[Tree] =
Cache(opType, tableName = tableName, storageLevel = storageLevel)

def coalesce(size:Int): Option[Tree] =
Coalesce(size)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ package com.github.music.of.the.ainur.almaren.state.core
import com.github.music.of.the.ainur.almaren.State
import com.github.music.of.the.ainur.almaren.util.Constants
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.storage.StorageLevel

private[almaren] abstract class Main extends State {
override def executor(df: DataFrame): DataFrame = core(df)
Expand Down Expand Up @@ -82,22 +83,29 @@ case class Alias(alias:String) extends Main {
}
}

case class Cache(opType:Boolean = true,tableName:Option[String] = None) extends Main {
case class Cache(opType: Boolean = true, tableName: Option[String] = None, storageLevel: Option[StorageLevel] = None) extends Main {
override def core(df: DataFrame): DataFrame = cache(df)

def cache(df: DataFrame): DataFrame = {
logger.info(s"opType:{$opType}, tableName{$tableName}")
logger.info(s"opType:{$opType}, tableName:{$tableName}, StorageType:{$storageLevel}")
tableName match {
case Some(t) => cacheTable(df,t)
case None => cacheDf(df)
case Some(t) => cacheTable(df, t)
case None => cacheDf(df, storageLevel)
}
df
}
private def cacheDf(df:DataFrame): Unit = opType match {
case true => df.persist()
private def cacheDf(df: DataFrame, storageLevel: Option[StorageLevel]): Unit = opType match {
case true => {
storageLevel match {
case Some(value) => df.persist(value)
case None => df.persist()
}
}
case false => df.unpersist()

}
private def cacheTable(df:DataFrame,tableName: String): Unit =

private def cacheTable(df: DataFrame, tableName: String): Unit =
opType match {
case true => df.sqlContext.cacheTable(tableName)
case false => df.sqlContext.uncacheTable(tableName)
Expand Down
14 changes: 13 additions & 1 deletion src/test/scala/com/github/music/of/the/ainur/almaren/Test.scala
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ import org.scalatest._
import java.io.File
import scala.collection.immutable._
import org.scalatest.funsuite.AnyFunSuite

import org.apache.spark.storage.StorageLevel._

class Test extends AnyFunSuite with BeforeAndAfter {
val almaren = Almaren("App Test")
Expand Down Expand Up @@ -385,6 +385,18 @@ class Test extends AnyFunSuite with BeforeAndAfter {
assert(bool_cache)
}

val testCacheDfStorage: DataFrame = almaren.builder.sourceSql("select * from cache_test").cache(true,storageLevel = Some(MEMORY_ONLY)).batch
val bool_cache_storage = testCacheDfStorage.storageLevel.useMemory
test("Testing Cache Memory Storage") {
assert(bool_cache_storage)
}

val testCacheDfDiskStorage: DataFrame = almaren.builder.sourceSql("select * from cache_test").cache(true, storageLevel = Some(DISK_ONLY)).batch
val bool_cache_disk_storage = testCacheDfDiskStorage.storageLevel.useDisk
test("Testing Cache Disk Storage") {
assert(bool_cache_disk_storage)
}

val testUnCacheDf = almaren.builder.sourceSql("select * from cache_test").cache(false).batch
val bool_uncache = testUnCacheDf.storageLevel.useMemory
test("Testing Uncache") {
Expand Down
Loading