Merge pull request datastax#1016 from datastax/SPARKC-355-russ
SPARKC-355: Shade guava and include the Cassandra Java Driver in public distributions
RussellSpitzer authored Aug 24, 2016
2 parents 374657c + 6004efe commit faefe51
Showing 6 changed files with 242 additions and 26 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -114,6 +114,7 @@ See [Building And Artifacts](doc/12_building_and_artifacts.md)
- [Python](doc/15_python.md)
- [Frequently Asked Questions](doc/FAQ.md)
- [Configuration Parameter Reference Table](doc/reference.md)
- [Tips for Developing the Spark Cassandra Connector](doc/developers.md)

## Online Training
### DataStax Academy
@@ -137,8 +138,11 @@ Make sure you have installed and enabled the Scala Plugin.
Open the project with IntelliJ IDEA and it will automatically create the project structure
from the provided SBT configuration.

[Tips for Developing the Spark Cassandra Connector](doc/developers.md)

Before contributing your changes to the project, please make sure that all unit tests and integration tests pass.
Don't forget to add an appropriate entry at the top of CHANGES.txt.
Create a JIRA at the [Spark Cassandra Connector JIRA](https://datastax-oss.atlassian.net/projects/SPARKC/issues).
Finally, open a pull request on GitHub and await review.

Please prefix the pull request description with the JIRA number, for example: "SPARKC-123: Fix the ...".
84 changes: 84 additions & 0 deletions doc/developers.md
@@ -0,0 +1,84 @@
# Documentation

## Developer Tips

### Getting Started

The Spark Cassandra Connector is built with sbt. A launcher script for sbt is
included in the repository, so there is no need to download sbt separately. To
invoke the script, run `./sbt/sbt` from a clone of this repository.

For information on setting up your clone, see the [GitHub
Help](https://help.github.com/articles/cloning-a-repository/) article on cloning a repository.
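
A minimal sketch of getting a working sbt shell (the clone URL below is the
upstream datastax repository; substitute your own fork's URL if you plan to
contribute):

```bash
# Clone the connector and start the bundled sbt launcher
git clone https://github.com/datastax/spark-cassandra-connector.git
cd spark-cassandra-connector
./sbt/sbt    # opens the interactive sbt shell
```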

Once in the sbt shell, you can build and run the connector's tests without any
Spark or Cassandra nodes running. The most common commands when developing the
connector are listed below; a short batch-mode sketch follows the list.

1. `test` - Runs the unit tests for the project.
2. `it:test` - Runs the integration tests with embedded Cassandra and Spark.
3. `assembly` - Builds a fat jar for use with `--jars` in `spark-submit` or `spark-shell`.
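
As a sketch, each of these tasks can also be run in batch mode straight from
your shell via the bundled launcher:

```bash
# Run tasks without entering the interactive sbt shell
./sbt/sbt test        # unit tests only; no running Spark or Cassandra needed
./sbt/sbt it:test     # integration tests against the embedded Cassandra and Spark
./sbt/sbt assembly    # build the fat jar to pass to spark-submit/spark-shell via --jars
```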

The integration tests located in `spark-cassandra-connector/src/it` are a good
first place to look for anyone considering adding code. They contain many
examples of exercising connector features against the embedded Cassandra and
Spark nodes, and they form the core of our test coverage.

### Sub-Projects

The connector currently contains several sub-projects:
#### spark-cassandra-connector
This sub-project contains all of the actual connector code and is where any new
features or tests should go. It also contains the Java API and related code.

#### spark-cassandra-connector-embedded
Code for starting the embedded services used in the integration tests,
including methods for starting Cassandra as a thread within the running test
code.

#### spark-cassandra-connector-doc
Code for building the reference documentation. It uses the code from
`spark-cassandra-connector` to determine what belongs in the reference file, and
is mostly used to regenerate that file after parameters have been added or
changed. Tests in `spark-cassandra-connector` will fail if the reference file is
not up to date; to fix this, run `spark-cassandra-connector-doc/run` to update
the file. The regenerated file still needs to be committed after running this
sub-project.
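
A possible workflow, assuming the reference file is the `doc/reference.md`
linked from the README:

```bash
# Regenerate the configuration reference after adding or changing parameters
./sbt/sbt spark-cassandra-connector-doc/run
# The regenerated file still has to be committed manually
git add doc/reference.md
git commit -m "Regenerate configuration reference"
```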

#### spark-cassandra-connector-perf
Code for performance-based tests. Add any new performance comparisons to this
project.

### Continuous Testing

When implementing new features it is often useful to have the tests run in a
loop on every code change. sbt provides this through the `~` operator: `~ test`
reruns the unit tests whenever a change in the source code is detected. This is
especially useful in combination with `testOnly`, which runs a single suite. For
example, if a new feature were being added to the integration suite `foo`, you
could run `~ it:testOnly foo`, which runs only the suite you are interested in,
in a loop, while you modify the connector code.
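
A small sketch (the suite name `FooSpec` is a stand-in for whichever suite you
are working on; `testOnly` also accepts wildcard patterns):

```bash
# Rerun a single integration suite on every source change;
# quoting lets the launcher pass the whole command to sbt
./sbt/sbt "~ it:testOnly *FooSpec"
```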

### Packaging

`spark-shell` and `spark-submit` can load libraries from local caches, and the
Spark Cassandra Connector can take advantage of this. For example, to test the
Maven artifacts produced from your current build, run `publishM2`, which
generates the needed artifacts and POM in your local Maven cache. You can then
reference them from `spark-shell` or `spark-submit` as follows:
```bash
./bin/spark-shell --repositories file:/Users/yourUser/.m2/repository --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0-14-gcfca49e
```
Change the revision `1.6.0-14-gcfca49e` to match the version reported by your
publish command.
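
Putting the two steps together, a hypothetical end-to-end flow (the home
directory path and the version string are placeholders; use whatever your
`publishM2` run actually reports):

```bash
# Publish the current build to the local Maven cache and note the reported version
./sbt/sbt publishM2
# Then launch spark-shell against the locally published artifact
# (substitute the version string your publishM2 run reports)
./bin/spark-shell \
  --repositories file:/Users/yourUser/.m2/repository \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0-14-gcfca49e
```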

The same method should also work with `publishLocal` after
[SPARK-12666](https://issues.apache.org/jira/browse/SPARK-12666) is merged.



9 changes: 5 additions & 4 deletions project/Settings.scala
@@ -103,6 +103,7 @@ object Settings extends Build {
spAppendScalaVersion := true,
spIncludeMaven := true,
spIgnoreProvided := true,
spShade := true,
credentials += Credentials(Path.userHome / ".ivy2" / ".credentials")
)

@@ -205,7 +206,8 @@
lazy val defaultSettings = projectSettings ++ mimaSettings ++ releaseSettings ++ testSettings

lazy val rootSettings = Seq(
cleanKeepFiles ++= Seq("resolution-cache", "streams", "spark-archives").map(target.value / _)
cleanKeepFiles ++= Seq("resolution-cache", "streams", "spark-archives").map(target.value / _),
updateOptions := updateOptions.value.withCachedResolution(true)
)

lazy val demoSettings = projectSettings ++ noPublish ++ Seq(
@@ -230,7 +232,7 @@
cp
}
)
lazy val assembledSettings = defaultSettings ++ customTasks ++ sparkPackageSettings ++ sbtAssemblySettings
lazy val assembledSettings = defaultSettings ++ customTasks ++ sbtAssemblySettings ++ sparkPackageSettings

val testOptionSettings = Seq(
Tests.Argument(TestFrameworks.ScalaTest, "-oDF"),
@@ -347,8 +349,7 @@
assemblyShadeRules in assembly := {
val shadePackage = "shade.com.datastax.spark.connector"
Seq(
ShadeRule.rename("com.google.common.**" -> s"$shadePackage.google.common.@1").inAll,
ShadeRule.rename("io.netty.**" -> s"$shadePackage.netty.@1").inAll
ShadeRule.rename("com.google.common.**" -> s"$shadePackage.google.common.@1").inAll
)
}
)
