Merge pull request #273 from marklogic/release/2.3.0.rc1

Merge 2.3.0.rc1

rjrudin authored Jul 26, 2024
2 parents 1a907f8 + 7ad73ac, commit 75fa27e

Showing 252 changed files with 9,925 additions and 1,399 deletions.
.env (2 changes: 1 addition & 1 deletion)
@@ -1,3 +1,3 @@
# Defines environment variables for docker-compose.
# Can be overridden via e.g. `MARKLOGIC_TAG=latest-10.0 docker-compose up -d --build`.
-MARKLOGIC_TAG=11.1.0-centos-1.1.0
+MARKLOGIC_TAG=11.2.0-centos-1.1.2
.gitignore (1 change: 1 addition & 0 deletions)
@@ -17,3 +17,4 @@ logs
venv
.venv
docker
+export
CONTRIBUTING.md (79 changes: 39 additions & 40 deletions)
@@ -24,26 +24,6 @@ The above will result in a new MarkLogic instance with a single node.
Alternatively, if you would like to test against a 3-node MarkLogic cluster with a load balancer in front of it,
run `docker-compose -f docker-compose-3nodes.yaml up -d --build`.

-## Accessing MarkLogic logs in Grafana
-
-This project's `docker-compose-3nodes.yaml` file includes
-[Grafana, Loki, and promtail services](https://grafana.com/docs/loki/latest/clients/promtail/) for the primary reason of
-collecting MarkLogic log files and allowing them to be viewed and searched via Grafana.
-
-Once you have run `docker-compose`, you can access Grafana at http://localhost:3000 . Follow these instructions to
-access MarkLogic logging data:
-
-1. Click on the hamburger in the upper left hand corner and select "Explore", or simply go to
-http://localhost:3000/explore .
-2. Verify that "Loki" is the default data source - you should see it selected in the upper left hand corner below
-the "Home" link.
-3. Click on the "Select label" dropdown and choose `job`. Click on the "Select value" label for this filter and
-select `marklogic` as the value.
-4. Click on the blue "Run query" button in the upper right hand corner.
-
-You should now see logs from all 3 nodes in the MarkLogic cluster.
-
-
## Deploying the test application

To deploy the test application, first create `./gradle-local.properties` and add the following to it:
@@ -63,20 +43,6 @@ To run the tests against the test application, run the following Gradle task:

./gradlew test

-If you installed MarkLogic using this project's `docker-compose.yaml` file, you can also run the tests from within the
-Docker environment by first running the following task:
-
-./gradlew dockerBuildCache
-
-The above task is a mostly one-time step to build a Docker image that contains all of this project's Gradle
-dependencies. This will allow the next step to run much more quickly. You'll only need to run this again when the
-project's Gradle dependencies change.
-
-You can then run the tests from within the Docker environment via the following task:
-
-./gradlew dockerTest
-
-
## Generating code quality reports with SonarQube

In order to use SonarQube, you must have used Docker to run this project's `docker-compose.yml` file and you must
@@ -117,6 +83,25 @@ you've introduced on the feature branch you're working on. You can then click on
Note that if you only need results on code smells and vulnerabilities, you can repeatedly run `./gradlew sonar`
without having to re-run the tests.

+## Accessing MarkLogic logs in Grafana
+
+This project's `docker-compose-3nodes.yaml` file includes
+[Grafana, Loki, and promtail services](https://grafana.com/docs/loki/latest/clients/promtail/) for the primary reason of
+collecting MarkLogic log files and allowing them to be viewed and searched via Grafana.
+
+Once you have run `docker-compose`, you can access Grafana at http://localhost:3000 . Follow these instructions to
+access MarkLogic logging data:
+
+1. Click on the hamburger in the upper left hand corner and select "Explore", or simply go to
+http://localhost:3000/explore .
+2. Verify that "Loki" is the default data source - you should see it selected in the upper left hand corner below
+the "Home" link.
+3. Click on the "Select label" dropdown and choose `job`. Click on the "Select value" label for this filter and
+select `marklogic` as the value.
+4. Click on the blue "Run query" button in the upper right hand corner.
+
+You should now see logs from all 3 nodes in the MarkLogic cluster.
+
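(For reference, the four steps above build a LogQL query equivalent to the selector `{job="marklogic"}`; typing that directly into the Grafana query editor should return the same logs, assuming the promtail configuration in this repository applies the `job=marklogic` label.)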
# Testing with PySpark

The documentation for this project
@@ -131,7 +116,7 @@ This will produce a single jar file for the connector in the `./build/libs` dire

You can then launch PySpark with the connector available via:

-pyspark --jars build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar
+pyspark --jars build/libs/marklogic-spark-connector-2.3.0.rc1.jar

The below command is an example of loading data from the test application deployed via the instructions at the top of
this page.
@@ -171,14 +156,28 @@ df2.head()
json.loads(df2.head()['content'])
```

+For a quick test of writing documents, use the following:
+
+```
+spark.read.option("header", True).csv("src/test/resources/data.csv")\
+.repartition(2)\
+.write.format("marklogic")\
+.option("spark.marklogic.client.uri", "spark-test-user:spark@localhost:8000")\
+.option("spark.marklogic.write.permissions", "spark-user-role,read,spark-user-role,update")\
+.option("spark.marklogic.write.logProgress", 50)\
+.option("spark.marklogic.write.batchSize", 10)\
+.mode("append")\
+.save()
+```
+
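If you plan to vary the connection or batch settings, the same write can be expressed with a plain Python dict of options. This is a sketch that relies only on PySpark's standard `options(**dict)` API; the option keys are the ones shown in the example above:

```
# Reusable connector settings; keys are taken from the write example above.
write_options = {
    "spark.marklogic.client.uri": "spark-test-user:spark@localhost:8000",
    "spark.marklogic.write.permissions": "spark-user-role,read,spark-user-role,update",
    "spark.marklogic.write.logProgress": 50,
    "spark.marklogic.write.batchSize": 10,
}

spark.read.option("header", True).csv("src/test/resources/data.csv") \
    .repartition(2) \
    .write.format("marklogic") \
    .options(**write_options) \
    .mode("append") \
    .save()
```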
# Testing against a local Spark cluster

When you run PySpark, it will create its own Spark cluster. If you'd like to try against a separate Spark cluster
that still runs on your local machine, perform the following steps:

-1. Use [sdkman to install Spark](https://sdkman.io/sdks#spark). Run `sdk install spark 3.4.1` since we are currently
-building against Spark 3.4.1.
+1. Use [sdkman to install Spark](https://sdkman.io/sdks#spark). Run `sdk install spark 3.4.3` since we are currently
+building against Spark 3.4.3.
2. `cd ~/.sdkman/candidates/spark/current/sbin`, which is where sdkman will install Spark.
3. Run `./start-master.sh` to start a master Spark node.
4. `cd ../logs` and open the master log file that was created to find the address for the master node. It will be in a
@@ -193,7 +192,7 @@ The Spark master GUI is at <http://localhost:8080>. You can use this to view det

Now that you have a Spark cluster running, you just need to tell PySpark to connect to it:

-pyspark --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar
+pyspark --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.3.0.rc1.jar

You can then run the same commands as shown in the PySpark section above. The Spark master GUI will allow you to
examine details of each of the commands that you run.
@@ -212,12 +211,12 @@ You will need the connector jar available, so run `./gradlew clean shadowJar` if
You can then run a test Python program in this repository via the following (again, change the master address as
needed); note that you run this outside of PySpark, and `spark-submit` is available after having installed PySpark:

-spark-submit --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar src/test/python/test_program.py
+spark-submit --master spark://NYWHYC3G0W:7077 --jars build/libs/marklogic-spark-connector-2.3.0.rc1.jar src/test/python/test_program.py

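A standalone program like the one referenced above only needs to build a `SparkSession` and read through the connector. The following is a hypothetical minimal sketch, not the actual contents of `src/test/python/test_program.py`; the `spark.marklogic.read.opticQuery` option and the `Medical`/`Authors` view names are assumptions based on this project's documentation and test application, so adjust them to match your setup:

```
from pyspark.sql import SparkSession

# The connector jar passed via --jars provides the "marklogic" data source.
spark = SparkSession.builder.appName("marklogic-spark-test").getOrCreate()

df = spark.read.format("marklogic") \
    .option("spark.marklogic.client.uri", "spark-test-user:spark@localhost:8000") \
    .option("spark.marklogic.read.opticQuery", "op.fromView('Medical', 'Authors')") \
    .load()

# A simple action to confirm that rows can be retrieved from MarkLogic.
print(df.count())
spark.stop()
```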
You can also test a Java program. To do so, first move the `com.marklogic.spark.TestProgram` class from `src/test/java`
to `src/main/java`. Then run `./gradlew clean shadowJar` to rebuild the connector jar. Then run the following:

-spark-submit --master spark://NYWHYC3G0W:7077 --class com.marklogic.spark.TestProgram build/libs/marklogic-spark-connector-2.2-SNAPSHOT.jar
+spark-submit --master spark://NYWHYC3G0W:7077 --class com.marklogic.spark.TestProgram build/libs/marklogic-spark-connector-2.3.0.rc1.jar

Be sure to move `TestProgram` back to `src/test/java` when you are done.

Jenkinsfile (1 change: 0 additions & 1 deletion)
@@ -40,7 +40,6 @@ pipeline{
buildDiscarder logRotator(artifactDaysToKeepStr: '7', artifactNumToKeepStr: '', daysToKeepStr: '30', numToKeepStr: '')
}
environment{
-JAVA8_HOME_DIR="/home/builder/java/openjdk-1.8.0-262"
JAVA11_HOME_DIR="/home/builder/java/jdk-11.0.2"
GRADLE_DIR =".gradle"
DMC_USER = credentials('MLBUILD_USER')
LICENSE.txt (2 changes: 1 addition & 1 deletion)
@@ -1,4 +1,4 @@
-Copyright © 2023 MarkLogic Corporation.
+Copyright © 2024 MarkLogic Corporation.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

build.gradle (105 changes: 45 additions & 60 deletions)
@@ -2,61 +2,74 @@ plugins {
id 'java-library'
id 'net.saliman.properties' version '1.5.2'
id 'com.github.johnrengelman.shadow' version '8.1.1'
id "com.marklogic.ml-gradle" version "4.6.0"
id "com.marklogic.ml-gradle" version "4.7.0"
id 'maven-publish'
id 'signing'
id "jacoco"
id "org.sonarqube" version "4.4.1.3373"
}

group 'com.marklogic'
-version '2.2.0'
+version '2.3.0.rc1'

java {
-sourceCompatibility = 1.8
-targetCompatibility = 1.8
+// To support reading RDF files, Apache Jena is used - but that requires Java 11. If we want to do a 2.2.0 release
+// without requiring Java 11, we'll remove the support for reading RDF files along with the Jena dependency.
+sourceCompatibility = 11
+targetCompatibility = 11
}

repositories {
mavenCentral()
}

+configurations {
+// Defines all the implementation dependencies, but in such a way that they are not included as dependencies in the
+// library's pom.xml file. This is due to the shadow jar being published instead of a jar only containing this
+// project's classes. The shadow jar is published due to the need to relocate several packages to avoid conflicts
+// with Spark.
+shadowDependencies
+
+// This approach allows for all of the dependencies to be available for compilation and for running tests.
+compileOnly.extendsFrom(shadowDependencies)
+testImplementation.extendsFrom(compileOnly)
+}

dependencies {
-compileOnly 'org.apache.spark:spark-sql_2.12:' + sparkVersion
-implementation ("com.marklogic:marklogic-client-api:6.5.0") {
+// This is compileOnly as any environment this is used in will provide the Spark dependencies itself.
+compileOnly ('org.apache.spark:spark-sql_2.12:' + sparkVersion) {
+// Excluded from our ETL tool for size reasons, so excluded here as well to ensure we don't need it.
+exclude module: "rocksdbjni"
+}
+
+shadowDependencies ("com.marklogic:marklogic-client-api:6.6.1") {
// The Java Client uses Jackson 2.15.2; Scala 3.4.x does not yet support that and will throw the following error:
// Scala module 2.14.2 requires Jackson Databind version >= 2.14.0 and < 2.15.0 - Found jackson-databind version 2.15.2
// So the 4 Jackson modules are excluded to allow for Spark's to be used.
-exclude module: 'jackson-core'
-exclude module: 'jackson-databind'
-exclude module: 'jackson-annotations'
-exclude module: 'jackson-dataformat-csv'
+exclude group: "com.fasterxml.jackson.core"
+exclude group: "com.fasterxml.jackson.dataformat"
}

+// Required for converting JSON to XML. Using 2.14.2 to align with Spark 3.4.1.
+shadowDependencies "com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.14.2"

// Need this so that an OkHttpClientConfigurator can be created.
-implementation 'com.squareup.okhttp3:okhttp:4.12.0'
+shadowDependencies 'com.squareup.okhttp3:okhttp:4.12.0'

-// Makes it possible to use lambdas in Java 8 to implement Spark's Function1 and Function2 interfaces
-// See https://github.com/scala/scala-java8-compat for more information
-implementation("org.scala-lang.modules:scala-java8-compat_2.12:1.0.2") {
-// Prefer the Scala libraries used within the user's Spark runtime.
-exclude module: "scala-library"
+shadowDependencies ("org.apache.jena:jena-arq:4.10.0") {
+exclude group: "com.fasterxml.jackson.core"
+exclude group: "com.fasterxml.jackson.dataformat"
}

-testImplementation 'org.apache.spark:spark-sql_2.12:' + sparkVersion
+shadowDependencies "org.jdom:jdom2:2.0.6.1"

// The exclusions in these two modules ensure that we use the Jackson libraries from spark-sql when running the tests.
-testImplementation ('com.marklogic:ml-app-deployer:4.6.0') {
-exclude module: 'jackson-core'
-exclude module: 'jackson-databind'
-exclude module: 'jackson-annotations'
-exclude module: 'jackson-dataformat-csv'
+testImplementation ('com.marklogic:ml-app-deployer:4.7.0') {
+exclude group: "com.fasterxml.jackson.core"
+exclude group: "com.fasterxml.jackson.dataformat"
}
testImplementation ('com.marklogic:marklogic-junit5:1.4.0') {
-exclude module: 'jackson-core'
-exclude module: 'jackson-databind'
-exclude module: 'jackson-annotations'
-exclude module: 'jackson-dataformat-csv'
+exclude group: "com.fasterxml.jackson.core"
+exclude group: "com.fasterxml.jackson.dataformat"
}

testImplementation "ch.qos.logback:logback-classic:1.3.14"
@@ -105,7 +118,11 @@ if (JavaVersion.current().isCompatibleWith(JavaVersion.VERSION_17)) {
}

shadowJar {
// "all" is the default; no need for that in the connector filename.
configurations = [project.configurations.shadowDependencies]

// "all" is the default; no need for that in the connector filename. This also results in this becoming the library
// artifact that is published as a dependency. That is desirable as it includes the relocated packages listed below,
// which a dependent would otherwise have to manage themselves.
archiveClassifier.set("")

// Spark uses an older version of OkHttp; see
@@ -121,38 +138,6 @@ task perfTest(type: JavaExec) {
args mlHost
}

-task dockerBuildCache(type: Exec) {
-description = "Creates an image named 'marklogic-spark-cache' containing a cache of the Gradle dependencies."
-commandLine 'docker', 'build', '--no-cache', '-t', 'marklogic-spark-cache', '.'
-}
-
-task dockerTest(type: Exec) {
-description = "Run all of the tests within a Docker environment."
-commandLine 'docker', 'run',
-// Allows for communicating with the MarkLogic cluster that is setup via docker-compose.yaml.
-'--network=marklogic_spark_external_net',
-// Map the project directory into the Docker container.
-'-v', getProjectDir().getAbsolutePath() + ':/root/project',
-// Working directory for the Gradle tasks below.
-'-w', '/root/project',
-// Remove the container after it finishes running.
-'--rm',
-// Use the output of dockerBuildCache to avoid downloading all the Gradle dependencies.
-'marklogic-spark-cache:latest',
-'gradle', '-i', '-PmlHost=bootstrap_3n.local', 'test'
-}
-
-task dockerPerfTest(type: Exec) {
-description = "Run PerformanceTester a Docker environment."
-commandLine 'docker', 'run',
-'--network=marklogic_spark_external_net',
-'-v', getProjectDir().getAbsolutePath() + ':/root/project',
-'-w', '/root/project',
-'--rm',
-'marklogic-spark-cache:latest',
-'gradle', '-i', '-PmlHost=bootstrap_3n.local', 'perfTest'
-}
-
task sourcesJar(type: Jar, dependsOn: classes) {
archiveClassifier = "sources"
from sourceSets.main.allSource
docker-compose-3nodes.yaml (6 changes: 3 additions & 3 deletions)
@@ -30,7 +30,7 @@ services:
# by this host. Note that each MarkLogic host has its 8000-8002 ports exposed externally so that the apps on those
# ports can each be accessed if needed.
bootstrap_3n:
image: "marklogicdb/marklogic-db:11.1.0-centos-1.1.0"
image: "marklogicdb/marklogic-db:${MARKLOGIC_TAG}"
platform: linux/amd64
container_name: bootstrap_3n
hostname: bootstrap_3n.local
@@ -50,7 +50,7 @@
- internal_net

node2:
image: "marklogicdb/marklogic-db:11.1.0-centos-1.1.0"
image: "marklogicdb/marklogic-db:${MARKLOGIC_TAG}"
platform: linux/amd64
container_name: node2
hostname: node2.local
@@ -74,7 +74,7 @@
- internal_net

node3:
image: "marklogicdb/marklogic-db:11.1.0-centos-1.1.0"
image: "marklogicdb/marklogic-db:${MARKLOGIC_TAG}"
platform: linux/amd64
container_name: node3
hostname: node3.local
docker-compose.yaml (3 changes: 2 additions & 1 deletion)
@@ -19,7 +19,8 @@ services:

# Copied from https://docs.sonarsource.com/sonarqube/latest/setup-and-upgrade/install-the-server/#example-docker-compose-configuration .
sonarqube:
-image: sonarqube:community
+# Using 10.2 to avoid requiring Java 17 for now.
+image: sonarqube:10.2.1-community
depends_on:
- postgres
environment:
docs/Gemfile.lock (4 changes: 2 additions & 2 deletions)
@@ -224,8 +224,8 @@ GEM
rb-fsevent (0.11.2)
rb-inotify (0.10.1)
ffi (~> 1.0)
-rexml (3.2.8)
-strscan (>= 3.0.9)
+rexml (3.3.2)
+strscan
rouge (3.26.0)
ruby2_keywords (0.0.5)
rubyzip (2.3.2)