Release 2.4.0 · marklogic/marklogic-spark-connector

This minor release addresses the following items:

Can now stream regular files, ZIP files, gzip files, and archive files into MarkLogic by setting the new `spark.marklogic.streamFiles` option to `true`. When this option is set in the reader phase, reading the files is deferred until the writer phase. When it is set in the writer phase, each file is streamed to MarkLogic in a separate request, so the contents of a file or zip entry are never read into memory.
Can now stream documents from MarkLogic to regular files, ZIP files, gzip files, and archive files by setting the same `spark.marklogic.streamFiles` option to `true`. When this option is set in the reader phase, reading the documents is deferred until the writer phase. When it is set in the writer phase, each document is streamed from MarkLogic to a file or zip entry, so the contents of a document are never read into memory.
Files with spaces in their paths are now handled correctly when read into MarkLogic. However, when files are streamed into MarkLogic, spaces in the path will still be encoded until a pending server fix is available.
Archive files (zip files containing both content and metadata) now contain, for each document, the metadata entry followed by the content entry. This ordering is what allows archive files to be streamed. Archive files generated by version 2.3.x of the connector, which place the content entry before the metadata entry, can still be read but cannot be streamed.
Now compiled and tested against Spark 3.5.3.
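
The streaming behavior described above can be illustrated outside Spark. The following is a minimal Python sketch, not the connector's implementation: it walks a ZIP file and hands each entry to a callback as an iterator of fixed-size chunks, so the walker never loads a whole entry into memory. The `send` callback, the `stream_zip_entries` and `iter_chunks` helpers, and the 64 KiB chunk size are all hypothetical stand-ins for one per-document request to MarkLogic; the connector itself is driven only through the `spark.marklogic.streamFiles` option.

```python
import io
import zipfile
from typing import IO, Callable, Iterator

CHUNK_SIZE = 64 * 1024  # illustrative chunk size, not a connector setting


def iter_chunks(stream: IO[bytes], chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def stream_zip_entries(
    zip_source: IO[bytes],
    send: Callable[[str, Iterator[bytes]], None],
    chunk_size: int = CHUNK_SIZE,
) -> None:
    """Pass each file entry to `send` as its own chunked "request".

    `send` is a hypothetical stand-in for one request to MarkLogic. It must
    consume the chunk iterator before returning, because the entry is closed
    as soon as `send` returns.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no content to send
            with archive.open(info) as entry:
                send(info.filename, iter_chunks(entry, chunk_size))


if __name__ == "__main__":
    # Build a small in-memory ZIP and "send" each entry chunk by chunk.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("a.txt", b"x" * 10)
        zf.writestr("b.txt", b"hello")
    buffer.seek(0)

    def print_sizes(name: str, chunks: Iterator[bytes]) -> None:
        print(name, [len(c) for c in chunks])

    stream_zip_entries(buffer, print_sizes, chunk_size=4)
```

The design choice being illustrated is that memory use is bounded by the chunk size rather than by the size of the largest file or zip entry, at the cost of one request per entry.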
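
Why the new archive entry order matters for streaming can also be sketched in Python. This assumes a hypothetical naming convention in which a document's metadata entry is its URI plus a `.metadata` suffix (the connector's real entry names may differ), and the `write_archive` and `read_archive` helpers are illustrative only. A sequential reader that sees the metadata entry first already knows everything about a document when its content entry begins, so it can hand the content on as an open stream. With the 2.3.x order, the reader would have to hold the content until the metadata arrived. This sketch handles only the new order and rejects a content entry with no matching metadata entry immediately before it.

```python
import io
import json
import zipfile
from typing import IO, Iterator

# Hypothetical entry-naming convention for this sketch only; the
# connector's actual archive entry names may differ.
METADATA_SUFFIX = ".metadata"


def write_archive(target: IO[bytes], docs: dict[str, tuple[bytes, dict]]) -> None:
    """Write each document as its metadata entry followed by its content entry."""
    with zipfile.ZipFile(target, "w") as archive:
        for uri, (content, metadata) in docs.items():
            archive.writestr(uri + METADATA_SUFFIX, json.dumps(metadata))
            archive.writestr(uri, content)


def read_archive(source: IO[bytes]) -> Iterator[tuple[str, dict, IO[bytes]]]:
    """Yield (uri, metadata, content_stream) in the order entries were written.

    Each content stream is open only until the generator advances, so
    callers must consume it before asking for the next document.
    """
    with zipfile.ZipFile(source) as archive:
        pending_uri, pending_meta = None, None
        for info in archive.infolist():
            name = info.filename
            if name.endswith(METADATA_SUFFIX):
                # Remember the metadata until its content entry arrives.
                pending_uri = name[: -len(METADATA_SUFFIX)]
                pending_meta = json.loads(archive.read(info))
                continue
            if name != pending_uri:
                # e.g. a 2.3.x-style archive with content before metadata
                raise ValueError(f"{name!r} has no preceding metadata entry")
            with archive.open(info) as content:
                yield name, pending_meta, content
            pending_uri, pending_meta = None, None


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_archive(buffer, {"/a.json": (b'{"x": 1}', {"collections": ["c1"]})})
    buffer.seek(0)
    for uri, metadata, stream in read_archive(buffer):
        print(uri, metadata, stream.read())
```

Reading a legacy archive in this model would require buffering each content entry until its metadata entry is reached, which mirrors why 2.3.x archives can still be read but not streamed.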
