Spark NLP 5.3.3: Patch release

@maziyarpanahi released this 05 Apr 17:44
· 108 commits to master since this release
4ac11a0

🔥 New Features & Enhancements

  • NEW: Introducing UAEEmbeddings for sentence embeddings using Universal AnglE Embedding, aimed at improving semantic textual similarity tasks.

UAE is a novel angle-optimized text embedding model designed to improve semantic textual similarity (STS), which is crucial for Large Language Model (LLM) applications. By optimizing angles in a complex space, AnglE mitigates the saturation of the cosine similarity function. Paper: https://arxiv.org/pdf/2309.12871.pdf

🔥 The universal English sentence embedding WhereIsAI/UAE-Large-V1 achieves SOTA on the MTEB Leaderboard with an average score of 64.64!
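
To make the saturation claim concrete: the cosine of the angle between two vectors changes very slowly when the vectors are nearly parallel, because the derivative of cos θ is -sin θ, which approaches zero there. The following is a small standalone Python illustration of that effect. It is not Spark NLP or AnglE code, and the helper names `cosine_similarity` and `cosine_slope` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_slope(theta, eps=1e-6):
    """Central-difference estimate of d/d(theta) cos(theta)."""
    return (math.cos(theta + eps) - math.cos(theta - eps)) / (2 * eps)


# Compare a fixed unit vector with copies of itself rotated by a few angles.
u = (1.0, 0.0)
for degrees in (1, 5, 45, 90):
    theta = math.radians(degrees)
    v = (math.cos(theta), math.sin(theta))
    similarity = cosine_similarity(u, v)
    slope = abs(cosine_slope(theta))
    print(f"{degrees:>2} deg  similarity={similarity:.4f}  |slope|={slope:.4f}")
```

At small angles the similarity is already close to 1 and the slope is close to 0, so a training signal based purely on cosine similarity has little to work with in that region. This is the region the AnglE paper refers to as saturation.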

  • Improve processing of the CoNLL-U format for Dependency Parser training, including better multiword token handling and better handling of missing uPos values
  • Implement a cache mechanism for metadata.json to avoid unnecessary downloads
  • Add example notebook for DocumentCharacterTextSplitter
  • Add example notebook for DeBertaForZeroShotClassification
  • Add example notebooks for BGEEmbeddings and MPNetEmbeddings
  • Add example notebook for MPNetForQuestionAnswering
  • Add example notebook for MPNetForSequenceClassification
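
The CoNLL-U item in the list above touches two details of the format: multiword tokens appear as range lines (an ID such as `1-2`) followed by their individual words, and an unspecified UPOS is written as `_`. The sketch below shows one way a reader could handle both in plain Python. It is an illustration of the format only, not the code Spark NLP uses; the function name `read_conllu_words` and the fallback tag `"X"` are assumptions made for this example.

```python
def read_conllu_words(lines, default_upos="X"):
    """Return (id, form, upos, head, deprel) for each syntactic word.

    Range lines for multiword tokens (ID like "1-2") and empty nodes
    (ID like "1.1") are skipped; a missing UPOS ("_") is replaced by
    `default_upos`, which is a hypothetical fallback for this sketch.
    """
    words = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        fields = line.split("\t")
        if len(fields) != 10:
            raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
        token_id = fields[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword token range or empty node
        upos = fields[3] if fields[3] != "_" else default_upos
        head = int(fields[6]) if fields[6] != "_" else None
        words.append((int(token_id), fields[1], upos, head, fields[7]))
    return words


# Spanish "del mar": the surface token "del" covers the words "de" + "el".
SAMPLE = [
    "# text = del mar",
    "\t".join(["1-2", "del", "_", "_", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["1", "de", "de", "ADP", "_", "_", "3", "case", "_", "_"]),
    "\t".join(["2", "el", "el", "_", "_", "_", "3", "det", "_", "_"]),  # UPOS missing
    "\t".join(["3", "mar", "mar", "NOUN", "_", "_", "0", "root", "_", "_"]),
]

for word in read_conllu_words(SAMPLE):
    print(word)
```

Skipping the range line keeps word IDs aligned with the HEAD column, which always refers to syntactic words rather than surface tokens.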

🐛 Bug Fixes

  • Fix a bug when serializing ONNX models that lack a .onnx_data file, making model serialization more reliable
  • Delete redundant Multilingual_Translation_with_M2M100.ipynb notebook entries
  • Fix Colab link for the M2M100 notebook

📖 Documentation


❤️ Community support

  • Slack: For live discussion with the Spark NLP community and the team
  • GitHub: Bug reports, feature requests, and contributions
  • Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
  • Medium: Spark NLP articles
  • YouTube: Spark NLP video tutorials

Installation

Python

# PyPI

pip install spark-nlp==5.3.3

Spark Packages

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x (Scala 2.12):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3

Apple Silicon (M1 & M2)

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.3

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.3

AArch64

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.3

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.3
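
The package coordinates above all follow the same pattern, so they can also be assembled programmatically and passed to a Spark session through the standard `spark.jars.packages` setting. The following is a minimal sketch under that assumption. The variant keys (`"cpu"`, `"gpu"`, `"silicon"`, `"aarch64"`) and both function names are hypothetical labels chosen for this example, and `build_session` requires `pyspark` to be installed.

```python
SPARK_NLP_VERSION = "5.3.3"
SCALA_VERSION = "2.12"

# Artifact names as listed in these release notes; the dictionary keys are
# hypothetical labels used only in this sketch.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def maven_coordinate(variant="cpu"):
    """Build the group:artifact:version string for one Spark NLP variant."""
    artifact = ARTIFACTS[variant]
    return f"com.johnsnowlabs.nlp:{artifact}_{SCALA_VERSION}:{SPARK_NLP_VERSION}"


def build_session(variant="cpu"):
    """Create a SparkSession that resolves the chosen package at startup."""
    from pyspark.sql import SparkSession  # imported lazily; needs pyspark

    return (
        SparkSession.builder
        .appName("spark-nlp-5.3.3")
        .config("spark.jars.packages", maven_coordinate(variant))
        .getOrCreate()
    )


print(maven_coordinate("gpu"))  # com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3
```

Calling `build_session("gpu")` is then roughly equivalent to launching `pyspark --packages` with the GPU coordinate shown above.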

Maven

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>5.3.3</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>5.3.3</version>
</dependency>

spark-nlp-silicon:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>5.3.3</version>
</dependency>

spark-nlp-aarch64:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>5.3.3</version>
</dependency>

FAT JARs

What's Changed

Full Changelog: 5.3.2...5.3.3