Spark NLP 5.3.3: Patch release
🔥 New Features & Enhancements
- NEW: Introducing UAEEmbeddings for sentence embeddings using Universal AnglE Embedding (UAE).
  UAE is a novel angle-optimized text embedding model, designed to improve semantic textual similarity tasks, which are crucial for Large Language Model (LLM) applications. By introducing angle optimization in a complex space, AnglE effectively mitigates saturation of the cosine similarity function. https://arxiv.org/pdf/2309.12871.pdf
  🔥 The universal English sentence embedding WhereIsAI/UAE-Large-V1 achieves SOTA on the MTEB Leaderboard with an average score of 64.64!
- Improve and optimize processing of the CoNLL-U format for Dependency Parser training, including better handling of multiword tokens and of missing uPos values
- Implement cache mechanism for metadata.json, enhancing efficiency by avoiding unnecessary downloads
- Add example notebook for DocumentCharacterTextSplitter
- Add example notebook for DeBertaForZeroShotClassification
- Add example notebooks for BGEEmbeddings and MPNetEmbeddings
- Add example notebook for MPNetForQuestionAnswering
- Add example notebook for MPNetForSequenceClassification
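As a rough illustration of how the new UAEEmbeddings annotator might be used for semantic textual similarity, here is a minimal Python sketch. The helper names `build_uae_pipeline` and `cosine_similarity` are hypothetical, and the pretrained model name `uae_large_v1` is an assumption; check the Models Hub for the exact name. The Spark NLP imports are done lazily inside the function, so the similarity helper can be used without a Spark installation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def build_uae_pipeline(model_name="uae_large_v1"):
    """Build a two-stage pipeline: raw text -> document -> sentence embeddings.

    Assumption: `model_name` is the pretrained model name on the Models Hub.
    Call this only after a Spark NLP session has been started.
    """
    # Imported lazily so this module loads without pyspark / spark-nlp.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import UAEEmbeddings
    from sparknlp.base import DocumentAssembler

    document = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    embeddings = UAEEmbeddings.pretrained(model_name, "en") \
        .setInputCols(["document"]) \
        .setOutputCol("embeddings")

    return Pipeline(stages=[document, embeddings])


# Possible usage (requires spark-nlp==5.3.3 and a model download):
#
#   import sparknlp
#   spark = sparknlp.start()
#   data = spark.createDataFrame([["A man is playing a guitar."]]).toDF("text")
#   result = build_uae_pipeline().fit(data).transform(data)
#   result.selectExpr("explode(embeddings.embeddings) AS vector").show()
```

The vectors extracted from the `embeddings` column can then be compared pairwise with `cosine_similarity` to score how close two sentences are.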
🐛 Bug Fixes
- Address a bug with serializing ONNX models that lack a .onnx_data file, ensuring better reliability in model serialization processes
- Delete redundant Multilingual_Translation_with_M2M100.ipynb notebook entries
- Fix Colab link for the M2M100 notebook
📖 Documentation
- Import models from TF Hub & HuggingFace
- Spark NLP Notebooks
- Models Hub with new models
- Spark NLP Articles
- Spark NLP in Action
- Spark NLP Documentation
- Spark NLP Scala APIs
- Spark NLP Python APIs
❤️ Community support
- Slack: For live discussion with the Spark NLP community and the team
- GitHub: Bug reports, feature requests, and contributions
- Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
- Medium: Spark NLP articles
- YouTube: Spark NLP video tutorials
Installation
Python
#PyPI
pip install spark-nlp==5.3.3
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x (Scala 2.12):
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3
GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3
Apple Silicon (M1 & M2)
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.3
AArch64
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.3
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.3
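The same package coordinates can also be supplied from Python code instead of the command line. The sketch below is one possible way to do that through Spark's `spark.jars.packages` setting; the helper names `maven_coordinate` and `start_session` and the flavor keys are hypothetical, chosen here only for illustration. Memory and serializer settings are left out and may need tuning for real workloads.

```python
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"
SPARK_NLP_VERSION = "5.3.3"

# Artifact names as listed in the Spark Packages commands above.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def maven_coordinate(flavor="cpu"):
    """Return the group:artifact_scala:version string for a build flavor."""
    if flavor not in ARTIFACTS:
        raise ValueError(f"unknown flavor {flavor!r}; expected one of {sorted(ARTIFACTS)}")
    return f"{GROUP_ID}:{ARTIFACTS[flavor]}_{SCALA_VERSION}:{SPARK_NLP_VERSION}"


def start_session(flavor="cpu"):
    """Create a local SparkSession that pulls the chosen Spark NLP package."""
    # Imported lazily so maven_coordinate() works without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("Spark NLP 5.3.3")
        .master("local[*]")
        .config("spark.jars.packages", maven_coordinate(flavor))
        .getOrCreate()
    )


# Possible usage:
#   spark = start_session("gpu")
```

Keeping the coordinate in one helper means that switching between the CPU, GPU, Apple Silicon, and AArch64 builds is a one-argument change.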
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>5.3.3</version>
</dependency>
spark-nlp-gpu:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>5.3.3</version>
</dependency>
spark-nlp-silicon:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>5.3.3</version>
</dependency>
spark-nlp-aarch64:
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>5.3.3</version>
</dependency>
FAT JARs
- CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.3.3.jar
- GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-5.3.3.jar
- M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-5.3.3.jar
- AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x/3.5.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-5.3.3.jar
What's Changed
- Uploading missing notebooks from Spark NLP v 5.1.4 by @AbdullahMubeenAnwar in #14196
- SPARKNLP-962: UAEEmbeddings by @DevinTDHa in #14199
- Cache mechanism implementation for metadata.json by @mehmetbutgul in #14224
- [SPARKNLP-1031] Solves Dependency Parsers training issue by @danilojsl in #14225
- Models hub by @maziyarpanahi in #14228
- release/533-release-candidate by @maziyarpanahi in #14227
- Models hub by @maziyarpanahi in #14230
Full Changelog: 5.3.2...5.3.3