diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index a27b26887bde26..7fca613516af3b 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -33,7 +33,7 @@ on: jobs: spark34: if: "! contains(toJSON(github.event.commits.*.message), '[skip test]')" - runs-on: macos-latest + runs-on: macos-13 env: TF_CPP_MIN_LOG_LEVEL: 3 JAVA_OPTS: "-Xmx4096m -XX:+UseG1GC" @@ -41,9 +41,9 @@ jobs: steps: - uses: actions/checkout@v3 - - uses: actions/setup-java@v3 + - uses: actions/setup-java@v4 with: - distribution: 'adopt' + distribution: 'temurin' java-version: '8' cache: 'sbt' - name: Install Python 3.7 @@ -73,7 +73,7 @@ jobs: python3.7 -m pytest -v -m fast spark35: if: "! contains(toJSON(github.event.commits.*.message), '[skip test]')" - runs-on: macos-latest + runs-on: macos-13 env: TF_CPP_MIN_LOG_LEVEL: 3 JAVA_OPTS: "-Xmx4096m -XX:+UseG1GC" @@ -109,7 +109,7 @@ jobs: spark33: if: "! contains(toJSON(github.event.commits.*.message), '[skip test]')" - runs-on: macos-latest + runs-on: macos-13 env: TF_CPP_MIN_LOG_LEVEL: 3 JAVA_OPTS: "-Xmx4096m -XX:+UseG1GC" diff --git a/CHANGELOG b/CHANGELOG index 3c353ab2f98a4e..a7d44214610baf 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,27 @@ +======== +5.4.0 +======== +---------------- +New Features & Enhancements +---------------- +* Added OpenVINO Runtime integration for various models, enabling enhanced inference performance. (#14246) +* Added Python APIs to incorporate OpenVINO support. (#14242) +* Introduced support for ONNX models and average pooling in ONNX-based annotators. (#14245) +* Implemented MPNet for token classification. (#14244) +* Added support for MistralAI LLM and LLAMA2. (#14243) +* Improved caching mechanisms in Streamlit demos. (#14241) +* Enhanced models' card and README documentation for Models Hub. (#14240) +* Added OpenVINO GPU dependencies. (#14236) +* Locked macOS version for runners and added missing SBT setup. 
(#14235) + +---------------- +Bug Fixes +---------------- +* Fixed bugs in Colab notebooks. (#14239) +* Resolved issues with BERT backend and broken annotators. (#14238) +* Corrected LLAMA2 position ID and generation bug. (#14237) + + ======== 5.3.3 ======== diff --git a/README.md b/README.md index af1f1e91fc7e7b..cb7c32736e8638 100644 --- a/README.md +++ b/README.md @@ -139,7 +139,7 @@ documentation and examples - Text-To-Text Transfer Transformer (Google T5) - Generative Pre-trained Transformer 2 (OpenAI GPT2) - Seq2Seq for NLG, Translation, and Comprehension (Facebook BART) -- Chat and Conversational LLMs (Facebook Llama-22) +- Chat and Conversational LLMs (Facebook Llama-2) - Vision Transformer (Google ViT) - Swin Image Classification (Microsoft Swin Transformer) - ConvNext Image Classification (Facebook ConvNext) @@ -149,10 +149,10 @@ documentation and examples - Automatic Speech Recognition (HuBERT) - Automatic Speech Recognition (OpenAI Whisper) - Named entity recognition (Deep learning) -- Easy ONNX and TensorFlow integrations +- Easy ONNX, OpenVINO, and TensorFlow integrations - GPU Support - Full integration with Spark ML functions -- +30000 pre-trained models in +200 languages! +- +31000 pre-trained models in +200 languages! - +6000 pre-trained pipelines in +200 languages! - Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more. @@ -166,7 +166,7 @@ To use Spark NLP you need the following requirements: **GPU (optional):** -Spark NLP 5.3.3 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. The minimum following NVIDIA® software are only required for GPU support: +Spark NLP 5.4.0 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. 
The minimum following NVIDIA® software are only required for GPU support: - NVIDIA® GPU drivers version 450.80.02 or higher - CUDA® Toolkit 11.2 @@ -182,7 +182,7 @@ $ java -version $ conda create -n sparknlp python=3.7 -y $ conda activate sparknlp # spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==5.3.3 pyspark==3.3.1 +$ pip install spark-nlp==5.4.0 pyspark==3.3.1 ``` In Python console or Jupyter `Python3` kernel: @@ -227,10 +227,11 @@ For more examples, you can visit our dedicated [examples](https://github.com/Joh ## Apache Spark Support -Spark NLP *5.3.3* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x +Spark NLP *5.4.0* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x | Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x | |-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------| +| 5.4.x | YES | YES | YES | YES | YES | YES | NO | NO | | 5.3.x | YES | YES | YES | YES | YES | YES | NO | NO | | 5.2.x | YES | YES | YES | YES | YES | YES | NO | NO | | 5.1.x | Partially | YES | YES | YES | YES | YES | NO | NO | @@ -240,12 +241,6 @@ Spark NLP *5.3.3* has been built on top of Apache Spark 3.4 while fully supports | 4.2.x | NO | NO | YES | YES | YES | YES | NO | NO | | 4.1.x | NO | NO | YES | YES | YES | YES | NO | NO | | 4.0.x | NO | NO | YES | YES | YES | YES | NO | NO | -| 3.4.x | NO | NO | N/A | Partially | YES | YES | YES | YES | -| 3.3.x | NO | NO | NO | NO | YES | YES | YES | YES | -| 3.2.x | NO | NO | NO | NO | YES | YES | YES | YES | -| 3.1.x | NO | NO | NO | NO | YES | YES | YES | YES | -| 3.0.x | NO | NO | NO | NO | YES | YES | YES | YES | -| 2.7.x | 
NO | NO | NO | NO | NO | NO | YES | YES | Find out more about `Spark NLP` versions from our [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases). @@ -262,16 +257,10 @@ Find out more about `Spark NLP` versions from our [release notes](https://github | 4.2.x | YES | YES | YES | YES | YES | NO | YES | | 4.1.x | YES | YES | YES | YES | NO | NO | YES | | 4.0.x | YES | YES | YES | YES | NO | NO | YES | -| 3.4.x | YES | YES | YES | YES | NO | YES | YES | -| 3.3.x | YES | YES | YES | NO | NO | YES | YES | -| 3.2.x | YES | YES | YES | NO | NO | YES | YES | -| 3.1.x | YES | YES | YES | NO | NO | YES | YES | -| 3.0.x | YES | YES | YES | NO | NO | YES | YES | -| 2.7.x | YES | YES | NO | NO | NO | YES | NO | ## Databricks Support -Spark NLP 5.3.3 has been tested and is compatible with the following runtimes: +Spark NLP 5.4.0 has been tested and is compatible with the following runtimes: **CPU:** @@ -344,7 +333,7 @@ Spark NLP 5.3.3 has been tested and is compatible with the following runtimes: ## EMR Support -Spark NLP 5.3.3 has been tested and is compatible with the following EMR releases: +Spark NLP 5.4.0 has been tested and is compatible with the following EMR releases: - emr-6.2.0 - emr-6.3.0 @@ -394,11 +383,11 @@ Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x, ```sh # CPU -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 ``` The `spark-nlp` has been published to @@ -407,11 +396,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # GPU -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3 +spark-shell --packages 
com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.4.0 ``` @@ -421,11 +410,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # AArch64 -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.3 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.3 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.3 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.4.0 ``` @@ -435,11 +424,11 @@ the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/s ```sh # M1/M2 (Apple Silicon) -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.3 +spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0 -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.3 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0 -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.3 +spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.4.0 ``` @@ -453,7 +442,7 @@ set in your SparkSession: spark-shell \ --driver-memory 16g \ --conf spark.kryoserializer.buffer.max=2000M \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 ``` ## Scala @@ -471,7 +460,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp_2.12 - 5.3.3 + 5.4.0 ``` @@ -482,7 +471,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-gpu_2.12 - 5.3.3 + 5.4.0 ``` @@ -493,7 +482,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-aarch64_2.12 - 5.3.3 + 5.4.0 ``` @@ 
-504,7 +493,7 @@ coordinates: com.johnsnowlabs.nlp spark-nlp-silicon_2.12 - 5.3.3 + 5.4.0 ``` @@ -514,28 +503,28 @@ coordinates: ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.3.3" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.4.0" ``` **spark-nlp-gpu:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.3.3" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.4.0" ``` **spark-nlp-aarch64:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64 -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.3.3" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.4.0" ``` **spark-nlp-silicon:** ```sbtshell // https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.3.3" +libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.4.0" ``` Maven @@ -557,7 +546,7 @@ If you installed pyspark through pip/conda, you can install `spark-nlp` through Pip: ```bash -pip install spark-nlp==5.3.3 +pip install spark-nlp==5.4.0 ``` Conda: @@ -586,7 +575,7 @@ spark = SparkSession.builder .config("spark.driver.memory", "16G") .config("spark.driver.maxResultSize", "0") .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3") + .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0") .getOrCreate() ``` @@ -657,7 +646,7 @@ Use either one of the following options - Add the following Maven Coordinates to the interpreter's library list ```bash -com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 +com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 ``` - Add a path to pre-built jar from 
[here](#compiled-jars) in the interpreter's library list making sure the jar is @@ -668,7 +657,7 @@ com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 Apart from the previous step, install the python module through pip ```bash -pip install spark-nlp==5.3.3 +pip install spark-nlp==5.4.0 ``` Or you can install `spark-nlp` from inside Zeppelin by using Conda: @@ -696,7 +685,7 @@ launch the Jupyter from the same Python environment: $ conda create -n sparknlp python=3.8 -y $ conda activate sparknlp # spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==5.3.3 pyspark==3.3.1 jupyter +$ pip install spark-nlp==5.4.0 pyspark==3.3.1 jupyter $ jupyter notebook ``` @@ -713,7 +702,7 @@ export PYSPARK_PYTHON=python3 export PYSPARK_DRIVER_PYTHON=jupyter export PYSPARK_DRIVER_PYTHON_OPTS=notebook -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 +pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 ``` Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp` @@ -740,7 +729,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -s is for spark-nlp # -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage # by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.3.3 +!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.4.0 ``` [Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb) @@ -763,7 +752,7 @@ This script comes with the two options to define `pyspark` and `spark-nlp` versi # -s is for spark-nlp # -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage # by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.3.3 +!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 
3.2.3 -s 5.4.0 ``` [Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live @@ -782,9 +771,9 @@ demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP 3. In `Libraries` tab inside your cluster you need to follow these steps: - 3.1. Install New -> PyPI -> `spark-nlp==5.3.3` -> Install + 3.1. Install New -> PyPI -> `spark-nlp==5.4.0` -> Install - 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3` -> Install + 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0` -> Install 4. Now you can attach your notebook to the cluster and use Spark NLP! @@ -835,7 +824,7 @@ A sample of your software configuration in JSON on S3 (must be public access): "spark.kryoserializer.buffer.max": "2000M", "spark.serializer": "org.apache.spark.serializer.KryoSerializer", "spark.driver.maxResultSize": "0", - "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3" + "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0" } }] ``` @@ -844,7 +833,7 @@ A sample of AWS CLI to launch EMR cluster: ```.sh aws emr create-cluster \ ---name "Spark NLP 5.3.3" \ +--name "Spark NLP 5.4.0" \ --release-label emr-6.2.0 \ --applications Name=Hadoop Name=Spark Name=Hive \ --instance-type m4.4xlarge \ @@ -908,7 +897,7 @@ gcloud dataproc clusters create ${CLUSTER_NAME} \ --enable-component-gateway \ --metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \ --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \ - --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 + --properties 
spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 ``` 2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI. @@ -951,7 +940,7 @@ spark = SparkSession.builder .config("spark.kryoserializer.buffer.max", "2000m") .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained") .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3") + .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0") .getOrCreate() ``` @@ -965,7 +954,7 @@ spark-shell \ --conf spark.kryoserializer.buffer.max=2000M \ --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 ``` **pyspark:** @@ -978,7 +967,7 @@ pyspark \ --conf spark.kryoserializer.buffer.max=2000M \ --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 + --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.4.0 ``` **Databricks:** @@ -1250,7 +1239,7 @@ spark = SparkSession.builder .config("spark.driver.memory", "16G") .config("spark.driver.maxResultSize", "0") .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars", "/tmp/spark-nlp-assembly-5.3.3.jar") + .config("spark.jars", "/tmp/spark-nlp-assembly-5.4.0.jar") .getOrCreate() ``` @@ -1259,7 +1248,7 @@ spark = SparkSession.builder version (3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x) - If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster 
setup you need to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc. ( - i.e., `hdfs:///tmp/spark-nlp-assembly-5.3.3.jar`) + i.e., `hdfs:///tmp/spark-nlp-assembly-5.4.0.jar`) Example of using pretrained Models and Pipelines in offline: diff --git a/build.sbt b/build.sbt index 284cadb1aaec1b..9e0e57ac29e51b 100644 --- a/build.sbt +++ b/build.sbt @@ -6,7 +6,7 @@ name := getPackageName(is_silicon, is_gpu, is_aarch64) organization := "com.johnsnowlabs.nlp" -version := "5.3.3" +version := "5.4.0" (ThisBuild / scalaVersion) := scalaVer @@ -153,8 +153,7 @@ lazy val utilDependencies = Seq( exclude ("org.slf4j", "slf4j-api"), gcpStorage exclude ("com.fasterxml.jackson.core", "jackson-core") - exclude ("com.fasterxml.jackson.dataformat", "jackson-dataformat-cbor") - , + exclude ("com.fasterxml.jackson.dataformat", "jackson-dataformat-cbor"), greex, azureIdentity, azureStorage) @@ -181,6 +180,17 @@ val onnxDependencies: Seq[sbt.ModuleID] = else Seq(onnxCPU) +val openVinoDependencies: Seq[sbt.ModuleID] = + if (is_gpu.equals("true")) + Seq(openVinoGPU) + else +// else if (is_silicon.equals("true")) +// Seq(openVinoCPU) +// else if (is_aarch64.equals("true")) +// Seq(openVinoCPU) +// else + Seq(openVinoCPU) + lazy val mavenProps = settingKey[Unit]("workaround for Maven properties") lazy val root = (project in file(".")) @@ -192,6 +202,7 @@ lazy val root = (project in file(".")) utilDependencies ++ tensorflowDependencies ++ onnxDependencies ++ + openVinoDependencies ++ typedDependencyParserDependencies, // TODO potentially improve this? mavenProps := { diff --git a/docs/README.md b/docs/README.md index af1f1e91fc7e7b..33f8a9061dad9b 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,1336 +1,15 @@ -# Spark NLP: State-of-the-Art Natural Language Processing & LLMs Library +# Spark NLP Documentation -


+We welcome you to contribute to Spark NLP documentation hosted inside `en/` directory. All the files are in Markdown format. -Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides **simple**, **performant** & **accurate** NLP annotations for machine learning pipelines that **scale** easily in a distributed -environment. -Spark NLP comes with **36000+** pretrained **pipelines** and **models** in more than **200+** languages. -It also offers tasks such as **Tokenization**, **Word Segmentation**, **Part-of-Speech Tagging**, Word and Sentence **Embeddings**, **Named Entity Recognition**, **Dependency Parsing**, **Spell Checking**, **Text Classification**, **Sentiment Analysis**, **Token Classification**, **Machine Translation** (+180 languages), **Summarization**, **Question Answering**, **Table Question Answering**, **Text Generation**, **Image Classification**, **Image to Text (captioning)**, **Automatic Speech Recognition**, **Zero-Shot Learning**, and many more [NLP tasks](#features). +## Development -**Spark NLP** is the only open-source NLP library in **production** that offers state-of-the-art transformers such as **BERT**, **CamemBERT**, **ALBERT**, **ELECTRA**, **XLNet**, **DistilBERT**, **RoBERTa**, **DeBERTa**, **XLM-RoBERTa**, **Longformer**, **ELMO**, **Universal Sentence Encoder**, **Llama-2**, **M2M100**, **BART**, **Instructor**, **E5**, **Google T5**, **MarianMT**, **OpenAI GPT2**, **Vision Transformers (ViT)**, **OpenAI Whisper**, and many more not only to **Python** and **R**, but also to **JVM** ecosystem (**Java**, **Scala**, and **Kotlin**) at **scale** by extending **Apache Spark** natively. 
- -## Project's website - -Take a look at our official Spark NLP page: [https://sparknlp.org/](https://sparknlp.org/) for user -documentation and examples - -## Community support - -- [Slack](https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q) For live discussion with the Spark NLP community and the team -- [GitHub](https://github.com/JohnSnowLabs/spark-nlp) Bug reports, feature requests, and contributions -- [Discussions](https://github.com/JohnSnowLabs/spark-nlp/discussions) Engage with other community members, share ideas, - and show off how you use Spark NLP! -- [Medium](https://medium.com/spark-nlp) Spark NLP articles -- [YouTube](https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos) Spark NLP video tutorials - -## Table of contents - -- [Features](#features) -- [Requirements](#requirements) -- [Quick Start](#quick-start) -- [Apache Spark Support](#apache-spark-support) -- [Scala & Python Support](#scala-and-python-support) -- [Databricks Support](#databricks-support) -- [EMR Support](#emr-support) -- [Using Spark NLP](#usage) - - [Packages Cheatsheet](#packages-cheatsheet) - - [Spark Packages](#spark-packages) - - [Scala](#scala) - - [Maven](#maven) - - [SBT](#sbt) - - [Python](#python) - - [Pip/Conda](#pipconda) - - [Compiled JARs](#compiled-jars) - - [Apache Zeppelin](#apache-zeppelin) - - [Jupyter Notebook](#jupyter-notebook-python) - - [Google Colab Notebook](#google-colab-notebook) - - [Kaggle Kernel](#kaggle-kernel) - - [Databricks Cluster](#databricks-cluster) - - [EMR Cluster](#emr-cluster) - - [GCP Dataproc](#gcp-dataproc) - - [Spark NLP Configuration](#spark-nlp-configuration) -- [Pipelines & Models](#pipelines-and-models) - - [Pipelines](#pipelines) - - [Models](#models) -- [Offline](#offline) -- [Examples](#examples) -- [FAQ](#faq) -- [Citation](#citation) -- [Contributing](#contributing) - -## Features - -- Tokenization -- Trainable Word Segmentation -- Stop Words Removal -- Token Normalizer -- 
Document Normalizer -- Document & Text Splitter -- Stemmer -- Lemmatizer -- NGrams -- Regex Matching -- Text Matching -- Chunking -- Date Matcher -- Sentence Detector -- Deep Sentence Detector (Deep learning) -- Dependency parsing (Labeled/unlabeled) -- SpanBertCorefModel (Coreference Resolution) -- Part-of-speech tagging -- Sentiment Detection (ML models) -- Spell Checker (ML and DL models) -- Word Embeddings (GloVe and Word2Vec) -- Doc2Vec (based on Word2Vec) -- BERT Embeddings (TF Hub & HuggingFace models) -- DistilBERT Embeddings (HuggingFace models) -- CamemBERT Embeddings (HuggingFace models) -- RoBERTa Embeddings (HuggingFace models) -- DeBERTa Embeddings (HuggingFace v2 & v3 models) -- XLM-RoBERTa Embeddings (HuggingFace models) -- Longformer Embeddings (HuggingFace models) -- ALBERT Embeddings (TF Hub & HuggingFace models) -- XLNet Embeddings -- ELMO Embeddings (TF Hub models) -- Universal Sentence Encoder (TF Hub models) -- BERT Sentence Embeddings (TF Hub & HuggingFace models) -- RoBerta Sentence Embeddings (HuggingFace models) -- XLM-RoBerta Sentence Embeddings (HuggingFace models) -- INSTRUCTOR Embeddings (HuggingFace models) -- E5 Embeddings (HuggingFace models) -- MPNet Embeddings (HuggingFace models) -- UAE Embeddings (HuggingFace models) -- OpenAI Embeddings -- Sentence & Chunk Embeddings -- Unsupervised keywords extraction -- Language Detection & Identification (up to 375 languages) -- Multi-class & Multi-labe Sentiment analysis (Deep learning) -- Multi-class Text Classification (Deep learning) -- BERT for Token & Sequence Classification & Question Answering -- DistilBERT for Token & Sequence Classification & Question Answering -- CamemBERT for Token & Sequence Classification & Question Answering -- ALBERT for Token & Sequence Classification & Question Answering -- RoBERTa for Token & Sequence Classification & Question Answering -- DeBERTa for Token & Sequence Classification & Question Answering -- XLM-RoBERTa for Token & Sequence Classification & 
Question Answering -- Longformer for Token & Sequence Classification & Question Answering -- MPnet for Token & Sequence Classification & Question Answering -- XLNet for Token & Sequence Classification -- Zero-Shot NER Model -- Zero-Shot Text Classification by Transformers (ZSL) -- Neural Machine Translation (MarianMT) -- Many-to-Many multilingual translation model (Facebook M2M100) -- Table Question Answering (TAPAS) -- Text-To-Text Transfer Transformer (Google T5) -- Generative Pre-trained Transformer 2 (OpenAI GPT2) -- Seq2Seq for NLG, Translation, and Comprehension (Facebook BART) -- Chat and Conversational LLMs (Facebook Llama-22) -- Vision Transformer (Google ViT) -- Swin Image Classification (Microsoft Swin Transformer) -- ConvNext Image Classification (Facebook ConvNext) -- Vision Encoder Decoder for image-to-text like captioning -- Zero-Shot Image Classification by OpenAI's CLIP -- Automatic Speech Recognition (Wav2Vec2) -- Automatic Speech Recognition (HuBERT) -- Automatic Speech Recognition (OpenAI Whisper) -- Named entity recognition (Deep learning) -- Easy ONNX and TensorFlow integrations -- GPU Support -- Full integration with Spark ML functions -- +30000 pre-trained models in +200 languages! -- +6000 pre-trained pipelines in +200 languages! -- Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian, - Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Urdu, and more. - -## Requirements - -To use Spark NLP you need the following requirements: - -- Java 8 and 11 -- Apache Spark 3.5.x, 3.4.x, 3.3.x, 3.2.x, 3.1.x, 3.0.x - -**GPU (optional):** - -Spark NLP 5.3.3 is built with ONNX 1.17.0 and TensorFlow 2.7.1 deep learning engines. 
The minimum following NVIDIA® software are only required for GPU support: - -- NVIDIA® GPU drivers version 450.80.02 or higher -- CUDA® Toolkit 11.2 -- cuDNN SDK 8.1.0 - -## Quick Start - -This is a quick example of how to use Spark NLP pre-trained pipeline in Python and PySpark: - -```sh -$ java -version -# should be Java 8 or 11 (Oracle or OpenJDK) -$ conda create -n sparknlp python=3.7 -y -$ conda activate sparknlp -# spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==5.3.3 pyspark==3.3.1 -``` - -In Python console or Jupyter `Python3` kernel: - -```python -# Import Spark NLP -from sparknlp.base import * -from sparknlp.annotator import * -from sparknlp.pretrained import PretrainedPipeline -import sparknlp - -# Start SparkSession with Spark NLP -# start() functions has 3 parameters: gpu, apple_silicon, and memory -# sparknlp.start(gpu=True) will start the session with GPU support -# sparknlp.start(apple_silicon=True) will start the session with macOS M1 & M2 support -# sparknlp.start(memory="16G") to change the default driver memory in SparkSession -spark = sparknlp.start() - -# Download a pre-trained pipeline -pipeline = PretrainedPipeline('explain_document_dl', lang='en') - -# Your testing dataset -text = """ -The Mona Lisa is a 16th century oil painting created by Leonardo. -It's held at the Louvre in Paris. -""" - -# Annotate your testing dataset -result = pipeline.annotate(text) - -# What's in the pipeline -list(result.keys()) -Output: ['entities', 'stem', 'checked', 'lemma', 'document', - 'pos', 'token', 'ner', 'embeddings', 'sentence'] - -# Check the results -result['entities'] -Output: ['Mona Lisa', 'Leonardo', 'Louvre', 'Paris'] -``` - -For more examples, you can visit our dedicated [examples](https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples) to showcase all Spark NLP use cases! 
- -## Apache Spark Support - -Spark NLP *5.3.3* has been built on top of Apache Spark 3.4 while fully supports Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x - -| Spark NLP | Apache Spark 3.5.x | Apache Spark 3.4.x | Apache Spark 3.3.x | Apache Spark 3.2.x | Apache Spark 3.1.x | Apache Spark 3.0.x | Apache Spark 2.4.x | Apache Spark 2.3.x | -|-----------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------| -| 5.3.x | YES | YES | YES | YES | YES | YES | NO | NO | -| 5.2.x | YES | YES | YES | YES | YES | YES | NO | NO | -| 5.1.x | Partially | YES | YES | YES | YES | YES | NO | NO | -| 5.0.x | YES | YES | YES | YES | YES | YES | NO | NO | -| 4.4.x | YES | YES | YES | YES | YES | YES | NO | NO | -| 4.3.x | NO | NO | YES | YES | YES | YES | NO | NO | -| 4.2.x | NO | NO | YES | YES | YES | YES | NO | NO | -| 4.1.x | NO | NO | YES | YES | YES | YES | NO | NO | -| 4.0.x | NO | NO | YES | YES | YES | YES | NO | NO | -| 3.4.x | NO | NO | N/A | Partially | YES | YES | YES | YES | -| 3.3.x | NO | NO | NO | NO | YES | YES | YES | YES | -| 3.2.x | NO | NO | NO | NO | YES | YES | YES | YES | -| 3.1.x | NO | NO | NO | NO | YES | YES | YES | YES | -| 3.0.x | NO | NO | NO | NO | YES | YES | YES | YES | -| 2.7.x | NO | NO | NO | NO | NO | NO | YES | YES | - -Find out more about `Spark NLP` versions from our [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases). 
- -## Scala and Python Support - -| Spark NLP | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10| Scala 2.11 | Scala 2.12 | -|-----------|------------|------------|------------|------------|------------|------------|------------| -| 5.3.x | NO | YES | YES | YES | YES | NO | YES | -| 5.2.x | NO | YES | YES | YES | YES | NO | YES | -| 5.1.x | NO | YES | YES | YES | YES | NO | YES | -| 5.0.x | NO | YES | YES | YES | YES | NO | YES | -| 4.4.x | NO | YES | YES | YES | YES | NO | YES | -| 4.3.x | YES | YES | YES | YES | YES | NO | YES | -| 4.2.x | YES | YES | YES | YES | YES | NO | YES | -| 4.1.x | YES | YES | YES | YES | NO | NO | YES | -| 4.0.x | YES | YES | YES | YES | NO | NO | YES | -| 3.4.x | YES | YES | YES | YES | NO | YES | YES | -| 3.3.x | YES | YES | YES | NO | NO | YES | YES | -| 3.2.x | YES | YES | YES | NO | NO | YES | YES | -| 3.1.x | YES | YES | YES | NO | NO | YES | YES | -| 3.0.x | YES | YES | YES | NO | NO | YES | YES | -| 2.7.x | YES | YES | NO | NO | NO | YES | NO | - -## Databricks Support - -Spark NLP 5.3.3 has been tested and is compatible with the following runtimes: - -**CPU:** - -- 9.1 -- 9.1 ML -- 10.1 -- 10.1 ML -- 10.2 -- 10.2 ML -- 10.3 -- 10.3 ML -- 10.4 -- 10.4 ML -- 10.5 -- 10.5 ML -- 11.0 -- 11.0 ML -- 11.1 -- 11.1 ML -- 11.2 -- 11.2 ML -- 11.3 -- 11.3 ML -- 12.0 -- 12.0 ML -- 12.1 -- 12.1 ML -- 12.2 -- 12.2 ML -- 13.0 -- 13.0 ML -- 13.1 -- 13.1 ML -- 13.2 -- 13.2 ML -- 13.3 -- 13.3 ML -- 14.0 -- 14.0 ML -- 14.1 -- 14.1 ML -- 14.2 -- 14.2 ML -- 14.3 -- 14.3 ML - -**GPU:** - -- 9.1 ML & GPU -- 10.1 ML & GPU -- 10.2 ML & GPU -- 10.3 ML & GPU -- 10.4 ML & GPU -- 10.5 ML & GPU -- 11.0 ML & GPU -- 11.1 ML & GPU -- 11.2 ML & GPU -- 11.3 ML & GPU -- 12.0 ML & GPU -- 12.1 ML & GPU -- 12.2 ML & GPU -- 13.0 ML & GPU -- 13.1 ML & GPU -- 13.2 ML & GPU -- 13.3 ML & GPU -- 14.0 ML & GPU -- 14.1 ML & GPU -- 14.2 ML & GPU -- 14.3 ML & GPU - -## EMR Support - -Spark NLP 5.3.3 has been tested and is compatible with the following EMR 
releases: - -- emr-6.2.0 -- emr-6.3.0 -- emr-6.3.1 -- emr-6.4.0 -- emr-6.5.0 -- emr-6.6.0 -- emr-6.7.0 -- emr-6.8.0 -- emr-6.9.0 -- emr-6.10.0 -- emr-6.11.0 -- emr-6.12.0 -- emr-6.13.0 -- emr-6.14.0 -- emr-6.15.0 -- emr-7.0.0 - -Full list of [Amazon EMR 6.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-6x.html) -Full list of [Amazon EMR 7.x releases](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-7x.html) - -NOTE: The EMR 6.1.0 and 6.1.1 are not supported. - -## Usage - -## Packages Cheatsheet - -This is a cheatsheet for corresponding Spark NLP Maven package to Apache Spark / PySpark major version: - -| Apache Spark | Spark NLP on CPU | Spark NLP on GPU | Spark NLP on AArch64 (linux) | Spark NLP on Apple Silicon | -|-------------------------|--------------------|----------------------------|--------------------------------|--------------------------------------| -| 3.0/3.1/3.2/3.3/3.4/3.5 | `spark-nlp` | `spark-nlp-gpu` | `spark-nlp-aarch64` | `spark-nlp-silicon` | -| Start Function | `sparknlp.start()` | `sparknlp.start(gpu=True)` | `sparknlp.start(aarch64=True)` | `sparknlp.start(apple_silicon=True)` | - -NOTE: `M1/M2` and `AArch64` are under `experimental` support. Access and support to these architectures are limited by the -community and we had to build most of the dependencies by ourselves to make them compatible. We support these two -architectures, however, they may not work in some environments. 
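The cheatsheet above pairs each target platform with a Maven artifact name. As a minimal sketch, the same mapping can be written in plain Python to build the coordinate string that is passed to `--packages`. The helper `maven_coordinate` and its `target` keys are hypothetical names used for illustration; they are not part of the Spark NLP API.

```python
# Hypothetical helper: maps a platform from the cheatsheet to its Maven coordinate.
# The artifact names and the Scala suffix (2.12) are taken from the cheatsheet above.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "aarch64": "spark-nlp-aarch64",
    "apple_silicon": "spark-nlp-silicon",
}


def maven_coordinate(target: str, version: str, scala: str = "2.12") -> str:
    """Return the `group:artifact_scala:version` string for the given platform."""
    if target not in ARTIFACTS:
        raise ValueError(f"unknown target {target!r}; expected one of {sorted(ARTIFACTS)}")
    return f"com.johnsnowlabs.nlp:{ARTIFACTS[target]}_{scala}:{version}"


print(maven_coordinate("gpu", "5.3.3"))
```

The resulting string can be used as the value of `--packages` on the command line or of `spark.jars.packages` in a Spark configuration.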
- -## Spark Packages - -### Command line (requires internet connection) - -Spark NLP supports all major releases of Apache Spark 3.0.x, Apache Spark 3.1.x, Apache Spark 3.2.x, Apache Spark 3.3.x, Apache Spark 3.4.x, and Apache Spark 3.5.x - -#### Apache Spark 3.x (3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x - Scala 2.12) - -```sh -# CPU - -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 - -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 - -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 -``` - -The `spark-nlp` has been published to -the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp). - -```sh -# GPU - -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3 - -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3 - -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.3.3 - -``` - -The `spark-nlp-gpu` has been published to -the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu). - -```sh -# AArch64 - -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.3 - -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.3 - -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.3.3 - -``` - -The `spark-nlp-aarch64` has been published to -the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64). - -```sh -# M1/M2 (Apple Silicon) - -spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.3 - -pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.3 - -spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.3.3 - -``` - -The `spark-nlp-silicon` has been published to -the [Maven Repository](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon). 
- -**NOTE**: In case you are using large pretrained models like UniversalSentenceEncoder, you need to have the following -set in your SparkSession: - -```sh -spark-shell \ - --driver-memory 16g \ - --conf spark.kryoserializer.buffer.max=2000M \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 -``` - -## Scala - -Spark NLP supports Scala 2.12.15 if you are using Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x versions. Our packages are -deployed to Maven central. To add any of our packages as a dependency in your application you can follow these -coordinates: - -### Maven - -**spark-nlp** on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x: - -```xml - - - com.johnsnowlabs.nlp - spark-nlp_2.12 - 5.3.3 - -``` - -**spark-nlp-gpu:** - -```xml - - - com.johnsnowlabs.nlp - spark-nlp-gpu_2.12 - 5.3.3 - -``` - -**spark-nlp-aarch64:** - -```xml - - - com.johnsnowlabs.nlp - spark-nlp-aarch64_2.12 - 5.3.3 - -``` - -**spark-nlp-silicon:** - -```xml - - - com.johnsnowlabs.nlp - spark-nlp-silicon_2.12 - 5.3.3 - -``` - -### SBT - -**spark-nlp** on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x: - -```sbtshell -// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "5.3.3" -``` - -**spark-nlp-gpu:** - -```sbtshell -// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "5.3.3" -``` - -**spark-nlp-aarch64:** - -```sbtshell -// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-aarch64 -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-aarch64" % "5.3.3" -``` - -**spark-nlp-silicon:** - -```sbtshell -// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-silicon -libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-silicon" % "5.3.3" -``` - -Maven -Central: 
[https://mvnrepository.com/artifact/com.johnsnowlabs.nlp](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp) - -If you are interested, there is a simple SBT project for Spark NLP to guide you on how to use it in your -projects [Spark NLP SBT Starter](https://github.com/maziyarpanahi/spark-nlp-starter) - -## Python - -Spark NLP supports Python 3.6.x and above depending on your major PySpark version. - -### Python without explicit Pyspark installation - -### Pip/Conda - -If you installed pyspark through pip/conda, you can install `spark-nlp` through the same channel. - -Pip: +For development purposes, you need to have `bundle` and `Gem` installed on your system. Please run these commands: ```bash -pip install spark-nlp==5.3.3 -``` - -Conda: - -```bash -conda install -c johnsnowlabs spark-nlp -``` - -PyPI [spark-nlp package](https://pypi.org/project/spark-nlp/) / -Anaconda [spark-nlp package](https://anaconda.org/JohnSnowLabs/spark-nlp) - -Then you'll have to create a SparkSession either from Spark NLP: - -```python -import sparknlp - -spark = sparknlp.start() -``` - -or manually: - -```python -from pyspark.sql import SparkSession - -# parentheses allow the builder chain to span several lines -spark = ( - SparkSession.builder - .appName("Spark NLP") - .master("local[*]") - .config("spark.driver.memory", "16G") - .config("spark.driver.maxResultSize", "0") - .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3") - .getOrCreate() -) -``` - -If you are using local JARs, set `spark.jars` to a comma-delimited list of JAR files instead. For cluster setups, -you'll have to put the JARs in a location reachable by all driver and executor nodes.
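The note above about `spark.jars` can be illustrated with a small helper that joins local JAR paths into the comma-delimited value Spark expects. This is only a sketch: the function name `spark_jars_conf` and the second example path are hypothetical.

```python
from typing import Dict, Iterable


def spark_jars_conf(jar_paths: Iterable[str]) -> Dict[str, str]:
    """Build a config entry whose `spark.jars` value is a comma-delimited list."""
    # Drop empty entries and surrounding whitespace before joining.
    paths = [p.strip() for p in jar_paths if p and p.strip()]
    if not paths:
        raise ValueError("at least one JAR path is required")
    return {"spark.jars": ",".join(paths)}


# "/tmp/extra.jar" is a made-up path used only for illustration.
conf = spark_jars_conf(["/tmp/spark-nlp-assembly-5.3.3.jar", "/tmp/extra.jar"])
print(conf["spark.jars"])
```

Each key/value pair in the returned dictionary can then be passed to `SparkSession.builder.config(key, value)` in place of `spark.jars.packages`.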
- -**Quick example:** - -```python -import sparknlp -from sparknlp.pretrained import PretrainedPipeline - -# create or get Spark Session - -spark = sparknlp.start() - -sparknlp.version() -spark.version - -# download, load and annotate a text by pre-trained pipeline - -pipeline = PretrainedPipeline('recognize_entities_dl', 'en') -result = pipeline.annotate('The Mona Lisa is a 16th century oil painting created by Leonardo') -``` - -## Compiled JARs - -### Build from source - -#### spark-nlp - -- FAT-JAR for CPU on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x - -```bash -sbt assembly -``` - -- FAT-JAR for GPU on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x - -```bash -sbt -Dis_gpu=true assembly -``` - -- FAT-JAR for M1/M2 (Apple Silicon) on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x - -```bash -sbt -Dis_silicon=true assembly -``` - -### Using the jar manually - -If for some reason you need to use the JAR, you can either download the Fat JARs provided here or download it -from [Maven Central](https://mvnrepository.com/artifact/com.johnsnowlabs.nlp). - -To add JARs to Spark programs, use the `--jars` option: - -```sh -spark-shell --jars spark-nlp.jar -``` - -The preferred way to use the library when running Spark programs is the `--packages` option, as described in -the `spark-packages` section.
- -## Apache Zeppelin - -Use either one of the following options - -- Add the following Maven Coordinates to the interpreter's library list - -```bash -com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 -``` - -- Add a path to pre-built jar from [here](#compiled-jars) in the interpreter's library list making sure the jar is - available to driver path - -### Python in Zeppelin - -Apart from the previous step, install the python module through pip - -```bash -pip install spark-nlp==5.3.3 -``` - -Or you can install `spark-nlp` from inside Zeppelin by using Conda: - -```bash -python.conda install -c johnsnowlabs spark-nlp -``` - -Configure Zeppelin properly, use cells with %spark.pyspark or any interpreter name you chose. - -Finally, in Zeppelin interpreter settings, make sure you set properly zeppelin.python to the python you want to use and -install the pip library with (e.g. `python3`). - -An alternative option would be to set `SPARK_SUBMIT_OPTIONS` (zeppelin-env.sh) and make sure `--packages` is there as -shown earlier since it includes both scala and python side installation. - -## Jupyter Notebook (Python) - -**Recommended:** - -The easiest way to get this done on Linux and macOS is to simply install `spark-nlp` and `pyspark` PyPI packages and -launch the Jupyter from the same Python environment: - -```sh -$ conda create -n sparknlp python=3.8 -y -$ conda activate sparknlp -# spark-nlp by default is based on pyspark 3.x -$ pip install spark-nlp==5.3.3 pyspark==3.3.1 jupyter -$ jupyter notebook -``` - -Then you can use `python3` kernel to run your code with creating SparkSession via `spark = sparknlp.start()`. 
- -**Optional:** - -If you are in different operating systems and require to make Jupyter Notebook run by using pyspark, you can follow -these steps: - -```bash -export SPARK_HOME=/path/to/your/spark/folder -export PYSPARK_PYTHON=python3 -export PYSPARK_DRIVER_PYTHON=jupyter -export PYSPARK_DRIVER_PYTHON_OPTS=notebook - -pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 -``` - -Alternatively, you can mix in using `--jars` option for pyspark + `pip install spark-nlp` - -If not using pyspark at all, you'll have to run the instructions -pointed [here](#python-without-explicit-pyspark-installation) - -## Google Colab Notebook - -Google Colab is perhaps the easiest way to get started with spark-nlp. It requires no installation or setup other than -having a Google account. - -Run the following code in Google Colab notebook and start using spark-nlp right away. - -```sh -# This is only to setup PySpark and Spark NLP on Colab -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash -``` - -This script comes with the two options to define `pyspark` and `spark-nlp` versions via options: - -```sh -# -p is for pyspark -# -s is for spark-nlp -# -g will enable upgrading libcudnn8 to 8.1.0 on Google Colab for GPU usage -# by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.3.3 -``` - -[Spark NLP quick start on Google Colab](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/quick_start_google_colab.ipynb) -is a live demo on Google Colab that performs named entity recognitions and sentiment analysis by using Spark NLP -pretrained pipelines. - -## Kaggle Kernel - -Run the following code in Kaggle Kernel and start using spark-nlp right away. 
- -```sh -# Let's setup Kaggle for Spark NLP and PySpark -!wget https://setup.johnsnowlabs.com/kaggle.sh -O - | bash -``` - -This script comes with the two options to define `pyspark` and `spark-nlp` versions via options: - -```sh -# -p is for pyspark -# -s is for spark-nlp -# -g will enable upgrading libcudnn8 to 8.1.0 on Kaggle for GPU usage -# by default they are set to the latest -!wget https://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 3.2.3 -s 5.3.3 -``` - -[Spark NLP quick start on Kaggle Kernel](https://www.kaggle.com/mozzie/spark-nlp-named-entity-recognition) is a live -demo on Kaggle Kernel that performs named entity recognitions by using Spark NLP pretrained pipeline. - -## Databricks Cluster - -1. Create a cluster if you don't have one already - -2. On a new cluster or existing one you need to add the following to the `Advanced Options -> Spark` tab: - - ```bash - spark.kryoserializer.buffer.max 2000M - spark.serializer org.apache.spark.serializer.KryoSerializer - ``` - -3. In `Libraries` tab inside your cluster you need to follow these steps: - - 3.1. Install New -> PyPI -> `spark-nlp==5.3.3` -> Install - - 3.2. Install New -> Maven -> Coordinates -> `com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3` -> Install - -4. Now you can attach your notebook to the cluster and use Spark NLP! - -NOTE: Databricks' runtimes support different Apache Spark major releases. Please make sure you choose the correct Spark -NLP Maven package name (Maven Coordinate) for your runtime from -our [Packages Cheatsheet](https://github.com/JohnSnowLabs/spark-nlp#packages-cheatsheet) - -## EMR Cluster - -To launch EMR clusters with Apache Spark/PySpark and Spark NLP correctly you need to have bootstrap and software -configuration. 
- -A sample of your bootstrap script - -```.sh -#!/bin/bash -set -x -e - -echo -e 'export PYSPARK_PYTHON=/usr/bin/python3 -export HADOOP_CONF_DIR=/etc/hadoop/conf -export SPARK_JARS_DIR=/usr/lib/spark/jars -export SPARK_HOME=/usr/lib/spark' >> $HOME/.bashrc && source $HOME/.bashrc - -sudo python3 -m pip install awscli boto spark-nlp - -set +x -exit 0 - -``` - -A sample of your software configuration in JSON on S3 (must be public access): - -```.json -[{ - "Classification": "spark-env", - "Configurations": [{ - "Classification": "export", - "Properties": { - "PYSPARK_PYTHON": "/usr/bin/python3" - } - }] -}, -{ - "Classification": "spark-defaults", - "Properties": { - "spark.yarn.stagingDir": "hdfs:///tmp", - "spark.yarn.preserve.staging.files": "true", - "spark.kryoserializer.buffer.max": "2000M", - "spark.serializer": "org.apache.spark.serializer.KryoSerializer", - "spark.driver.maxResultSize": "0", - "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3" - } -}] -``` - -A sample of AWS CLI to launch EMR cluster: - -```.sh -aws emr create-cluster \ ---name "Spark NLP 5.3.3" \ ---release-label emr-6.2.0 \ ---applications Name=Hadoop Name=Spark Name=Hive \ ---instance-type m4.4xlarge \ ---instance-count 3 \ ---use-default-roles \ ---log-uri "s3:///" \ ---bootstrap-actions Path=s3:///emr-bootstrap.sh,Name=custome \ ---configurations "https:///sparknlp-config.json" \ ---ec2-attributes KeyName=,EmrManagedMasterSecurityGroup=,EmrManagedSlaveSecurityGroup= \ ---profile -``` - -## GCP Dataproc - -1. Create a cluster if you don't have one already as follows. 
- -At gcloud shell: - -```bash -gcloud services enable dataproc.googleapis.com \ - compute.googleapis.com \ - storage-component.googleapis.com \ - bigquery.googleapis.com \ - bigquerystorage.googleapis.com -``` - -```bash -REGION= -``` - -```bash -BUCKET_NAME= -gsutil mb -c standard -l ${REGION} gs://${BUCKET_NAME} -``` - -```bash -REGION= -ZONE= -CLUSTER_NAME= -BUCKET_NAME= -``` - -You can set image-version, master-machine-type, worker-machine-type, -master-boot-disk-size, worker-boot-disk-size, num-workers as your needs. -If you use the previous image-version from 2.0, you should also add ANACONDA to optional-components. -And, you should enable gateway. -Don't forget to set the maven coordinates for the jar in properties. - -```bash -gcloud dataproc clusters create ${CLUSTER_NAME} \ - --region=${REGION} \ - --zone=${ZONE} \ - --image-version=2.0 \ - --master-machine-type=n1-standard-4 \ - --worker-machine-type=n1-standard-2 \ - --master-boot-disk-size=128GB \ - --worker-boot-disk-size=128GB \ - --num-workers=2 \ - --bucket=${BUCKET_NAME} \ - --optional-components=JUPYTER \ - --enable-component-gateway \ - --metadata 'PIP_PACKAGES=spark-nlp spark-nlp-display google-cloud-bigquery google-cloud-storage' \ - --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/python/pip-install.sh \ - --properties spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,spark:spark.driver.maxResultSize=0,spark:spark.kryoserializer.buffer.max=2000M,spark:spark.jars.packages=com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 -``` - -2. On an existing one, you need to install spark-nlp and spark-nlp-display packages from PyPI. - -3. Now, you can attach your notebook to the cluster and use the Spark NLP! 
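The `--properties` flag in the `gcloud dataproc clusters create` command above takes a comma-separated list of `prefix:key=value` entries. As a minimal sketch, that string can be assembled from a dictionary so each Spark setting stays readable. The helper name `dataproc_properties` is hypothetical.

```python
from typing import Mapping


def dataproc_properties(props: Mapping[str, str], prefix: str = "spark") -> str:
    """Join Spark settings into the `prefix:key=value,...` form used by --properties."""
    return ",".join(f"{prefix}:{key}={value}" for key, value in props.items())


# The same four settings that appear in the cluster creation command above.
spark_props = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.driver.maxResultSize": "0",
    "spark.kryoserializer.buffer.max": "2000M",
    "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3",
}

print(dataproc_properties(spark_props))
```

Because dictionaries preserve insertion order, the generated string lists the settings in the order they were written.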
- -## Spark NLP Configuration - -You can change the following Spark NLP configurations via Spark Configuration: - -| Property Name | Default | Meaning | -|---------------------------------------------------------|----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `spark.jsl.settings.pretrained.cache_folder` | `~/cache_pretrained` | The location to download and extract pretrained `Models` and `Pipelines`. By default, it will be in User's Home directory under `cache_pretrained` directory | -| `spark.jsl.settings.storage.cluster_tmp_dir` | `hadoop.tmp.dir` | The location to use on a cluster for temporarily files such as unpacking indexes for WordEmbeddings. By default, this locations is the location of `hadoop.tmp.dir` set via Hadoop configuration for Apache Spark. NOTE: `S3` is not supported and it must be local, HDFS, or DBFS | -| `spark.jsl.settings.annotator.log_folder` | `~/annotator_logs` | The location to save logs from annotators during training such as `NerDLApproach`, `ClassifierDLApproach`, `SentimentDLApproach`, `MultiClassifierDLApproach`, etc. 
By default, it will be in User's Home directory under `annotator_logs` directory | -| `spark.jsl.settings.aws.credentials.access_key_id` | `None` | Your AWS access key to use your S3 bucket to store log files of training models or access tensorflow graphs used in `NerDLApproach` | -| `spark.jsl.settings.aws.credentials.secret_access_key` | `None` | Your AWS secret access key to use your S3 bucket to store log files of training models or access tensorflow graphs used in `NerDLApproach` | -| `spark.jsl.settings.aws.credentials.session_token` | `None` | Your AWS MFA session token to use your S3 bucket to store log files of training models or access tensorflow graphs used in `NerDLApproach` | -| `spark.jsl.settings.aws.s3_bucket` | `None` | Your AWS S3 bucket to store log files of training models or access tensorflow graphs used in `NerDLApproach` | -| `spark.jsl.settings.aws.region` | `None` | Your AWS region to use your S3 bucket to store log files of training models or access tensorflow graphs used in `NerDLApproach` | -| `spark.jsl.settings.onnx.gpuDeviceId` | `0` | Constructs CUDA execution provider options for the specified non-negative device id. | -| `spark.jsl.settings.onnx.intraOpNumThreads` | `6` | Sets the size of the CPU thread pool used for executing a single graph, if executing on a CPU. | -| `spark.jsl.settings.onnx.optimizationLevel` | `ALL_OPT` | Sets the optimization level of this options object, overriding the old setting. | -| `spark.jsl.settings.onnx.executionMode` | `SEQUENTIAL` | Sets the execution mode of this options object, overriding the old setting. | - -### How to set Spark NLP Configuration - -**SparkSession:** - -You can use `.config()` during SparkSession creation to set Spark NLP configurations. 
- -```python -from pyspark.sql import SparkSession - -# parentheses allow the builder chain to span several lines -spark = ( - SparkSession.builder - .master("local[*]") - .config("spark.driver.memory", "16G") - .config("spark.driver.maxResultSize", "0") - .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") - .config("spark.kryoserializer.buffer.max", "2000m") - .config("spark.jsl.settings.pretrained.cache_folder", "sample_data/pretrained") - .config("spark.jsl.settings.storage.cluster_tmp_dir", "sample_data/storage") - .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3") - .getOrCreate() -) -``` - -**spark-shell:** - -```sh -spark-shell \ - --driver-memory 16g \ - --conf spark.driver.maxResultSize=0 \ - --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \ - --conf spark.kryoserializer.buffer.max=2000M \ - --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ - --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 -``` - -**pyspark:** - -```sh -pyspark \ - --driver-memory 16g \ - --conf spark.driver.maxResultSize=0 \ - --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \ - --conf spark.kryoserializer.buffer.max=2000M \ - --conf spark.jsl.settings.pretrained.cache_folder="sample_data/pretrained" \ - --conf spark.jsl.settings.storage.cluster_tmp_dir="sample_data/storage" \ - --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.3.3 -``` - -**Databricks:** - -On a new cluster or existing one you need to add the following to the `Advanced Options -> Spark` tab: - -```bash -spark.kryoserializer.buffer.max 2000M -spark.serializer org.apache.spark.serializer.KryoSerializer -spark.jsl.settings.pretrained.cache_folder dbfs:/PATH_TO_CACHE -spark.jsl.settings.storage.cluster_tmp_dir dbfs:/PATH_TO_STORAGE -spark.jsl.settings.annotator.log_folder dbfs:/PATH_TO_LOGS -``` - -NOTE: If this is an existing cluster, after adding new configs or changing existing properties you need to
restart it. - -### S3 Integration - -In Spark NLP we can define S3 locations to: - -- Export log files of training models -- Store tensorflow graphs used in `NerDLApproach` - -**Logging:** - -To configure S3 path for logging while training models. We need to set up AWS credentials as well as an S3 path - -```bash -spark.conf.set("spark.jsl.settings.annotator.log_folder", "s3://my/s3/path/logs") -spark.conf.set("spark.jsl.settings.aws.credentials.access_key_id", "MY_KEY_ID") -spark.conf.set("spark.jsl.settings.aws.credentials.secret_access_key", "MY_SECRET_ACCESS_KEY") -spark.conf.set("spark.jsl.settings.aws.s3_bucket", "my.bucket") -spark.conf.set("spark.jsl.settings.aws.region", "my-region") -``` - -Now you can check the log on your S3 path defined in *spark.jsl.settings.annotator.log_folder* property. -Make sure to use the prefix *s3://*, otherwise it will use the default configuration. - -**Tensorflow Graphs:** - -To reference S3 location for downloading graphs. We need to set up AWS credentials - -```bash -spark.conf.set("spark.jsl.settings.aws.credentials.access_key_id", "MY_KEY_ID") -spark.conf.set("spark.jsl.settings.aws.credentials.secret_access_key", "MY_SECRET_ACCESS_KEY") -spark.conf.set("spark.jsl.settings.aws.region", "my-region") -``` - -**MFA Configuration:** - -In case your AWS account is configured with MFA. 
You will need first to get temporal credentials and add session token -to the configuration as shown in the examples below -For logging: - -```bash -spark.conf.set("spark.jsl.settings.aws.credentials.session_token", "MY_TOKEN") -``` - -An example of a bash script that gets temporal AWS credentials can be -found [here](https://github.com/JohnSnowLabs/spark-nlp/blob/master/scripts/aws_tmp_credentials.sh) -This script requires three arguments: - -```bash -./aws_tmp_credentials.sh iam_user duration serial_number -``` - -## Pipelines and Models - -### Pipelines - -**Quick example:** - -```scala -import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline -import com.johnsnowlabs.nlp.SparkNLP - -SparkNLP.version() - -val testData = spark.createDataFrame(Seq( - (1, "Google has announced the release of a beta version of the popular TensorFlow machine learning library"), - (2, "Donald John Trump (born June 14, 1946) is the 45th and current president of the United States") -)).toDF("id", "text") - -val pipeline = PretrainedPipeline("explain_document_dl", lang = "en") - -val annotation = pipeline.transform(testData) - -annotation.show() -/* -import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline -import com.johnsnowlabs.nlp.SparkNLP -2.5.0 -testData: org.apache.spark.sql.DataFrame = [id: int, text: string] -pipeline: com.johnsnowlabs.nlp.pretrained.PretrainedPipeline = PretrainedPipeline(explain_document_dl,en,public/models) -annotation: org.apache.spark.sql.DataFrame = [id: int, text: string ... 
10 more fields] -+---+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+ -| id| text| document| token| sentence| checked| lemma| stem| pos| embeddings| ner| entities| -+---+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+ -| 1|Google has announ...|[[document, 0, 10...|[[token, 0, 5, Go...|[[document, 0, 10...|[[token, 0, 5, Go...|[[token, 0, 5, Go...|[[token, 0, 5, go...|[[pos, 0, 5, NNP,...|[[word_embeddings...|[[named_entity, 0...|[[chunk, 0, 5, Go...| -| 2|The Paris metro w...|[[document, 0, 11...|[[token, 0, 2, Th...|[[document, 0, 11...|[[token, 0, 2, Th...|[[token, 0, 2, Th...|[[token, 0, 2, th...|[[pos, 0, 2, DT, ...|[[word_embeddings...|[[named_entity, 0...|[[chunk, 4, 8, Pa...| -+---+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+ -*/ - -annotation.select("entities.result").show(false) - -/* -+----------------------------------+ -|result | -+----------------------------------+ -|[Google, TensorFlow] | -|[Donald John Trump, United States]| -+----------------------------------+ -*/ -``` - -#### Showing Available Pipelines - -There are functions in Spark NLP that will list all the available Pipelines -of a particular language for you: - -```scala -import com.johnsnowlabs.nlp.pretrained.ResourceDownloader - -ResourceDownloader.showPublicPipelines(lang = "en") -/* -+--------------------------------------------+------+---------+ -| Pipeline | lang | version | -+--------------------------------------------+------+---------+ 
-| dependency_parse | en | 2.0.2 | -| analyze_sentiment_ml | en | 2.0.2 | -| check_spelling | en | 2.1.0 | -| match_datetime | en | 2.1.0 | - ... -| explain_document_ml | en | 3.1.3 | -+--------------------------------------------+------+---------+ -*/ -``` - -Or if we want to check for a particular version: - -```scala -import com.johnsnowlabs.nlp.pretrained.ResourceDownloader - -ResourceDownloader.showPublicPipelines(lang = "en", version = "3.1.0") -/* -+---------------------------------------+------+---------+ -| Pipeline | lang | version | -+---------------------------------------+------+---------+ -| dependency_parse | en | 2.0.2 | - ... -| clean_slang | en | 3.0.0 | -| clean_pattern | en | 3.0.0 | -| check_spelling | en | 3.0.0 | -| dependency_parse | en | 3.0.0 | -+---------------------------------------+------+---------+ -*/ -``` - -#### Please check out our Models Hub for the full list of [pre-trained pipelines](https://sparknlp.org/models) with examples, demos, benchmarks, and more - -### Models - -**Some selected languages: -** `Afrikaans, Arabic, Armenian, Basque, Bengali, Breton, Bulgarian, Catalan, Czech, Dutch, English, Esperanto, Finnish, French, Galician, German, Greek, Hausa, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Latvian, Marathi, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Southern Sotho, Spanish, Swahili, Swedish, Tswana, Turkish, Ukrainian, Zulu` - -**Quick online example:** - -```python -# load NER model trained by deep learning approach and GloVe word embeddings -ner_dl = NerDLModel.pretrained('ner_dl') -# load NER model trained by deep learning approach and BERT word embeddings -ner_bert = NerDLModel.pretrained('ner_dl_bert') -``` - -```scala -// load French POS tagger model trained by Universal Dependencies -val french_pos = PerceptronModel.pretrained("pos_ud_gsd", lang = "fr") -// load Italian LemmatizerModel -val italian_lemma = 
LemmatizerModel.pretrained("lemma_dxc", lang = "it") -```` - -**Quick offline example:** - -- Loading `PerceptronModel` annotator model inside Spark NLP Pipeline - -```scala -val french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/") - .setInputCols("document", "token") - .setOutputCol("pos") -``` - -#### Showing Available Models - -There are functions in Spark NLP that will list all the available Models -of a particular Annotator and language for you: - -```scala -import com.johnsnowlabs.nlp.pretrained.ResourceDownloader - -ResourceDownloader.showPublicModels(annotator = "NerDLModel", lang = "en") -/* -+---------------------------------------------+------+---------+ -| Model | lang | version | -+---------------------------------------------+------+---------+ -| onto_100 | en | 2.1.0 | -| onto_300 | en | 2.1.0 | -| ner_dl_bert | en | 2.2.0 | -| onto_100 | en | 2.4.0 | -| ner_conll_elmo | en | 3.2.2 | -+---------------------------------------------+------+---------+ -*/ -``` - -Or if we want to check for a particular version: - -```scala -import com.johnsnowlabs.nlp.pretrained.ResourceDownloader - -ResourceDownloader.showPublicModels(annotator = "NerDLModel", lang = "en", version = "3.1.0") -/* -+----------------------------+------+---------+ -| Model | lang | version | -+----------------------------+------+---------+ -| onto_100 | en | 2.1.0 | -| ner_aspect_based_sentiment | en | 2.6.2 | -| ner_weibo_glove_840B_300d | en | 2.6.2 | -| nerdl_atis_840b_300d | en | 2.7.1 | -| nerdl_snips_100d | en | 2.7.3 | -+----------------------------+------+---------+ -*/ -``` - -And to see a list of available annotators, you can use: - -```scala -import com.johnsnowlabs.nlp.pretrained.ResourceDownloader - -ResourceDownloader.showAvailableAnnotators() -/* -AlbertEmbeddings -AlbertForTokenClassification -AssertionDLModel -... 
-XlmRoBertaSentenceEmbeddings -XlnetEmbeddings -*/ -``` - -#### Please check out our Models Hub for the full list of [pre-trained models](https://sparknlp.org/models) with examples, demo, benchmark, and more - -## Offline - -Spark NLP library and all the pre-trained models/pipelines can be used entirely offline with no access to the Internet. -If you are behind a proxy or a firewall with no access to the Maven repository (to download packages) and/or no access -to S3 (to automatically download models and pipelines), you can follow these instructions to use Spark NLP offline -without any limitations: - -- Instead of using the Maven package, you need to load our Fat JAR -- Instead of using PretrainedPipeline for pretrained pipelines or the `.pretrained()` function to download pretrained - models, you will need to manually download your pipeline/model from [Models Hub](https://sparknlp.org/models), - extract it, and load it. - -Example of `SparkSession` with Fat JAR to have Spark NLP offline: - -```python -from pyspark.sql import SparkSession - -# parentheses allow the builder chain to span several lines -spark = ( - SparkSession.builder - .appName("Spark NLP") - .master("local[*]") - .config("spark.driver.memory", "16G") - .config("spark.driver.maxResultSize", "0") - .config("spark.kryoserializer.buffer.max", "2000M") - .config("spark.jars", "/tmp/spark-nlp-assembly-5.3.3.jar") - .getOrCreate() -) -``` - -- You can download the provided Fat JARs from each [release notes](https://github.com/JohnSnowLabs/spark-nlp/releases) - page. Pick the one that suits your environment, depending on the device (CPU/GPU) and Apache Spark - version (3.0.x, 3.1.x, 3.2.x, 3.3.x, 3.4.x, and 3.5.x) -- If you are local, you can load the Fat JAR from your local FileSystem, however, if you are in a cluster setup you need - to put the Fat JAR on a distributed FileSystem such as HDFS, DBFS, S3, etc.
( - i.e., `hdfs:///tmp/spark-nlp-assembly-5.3.3.jar`) - -Example of using pretrained Models and Pipelines in offline: - -```python -# instead of using pretrained() for online: -# french_pos = PerceptronModel.pretrained("pos_ud_gsd", lang="fr") -# you download this model, extract it, and use .load -french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/") - .setInputCols("document", "token") - .setOutputCol("pos") - -# example for pipelines -# instead of using PretrainedPipeline -# pipeline = PretrainedPipeline('explain_document_dl', lang='en') -# you download this pipeline, extract it, and use PipelineModel -PipelineModel.load("/tmp/explain_document_dl_en_2.0.2_2.4_1556530585689/") -``` - -- Since you are downloading and loading models/pipelines manually, this means Spark NLP is not downloading the most - recent and compatible models/pipelines for you. Choosing the right model/pipeline is on you -- If you are local, you can load the model/pipeline from your local FileSystem, however, if you are in a cluster setup - you need to put the model/pipeline on a distributed FileSystem such as HDFS, DBFS, S3, etc. ( - i.e., `hdfs:///tmp/explain_document_dl_en_2.0.2_2.4_1556530585689/`) - -## Examples - -Need more **examples**? Check out our dedicated [Spark NLP Examples](https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples) -repository to showcase all Spark NLP use cases! - -Also, don't forget to check [Spark NLP in Action](https://sparknlp.org/demo) built by Streamlit. 
- -### All examples: [spark-nlp/examples](https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples) - -## FAQ - -[Check our Articles and Videos page here](https://sparknlp.org/learn) - -## Citation - -We have published a [paper](https://www.sciencedirect.com/science/article/pii/S2665963821000063) that you can cite for -the Spark NLP library: - -```bibtex -@article{KOCAMAN2021100058, - title = {Spark NLP: Natural language understanding at scale}, - journal = {Software Impacts}, - pages = {100058}, - year = {2021}, - issn = {2665-9638}, - doi = {https://doi.org/10.1016/j.simpa.2021.100058}, - url = {https://www.sciencedirect.com/science/article/pii/S2665963821000063}, - author = {Veysel Kocaman and David Talby}, - keywords = {Spark, Natural language processing, Deep learning, Tensorflow, Cluster}, - abstract = {Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192+ languages. It supports nearly all the NLP tasks and modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and experiencing 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations as the world’s most widely used NLP library in the enterprise.} - } -} -``` - -## Contributing - -We appreciate any sort of contributions: - -- ideas -- feedback -- documentation -- bug reports -- NLP training and testing corpora -- Development and testing - -Clone the repo and submit your pull-requests! Or directly create issues in this repo.
- -## John Snow Labs +bundle update +bundle install +bundle exec jekyll serve -[http://johnsnowlabs.com](http://johnsnowlabs.com) +# Server address: http://127.0.0.1:4000 +``` \ No newline at end of file diff --git a/docs/_layouts/landing.html b/docs/_layouts/landing.html index 4033101cee7175..ee4766b9904aa2 100755 --- a/docs/_layouts/landing.html +++ b/docs/_layouts/landing.html @@ -201,7 +201,7 @@

{{ _section.title }}

{% highlight bash %} # Using PyPI - $ pip install spark-nlp==5.3.3 + $ pip install spark-nlp==5.4.0 # Using Anaconda/Conda $ conda install -c johnsnowlabs spark-nlp diff --git a/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_en_panx_en.md b/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_en_panx_en.md new file mode 100644 index 00000000000000..82384ddfe6e218 --- /dev/null +++ b/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_en_panx_en.md @@ -0,0 +1,76 @@ +--- +layout: model +title: Deepa Panx Model for English +author: SaiDeepaPeri +name: deepa_xlmroberta_ner_large_en_panx +date: 2024-05-06 +tags: [en, open_source] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.1.0 +spark_version: 3.0 +supported: false +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Named Entity Recognition trained on English panx + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/community.johnsnowlabs.com/SaiDeepaPeri/deepa_xlmroberta_ner_large_en_panx_en_4.1.0_3.0_1715017572119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://community.johnsnowlabs.com/SaiDeepaPeri/deepa_xlmroberta_ner_large_en_panx_en_4.1.0_3.0_1715017572119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols(["document"]) \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("deepa_xlmroberta_ner_large_en_panx", "en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter() \ + .setInputCols(["document", "token", "ner"]) \ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +``` + +
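The archive name in the download link above appears to follow the pattern `<model name>_<language>_<Spark NLP version>_<Spark version>_<timestamp>.zip`. This pattern is inferred from the links and front matter in this diff, not from a documented specification. A minimal Python sketch that splits such a name from the right, so that underscores inside the model name are preserved, might look like this; `ModelArchive` and `parse_archive_name` are hypothetical helpers and not part of Spark NLP:

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ModelArchive(NamedTuple):
    """Fields assumed to be encoded in a Models Hub archive name."""
    name: str
    lang: str
    sparknlp_version: str
    spark_version: str
    timestamp_ms: int


def parse_archive_name(filename: str) -> ModelArchive:
    """Split '<name>_<lang>_<nlp>_<spark>_<ts>.zip' from the right.

    Hypothetical helper: the naming pattern is an assumption inferred
    from the download links, not an official Spark NLP API.
    """
    stem = filename[:-4] if filename.endswith(".zip") else filename
    # The model name itself may contain underscores, so split only
    # the last four separators.
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected archive name: {filename!r}")
    name, lang, nlp_version, spark_version, timestamp = parts
    return ModelArchive(name, lang, nlp_version, spark_version, int(timestamp))


archive = parse_archive_name(
    "deepa_xlmroberta_ner_large_en_panx_en_4.1.0_3.0_1715017572119.zip"
)
# Assumption: the trailing number is a Unix timestamp in milliseconds.
uploaded = datetime.fromtimestamp(archive.timestamp_ms / 1000, tz=timezone.utc)
print(archive.name, archive.lang, archive.sparknlp_version, uploaded.date())
```

For this card the parsed fields agree with the front matter (`name`, `language`, `edition`, `spark_version`, and `date`), which is consistent with the assumed pattern but does not prove it holds for every archive.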
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepa_xlmroberta_ner_large_en_panx| +|Compatibility:|Spark NLP 4.1.0+| +|License:|Open Source| +|Edition:|Community| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.8 GB| +|Case sensitive:|true| +|Max sentence length:|256| \ No newline at end of file diff --git a/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_panx_dataset_en.md b/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_panx_dataset_en.md new file mode 100644 index 00000000000000..8bf6d9a7b5c199 --- /dev/null +++ b/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_panx_dataset_en.md @@ -0,0 +1,78 @@ +--- +layout: model +title: "Deepa NER XLMRoberta Large Model : deepa_xlmroberta_ner_large_panx" +author: SaiDeepaPeri +name: deepa_xlmroberta_ner_large_panx_dataset +date: 2024-05-06 +tags: [en, open_source] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.1.0 +spark_version: 3.0 +supported: false +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +NER model XLM Roberta Large Model + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/community.johnsnowlabs.com/SaiDeepaPeri/deepa_xlmroberta_ner_large_panx_dataset_en_4.1.0_3.0_1715028210601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://community.johnsnowlabs.com/SaiDeepaPeri/deepa_xlmroberta_ner_large_panx_dataset_en_4.1.0_3.0_1715028210601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +# Create a custom Tokenizer that splits text based on spaces +tokenizer = RegexTokenizer() \ + .setInputCols(["document"]) \ + .setOutputCol("token").setPattern("\\s+") + +# deepa_xlmroberta_ner_large_en_panx +token_classifier = XlmRoBertaForTokenClassification.pretrained("deepa_xlmroberta_ner_large_panx", "en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter() \ + .setInputCols(["document", "token", "ner"]) \ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +``` + +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepa_xlmroberta_ner_large_panx_dataset| +|Compatibility:|Spark NLP 4.1.0+| +|License:|Open Source| +|Edition:|Community| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.8 GB| +|Case sensitive:|true| +|Max sentence length:|256| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-02-11-bge_m3_xx.md b/docs/_posts/ahmedlone127/2024-02-11-bge_m3_xx.md index b7efcdf8a199c4..717af5c0065301 100644 --- a/docs/_posts/ahmedlone127/2024-02-11-bge_m3_xx.md +++ b/docs/_posts/ahmedlone127/2024-02-11-bge_m3_xx.md @@ -68,7 +68,7 @@ val sentencerDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx .setOutputCol("sentence") val embeddings = XlmRoBertaSentenceEmbeddings - .pretrained("bge_m3", "xx") + .pretrained("bge_m3 ", "xx") .setInputCols(Array("sentence")) .setOutputCol("embeddings") diff --git a/docs/_posts/ahmedlone127/2024-05-19-llama_2_7b_chat_hf_int4_en.md b/docs/_posts/ahmedlone127/2024-05-19-llama_2_7b_chat_hf_int4_en.md new file mode 100644 index 00000000000000..a103323a8bf234 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-05-19-llama_2_7b_chat_hf_int4_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Llama-2 text-to-text model 7b int4 +author: John Snow Labs +name: llama_2_7b_chat_hf_int4 +date: 2024-05-19 +tags: [en, llama2, open_source] +task: Text Generation +language: en +nav_key: models +edition: Spark NLP 5.3.0 +spark_version: 3.0 +supported: true +recommended: true +annotator: LLAMA2Transformer +article_header: +type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. 
Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llama_2_7b_chat_hf_int4_en_5.3.0_3.0_1708946358903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llama_2_7b_chat_hf_int4_en_5.3.0_3.0_1708946358903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ +.setInputCol("text") \ +.setOutputCol("documents") + +llama2 = LLAMA2Transformer \ + .pretrained("llama_2_7b_chat_hf_int4") \ + .setMaxOutputLength(50) \ + .setDoSample(False) \ + .setInputCols(["documents"]) \ + .setOutputCol("generation") + +pipeline = Pipeline().setStages([documentAssembler, llama2]) +data = spark.createDataFrame([["My name is Leonardo."]]).toDF("text") +result = pipeline.fit(data).transform(data) +result.select("generation.result").show(truncate=False) +``` +```scala +val documentAssembler = new DocumentAssembler() +.setInputCol("text") +.setOutputCol("documents") + +val llama2 = LLAMA2Transformer.pretrained("llama_2_7b_chat_hf_int4") + .setMaxOutputLength(50) + .setDoSample(false) + .setInputCols(Array("documents")) + .setOutputCol("generation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, llama2)) + +val data = Seq("My name is Leonardo.").toDF("text") +val result = pipeline.fit(data).transform(data) +result.select("generation.result").show(truncate = false) +``` + +
+ + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llama_2_7b_chat_hf_int4| +|Compatibility:|Spark NLP 5.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents]| +|Output Labels:|[generation]| +|Language:|en| diff --git a/docs/_posts/ahmedlone127/2024-05-19-llama_2_7b_chat_hf_int8_en.md b/docs/_posts/ahmedlone127/2024-05-19-llama_2_7b_chat_hf_int8_en.md new file mode 100644 index 00000000000000..b3c1209115374c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-05-19-llama_2_7b_chat_hf_int8_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Llama-2 text-to-text model 7b int8 +author: John Snow Labs +name: llama_2_7b_chat_hf_int8 +date: 2024-05-19 +tags: [en, llama2, open_source] +task: Text Generation +language: en +nav_key: models +edition: Spark NLP 5.3.0 +spark_version: 3.0 +supported: true +recommended: true +annotator: LLAMA2Transformer +article_header: +type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama-2-Chat, are optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety, are on par with some popular closed-source models like ChatGPT and PaLM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llama_2_7b_chat_hf_int8_en_5.3.0_3.0_1708952065310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llama_2_7b_chat_hf_int8_en_5.3.0_3.0_1708952065310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ +.setInputCol("text") \ +.setOutputCol("documents") + +llama2 = LLAMA2Transformer \ + .pretrained("llama_2_7b_chat_hf_int8") \ + .setMaxOutputLength(50) \ + .setDoSample(False) \ + .setInputCols(["documents"]) \ + .setOutputCol("generation") + +pipeline = Pipeline().setStages([documentAssembler, llama2]) +data = spark.createDataFrame([["My name is Leonardo."]]).toDF("text") +result = pipeline.fit(data).transform(data) +result.select("generation.result").show(truncate=False) +``` +```scala +val documentAssembler = new DocumentAssembler() +.setInputCol("text") +.setOutputCol("documents") + +val llama2 = LLAMA2Transformer.pretrained("llama_2_7b_chat_hf_int8") + .setMaxOutputLength(50) + .setDoSample(false) + .setInputCols(Array("documents")) + .setOutputCol("generation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, llama2)) + +val data = Seq("My name is Leonardo.").toDF("text") +val result = pipeline.fit(data).transform(data) +result.select("generation.result").show(truncate = false) +``` + +
+ + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llama_2_7b_chat_hf_int8| +|Compatibility:|Spark NLP 5.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents]| +|Output Labels:|[generation]| +|Language:|en| diff --git a/docs/_posts/ahmedlone127/2024-05-19-m2m100_1.2B_xx.md b/docs/_posts/ahmedlone127/2024-05-19-m2m100_1.2B_xx.md new file mode 100644 index 00000000000000..24ba9548f44e06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-05-19-m2m100_1.2B_xx.md @@ -0,0 +1,89 @@ +--- +layout: model +title: M2M100 Multilingual Translation 1.2B +author: John Snow Labs +name: m2m100_1.2B +date: 2024-05-19 +tags: [xx, m2m100, open_source] +task: Text Generation +language: xx +nav_key: models +edition: Spark NLP 5.3.0 +spark_version: 3.0 +supported: true +recommended: true +annotator: M2M100Transformer +article_header: +type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation. +The model can directly translate between the 9,900 directions of 100 languages. To translate into a target language, the target language id is forced as the first generated token. To force the target language id as the first generated token, pass the forced_bos_token_id parameter to the generate method. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/m2m100_1.2B_xx_5.3.0_3.0_1708953931627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/m2m100_1.2B_xx_5.3.0_3.0_1708953931627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ +.setInputCol("text") \ +.setOutputCol("documents") + +m2m100 = M2M100Transformer.pretrained("m2m100_1.2B","xx") \ + .setInputCols(["documents"]) \ + .setMaxOutputLength(50) \ + .setOutputCol("generation") \ + .setSrcLang("en") \ + .setTgtLang("zh") + + +pipeline = Pipeline().setStages([documentAssembler, m2m100]) +data = spark.createDataFrame([["My name is Leonardo."]]).toDF("text") +result = pipeline.fit(data).transform(data) +result.show(truncate=False) +``` +```scala +val documentAssembler = new DocumentAssembler() +.setInputCol("text") +.setOutputCol("documents") + +val m2m100 = M2M100Transformer.pretrained("m2m100_1.2B","xx") + .setInputCols(Array("documents")) + .setMaxOutputLength(50) + .setOutputCol("generation") + .setSrcLang("en") + .setTgtLang("zh") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, m2m100)) + +val data = Seq("My name is Leonardo.").toDF("text") +val result = pipeline.fit(data).transform(data) +result.show(truncate = false) +``` + +
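The figure of 9,900 translation directions in the description follows from counting ordered (source, target) pairs of distinct languages: with 100 languages there are 100 × 99 such pairs. A small Python check of that arithmetic, where `translation_directions` is a hypothetical helper written for this note rather than anything in Spark NLP:

```python
from itertools import permutations


def translation_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs of distinct languages."""
    return num_languages * (num_languages - 1)


# Cross-check the closed form against explicit enumeration on a small case.
langs = ["en", "zh", "fr", "de"]
assert translation_directions(len(langs)) == len(list(permutations(langs, 2)))

print(translation_directions(100))  # 9900
```

In the pipeline above, `setSrcLang("en")` and `setTgtLang("zh")` select one of these directions.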
+ + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|m2m100_1.2B| +|Compatibility:|Spark NLP 5.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents]| +|Output Labels:|[generation]| +|Language:|xx| diff --git a/docs/_posts/ahmedlone127/2024-05-19-m2m100_418M_xx.md b/docs/_posts/ahmedlone127/2024-05-19-m2m100_418M_xx.md new file mode 100644 index 00000000000000..fa7c63cc394bc1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-05-19-m2m100_418M_xx.md @@ -0,0 +1,89 @@ +--- +layout: model +title: M2M100 Multilingual Translation 418M +author: John Snow Labs +name: m2m100_418M +date: 2024-05-19 +tags: [xx, m2m100, open_source] +task: Text Generation +language: xx +nav_key: models +edition: Spark NLP 5.3.0 +spark_version: 3.0 +supported: true +recommended: true +annotator: M2M100Transformer +article_header: +type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation. +The model can directly translate between the 9,900 directions of 100 languages. To translate into a target language, the target language id is forced as the first generated token. To force the target language id as the first generated token, pass the forced_bos_token_id parameter to the generate method. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/m2m100_418M_xx_5.3.0_3.0_1708953899877.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/m2m100_418M_xx_5.3.0_3.0_1708953899877.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ +.setInputCol("text") \ +.setOutputCol("documents") + +m2m100 = M2M100Transformer.pretrained("m2m100_418M","xx") \ + .setInputCols(["documents"]) \ + .setMaxOutputLength(50) \ + .setOutputCol("generation") \ + .setSrcLang("en") \ + .setTgtLang("zh") + + +pipeline = Pipeline().setStages([documentAssembler, m2m100]) +data = spark.createDataFrame([["My name is Leonardo."]]).toDF("text") +result = pipeline.fit(data).transform(data) +result.show(truncate=False) +``` +```scala +val documentAssembler = new DocumentAssembler() +.setInputCol("text") +.setOutputCol("documents") + +val m2m100 = M2M100Transformer.pretrained("m2m100_418M","xx") + .setInputCols(Array("documents")) + .setMaxOutputLength(50) + .setOutputCol("generation") + .setSrcLang("en") + .setTgtLang("zh") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, m2m100)) + +val data = Seq("My name is Leonardo.").toDF("text") +val result = pipeline.fit(data).transform(data) +result.show(truncate = false) +``` + +
+ + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|m2m100_418M| +|Compatibility:|Spark NLP 5.3.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[documents]| +|Output Labels:|[generation]| +|Language:|xx| diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_en.md new file mode 100644 index 00000000000000..7bcc1e9c6c2598 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English baai_bge_base_english_nowr_1_1 BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_base_english_nowr_1_1 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_nowr_1_1` is an English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_1_en_5.4.0_3.0_1718060836858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_1_en_5.4.0_3.0_1718060836858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_nowr_1_1","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_nowr_1_1","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_nowr_1_1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|381.8 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-base-en-nowr-1-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_pipeline_en.md new file mode 100644 index 00000000000000..c850ce6f447dfb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English baai_bge_base_english_nowr_1_1_pipeline pipeline BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_base_english_nowr_1_1_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_nowr_1_1_pipeline` is an English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_1_pipeline_en_5.4.0_3.0_1718060870292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_1_pipeline_en_5.4.0_3.0_1718060870292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("baai_bge_base_english_nowr_1_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("baai_bge_base_english_nowr_1_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_nowr_1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.8 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-base-en-nowr-1-1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_en.md new file mode 100644 index 00000000000000..64fad700bf1058 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English baai_bge_base_english_nowr_1_2 BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_base_english_nowr_1_2 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_nowr_1_2` is an English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_2_en_5.4.0_3.0_1718061906872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_2_en_5.4.0_3.0_1718061906872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_nowr_1_2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_nowr_1_2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_nowr_1_2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-base-en-nowr-1-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_pipeline_en.md new file mode 100644 index 00000000000000..fdf6a00bfe0c65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English baai_bge_base_english_nowr_1_2_pipeline pipeline BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_base_english_nowr_1_2_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_nowr_1_2_pipeline` is an English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_2_pipeline_en_5.4.0_3.0_1718061918748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_2_pipeline_en_5.4.0_3.0_1718061918748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("baai_bge_base_english_nowr_1_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("baai_bge_base_english_nowr_1_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_nowr_1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-base-en-nowr-1-2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_en.md new file mode 100644 index 00000000000000..356e6d5553886d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English baai_bge_base_english_v1_5_tunned_for_blender_issues BGEEmbeddings from mano-wii +author: John Snow Labs +name: baai_bge_base_english_v1_5_tunned_for_blender_issues +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_v1_5_tunned_for_blender_issues` is an English model originally trained by mano-wii.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_v1_5_tunned_for_blender_issues_en_5.4.0_3.0_1718061935130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_v1_5_tunned_for_blender_issues_en_5.4.0_3.0_1718061935130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_v1_5_tunned_for_blender_issues","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_v1_5_tunned_for_blender_issues","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_v1_5_tunned_for_blender_issues| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|391.8 MB| + +## References + +https://huggingface.co/mano-wii/BAAI_bge-base-en-v1.5-tunned-for-blender-issues \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline_en.md new file mode 100644 index 00000000000000..c52f180e4f4f9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline pipeline BGEEmbeddings from mano-wii +author: John Snow Labs +name: baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline` is a English model originally trained by mano-wii. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline_en_5.4.0_3.0_1718061967273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline_en_5.4.0_3.0_1718061967273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.8 MB| + +## References + +https://huggingface.co/mano-wii/BAAI_bge-base-en-v1.5-tunned-for-blender-issues + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_en.md new file mode 100644 index 00000000000000..87a12a6a465256 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English baai_bge_small_english_nowr_1_1 BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_small_english_nowr_1_1 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`baai_bge_small_english_nowr_1_1` is a English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_small_english_nowr_1_1_en_5.4.0_3.0_1718062330784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_small_english_nowr_1_1_en_5.4.0_3.0_1718062330784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("baai_bge_small_english_nowr_1_1","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("baai_bge_small_english_nowr_1_1","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_small_english_nowr_1_1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-small-en-nowr-1-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_pipeline_en.md new file mode 100644 index 00000000000000..fd2c0afffa822e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English baai_bge_small_english_nowr_1_1_pipeline pipeline BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_small_english_nowr_1_1_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`baai_bge_small_english_nowr_1_1_pipeline` is a English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_small_english_nowr_1_1_pipeline_en_5.4.0_3.0_1718062342271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_small_english_nowr_1_1_pipeline_en_5.4.0_3.0_1718062342271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("baai_bge_small_english_nowr_1_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("baai_bge_small_english_nowr_1_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_small_english_nowr_1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-small-en-nowr-1-1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_en.md new file mode 100644 index 00000000000000..83f24c0862a2b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_fine_tuned_reels_1_1 BGEEmbeddings from ditengm +author: John Snow Labs +name: bge_base_english_v1_5_fine_tuned_reels_1_1 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_fine_tuned_reels_1_1` is a English model originally trained by ditengm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuned_reels_1_1_en_5.4.0_3.0_1718061343041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuned_reels_1_1_en_5.4.0_3.0_1718061343041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_fine_tuned_reels_1_1","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_fine_tuned_reels_1_1","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_fine_tuned_reels_1_1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|378.9 MB| + +## References + +https://huggingface.co/ditengm/bge-base-en-v1.5-fine-tuned_reels_1.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline_en.md new file mode 100644 index 00000000000000..e1da2183d9b18d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline pipeline BGEEmbeddings from ditengm +author: John Snow Labs +name: bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline` is a English model originally trained by ditengm. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline_en_5.4.0_3.0_1718061383890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline_en_5.4.0_3.0_1718061383890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|378.9 MB| + +## References + +https://huggingface.co/ditengm/bge-base-en-v1.5-fine-tuned_reels_1.1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_en.md new file mode 100644 index 00000000000000..d84a23b6194f27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_fine_tuning BGEEmbeddings from bespin-global +author: John Snow Labs +name: bge_base_english_v1_5_fine_tuning +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_fine_tuning` is a English model originally trained by bespin-global. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuning_en_5.4.0_3.0_1718060292998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuning_en_5.4.0_3.0_1718060292998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_fine_tuning","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_fine_tuning","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_fine_tuning| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/bespin-global/bge-base-en-v1.5-fine-tuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_pipeline_en.md new file mode 100644 index 00000000000000..cda338a6b9b79d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_fine_tuning_pipeline pipeline BGEEmbeddings from bespin-global +author: John Snow Labs +name: bge_base_english_v1_5_fine_tuning_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_fine_tuning_pipeline` is a English model originally trained by bespin-global. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuning_pipeline_en_5.4.0_3.0_1718059920999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuning_pipeline_en_5.4.0_3.0_1718059920999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_fine_tuning_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_fine_tuning_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_fine_tuning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/bespin-global/bge-base-en-v1.5-fine-tuning + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_en.md new file mode 100644 index 00000000000000..3d310da8b86b58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetune10epochs BGEEmbeddings from DaisyMak +author: John Snow Labs +name: bge_base_english_v1_5_finetune10epochs +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_finetune10epochs` is a English model originally trained by DaisyMak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune10epochs_en_5.4.0_3.0_1718062128718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune10epochs_en_5.4.0_3.0_1718062128718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetune10epochs","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetune10epochs","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetune10epochs| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|398.6 MB| + +## References + +https://huggingface.co/DaisyMak/bge_base_en_v1.5_finetune10epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_pipeline_en.md new file mode 100644 index 00000000000000..094a68ba31216f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetune10epochs_pipeline pipeline BGEEmbeddings from DaisyMak +author: John Snow Labs +name: bge_base_english_v1_5_finetune10epochs_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_finetune10epochs_pipeline` is a English model originally trained by DaisyMak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune10epochs_pipeline_en_5.4.0_3.0_1718062157638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune10epochs_pipeline_en_5.4.0_3.0_1718062157638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_finetune10epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_finetune10epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetune10epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|398.6 MB| + +## References + +https://huggingface.co/DaisyMak/bge_base_en_v1.5_finetune10epochs + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_en.md new file mode 100644 index 00000000000000..b203152b83eab1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetune BGEEmbeddings from DaisyMak +author: John Snow Labs +name: bge_base_english_v1_5_finetune +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_finetune` is a English model originally trained by DaisyMak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune_en_5.4.0_3.0_1718060651651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune_en_5.4.0_3.0_1718060651651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetune","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetune","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetune| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|398.9 MB| + +## References + +https://huggingface.co/DaisyMak/bge_base_en_v1.5_finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_pipeline_en.md new file mode 100644 index 00000000000000..c54c6e2e895372 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetune_pipeline pipeline BGEEmbeddings from DaisyMak +author: John Snow Labs +name: bge_base_english_v1_5_finetune_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_finetune_pipeline` is a English model originally trained by DaisyMak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune_pipeline_en_5.4.0_3.0_1718060680773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune_pipeline_en_5.4.0_3.0_1718060680773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_finetune_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_finetune_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|398.9 MB| + +## References + +https://huggingface.co/DaisyMak/bge_base_en_v1.5_finetune + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_en.md new file mode 100644 index 00000000000000..b99239e443add0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_1 BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_1 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_ft_quora_0_1` is a English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_1_en_5.4.0_3.0_1718061548167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_1_en_5.4.0_3.0_1718061548167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_1","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_1","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|391.9 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_pipeline_en.md new file mode 100644 index 00000000000000..3c8bcaa17bf7c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_1_pipeline pipeline BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_1_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_1_pipeline` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_1_pipeline_en_5.4.0_3.0_1718061580555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_1_pipeline_en_5.4.0_3.0_1718061580555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.9 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_en.md new file mode 100644 index 00000000000000..12156067428e70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_7 BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_7 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_7` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_7_en_5.4.0_3.0_1718062068068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_7_en_5.4.0_3.0_1718062068068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_7","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_7","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_7| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|398.9 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_pipeline_en.md new file mode 100644 index 00000000000000..5e7254e4fd493d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_7_pipeline pipeline BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_7_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_7_pipeline` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_7_pipeline_en_5.4.0_3.0_1718062096929.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_7_pipeline_en_5.4.0_3.0_1718062096929.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|398.9 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.7 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_en.md new file mode 100644 index 00000000000000..6869e5f94c99ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_9 BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_9 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_9` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_9_en_5.4.0_3.0_1718060415847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_9_en_5.4.0_3.0_1718060415847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_9","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_9","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_9| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|399.5 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_pipeline_en.md new file mode 100644 index 00000000000000..6fd6f87601208a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_9_pipeline pipeline BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_9_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_9_pipeline` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_9_pipeline_en_5.4.0_3.0_1718060444264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_9_pipeline_en_5.4.0_3.0_1718060444264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|399.5 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.9 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_en.md new file mode 100644 index 00000000000000..d68499300ba3aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_2 BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_2 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_2` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_2_en_5.4.0_3.0_1718061721095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_2_en_5.4.0_3.0_1718061721095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.0 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_pipeline_en.md new file mode 100644 index 00000000000000..3858f6113d10d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_2_pipeline pipeline BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_2_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_2_pipeline` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_2_pipeline_en_5.4.0_3.0_1718061755520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_2_pipeline_en_5.4.0_3.0_1718061755520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka_2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_en.md new file mode 100644 index 00000000000000..0d572a5404d689 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_3 BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_3 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_3` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_3_en_5.4.0_3.0_1718061341209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_3_en_5.4.0_3.0_1718061341209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_3","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_3","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_pipeline_en.md new file mode 100644 index 00000000000000..4ec7488f60d454 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_3_pipeline pipeline BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_3_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_3_pipeline` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_3_pipeline_en_5.4.0_3.0_1718061378108.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_3_pipeline_en_5.4.0_3.0_1718061378108.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka_3 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_en.md new file mode 100644 index 00000000000000..808ec000635b61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_andresckamilo BGEEmbeddings from Andresckamilo +author: John Snow Labs +name: bge_base_financial_matryoshka_andresckamilo +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_andresckamilo` is an English model originally trained by Andresckamilo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_andresckamilo_en_5.4.0_3.0_1718062145384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_andresckamilo_en_5.4.0_3.0_1718062145384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_andresckamilo","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_andresckamilo","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_andresckamilo| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Andresckamilo/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_pipeline_en.md new file mode 100644 index 00000000000000..b7c266563ecf0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_andresckamilo_pipeline pipeline BGEEmbeddings from Andresckamilo +author: John Snow Labs +name: bge_base_financial_matryoshka_andresckamilo_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_andresckamilo_pipeline` is an English model originally trained by Andresckamilo. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_andresckamilo_pipeline_en_5.4.0_3.0_1718062180077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_andresckamilo_pipeline_en_5.4.0_3.0_1718062180077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_andresckamilo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_andresckamilo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_andresckamilo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Andresckamilo/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_en.md new file mode 100644 index 00000000000000..8a84e026f64db0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_gk29382231121 BGEEmbeddings from gK29382231121 +author: John Snow Labs +name: bge_base_financial_matryoshka_gk29382231121 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_gk29382231121` is an English model originally trained by gK29382231121. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_gk29382231121_en_5.4.0_3.0_1718063444199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_gk29382231121_en_5.4.0_3.0_1718063444199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_gk29382231121","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_gk29382231121","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_gk29382231121| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.2 MB| + +## References + +https://huggingface.co/gK29382231121/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_pipeline_en.md new file mode 100644 index 00000000000000..01a99bd4096715 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_gk29382231121_pipeline pipeline BGEEmbeddings from gK29382231121 +author: John Snow Labs +name: bge_base_financial_matryoshka_gk29382231121_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_gk29382231121_pipeline` is an English model originally trained by gK29382231121.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_gk29382231121_pipeline_en_5.4.0_3.0_1718063478643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_gk29382231121_pipeline_en_5.4.0_3.0_1718063478643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_gk29382231121_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_gk29382231121_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_gk29382231121_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.2 MB| + +## References + +https://huggingface.co/gK29382231121/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_en.md new file mode 100644 index 00000000000000..495793bd9b5b4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_frombge BGEEmbeddings from joshus +author: John Snow Labs +name: bge_base_frombge +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_frombge` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_frombge_en_5.4.0_3.0_1718061477607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_frombge_en_5.4.0_3.0_1718061477607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_frombge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_frombge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_frombge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|382.9 MB| + +## References + +https://huggingface.co/joshus/bge-base-frombge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_pipeline_en.md new file mode 100644 index 00000000000000..dfacf14768b528 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_frombge_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: bge_base_frombge_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_frombge_pipeline` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_frombge_pipeline_en_5.4.0_3.0_1718061525807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_frombge_pipeline_en_5.4.0_3.0_1718061525807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_frombge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_frombge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_frombge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.9 MB| + +## References + +https://huggingface.co/joshus/bge-base-frombge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_en.md new file mode 100644 index 00000000000000..2d1287e6ae3cb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v2 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v2 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v2` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v2_en_5.4.0_3.0_1718061325859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v2_en_5.4.0_3.0_1718061325859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.4 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_pipeline_en.md new file mode 100644 index 00000000000000..1567a52029b15e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v2_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v2_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v2_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v2_pipeline_en_5.4.0_3.0_1718061365578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v2_pipeline_en_5.4.0_3.0_1718061365578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.4 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_en.md new file mode 100644 index 00000000000000..8eaa661bca6c21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v4 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v4 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v4` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v4_en_5.4.0_3.0_1718063769712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v4_en_5.4.0_3.0_1718063769712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v4","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v4","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v4| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.5 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_pipeline_en.md new file mode 100644 index 00000000000000..404e6a88fb916d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v4_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v4_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v4_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v4_pipeline_en_5.4.0_3.0_1718063809314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v4_pipeline_en_5.4.0_3.0_1718063809314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.5 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v4 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_en.md new file mode 100644 index 00000000000000..194149ad581739 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v6 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v6 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v6` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v6_en_5.4.0_3.0_1718061536674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v6_en_5.4.0_3.0_1718061536674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v6","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v6","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v6| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.6 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_pipeline_en.md new file mode 100644 index 00000000000000..97706ca1155d91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v6_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v6_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v6_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v6_pipeline_en_5.4.0_3.0_1718061576401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v6_pipeline_en_5.4.0_3.0_1718061576401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.6 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v6 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_en.md new file mode 100644 index 00000000000000..014163b572c2fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_0846 BGEEmbeddings from joshus +author: John Snow Labs +name: bge_large_0846 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_0846` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_0846_en_5.4.0_3.0_1718063123051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_0846_en_5.4.0_3.0_1718063123051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_0846","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_0846","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_0846| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/bge_large_0846 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_pipeline_en.md new file mode 100644 index 00000000000000..f77370ec97bb1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_0846_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: bge_large_0846_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_0846_pipeline` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_0846_pipeline_en_5.4.0_3.0_1718063214529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_0846_pipeline_en_5.4.0_3.0_1718063214529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_0846_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_0846_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_0846_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/bge_large_0846 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_en.md new file mode 100644 index 00000000000000..9c8c85ce5c4eb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_english_v1_5_semicon_ym_0122 BGEEmbeddings from Niraya666 +author: John Snow Labs +name: bge_large_english_v1_5_semicon_ym_0122 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_semicon_ym_0122` is an English model originally trained by Niraya666. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_semicon_ym_0122_en_5.4.0_3.0_1718063790188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_semicon_ym_0122_en_5.4.0_3.0_1718063790188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_semicon_ym_0122","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_semicon_ym_0122","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_semicon_ym_0122| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Niraya666/bge-large-en-v1.5-semicon-ym-0122 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_pipeline_en.md new file mode 100644 index 00000000000000..da0ffa58114412 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_english_v1_5_semicon_ym_0122_pipeline pipeline BGEEmbeddings from Niraya666 +author: John Snow Labs +name: bge_large_english_v1_5_semicon_ym_0122_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_semicon_ym_0122_pipeline` is an English model originally trained by Niraya666. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_semicon_ym_0122_pipeline_en_5.4.0_3.0_1718063908042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_semicon_ym_0122_pipeline_en_5.4.0_3.0_1718063908042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_english_v1_5_semicon_ym_0122_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_english_v1_5_semicon_ym_0122_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_semicon_ym_0122_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Niraya666/bge-large-en-v1.5-semicon-ym-0122 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_en.md new file mode 100644 index 00000000000000..4c5f62f201cb3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_fine_tuned BGEEmbeddings from kwang123 +author: John Snow Labs +name: bge_large_fine_tuned +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_fine_tuned` is an English model originally trained by kwang123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_en_5.4.0_3.0_1718060992320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_en_5.4.0_3.0_1718060992320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_fine_tuned","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_fine_tuned","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_fine_tuned| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/kwang123/bge-large-fine-tuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_pipeline_en.md new file mode 100644 index 00000000000000..dfb7734b8ace56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_fine_tuned_pipeline pipeline BGEEmbeddings from kwang123 +author: John Snow Labs +name: bge_large_fine_tuned_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_fine_tuned_pipeline` is an English model originally trained by kwang123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_pipeline_en_5.4.0_3.0_1718061087284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_pipeline_en_5.4.0_3.0_1718061087284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_fine_tuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_fine_tuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_fine_tuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/kwang123/bge-large-fine-tuned + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_micro_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_en.md new file mode 100644 index 00000000000000..1bd652e80cabd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_micro BGEEmbeddings from TaylorAI +author: John Snow Labs +name: bge_micro +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro` is an English model originally trained by TaylorAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_en_5.4.0_3.0_1718060403391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_en_5.4.0_3.0_1718060403391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_micro","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_micro","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/TaylorAI/bge-micro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_micro_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_pipeline_en.md new file mode 100644 index 00000000000000..14c90bfbcccb21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_micro_pipeline pipeline BGEEmbeddings from TaylorAI +author: John Snow Labs +name: bge_micro_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro_pipeline` is an English model originally trained by TaylorAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_pipeline_en_5.4.0_3.0_1718060418880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_pipeline_en_5.4.0_3.0_1718060418880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_micro_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_micro_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/TaylorAI/bge-micro + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_en.md new file mode 100644 index 00000000000000..7df6863e4b8462 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_micro_v2_smartcomponents BGEEmbeddings from SmartComponents +author: John Snow Labs +name: bge_micro_v2_smartcomponents +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro_v2_smartcomponents` is an English model originally trained by SmartComponents. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_v2_smartcomponents_en_5.4.0_3.0_1718062026135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_v2_smartcomponents_en_5.4.0_3.0_1718062026135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_micro_v2_smartcomponents","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_micro_v2_smartcomponents","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro_v2_smartcomponents| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/SmartComponents/bge-micro-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_pipeline_en.md new file mode 100644 index 00000000000000..ec3ea424eca66e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_micro_v2_smartcomponents_pipeline pipeline BGEEmbeddings from SmartComponents +author: John Snow Labs +name: bge_micro_v2_smartcomponents_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro_v2_smartcomponents_pipeline` is an English model originally trained by SmartComponents. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_v2_smartcomponents_pipeline_en_5.4.0_3.0_1718062041678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_v2_smartcomponents_pipeline_en_5.4.0_3.0_1718062041678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_micro_v2_smartcomponents_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_micro_v2_smartcomponents_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro_v2_smartcomponents_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/SmartComponents/bge-micro-v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_en.md new file mode 100644 index 00000000000000..bd1c9cd5a1b5c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_english BGEEmbeddings from vectoriseai +author: John Snow Labs +name: bge_small_english +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english` is an English model originally trained by vectoriseai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_en_5.4.0_3.0_1718060625255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_en_5.4.0_3.0_1718060625255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_english","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_english","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|79.9 MB| + +## References + +https://huggingface.co/vectoriseai/bge-small-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_en.md new file mode 100644 index 00000000000000..21884d2a12f108 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_english_ft BGEEmbeddings from PetroGPT +author: John Snow Labs +name: bge_small_english_ft +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_ft` is an English model originally trained by PetroGPT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_ft_en_5.4.0_3.0_1718060820706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_ft_en_5.4.0_3.0_1718060820706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_english_ft","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_english_ft","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_ft| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.9 MB| + +## References + +https://huggingface.co/PetroGPT/bge-small-en-ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_pipeline_en.md new file mode 100644 index 00000000000000..4b54dbe9de1555 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_english_ft_pipeline pipeline BGEEmbeddings from PetroGPT +author: John Snow Labs +name: bge_small_english_ft_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_ft_pipeline` is an English model originally trained by PetroGPT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_ft_pipeline_en_5.4.0_3.0_1718060832479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_ft_pipeline_en_5.4.0_3.0_1718060832479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_english_ft_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_english_ft_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.9 MB| + +## References + +https://huggingface.co/PetroGPT/bge-small-en-ft + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_pipeline_en.md new file mode 100644 index 00000000000000..bc5fa48e8bf6e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_english_pipeline pipeline BGEEmbeddings from vectoriseai +author: John Snow Labs +name: bge_small_english_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_pipeline` is an English model originally trained by vectoriseai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_pipeline_en_5.4.0_3.0_1718060654155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_pipeline_en_5.4.0_3.0_1718060654155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|79.9 MB| + +## References + +https://huggingface.co/vectoriseai/bge-small-en + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_en.md new file mode 100644 index 00000000000000..a2b9f5eaf281ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_qq_qa BGEEmbeddings from svjack +author: John Snow Labs +name: bge_small_qq_qa +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_qq_qa` is an English model originally trained by svjack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_qq_qa_en_5.4.0_3.0_1718060259331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_qq_qa_en_5.4.0_3.0_1718060259331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_qq_qa","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_qq_qa","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_qq_qa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|78.1 MB| + +## References + +https://huggingface.co/svjack/bge-small-qq-qa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_pipeline_en.md new file mode 100644 index 00000000000000..6e1077ab573770 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_qq_qa_pipeline pipeline BGEEmbeddings from svjack +author: John Snow Labs +name: bge_small_qq_qa_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_qq_qa_pipeline` is an English model originally trained by svjack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_qq_qa_pipeline_en_5.4.0_3.0_1718060268489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_qq_qa_pipeline_en_5.4.0_3.0_1718060268489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_qq_qa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_qq_qa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_qq_qa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|78.1 MB| + +## References + +https://huggingface.co/svjack/bge-small-qq-qa + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_en.md new file mode 100644 index 00000000000000..9da60cbd6bbd87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_waray_philippines BGEEmbeddings from YoungPanda +author: John Snow Labs +name: bge_waray_philippines +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_waray_philippines` is an English model originally trained by YoungPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_waray_philippines_en_5.4.0_3.0_1718063799676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_waray_philippines_en_5.4.0_3.0_1718063799676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_waray_philippines","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_waray_philippines","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_waray_philippines| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/YoungPanda/bge_war \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_pipeline_en.md new file mode 100644 index 00000000000000..6f2375cfc258f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_waray_philippines_pipeline pipeline BGEEmbeddings from YoungPanda +author: John Snow Labs +name: bge_waray_philippines_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_waray_philippines_pipeline` is an English model originally trained by YoungPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_waray_philippines_pipeline_en_5.4.0_3.0_1718063901462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_waray_philippines_pipeline_en_5.4.0_3.0_1718063901462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_waray_philippines_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_waray_philippines_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_waray_philippines_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/YoungPanda/bge_war + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-embed_bge_base_edu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-embed_bge_base_edu_pipeline_en.md new file mode 100644 index 00000000000000..73f17eb4de3755 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-embed_bge_base_edu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English embed_bge_base_edu_pipeline pipeline BGEEmbeddings from HelixAI +author: John Snow Labs +name: embed_bge_base_edu_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `embed_bge_base_edu_pipeline` is an English model originally trained by HelixAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_pipeline_en_5.4.0_3.0_1718059904353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_pipeline_en_5.4.0_3.0_1718059904353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("embed_bge_base_edu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("embed_bge_base_edu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|embed_bge_base_edu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|384.4 MB| + +## References + +https://huggingface.co/HelixAI/embed_bge_base_edu + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_en.md b/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_en.md new file mode 100644 index 00000000000000..26bb67d7f4db90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English finetuned_bge_embeddings BGEEmbeddings from austinpatrickm +author: John Snow Labs +name: finetuned_bge_embeddings +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetuned_bge_embeddings` is an English model originally trained by austinpatrickm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_en_5.4.0_3.0_1718061952445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_en_5.4.0_3.0_1718061952445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("finetuned_bge_embeddings","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("finetuned_bge_embeddings","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bge_embeddings| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|388.4 MB| + +## References + +https://huggingface.co/austinpatrickm/finetuned-bge-embeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_pipeline_en.md new file mode 100644 index 00000000000000..a8aa8f7c5e7cc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetuned_bge_embeddings_pipeline pipeline BGEEmbeddings from austinpatrickm +author: John Snow Labs +name: finetuned_bge_embeddings_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetuned_bge_embeddings_pipeline` is an English model originally trained by austinpatrickm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_pipeline_en_5.4.0_3.0_1718061985354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_pipeline_en_5.4.0_3.0_1718061985354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_bge_embeddings_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_bge_embeddings_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bge_embeddings_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|388.4 MB| + +## References + +https://huggingface.co/austinpatrickm/finetuned-bge-embeddings + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-minebge_en.md b/docs/_posts/ahmedlone127/2024-06-10-minebge_en.md new file mode 100644 index 00000000000000..1889f40d32b123 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-minebge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English minebge BGEEmbeddings from arjunsama +author: John Snow Labs +name: minebge +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `minebge` is an English model originally trained by arjunsama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minebge_en_5.4.0_3.0_1718062753734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minebge_en_5.4.0_3.0_1718062753734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("minebge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("minebge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minebge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|394.2 MB| + +## References + +https://huggingface.co/arjunsama/minebge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-minebge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-minebge_pipeline_en.md new file mode 100644 index 00000000000000..05d7237d66ae8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-minebge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English minebge_pipeline pipeline BGEEmbeddings from arjunsama +author: John Snow Labs +name: minebge_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `minebge_pipeline` is an English model originally trained by arjunsama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minebge_pipeline_en_5.4.0_3.0_1718062786389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minebge_pipeline_en_5.4.0_3.0_1718062786389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("minebge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("minebge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minebge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.2 MB| + +## References + +https://huggingface.co/arjunsama/minebge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_en.md b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_en.md new file mode 100644 index 00000000000000..54f63dd800ed31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_large_english_v1_5v2 BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_large_english_v1_5v2 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_large_english_v1_5v2` is an English model originally trained by vincentpremise. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5v2_en_5.4.0_3.0_1718062744479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5v2_en_5.4.0_3.0_1718062744479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_large_english_v1_5v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_large_english_v1_5v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_large_english_v1_5v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_large_en_v1.5v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline_en.md new file mode 100644 index 00000000000000..518df6b538396a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline pipeline BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline` is an English model originally trained by vincentpremise.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline_en_5.4.0_3.0_1718062838559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline_en_5.4.0_3.0_1718062838559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_large_en_v1.5v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_en.md b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_en.md new file mode 100644 index 00000000000000..03bb3cbf245d95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_small_english_v1_5 BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_small_english_v1_5 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_small_english_v1_5` is an English model originally trained by vincentpremise.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5_en_5.4.0_3.0_1718063691061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5_en_5.4.0_3.0_1718063691061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_small_english_v1_5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_small_english_v1_5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_small_english_v1_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|110.7 MB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_small_en_v1.5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_pipeline_en.md new file mode 100644 index 00000000000000..866a603eae26ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_small_english_v1_5_pipeline pipeline BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_small_english_v1_5_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_small_english_v1_5_pipeline` is an English model originally trained by vincentpremise.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5_pipeline_en_5.4.0_3.0_1718063703355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5_pipeline_en_5.4.0_3.0_1718063703355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mps_invoice_product_baai_bge_small_english_v1_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mps_invoice_product_baai_bge_small_english_v1_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_small_english_v1_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|110.7 MB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_small_en_v1.5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_en.md b/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_en.md new file mode 100644 index 00000000000000..d85bb9bb2b0e97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English oosinc_bge_finetune BGEEmbeddings from oosinc +author: John Snow Labs +name: oosinc_bge_finetune +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `oosinc_bge_finetune` is an English model originally trained by oosinc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/oosinc_bge_finetune_en_5.4.0_3.0_1718060779820.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/oosinc_bge_finetune_en_5.4.0_3.0_1718060779820.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("oosinc_bge_finetune","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("oosinc_bge_finetune","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|oosinc_bge_finetune| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|119.3 MB| + +## References + +https://huggingface.co/oosinc/oosinc-bge-finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_pipeline_en.md new file mode 100644 index 00000000000000..9282f10d290d50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English oosinc_bge_finetune_pipeline pipeline BGEEmbeddings from oosinc +author: John Snow Labs +name: oosinc_bge_finetune_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `oosinc_bge_finetune_pipeline` is an English model originally trained by oosinc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/oosinc_bge_finetune_pipeline_en_5.4.0_3.0_1718060789282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/oosinc_bge_finetune_pipeline_en_5.4.0_3.0_1718060789282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("oosinc_bge_finetune_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("oosinc_bge_finetune_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|oosinc_bge_finetune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|119.3 MB| + +## References + +https://huggingface.co/oosinc/oosinc-bge-finetune + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_en.md new file mode 100644 index 00000000000000..198a635e2e6401 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_2e BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2e +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2e` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2e_en_5.4.0_3.0_1718063787326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2e_en_5.4.0_3.0_1718063787326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_2e","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_2e","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2e| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_pipeline_en.md new file mode 100644 index 00000000000000..8de834ed1e4ef9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_2e_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2e_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2e_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2e_pipeline_en_5.4.0_3.0_1718063878798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2e_pipeline_en_5.4.0_3.0_1718063878798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_2e_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_2e_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2e_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2e + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_en.md new file mode 100644 index 00000000000000..183fa4335e7c17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_2e_t BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2e_t +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2e_t` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2e_t_en_5.4.0_3.0_1718062730968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2e_t_en_5.4.0_3.0_1718062730968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_2e_t","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_2e_t","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2e_t| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2e-t \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_pipeline_en.md new file mode 100644 index 00000000000000..db0c00489c6d24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_2e_t_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2e_t_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2e_t_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2e_t_pipeline_en_5.4.0_3.0_1718062808030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2e_t_pipeline_en_5.4.0_3.0_1718062808030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_2e_t_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_2e_t_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2e_t_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2e-t + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_en.md new file mode 100644 index 00000000000000..3bd9e61331eb52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_6e BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_en_5.4.0_3.0_1718061469404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_en_5.4.0_3.0_1718061469404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_6e","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_6e","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_pipeline_en.md new file mode 100644 index 00000000000000..8a6e813e61a007 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_6e_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_pipeline_en_5.4.0_3.0_1718061548633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_pipeline_en_5.4.0_3.0_1718061548633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_6e_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_6e_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-squirtle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-squirtle_pipeline_en.md new file mode 100644 index 00000000000000..53564e60dfa83f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-squirtle_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English squirtle_pipeline pipeline BGEEmbeddings from Mihaiii +author: John Snow Labs +name: squirtle_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `squirtle_pipeline` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squirtle_pipeline_en_5.4.0_3.0_1718059716161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squirtle_pipeline_en_5.4.0_3.0_1718059716161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("squirtle_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("squirtle_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squirtle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|56.9 MB| + +## References + +https://huggingface.co/Mihaiii/Squirtle + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-test24_en.md b/docs/_posts/ahmedlone127/2024-06-10-test24_en.md new file mode 100644 index 00000000000000..588425b9513346 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-test24_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English test24 BGEEmbeddings from Mihaiii +author: John Snow Labs +name: test24 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test24` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test24_en_5.4.0_3.0_1718061894558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test24_en_5.4.0_3.0_1718061894558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("test24","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("test24","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test24| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|64.3 MB| + +## References + +https://huggingface.co/Mihaiii/test24 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-test24_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-test24_pipeline_en.md new file mode 100644 index 00000000000000..bdccfb667078aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-test24_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test24_pipeline pipeline BGEEmbeddings from Mihaiii +author: John Snow Labs +name: test24_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test24_pipeline` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test24_pipeline_en_5.4.0_3.0_1718061898769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test24_pipeline_en_5.4.0_3.0_1718061898769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test24_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test24_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test24_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|64.3 MB| + +## References + +https://huggingface.co/Mihaiii/test24 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-test25_en.md b/docs/_posts/ahmedlone127/2024-06-10-test25_en.md new file mode 100644 index 00000000000000..8faf0422fa6486 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-test25_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English test25 BGEEmbeddings from Mihaiii +author: John Snow Labs +name: test25 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test25` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test25_en_5.4.0_3.0_1718059799876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test25_en_5.4.0_3.0_1718059799876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("test25","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("test25","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test25| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|64.2 MB| + +## References + +https://huggingface.co/Mihaiii/test25 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-test25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-test25_pipeline_en.md new file mode 100644 index 00000000000000..a05397f7f7b6d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-test25_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test25_pipeline pipeline BGEEmbeddings from Mihaiii +author: John Snow Labs +name: test25_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test25_pipeline` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test25_pipeline_en_5.4.0_3.0_1718059803927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test25_pipeline_en_5.4.0_3.0_1718059803927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test25_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test25_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|64.2 MB| + +## References + +https://huggingface.co/Mihaiii/test25 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-testbge_en.md b/docs/_posts/ahmedlone127/2024-06-10-testbge_en.md new file mode 100644 index 00000000000000..244bf991315253 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-testbge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English testbge BGEEmbeddings from Neokun004 +author: John Snow Labs +name: testbge +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `testbge` is an English model originally trained by Neokun004. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testbge_en_5.4.0_3.0_1718059726765.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testbge_en_5.4.0_3.0_1718059726765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("testbge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("testbge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testbge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.3 MB| + +## References + +https://huggingface.co/Neokun004/Testbge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-wartortle_en.md b/docs/_posts/ahmedlone127/2024-06-10-wartortle_en.md new file mode 100644 index 00000000000000..2c6c36de036979 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-wartortle_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English wartortle BGEEmbeddings from Mihaiii +author: John Snow Labs +name: wartortle +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `wartortle` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wartortle_en_5.4.0_3.0_1718060253302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wartortle_en_5.4.0_3.0_1718060253302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("wartortle","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("wartortle","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wartortle| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|63.5 MB| + +## References + +https://huggingface.co/Mihaiii/Wartortle \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-wartortle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-wartortle_pipeline_en.md new file mode 100644 index 00000000000000..fb1d2e65a49db7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-wartortle_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English wartortle_pipeline pipeline BGEEmbeddings from Mihaiii +author: John Snow Labs +name: wartortle_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `wartortle_pipeline` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wartortle_pipeline_en_5.4.0_3.0_1718060257643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wartortle_pipeline_en_5.4.0_3.0_1718060257643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("wartortle_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("wartortle_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wartortle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|63.5 MB| + +## References + +https://huggingface.co/Mihaiii/Wartortle + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_en.md new file mode 100644 index 00000000000000..150227d1823670 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afriberta_base_finetuned_hausa_2e_3 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_base_finetuned_hausa_2e_3 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_base_finetuned_hausa_2e_3` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_hausa_2e_3_en_5.4.0_3.0_1718133900684.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_hausa_2e_3_en_5.4.0_3.0_1718133900684.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_finetuned_hausa_2e_3","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_finetuned_hausa_2e_3", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_base_finetuned_hausa_2e_3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|415.3 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-base-finetuned-hausa-2e-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_pipeline_en.md new file mode 100644 index 00000000000000..3c3ffca869378f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afriberta_base_finetuned_hausa_2e_3_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_base_finetuned_hausa_2e_3_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_base_finetuned_hausa_2e_3_pipeline` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_hausa_2e_3_pipeline_en_5.4.0_3.0_1718133928444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_hausa_2e_3_pipeline_en_5.4.0_3.0_1718133928444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afriberta_base_finetuned_hausa_2e_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afriberta_base_finetuned_hausa_2e_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_base_finetuned_hausa_2e_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.4 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-base-finetuned-hausa-2e-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_en.md new file mode 100644 index 00000000000000..2b09386204c934 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afriberta_base_hausa_5e_5 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_base_hausa_5e_5 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_base_hausa_5e_5` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_hausa_5e_5_en_5.4.0_3.0_1718131983889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_hausa_5e_5_en_5.4.0_3.0_1718131983889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_hausa_5e_5","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_hausa_5e_5", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_base_hausa_5e_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|415.2 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-base-hausa-5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..e9fc651caac2c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afriberta_base_hausa_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_base_hausa_5e_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_base_hausa_5e_5_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_hausa_5e_5_pipeline_en_5.4.0_3.0_1718132010249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_hausa_5e_5_pipeline_en_5.4.0_3.0_1718132010249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afriberta_base_hausa_5e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afriberta_base_hausa_5e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_base_hausa_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.3 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-base-hausa-5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_en.md new file mode 100644 index 00000000000000..6014a859cb5e7d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afriberta_large_finetuned_hausa_2e_4 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_large_finetuned_hausa_2e_4 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_large_finetuned_hausa_2e_4` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_large_finetuned_hausa_2e_4_en_5.4.0_3.0_1718130334818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_large_finetuned_hausa_2e_4_en_5.4.0_3.0_1718130334818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_large_finetuned_hausa_2e_4","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_large_finetuned_hausa_2e_4", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_large_finetuned_hausa_2e_4| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-large-finetuned-hausa-2e-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_pipeline_en.md new file mode 100644 index 00000000000000..ae5541862f4711 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afriberta_large_finetuned_hausa_2e_4_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_large_finetuned_hausa_2e_4_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_large_finetuned_hausa_2e_4_pipeline` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_large_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718130365197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_large_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718130365197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afriberta_large_finetuned_hausa_2e_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afriberta_large_finetuned_hausa_2e_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_large_finetuned_hausa_2e_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-large-finetuned-hausa-2e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_en.md new file mode 100644 index 00000000000000..12a93c143e95de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afriberta_large_hausa_5e_5 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_large_hausa_5e_5 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_large_hausa_5e_5` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_large_hausa_5e_5_en_5.4.0_3.0_1718130022818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_large_hausa_5e_5_en_5.4.0_3.0_1718130022818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_large_hausa_5e_5","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_large_hausa_5e_5", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_large_hausa_5e_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-large-hausa-5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..9ae7f4eb760aa8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afriberta_large_hausa_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_large_hausa_5e_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_large_hausa_5e_5_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_large_hausa_5e_5_pipeline_en_5.4.0_3.0_1718130052562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_large_hausa_5e_5_pipeline_en_5.4.0_3.0_1718130052562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afriberta_large_hausa_5e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afriberta_large_hausa_5e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_large_hausa_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-large-hausa-5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_en.md new file mode 100644 index 00000000000000..fc381e68be2362 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afro_xlmr_base_hausa_5e_5 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_base_hausa_5e_5 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_base_hausa_5e_5` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_5e_5_en_5.4.0_3.0_1718137940018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_5e_5_en_5.4.0_3.0_1718137940018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_base_hausa_5e_5","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_base_hausa_5e_5", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_hausa_5e_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-base-hausa-5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..3ffff0e2000e8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afro_xlmr_base_hausa_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_base_hausa_5e_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_base_hausa_5e_5_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_5e_5_pipeline_en_5.4.0_3.0_1718138005485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_5e_5_pipeline_en_5.4.0_3.0_1718138005485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afro_xlmr_base_hausa_5e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afro_xlmr_base_hausa_5e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_hausa_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-base-hausa-5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_en.md new file mode 100644 index 00000000000000..c6dae8975eeff0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afro_xlmr_mini_finetuned_hausa_2e_4 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_mini_finetuned_hausa_2e_4 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_mini_finetuned_hausa_2e_4` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_hausa_2e_4_en_5.4.0_3.0_1718138748756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_hausa_2e_4_en_5.4.0_3.0_1718138748756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_mini_finetuned_hausa_2e_4","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_mini_finetuned_hausa_2e_4", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_mini_finetuned_hausa_2e_4| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|443.1 MB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-mini-finetuned-hausa-2e-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_pipeline_en.md new file mode 100644 index 00000000000000..b1090fd40fe20c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afro_xlmr_mini_finetuned_hausa_2e_4_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_mini_finetuned_hausa_2e_4_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_mini_finetuned_hausa_2e_4_pipeline` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718138776673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718138776673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afro_xlmr_mini_finetuned_hausa_2e_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afro_xlmr_mini_finetuned_hausa_2e_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_mini_finetuned_hausa_2e_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|443.2 MB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-mini-finetuned-hausa-2e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_en.md new file mode 100644 index 00000000000000..ba163430a87271 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afro_xlmr_mini_finetuned_igbo XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_mini_finetuned_igbo +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_mini_finetuned_igbo` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_igbo_en_5.4.0_3.0_1718139553148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_igbo_en_5.4.0_3.0_1718139553148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_mini_finetuned_igbo","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_mini_finetuned_igbo", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_mini_finetuned_igbo| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|443.1 MB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-mini-finetuned-igbo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_pipeline_en.md new file mode 100644 index 00000000000000..73d7e223e5517d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afro_xlmr_mini_finetuned_igbo_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_mini_finetuned_igbo_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_mini_finetuned_igbo_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_igbo_pipeline_en_5.4.0_3.0_1718139581600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_igbo_pipeline_en_5.4.0_3.0_1718139581600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afro_xlmr_mini_finetuned_igbo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afro_xlmr_mini_finetuned_igbo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_mini_finetuned_igbo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|443.1 MB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-mini-finetuned-igbo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_en.md new file mode 100644 index 00000000000000..3025f5630262a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English aligned_source_5e_5 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: aligned_source_5e_5 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aligned_source_5e_5` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aligned_source_5e_5_en_5.4.0_3.0_1718138757574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aligned_source_5e_5_en_5.4.0_3.0_1718138757574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("aligned_source_5e_5","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("aligned_source_5e_5", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aligned_source_5e_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/aligned_source_5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..76ad28119a6e42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aligned_source_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: aligned_source_5e_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aligned_source_5e_5_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aligned_source_5e_5_pipeline_en_5.4.0_3.0_1718138840466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aligned_source_5e_5_pipeline_en_5.4.0_3.0_1718138840466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("aligned_source_5e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("aligned_source_5e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aligned_source_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/aligned_source_5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_en.md new file mode 100644 index 00000000000000..187b8b394ca86d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_diacritics_shuffle_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_diacritics_shuffle_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_diacritics_shuffle_eval` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_diacritics_shuffle_eval_en_5.4.0_3.0_1718139666989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_diacritics_shuffle_eval_en_5.4.0_3.0_1718139666989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_diacritics_shuffle_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_diacritics_shuffle_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_diacritics_shuffle_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_diacritics_shuffle_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_pipeline_en.md new file mode 100644 index 00000000000000..d349d9b7db2943 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_diacritics_shuffle_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_diacritics_shuffle_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_diacritics_shuffle_eval_pipeline` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_diacritics_shuffle_eval_pipeline_en_5.4.0_3.0_1718139732623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_diacritics_shuffle_eval_pipeline_en_5.4.0_3.0_1718139732623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_diacritics_shuffle_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_diacritics_shuffle_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_diacritics_shuffle_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_diacritics_shuffle_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_en.md new file mode 100644 index 00000000000000..87d99314e563e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_punc_untranslated_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_punc_untranslated_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_punc_untranslated_eval` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_punc_untranslated_eval_en_5.4.0_3.0_1718130085204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_punc_untranslated_eval_en_5.4.0_3.0_1718130085204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punc_untranslated_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punc_untranslated_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_punc_untranslated_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_punc_untranslated_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_pipeline_en.md new file mode 100644 index 00000000000000..f814b01a7bceba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_punc_untranslated_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_punc_untranslated_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_punc_untranslated_eval_pipeline` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_punc_untranslated_eval_pipeline_en_5.4.0_3.0_1718130151231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_punc_untranslated_eval_pipeline_en_5.4.0_3.0_1718130151231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_punc_untranslated_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_punc_untranslated_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_punc_untranslated_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_punc_untranslated_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_en.md new file mode 100644 index 00000000000000..7359539fcd1706 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_punctuation_test XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_punctuation_test +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_punctuation_test` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_punctuation_test_en_5.4.0_3.0_1718139132372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_punctuation_test_en_5.4.0_3.0_1718139132372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punctuation_test","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punctuation_test", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_punctuation_test| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_punctuation_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_pipeline_en.md new file mode 100644 index 00000000000000..f4911c1dfe47d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_punctuation_test_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_punctuation_test_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_punctuation_test_pipeline` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_punctuation_test_pipeline_en_5.4.0_3.0_1718139197594.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_punctuation_test_pipeline_en_5.4.0_3.0_1718139197594.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_punctuation_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_punctuation_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_punctuation_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_punctuation_test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_en.md new file mode 100644 index 00000000000000..83ba08542d2bbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_shuffle_diacritics_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_shuffle_diacritics_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_shuffle_diacritics_eval` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_shuffle_diacritics_eval_en_5.4.0_3.0_1718138759438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_shuffle_diacritics_eval_en_5.4.0_3.0_1718138759438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_shuffle_diacritics_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_shuffle_diacritics_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_shuffle_diacritics_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_shuffle_diacritics_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_pipeline_en.md new file mode 100644 index 00000000000000..3b87176ad0b099 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_shuffle_diacritics_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_shuffle_diacritics_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_shuffle_diacritics_eval_pipeline` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_shuffle_diacritics_eval_pipeline_en_5.4.0_3.0_1718138838117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_shuffle_diacritics_eval_pipeline_en_5.4.0_3.0_1718138838117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_shuffle_diacritics_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_shuffle_diacritics_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_shuffle_diacritics_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_shuffle_diacritics_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_en.md new file mode 100644 index 00000000000000..099bbaae33bbc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_shuffle_punc_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_shuffle_punc_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_shuffle_punc_eval` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_shuffle_punc_eval_en_5.4.0_3.0_1718138230713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_shuffle_punc_eval_en_5.4.0_3.0_1718138230713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_shuffle_punc_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_shuffle_punc_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_shuffle_punc_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_shuffle_punc_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_pipeline_en.md new file mode 100644 index 00000000000000..94e322fef6cac9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_shuffle_punc_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_shuffle_punc_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_shuffle_punc_eval_pipeline` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_shuffle_punc_eval_pipeline_en_5.4.0_3.0_1718138295997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_shuffle_punc_eval_pipeline_en_5.4.0_3.0_1718138295997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_shuffle_punc_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_shuffle_punc_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_shuffle_punc_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_shuffle_punc_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_en.md new file mode 100644 index 00000000000000..e53e701628e301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_untranslated_entities_regular_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_entities_regular_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_untranslated_entities_regular_eval` is an English model originally trained by azhang1212. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_entities_regular_eval_en_5.4.0_3.0_1718139714365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_entities_regular_eval_en_5.4.0_3.0_1718139714365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_entities_regular_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_entities_regular_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_entities_regular_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_entities_regular_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_pipeline_en.md new file mode 100644 index 00000000000000..5e2fc7cc286dad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_untranslated_entities_regular_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_entities_regular_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_untranslated_entities_regular_eval_pipeline` is an English model originally trained by azhang1212. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_entities_regular_eval_pipeline_en_5.4.0_3.0_1718139780015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_entities_regular_eval_pipeline_en_5.4.0_3.0_1718139780015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_untranslated_entities_regular_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_untranslated_entities_regular_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_entities_regular_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_entities_regular_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_en.md new file mode 100644 index 00000000000000..b573ffa3ca9ecd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_untranslated_shuffle_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_shuffle_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_untranslated_shuffle_eval` is an English model originally trained by azhang1212. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_shuffle_eval_en_5.4.0_3.0_1718137796639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_shuffle_eval_en_5.4.0_3.0_1718137796639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_shuffle_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_shuffle_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_shuffle_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_shuffle_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_pipeline_en.md new file mode 100644 index 00000000000000..6f89dfebd5b2fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_untranslated_shuffle_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_shuffle_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_untranslated_shuffle_eval_pipeline` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_shuffle_eval_pipeline_en_5.4.0_3.0_1718137873209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_shuffle_eval_pipeline_en_5.4.0_3.0_1718137873209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_untranslated_shuffle_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_untranslated_shuffle_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_shuffle_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_shuffle_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_ar.md b/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_ar.md new file mode 100644 index 00000000000000..593170b6f94e0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic arabnizer_xlmr_panx_arabic XlmRoBertaForTokenClassification from mohammedaly22 +author: John Snow Labs +name: arabnizer_xlmr_panx_arabic +date: 2024-06-11 +tags: [ar, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `arabnizer_xlmr_panx_arabic` is an Arabic model originally trained by mohammedaly22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabnizer_xlmr_panx_arabic_ar_5.4.0_3.0_1718131128743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabnizer_xlmr_panx_arabic_ar_5.4.0_3.0_1718131128743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("arabnizer_xlmr_panx_arabic","ar") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("arabnizer_xlmr_panx_arabic", "ar") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabnizer_xlmr_panx_arabic| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|831.3 MB| + +## References + +https://huggingface.co/mohammedaly22/arabnizer-xlmr-panx-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_pipeline_ar.md new file mode 100644 index 00000000000000..fbcb17ed1594e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic arabnizer_xlmr_panx_arabic_pipeline pipeline XlmRoBertaForTokenClassification from mohammedaly22 +author: John Snow Labs +name: arabnizer_xlmr_panx_arabic_pipeline +date: 2024-06-11 +tags: [ar, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `arabnizer_xlmr_panx_arabic_pipeline` is an Arabic model originally trained by mohammedaly22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabnizer_xlmr_panx_arabic_pipeline_ar_5.4.0_3.0_1718131237090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabnizer_xlmr_panx_arabic_pipeline_ar_5.4.0_3.0_1718131237090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("arabnizer_xlmr_panx_arabic_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("arabnizer_xlmr_panx_arabic_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabnizer_xlmr_panx_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|831.3 MB| + +## References + +https://huggingface.co/mohammedaly22/arabnizer-xlmr-panx-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_en.md b/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_en.md new file mode 100644 index 00000000000000..0b92d00a07d491 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English baai__bge_small_english_v1_5__mozart_fine_tuned_10 BGEEmbeddings from mozart-ai +author: John Snow Labs +name: baai__bge_small_english_v1_5__mozart_fine_tuned_10 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai__bge_small_english_v1_5__mozart_fine_tuned_10` is an English model originally trained by mozart-ai. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai__bge_small_english_v1_5__mozart_fine_tuned_10_en_5.4.0_3.0_1718069220842.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai__bge_small_english_v1_5__mozart_fine_tuned_10_en_5.4.0_3.0_1718069220842.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("baai__bge_small_english_v1_5__mozart_fine_tuned_10","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("baai__bge_small_english_v1_5__mozart_fine_tuned_10","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai__bge_small_english_v1_5__mozart_fine_tuned_10| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|115.3 MB| + +## References + +https://huggingface.co/mozart-ai/BAAI__bge-small-en-v1.5__Mozart_Fine_Tuned-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline_en.md new file mode 100644 index 00000000000000..e3b2ebd3d546c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline pipeline BGEEmbeddings from mozart-ai +author: John Snow Labs +name: baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline` is an English model originally trained by mozart-ai. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline_en_5.4.0_3.0_1718069231479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline_en_5.4.0_3.0_1718069231479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|115.3 MB| + +## References + +https://huggingface.co/mozart-ai/BAAI__bge-small-en-v1.5__Mozart_Fine_Tuned-10 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_en.md b/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_en.md new file mode 100644 index 00000000000000..406a30aada41cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English base_finetuned_frombge BGEEmbeddings from joshus +author: John Snow Labs +name: base_finetuned_frombge +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `base_finetuned_frombge` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_finetuned_frombge_en_5.4.0_3.0_1718065219693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_finetuned_frombge_en_5.4.0_3.0_1718065219693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("base_finetuned_frombge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("base_finetuned_frombge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_finetuned_frombge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|382.6 MB| + +## References + +https://huggingface.co/joshus/base-finetuned-frombge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_pipeline_en.md new file mode 100644 index 00000000000000..e40b2e28c9d383 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_finetuned_frombge_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: base_finetuned_frombge_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `base_finetuned_frombge_pipeline` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_finetuned_frombge_pipeline_en_5.4.0_3.0_1718065257507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_finetuned_frombge_pipeline_en_5.4.0_3.0_1718065257507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_finetuned_frombge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_finetuned_frombge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_finetuned_frombge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.6 MB| + +## References + +https://huggingface.co/joshus/base-finetuned-frombge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_en.md new file mode 100644 index 00000000000000..0783809e83c42b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_0803 BGEEmbeddings from joshus +author: John Snow Labs +name: bge_base_0803 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_0803` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_0803_en_5.4.0_3.0_1718065572021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_0803_en_5.4.0_3.0_1718065572021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_0803","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_0803","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_0803| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|382.6 MB| + +## References + +https://huggingface.co/joshus/bge-base-0803 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_pipeline_en.md new file mode 100644 index 00000000000000..181080a01df278 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_0803_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: bge_base_0803_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_0803_pipeline` is a English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_0803_pipeline_en_5.4.0_3.0_1718065612145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_0803_pipeline_en_5.4.0_3.0_1718065612145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_0803_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_0803_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_0803_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.7 MB| + +## References + +https://huggingface.co/joshus/bge-base-0803 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_en.md new file mode 100644 index 00000000000000..2606766a049f23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_argilla_sdk_matryoshka BGEEmbeddings from plaguss +author: John Snow Labs +name: bge_base_argilla_sdk_matryoshka +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_argilla_sdk_matryoshka` is a English model originally trained by plaguss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_argilla_sdk_matryoshka_en_5.4.0_3.0_1718070324668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_argilla_sdk_matryoshka_en_5.4.0_3.0_1718070324668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_argilla_sdk_matryoshka","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_argilla_sdk_matryoshka","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_argilla_sdk_matryoshka| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|377.9 MB| + +## References + +https://huggingface.co/plaguss/bge-base-argilla-sdk-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_pipeline_en.md new file mode 100644 index 00000000000000..9c9d9ae39ebcca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_argilla_sdk_matryoshka_pipeline pipeline BGEEmbeddings from plaguss +author: John Snow Labs +name: bge_base_argilla_sdk_matryoshka_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_argilla_sdk_matryoshka_pipeline` is a English model originally trained by plaguss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_argilla_sdk_matryoshka_pipeline_en_5.4.0_3.0_1718070363620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_argilla_sdk_matryoshka_pipeline_en_5.4.0_3.0_1718070363620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_argilla_sdk_matryoshka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_argilla_sdk_matryoshka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_argilla_sdk_matryoshka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|377.9 MB| + +## References + +https://huggingface.co/plaguss/bge-base-argilla-sdk-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_en.md new file mode 100644 index 00000000000000..98d476f7a9f93c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english BGEEmbeddings from Narsil +author: John Snow Labs +name: bge_base_english +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english` is a English model originally trained by Narsil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_en_5.4.0_3.0_1718067511636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_en_5.4.0_3.0_1718067511636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|259.0 MB| + +## References + +https://huggingface.co/Narsil/bge-base-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_pipeline_en.md new file mode 100644 index 00000000000000..5150d05a9a5827 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_pipeline pipeline BGEEmbeddings from Narsil +author: John Snow Labs +name: bge_base_english_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_pipeline` is a English model originally trained by Narsil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_pipeline_en_5.4.0_3.0_1718067609835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_pipeline_en_5.4.0_3.0_1718067609835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|259.0 MB| + +## References + +https://huggingface.co/Narsil/bge-base-en + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_en.md new file mode 100644 index 00000000000000..d30a28f802dfbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetuned_300 BGEEmbeddings from ramnathv +author: John Snow Labs +name: bge_base_english_v1_5_finetuned_300 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_finetuned_300` is a English model originally trained by ramnathv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetuned_300_en_5.4.0_3.0_1718064744487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetuned_300_en_5.4.0_3.0_1718064744487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetuned_300","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetuned_300","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetuned_300| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|384.8 MB| + +## References + +https://huggingface.co/ramnathv/bge-base-en-v1.5-finetuned-300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_pipeline_en.md new file mode 100644 index 00000000000000..08638750898f10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetuned_300_pipeline pipeline BGEEmbeddings from ramnathv +author: John Snow Labs +name: bge_base_english_v1_5_finetuned_300_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_finetuned_300_pipeline` is a English model originally trained by ramnathv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetuned_300_pipeline_en_5.4.0_3.0_1718064776631.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetuned_300_pipeline_en_5.4.0_3.0_1718064776631.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_finetuned_300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_finetuned_300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetuned_300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|384.8 MB| + +## References + +https://huggingface.co/ramnathv/bge-base-en-v1.5-finetuned-300 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_en.md new file mode 100644 index 00000000000000..53f523107806c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_3 BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_3 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_ft_quora_0_3` is a English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_3_en_5.4.0_3.0_1718065501803.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_3_en_5.4.0_3.0_1718065501803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_3","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_3","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|396.1 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_pipeline_en.md new file mode 100644 index 00000000000000..b70cc7fb5614eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_3_pipeline pipeline BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_3_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_ft_quora_0_3_pipeline` is a English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_3_pipeline_en_5.4.0_3.0_1718065530886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_3_pipeline_en_5.4.0_3.0_1718065530886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|396.1 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.3 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_en.md new file mode 100644 index 00000000000000..c436bf742247b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_5 BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_5 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_ft_quora_0_5` is a English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_5_en_5.4.0_3.0_1718067747603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_5_en_5.4.0_3.0_1718067747603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|398.0 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_pipeline_en.md new file mode 100644 index 00000000000000..6239d621717779 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_5_pipeline pipeline BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_ft_quora_0_5_pipeline` is a English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_5_pipeline_en_5.4.0_3.0_1718067776480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_5_pipeline_en_5.4.0_3.0_1718067776480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|398.0 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_en.md new file mode 100644 index 00000000000000..63a0976fb25615 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_567_labs BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_567_labs +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_ft_quora_567_labs` is a English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_567_labs_en_5.4.0_3.0_1718064656500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_567_labs_en_5.4.0_3.0_1718064656500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_567_labs","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_567_labs","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_567_labs| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|400.5 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_pipeline_en.md new file mode 100644 index 00000000000000..cbecc2ac583324 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_567_labs_pipeline pipeline BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_567_labs_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_english_v1_5_ft_quora_567_labs_pipeline` is a English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_567_labs_pipeline_en_5.4.0_3.0_1718064684009.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_567_labs_pipeline_en_5.4.0_3.0_1718064684009.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_567_labs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_567_labs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_567_labs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|400.5 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_en.md new file mode 100644 index 00000000000000..2a5c085f849dc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_krunchykat BGEEmbeddings from krunchykat +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_krunchykat +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_krunchykat` is an English model originally trained by krunchykat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_krunchykat_en_5.4.0_3.0_1718069916603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_krunchykat_en_5.4.0_3.0_1718069916603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_krunchykat","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_krunchykat","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_krunchykat| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|400.5 MB| + +## References + +https://huggingface.co/krunchykat/bge-base-en-v1.5-ft-quora \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_pipeline_en.md new file mode 100644 index 00000000000000..7ca476a511f305 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_krunchykat_pipeline pipeline BGEEmbeddings from krunchykat +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_krunchykat_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_krunchykat_pipeline` is an English model originally trained by krunchykat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_krunchykat_pipeline_en_5.4.0_3.0_1718069950790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_krunchykat_pipeline_en_5.4.0_3.0_1718069950790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_krunchykat_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_krunchykat_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_krunchykat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|400.5 MB| + +## References + +https://huggingface.co/krunchykat/bge-base-en-v1.5-ft-quora + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_en.md new file mode 100644 index 00000000000000..9774e57c56b2f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_semicon_ym_0122 BGEEmbeddings from Niraya666 +author: John Snow Labs +name: bge_base_english_v1_5_semicon_ym_0122 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_semicon_ym_0122` is an English model originally trained by Niraya666. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_semicon_ym_0122_en_5.4.0_3.0_1718069614829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_semicon_ym_0122_en_5.4.0_3.0_1718069614829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_semicon_ym_0122","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_semicon_ym_0122","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_semicon_ym_0122| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|380.6 MB| + +## References + +https://huggingface.co/Niraya666/bge-base-en-v1.5-semicon-ym-0122 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_pipeline_en.md new file mode 100644 index 00000000000000..611495211ace7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_semicon_ym_0122_pipeline pipeline BGEEmbeddings from Niraya666 +author: John Snow Labs +name: bge_base_english_v1_5_semicon_ym_0122_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_semicon_ym_0122_pipeline` is an English model originally trained by Niraya666. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_semicon_ym_0122_pipeline_en_5.4.0_3.0_1718069652500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_semicon_ym_0122_pipeline_en_5.4.0_3.0_1718069652500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_semicon_ym_0122_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_semicon_ym_0122_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_semicon_ym_0122_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|380.6 MB| + +## References + +https://huggingface.co/Niraya666/bge-base-en-v1.5-semicon-ym-0122 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_en.md new file mode 100644 index 00000000000000..5bf2e7eb5673c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial BGEEmbeddings from riphunter7001x +author: John Snow Labs +name: bge_base_financial +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial` is an English model originally trained by riphunter7001x. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_en_5.4.0_3.0_1718071167837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_en_5.4.0_3.0_1718071167837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.5 MB| + +## References + +https://huggingface.co/riphunter7001x/bge-base-financial \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_en.md new file mode 100644 index 00000000000000..0a5e6f303f94e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_mugheesawan11 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_financial_matryoshka_mugheesawan11 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_mugheesawan11` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_mugheesawan11_en_5.4.0_3.0_1718068377569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_mugheesawan11_en_5.4.0_3.0_1718068377569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_mugheesawan11","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_mugheesawan11","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_mugheesawan11| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.0 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_pipeline_en.md new file mode 100644 index 00000000000000..8b8d3a3a789389 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_mugheesawan11_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_financial_matryoshka_mugheesawan11_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_mugheesawan11_pipeline` is an English model originally trained by MugheesAwan11.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_mugheesawan11_pipeline_en_5.4.0_3.0_1718068413353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_mugheesawan11_pipeline_en_5.4.0_3.0_1718068413353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_mugheesawan11_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_mugheesawan11_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_mugheesawan11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_en.md new file mode 100644 index 00000000000000..ef5c7514f6b225 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_phamkinhquoc2002 BGEEmbeddings from phamkinhquoc2002 +author: John Snow Labs +name: bge_base_financial_matryoshka_phamkinhquoc2002 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_phamkinhquoc2002` is an English model originally trained by phamkinhquoc2002.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_phamkinhquoc2002_en_5.4.0_3.0_1718066241417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_phamkinhquoc2002_en_5.4.0_3.0_1718066241417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_phamkinhquoc2002","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_phamkinhquoc2002","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_phamkinhquoc2002| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|256.0 MB| + +## References + +https://huggingface.co/phamkinhquoc2002/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_pipeline_en.md new file mode 100644 index 00000000000000..25411e24a51fe9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_phamkinhquoc2002_pipeline pipeline BGEEmbeddings from phamkinhquoc2002 +author: John Snow Labs +name: bge_base_financial_matryoshka_phamkinhquoc2002_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_phamkinhquoc2002_pipeline` is an English model originally trained by phamkinhquoc2002.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_phamkinhquoc2002_pipeline_en_5.4.0_3.0_1718066340084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_phamkinhquoc2002_pipeline_en_5.4.0_3.0_1718066340084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_phamkinhquoc2002_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_phamkinhquoc2002_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_phamkinhquoc2002_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|256.0 MB| + +## References + +https://huggingface.co/phamkinhquoc2002/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_en.md new file mode 100644 index 00000000000000..8986d33677907e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_philschmid BGEEmbeddings from philschmid +author: John Snow Labs +name: bge_base_financial_matryoshka_philschmid +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_philschmid` is an English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_philschmid_en_5.4.0_3.0_1718066188278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_philschmid_en_5.4.0_3.0_1718066188278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_philschmid","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_philschmid","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_philschmid| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/philschmid/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_pipeline_en.md new file mode 100644 index 00000000000000..62703a59100de4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_philschmid_pipeline pipeline BGEEmbeddings from philschmid +author: John Snow Labs +name: bge_base_financial_matryoshka_philschmid_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_philschmid_pipeline` is an English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_philschmid_pipeline_en_5.4.0_3.0_1718066222589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_philschmid_pipeline_en_5.4.0_3.0_1718066222589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_philschmid_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_philschmid_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_philschmid_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/philschmid/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_en.md new file mode 100644 index 00000000000000..fbc56f1811476a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_sailesh9999 BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_sailesh9999 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_sailesh9999` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_sailesh9999_en_5.4.0_3.0_1718066452795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_sailesh9999_en_5.4.0_3.0_1718066452795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_sailesh9999","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_sailesh9999","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_sailesh9999| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_pipeline_en.md new file mode 100644 index 00000000000000..1bc5b18b5d337f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_sailesh9999_pipeline pipeline BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_sailesh9999_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_sailesh9999_pipeline` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_sailesh9999_pipeline_en_5.4.0_3.0_1718066493968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_sailesh9999_pipeline_en_5.4.0_3.0_1718066493968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_sailesh9999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_sailesh9999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_sailesh9999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_en.md new file mode 100644 index 00000000000000..a9c0f4d93566b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_test BGEEmbeddings from phamkinhquoc2002 +author: John Snow Labs +name: bge_base_financial_matryoshka_test +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_test` is an English model originally trained by phamkinhquoc2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_test_en_5.4.0_3.0_1718068288916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_test_en_5.4.0_3.0_1718068288916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_test","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_test","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_test| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|256.0 MB| + +## References + +https://huggingface.co/phamkinhquoc2002/bge-base-financial-matryoshka_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_pipeline_en.md new file mode 100644 index 00000000000000..762a6ce99ca92a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_test_pipeline pipeline BGEEmbeddings from phamkinhquoc2002 +author: John Snow Labs +name: bge_base_financial_matryoshka_test_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_test_pipeline` is an English model originally trained by phamkinhquoc2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_test_pipeline_en_5.4.0_3.0_1718068388194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_test_pipeline_en_5.4.0_3.0_1718068388194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|256.0 MB| + +## References + +https://huggingface.co/phamkinhquoc2002/bge-base-financial-matryoshka_test + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_en.md new file mode 100644 index 00000000000000..924bb44192a483 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_uonyeka BGEEmbeddings from uonyeka +author: John Snow Labs +name: bge_base_financial_matryoshka_uonyeka +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_uonyeka` is an English model originally trained by uonyeka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_uonyeka_en_5.4.0_3.0_1718064614997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_uonyeka_en_5.4.0_3.0_1718064614997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_uonyeka","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_uonyeka","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_uonyeka| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.2 MB| + +## References + +https://huggingface.co/uonyeka/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_pipeline_en.md new file mode 100644 index 00000000000000..36d6b4838499ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_uonyeka_pipeline pipeline BGEEmbeddings from uonyeka +author: John Snow Labs +name: bge_base_financial_matryoshka_uonyeka_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_uonyeka_pipeline` is an English model originally trained by uonyeka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_uonyeka_pipeline_en_5.4.0_3.0_1718064649259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_uonyeka_pipeline_en_5.4.0_3.0_1718064649259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_uonyeka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_uonyeka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_uonyeka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.2 MB| + +## References + +https://huggingface.co/uonyeka/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_en.md new file mode 100644 index 00000000000000..b23dce53c3e121 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_waheedlone BGEEmbeddings from WaheedLone +author: John Snow Labs +name: bge_base_financial_matryoshka_waheedlone +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_waheedlone` is an English model originally trained by WaheedLone. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_waheedlone_en_5.4.0_3.0_1718068545032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_waheedlone_en_5.4.0_3.0_1718068545032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_waheedlone","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_waheedlone","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_waheedlone| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/WaheedLone/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_pipeline_en.md new file mode 100644 index 00000000000000..42e4c6cdc454f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_waheedlone_pipeline pipeline BGEEmbeddings from WaheedLone +author: John Snow Labs +name: bge_base_financial_matryoshka_waheedlone_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_waheedlone_pipeline` is an English model originally trained by WaheedLone. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_waheedlone_pipeline_en_5.4.0_3.0_1718068579611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_waheedlone_pipeline_en_5.4.0_3.0_1718068579611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_waheedlone_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_waheedlone_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_waheedlone_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/WaheedLone/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_pipeline_en.md new file mode 100644 index 00000000000000..30d93e721ec776 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_pipeline pipeline BGEEmbeddings from riphunter7001x +author: John Snow Labs +name: bge_base_financial_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_pipeline` is an English model originally trained by riphunter7001x. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_pipeline_en_5.4.0_3.0_1718071202731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_pipeline_en_5.4.0_3.0_1718071202731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.5 MB| + +## References + +https://huggingface.co/riphunter7001x/bge-base-financial + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_en.md new file mode 100644 index 00000000000000..53398215ae0f70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_finetune_v2 BGEEmbeddings from Suva +author: John Snow Labs +name: bge_base_finetune_v2 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetune_v2` is an English model originally trained by Suva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetune_v2_en_5.4.0_3.0_1718066856172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetune_v2_en_5.4.0_3.0_1718066856172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_finetune_v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_finetune_v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetune_v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|385.4 MB| + +## References + +https://huggingface.co/Suva/bge-base-finetune-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_pipeline_en.md new file mode 100644 index 00000000000000..c49798537b71c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_finetune_v2_pipeline pipeline BGEEmbeddings from Suva +author: John Snow Labs +name: bge_base_finetune_v2_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetune_v2_pipeline` is an English model originally trained by Suva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetune_v2_pipeline_en_5.4.0_3.0_1718066889529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetune_v2_pipeline_en_5.4.0_3.0_1718066889529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_finetune_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_finetune_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetune_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|385.4 MB| + +## References + +https://huggingface.co/Suva/bge-base-finetune-v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_en.md new file mode 100644 index 00000000000000..80980ea4ade301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_finetuned BGEEmbeddings from Suva +author: John Snow Labs +name: bge_base_finetuned +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetuned` is an English model originally trained by Suva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_en_5.4.0_3.0_1718067158517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_en_5.4.0_3.0_1718067158517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_finetuned","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_finetuned","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetuned| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|381.7 MB| + +## References + +https://huggingface.co/Suva/bge-base-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_en.md new file mode 100644 index 00000000000000..41154bdff09b71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_finetuned_financial BGEEmbeddings from Nishanth7803 +author: John Snow Labs +name: bge_base_finetuned_financial +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetuned_financial` is an English model originally trained by Nishanth7803. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_financial_en_5.4.0_3.0_1718092788141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_financial_en_5.4.0_3.0_1718092788141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_finetuned_financial","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_finetuned_financial","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetuned_financial| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Nishanth7803/bge-base-finetuned-financial \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_pipeline_en.md new file mode 100644 index 00000000000000..8de2b578bda5d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_finetuned_financial_pipeline pipeline BGEEmbeddings from Nishanth7803 +author: John Snow Labs +name: bge_base_finetuned_financial_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetuned_financial_pipeline` is an English model originally trained by Nishanth7803. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_financial_pipeline_en_5.4.0_3.0_1718092823761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_financial_pipeline_en_5.4.0_3.0_1718092823761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_finetuned_financial_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_finetuned_financial_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetuned_financial_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Nishanth7803/bge-base-finetuned-financial + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..ccc001fb802b37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_finetuned_pipeline pipeline BGEEmbeddings from Suva +author: John Snow Labs +name: bge_base_finetuned_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetuned_pipeline` is an English model originally trained by Suva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_pipeline_en_5.4.0_3.0_1718067193949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_pipeline_en_5.4.0_3.0_1718067193949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.7 MB| + +## References + +https://huggingface.co/Suva/bge-base-finetuned + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_en.md new file mode 100644 index 00000000000000..45e78b26efba89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v3 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v3 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v3` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v3_en_5.4.0_3.0_1718068326018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v3_en_5.4.0_3.0_1718068326018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v3","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v3","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.5 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_pipeline_en.md new file mode 100644 index 00000000000000..bacb064ba4a2de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v3_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v3_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v3_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v3_pipeline_en_5.4.0_3.0_1718068367075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v3_pipeline_en_5.4.0_3.0_1718068367075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.5 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v3 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_en.md new file mode 100644 index 00000000000000..fed9a54052012f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v5 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v5 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v5` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v5_en_5.4.0_3.0_1718066529630.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v5_en_5.4.0_3.0_1718066529630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.6 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_pipeline_en.md new file mode 100644 index 00000000000000..11dfb8e72c0160 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v5_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v5_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v5_pipeline_en_5.4.0_3.0_1718066569289.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v5_pipeline_en_5.4.0_3.0_1718066569289.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.6 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_en.md new file mode 100644 index 00000000000000..48bbe26ef4cdfe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_fin_intent_large_chinese_v1_5 BGEEmbeddings from luchun +author: John Snow Labs +name: bge_fin_intent_large_chinese_v1_5 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_fin_intent_large_chinese_v1_5` is an English model originally trained by luchun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_fin_intent_large_chinese_v1_5_en_5.4.0_3.0_1718069318884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_fin_intent_large_chinese_v1_5_en_5.4.0_3.0_1718069318884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_fin_intent_large_chinese_v1_5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_fin_intent_large_chinese_v1_5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_fin_intent_large_chinese_v1_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/luchun/bge_fin_intent_large_zh_v1.5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_pipeline_en.md new file mode 100644 index 00000000000000..90d1f9253e582b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_fin_intent_large_chinese_v1_5_pipeline pipeline BGEEmbeddings from luchun +author: John Snow Labs +name: bge_fin_intent_large_chinese_v1_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_fin_intent_large_chinese_v1_5_pipeline` is an English model originally trained by luchun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_fin_intent_large_chinese_v1_5_pipeline_en_5.4.0_3.0_1718069406837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_fin_intent_large_chinese_v1_5_pipeline_en_5.4.0_3.0_1718069406837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_fin_intent_large_chinese_v1_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_fin_intent_large_chinese_v1_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_fin_intent_large_chinese_v1_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/luchun/bge_fin_intent_large_zh_v1.5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_en.md new file mode 100644 index 00000000000000..53dded9bc32fb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_chinese_v2_2 BGEEmbeddings from clinno +author: John Snow Labs +name: bge_large_chinese_v2_2 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_chinese_v2_2` is an English model originally trained by clinno. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_chinese_v2_2_en_5.4.0_3.0_1718068973981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_chinese_v2_2_en_5.4.0_3.0_1718068973981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_chinese_v2_2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_chinese_v2_2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_chinese_v2_2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/clinno/bge-large-zh-v2.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_pipeline_en.md new file mode 100644 index 00000000000000..851f0c9cc64cc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_chinese_v2_2_pipeline pipeline BGEEmbeddings from clinno +author: John Snow Labs +name: bge_large_chinese_v2_2_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_chinese_v2_2_pipeline` is an English model originally trained by clinno. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_chinese_v2_2_pipeline_en_5.4.0_3.0_1718069063205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_chinese_v2_2_pipeline_en_5.4.0_3.0_1718069063205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_chinese_v2_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_chinese_v2_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_chinese_v2_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/clinno/bge-large-zh-v2.2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_en.md new file mode 100644 index 00000000000000..8244f4edb1f34d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_english_v1_5_finetuned_300 BGEEmbeddings from ramnathv +author: John Snow Labs +name: bge_large_english_v1_5_finetuned_300 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_finetuned_300` is an English model originally trained by ramnathv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_finetuned_300_en_5.4.0_3.0_1718070925146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_finetuned_300_en_5.4.0_3.0_1718070925146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_finetuned_300","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_finetuned_300","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_finetuned_300| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/ramnathv/bge-large-en-v1.5-finetuned-300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_pipeline_en.md new file mode 100644 index 00000000000000..ca76448b27a082 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_english_v1_5_finetuned_300_pipeline pipeline BGEEmbeddings from ramnathv +author: John Snow Labs +name: bge_large_english_v1_5_finetuned_300_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_finetuned_300_pipeline` is an English model originally trained by ramnathv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_finetuned_300_pipeline_en_5.4.0_3.0_1718071024233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_finetuned_300_pipeline_en_5.4.0_3.0_1718071024233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_english_v1_5_finetuned_300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_english_v1_5_finetuned_300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_finetuned_300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/ramnathv/bge-large-en-v1.5-finetuned-300 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_en.md new file mode 100644 index 00000000000000..ea6669643dd4a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_english_v1_5_isoko_27001 BGEEmbeddings from Basti8499 +author: John Snow Labs +name: bge_large_english_v1_5_isoko_27001 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_isoko_27001` is an English model originally trained by Basti8499. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_isoko_27001_en_5.4.0_3.0_1718067937932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_isoko_27001_en_5.4.0_3.0_1718067937932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_isoko_27001","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_isoko_27001","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_isoko_27001| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Basti8499/bge-large-en-v1.5-ISO-27001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_pipeline_en.md new file mode 100644 index 00000000000000..42c2fe245f18c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_english_v1_5_isoko_27001_pipeline pipeline BGEEmbeddings from Basti8499 +author: John Snow Labs +name: bge_large_english_v1_5_isoko_27001_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_isoko_27001_pipeline` is an English model originally trained by Basti8499. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_isoko_27001_pipeline_en_5.4.0_3.0_1718068026491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_isoko_27001_pipeline_en_5.4.0_3.0_1718068026491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_english_v1_5_isoko_27001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_english_v1_5_isoko_27001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_isoko_27001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Basti8499/bge-large-en-v1.5-ISO-27001 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_en.md new file mode 100644 index 00000000000000..4b3fd17641b451 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_fine_tuned_paraphrase BGEEmbeddings from kwang123 +author: John Snow Labs +name: bge_large_fine_tuned_paraphrase +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_fine_tuned_paraphrase` is an English model originally trained by kwang123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_paraphrase_en_5.4.0_3.0_1718065837377.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_paraphrase_en_5.4.0_3.0_1718065837377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_fine_tuned_paraphrase","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_fine_tuned_paraphrase","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_fine_tuned_paraphrase| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/kwang123/bge-large-fine-tuned-paraphrase \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_pipeline_en.md new file mode 100644 index 00000000000000..1fbe6a70fdf418 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_fine_tuned_paraphrase_pipeline pipeline BGEEmbeddings from kwang123 +author: John Snow Labs +name: bge_large_fine_tuned_paraphrase_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_fine_tuned_paraphrase_pipeline` is an English model originally trained by kwang123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_paraphrase_pipeline_en_5.4.0_3.0_1718065932307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_paraphrase_pipeline_en_5.4.0_3.0_1718065932307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_fine_tuned_paraphrase_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_fine_tuned_paraphrase_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_fine_tuned_paraphrase_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/kwang123/bge-large-fine-tuned-paraphrase + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_en.md new file mode 100644 index 00000000000000..3b989d26e56de8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_finetuned BGEEmbeddings from Suva +author: John Snow Labs +name: bge_large_finetuned +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_finetuned` is an English model originally trained by Suva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_finetuned_en_5.4.0_3.0_1718067513153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_finetuned_en_5.4.0_3.0_1718067513153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_finetuned","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_finetuned","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_finetuned| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Suva/bge-large-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..83fe488d9f9947 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_finetuned_pipeline pipeline BGEEmbeddings from Suva +author: John Snow Labs +name: bge_large_finetuned_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_finetuned_pipeline` is an English model originally trained by Suva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_finetuned_pipeline_en_5.4.0_3.0_1718067606500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_finetuned_pipeline_en_5.4.0_3.0_1718067606500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Suva/bge-large-finetuned + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_en.md new file mode 100644 index 00000000000000..b42137c6096a8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_frombge BGEEmbeddings from joshus +author: John Snow Labs +name: bge_large_frombge +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_frombge` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_frombge_en_5.4.0_3.0_1718067524307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_frombge_en_5.4.0_3.0_1718067524307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_frombge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_frombge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_frombge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/bge-large-frombge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_pipeline_en.md new file mode 100644 index 00000000000000..39e67c73b8b3bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_frombge_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: bge_large_frombge_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_frombge_pipeline` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_frombge_pipeline_en_5.4.0_3.0_1718067620106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_frombge_pipeline_en_5.4.0_3.0_1718067620106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_frombge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_frombge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_frombge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/bge-large-frombge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_en.md new file mode 100644 index 00000000000000..6c668cb96d7758 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_medical BGEEmbeddings from ls-da3m0ns +author: John Snow Labs +name: bge_large_medical +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_medical` is an English model originally trained by ls-da3m0ns. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_medical_en_5.4.0_3.0_1718068972650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_medical_en_5.4.0_3.0_1718068972650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_medical","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_medical","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_medical| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/ls-da3m0ns/bge_large_medical \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_pipeline_en.md new file mode 100644 index 00000000000000..472b989cc427e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_medical_pipeline pipeline BGEEmbeddings from ls-da3m0ns +author: John Snow Labs +name: bge_large_medical_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_medical_pipeline` is an English model originally trained by ls-da3m0ns. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_medical_pipeline_en_5.4.0_3.0_1718069069828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_medical_pipeline_en_5.4.0_3.0_1718069069828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_medical_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_medical_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_medical_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/ls-da3m0ns/bge_large_medical + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_en.md new file mode 100644 index 00000000000000..b00c834ef09006 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_v1_5_fine_tuning BGEEmbeddings from bespin-global +author: John Snow Labs +name: bge_large_v1_5_fine_tuning +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_v1_5_fine_tuning` is an English model originally trained by bespin-global. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_v1_5_fine_tuning_en_5.4.0_3.0_1718070424136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_v1_5_fine_tuning_en_5.4.0_3.0_1718070424136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_v1_5_fine_tuning","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_v1_5_fine_tuning","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_v1_5_fine_tuning| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/bespin-global/bge-large-v1.5-fine-tuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_pipeline_en.md new file mode 100644 index 00000000000000..d4b058517c2ba3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_v1_5_fine_tuning_pipeline pipeline BGEEmbeddings from bespin-global +author: John Snow Labs +name: bge_large_v1_5_fine_tuning_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_v1_5_fine_tuning_pipeline` is an English model originally trained by bespin-global. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_v1_5_fine_tuning_pipeline_en_5.4.0_3.0_1718070509670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_v1_5_fine_tuning_pipeline_en_5.4.0_3.0_1718070509670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_v1_5_fine_tuning_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_v1_5_fine_tuning_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_v1_5_fine_tuning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/bespin-global/bge-large-v1.5-fine-tuning + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_en.md new file mode 100644 index 00000000000000..20510e703e76ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_micro_v2_taylorai BGEEmbeddings from TaylorAI +author: John Snow Labs +name: bge_micro_v2_taylorai +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro_v2_taylorai` is an English model originally trained by TaylorAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_v2_taylorai_en_5.4.0_3.0_1718066828545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_v2_taylorai_en_5.4.0_3.0_1718066828545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_micro_v2_taylorai","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_micro_v2_taylorai","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro_v2_taylorai| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/TaylorAI/bge-micro-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_pipeline_en.md new file mode 100644 index 00000000000000..b60ef1ab07770e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_micro_v2_taylorai_pipeline pipeline BGEEmbeddings from TaylorAI +author: John Snow Labs +name: bge_micro_v2_taylorai_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro_v2_taylorai_pipeline` is an English model originally trained by TaylorAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_v2_taylorai_pipeline_en_5.4.0_3.0_1718066843713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_v2_taylorai_pipeline_en_5.4.0_3.0_1718066843713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_micro_v2_taylorai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_micro_v2_taylorai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro_v2_taylorai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/TaylorAI/bge-micro-v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_en.md new file mode 100644 index 00000000000000..91abe6d7db1944 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_book_qa BGEEmbeddings from svjack +author: John Snow Labs +name: bge_small_book_qa +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_book_qa` is an English model originally trained by svjack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_book_qa_en_5.4.0_3.0_1718067110698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_book_qa_en_5.4.0_3.0_1718067110698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_book_qa","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_book_qa","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_book_qa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|77.9 MB| + +## References + +https://huggingface.co/svjack/bge-small-book-qa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_pipeline_en.md new file mode 100644 index 00000000000000..3e07c2195c6844 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_book_qa_pipeline pipeline BGEEmbeddings from svjack +author: John Snow Labs +name: bge_small_book_qa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_book_qa_pipeline` is an English model originally trained by svjack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_book_qa_pipeline_en_5.4.0_3.0_1718067120445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_book_qa_pipeline_en_5.4.0_3.0_1718067120445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_book_qa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_book_qa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_book_qa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|77.9 MB| + +## References + +https://huggingface.co/svjack/bge-small-book-qa + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_en.md new file mode 100644 index 00000000000000..c888ae34eb3bce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_dmr BGEEmbeddings from McGill-NLP +author: John Snow Labs +name: bge_small_dmr +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_dmr` is an English model originally trained by McGill-NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_dmr_en_5.4.0_3.0_1718068959912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_dmr_en_5.4.0_3.0_1718068959912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_dmr","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_dmr","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_dmr| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|118.8 MB| + +## References + +https://huggingface.co/McGill-NLP/bge-small-dmr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_pipeline_en.md new file mode 100644 index 00000000000000..af0d4cc93cd003 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_dmr_pipeline pipeline BGEEmbeddings from McGill-NLP +author: John Snow Labs +name: bge_small_dmr_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_dmr_pipeline` is an English model originally trained by McGill-NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_dmr_pipeline_en_5.4.0_3.0_1718068970032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_dmr_pipeline_en_5.4.0_3.0_1718068970032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_dmr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_dmr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_dmr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|118.8 MB| + +## References + +https://huggingface.co/McGill-NLP/bge-small-dmr + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_en.md new file mode 100644 index 00000000000000..019ad695fe0539 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_english_v1_5_fine_tuned_v0 BGEEmbeddings from RMWeerasinghe +author: John Snow Labs +name: bge_small_english_v1_5_fine_tuned_v0 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_v1_5_fine_tuned_v0` is an English model originally trained by RMWeerasinghe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_fine_tuned_v0_en_5.4.0_3.0_1718070869960.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_fine_tuned_v0_en_5.4.0_3.0_1718070869960.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_english_v1_5_fine_tuned_v0","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_english_v1_5_fine_tuned_v0","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_v1_5_fine_tuned_v0| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.3 MB| + +## References + +https://huggingface.co/RMWeerasinghe/bge-small-en-v1.5-fine-tuned-v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_pipeline_en.md new file mode 100644 index 00000000000000..9d43bac7187b85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_english_v1_5_fine_tuned_v0_pipeline pipeline BGEEmbeddings from RMWeerasinghe +author: John Snow Labs +name: bge_small_english_v1_5_fine_tuned_v0_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_v1_5_fine_tuned_v0_pipeline` is an English model originally trained by RMWeerasinghe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_fine_tuned_v0_pipeline_en_5.4.0_3.0_1718070881475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_fine_tuned_v0_pipeline_en_5.4.0_3.0_1718070881475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_english_v1_5_fine_tuned_v0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_english_v1_5_fine_tuned_v0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_v1_5_fine_tuned_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.4 MB| + +## References + +https://huggingface.co/RMWeerasinghe/bge-small-en-v1.5-fine-tuned-v0 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_en.md new file mode 100644 index 00000000000000..0f08260ae09d24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_english_v1_5_ft BGEEmbeddings from Rebecca19990101 +author: John Snow Labs +name: bge_small_english_v1_5_ft +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_v1_5_ft` is an English model originally trained by Rebecca19990101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_ft_en_5.4.0_3.0_1718066588306.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_ft_en_5.4.0_3.0_1718066588306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_english_v1_5_ft","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_english_v1_5_ft","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_v1_5_ft| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/Rebecca19990101/bge-small-en-v1.5-ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_pipeline_en.md new file mode 100644 index 00000000000000..28164eba7d3997 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_english_v1_5_ft_pipeline pipeline BGEEmbeddings from Rebecca19990101 +author: John Snow Labs +name: bge_small_english_v1_5_ft_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_v1_5_ft_pipeline` is an English model originally trained by Rebecca19990101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_ft_pipeline_en_5.4.0_3.0_1718066599896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_ft_pipeline_en_5.4.0_3.0_1718066599896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_english_v1_5_ft_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_english_v1_5_ft_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_v1_5_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/Rebecca19990101/bge-small-en-v1.5-ft + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_en.md b/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_en.md new file mode 100644 index 00000000000000..d2c10e42153585 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_xlmr XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_xlmr +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `cat_ner_xlmr` is an English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_xlmr_en_5.4.0_3.0_1718133095328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_xlmr_en_5.4.0_3.0_1718133095328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_xlmr","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_xlmr", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_xlmr| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-xlmr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_pipeline_en.md new file mode 100644 index 00000000000000..385f4e30bd760f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_ner_xlmr_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_xlmr_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `cat_ner_xlmr_pipeline` is an English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_xlmr_pipeline_en_5.4.0_3.0_1718133250662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_xlmr_pipeline_en_5.4.0_3.0_1718133250662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cat_ner_xlmr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cat_ner_xlmr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_xlmr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-xlmr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_en.md b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_en.md new file mode 100644 index 00000000000000..b5d0c1a46f3feb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clinico_xlm_roberta XlmRoBertaForTokenClassification from joheras +author: John Snow Labs +name: clinico_xlm_roberta +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinico_xlm_roberta` is an English model originally trained by joheras. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_en_5.4.0_3.0_1718128896490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_en_5.4.0_3.0_1718128896490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("clinico_xlm_roberta","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("clinico_xlm_roberta", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinico_xlm_roberta| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|809.2 MB| + +## References + +https://huggingface.co/joheras/clinico-xlm-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_en.md b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_en.md new file mode 100644 index 00000000000000..743dfadef873e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clinico_xlm_roberta_large_finetuned_augmented1 XlmRoBertaForTokenClassification from joheras +author: John Snow Labs +name: clinico_xlm_roberta_large_finetuned_augmented1 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinico_xlm_roberta_large_finetuned_augmented1` is an English model originally trained by joheras. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_large_finetuned_augmented1_en_5.4.0_3.0_1718116724883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_large_finetuned_augmented1_en_5.4.0_3.0_1718116724883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("clinico_xlm_roberta_large_finetuned_augmented1","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("clinico_xlm_roberta_large_finetuned_augmented1", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinico_xlm_roberta_large_finetuned_augmented1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|990.7 MB| + +## References + +https://huggingface.co/joheras/clinico-xlm-roberta-large-finetuned-augmented1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_pipeline_en.md new file mode 100644 index 00000000000000..ab91d907073976 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clinico_xlm_roberta_large_finetuned_augmented1_pipeline pipeline XlmRoBertaForTokenClassification from joheras +author: John Snow Labs +name: clinico_xlm_roberta_large_finetuned_augmented1_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinico_xlm_roberta_large_finetuned_augmented1_pipeline` is an English model originally trained by joheras. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_large_finetuned_augmented1_pipeline_en_5.4.0_3.0_1718116804876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_large_finetuned_augmented1_pipeline_en_5.4.0_3.0_1718116804876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clinico_xlm_roberta_large_finetuned_augmented1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clinico_xlm_roberta_large_finetuned_augmented1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinico_xlm_roberta_large_finetuned_augmented1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|990.7 MB| + +## References + +https://huggingface.co/joheras/clinico-xlm-roberta-large-finetuned-augmented1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_pipeline_en.md new file mode 100644 index 00000000000000..6b368408600786 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clinico_xlm_roberta_pipeline pipeline XlmRoBertaForTokenClassification from joheras +author: John Snow Labs +name: clinico_xlm_roberta_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinico_xlm_roberta_pipeline` is an English model originally trained by joheras. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_pipeline_en_5.4.0_3.0_1718129063045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_pipeline_en_5.4.0_3.0_1718129063045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clinico_xlm_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clinico_xlm_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinico_xlm_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|809.2 MB| + +## References + +https://huggingface.co/joheras/clinico-xlm-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_en.md b/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_en.md new file mode 100644 index 00000000000000..7a26233be7f622 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English embed_bge_base_edu BGEEmbeddings from HelixAI +author: John Snow Labs +name: embed_bge_base_edu +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `embed_bge_base_edu` is an English model originally trained by HelixAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_en_5.4.0_3.0_1718064591261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_en_5.4.0_3.0_1718064591261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("embed_bge_base_edu","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("embed_bge_base_edu","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|embed_bge_base_edu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|384.4 MB| + +## References + +https://huggingface.co/HelixAI/embed_bge_base_edu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_pipeline_en.md new file mode 100644 index 00000000000000..28d954abd52b07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English embed_bge_base_edu_pipeline pipeline BGEEmbeddings from HelixAI +author: John Snow Labs +name: embed_bge_base_edu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `embed_bge_base_edu_pipeline` is an English model originally trained by HelixAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_pipeline_en_5.4.0_3.0_1718064624942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_pipeline_en_5.4.0_3.0_1718064624942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("embed_bge_base_edu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("embed_bge_base_edu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|embed_bge_base_edu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|384.4 MB| + +## References + +https://huggingface.co/HelixAI/embed_bge_base_edu + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_en.md b/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_en.md new file mode 100644 index 00000000000000..cec67577fae4b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English enlm_roberta_conll2003_final_stemmed XlmRoBertaForTokenClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_conll2003_final_stemmed +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `enlm_roberta_conll2003_final_stemmed` is an English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_stemmed_en_5.4.0_3.0_1718130791037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_stemmed_en_5.4.0_3.0_1718130791037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("enlm_roberta_conll2003_final_stemmed","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("enlm_roberta_conll2003_final_stemmed", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_conll2003_final_stemmed| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|464.4 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-conll2003-final-stemmed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_pipeline_en.md new file mode 100644 index 00000000000000..1ab1ad8976349f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English enlm_roberta_conll2003_final_stemmed_pipeline pipeline XlmRoBertaForTokenClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_conll2003_final_stemmed_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `enlm_roberta_conll2003_final_stemmed_pipeline` is an English model originally trained by manirai91. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_stemmed_pipeline_en_5.4.0_3.0_1718130820497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_stemmed_pipeline_en_5.4.0_3.0_1718130820497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("enlm_roberta_conll2003_final_stemmed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("enlm_roberta_conll2003_final_stemmed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_conll2003_final_stemmed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.4 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-conll2003-final-stemmed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_en.md new file mode 100644 index 00000000000000..a40343ecc6ea23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English finetune_bge_small_english BGEEmbeddings from srmishra +author: John Snow Labs +name: finetune_bge_small_english +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetune_bge_small_english` is an English model originally trained by srmishra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_en_5.4.0_3.0_1718070674961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_en_5.4.0_3.0_1718070674961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("finetune_bge_small_english","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("finetune_bge_small_english","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_bge_small_english| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|111.4 MB| + +## References + +https://huggingface.co/srmishra/finetune-bge-small-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_pipeline_en.md new file mode 100644 index 00000000000000..e50735f7c776a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetune_bge_small_english_pipeline pipeline BGEEmbeddings from srmishra +author: John Snow Labs +name: finetune_bge_small_english_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetune_bge_small_english_pipeline` is an English model originally trained by srmishra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_pipeline_en_5.4.0_3.0_1718070687403.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_pipeline_en_5.4.0_3.0_1718070687403.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetune_bge_small_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetune_bge_small_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_bge_small_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.4 MB| + +## References + +https://huggingface.co/srmishra/finetune-bge-small-en + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_en.md new file mode 100644 index 00000000000000..2abfc6f2f68179 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English finetune_bge_small_english_v2 BGEEmbeddings from srmishra +author: John Snow Labs +name: finetune_bge_small_english_v2 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetune_bge_small_english_v2` is an English model originally trained by srmishra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_v2_en_5.4.0_3.0_1718068366186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_v2_en_5.4.0_3.0_1718068366186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("finetune_bge_small_english_v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("finetune_bge_small_english_v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_bge_small_english_v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|111.6 MB| + +## References + +https://huggingface.co/srmishra/finetune-bge-small-en-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_pipeline_en.md new file mode 100644 index 00000000000000..c8bcfa07388df2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetune_bge_small_english_v2_pipeline pipeline BGEEmbeddings from srmishra +author: John Snow Labs +name: finetune_bge_small_english_v2_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetune_bge_small_english_v2_pipeline` is an English model originally trained by srmishra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_v2_pipeline_en_5.4.0_3.0_1718068378155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_v2_pipeline_en_5.4.0_3.0_1718068378155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetune_bge_small_english_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetune_bge_small_english_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_bge_small_english_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.6 MB| + +## References + +https://huggingface.co/srmishra/finetune-bge-small-en-v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_en.md new file mode 100644 index 00000000000000..3439602576ebea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English finetuned_bge_embeddings_v2 BGEEmbeddings from austinpatrickm +author: John Snow Labs +name: finetuned_bge_embeddings_v2 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetuned_bge_embeddings_v2` is an English model originally trained by austinpatrickm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_v2_en_5.4.0_3.0_1718068767959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_v2_en_5.4.0_3.0_1718068767959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("finetuned_bge_embeddings_v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("finetuned_bge_embeddings_v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bge_embeddings_v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.3 MB| + +## References + +https://huggingface.co/austinpatrickm/finetuned_bge_embeddings_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_pipeline_en.md new file mode 100644 index 00000000000000..e05a358c20f686 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetuned_bge_embeddings_v2_pipeline pipeline BGEEmbeddings from austinpatrickm +author: John Snow Labs +name: finetuned_bge_embeddings_v2_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetuned_bge_embeddings_v2_pipeline` is an English model originally trained by austinpatrickm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_v2_pipeline_en_5.4.0_3.0_1718068801953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_v2_pipeline_en_5.4.0_3.0_1718068801953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_bge_embeddings_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_bge_embeddings_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bge_embeddings_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.3 MB| + +## References + +https://huggingface.co/austinpatrickm/finetuned_bge_embeddings_v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_en.md b/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_en.md new file mode 100644 index 00000000000000..4eade3f65eb966 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English flipped_2e_4_hausa XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: flipped_2e_4_hausa +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`flipped_2e_4_hausa` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/flipped_2e_4_hausa_en_5.4.0_3.0_1718135046057.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/flipped_2e_4_hausa_en_5.4.0_3.0_1718135046057.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("flipped_2e_4_hausa","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("flipped_2e_4_hausa", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|flipped_2e_4_hausa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/flipped_2e-4_hausa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_pipeline_en.md new file mode 100644 index 00000000000000..469450664c9c63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English flipped_2e_4_hausa_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: flipped_2e_4_hausa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `flipped_2e_4_hausa_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/flipped_2e_4_hausa_pipeline_en_5.4.0_3.0_1718135123982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/flipped_2e_4_hausa_pipeline_en_5.4.0_3.0_1718135123982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("flipped_2e_4_hausa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("flipped_2e_4_hausa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|flipped_2e_4_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/flipped_2e-4_hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_en.md new file mode 100644 index 00000000000000..6838ad98a55e03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_en.md @@ -0,0 +1,66 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_baai_bge_large_english_14000 pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_baai_bge_large_english_14000 +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `frpile_gpl_test_pipeline_baai_bge_large_english_14000` is an English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_14000_en_5.4.0_3.0_1718070029524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_14000_en_5.4.0_3.0_1718070029524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_14000", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_14000", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_baai_bge_large_english_14000| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_BAAI-bge-large-en_14000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline_en.md new file mode 100644 index 00000000000000..db4921b31a6ec8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline` is an English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline_en_5.4.0_3.0_1718070131741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline_en_5.4.0_3.0_1718070131741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_BAAI-bge-large-en_14000 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_en.md new file mode 100644 index 00000000000000..9f460e5389976c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_en.md @@ -0,0 +1,66 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_baai_bge_large_english_1400 pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_baai_bge_large_english_1400 +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `frpile_gpl_test_pipeline_baai_bge_large_english_1400` is an English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_1400_en_5.4.0_3.0_1718070029081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_1400_en_5.4.0_3.0_1718070029081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_1400", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_1400", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_baai_bge_large_english_1400| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_BAAI-bge-large-en_1400 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline_en.md new file mode 100644 index 00000000000000..8c4da8abd50941 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline` is an English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline_en_5.4.0_3.0_1718070130971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline_en_5.4.0_3.0_1718070130971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_BAAI-bge-large-en_1400 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_en.md new file mode 100644 index 00000000000000..a98067398edf6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_en.md @@ -0,0 +1,66 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_bge_large_english_1400 pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_bge_large_english_1400 +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `frpile_gpl_test_pipeline_bge_large_english_1400` is an English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_bge_large_english_1400_en_5.4.0_3.0_1718064838241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_bge_large_english_1400_en_5.4.0_3.0_1718064838241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_bge_large_english_1400", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_bge_large_english_1400", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_bge_large_english_1400| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_bge-large-en_1400 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_pipeline_en.md new file mode 100644 index 00000000000000..181a06f1c99df7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_bge_large_english_1400_pipeline pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_bge_large_english_1400_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `frpile_gpl_test_pipeline_bge_large_english_1400_pipeline` is an English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_bge_large_english_1400_pipeline_en_5.4.0_3.0_1718064864537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_bge_large_english_1400_pipeline_en_5.4.0_3.0_1718064864537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_bge_large_english_1400_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_bge_large_english_1400_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_bge_large_english_1400_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_bge-large-en_1400 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_gl.md b/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_gl.md new file mode 100644 index 00000000000000..13c83ee82a7c63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_gl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Galician gal_ensp_xlm_r XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: gal_ensp_xlm_r +date: 2024-06-11 +tags: [gl, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: gl +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gal_ensp_xlm_r` is a Galician model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ensp_xlm_r_gl_5.4.0_3.0_1718137952557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ensp_xlm_r_gl_5.4.0_3.0_1718137952557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ensp_xlm_r","gl") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ensp_xlm_r", "gl") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ensp_xlm_r| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|gl| +|Size:|875.6 MB| + +## References + +https://huggingface.co/mbruton/gal_ensp_XLM-R \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_pipeline_gl.md new file mode 100644 index 00000000000000..71b08a0753314d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_pipeline_gl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Galician gal_ensp_xlm_r_pipeline pipeline XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: gal_ensp_xlm_r_pipeline +date: 2024-06-11 +tags: [gl, open_source, pipeline, onnx] +task: Named Entity Recognition +language: gl +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gal_ensp_xlm_r_pipeline` is a Galician model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ensp_xlm_r_pipeline_gl_5.4.0_3.0_1718138026562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ensp_xlm_r_pipeline_gl_5.4.0_3.0_1718138026562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gal_ensp_xlm_r_pipeline", lang = "gl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gal_ensp_xlm_r_pipeline", lang = "gl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ensp_xlm_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|875.6 MB| + +## References + +https://huggingface.co/mbruton/gal_ensp_XLM-R + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_en.md b/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_en.md new file mode 100644 index 00000000000000..f3a33b1d5ed94b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gal_sayula_popoluca_iw_catalan_galician XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_sayula_popoluca_iw_catalan_galician +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gal_sayula_popoluca_iw_catalan_galician` is an English model originally trained by homersimpson. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_catalan_galician_en_5.4.0_3.0_1718135375655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_catalan_galician_en_5.4.0_3.0_1718135375655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_sayula_popoluca_iw_catalan_galician","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_sayula_popoluca_iw_catalan_galician", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_sayula_popoluca_iw_catalan_galician| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|424.0 MB| + +## References + +https://huggingface.co/homersimpson/gal-pos-iw-ca-gl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_pipeline_en.md new file mode 100644 index 00000000000000..ce62633ad8bacd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gal_sayula_popoluca_iw_catalan_galician_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_sayula_popoluca_iw_catalan_galician_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gal_sayula_popoluca_iw_catalan_galician_pipeline` is an English model originally trained by homersimpson. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_catalan_galician_pipeline_en_5.4.0_3.0_1718135417434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_catalan_galician_pipeline_en_5.4.0_3.0_1718135417434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gal_sayula_popoluca_iw_catalan_galician_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gal_sayula_popoluca_iw_catalan_galician_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_sayula_popoluca_iw_catalan_galician_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.0 MB| + +## References + +https://huggingface.co/homersimpson/gal-pos-iw-ca-gl + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_gl.md b/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_gl.md new file mode 100644 index 00000000000000..669f074a9a94fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_gl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Galician gal_xlm_r XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: gal_xlm_r +date: 2024-06-11 +tags: [gl, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: gl +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gal_xlm_r` is a Galician model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_xlm_r_gl_5.4.0_3.0_1718129399804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_xlm_r_gl_5.4.0_3.0_1718129399804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_xlm_r","gl") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_xlm_r", "gl") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_xlm_r| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|gl| +|Size:|811.1 MB| + +## References + +https://huggingface.co/mbruton/gal_XLM-R \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_pipeline_gl.md new file mode 100644 index 00000000000000..2d851f15540bdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_pipeline_gl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Galician gal_xlm_r_pipeline pipeline XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: gal_xlm_r_pipeline +date: 2024-06-11 +tags: [gl, open_source, pipeline, onnx] +task: Named Entity Recognition +language: gl +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gal_xlm_r_pipeline` is a Galician model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_xlm_r_pipeline_gl_5.4.0_3.0_1718129547679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_xlm_r_pipeline_gl_5.4.0_3.0_1718129547679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gal_xlm_r_pipeline", lang = "gl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gal_xlm_r_pipeline", lang = "gl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_xlm_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|811.1 MB| + +## References + +https://huggingface.co/mbruton/gal_XLM-R + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_en.md b/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_en.md new file mode 100644 index 00000000000000..5223fe9b18eee2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English innox_roberta_xlm XlmRoBertaForTokenClassification from brao +author: John Snow Labs +name: innox_roberta_xlm +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`innox_roberta_xlm` is an English model originally trained by brao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/innox_roberta_xlm_en_5.4.0_3.0_1718113890181.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/innox_roberta_xlm_en_5.4.0_3.0_1718113890181.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("innox_roberta_xlm","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("innox_roberta_xlm", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|innox_roberta_xlm| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|844.8 MB| + +## References + +https://huggingface.co/brao/innox-roberta-xlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_pipeline_en.md new file mode 100644 index 00000000000000..569c837e768789 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English innox_roberta_xlm_pipeline pipeline XlmRoBertaForTokenClassification from brao +author: John Snow Labs +name: innox_roberta_xlm_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`innox_roberta_xlm_pipeline` is an English model originally trained by brao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/innox_roberta_xlm_pipeline_en_5.4.0_3.0_1718113971358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/innox_roberta_xlm_pipeline_en_5.4.0_3.0_1718113971358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("innox_roberta_xlm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("innox_roberta_xlm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|innox_roberta_xlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|844.9 MB| + +## References + +https://huggingface.co/brao/innox-roberta-xlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_km.md b/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_km.md new file mode 100644 index 00000000000000..c829ec7de4738b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_km.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Central Khmer, Khmer khmer_sayula_popoluca_roberta XlmRoBertaForTokenClassification from seanghay +author: John Snow Labs +name: khmer_sayula_popoluca_roberta +date: 2024-06-11 +tags: [km, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: km +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khmer_sayula_popoluca_roberta` is a Central Khmer, Khmer model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khmer_sayula_popoluca_roberta_km_5.4.0_3.0_1718101584580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khmer_sayula_popoluca_roberta_km_5.4.0_3.0_1718101584580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("khmer_sayula_popoluca_roberta","km") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("khmer_sayula_popoluca_roberta", "km") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khmer_sayula_popoluca_roberta| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|km| +|Size:|834.1 MB| + +## References + +https://huggingface.co/seanghay/khmer-pos-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_pipeline_km.md b/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_pipeline_km.md new file mode 100644 index 00000000000000..43068fb297c137 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_pipeline_km.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Central Khmer, Khmer khmer_sayula_popoluca_roberta_pipeline pipeline XlmRoBertaForTokenClassification from seanghay +author: John Snow Labs +name: khmer_sayula_popoluca_roberta_pipeline +date: 2024-06-11 +tags: [km, open_source, pipeline, onnx] +task: Named Entity Recognition +language: km +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khmer_sayula_popoluca_roberta_pipeline` is a Central Khmer, Khmer model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khmer_sayula_popoluca_roberta_pipeline_km_5.4.0_3.0_1718101675243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khmer_sayula_popoluca_roberta_pipeline_km_5.4.0_3.0_1718101675243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("khmer_sayula_popoluca_roberta_pipeline", lang = "km") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("khmer_sayula_popoluca_roberta_pipeline", lang = "km") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khmer_sayula_popoluca_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|km| +|Size:|834.1 MB| + +## References + +https://huggingface.co/seanghay/khmer-pos-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_en.md b/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_en.md new file mode 100644 index 00000000000000..2fd99bf5101158 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English large_finetuned_frombge BGEEmbeddings from joshus +author: John Snow Labs +name: large_finetuned_frombge +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`large_finetuned_frombge` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/large_finetuned_frombge_en_5.4.0_3.0_1718065476775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/large_finetuned_frombge_en_5.4.0_3.0_1718065476775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("large_finetuned_frombge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("large_finetuned_frombge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|large_finetuned_frombge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/large-finetuned-frombge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_pipeline_en.md new file mode 100644 index 00000000000000..49411c809d1c89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English large_finetuned_frombge_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: large_finetuned_frombge_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`large_finetuned_frombge_pipeline` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/large_finetuned_frombge_pipeline_en_5.4.0_3.0_1718065569338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/large_finetuned_frombge_pipeline_en_5.4.0_3.0_1718065569338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("large_finetuned_frombge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("large_finetuned_frombge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|large_finetuned_frombge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/large-finetuned-frombge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_mn.md b/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_mn.md new file mode 100644 index 00000000000000..cbf49865cad2cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_mn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Mongolian mongolian_davlan_xlm_roberta_base_ner_hrl XlmRoBertaForTokenClassification from Blgn94 +author: John Snow Labs +name: mongolian_davlan_xlm_roberta_base_ner_hrl +date: 2024-06-11 +tags: [mn, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mongolian_davlan_xlm_roberta_base_ner_hrl` is a Mongolian model originally trained by Blgn94. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mongolian_davlan_xlm_roberta_base_ner_hrl_mn_5.4.0_3.0_1718117634412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mongolian_davlan_xlm_roberta_base_ner_hrl_mn_5.4.0_3.0_1718117634412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("mongolian_davlan_xlm_roberta_base_ner_hrl","mn") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("mongolian_davlan_xlm_roberta_base_ner_hrl", "mn") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mongolian_davlan_xlm_roberta_base_ner_hrl| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|mn| +|Size:|911.8 MB| + +## References + +https://huggingface.co/Blgn94/mongolian-davlan-xlm-roberta-base-ner-hrl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline_mn.md new file mode 100644 index 00000000000000..a827050beccd15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline pipeline XlmRoBertaForTokenClassification from Blgn94 +author: John Snow Labs +name: mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline +date: 2024-06-11 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline` is a Mongolian model originally trained by Blgn94. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline_mn_5.4.0_3.0_1718117724257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline_mn_5.4.0_3.0_1718117724257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|911.8 MB| + +## References + +https://huggingface.co/Blgn94/mongolian-davlan-xlm-roberta-base-ner-hrl + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_hi.md b/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_hi.md new file mode 100644 index 00000000000000..d612a61d9ba4d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_hi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hindi monolingual_hindi_ner_model XlmRoBertaForTokenClassification from Sankalp-Bahad +author: John Snow Labs +name: monolingual_hindi_ner_model +date: 2024-06-11 +tags: [hi, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: hi +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`monolingual_hindi_ner_model` is a Hindi model originally trained by Sankalp-Bahad. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/monolingual_hindi_ner_model_hi_5.4.0_3.0_1718097802374.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/monolingual_hindi_ner_model_hi_5.4.0_3.0_1718097802374.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("monolingual_hindi_ner_model","hi") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("monolingual_hindi_ner_model", "hi") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|monolingual_hindi_ner_model| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|hi| +|Size:|777.8 MB| + +## References + +https://huggingface.co/Sankalp-Bahad/Monolingual-Hindi-NER-Model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_pipeline_hi.md new file mode 100644 index 00000000000000..16aa982adfba3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_pipeline_hi.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hindi monolingual_hindi_ner_model_pipeline pipeline XlmRoBertaForTokenClassification from Sankalp-Bahad +author: John Snow Labs +name: monolingual_hindi_ner_model_pipeline +date: 2024-06-11 +tags: [hi, open_source, pipeline, onnx] +task: Named Entity Recognition +language: hi +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`monolingual_hindi_ner_model_pipeline` is a Hindi model originally trained by Sankalp-Bahad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/monolingual_hindi_ner_model_pipeline_hi_5.4.0_3.0_1718097981576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/monolingual_hindi_ner_model_pipeline_hi_5.4.0_3.0_1718097981576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("monolingual_hindi_ner_model_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("monolingual_hindi_ner_model_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|monolingual_hindi_ner_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|777.8 MB| + +## References + +https://huggingface.co/Sankalp-Bahad/Monolingual-Hindi-NER-Model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_en.md new file mode 100644 index 00000000000000..161e51533e8c5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_large_english_v1_5 BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_large_english_v1_5 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mps_invoice_product_baai_bge_large_english_v1_5` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5_en_5.4.0_3.0_1718066567656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5_en_5.4.0_3.0_1718066567656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_large_english_v1_5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_large_english_v1_5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_large_english_v1_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_large_en_v1.5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_pipeline_en.md new file mode 100644 index 00000000000000..6b8fb98cdec4f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_large_english_v1_5_pipeline pipeline BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_large_english_v1_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mps_invoice_product_baai_bge_large_english_v1_5_pipeline` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5_pipeline_en_5.4.0_3.0_1718066659282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5_pipeline_en_5.4.0_3.0_1718066659282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mps_invoice_product_baai_bge_large_english_v1_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mps_invoice_product_baai_bge_large_english_v1_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_large_english_v1_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_large_en_v1.5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_en.md b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_en.md new file mode 100644 index 00000000000000..8eb54a118813ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_small_english_v1_5v2 BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_small_english_v1_5v2 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_small_english_v1_5v2` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5v2_en_5.4.0_3.0_1718068196639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5v2_en_5.4.0_3.0_1718068196639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_small_english_v1_5v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_small_english_v1_5v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_small_english_v1_5v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|110.8 MB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_small_en_v1.5v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline_en.md new file mode 100644 index 00000000000000..64d6cc513ea316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline pipeline BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline_en_5.4.0_3.0_1718068209256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline_en_5.4.0_3.0_1718068209256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|110.8 MB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_small_en_v1.5v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_en.md b/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_en.md new file mode 100644 index 00000000000000..e274706d317b27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_indian_xlm_roberta XlmRoBertaForTokenClassification from Venkatesh4342 +author: John Snow Labs +name: ner_indian_xlm_roberta +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_indian_xlm_roberta` is an English model originally trained by Venkatesh4342. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_indian_xlm_roberta_en_5.4.0_3.0_1718097570966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_indian_xlm_roberta_en_5.4.0_3.0_1718097570966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_indian_xlm_roberta","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_indian_xlm_roberta", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_indian_xlm_roberta| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|882.7 MB| + +## References + +https://huggingface.co/Venkatesh4342/NER-Indian-xlm-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_pipeline_en.md new file mode 100644 index 00000000000000..d008e302c3fc22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_indian_xlm_roberta_pipeline pipeline XlmRoBertaForTokenClassification from Venkatesh4342 +author: John Snow Labs +name: ner_indian_xlm_roberta_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_indian_xlm_roberta_pipeline` is an English model originally trained by Venkatesh4342. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_indian_xlm_roberta_pipeline_en_5.4.0_3.0_1718097649711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_indian_xlm_roberta_pipeline_en_5.4.0_3.0_1718097649711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_indian_xlm_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_indian_xlm_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_indian_xlm_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|882.7 MB| + +## References + +https://huggingface.co/Venkatesh4342/NER-Indian-xlm-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_en.md b/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_en.md new file mode 100644 index 00000000000000..b502b81f00abef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English nicher_embedder_bge BGEEmbeddings from nicher92 +author: John Snow Labs +name: nicher_embedder_bge +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nicher_embedder_bge` is an English model originally trained by nicher92. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nicher_embedder_bge_en_5.4.0_3.0_1718070249595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nicher_embedder_bge_en_5.4.0_3.0_1718070249595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("nicher_embedder_bge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("nicher_embedder_bge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nicher_embedder_bge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.4 GB| + +## References + +https://huggingface.co/nicher92/nicher-embedder-bge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_pipeline_en.md new file mode 100644 index 00000000000000..31d0df35424c94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English nicher_embedder_bge_pipeline pipeline BGEEmbeddings from nicher92 +author: John Snow Labs +name: nicher_embedder_bge_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nicher_embedder_bge_pipeline` is an English model originally trained by nicher92. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nicher_embedder_bge_pipeline_en_5.4.0_3.0_1718070334483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nicher_embedder_bge_pipeline_en_5.4.0_3.0_1718070334483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nicher_embedder_bge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nicher_embedder_bge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nicher_embedder_bge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.4 GB| + +## References + +https://huggingface.co/nicher92/nicher-embedder-bge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_en.md b/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_en.md new file mode 100644 index 00000000000000..8885a768eb247b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English norwegian_delete_5e_5_hausa XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: norwegian_delete_5e_5_hausa +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `norwegian_delete_5e_5_hausa` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_delete_5e_5_hausa_en_5.4.0_3.0_1718134235499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_delete_5e_5_hausa_en_5.4.0_3.0_1718134235499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("norwegian_delete_5e_5_hausa","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("norwegian_delete_5e_5_hausa", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_delete_5e_5_hausa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/no-delete_5e-5_hausa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_pipeline_en.md new file mode 100644 index 00000000000000..1eeab0e59f33ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English norwegian_delete_5e_5_hausa_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: norwegian_delete_5e_5_hausa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `norwegian_delete_5e_5_hausa_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_delete_5e_5_hausa_pipeline_en_5.4.0_3.0_1718134300332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_delete_5e_5_hausa_pipeline_en_5.4.0_3.0_1718134300332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("norwegian_delete_5e_5_hausa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("norwegian_delete_5e_5_hausa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_delete_5e_5_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/no-delete_5e-5_hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_en.md new file mode 100644 index 00000000000000..79fca25a6b9919 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_2et_f5 BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2et_f5 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2et_f5` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f5_en_5.4.0_3.0_1718064224905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f5_en_5.4.0_3.0_1718064224905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_2et_f5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_2et_f5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2et_f5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2et-f5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_pipeline_en.md new file mode 100644 index 00000000000000..6a904d3cab16d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_2et_f5_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2et_f5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2et_f5_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f5_pipeline_en_5.4.0_3.0_1718064302729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f5_pipeline_en_5.4.0_3.0_1718064302729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_2et_f5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_2et_f5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2et_f5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2et-f5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_en.md new file mode 100644 index 00000000000000..c7ecb3852fc74e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_2et_f8 BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2et_f8 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2et_f8` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f8_en_5.4.0_3.0_1718070917192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f8_en_5.4.0_3.0_1718070917192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_2et_f8","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_2et_f8","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2et_f8| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2et-f8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_pipeline_en.md new file mode 100644 index 00000000000000..9f17c2c909b6bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_2et_f8_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2et_f8_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2et_f8_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f8_pipeline_en_5.4.0_3.0_1718070995255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f8_pipeline_en_5.4.0_3.0_1718070995255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_2et_f8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_2et_f8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2et_f8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2et-f8 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f_again_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f_again_pipeline_en.md new file mode 100644 index 00000000000000..95b9a505d36c86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f_again_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_2et_f_again_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2et_f_again_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2et_f_again_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f_again_pipeline_en_5.4.0_3.0_1718068000718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f_again_pipeline_en_5.4.0_3.0_1718068000718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_2et_f_again_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_2et_f_again_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2et_f_again_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2et-f-again + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_en.md new file mode 100644 index 00000000000000..614bb9a833c3c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_6e_10f_fp16 BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e_10f_fp16 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e_10f_fp16` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_10f_fp16_en_5.4.0_3.0_1718064279193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_10f_fp16_en_5.4.0_3.0_1718064279193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_6e_10f_fp16","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_6e_10f_fp16","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e_10f_fp16| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e-10f-fp16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_pipeline_en.md new file mode 100644 index 00000000000000..21982c70f6013f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_6e_10f_fp16_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e_10f_fp16_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e_10f_fp16_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_10f_fp16_pipeline_en_5.4.0_3.0_1718064357206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_10f_fp16_pipeline_en_5.4.0_3.0_1718064357206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_6e_10f_fp16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_6e_10f_fp16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e_10f_fp16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e-10f-fp16 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_en.md new file mode 100644 index 00000000000000..0f5735b2171fdb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_6e_f10 BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e_f10 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e_f10` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_f10_en_5.4.0_3.0_1718069903534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_f10_en_5.4.0_3.0_1718069903534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_6e_f10","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_6e_f10","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e_f10| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e-f10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_pipeline_en.md new file mode 100644 index 00000000000000..9aca0e506bcdf2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_6e_f10_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e_f10_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e_f10_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_f10_pipeline_en_5.4.0_3.0_1718069982119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_f10_pipeline_en_5.4.0_3.0_1718069982119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_6e_f10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_6e_f10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e_f10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e-f10 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_en.md new file mode 100644 index 00000000000000..7a9a63f843ee8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_embed_bge_test BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_embed_bge_test +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_embed_bge_test` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_embed_bge_test_en_5.4.0_3.0_1718070083201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_embed_bge_test_en_5.4.0_3.0_1718070083201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_embed_bge_test","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_embed_bge_test","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_embed_bge_test| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/dbourget/philai-embed-bge-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_pipeline_en.md new file mode 100644 index 00000000000000..4195deefe9d758 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_embed_bge_test_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_embed_bge_test_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_embed_bge_test_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_embed_bge_test_pipeline_en_5.4.0_3.0_1718070162532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_embed_bge_test_pipeline_en_5.4.0_3.0_1718070162532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_embed_bge_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_embed_bge_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_embed_bge_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/dbourget/philai-embed-bge-test + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_en.md new file mode 100644 index 00000000000000..fb82e4d9976556 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_tsdae_6e_bge_ft_5e BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_tsdae_6e_bge_ft_5e +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_tsdae_6e_bge_ft_5e` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_tsdae_6e_bge_ft_5e_en_5.4.0_3.0_1718071269830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_tsdae_6e_bge_ft_5e_en_5.4.0_3.0_1718071269830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_tsdae_6e_bge_ft_5e","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_tsdae_6e_bge_ft_5e","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_tsdae_6e_bge_ft_5e| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-tsdae-6e-bge-ft-5e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_pipeline_en.md new file mode 100644 index 00000000000000..98b943b691213f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_tsdae_6e_bge_ft_5e_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_tsdae_6e_bge_ft_5e_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_tsdae_6e_bge_ft_5e_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_tsdae_6e_bge_ft_5e_pipeline_en_5.4.0_3.0_1718071347273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_tsdae_6e_bge_ft_5e_pipeline_en_5.4.0_3.0_1718071347273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_tsdae_6e_bge_ft_5e_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_tsdae_6e_bge_ft_5e_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_tsdae_6e_bge_ft_5e_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-tsdae-6e-bge-ft-5e + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_en.md b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_en.md new file mode 100644 index 00000000000000..92af3edb17afd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English pmc_bge_1600 BGEEmbeddings from Labib11 +author: John Snow Labs +name: pmc_bge_1600 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pmc_bge_1600` is an English model originally trained by Labib11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmc_bge_1600_en_5.4.0_3.0_1718066360640.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmc_bge_1600_en_5.4.0_3.0_1718066360640.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("pmc_bge_1600","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("pmc_bge_1600","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmc_bge_1600| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Labib11/PMC_bge_1600 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_pipeline_en.md new file mode 100644 index 00000000000000..461e0f694cc5aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English pmc_bge_1600_pipeline pipeline BGEEmbeddings from Labib11 +author: John Snow Labs +name: pmc_bge_1600_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pmc_bge_1600_pipeline` is an English model originally trained by Labib11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmc_bge_1600_pipeline_en_5.4.0_3.0_1718066441204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmc_bge_1600_pipeline_en_5.4.0_3.0_1718066441204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pmc_bge_1600_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pmc_bge_1600_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmc_bge_1600_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Labib11/PMC_bge_1600 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_en.md b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_en.md new file mode 100644 index 00000000000000..d241d711b21112 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English pmc_bge_800 BGEEmbeddings from Labib11 +author: John Snow Labs +name: pmc_bge_800 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pmc_bge_800` is an English model originally trained by Labib11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmc_bge_800_en_5.4.0_3.0_1718067490438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmc_bge_800_en_5.4.0_3.0_1718067490438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("pmc_bge_800","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("pmc_bge_800","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmc_bge_800| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Labib11/PMC_bge_800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_pipeline_en.md new file mode 100644 index 00000000000000..66a2b9f3bc11d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English pmc_bge_800_pipeline pipeline BGEEmbeddings from Labib11 +author: John Snow Labs +name: pmc_bge_800_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pmc_bge_800_pipeline` is an English model originally trained by Labib11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmc_bge_800_pipeline_en_5.4.0_3.0_1718067571477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmc_bge_800_pipeline_en_5.4.0_3.0_1718067571477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pmc_bge_800_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pmc_bge_800_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmc_bge_800_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Labib11/PMC_bge_800 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_en.md b/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_en.md new file mode 100644 index 00000000000000..b77ddfbb9b8ef4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_ner_aimlab XlmRoBertaForTokenClassification from Aimlab +author: John Snow Labs +name: roberta_base_ner_aimlab +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `roberta_base_ner_aimlab` is an English model originally trained by Aimlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_aimlab_en_5.4.0_3.0_1718109052399.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_aimlab_en_5.4.0_3.0_1718109052399.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("roberta_base_ner_aimlab","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("roberta_base_ner_aimlab", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_aimlab| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|661.0 MB| + +## References + +https://huggingface.co/Aimlab/Roberta-Base-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_pipeline_en.md new file mode 100644 index 00000000000000..aed36a6cdf8eea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_ner_aimlab_pipeline pipeline XlmRoBertaForTokenClassification from Aimlab +author: John Snow Labs +name: roberta_base_ner_aimlab_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `roberta_base_ner_aimlab_pipeline` is an English model originally trained by Aimlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_aimlab_pipeline_en_5.4.0_3.0_1718109288508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_aimlab_pipeline_en_5.4.0_3.0_1718109288508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_ner_aimlab_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_ner_aimlab_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_aimlab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|661.0 MB| + +## References + +https://huggingface.co/Aimlab/Roberta-Base-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_es.md b/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_es.md new file mode 100644 index 00000000000000..d922ab33c0951a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish spa_enpt_xlm_r XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: spa_enpt_xlm_r +date: 2024-06-11 +tags: [es, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `spa_enpt_xlm_r` is a Castilian, Spanish model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spa_enpt_xlm_r_es_5.4.0_3.0_1718131430548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spa_enpt_xlm_r_es_5.4.0_3.0_1718131430548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("spa_enpt_xlm_r","es") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("spa_enpt_xlm_r", "es") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spa_enpt_xlm_r| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|877.9 MB| + +## References + +https://huggingface.co/mbruton/spa_enpt_XLM-R \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_pipeline_es.md b/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_pipeline_es.md new file mode 100644 index 00000000000000..80e95e0c3ba61e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish spa_enpt_xlm_r_pipeline pipeline XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: spa_enpt_xlm_r_pipeline +date: 2024-06-11 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `spa_enpt_xlm_r_pipeline` is a Castilian, Spanish model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spa_enpt_xlm_r_pipeline_es_5.4.0_3.0_1718131504737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spa_enpt_xlm_r_pipeline_es_5.4.0_3.0_1718131504737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spa_enpt_xlm_r_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spa_enpt_xlm_r_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spa_enpt_xlm_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|877.9 MB| + +## References + +https://huggingface.co/mbruton/spa_enpt_XLM-R + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-squirtle_en.md b/docs/_posts/ahmedlone127/2024-06-11-squirtle_en.md new file mode 100644 index 00000000000000..f916472a53856a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-squirtle_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English squirtle BGEEmbeddings from Mihaiii +author: John Snow Labs +name: squirtle +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `squirtle` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squirtle_en_5.4.0_3.0_1718068737550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squirtle_en_5.4.0_3.0_1718068737550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("squirtle","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("squirtle","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squirtle| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|56.9 MB| + +## References + +https://huggingface.co/Mihaiii/Squirtle \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-squirtle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-squirtle_pipeline_en.md new file mode 100644 index 00000000000000..d2c34144840e04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-squirtle_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English squirtle_pipeline pipeline BGEEmbeddings from Mihaiii +author: John Snow Labs +name: squirtle_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `squirtle_pipeline` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squirtle_pipeline_en_5.4.0_3.0_1718068741228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squirtle_pipeline_en_5.4.0_3.0_1718068741228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("squirtle_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("squirtle_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squirtle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|56.9 MB| + +## References + +https://huggingface.co/Mihaiii/Squirtle + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_en.md b/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_en.md new file mode 100644 index 00000000000000..7a4f59bf50d1f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ter_class_5e_5_hausa XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: ter_class_5e_5_hausa +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ter_class_5e_5_hausa` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ter_class_5e_5_hausa_en_5.4.0_3.0_1718134138177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ter_class_5e_5_hausa_en_5.4.0_3.0_1718134138177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ter_class_5e_5_hausa","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ter_class_5e_5_hausa", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ter_class_5e_5_hausa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/ter_class_5e-5_hausa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_pipeline_en.md new file mode 100644 index 00000000000000..a01b8cb2c993d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ter_class_5e_5_hausa_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: ter_class_5e_5_hausa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ter_class_5e_5_hausa_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ter_class_5e_5_hausa_pipeline_en_5.4.0_3.0_1718134205150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ter_class_5e_5_hausa_pipeline_en_5.4.0_3.0_1718134205150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ter_class_5e_5_hausa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ter_class_5e_5_hausa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ter_class_5e_5_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/ter_class_5e-5_hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-test25_en.md b/docs/_posts/ahmedlone127/2024-06-11-test25_en.md new file mode 100644 index 00000000000000..c5ba5eeab3785e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-test25_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English test25 BGEEmbeddings from Mihaiii +author: John Snow Labs +name: test25 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test25` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test25_en_5.4.0_3.0_1718066814999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test25_en_5.4.0_3.0_1718066814999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("test25","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("test25","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test25| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|64.2 MB| + +## References + +https://huggingface.co/Mihaiii/test25 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-test25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-test25_pipeline_en.md new file mode 100644 index 00000000000000..002ac821579257 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-test25_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test25_pipeline pipeline BGEEmbeddings from Mihaiii +author: John Snow Labs +name: test25_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test25_pipeline` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test25_pipeline_en_5.4.0_3.0_1718066819304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test25_pipeline_en_5.4.0_3.0_1718066819304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test25_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test25_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|64.2 MB| + +## References + +https://huggingface.co/Mihaiii/test25 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-testbge_en.md b/docs/_posts/ahmedlone127/2024-06-11-testbge_en.md new file mode 100644 index 00000000000000..7b45a4ccb30976 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-testbge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English testbge BGEEmbeddings from Neokun004 +author: John Snow Labs +name: testbge +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `testbge` is an English model originally trained by Neokun004. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testbge_en_5.4.0_3.0_1718068731591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testbge_en_5.4.0_3.0_1718068731591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("testbge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("testbge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testbge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.3 MB| + +## References + +https://huggingface.co/Neokun004/Testbge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-testbge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-testbge_pipeline_en.md new file mode 100644 index 00000000000000..2b0d2e7c410cf7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-testbge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English testbge_pipeline pipeline BGEEmbeddings from Neokun004 +author: John Snow Labs +name: testbge_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `testbge_pipeline` is an English model originally trained by Neokun004. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testbge_pipeline_en_5.4.0_3.0_1718068743671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testbge_pipeline_en_5.4.0_3.0_1718068743671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("testbge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("testbge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testbge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.3 MB| + +## References + +https://huggingface.co/Neokun004/Testbge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_pipeline_zh.md new file mode 100644 index 00000000000000..1b6090b5995b50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_pipeline_zh.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Chinese text2vec_bge_large_chinese_pipeline pipeline BGEEmbeddings from shibing624 +author: John Snow Labs +name: text2vec_bge_large_chinese_pipeline +date: 2024-06-11 +tags: [zh, open_source, pipeline, onnx] +task: Embeddings +language: zh +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `text2vec_bge_large_chinese_pipeline` is a Chinese model originally trained by shibing624. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text2vec_bge_large_chinese_pipeline_zh_5.4.0_3.0_1718064982625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text2vec_bge_large_chinese_pipeline_zh_5.4.0_3.0_1718064982625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text2vec_bge_large_chinese_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text2vec_bge_large_chinese_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text2vec_bge_large_chinese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|1.2 GB| + +## References + +https://huggingface.co/shibing624/text2vec-bge-large-chinese + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_zh.md b/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_zh.md new file mode 100644 index 00000000000000..1401c5576630d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_zh.md @@ -0,0 +1,87 @@ +--- +layout: model +title: Chinese text2vec_bge_large_chinese BGEEmbeddings from shibing624 +author: John Snow Labs +name: text2vec_bge_large_chinese +date: 2024-06-11 +tags: [zh, open_source, onnx, embeddings, bge] +task: Embeddings +language: zh +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `text2vec_bge_large_chinese` is a Chinese model originally trained by shibing624. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text2vec_bge_large_chinese_zh_5.4.0_3.0_1718064908171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text2vec_bge_large_chinese_zh_5.4.0_3.0_1718064908171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("text2vec_bge_large_chinese","zh") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("text2vec_bge_large_chinese","zh") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text2vec_bge_large_chinese| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|zh| +|Size:|1.2 GB| + +## References + +https://huggingface.co/shibing624/text2vec-bge-large-chinese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_en.md b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_en.md new file mode 100644 index 00000000000000..627f1a99ef967c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized XlmRoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized` is an English model originally trained by anonymoussubmissions.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_en_5.4.0_3.0_1718116878754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_en_5.4.0_3.0_1718116878754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|859.4 MB| + +## References + +https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-switchboard-earnings21-normalized \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline_en.md new file mode 100644 index 00000000000000..66721eacebed9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline pipeline XlmRoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline` is an English model originally trained by anonymoussubmissions.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline_en_5.4.0_3.0_1718116969169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline_en_5.4.0_3.0_1718116969169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|859.4 MB| + +## References + +https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-switchboard-earnings21-normalized + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_en.md b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_en.md new file mode 100644 index 00000000000000..7a7e479adbd179 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tner_xlm_roberta_base_ontonotes5_switchboard_normalized XlmRoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: tner_xlm_roberta_base_ontonotes5_switchboard_normalized +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_ontonotes5_switchboard_normalized` is an English model originally trained by anonymoussubmissions.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_normalized_en_5.4.0_3.0_1718132044391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_normalized_en_5.4.0_3.0_1718132044391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_switchboard_normalized","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_switchboard_normalized", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_ontonotes5_switchboard_normalized| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.7 MB| + +## References + +https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-switchboard-normalized \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline_en.md new file mode 100644 index 00000000000000..b4f1e299e74fb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline pipeline XlmRoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline` is an English model originally trained by anonymoussubmissions.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline_en_5.4.0_3.0_1718132124356.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline_en_5.4.0_3.0_1718132124356.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.8 MB| + +## References + +https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-switchboard-normalized + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_en.md b/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_en.md new file mode 100644 index 00000000000000..7dd0fe1ea14ec7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English unfiltered_norwegian_delete_hausa XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: unfiltered_norwegian_delete_hausa +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`unfiltered_norwegian_delete_hausa` is an English model originally trained by grace-pro.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unfiltered_norwegian_delete_hausa_en_5.4.0_3.0_1718137800494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unfiltered_norwegian_delete_hausa_en_5.4.0_3.0_1718137800494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("unfiltered_norwegian_delete_hausa","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("unfiltered_norwegian_delete_hausa", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unfiltered_norwegian_delete_hausa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/unfiltered_no_delete_hausa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_pipeline_en.md new file mode 100644 index 00000000000000..3e581b2104e7d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English unfiltered_norwegian_delete_hausa_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: unfiltered_norwegian_delete_hausa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`unfiltered_norwegian_delete_hausa_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unfiltered_norwegian_delete_hausa_pipeline_en_5.4.0_3.0_1718137873213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unfiltered_norwegian_delete_hausa_pipeline_en_5.4.0_3.0_1718137873213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("unfiltered_norwegian_delete_hausa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("unfiltered_norwegian_delete_hausa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unfiltered_norwegian_delete_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/unfiltered_no_delete_hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_en.md new file mode 100644 index 00000000000000..00419890b231cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_norwegian_i XlmRoBertaForTokenClassification from HyungYoun +author: John Snow Labs +name: xlm_norwegian_i +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_norwegian_i` is an English model originally trained by HyungYoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_norwegian_i_en_5.4.0_3.0_1718125739303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_norwegian_i_en_5.4.0_3.0_1718125739303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_norwegian_i","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_norwegian_i", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_norwegian_i| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|794.3 MB| + +## References + +https://huggingface.co/HyungYoun/xlm-no-I \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_pipeline_en.md new file mode 100644 index 00000000000000..45414c70584b55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_norwegian_i_pipeline pipeline XlmRoBertaForTokenClassification from HyungYoun +author: John Snow Labs +name: xlm_norwegian_i_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_norwegian_i_pipeline` is an English model originally trained by HyungYoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_norwegian_i_pipeline_en_5.4.0_3.0_1718125912980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_norwegian_i_pipeline_en_5.4.0_3.0_1718125912980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_norwegian_i_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_norwegian_i_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_norwegian_i_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.3 MB| + +## References + +https://huggingface.co/HyungYoun/xlm-no-I + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_es.md new file mode 100644 index 00000000000000..7ae38f4c8ffd2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_distemist XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_distemist +date: 2024-06-11 +tags: [es, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_r_galen_distemist` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_distemist_es_5.4.0_3.0_1718123981684.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_distemist_es_5.4.0_3.0_1718123981684.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_distemist","es") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_distemist", "es") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_distemist| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-distemist \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_pipeline_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_pipeline_es.md new file mode 100644 index 00000000000000..538c3a69c4367b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_distemist_pipeline pipeline XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_distemist_pipeline +date: 2024-06-11 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_r_galen_distemist_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_distemist_pipeline_es_5.4.0_3.0_1718124047204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_distemist_pipeline_es_5.4.0_3.0_1718124047204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_r_galen_distemist_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_r_galen_distemist_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_distemist_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-distemist + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_es.md new file mode 100644 index 00000000000000..4046a53a1db1b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_livingner1 XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_livingner1 +date: 2024-06-11 +tags: [es, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_r_galen_livingner1` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner1_es_5.4.0_3.0_1718115800669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner1_es_5.4.0_3.0_1718115800669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_livingner1","es") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_livingner1", "es") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_livingner1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-livingner1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_pipeline_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_pipeline_es.md new file mode 100644 index 00000000000000..9f117e7066b01c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_livingner1_pipeline pipeline XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_livingner1_pipeline +date: 2024-06-11 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_r_galen_livingner1_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner1_pipeline_es_5.4.0_3.0_1718115866772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner1_pipeline_es_5.4.0_3.0_1718115866772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_r_galen_livingner1_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_r_galen_livingner1_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_livingner1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-livingner1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_es.md new file mode 100644 index 00000000000000..7722ed651dbf8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_socialdisner XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_socialdisner +date: 2024-06-11 +tags: [es, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_r_galen_socialdisner` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_socialdisner_es_5.4.0_3.0_1718131094159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_socialdisner_es_5.4.0_3.0_1718131094159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_socialdisner","es") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_socialdisner", "es") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_socialdisner| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-socialdisner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_pipeline_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_pipeline_es.md new file mode 100644 index 00000000000000..e7642db99b0f09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_socialdisner_pipeline pipeline XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_socialdisner_pipeline +date: 2024-06-11 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_r_galen_socialdisner_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_socialdisner_pipeline_es_5.4.0_3.0_1718131160107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_socialdisner_pipeline_es_5.4.0_3.0_1718131160107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_r_galen_socialdisner_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_r_galen_socialdisner_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_socialdisner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-socialdisner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_en.md new file mode 100644 index 00000000000000..5cdb772aecf4c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_char_shopsign XlmRoBertaForTokenClassification from HyungYoun +author: John Snow Labs +name: xlm_roberta_base_char_shopsign +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_char_shopsign` is an English model originally trained by HyungYoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_char_shopsign_en_5.4.0_3.0_1718107970920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_char_shopsign_en_5.4.0_3.0_1718107970920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_char_shopsign","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_char_shopsign", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_char_shopsign| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|791.5 MB| + +## References + +https://huggingface.co/HyungYoun/xlm-roberta-base-char-shopsign \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_pipeline_en.md new file mode 100644 index 00000000000000..3c7de744f5d2b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_char_shopsign_pipeline pipeline XlmRoBertaForTokenClassification from HyungYoun +author: John Snow Labs +name: xlm_roberta_base_char_shopsign_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_char_shopsign_pipeline` is an English model originally trained by HyungYoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_char_shopsign_pipeline_en_5.4.0_3.0_1718108153585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_char_shopsign_pipeline_en_5.4.0_3.0_1718108153585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_char_shopsign_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_char_shopsign_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_char_shopsign_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|791.5 MB| + +## References + +https://huggingface.co/HyungYoun/xlm-roberta-base-char-shopsign + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_en.md new file mode 100644 index 00000000000000..1b1f7ca7222786 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_germeval_14_4_labels XlmRoBertaForTokenClassification from stefanieZ +author: John Snow Labs +name: xlm_roberta_base_finetuned_germeval_14_4_labels +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_germeval_14_4_labels` is an English model originally trained by stefanieZ.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_germeval_14_4_labels_en_5.4.0_3.0_1718098637257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_germeval_14_4_labels_en_5.4.0_3.0_1718098637257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_germeval_14_4_labels","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_germeval_14_4_labels", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_germeval_14_4_labels| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.2 MB| + +## References + +https://huggingface.co/stefanieZ/xlm-roberta-base-finetuned-germeval-14-4-labels \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline_en.md new file mode 100644 index 00000000000000..f5a31d0eeb2a86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline pipeline XlmRoBertaForTokenClassification from stefanieZ +author: John Snow Labs +name: xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline` is an English model originally trained by stefanieZ.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline_en_5.4.0_3.0_1718098725921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline_en_5.4.0_3.0_1718098725921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.2 MB| + +## References + +https://huggingface.co/stefanieZ/xlm-roberta-base-finetuned-germeval-14-4-labels + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_en.md new file mode 100644 index 00000000000000..3844f5afe769db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_aiventurer XlmRoBertaForTokenClassification from AIventurer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_aiventurer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_aiventurer` is an English model originally trained by AIventurer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_aiventurer_en_5.4.0_3.0_1718125283570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_aiventurer_en_5.4.0_3.0_1718125283570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_aiventurer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_aiventurer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_aiventurer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|859.5 MB| + +## References + +https://huggingface.co/AIventurer/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline_en.md new file mode 100644 index 00000000000000..56967708485852 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline pipeline XlmRoBertaForTokenClassification from AIventurer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline` is an English model originally trained by AIventurer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline_en_5.4.0_3.0_1718125381551.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline_en_5.4.0_3.0_1718125381551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|859.5 MB| + +## References + +https://huggingface.co/AIventurer/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_en.md new file mode 100644 index 00000000000000..46274025db88f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_alkampfer XlmRoBertaForTokenClassification from alkampfer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_alkampfer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_alkampfer` is an English model originally trained by alkampfer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_alkampfer_en_5.4.0_3.0_1718116175555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_alkampfer_en_5.4.0_3.0_1718116175555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_alkampfer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_alkampfer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_alkampfer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/alkampfer/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline_en.md new file mode 100644 index 00000000000000..0303e00498c028 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline pipeline XlmRoBertaForTokenClassification from alkampfer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline` is an English model originally trained by alkampfer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline_en_5.4.0_3.0_1718116261198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline_en_5.4.0_3.0_1718116261198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/alkampfer/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_en.md new file mode 100644 index 00000000000000..a6e1c91b6b94fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ankit15nov XlmRoBertaForTokenClassification from Ankit15nov +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ankit15nov +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_ankit15nov` is an English model originally trained by Ankit15nov.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ankit15nov_en_5.4.0_3.0_1718108947475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ankit15nov_en_5.4.0_3.0_1718108947475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ankit15nov","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ankit15nov", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ankit15nov| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Ankit15nov/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline_en.md new file mode 100644 index 00000000000000..f432269e9e6353 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline pipeline XlmRoBertaForTokenClassification from Ankit15nov +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline` is an English model originally trained by Ankit15nov.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline_en_5.4.0_3.0_1718109030207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline_en_5.4.0_3.0_1718109030207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Ankit15nov/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_en.md new file mode 100644 index 00000000000000..86f2fe49d26f0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_chaoli XlmRoBertaForTokenClassification from ChaoLi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_chaoli +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_chaoli` is an English model originally trained by ChaoLi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chaoli_en_5.4.0_3.0_1718098637821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chaoli_en_5.4.0_3.0_1718098637821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_chaoli","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_chaoli", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_chaoli| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ChaoLi/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_pipeline_en.md new file mode 100644 index 00000000000000..52866d837414ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_chaoli_pipeline pipeline XlmRoBertaForTokenClassification from ChaoLi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_chaoli_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_chaoli_pipeline` is an English model originally trained by ChaoLi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chaoli_pipeline_en_5.4.0_3.0_1718098729223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chaoli_pipeline_en_5.4.0_3.0_1718098729223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_chaoli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_chaoli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_chaoli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ChaoLi/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_en.md new file mode 100644 index 00000000000000..64fc34d4e85285 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_chris_choi XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_chris_choi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_chris_choi` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chris_choi_en_5.4.0_3.0_1718118485792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chris_choi_en_5.4.0_3.0_1718118485792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_chris_choi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_chris_choi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_chris_choi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline_en.md new file mode 100644 index 00000000000000..aed01a62dd47e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline pipeline XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline_en_5.4.0_3.0_1718118601294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline_en_5.4.0_3.0_1718118601294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_en.md new file mode 100644 index 00000000000000..556280e1ae0191 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ericklerouge123 XlmRoBertaForTokenClassification from ericklerouge123 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ericklerouge123 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_ericklerouge123` is an English model originally trained by ericklerouge123.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ericklerouge123_en_5.4.0_3.0_1718108261451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ericklerouge123_en_5.4.0_3.0_1718108261451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ericklerouge123","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ericklerouge123", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ericklerouge123| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ericklerouge123/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline_en.md new file mode 100644 index 00000000000000..8b1c2246620583 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline pipeline XlmRoBertaForTokenClassification from ericklerouge123 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline` is an English model originally trained by ericklerouge123.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline_en_5.4.0_3.0_1718108344098.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline_en_5.4.0_3.0_1718108344098.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ericklerouge123/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_en.md new file mode 100644 index 00000000000000..d3f6a85e9ffab2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_handun XlmRoBertaForTokenClassification from Handun +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_handun +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_handun` is an English model originally trained by Handun.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_handun_en_5.4.0_3.0_1718118887243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_handun_en_5.4.0_3.0_1718118887243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_handun","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_handun", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_handun| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Handun/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_pipeline_en.md new file mode 100644 index 00000000000000..b6e3cc15539aa2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_handun_pipeline pipeline XlmRoBertaForTokenClassification from Handun +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_handun_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_handun_pipeline` is an English model originally trained by Handun.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_handun_pipeline_en_5.4.0_3.0_1718118969768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_handun_pipeline_en_5.4.0_3.0_1718118969768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_handun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_handun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_handun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Handun/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_en.md new file mode 100644 index 00000000000000..d02cb71c1daff1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_hhffxx XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_hhffxx +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_hhffxx` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hhffxx_en_5.4.0_3.0_1718110991637.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hhffxx_en_5.4.0_3.0_1718110991637.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_hhffxx","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_hhffxx", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_hhffxx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.6 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline_en.md new file mode 100644 index 00000000000000..a94f99e2f230e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline pipeline XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline_en_5.4.0_3.0_1718111085484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline_en_5.4.0_3.0_1718111085484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.6 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_en.md new file mode 100644 index 00000000000000..036b9187f80a75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jbreunig XlmRoBertaForTokenClassification from jbreunig +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jbreunig +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_jbreunig` is an English model originally trained by jbreunig.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jbreunig_en_5.4.0_3.0_1718107874411.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jbreunig_en_5.4.0_3.0_1718107874411.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_jbreunig","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_jbreunig", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jbreunig| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jbreunig/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline_en.md new file mode 100644 index 00000000000000..5bcdf6b1791a41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline pipeline XlmRoBertaForTokenClassification from jbreunig +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline` is an English model originally trained by jbreunig.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline_en_5.4.0_3.0_1718107964056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline_en_5.4.0_3.0_1718107964056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jbreunig/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_en.md new file mode 100644 index 00000000000000..66f8046465c485 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_kenhoffman XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_kenhoffman +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_kenhoffman` is an English model originally trained by kenhoffman.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_kenhoffman_en_5.4.0_3.0_1718127818537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_kenhoffman_en_5.4.0_3.0_1718127818537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_kenhoffman","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_kenhoffman", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_kenhoffman| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|859.5 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline_en.md new file mode 100644 index 00000000000000..5d804192119f93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline pipeline XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline` is an English model originally trained by kenhoffman.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline_en_5.4.0_3.0_1718127905178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline_en_5.4.0_3.0_1718127905178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|859.5 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_en.md new file mode 100644 index 00000000000000..523b7b3310d18a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_nobody138 XlmRoBertaForTokenClassification from Nobody138 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_nobody138 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_nobody138` is an English model originally trained by Nobody138.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_nobody138_en_5.4.0_3.0_1718113345863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_nobody138_en_5.4.0_3.0_1718113345863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_nobody138","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_nobody138", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_nobody138| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Nobody138/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_pipeline_en.md new file mode 100644 index 00000000000000..b904c04a4b07e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_nobody138_pipeline pipeline XlmRoBertaForTokenClassification from Nobody138 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_nobody138_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_nobody138_pipeline` is an English model originally trained by Nobody138.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_nobody138_pipeline_en_5.4.0_3.0_1718113438712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_nobody138_pipeline_en_5.4.0_3.0_1718113438712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_nobody138_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_nobody138_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_nobody138_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Nobody138/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_en.md new file mode 100644 index 00000000000000..d051161b93902f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_philosucker XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_philosucker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_philosucker` is an English model originally trained by philosucker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_philosucker_en_5.4.0_3.0_1718110030921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_philosucker_en_5.4.0_3.0_1718110030921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_philosucker","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_philosucker", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_philosucker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|862.0 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_pipeline_en.md new file mode 100644 index 00000000000000..bf4cd7e01a1d92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_philosucker_pipeline pipeline XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_philosucker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_philosucker_pipeline` is an English model originally trained by philosucker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_philosucker_pipeline_en_5.4.0_3.0_1718110125093.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_philosucker_pipeline_en_5.4.0_3.0_1718110125093.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_philosucker_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_philosucker_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_philosucker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|862.0 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_en.md new file mode 100644 index 00000000000000..b199be992ae80a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_reaverlee XlmRoBertaForTokenClassification from reaverlee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_reaverlee +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_reaverlee` is an English model originally trained by reaverlee.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_reaverlee_en_5.4.0_3.0_1718103752220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_reaverlee_en_5.4.0_3.0_1718103752220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_reaverlee","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_reaverlee", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_reaverlee| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/reaverlee/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline_en.md new file mode 100644 index 00000000000000..97353356d1528b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline pipeline XlmRoBertaForTokenClassification from reaverlee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline` is an English model originally trained by reaverlee.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline_en_5.4.0_3.0_1718103838430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline_en_5.4.0_3.0_1718103838430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/reaverlee/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_en.md new file mode 100644 index 00000000000000..060ff871a141fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_skr1125 XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_skr1125 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_skr1125` is an English model originally trained by skr1125. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_skr1125_en_5.4.0_3.0_1718108896876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_skr1125_en_5.4.0_3.0_1718108896876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_skr1125","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_skr1125", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_skr1125| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_pipeline_en.md new file mode 100644 index 00000000000000..706b57f1b6db53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_skr1125_pipeline pipeline XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_skr1125_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_skr1125_pipeline` is an English model originally trained by skr1125. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_skr1125_pipeline_en_5.4.0_3.0_1718108978942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_skr1125_pipeline_en_5.4.0_3.0_1718108978942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_skr1125_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_skr1125_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_skr1125_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.1 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_en.md new file mode 100644 index 00000000000000..9720eb3500e060 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_songys XlmRoBertaForTokenClassification from songys +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_songys +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_songys` is an English model originally trained by songys. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_songys_en_5.4.0_3.0_1718119568776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_songys_en_5.4.0_3.0_1718119568776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_songys","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_songys", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_songys| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.8 MB| + +## References + +https://huggingface.co/songys/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_pipeline_en.md new file mode 100644 index 00000000000000..6490785b2de051 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_songys_pipeline pipeline XlmRoBertaForTokenClassification from songys +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_songys_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_songys_pipeline` is an English model originally trained by songys. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_songys_pipeline_en_5.4.0_3.0_1718119676969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_songys_pipeline_en_5.4.0_3.0_1718119676969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_songys_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_songys_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_songys_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.8 MB| + +## References + +https://huggingface.co/songys/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_en.md new file mode 100644 index 00000000000000..0d56c5ea6ac6b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_sponomary XlmRoBertaForTokenClassification from sponomary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_sponomary +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_sponomary` is an English model originally trained by sponomary. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sponomary_en_5.4.0_3.0_1718107876684.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sponomary_en_5.4.0_3.0_1718107876684.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_sponomary","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_sponomary", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_sponomary| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/sponomary/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_pipeline_en.md new file mode 100644 index 00000000000000..407c8a481c0abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_sponomary_pipeline pipeline XlmRoBertaForTokenClassification from sponomary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_sponomary_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_sponomary_pipeline` is an English model originally trained by sponomary. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sponomary_pipeline_en_5.4.0_3.0_1718107965097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sponomary_pipeline_en_5.4.0_3.0_1718107965097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_sponomary_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_sponomary_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_sponomary_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/sponomary/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_en.md new file mode 100644 index 00000000000000..179408e31044ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_takehirako XlmRoBertaForTokenClassification from TakeHirako +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_takehirako +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_takehirako` is an English model originally trained by TakeHirako. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_takehirako_en_5.4.0_3.0_1718113812321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_takehirako_en_5.4.0_3.0_1718113812321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_takehirako","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_takehirako", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_takehirako| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/TakeHirako/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_pipeline_en.md new file mode 100644 index 00000000000000..e79784f7645c21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_takehirako_pipeline pipeline XlmRoBertaForTokenClassification from TakeHirako +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_takehirako_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_takehirako_pipeline` is an English model originally trained by TakeHirako. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_takehirako_pipeline_en_5.4.0_3.0_1718113894990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_takehirako_pipeline_en_5.4.0_3.0_1718113894990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_takehirako_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_takehirako_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_takehirako_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/TakeHirako/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_en.md new file mode 100644 index 00000000000000..43de1129ad50c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_team_nave XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_team_nave +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_team_nave` is an English model originally trained by team-nave. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_team_nave_en_5.4.0_3.0_1718120120396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_team_nave_en_5.4.0_3.0_1718120120396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_team_nave","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_team_nave", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_team_nave| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|857.0 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_pipeline_en.md new file mode 100644 index 00000000000000..97d772bf409807 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_team_nave_pipeline pipeline XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_team_nave_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_team_nave_pipeline` is an English model originally trained by team-nave. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_team_nave_pipeline_en_5.4.0_3.0_1718120206717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_team_nave_pipeline_en_5.4.0_3.0_1718120206717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_team_nave_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_team_nave_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_team_nave_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|857.0 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_en.md new file mode 100644 index 00000000000000..2a4024f3b62295 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_yasu320001 XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_yasu320001 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_yasu320001` is an English model originally trained by yasu320001. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_yasu320001_en_5.4.0_3.0_1718110226831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_yasu320001_en_5.4.0_3.0_1718110226831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_yasu320001","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_yasu320001", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_yasu320001| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline_en.md new file mode 100644 index 00000000000000..db39c0d29d11da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline pipeline XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline` is an English model originally trained by yasu320001.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline_en_5.4.0_3.0_1718110311879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline_en_5.4.0_3.0_1718110311879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_en.md new file mode 100644 index 00000000000000..016e17113ba51a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_yazannasser XlmRoBertaForTokenClassification from Yazannasser +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_yazannasser +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_arabic_yazannasser` is an English model originally trained by Yazannasser.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_yazannasser_en_5.4.0_3.0_1718117424894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_yazannasser_en_5.4.0_3.0_1718117424894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_yazannasser","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_yazannasser", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_yazannasser| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.6 MB| + +## References + +https://huggingface.co/Yazannasser/xlm-roberta-base-finetuned-panx-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline_en.md new file mode 100644 index 00000000000000..0938047b94ab51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline pipeline XlmRoBertaForTokenClassification from Yazannasser +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline` is an English model originally trained by Yazannasser.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline_en_5.4.0_3.0_1718117513792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline_en_5.4.0_3.0_1718117513792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.7 MB| + +## References + +https://huggingface.co/Yazannasser/xlm-roberta-base-finetuned-panx-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_en.md new file mode 100644 index 00000000000000..28e368ceee2b51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_zaid_33 XlmRoBertaForTokenClassification from zaid-33 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_zaid_33 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_arabic_zaid_33` is an English model originally trained by zaid-33.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaid_33_en_5.4.0_3.0_1718123006984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaid_33_en_5.4.0_3.0_1718123006984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_zaid_33","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_zaid_33", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_zaid_33| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.6 MB| + +## References + +https://huggingface.co/zaid-33/xlm-roberta-base-finetuned-panx-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline_en.md new file mode 100644 index 00000000000000..96d6651f75f6f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline pipeline XlmRoBertaForTokenClassification from zaid-33 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline` is an English model originally trained by zaid-33.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline_en_5.4.0_3.0_1718123110351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline_en_5.4.0_3.0_1718123110351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.7 MB| + +## References + +https://huggingface.co/zaid-33/xlm-roberta-base-finetuned-panx-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_en.md new file mode 100644 index 00000000000000..4b88be830b1e08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ajit_transformer XlmRoBertaForTokenClassification from ajit-transformer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ajit_transformer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ajit_transformer` is an English model originally trained by ajit-transformer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ajit_transformer_en_5.4.0_3.0_1718133051088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ajit_transformer_en_5.4.0_3.0_1718133051088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ajit_transformer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ajit_transformer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ajit_transformer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.0 MB| + +## References + +https://huggingface.co/ajit-transformer/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline_en.md new file mode 100644 index 00000000000000..9bbfce58883ca1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline pipeline XlmRoBertaForTokenClassification from ajit-transformer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline` is an English model originally trained by ajit-transformer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline_en_5.4.0_3.0_1718133163195.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline_en_5.4.0_3.0_1718133163195.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.0 MB| + +## References + +https://huggingface.co/ajit-transformer/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_en.md new file mode 100644 index 00000000000000..c2c2e6544c8261 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ashrielbrian XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ashrielbrian +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ashrielbrian` is an English model originally trained by ashrielbrian.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ashrielbrian_en_5.4.0_3.0_1718099679172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ashrielbrian_en_5.4.0_3.0_1718099679172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ashrielbrian","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ashrielbrian", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ashrielbrian| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline_en.md new file mode 100644 index 00000000000000..9d6b888dd680b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline pipeline XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline` is an English model originally trained by ashrielbrian.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline_en_5.4.0_3.0_1718099797797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline_en_5.4.0_3.0_1718099797797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_en.md new file mode 100644 index 00000000000000..f8b4984c68295a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_chris_choi XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_chris_choi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_chris_choi` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chris_choi_en_5.4.0_3.0_1718123063495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chris_choi_en_5.4.0_3.0_1718123063495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_chris_choi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_chris_choi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_chris_choi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline_en.md new file mode 100644 index 00000000000000..8104da9bd18a6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline pipeline XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline_en_5.4.0_3.0_1718123201197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline_en_5.4.0_3.0_1718123201197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_en.md new file mode 100644 index 00000000000000..3fc7e2610a285a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_cj_mills XlmRoBertaForTokenClassification from cj-mills +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_cj_mills +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_cj_mills` is an English model originally trained by cj-mills.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cj_mills_en_5.4.0_3.0_1718105119471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cj_mills_en_5.4.0_3.0_1718105119471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_cj_mills","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_cj_mills", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_cj_mills| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|823.0 MB| + +## References + +https://huggingface.co/cj-mills/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline_en.md new file mode 100644 index 00000000000000..4ed018d527bc1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline pipeline XlmRoBertaForTokenClassification from cj-mills +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline` is an English model originally trained by cj-mills.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline_en_5.4.0_3.0_1718105233563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline_en_5.4.0_3.0_1718105233563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|823.0 MB| + +## References + +https://huggingface.co/cj-mills/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_en.md new file mode 100644 index 00000000000000..85dfd8735fcb72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_cogitur XlmRoBertaForTokenClassification from cogitur +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_cogitur +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_cogitur` is an English model originally trained by cogitur.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cogitur_en_5.4.0_3.0_1718113324050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cogitur_en_5.4.0_3.0_1718113324050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_cogitur","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_cogitur", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_cogitur| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/cogitur/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_pipeline_en.md new file mode 100644 index 00000000000000..4f8a49db01a021 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_cogitur_pipeline pipeline XlmRoBertaForTokenClassification from cogitur +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_cogitur_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_cogitur_pipeline` is an English model originally trained by cogitur.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cogitur_pipeline_en_5.4.0_3.0_1718113459211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cogitur_pipeline_en_5.4.0_3.0_1718113459211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_cogitur_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_cogitur_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_cogitur_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/cogitur/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_en.md new file mode 100644 index 00000000000000..5cdd165415d655 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_flood XlmRoBertaForTokenClassification from flood +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_flood +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_flood` is an English model originally trained by flood.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_flood_en_5.4.0_3.0_1718102977423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_flood_en_5.4.0_3.0_1718102977423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_flood","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_flood", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_flood| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/flood/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_pipeline_en.md new file mode 100644 index 00000000000000..eb008c457d44ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_flood_pipeline pipeline XlmRoBertaForTokenClassification from flood +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_flood_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_flood_pipeline` is an English model originally trained by flood.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_flood_pipeline_en_5.4.0_3.0_1718103096292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_flood_pipeline_en_5.4.0_3.0_1718103096292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_flood_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_flood_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_flood_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/flood/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_en.md new file mode 100644 index 00000000000000..0fa4c60c3aa7dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_gogd XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_gogd +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_gogd` is an English model originally trained by GoGD.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_gogd_en_5.4.0_3.0_1718118864659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_gogd_en_5.4.0_3.0_1718118864659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_gogd","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_gogd", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_gogd| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_pipeline_en.md new file mode 100644 index 00000000000000..5e2458264543aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_gogd_pipeline pipeline XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_gogd_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_gogd_pipeline` is an English model originally trained by GoGD.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_gogd_pipeline_en_5.4.0_3.0_1718118983548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_gogd_pipeline_en_5.4.0_3.0_1718118983548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_gogd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_gogd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_gogd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_en.md new file mode 100644 index 00000000000000..27951b2a3b0e32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_guruji108 XlmRoBertaForTokenClassification from Guruji108 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_guruji108 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_guruji108` is an English model originally trained by Guruji108.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_guruji108_en_5.4.0_3.0_1718104851201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_guruji108_en_5.4.0_3.0_1718104851201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_guruji108","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_guruji108", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_guruji108| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_pipeline_en.md new file mode 100644 index 00000000000000..3a5c3e51e4bc6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_guruji108_pipeline pipeline XlmRoBertaForTokenClassification from Guruji108 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_guruji108_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_guruji108_pipeline` is an English model originally trained by Guruji108.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_guruji108_pipeline_en_5.4.0_3.0_1718104971417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_guruji108_pipeline_en_5.4.0_3.0_1718104971417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_guruji108_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_guruji108_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_guruji108_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_en.md new file mode 100644 index 00000000000000..e417733088768a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_inniok XlmRoBertaForTokenClassification from inniok +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_inniok +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_inniok` is an English model originally trained by inniok.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_inniok_en_5.4.0_3.0_1718127165681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_inniok_en_5.4.0_3.0_1718127165681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_inniok","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_inniok", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_inniok| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/inniok/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_pipeline_en.md new file mode 100644 index 00000000000000..7ca89ec08af375 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_inniok_pipeline pipeline XlmRoBertaForTokenClassification from inniok +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_inniok_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_inniok_pipeline` is an English model originally trained by inniok.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_inniok_pipeline_en_5.4.0_3.0_1718127298719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_inniok_pipeline_en_5.4.0_3.0_1718127298719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_inniok_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_inniok_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_inniok_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/inniok/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_en.md new file mode 100644 index 00000000000000..4056d3e1efd2bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_jjglilleberg XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_jjglilleberg +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_jjglilleberg` is an English model originally trained by jjglilleberg.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jjglilleberg_en_5.4.0_3.0_1718138915017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jjglilleberg_en_5.4.0_3.0_1718138915017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_jjglilleberg","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_jjglilleberg", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_jjglilleberg| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline_en.md new file mode 100644 index 00000000000000..36fb4991bb5740 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline pipeline XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline` is an English model originally trained by jjglilleberg.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline_en_5.4.0_3.0_1718139048211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline_en_5.4.0_3.0_1718139048211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_en.md new file mode 100644 index 00000000000000..6dfadda6624c87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_kiechu XlmRoBertaForTokenClassification from kiechu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_kiechu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_kiechu` is an English model originally trained by kiechu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_kiechu_en_5.4.0_3.0_1718127176349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_kiechu_en_5.4.0_3.0_1718127176349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_kiechu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_kiechu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_kiechu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/kiechu/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_pipeline_en.md new file mode 100644 index 00000000000000..e94a21df3c2132 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_kiechu_pipeline pipeline XlmRoBertaForTokenClassification from kiechu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_kiechu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_kiechu_pipeline` is an English model originally trained by kiechu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_kiechu_pipeline_en_5.4.0_3.0_1718127311992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_kiechu_pipeline_en_5.4.0_3.0_1718127311992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_kiechu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_kiechu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_kiechu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/kiechu/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_en.md new file mode 100644 index 00000000000000..5b444caf0e33da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_laurentiustancioiu XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_laurentiustancioiu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_laurentiustancioiu` is an English model originally trained by LaurentiuStancioiu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_en_5.4.0_3.0_1718111459033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_en_5.4.0_3.0_1718111459033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_laurentiustancioiu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_laurentiustancioiu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_laurentiustancioiu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline_en.md new file mode 100644 index 00000000000000..bddcdeb979b6f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline pipeline XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline` is an English model originally trained by LaurentiuStancioiu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718111585112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718111585112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_en.md new file mode 100644 index 00000000000000..793438bc0d02a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_nobody138 XlmRoBertaForTokenClassification from Nobody138 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_nobody138 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_nobody138` is an English model originally trained by Nobody138.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_nobody138_en_5.4.0_3.0_1718102724242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_nobody138_en_5.4.0_3.0_1718102724242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_nobody138","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_nobody138", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_nobody138| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/Nobody138/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_pipeline_en.md new file mode 100644 index 00000000000000..0e639f379aa7f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_nobody138_pipeline pipeline XlmRoBertaForTokenClassification from Nobody138 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_nobody138_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_nobody138_pipeline` is an English model originally trained by Nobody138.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_nobody138_pipeline_en_5.4.0_3.0_1718102844591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_nobody138_pipeline_en_5.4.0_3.0_1718102844591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_nobody138_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_nobody138_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_nobody138_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/Nobody138/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_en.md new file mode 100644 index 00000000000000..1e0a9c8ebae827 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_obong XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_obong +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_obong` is an English model originally trained by obong.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_obong_en_5.4.0_3.0_1718115707496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_obong_en_5.4.0_3.0_1718115707496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_obong","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_obong", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_obong| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_pipeline_en.md new file mode 100644 index 00000000000000..50ffcdcd801898 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_obong_pipeline pipeline XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_obong_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_obong_pipeline` is an English model originally trained by obong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_obong_pipeline_en_5.4.0_3.0_1718115833149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_obong_pipeline_en_5.4.0_3.0_1718115833149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_obong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_obong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_obong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_en.md new file mode 100644 index 00000000000000..ab471ff94f304d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_reinoudbosch XlmRoBertaForTokenClassification from reinoudbosch +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_reinoudbosch +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_reinoudbosch` is an English model originally trained by reinoudbosch. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reinoudbosch_en_5.4.0_3.0_1718104038490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reinoudbosch_en_5.4.0_3.0_1718104038490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_reinoudbosch","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_reinoudbosch", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_reinoudbosch| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/reinoudbosch/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline_en.md new file mode 100644 index 00000000000000..2ae926cf361a96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline pipeline XlmRoBertaForTokenClassification from reinoudbosch +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline` is an English model originally trained by reinoudbosch. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline_en_5.4.0_3.0_1718104157046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline_en_5.4.0_3.0_1718104157046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/reinoudbosch/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_en.md new file mode 100644 index 00000000000000..61939dd6b5ac77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ridealist XlmRoBertaForTokenClassification from Ridealist +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ridealist +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ridealist` is an English model originally trained by Ridealist. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ridealist_en_5.4.0_3.0_1718106344672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ridealist_en_5.4.0_3.0_1718106344672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ridealist","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ridealist", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ridealist| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/Ridealist/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_pipeline_en.md new file mode 100644 index 00000000000000..58077eaf0396f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ridealist_pipeline pipeline XlmRoBertaForTokenClassification from Ridealist +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ridealist_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ridealist_pipeline` is an English model originally trained by Ridealist. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ridealist_pipeline_en_5.4.0_3.0_1718106483814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ridealist_pipeline_en_5.4.0_3.0_1718106483814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ridealist_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ridealist_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ridealist_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/Ridealist/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_en.md new file mode 100644 index 00000000000000..24da6cbe1f067b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ryatora XlmRoBertaForTokenClassification from ryatora +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ryatora +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ryatora` is an English model originally trained by ryatora. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ryatora_en_5.4.0_3.0_1718137859539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ryatora_en_5.4.0_3.0_1718137859539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ryatora","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ryatora", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ryatora| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ryatora/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_pipeline_en.md new file mode 100644 index 00000000000000..f8c19a85c74abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ryatora_pipeline pipeline XlmRoBertaForTokenClassification from ryatora +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ryatora_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ryatora_pipeline` is an English model originally trained by ryatora. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ryatora_pipeline_en_5.4.0_3.0_1718137979348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ryatora_pipeline_en_5.4.0_3.0_1718137979348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ryatora_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ryatora_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ryatora_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ryatora/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_en.md new file mode 100644 index 00000000000000..078ca8405a7ec0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_shinta0615 XlmRoBertaForTokenClassification from shinta0615 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_shinta0615 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_shinta0615` is an English model originally trained by shinta0615. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_shinta0615_en_5.4.0_3.0_1718127813006.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_shinta0615_en_5.4.0_3.0_1718127813006.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_shinta0615","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_shinta0615", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_shinta0615| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/shinta0615/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline_en.md new file mode 100644 index 00000000000000..749edf37e0dd0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline pipeline XlmRoBertaForTokenClassification from shinta0615 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline` is an English model originally trained by shinta0615. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline_en_5.4.0_3.0_1718127934494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline_en_5.4.0_3.0_1718127934494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/shinta0615/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_en.md new file mode 100644 index 00000000000000..6fe4384d8f0ee9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_skr3178 XlmRoBertaForTokenClassification from skr3178 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_skr3178 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_skr3178` is an English model originally trained by skr3178. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr3178_en_5.4.0_3.0_1718098669326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr3178_en_5.4.0_3.0_1718098669326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_skr3178","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_skr3178", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_skr3178| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/skr3178/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_pipeline_en.md new file mode 100644 index 00000000000000..9b71dfbb185b4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_skr3178_pipeline pipeline XlmRoBertaForTokenClassification from skr3178 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_skr3178_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_skr3178_pipeline` is an English model originally trained by skr3178. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr3178_pipeline_en_5.4.0_3.0_1718098786935.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr3178_pipeline_en_5.4.0_3.0_1718098786935.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_skr3178_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_skr3178_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_skr3178_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/skr3178/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_en.md new file mode 100644 index 00000000000000..becb2a842cdb46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_smallsuper XlmRoBertaForTokenClassification from smallsuper +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_smallsuper +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_smallsuper` is an English model originally trained by smallsuper. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_smallsuper_en_5.4.0_3.0_1718113334837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_smallsuper_en_5.4.0_3.0_1718113334837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_smallsuper","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_smallsuper", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_smallsuper| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/smallsuper/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline_en.md new file mode 100644 index 00000000000000..91387aa8274d72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline pipeline XlmRoBertaForTokenClassification from smallsuper +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline` is an English model originally trained by smallsuper. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline_en_5.4.0_3.0_1718113475489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline_en_5.4.0_3.0_1718113475489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/smallsuper/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_en.md new file mode 100644 index 00000000000000..0f3e0c96dddb91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_songys XlmRoBertaForTokenClassification from songys +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_songys +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_songys` is an English model originally trained by songys. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_songys_en_5.4.0_3.0_1718135267211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_songys_en_5.4.0_3.0_1718135267211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_songys","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_songys", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_songys| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|824.2 MB| + +## References + +https://huggingface.co/songys/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_pipeline_en.md new file mode 100644 index 00000000000000..fe6d85e35f633d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_songys_pipeline pipeline XlmRoBertaForTokenClassification from songys +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_songys_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_songys_pipeline` is an English model originally trained by songys. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_songys_pipeline_en_5.4.0_3.0_1718135383697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_songys_pipeline_en_5.4.0_3.0_1718135383697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_songys_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_songys_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_songys_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|824.2 MB| + +## References + +https://huggingface.co/songys/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_en.md new file mode 100644 index 00000000000000..6fd193649df9eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sungkwangjoong XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sungkwangjoong +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_sungkwangjoong` is an English model originally trained by sungkwangjoong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungkwangjoong_en_5.4.0_3.0_1718120799741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungkwangjoong_en_5.4.0_3.0_1718120799741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sungkwangjoong","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sungkwangjoong", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sungkwangjoong| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline_en.md new file mode 100644 index 00000000000000..ddaee7957d071e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline pipeline XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline` is an English model originally trained by sungkwangjoong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline_en_5.4.0_3.0_1718120933255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline_en_5.4.0_3.0_1718120933255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_en.md new file mode 100644 index 00000000000000..e26dcebfd009d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sungwoo1 XlmRoBertaForTokenClassification from sungwoo1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sungwoo1 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_sungwoo1` is an English model originally trained by sungwoo1. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungwoo1_en_5.4.0_3.0_1718112155652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungwoo1_en_5.4.0_3.0_1718112155652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sungwoo1","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sungwoo1", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sungwoo1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/sungwoo1/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline_en.md new file mode 100644 index 00000000000000..736b24ce435dc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline pipeline XlmRoBertaForTokenClassification from sungwoo1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline` is an English model originally trained by sungwoo1. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline_en_5.4.0_3.0_1718112278867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline_en_5.4.0_3.0_1718112278867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/sungwoo1/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_en.md new file mode 100644 index 00000000000000..02fa018ee8fc35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_tyayoi XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_tyayoi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_tyayoi` is an English model originally trained by tyayoi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_tyayoi_en_5.4.0_3.0_1718109938193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_tyayoi_en_5.4.0_3.0_1718109938193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_tyayoi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_tyayoi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_tyayoi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline_en.md new file mode 100644 index 00000000000000..77bba7e682c512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline pipeline XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline` is an English model originally trained by tyayoi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline_en_5.4.0_3.0_1718110062314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline_en_5.4.0_3.0_1718110062314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_en.md new file mode 100644 index 00000000000000..bca3859ea94d49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_yasu320001 XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_yasu320001 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_yasu320001` is an English model originally trained by yasu320001. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_yasu320001_en_5.4.0_3.0_1718107907522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_yasu320001_en_5.4.0_3.0_1718107907522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_yasu320001","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_yasu320001", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_yasu320001| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline_en.md new file mode 100644 index 00000000000000..a72c5f5fc3f2b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline pipeline XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline` is an English model originally trained by yasu320001.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline_en_5.4.0_3.0_1718108035463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline_en_5.4.0_3.0_1718108035463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_en.md new file mode 100644 index 00000000000000..8f495c25c6513d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ysige XlmRoBertaForTokenClassification from ysige +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ysige +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ysige` is an English model originally trained by ysige.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ysige_en_5.4.0_3.0_1718124064401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ysige_en_5.4.0_3.0_1718124064401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ysige","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ysige", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ysige| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ysige/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_pipeline_en.md new file mode 100644 index 00000000000000..82473ee42357dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ysige_pipeline pipeline XlmRoBertaForTokenClassification from ysige +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ysige_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ysige_pipeline` is an English model originally trained by ysige.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ysige_pipeline_en_5.4.0_3.0_1718124192684.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ysige_pipeline_en_5.4.0_3.0_1718124192684.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ysige_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ysige_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ysige_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ysige/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_en.md new file mode 100644 index 00000000000000..7c9fad75305f92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_aiventurer XlmRoBertaForTokenClassification from AIventurer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_aiventurer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_aiventurer` is an English model originally trained by AIventurer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiventurer_en_5.4.0_3.0_1718125781327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiventurer_en_5.4.0_3.0_1718125781327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_aiventurer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_aiventurer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_aiventurer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/AIventurer/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline_en.md new file mode 100644 index 00000000000000..c929f7e20152fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline pipeline XlmRoBertaForTokenClassification from AIventurer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline` is an English model originally trained by AIventurer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline_en_5.4.0_3.0_1718125887605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline_en_5.4.0_3.0_1718125887605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/AIventurer/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_en.md new file mode 100644 index 00000000000000..c1ca90c5c21fd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cataluna84 XlmRoBertaForTokenClassification from cataluna84 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cataluna84 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_cataluna84` is an English model originally trained by cataluna84.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cataluna84_en_5.4.0_3.0_1718104824875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cataluna84_en_5.4.0_3.0_1718104824875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cataluna84","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cataluna84", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cataluna84| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/cataluna84/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline_en.md new file mode 100644 index 00000000000000..3e6e446099294e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline pipeline XlmRoBertaForTokenClassification from cataluna84 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline` is an English model originally trained by cataluna84.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline_en_5.4.0_3.0_1718104925385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline_en_5.4.0_3.0_1718104925385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/cataluna84/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_en.md new file mode 100644 index 00000000000000..29240ba54f0d04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_chris_choi XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_chris_choi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_chris_choi` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_chris_choi_en_5.4.0_3.0_1718120778592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_chris_choi_en_5.4.0_3.0_1718120778592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_chris_choi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_chris_choi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_chris_choi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline_en.md new file mode 100644 index 00000000000000..0ee4e7f8873470 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline pipeline XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline_en_5.4.0_3.0_1718120894021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline_en_5.4.0_3.0_1718120894021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_en.md new file mode 100644 index 00000000000000..309ed399919038 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cyycyy XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cyycyy +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_cyycyy` is an English model originally trained by cyycyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyycyy_en_5.4.0_3.0_1718114724641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyycyy_en_5.4.0_3.0_1718114724641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cyycyy","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cyycyy", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cyycyy| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline_en.md new file mode 100644 index 00000000000000..c98a7bd504e0a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline pipeline XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline` is an English model originally trained by cyycyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline_en_5.4.0_3.0_1718114840341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline_en_5.4.0_3.0_1718114840341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_en.md new file mode 100644 index 00000000000000..e0505054f678fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_edwardjross XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_edwardjross +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_edwardjross` is an English model originally trained by edwardjross.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_edwardjross_en_5.4.0_3.0_1718105828143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_edwardjross_en_5.4.0_3.0_1718105828143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_edwardjross","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_edwardjross", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_edwardjross| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|842.5 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline_en.md new file mode 100644 index 00000000000000..c904449cba8826 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline pipeline XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline` is an English model originally trained by edwardjross.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline_en_5.4.0_3.0_1718105928222.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline_en_5.4.0_3.0_1718105928222.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|842.6 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_en.md new file mode 100644 index 00000000000000..8142977278b8de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_fraisier XlmRoBertaForTokenClassification from Fraisier +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_fraisier +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_fraisier` is an English model originally trained by Fraisier.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_fraisier_en_5.4.0_3.0_1718133097044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_fraisier_en_5.4.0_3.0_1718133097044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_fraisier","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_fraisier", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_fraisier| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Fraisier/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_pipeline_en.md new file mode 100644 index 00000000000000..2df0d9e40cc8be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_fraisier_pipeline pipeline XlmRoBertaForTokenClassification from Fraisier +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_fraisier_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_fraisier_pipeline` is an English model originally trained by Fraisier.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_fraisier_pipeline_en_5.4.0_3.0_1718133205314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_fraisier_pipeline_en_5.4.0_3.0_1718133205314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_fraisier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_fraisier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_fraisier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Fraisier/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_en.md new file mode 100644 index 00000000000000..9be26a2b0bc6c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_gogd XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_gogd +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_gogd` is an English model originally trained by GoGD.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_gogd_en_5.4.0_3.0_1718118536564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_gogd_en_5.4.0_3.0_1718118536564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_gogd","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_gogd", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_gogd| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en.md new file mode 100644 index 00000000000000..32c63cb92ddd14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_gogd_pipeline pipeline XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_gogd_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_gogd_pipeline` is an English model originally trained by GoGD.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en_5.4.0_3.0_1718118642815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en_5.4.0_3.0_1718118642815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_gogd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_gogd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_gogd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_en.md new file mode 100644 index 00000000000000..bceb826ed8308f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_guroruseru XlmRoBertaForTokenClassification from Guroruseru +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_guroruseru +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_guroruseru` is an English model originally trained by Guroruseru.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guroruseru_en_5.4.0_3.0_1718106997773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guroruseru_en_5.4.0_3.0_1718106997773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_guroruseru","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_guroruseru", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_guroruseru| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Guroruseru/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline_en.md new file mode 100644 index 00000000000000..a4668a3a63a69b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline pipeline XlmRoBertaForTokenClassification from Guroruseru +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline` is an English model originally trained by Guroruseru.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline_en_5.4.0_3.0_1718107099855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline_en_5.4.0_3.0_1718107099855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Guroruseru/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_en.md new file mode 100644 index 00000000000000..dbe92674e65ee8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_guruji108 XlmRoBertaForTokenClassification from Guruji108 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_guruji108 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_guruji108` is an English model originally trained by Guruji108.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guruji108_en_5.4.0_3.0_1718103664598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guruji108_en_5.4.0_3.0_1718103664598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_guruji108","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_guruji108", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_guruji108| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_pipeline_en.md new file mode 100644 index 00000000000000..bd64a9835a8373 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_guruji108_pipeline pipeline XlmRoBertaForTokenClassification from Guruji108 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_guruji108_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_guruji108_pipeline` is an English model originally trained by Guruji108.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guruji108_pipeline_en_5.4.0_3.0_1718103765265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guruji108_pipeline_en_5.4.0_3.0_1718103765265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_guruji108_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_guruji108_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_guruji108_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_en.md new file mode 100644 index 00000000000000..c85c6ba9ff30e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jamie613 XlmRoBertaForTokenClassification from jamie613 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jamie613 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_jamie613` is an English model originally trained by jamie613.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jamie613_en_5.4.0_3.0_1718117445369.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jamie613_en_5.4.0_3.0_1718117445369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jamie613","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jamie613", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jamie613| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/jamie613/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_pipeline_en.md new file mode 100644 index 00000000000000..a492b8b293890b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jamie613_pipeline pipeline XlmRoBertaForTokenClassification from jamie613 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jamie613_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_jamie613_pipeline` is an English model originally trained by jamie613. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jamie613_pipeline_en_5.4.0_3.0_1718117556257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jamie613_pipeline_en_5.4.0_3.0_1718117556257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jamie613_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jamie613_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jamie613_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/jamie613/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_en.md new file mode 100644 index 00000000000000..02e9767d7786bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jgriffi XlmRoBertaForTokenClassification from jgriffi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jgriffi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_jgriffi` is an English model originally trained by jgriffi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jgriffi_en_5.4.0_3.0_1718100406513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jgriffi_en_5.4.0_3.0_1718100406513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jgriffi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jgriffi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jgriffi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.5 MB| + +## References + +https://huggingface.co/jgriffi/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline_en.md new file mode 100644 index 00000000000000..ddf591632136b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline pipeline XlmRoBertaForTokenClassification from jgriffi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline` is an English model originally trained by jgriffi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline_en_5.4.0_3.0_1718100503871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline_en_5.4.0_3.0_1718100503871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.5 MB| + +## References + +https://huggingface.co/jgriffi/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_en.md new file mode 100644 index 00000000000000..86127aa7075fe4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_kenhoffman XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_kenhoffman +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_kenhoffman` is an English model originally trained by kenhoffman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kenhoffman_en_5.4.0_3.0_1718129000647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kenhoffman_en_5.4.0_3.0_1718129000647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_kenhoffman","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_kenhoffman", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_kenhoffman| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline_en.md new file mode 100644 index 00000000000000..f3bc82863d855d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline pipeline XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline` is an English model originally trained by kenhoffman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline_en_5.4.0_3.0_1718129106082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline_en_5.4.0_3.0_1718129106082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_en.md new file mode 100644 index 00000000000000..ae2d6d4c1c02ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_kiechu XlmRoBertaForTokenClassification from kiechu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_kiechu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_kiechu` is an English model originally trained by kiechu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kiechu_en_5.4.0_3.0_1718112138878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kiechu_en_5.4.0_3.0_1718112138878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_kiechu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_kiechu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_kiechu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/kiechu/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_pipeline_en.md new file mode 100644 index 00000000000000..c9cc2ba5784aa3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_kiechu_pipeline pipeline XlmRoBertaForTokenClassification from kiechu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_kiechu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_kiechu_pipeline` is an English model originally trained by kiechu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kiechu_pipeline_en_5.4.0_3.0_1718112259482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kiechu_pipeline_en_5.4.0_3.0_1718112259482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kiechu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kiechu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_kiechu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/kiechu/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_en.md new file mode 100644 index 00000000000000..6366691e04b6a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_koroku XlmRoBertaForTokenClassification from koroku +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_koroku +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_koroku` is an English model originally trained by koroku. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_koroku_en_5.4.0_3.0_1718126549452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_koroku_en_5.4.0_3.0_1718126549452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_koroku","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_koroku", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_koroku| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.3 MB| + +## References + +https://huggingface.co/koroku/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_pipeline_en.md new file mode 100644 index 00000000000000..e002f3a9a3530a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_koroku_pipeline pipeline XlmRoBertaForTokenClassification from koroku +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_koroku_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_koroku_pipeline` is an English model originally trained by koroku. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_koroku_pipeline_en_5.4.0_3.0_1718126658951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_koroku_pipeline_en_5.4.0_3.0_1718126658951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_koroku_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_koroku_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_koroku_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.3 MB| + +## References + +https://huggingface.co/koroku/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_en.md new file mode 100644 index 00000000000000..95764abdcb3597 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_laurentiustancioiu XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_laurentiustancioiu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_laurentiustancioiu` is an English model originally trained by LaurentiuStancioiu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_en_5.4.0_3.0_1718107160865.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_en_5.4.0_3.0_1718107160865.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_laurentiustancioiu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_laurentiustancioiu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_laurentiustancioiu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline_en.md new file mode 100644 index 00000000000000..774b5b140a875d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline pipeline XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline` is an English model originally trained by LaurentiuStancioiu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718107264003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718107264003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_en.md new file mode 100644 index 00000000000000..42d71a7a1232a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_munsu XlmRoBertaForTokenClassification from MunSu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_munsu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_munsu` is an English model originally trained by MunSu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_munsu_en_5.4.0_3.0_1718111168323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_munsu_en_5.4.0_3.0_1718111168323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_munsu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_munsu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_munsu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.5 MB| + +## References + +https://huggingface.co/MunSu/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_pipeline_en.md new file mode 100644 index 00000000000000..8eddb68f1432da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_munsu_pipeline pipeline XlmRoBertaForTokenClassification from MunSu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_munsu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_munsu_pipeline` is an English model originally trained by MunSu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_munsu_pipeline_en_5.4.0_3.0_1718111251045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_munsu_pipeline_en_5.4.0_3.0_1718111251045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_munsu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_munsu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_munsu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.5 MB| + +## References + +https://huggingface.co/MunSu/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_en.md new file mode 100644 index 00000000000000..addf7d37a2fc79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_nrazavi XlmRoBertaForTokenClassification from nrazavi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_nrazavi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_nrazavi` is an English model originally trained by nrazavi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_nrazavi_en_5.4.0_3.0_1718100833288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_nrazavi_en_5.4.0_3.0_1718100833288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_nrazavi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_nrazavi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_nrazavi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/nrazavi/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline_en.md new file mode 100644 index 00000000000000..d18462c96d4709 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline pipeline XlmRoBertaForTokenClassification from nrazavi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline` is an English model originally trained by nrazavi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline_en_5.4.0_3.0_1718100941022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline_en_5.4.0_3.0_1718100941022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/nrazavi/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_en.md new file mode 100644 index 00000000000000..6f944d301a6b45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_paww XlmRoBertaForTokenClassification from paww +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_paww +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_paww` is an English model originally trained by paww. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_paww_en_5.4.0_3.0_1718102554223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_paww_en_5.4.0_3.0_1718102554223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_paww","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_paww", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_paww| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/paww/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_pipeline_en.md new file mode 100644 index 00000000000000..57cd39917d790e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_paww_pipeline pipeline XlmRoBertaForTokenClassification from paww +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_paww_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_paww_pipeline` is an English model originally trained by paww. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_paww_pipeline_en_5.4.0_3.0_1718102656316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_paww_pipeline_en_5.4.0_3.0_1718102656316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_paww_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_paww_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_paww_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/paww/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_en.md new file mode 100644 index 00000000000000..a37aafab383dce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_praboda XlmRoBertaForTokenClassification from Praboda +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_praboda +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_praboda` is an English model originally trained by Praboda. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_praboda_en_5.4.0_3.0_1718106344629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_praboda_en_5.4.0_3.0_1718106344629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_praboda","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_praboda", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_praboda| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Praboda/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_pipeline_en.md new file mode 100644 index 00000000000000..28572a0de80c31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_praboda_pipeline pipeline XlmRoBertaForTokenClassification from Praboda +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_praboda_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_praboda_pipeline` is an English model originally trained by Praboda. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_praboda_pipeline_en_5.4.0_3.0_1718106452247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_praboda_pipeline_en_5.4.0_3.0_1718106452247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_praboda_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_praboda_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_praboda_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Praboda/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_en.md new file mode 100644 index 00000000000000..d844203abaeee8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sbpark XlmRoBertaForTokenClassification from sbpark +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sbpark +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_sbpark` is an English model originally trained by sbpark. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sbpark_en_5.4.0_3.0_1718134367294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sbpark_en_5.4.0_3.0_1718134367294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_sbpark","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_sbpark", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sbpark| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/sbpark/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_pipeline_en.md new file mode 100644 index 00000000000000..ad743fe02741ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sbpark_pipeline pipeline XlmRoBertaForTokenClassification from sbpark +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sbpark_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_sbpark_pipeline` is an English model originally trained by sbpark. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sbpark_pipeline_en_5.4.0_3.0_1718134472473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sbpark_pipeline_en_5.4.0_3.0_1718134472473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sbpark_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sbpark_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sbpark_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/sbpark/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_en.md new file mode 100644 index 00000000000000..5d3c54d8093075 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sreek XlmRoBertaForTokenClassification from Sreek +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sreek +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_sreek` is an English model originally trained by Sreek. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sreek_en_5.4.0_3.0_1718107166207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sreek_en_5.4.0_3.0_1718107166207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_sreek","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_sreek", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sreek| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Sreek/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_pipeline_en.md new file mode 100644 index 00000000000000..3380edc4d1a56d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sreek_pipeline pipeline XlmRoBertaForTokenClassification from Sreek +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sreek_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_sreek_pipeline` is an English model originally trained by Sreek. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sreek_pipeline_en_5.4.0_3.0_1718107269312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sreek_pipeline_en_5.4.0_3.0_1718107269312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sreek_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sreek_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sreek_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Sreek/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_en.md new file mode 100644 index 00000000000000..582efe8a8f5d29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_team_nave XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_team_nave +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_team_nave` is an English model originally trained by team-nave. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_team_nave_en_5.4.0_3.0_1718126531548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_team_nave_en_5.4.0_3.0_1718126531548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_team_nave","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_team_nave", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_team_nave| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|835.3 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_pipeline_en.md new file mode 100644 index 00000000000000..dc4816266d267e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_team_nave_pipeline pipeline XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_team_nave_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_team_nave_pipeline` is an English model originally trained by team-nave. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_team_nave_pipeline_en_5.4.0_3.0_1718126646597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_team_nave_pipeline_en_5.4.0_3.0_1718126646597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_team_nave_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_team_nave_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_team_nave_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|835.4 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_en.md new file mode 100644 index 00000000000000..a56cf40981663c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_thkkvui XlmRoBertaForTokenClassification from thkkvui +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_thkkvui +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_thkkvui` is an English model originally trained by thkkvui. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_thkkvui_en_5.4.0_3.0_1718111185799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_thkkvui_en_5.4.0_3.0_1718111185799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_thkkvui","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_thkkvui", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_thkkvui| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/thkkvui/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline_en.md new file mode 100644 index 00000000000000..bfb1beff0ce502 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline pipeline XlmRoBertaForTokenClassification from thkkvui +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline` is an English model originally trained by thkkvui. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline_en_5.4.0_3.0_1718111286791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline_en_5.4.0_3.0_1718111286791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/thkkvui/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_en.md new file mode 100644 index 00000000000000..dd6b3833cc70ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_tyayoi XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_tyayoi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_tyayoi` is an English model originally trained by tyayoi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_tyayoi_en_5.4.0_3.0_1718109927062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_tyayoi_en_5.4.0_3.0_1718109927062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_tyayoi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_tyayoi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_tyayoi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline_en.md new file mode 100644 index 00000000000000..2243b7314d0dd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline pipeline XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline` is an English model originally trained by tyayoi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline_en_5.4.0_3.0_1718110029878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline_en_5.4.0_3.0_1718110029878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_en.md new file mode 100644 index 00000000000000..0dbacbc3aa6562 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_abdus XlmRoBertaForTokenClassification from abdus +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_abdus +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_abdus` is an English model originally trained by abdus. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_abdus_en_5.4.0_3.0_1718116880454.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_abdus_en_5.4.0_3.0_1718116880454.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_abdus","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_abdus", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_abdus| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/abdus/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_pipeline_en.md new file mode 100644 index 00000000000000..b5ed1b1140307b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_abdus_pipeline pipeline XlmRoBertaForTokenClassification from abdus +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_abdus_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_abdus_pipeline` is an English model originally trained by abdus. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_abdus_pipeline_en_5.4.0_3.0_1718116976533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_abdus_pipeline_en_5.4.0_3.0_1718116976533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_abdus_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_abdus_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_abdus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/abdus/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_en.md new file mode 100644 index 00000000000000..21d6b4cd0f7eac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_alkampfer XlmRoBertaForTokenClassification from alkampfer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_alkampfer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_alkampfer` is an English model originally trained by alkampfer. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alkampfer_en_5.4.0_3.0_1718132111994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alkampfer_en_5.4.0_3.0_1718132111994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_alkampfer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_alkampfer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_alkampfer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/alkampfer/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline_en.md new file mode 100644 index 00000000000000..68f26f89ed8664 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline pipeline XlmRoBertaForTokenClassification from alkampfer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline` is an English model originally trained by alkampfer. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline_en_5.4.0_3.0_1718132198347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline_en_5.4.0_3.0_1718132198347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/alkampfer/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_en.md new file mode 100644 index 00000000000000..5c2d5b25981281 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_amitjain171980 XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_amitjain171980 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_amitjain171980` is an English model originally trained by amitjain171980. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_amitjain171980_en_5.4.0_3.0_1718122378652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_amitjain171980_en_5.4.0_3.0_1718122378652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_amitjain171980","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_amitjain171980", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_amitjain171980| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline_en.md new file mode 100644 index 00000000000000..b569e090a91d8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline pipeline XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline` is an English model originally trained by amitjain171980. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline_en_5.4.0_3.0_1718122465598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline_en_5.4.0_3.0_1718122465598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_en.md new file mode 100644 index 00000000000000..e0cd670091607d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_anniepyim XlmRoBertaForTokenClassification from anniepyim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_anniepyim +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_anniepyim` is an English model originally trained by anniepyim. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anniepyim_en_5.4.0_3.0_1718124324499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anniepyim_en_5.4.0_3.0_1718124324499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_anniepyim","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_anniepyim", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_anniepyim| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/anniepyim/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline_en.md new file mode 100644 index 00000000000000..1653c42e20a3d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline pipeline XlmRoBertaForTokenClassification from anniepyim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline` is an English model originally trained by anniepyim. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline_en_5.4.0_3.0_1718124411811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline_en_5.4.0_3.0_1718124411811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/anniepyim/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_en.md new file mode 100644 index 00000000000000..fa3ecba5b92cdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_antoinev17 XlmRoBertaForTokenClassification from antoinev17 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_antoinev17 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_antoinev17` is an English model originally trained by antoinev17. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_antoinev17_en_5.4.0_3.0_1718099320343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_antoinev17_en_5.4.0_3.0_1718099320343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_antoinev17","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_antoinev17", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_antoinev17| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.5 MB| + +## References + +https://huggingface.co/antoinev17/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline_en.md new file mode 100644 index 00000000000000..a3b31d1bd9f6d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline pipeline XlmRoBertaForTokenClassification from antoinev17 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline` is an English model originally trained by antoinev17. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline_en_5.4.0_3.0_1718099409176.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline_en_5.4.0_3.0_1718099409176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.5 MB| + +## References + +https://huggingface.co/antoinev17/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_en.md new file mode 100644 index 00000000000000..5e430799aef66b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_arthur_75 XlmRoBertaForTokenClassification from Arthur-75 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_arthur_75 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_arthur_75` is an English model originally trained by Arthur-75. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arthur_75_en_5.4.0_3.0_1718119701751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arthur_75_en_5.4.0_3.0_1718119701751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_arthur_75","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_arthur_75", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_arthur_75| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Arthur-75/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline_en.md new file mode 100644 index 00000000000000..9307b14243316e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline pipeline XlmRoBertaForTokenClassification from Arthur-75 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline` is an English model originally trained by Arthur-75. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline_en_5.4.0_3.0_1718119815124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline_en_5.4.0_3.0_1718119815124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Arthur-75/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_en.md new file mode 100644 index 00000000000000..c34922d693412b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ashrielbrian XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ashrielbrian +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ashrielbrian` is an English model originally trained by ashrielbrian. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ashrielbrian_en_5.4.0_3.0_1718104690549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ashrielbrian_en_5.4.0_3.0_1718104690549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ashrielbrian","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ashrielbrian", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ashrielbrian| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline_en.md new file mode 100644 index 00000000000000..e9a8ce4bafcef4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline pipeline XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline` is an English model originally trained by ashrielbrian. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline_en_5.4.0_3.0_1718104777282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline_en_5.4.0_3.0_1718104777282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_en.md new file mode 100644 index 00000000000000..c701e397f17608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_backdrive XlmRoBertaForTokenClassification from Backdrive +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_backdrive +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_backdrive` is an English model originally trained by Backdrive. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_backdrive_en_5.4.0_3.0_1718127006094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_backdrive_en_5.4.0_3.0_1718127006094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_backdrive","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_backdrive", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_backdrive| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Backdrive/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_pipeline_en.md new file mode 100644 index 00000000000000..bfcc7ae94dd772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_backdrive_pipeline pipeline XlmRoBertaForTokenClassification from Backdrive +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_backdrive_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_backdrive_pipeline` is an English model originally trained by Backdrive.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_backdrive_pipeline_en_5.4.0_3.0_1718127090486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_backdrive_pipeline_en_5.4.0_3.0_1718127090486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_backdrive_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_backdrive_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_backdrive_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Backdrive/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_en.md new file mode 100644 index 00000000000000..1fb496ba903772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_benjiccee XlmRoBertaForTokenClassification from Benjiccee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_benjiccee +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_benjiccee` is an English model originally trained by Benjiccee.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_benjiccee_en_5.4.0_3.0_1718104068280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_benjiccee_en_5.4.0_3.0_1718104068280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_benjiccee","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_benjiccee", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_benjiccee| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Benjiccee/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline_en.md new file mode 100644 index 00000000000000..d7c5b49d576a32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline pipeline XlmRoBertaForTokenClassification from Benjiccee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline` is an English model originally trained by Benjiccee.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline_en_5.4.0_3.0_1718104155327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline_en_5.4.0_3.0_1718104155327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Benjiccee/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_en.md new file mode 100644 index 00000000000000..92e7989cf267bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_chris_choi XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_chris_choi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_chris_choi` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_chris_choi_en_5.4.0_3.0_1718125306122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_chris_choi_en_5.4.0_3.0_1718125306122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_chris_choi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_chris_choi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_chris_choi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline_en.md new file mode 100644 index 00000000000000..21578ea4300c2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline pipeline XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline_en_5.4.0_3.0_1718125424663.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline_en_5.4.0_3.0_1718125424663.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_en.md new file mode 100644 index 00000000000000..ab929ac32c6849 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_coyote78 XlmRoBertaForTokenClassification from coyote78 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_coyote78 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_coyote78` is an English model originally trained by coyote78.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_coyote78_en_5.4.0_3.0_1718104705066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_coyote78_en_5.4.0_3.0_1718104705066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_coyote78","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_coyote78", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_coyote78| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/coyote78/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_pipeline_en.md new file mode 100644 index 00000000000000..b8e12563a37258 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_coyote78_pipeline pipeline XlmRoBertaForTokenClassification from coyote78 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_coyote78_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_coyote78_pipeline` is an English model originally trained by coyote78.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_coyote78_pipeline_en_5.4.0_3.0_1718104800436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_coyote78_pipeline_en_5.4.0_3.0_1718104800436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_coyote78_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_coyote78_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_coyote78_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/coyote78/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_en.md new file mode 100644 index 00000000000000..75a6458375206e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dazzid XlmRoBertaForTokenClassification from Dazzid +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dazzid +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_dazzid` is an English model originally trained by Dazzid.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dazzid_en_5.4.0_3.0_1718099433191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dazzid_en_5.4.0_3.0_1718099433191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_dazzid","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_dazzid", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dazzid| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Dazzid/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_pipeline_en.md new file mode 100644 index 00000000000000..162743d16f4265 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dazzid_pipeline pipeline XlmRoBertaForTokenClassification from Dazzid +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dazzid_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_dazzid_pipeline` is an English model originally trained by Dazzid.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dazzid_pipeline_en_5.4.0_3.0_1718099520041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dazzid_pipeline_en_5.4.0_3.0_1718099520041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dazzid_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dazzid_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dazzid_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Dazzid/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_en.md new file mode 100644 index 00000000000000..253e2f1715fd6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_deblagoj XlmRoBertaForTokenClassification from deblagoj +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_deblagoj +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_deblagoj` is an English model originally trained by deblagoj.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_deblagoj_en_5.4.0_3.0_1718115016832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_deblagoj_en_5.4.0_3.0_1718115016832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_deblagoj","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_deblagoj", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_deblagoj| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.1 MB| + +## References + +https://huggingface.co/deblagoj/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en.md new file mode 100644 index 00000000000000..3fca7ca1dedde8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline pipeline XlmRoBertaForTokenClassification from deblagoj +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline` is an English model originally trained by deblagoj.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en_5.4.0_3.0_1718115097742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en_5.4.0_3.0_1718115097742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
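The pipeline name and `lang` passed to `PretrainedPipeline` also appear in the archive name of the Download link above. The sketch below splits such a file name into its parts. The layout `<name>_<lang>_<Spark NLP version>_<Spark version>_<timestamp>.zip` is inferred from the URLs in this diff rather than from any documented contract, reading the trailing number as epoch milliseconds is likewise an assumption, and `parse_archive_name` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from the download URLs in this diff:
#   <name>_<lang>_<spark-nlp version>_<spark version>_<epoch millis>.zip
ARCHIVE_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})"
    r"_(?P<nlp_version>\d+\.\d+\.\d+)"
    r"_(?P<spark_version>\d+\.\d+)"
    r"_(?P<timestamp_ms>\d+)\.zip$"
)


def parse_archive_name(filename: str) -> dict:
    """Split a model archive file name into its (assumed) components."""
    match = ARCHIVE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename!r}")
    info = match.groupdict()
    # Interpret the trailing number as milliseconds since the Unix epoch (assumption).
    info["timestamp_utc"] = datetime.fromtimestamp(
        int(info["timestamp_ms"]) / 1000, tz=timezone.utc
    )
    return info


archive_info = parse_archive_name(
    "xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en_5.4.0_3.0_1718115097742.zip"
)
print(archive_info["name"], archive_info["lang"], archive_info["timestamp_utc"].date())
```

Under these assumptions the parsed date agrees with the `date:` field in the card's front matter, which gives some support to the epoch-milliseconds reading.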
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.1 MB| + +## References + +https://huggingface.co/deblagoj/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_en.md new file mode 100644 index 00000000000000..851498f688b55b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dochee XlmRoBertaForTokenClassification from Dochee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dochee +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_dochee` is an English model originally trained by Dochee.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dochee_en_5.4.0_3.0_1718119549955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dochee_en_5.4.0_3.0_1718119549955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_dochee","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_dochee", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dochee| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.4 MB| + +## References + +https://huggingface.co/Dochee/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_pipeline_en.md new file mode 100644 index 00000000000000..ff3b310985e66b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dochee_pipeline pipeline XlmRoBertaForTokenClassification from Dochee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dochee_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_dochee_pipeline` is an English model originally trained by Dochee.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dochee_pipeline_en_5.4.0_3.0_1718119646138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dochee_pipeline_en_5.4.0_3.0_1718119646138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dochee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dochee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dochee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.4 MB| + +## References + +https://huggingface.co/Dochee/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_en.md new file mode 100644 index 00000000000000..2fd1d52f2459b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_donaldyy XlmRoBertaForTokenClassification from donaldyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_donaldyy +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_donaldyy` is an English model originally trained by donaldyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_donaldyy_en_5.4.0_3.0_1718130118903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_donaldyy_en_5.4.0_3.0_1718130118903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_donaldyy","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_donaldyy", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_donaldyy| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/donaldyy/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline_en.md new file mode 100644 index 00000000000000..2178aa9b723ea8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline pipeline XlmRoBertaForTokenClassification from donaldyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline` is an English model originally trained by donaldyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline_en_5.4.0_3.0_1718130226492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline_en_5.4.0_3.0_1718130226492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/donaldyy/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_en.md new file mode 100644 index 00000000000000..3dbe33352c93ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ducdh1210 XlmRoBertaForTokenClassification from ducdh1210 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ducdh1210 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ducdh1210` is an English model originally trained by ducdh1210.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ducdh1210_en_5.4.0_3.0_1718125420831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ducdh1210_en_5.4.0_3.0_1718125420831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ducdh1210","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ducdh1210", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ducdh1210| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ducdh1210/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline_en.md new file mode 100644 index 00000000000000..054e366eba54a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline pipeline XlmRoBertaForTokenClassification from ducdh1210 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline` is an English model originally trained by ducdh1210.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline_en_5.4.0_3.0_1718125522574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline_en_5.4.0_3.0_1718125522574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ducdh1210/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_en.md new file mode 100644 index 00000000000000..1d976d74f1bad9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_100yen XlmRoBertaForTokenClassification from 100yen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_100yen +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_100yen` is an English model originally trained by 100yen.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_100yen_en_5.4.0_3.0_1718131268289.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_100yen_en_5.4.0_3.0_1718131268289.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_100yen","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_100yen", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_100yen| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/100yen/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline_en.md new file mode 100644 index 00000000000000..835291aa9256e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline pipeline XlmRoBertaForTokenClassification from 100yen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline` is an English model originally trained by 100yen.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline_en_5.4.0_3.0_1718131387496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline_en_5.4.0_3.0_1718131387496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/100yen/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_en.md new file mode 100644 index 00000000000000..348c7359bfcaee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail XlmRoBertaForTokenClassification from ahmad-alismail +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail` is an English model originally trained by ahmad-alismail.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_en_5.4.0_3.0_1718127118950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_en_5.4.0_3.0_1718127118950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ahmad-alismail/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline_en.md new file mode 100644 index 00000000000000..a13b71430a853e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline pipeline XlmRoBertaForTokenClassification from ahmad-alismail +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline` is an English model originally trained by ahmad-alismail.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline_en_5.4.0_3.0_1718127202473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline_en_5.4.0_3.0_1718127202473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ahmad-alismail/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_en.md new file mode 100644 index 00000000000000..ae63727fbe701c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_andrew45 XlmRoBertaForTokenClassification from andrew45 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_andrew45 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_andrew45` is an English model originally trained by andrew45.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_andrew45_en_5.4.0_3.0_1718120895493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_andrew45_en_5.4.0_3.0_1718120895493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_andrew45","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_andrew45", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_andrew45| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/andrew45/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline_en.md new file mode 100644 index 00000000000000..9dc53ab0fefa4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline pipeline XlmRoBertaForTokenClassification from andrew45 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline` is an English model originally trained by andrew45. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline_en_5.4.0_3.0_1718121015964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline_en_5.4.0_3.0_1718121015964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/andrew45/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_en.md new file mode 100644 index 00000000000000..96ad403833d9d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_cogitur XlmRoBertaForTokenClassification from cogitur +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_cogitur +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_cogitur` is an English model originally trained by cogitur. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cogitur_en_5.4.0_3.0_1718111374482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cogitur_en_5.4.0_3.0_1718111374482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_cogitur","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_cogitur", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_cogitur| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/cogitur/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline_en.md new file mode 100644 index 00000000000000..665d771bf999b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline pipeline XlmRoBertaForTokenClassification from cogitur +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline` is an English model originally trained by cogitur. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline_en_5.4.0_3.0_1718111458848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline_en_5.4.0_3.0_1718111458848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/cogitur/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_en.md new file mode 100644 index 00000000000000..80dbf969ebba13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_cyycyy XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_cyycyy +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_cyycyy` is an English model originally trained by cyycyy. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cyycyy_en_5.4.0_3.0_1718116757095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cyycyy_en_5.4.0_3.0_1718116757095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_cyycyy","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_cyycyy", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_cyycyy| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline_en.md new file mode 100644 index 00000000000000..d03477176aad2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline pipeline XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline` is an English model originally trained by cyycyy. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline_en_5.4.0_3.0_1718116841921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline_en_5.4.0_3.0_1718116841921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_en.md new file mode 100644 index 00000000000000..72d0a570f401c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_haesun XlmRoBertaForTokenClassification from haesun +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_haesun +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_haesun` is an English model originally trained by haesun. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_haesun_en_5.4.0_3.0_1718105842438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_haesun_en_5.4.0_3.0_1718105842438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_haesun","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_haesun", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_haesun| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/haesun/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline_en.md new file mode 100644 index 00000000000000..c563f494d5b93e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline pipeline XlmRoBertaForTokenClassification from haesun +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline` is an English model originally trained by haesun. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline_en_5.4.0_3.0_1718105959103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline_en_5.4.0_3.0_1718105959103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/haesun/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_en.md new file mode 100644 index 00000000000000..478a00e5ab35e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_hanlforever XlmRoBertaForTokenClassification from hanlforever +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_hanlforever +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_hanlforever` is an English model originally trained by hanlforever. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hanlforever_en_5.4.0_3.0_1718135057337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hanlforever_en_5.4.0_3.0_1718135057337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hanlforever","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hanlforever", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hanlforever| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/hanlforever/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline_en.md new file mode 100644 index 00000000000000..c83e6a09059d16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline pipeline XlmRoBertaForTokenClassification from hanlforever +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline` is an English model originally trained by hanlforever. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline_en_5.4.0_3.0_1718135163886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline_en_5.4.0_3.0_1718135163886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/hanlforever/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_en.md new file mode 100644 index 00000000000000..9cc90e7f2ebe5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_heerak XlmRoBertaForTokenClassification from Heerak +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_heerak +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_heerak` is an English model originally trained by Heerak. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_heerak_en_5.4.0_3.0_1718124551631.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_heerak_en_5.4.0_3.0_1718124551631.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_heerak","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_heerak", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_heerak| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Heerak/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline_en.md new file mode 100644 index 00000000000000..15fc40869b3de2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline pipeline XlmRoBertaForTokenClassification from Heerak +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline` is an English model originally trained by Heerak.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline_en_5.4.0_3.0_1718124636193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline_en_5.4.0_3.0_1718124636193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Heerak/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_en.md new file mode 100644 index 00000000000000..048de732711d80 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_henryjiang XlmRoBertaForTokenClassification from henryjiang +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_henryjiang +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_henryjiang` is an English model originally trained by henryjiang.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_henryjiang_en_5.4.0_3.0_1718138157866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_henryjiang_en_5.4.0_3.0_1718138157866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_henryjiang","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_henryjiang", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_henryjiang| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.5 MB| + +## References + +https://huggingface.co/henryjiang/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline_en.md new file mode 100644 index 00000000000000..0ff9066dcf875f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline pipeline XlmRoBertaForTokenClassification from henryjiang +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline` is an English model originally trained by henryjiang.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline_en_5.4.0_3.0_1718138239265.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline_en_5.4.0_3.0_1718138239265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.5 MB| + +## References + +https://huggingface.co/henryjiang/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_en.md new file mode 100644 index 00000000000000..ed668a7a2fbe26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_hhffxx XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_hhffxx +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_hhffxx` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hhffxx_en_5.4.0_3.0_1718102635867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hhffxx_en_5.4.0_3.0_1718102635867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hhffxx","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hhffxx", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hhffxx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.5 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline_en.md new file mode 100644 index 00000000000000..3812d8f044f7e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline pipeline XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline_en_5.4.0_3.0_1718102716868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline_en_5.4.0_3.0_1718102716868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.5 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_en.md new file mode 100644 index 00000000000000..d1ebceb3cc1d06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_huggingbase XlmRoBertaForTokenClassification from huggingbase +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_huggingbase +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_huggingbase` is an English model originally trained by huggingbase.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_huggingbase_en_5.4.0_3.0_1718101546763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_huggingbase_en_5.4.0_3.0_1718101546763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_huggingbase","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_huggingbase", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_huggingbase| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/huggingbase/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline_en.md new file mode 100644 index 00000000000000..93a5813a855cc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline pipeline XlmRoBertaForTokenClassification from huggingbase +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline` is an English model originally trained by huggingbase.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline_en_5.4.0_3.0_1718101630829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline_en_5.4.0_3.0_1718101630829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/huggingbase/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_en.md new file mode 100644 index 00000000000000..748b8b4bc4ea17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jamie613 XlmRoBertaForTokenClassification from jamie613 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jamie613 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_jamie613` is an English model originally trained by jamie613.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jamie613_en_5.4.0_3.0_1718131268033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jamie613_en_5.4.0_3.0_1718131268033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_jamie613","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_jamie613", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jamie613| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jamie613/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline_en.md new file mode 100644 index 00000000000000..3833ee17d71a12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline pipeline XlmRoBertaForTokenClassification from jamie613 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline` is an English model originally trained by jamie613.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline_en_5.4.0_3.0_1718131360659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline_en_5.4.0_3.0_1718131360659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jamie613/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_en.md new file mode 100644 index 00000000000000..0dabb6e853e4ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jojeyh XlmRoBertaForTokenClassification from jojeyh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jojeyh +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_jojeyh` is an English model originally trained by jojeyh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jojeyh_en_5.4.0_3.0_1718109238641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jojeyh_en_5.4.0_3.0_1718109238641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_jojeyh","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_jojeyh", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jojeyh| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jojeyh/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline_en.md new file mode 100644 index 00000000000000..b07b12467d2b6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline pipeline XlmRoBertaForTokenClassification from jojeyh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline` is an English model originally trained by jojeyh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline_en_5.4.0_3.0_1718109322711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline_en_5.4.0_3.0_1718109322711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jojeyh/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_en.md new file mode 100644 index 00000000000000..2d834a46850450 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_k3lana XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_k3lana +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_k3lana` is an English model originally trained by k3lana. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_k3lana_en_5.4.0_3.0_1718121913813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_k3lana_en_5.4.0_3.0_1718121913813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_k3lana","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_k3lana", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_k3lana| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline_en.md new file mode 100644 index 00000000000000..697dd445e0163e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline pipeline XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline` is an English model originally trained by k3lana. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline_en_5.4.0_3.0_1718122001080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline_en_5.4.0_3.0_1718122001080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_en.md new file mode 100644 index 00000000000000..f587020af0b283 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_likejazz XlmRoBertaForTokenClassification from likejazz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_likejazz +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_likejazz` is an English model originally trained by likejazz. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_likejazz_en_5.4.0_3.0_1718121933346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_likejazz_en_5.4.0_3.0_1718121933346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_likejazz","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_likejazz", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_likejazz| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/likejazz/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline_en.md new file mode 100644 index 00000000000000..70cf58ae55b2d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline pipeline XlmRoBertaForTokenClassification from likejazz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline` is an English model originally trained by likejazz. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline_en_5.4.0_3.0_1718122058650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline_en_5.4.0_3.0_1718122058650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/likejazz/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_en.md new file mode 100644 index 00000000000000..ea40b7a4d17122 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_nisimura XlmRoBertaForTokenClassification from nisimura +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_nisimura +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_nisimura` is an English model originally trained by nisimura. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nisimura_en_5.4.0_3.0_1718109869993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nisimura_en_5.4.0_3.0_1718109869993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_nisimura","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_nisimura", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_nisimura| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|857.0 MB| + +## References + +https://huggingface.co/nisimura/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline_en.md new file mode 100644 index 00000000000000..d188c6ec66b462 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline pipeline XlmRoBertaForTokenClassification from nisimura +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline` is an English model originally trained by nisimura. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline_en_5.4.0_3.0_1718109964767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline_en_5.4.0_3.0_1718109964767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|857.0 MB| + +## References + +https://huggingface.co/nisimura/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_en.md new file mode 100644 index 00000000000000..e97633ef667745 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_philosucker XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_philosucker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_philosucker` is an English model originally trained by philosucker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_philosucker_en_5.4.0_3.0_1718110988569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_philosucker_en_5.4.0_3.0_1718110988569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_philosucker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_philosucker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_philosucker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.7 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline_en.md new file mode 100644 index 00000000000000..b9af566d95a695 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline pipeline XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline` is an English model originally trained by philosucker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline_en_5.4.0_3.0_1718111079452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline_en_5.4.0_3.0_1718111079452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.7 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_en.md new file mode 100644 index 00000000000000..bc8410ea2aa7bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_skr1125 XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_skr1125 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_skr1125` is an English model originally trained by skr1125. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_skr1125_en_5.4.0_3.0_1718099314528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_skr1125_en_5.4.0_3.0_1718099314528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_skr1125","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_skr1125", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_skr1125| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline_en.md new file mode 100644 index 00000000000000..752271f0ea81ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline pipeline XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline` is an English model originally trained by skr1125. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline_en_5.4.0_3.0_1718099409341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline_en_5.4.0_3.0_1718099409341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_en.md new file mode 100644 index 00000000000000..27424f723f227f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_sunwooooong XlmRoBertaForTokenClassification from sunwooooong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_sunwooooong +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_sunwooooong` is an English model originally trained by sunwooooong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sunwooooong_en_5.4.0_3.0_1718123034112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sunwooooong_en_5.4.0_3.0_1718123034112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_sunwooooong","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_sunwooooong", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_sunwooooong| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/sunwooooong/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline_en.md new file mode 100644 index 00000000000000..8662f3a913c98d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline pipeline XlmRoBertaForTokenClassification from sunwooooong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline` is an English model originally trained by sunwooooong.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline_en_5.4.0_3.0_1718123118627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline_en_5.4.0_3.0_1718123118627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/sunwooooong/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_en.md new file mode 100644 index 00000000000000..d633ab9f9427fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_transll XlmRoBertaForTokenClassification from TransLL +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_transll +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_transll` is an English model originally trained by TransLL.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_transll_en_5.4.0_3.0_1718105872509.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_transll_en_5.4.0_3.0_1718105872509.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_transll","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_transll", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_transll| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/TransLL/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_pipeline_en.md new file mode 100644 index 00000000000000..9ca56f9ba234dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_transll_pipeline pipeline XlmRoBertaForTokenClassification from TransLL +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_transll_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_transll_pipeline` is an English model originally trained by TransLL.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_transll_pipeline_en_5.4.0_3.0_1718105956549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_transll_pipeline_en_5.4.0_3.0_1718105956549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_transll_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_transll_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_transll_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/TransLL/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_en.md new file mode 100644 index 00000000000000..47d5b3ba6c98fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_udon3 XlmRoBertaForTokenClassification from udon3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_udon3 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_udon3` is an English model originally trained by udon3.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_udon3_en_5.4.0_3.0_1718117636042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_udon3_en_5.4.0_3.0_1718117636042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_udon3","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_udon3", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_udon3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/udon3/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline_en.md new file mode 100644 index 00000000000000..368c277af3c827 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline pipeline XlmRoBertaForTokenClassification from udon3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline` is an English model originally trained by udon3.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline_en_5.4.0_3.0_1718117728636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline_en_5.4.0_3.0_1718117728636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/udon3/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_en.md new file mode 100644 index 00000000000000..7c5c03b5ffd2c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_yuri XlmRoBertaForTokenClassification from Yuri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_yuri +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_yuri` is an English model originally trained by Yuri.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_yuri_en_5.4.0_3.0_1718099777534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_yuri_en_5.4.0_3.0_1718099777534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_yuri","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_yuri", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_yuri| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Yuri/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline_en.md new file mode 100644 index 00000000000000..100d6cc9b8df2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline pipeline XlmRoBertaForTokenClassification from Yuri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline` is an English model originally trained by Yuri.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline_en_5.4.0_3.0_1718099861772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline_en_5.4.0_3.0_1718099861772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Yuri/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_en.md new file mode 100644 index 00000000000000..5620a01a0959d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_gcmsrc XlmRoBertaForTokenClassification from gcmsrc +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_gcmsrc +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_gcmsrc` is an English model originally trained by gcmsrc.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gcmsrc_en_5.4.0_3.0_1718100398700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gcmsrc_en_5.4.0_3.0_1718100398700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_gcmsrc","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_gcmsrc", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_gcmsrc| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/gcmsrc/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline_en.md new file mode 100644 index 00000000000000..929fdf2491e24d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline pipeline XlmRoBertaForTokenClassification from gcmsrc +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline` is an English model originally trained by gcmsrc.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline_en_5.4.0_3.0_1718100488048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline_en_5.4.0_3.0_1718100488048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/gcmsrc/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_en.md new file mode 100644 index 00000000000000..79586cc972568d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_guruji108 XlmRoBertaForTokenClassification from Guruji108 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_guruji108 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_guruji108` is an English model originally trained by Guruji108.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_guruji108_en_5.4.0_3.0_1718104703463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_guruji108_en_5.4.0_3.0_1718104703463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_guruji108","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_guruji108", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_guruji108| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_pipeline_en.md new file mode 100644 index 00000000000000..b02e0578519944 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_guruji108_pipeline pipeline XlmRoBertaForTokenClassification from Guruji108 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_guruji108_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_guruji108_pipeline` is an English model originally trained by Guruji108.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_guruji108_pipeline_en_5.4.0_3.0_1718104798218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_guruji108_pipeline_en_5.4.0_3.0_1718104798218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_guruji108_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_guruji108_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_guruji108_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_en.md new file mode 100644 index 00000000000000..995d926bfb334e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hash1360 XlmRoBertaForTokenClassification from Hash1360 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hash1360 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hash1360` is an English model originally trained by Hash1360.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hash1360_en_5.4.0_3.0_1718132057896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hash1360_en_5.4.0_3.0_1718132057896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hash1360","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hash1360", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hash1360| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Hash1360/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_pipeline_en.md new file mode 100644 index 00000000000000..6023f1e2434616 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hash1360_pipeline pipeline XlmRoBertaForTokenClassification from Hash1360 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hash1360_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hash1360_pipeline` is an English model originally trained by Hash1360.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hash1360_pipeline_en_5.4.0_3.0_1718132165879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hash1360_pipeline_en_5.4.0_3.0_1718132165879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hash1360_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hash1360_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hash1360_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Hash1360/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_en.md new file mode 100644 index 00000000000000..53b3f8d2867dff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hbtemari XlmRoBertaForTokenClassification from HBtemari +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hbtemari +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hbtemari` is an English model originally trained by HBtemari.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hbtemari_en_5.4.0_3.0_1718103003850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hbtemari_en_5.4.0_3.0_1718103003850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hbtemari","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hbtemari", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hbtemari| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/HBtemari/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline_en.md new file mode 100644 index 00000000000000..99afedf4971389 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline pipeline XlmRoBertaForTokenClassification from HBtemari +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline` is an English model originally trained by HBtemari.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline_en_5.4.0_3.0_1718103090701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline_en_5.4.0_3.0_1718103090701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/HBtemari/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_en.md new file mode 100644 index 00000000000000..3b4d12392032d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hhffxx XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hhffxx +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hhffxx` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hhffxx_en_5.4.0_3.0_1718106982560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hhffxx_en_5.4.0_3.0_1718106982560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hhffxx","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hhffxx", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hhffxx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.3 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline_en.md new file mode 100644 index 00000000000000..76613a6796d47a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline pipeline XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline_en_5.4.0_3.0_1718107065701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline_en_5.4.0_3.0_1718107065701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.3 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_en.md new file mode 100644 index 00000000000000..241969dd664da1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hitakura XlmRoBertaForTokenClassification from hitakura +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hitakura +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hitakura` is an English model originally trained by hitakura.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitakura_en_5.4.0_3.0_1718117876796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitakura_en_5.4.0_3.0_1718117876796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hitakura","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hitakura", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hitakura| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|471.4 MB| + +## References + +https://huggingface.co/hitakura/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_pipeline_en.md new file mode 100644 index 00000000000000..3ef1f75a47f906 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hitakura_pipeline pipeline XlmRoBertaForTokenClassification from hitakura +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hitakura_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hitakura_pipeline` is an English model originally trained by hitakura.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitakura_pipeline_en_5.4.0_3.0_1718117963205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitakura_pipeline_en_5.4.0_3.0_1718117963205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hitakura_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hitakura_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hitakura_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|471.5 MB| + +## References + +https://huggingface.co/hitakura/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_en.md new file mode 100644 index 00000000000000..178987011453a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hitoshinagaoka XlmRoBertaForTokenClassification from hitoshiNagaoka +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hitoshinagaoka +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hitoshinagaoka` is an English model originally trained by hitoshiNagaoka.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_en_5.4.0_3.0_1718127763830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_en_5.4.0_3.0_1718127763830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hitoshinagaoka","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hitoshinagaoka", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hitoshinagaoka| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/hitoshiNagaoka/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline_en.md new file mode 100644 index 00000000000000..3348132b753762 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline pipeline XlmRoBertaForTokenClassification from hitoshiNagaoka +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline` is an English model originally trained by hitoshiNagaoka.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline_en_5.4.0_3.0_1718127850435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline_en_5.4.0_3.0_1718127850435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/hitoshiNagaoka/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_en.md new file mode 100644 index 00000000000000..744b3b1449e130 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hyeonseo XlmRoBertaForTokenClassification from Hyeonseo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hyeonseo +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hyeonseo` is an English model originally trained by Hyeonseo.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hyeonseo_en_5.4.0_3.0_1718108936407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hyeonseo_en_5.4.0_3.0_1718108936407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hyeonseo","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hyeonseo", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hyeonseo| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|808.3 MB| + +## References + +https://huggingface.co/Hyeonseo/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline_en.md new file mode 100644 index 00000000000000..1de0eed145d634 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline pipeline XlmRoBertaForTokenClassification from Hyeonseo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline` is an English model originally trained by Hyeonseo.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline_en_5.4.0_3.0_1718109056653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline_en_5.4.0_3.0_1718109056653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|808.3 MB| + +## References + +https://huggingface.co/Hyeonseo/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_en.md new file mode 100644 index 00000000000000..b7a76a52007481 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_inniok XlmRoBertaForTokenClassification from inniok +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_inniok +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_inniok` is an English model originally trained by inniok.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_inniok_en_5.4.0_3.0_1718139190318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_inniok_en_5.4.0_3.0_1718139190318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_inniok","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_inniok", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_inniok| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/inniok/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_pipeline_en.md new file mode 100644 index 00000000000000..6aae24171a38e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_inniok_pipeline pipeline XlmRoBertaForTokenClassification from inniok +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_inniok_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_inniok_pipeline` is an English model originally trained by inniok.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_inniok_pipeline_en_5.4.0_3.0_1718139297331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_inniok_pipeline_en_5.4.0_3.0_1718139297331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_inniok_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_inniok_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_inniok_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/inniok/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_en.md new file mode 100644 index 00000000000000..682ad2899330dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jakobbrunner XlmRoBertaForTokenClassification from jakobBrunner +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jakobbrunner +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jakobbrunner` is an English model originally trained by jakobBrunner.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jakobbrunner_en_5.4.0_3.0_1718135163047.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jakobbrunner_en_5.4.0_3.0_1718135163047.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jakobbrunner","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jakobbrunner", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jakobbrunner| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/jakobBrunner/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline_en.md new file mode 100644 index 00000000000000..360982b79f14ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline pipeline XlmRoBertaForTokenClassification from jakobBrunner +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline` is an English model originally trained by jakobBrunner.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline_en_5.4.0_3.0_1718135258571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline_en_5.4.0_3.0_1718135258571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/jakobBrunner/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_en.md new file mode 100644 index 00000000000000..da4a1831fe6316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jjglilleberg XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jjglilleberg +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jjglilleberg` is an English model originally trained by jjglilleberg.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jjglilleberg_en_5.4.0_3.0_1718120147023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jjglilleberg_en_5.4.0_3.0_1718120147023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jjglilleberg","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jjglilleberg", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jjglilleberg| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline_en.md new file mode 100644 index 00000000000000..6bfd40ee56701e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline pipeline XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline` is an English model originally trained by jjglilleberg.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline_en_5.4.0_3.0_1718120254741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline_en_5.4.0_3.0_1718120254741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_en.md new file mode 100644 index 00000000000000..94dfcae1ba40cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_junghim XlmRoBertaForTokenClassification from Junghim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_junghim +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_junghim` is an English model originally trained by Junghim.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_junghim_en_5.4.0_3.0_1718115675144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_junghim_en_5.4.0_3.0_1718115675144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_junghim","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_junghim", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_junghim| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Junghim/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_pipeline_en.md new file mode 100644 index 00000000000000..2371599f22e8bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_junghim_pipeline pipeline XlmRoBertaForTokenClassification from Junghim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_junghim_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_junghim_pipeline` is an English model originally trained by Junghim.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_junghim_pipeline_en_5.4.0_3.0_1718115772572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_junghim_pipeline_en_5.4.0_3.0_1718115772572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_junghim_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_junghim_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_junghim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Junghim/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_en.md new file mode 100644 index 00000000000000..96283eecbedc30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jzwk XlmRoBertaForTokenClassification from Jzwk +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jzwk +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jzwk` is an English model originally trained by Jzwk.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jzwk_en_5.4.0_3.0_1718133041002.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jzwk_en_5.4.0_3.0_1718133041002.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jzwk","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jzwk", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jzwk| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|851.7 MB| + +## References + +https://huggingface.co/Jzwk/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_pipeline_en.md new file mode 100644 index 00000000000000..1040619c6d1752 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jzwk_pipeline pipeline XlmRoBertaForTokenClassification from Jzwk +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jzwk_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jzwk_pipeline` is an English model originally trained by Jzwk. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jzwk_pipeline_en_5.4.0_3.0_1718133134180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jzwk_pipeline_en_5.4.0_3.0_1718133134180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jzwk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jzwk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jzwk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|851.8 MB| + +## References + +https://huggingface.co/Jzwk/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_en.md new file mode 100644 index 00000000000000..7369b3030b5a03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_k_masaki XlmRoBertaForTokenClassification from k-masaki +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_k_masaki +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_k_masaki` is an English model originally trained by k-masaki. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_k_masaki_en_5.4.0_3.0_1718101962125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_k_masaki_en_5.4.0_3.0_1718101962125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_k_masaki","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_k_masaki", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_k_masaki| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/k-masaki/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline_en.md new file mode 100644 index 00000000000000..4f362bcebaf1cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline pipeline XlmRoBertaForTokenClassification from k-masaki +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline` is an English model originally trained by k-masaki. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline_en_5.4.0_3.0_1718102048238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline_en_5.4.0_3.0_1718102048238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/k-masaki/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_en.md new file mode 100644 index 00000000000000..b217176047ec0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_karolk XlmRoBertaForTokenClassification from KarolK +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_karolk +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_karolk` is an English model originally trained by KarolK. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_karolk_en_5.4.0_3.0_1718123398719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_karolk_en_5.4.0_3.0_1718123398719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_karolk","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_karolk", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_karolk| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|851.7 MB| + +## References + +https://huggingface.co/KarolK/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_pipeline_en.md new file mode 100644 index 00000000000000..6cfa60b222ebf2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_karolk_pipeline pipeline XlmRoBertaForTokenClassification from KarolK +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_karolk_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_karolk_pipeline` is an English model originally trained by KarolK. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_karolk_pipeline_en_5.4.0_3.0_1718123492213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_karolk_pipeline_en_5.4.0_3.0_1718123492213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_karolk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_karolk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_karolk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|851.8 MB| + +## References + +https://huggingface.co/KarolK/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_en.md new file mode 100644 index 00000000000000..9b6ce3b0499b9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_kenhoffman XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_kenhoffman +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_kenhoffman` is an English model originally trained by kenhoffman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_kenhoffman_en_5.4.0_3.0_1718132455810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_kenhoffman_en_5.4.0_3.0_1718132455810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_kenhoffman","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_kenhoffman", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_kenhoffman| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline_en.md new file mode 100644 index 00000000000000..c26f4ea797c31c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline pipeline XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline` is an English model originally trained by kenhoffman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline_en_5.4.0_3.0_1718132542460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline_en_5.4.0_3.0_1718132542460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_en.md new file mode 100644 index 00000000000000..8801ae0c40dd22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_khadija267 XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_khadija267 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_khadija267` is an English model originally trained by khadija267. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_khadija267_en_5.4.0_3.0_1718130099628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_khadija267_en_5.4.0_3.0_1718130099628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_khadija267","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_khadija267", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_khadija267| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_pipeline_en.md new file mode 100644 index 00000000000000..2ace6bf733f98d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_khadija267_pipeline pipeline XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_khadija267_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_khadija267_pipeline` is an English model originally trained by khadija267. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_khadija267_pipeline_en_5.4.0_3.0_1718130185906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_khadija267_pipeline_en_5.4.0_3.0_1718130185906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_khadija267_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_khadija267_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_khadija267_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_en.md new file mode 100644 index 00000000000000..753b2f7f801794 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_leotunganh XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_leotunganh +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_leotunganh` is an English model originally trained by LeoTungAnh. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_leotunganh_en_5.4.0_3.0_1718122360010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_leotunganh_en_5.4.0_3.0_1718122360010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_leotunganh","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_leotunganh", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_leotunganh| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|835.3 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline_en.md new file mode 100644 index 00000000000000..9727855b2219f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline pipeline XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline` is an English model originally trained by LeoTungAnh. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline_en_5.4.0_3.0_1718122474380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline_en_5.4.0_3.0_1718122474380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|835.3 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_en.md new file mode 100644 index 00000000000000..193790a9e611fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_likejazz XlmRoBertaForTokenClassification from likejazz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_likejazz +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_likejazz` is an English model originally trained by likejazz.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_likejazz_en_5.4.0_3.0_1718119556549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_likejazz_en_5.4.0_3.0_1718119556549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_likejazz","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_likejazz", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_likejazz| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|847.3 MB| + +## References + +https://huggingface.co/likejazz/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_pipeline_en.md new file mode 100644 index 00000000000000..84d6f627873cf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_likejazz_pipeline pipeline XlmRoBertaForTokenClassification from likejazz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_likejazz_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_likejazz_pipeline` is an English model originally trained by likejazz.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_likejazz_pipeline_en_5.4.0_3.0_1718119692892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_likejazz_pipeline_en_5.4.0_3.0_1718119692892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_likejazz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_likejazz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_likejazz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|847.3 MB| + +## References + +https://huggingface.co/likejazz/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_en.md new file mode 100644 index 00000000000000..8f7d74a5431320 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_matovu_ronald XlmRoBertaForTokenClassification from matovu-ronald +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_matovu_ronald +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_matovu_ronald` is an English model originally trained by matovu-ronald.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_matovu_ronald_en_5.4.0_3.0_1718132072689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_matovu_ronald_en_5.4.0_3.0_1718132072689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_matovu_ronald","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_matovu_ronald", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_matovu_ronald| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/matovu-ronald/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline_en.md new file mode 100644 index 00000000000000..29ef379fe3ec59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline pipeline XlmRoBertaForTokenClassification from matovu-ronald +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline` is an English model originally trained by matovu-ronald.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline_en_5.4.0_3.0_1718132181729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline_en_5.4.0_3.0_1718132181729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/matovu-ronald/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_en.md new file mode 100644 index 00000000000000..b2a2ec2a63bfbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_maxnet XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_maxnet +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_maxnet` is an English model originally trained by Maxnet.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_maxnet_en_5.4.0_3.0_1718126525706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_maxnet_en_5.4.0_3.0_1718126525706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_maxnet","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_maxnet", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_maxnet| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_pipeline_en.md new file mode 100644 index 00000000000000..e182e3ff4a1649 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_maxnet_pipeline pipeline XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_maxnet_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_maxnet_pipeline` is an English model originally trained by Maxnet.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_maxnet_pipeline_en_5.4.0_3.0_1718126622720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_maxnet_pipeline_en_5.4.0_3.0_1718126622720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_maxnet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_maxnet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_maxnet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_en.md new file mode 100644 index 00000000000000..c6dccd1ab60648 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mj03 XlmRoBertaForTokenClassification from MJ03 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mj03 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mj03` is an English model originally trained by MJ03.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mj03_en_5.4.0_3.0_1718126523601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mj03_en_5.4.0_3.0_1718126523601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mj03","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mj03", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mj03| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/MJ03/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_pipeline_en.md new file mode 100644 index 00000000000000..fc53c6cda4c1ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mj03_pipeline pipeline XlmRoBertaForTokenClassification from MJ03 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mj03_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mj03_pipeline` is an English model originally trained by MJ03.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mj03_pipeline_en_5.4.0_3.0_1718126620244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mj03_pipeline_en_5.4.0_3.0_1718126620244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mj03_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mj03_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mj03_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/MJ03/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_en.md new file mode 100644 index 00000000000000..2877b109bc27bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mlewand XlmRoBertaForTokenClassification from mlewand +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mlewand +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mlewand` is an English model originally trained by mlewand.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mlewand_en_5.4.0_3.0_1718112538480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mlewand_en_5.4.0_3.0_1718112538480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mlewand","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mlewand", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mlewand| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mlewand/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_pipeline_en.md new file mode 100644 index 00000000000000..6df8610ab38ea4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mlewand_pipeline pipeline XlmRoBertaForTokenClassification from mlewand +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mlewand_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mlewand_pipeline` is an English model originally trained by mlewand.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mlewand_pipeline_en_5.4.0_3.0_1718112625634.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mlewand_pipeline_en_5.4.0_3.0_1718112625634.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mlewand_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mlewand_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mlewand_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mlewand/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_en.md new file mode 100644 index 00000000000000..c9e0721549dd39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mmiketan XlmRoBertaForTokenClassification from mmiketan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mmiketan +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mmiketan` is an English model originally trained by mmiketan.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mmiketan_en_5.4.0_3.0_1718101894809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mmiketan_en_5.4.0_3.0_1718101894809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mmiketan","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mmiketan", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mmiketan| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mmiketan/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline_en.md new file mode 100644 index 00000000000000..27910aa260e048 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline pipeline XlmRoBertaForTokenClassification from mmiketan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline` is an English model originally trained by mmiketan. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline_en_5.4.0_3.0_1718101981999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline_en_5.4.0_3.0_1718101981999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mmiketan/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_en.md new file mode 100644 index 00000000000000..73a08696e5e6db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mooface XlmRoBertaForTokenClassification from mooface +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mooface +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mooface` is an English model originally trained by mooface. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mooface_en_5.4.0_3.0_1718119981574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mooface_en_5.4.0_3.0_1718119981574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mooface","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mooface", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mooface| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mooface/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_pipeline_en.md new file mode 100644 index 00000000000000..3bcfba7dc76d9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mooface_pipeline pipeline XlmRoBertaForTokenClassification from mooface +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mooface_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mooface_pipeline` is an English model originally trained by mooface. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mooface_pipeline_en_5.4.0_3.0_1718120069280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mooface_pipeline_en_5.4.0_3.0_1718120069280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mooface_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mooface_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mooface_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mooface/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_en.md new file mode 100644 index 00000000000000..50651616708b98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_msrisrujan XlmRoBertaForTokenClassification from Msrisrujan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_msrisrujan +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_msrisrujan` is an English model originally trained by Msrisrujan. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_msrisrujan_en_5.4.0_3.0_1718100404032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_msrisrujan_en_5.4.0_3.0_1718100404032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_msrisrujan","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_msrisrujan", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_msrisrujan| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Msrisrujan/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline_en.md new file mode 100644 index 00000000000000..8c9401f0fd48e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline pipeline XlmRoBertaForTokenClassification from Msrisrujan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline` is an English model originally trained by Msrisrujan. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline_en_5.4.0_3.0_1718100501761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline_en_5.4.0_3.0_1718100501761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Msrisrujan/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_en.md new file mode 100644 index 00000000000000..e564b65f356b77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_myasa XlmRoBertaForTokenClassification from myasa +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_myasa +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_myasa` is an English model originally trained by myasa. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_myasa_en_5.4.0_3.0_1718118469660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_myasa_en_5.4.0_3.0_1718118469660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_myasa","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_myasa", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_myasa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/myasa/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_pipeline_en.md new file mode 100644 index 00000000000000..ff384daefa995d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_myasa_pipeline pipeline XlmRoBertaForTokenClassification from myasa +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_myasa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_myasa_pipeline` is an English model originally trained by myasa. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_myasa_pipeline_en_5.4.0_3.0_1718118556747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_myasa_pipeline_en_5.4.0_3.0_1718118556747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_myasa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_myasa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_myasa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/myasa/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_en.md new file mode 100644 index 00000000000000..c5bbb0096da25b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_neha2608 XlmRoBertaForTokenClassification from Neha2608 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_neha2608 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_neha2608` is an English model originally trained by Neha2608. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_neha2608_en_5.4.0_3.0_1718105115537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_neha2608_en_5.4.0_3.0_1718105115537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_neha2608","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_neha2608", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_neha2608| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Neha2608/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_pipeline_en.md new file mode 100644 index 00000000000000..47c3bf7abb6b81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_neha2608_pipeline pipeline XlmRoBertaForTokenClassification from Neha2608 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_neha2608_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_neha2608_pipeline` is an English model originally trained by Neha2608. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_neha2608_pipeline_en_5.4.0_3.0_1718105206370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_neha2608_pipeline_en_5.4.0_3.0_1718105206370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_neha2608_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_neha2608_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_neha2608_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Neha2608/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_en.md new file mode 100644 index 00000000000000..631f3a771b673c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_nokomoro3 XlmRoBertaForTokenClassification from nokomoro3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_nokomoro3 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_nokomoro3` is an English model originally trained by nokomoro3. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nokomoro3_en_5.4.0_3.0_1718106230902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nokomoro3_en_5.4.0_3.0_1718106230902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_nokomoro3","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_nokomoro3", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_nokomoro3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/nokomoro3/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline_en.md new file mode 100644 index 00000000000000..cf78a75a877486 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline pipeline XlmRoBertaForTokenClassification from nokomoro3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline` is an English model originally trained by nokomoro3. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline_en_5.4.0_3.0_1718106317147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline_en_5.4.0_3.0_1718106317147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/nokomoro3/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_en.md new file mode 100644 index 00000000000000..03b426721007e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ntaka XlmRoBertaForTokenClassification from ntaka +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ntaka +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ntaka` is an English model originally trained by ntaka.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ntaka_en_5.4.0_3.0_1718105820330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ntaka_en_5.4.0_3.0_1718105820330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ntaka","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ntaka", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ntaka| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ntaka/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_pipeline_en.md new file mode 100644 index 00000000000000..00d8d5f899711d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ntaka_pipeline pipeline XlmRoBertaForTokenClassification from ntaka +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ntaka_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ntaka_pipeline` is an English model originally trained by ntaka.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ntaka_pipeline_en_5.4.0_3.0_1718105908847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ntaka_pipeline_en_5.4.0_3.0_1718105908847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ntaka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ntaka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ntaka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ntaka/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_en.md new file mode 100644 index 00000000000000..dcb2f36ab63c07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_omersubasi XlmRoBertaForTokenClassification from omersubasi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_omersubasi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_omersubasi` is an English model originally trained by omersubasi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_omersubasi_en_5.4.0_3.0_1718115741102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_omersubasi_en_5.4.0_3.0_1718115741102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_omersubasi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_omersubasi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_omersubasi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/omersubasi/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline_en.md new file mode 100644 index 00000000000000..52912ee9d525a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline pipeline XlmRoBertaForTokenClassification from omersubasi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline` is an English model originally trained by omersubasi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline_en_5.4.0_3.0_1718115828571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline_en_5.4.0_3.0_1718115828571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/omersubasi/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_en.md new file mode 100644 index 00000000000000..e10c8208b17859 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_otrturn XlmRoBertaForTokenClassification from otrturn +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_otrturn +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_otrturn` is an English model originally trained by otrturn.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_otrturn_en_5.4.0_3.0_1718112141420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_otrturn_en_5.4.0_3.0_1718112141420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_otrturn","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_otrturn", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_otrturn| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/otrturn/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_pipeline_en.md new file mode 100644 index 00000000000000..289dc30b13b20a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_otrturn_pipeline pipeline XlmRoBertaForTokenClassification from otrturn +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_otrturn_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_otrturn_pipeline` is an English model originally trained by otrturn.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_otrturn_pipeline_en_5.4.0_3.0_1718112254043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_otrturn_pipeline_en_5.4.0_3.0_1718112254043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_otrturn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_otrturn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_otrturn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/otrturn/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_en.md new file mode 100644 index 00000000000000..cd6199669f8dbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_philosucker XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_philosucker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_philosucker` is an English model originally trained by philosucker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_philosucker_en_5.4.0_3.0_1718108315180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_philosucker_en_5.4.0_3.0_1718108315180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_philosucker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_philosucker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_philosucker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_pipeline_en.md new file mode 100644 index 00000000000000..4616dc9bd5c034 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_philosucker_pipeline pipeline XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_philosucker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_philosucker_pipeline` is an English model originally trained by philosucker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_philosucker_pipeline_en_5.4.0_3.0_1718108394682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_philosucker_pipeline_en_5.4.0_3.0_1718108394682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_philosucker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_philosucker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_philosucker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_en.md new file mode 100644 index 00000000000000..65d88b6b560deb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_qilin1 XlmRoBertaForTokenClassification from qilin1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_qilin1 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_qilin1` is an English model originally trained by qilin1.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_qilin1_en_5.4.0_3.0_1718123208307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_qilin1_en_5.4.0_3.0_1718123208307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_qilin1","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_qilin1", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_qilin1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.5 MB| + +## References + +https://huggingface.co/qilin1/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_pipeline_en.md new file mode 100644 index 00000000000000..3fa86b4b2cba6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_qilin1_pipeline pipeline XlmRoBertaForTokenClassification from qilin1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_qilin1_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_qilin1_pipeline` is an English model originally trained by qilin1.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_qilin1_pipeline_en_5.4.0_3.0_1718123289809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_qilin1_pipeline_en_5.4.0_3.0_1718123289809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_qilin1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_qilin1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_qilin1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.5 MB| + +## References + +https://huggingface.co/qilin1/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_en.md new file mode 100644 index 00000000000000..39f058ab21f6fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_reinoudbosch XlmRoBertaForTokenClassification from reinoudbosch +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_reinoudbosch +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_reinoudbosch` is an English model originally trained by reinoudbosch.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_reinoudbosch_en_5.4.0_3.0_1718098416619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_reinoudbosch_en_5.4.0_3.0_1718098416619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_reinoudbosch","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_reinoudbosch", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_reinoudbosch| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/reinoudbosch/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline_en.md new file mode 100644 index 00000000000000..bfc46e10710746 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline pipeline XlmRoBertaForTokenClassification from reinoudbosch +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline` is an English model originally trained by reinoudbosch. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline_en_5.4.0_3.0_1718098502695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline_en_5.4.0_3.0_1718098502695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/reinoudbosch/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_en.md new file mode 100644 index 00000000000000..8509437f1c75f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_robinschaefer XlmRoBertaForTokenClassification from RobinSchaefer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_robinschaefer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_robinschaefer` is an English model originally trained by RobinSchaefer. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_robinschaefer_en_5.4.0_3.0_1718121948130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_robinschaefer_en_5.4.0_3.0_1718121948130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_robinschaefer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_robinschaefer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_robinschaefer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|839.7 MB| + +## References + +https://huggingface.co/RobinSchaefer/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline_en.md new file mode 100644 index 00000000000000..09b11423260682 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline pipeline XlmRoBertaForTokenClassification from RobinSchaefer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline` is an English model originally trained by RobinSchaefer. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline_en_5.4.0_3.0_1718122064053.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline_en_5.4.0_3.0_1718122064053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|839.7 MB| + +## References + +https://huggingface.co/RobinSchaefer/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_en.md new file mode 100644 index 00000000000000..c62f62052cbbda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sungkwangjoong XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sungkwangjoong +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_sungkwangjoong` is an English model originally trained by sungkwangjoong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungkwangjoong_en_5.4.0_3.0_1718124060651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungkwangjoong_en_5.4.0_3.0_1718124060651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sungkwangjoong","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sungkwangjoong", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sungkwangjoong| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline_en.md new file mode 100644 index 00000000000000..258d48f58a1b70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline pipeline XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline` is an English model originally trained by sungkwangjoong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline_en_5.4.0_3.0_1718124181292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline_en_5.4.0_3.0_1718124181292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_en.md new file mode 100644 index 00000000000000..d262d2798bd3b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sungwoo1 XlmRoBertaForTokenClassification from sungwoo1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sungwoo1 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_sungwoo1` is an English model originally trained by sungwoo1. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungwoo1_en_5.4.0_3.0_1718116194052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungwoo1_en_5.4.0_3.0_1718116194052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sungwoo1","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sungwoo1", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sungwoo1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/sungwoo1/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline_en.md new file mode 100644 index 00000000000000..a901b8ae4ea32a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline pipeline XlmRoBertaForTokenClassification from sungwoo1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline` is an English model originally trained by sungwoo1. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline_en_5.4.0_3.0_1718116283936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline_en_5.4.0_3.0_1718116283936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/sungwoo1/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_en.md new file mode 100644 index 00000000000000..d313391d38aa1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_team_nave XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_team_nave +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_team_nave` is an English model originally trained by team-nave. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_team_nave_en_5.4.0_3.0_1718110331261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_team_nave_en_5.4.0_3.0_1718110331261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_team_nave","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_team_nave", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_team_nave| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|851.7 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_pipeline_en.md new file mode 100644 index 00000000000000..480deaeb34855f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_team_nave_pipeline pipeline XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_team_nave_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_team_nave_pipeline` is an English model originally trained by team-nave. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_team_nave_pipeline_en_5.4.0_3.0_1718110427059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_team_nave_pipeline_en_5.4.0_3.0_1718110427059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_team_nave_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_team_nave_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_team_nave_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|851.8 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_en.md new file mode 100644 index 00000000000000..ce9ef022daace0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_udon3 XlmRoBertaForTokenClassification from udon3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_udon3 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_udon3` is an English model originally trained by udon3. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_udon3_en_5.4.0_3.0_1718128811940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_udon3_en_5.4.0_3.0_1718128811940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_udon3","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_udon3", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_udon3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/udon3/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_pipeline_en.md new file mode 100644 index 00000000000000..71732b4f71d7ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_udon3_pipeline pipeline XlmRoBertaForTokenClassification from udon3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_udon3_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_udon3_pipeline` is an English model originally trained by udon3. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_udon3_pipeline_en_5.4.0_3.0_1718128899502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_udon3_pipeline_en_5.4.0_3.0_1718128899502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_udon3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_udon3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_udon3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/udon3/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_en.md new file mode 100644 index 00000000000000..079979dd16964f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_v3rx2000 XlmRoBertaForTokenClassification from V3RX2000 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_v3rx2000 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_v3rx2000` is an English model originally trained by V3RX2000. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_v3rx2000_en_5.4.0_3.0_1718098454232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_v3rx2000_en_5.4.0_3.0_1718098454232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_v3rx2000","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_v3rx2000", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_v3rx2000| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/V3RX2000/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline_en.md new file mode 100644 index 00000000000000..891ffce3c32dee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline pipeline XlmRoBertaForTokenClassification from V3RX2000 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline` is an English model originally trained by V3RX2000. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline_en_5.4.0_3.0_1718098541332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline_en_5.4.0_3.0_1718098541332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/V3RX2000/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_en.md new file mode 100644 index 00000000000000..5bbae6e48fd077 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_wilcomply XlmRoBertaForTokenClassification from wilcomply +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_wilcomply +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_wilcomply` is an English model originally trained by wilcomply. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_wilcomply_en_5.4.0_3.0_1718121226377.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_wilcomply_en_5.4.0_3.0_1718121226377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_wilcomply","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_wilcomply", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_wilcomply| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/wilcomply/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline_en.md new file mode 100644 index 00000000000000..6959cc9937b11c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline pipeline XlmRoBertaForTokenClassification from wilcomply +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline` is an English model originally trained by wilcomply. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline_en_5.4.0_3.0_1718121312919.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline_en_5.4.0_3.0_1718121312919.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/wilcomply/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_en.md new file mode 100644 index 00000000000000..fddf179aeac293 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_xyfigo XlmRoBertaForTokenClassification from xyfigo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_xyfigo +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_xyfigo` is an English model originally trained by xyfigo. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_xyfigo_en_5.4.0_3.0_1718101523698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_xyfigo_en_5.4.0_3.0_1718101523698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_xyfigo","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_xyfigo", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_xyfigo| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/xyfigo/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline_en.md new file mode 100644 index 00000000000000..5fd65eb9964569 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline pipeline XlmRoBertaForTokenClassification from xyfigo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline` is an English model originally trained by xyfigo. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline_en_5.4.0_3.0_1718101610349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline_en_5.4.0_3.0_1718101610349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/xyfigo/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_en.md new file mode 100644 index 00000000000000..bc45d2cff4f483 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_yong_sik XlmRoBertaForTokenClassification from Yong-Sik +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_yong_sik +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_yong_sik` is an English model originally trained by Yong-Sik. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yong_sik_en_5.4.0_3.0_1718119569027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yong_sik_en_5.4.0_3.0_1718119569027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_yong_sik","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_yong_sik", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_yong_sik| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Yong-Sik/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline_en.md new file mode 100644 index 00000000000000..65da6e3df76ff2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline pipeline XlmRoBertaForTokenClassification from Yong-Sik +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline` is an English model originally trained by Yong-Sik. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline_en_5.4.0_3.0_1718119675272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline_en_5.4.0_3.0_1718119675272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Yong-Sik/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_en.md new file mode 100644 index 00000000000000..5bf304746ebab3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_yyabuki XlmRoBertaForTokenClassification from yyabuki +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_yyabuki +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_yyabuki` is an English model originally trained by yyabuki. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yyabuki_en_5.4.0_3.0_1718121921557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yyabuki_en_5.4.0_3.0_1718121921557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_yyabuki","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_yyabuki", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_yyabuki| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/yyabuki/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline_en.md new file mode 100644 index 00000000000000..c09f947bcf81c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline pipeline XlmRoBertaForTokenClassification from yyabuki +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline` is an English model originally trained by yyabuki. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline_en_5.4.0_3.0_1718122014776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline_en_5.4.0_3.0_1718122014776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/yyabuki/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_en.md new file mode 100644 index 00000000000000..9df0c4f0d0395b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_marathi_marh XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_marathi_marh +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_hindi_marathi_marh` is an English model originally trained by neelrr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_marathi_marh_en_5.4.0_3.0_1718103640468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_marathi_marh_en_5.4.0_3.0_1718103640468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_marathi_marh","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_marathi_marh", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_marathi_marh| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|839.8 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-hi-mr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline_en.md new file mode 100644 index 00000000000000..11dbdabdd59739 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline pipeline XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline` is an English model originally trained by neelrr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline_en_5.4.0_3.0_1718103729015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline_en_5.4.0_3.0_1718103729015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|839.8 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-hi-mr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_en.md new file mode 100644 index 00000000000000..0eb7a089823c1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_neelrr XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_neelrr +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_hindi_neelrr` is an English model originally trained by neelrr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_neelrr_en_5.4.0_3.0_1718102561247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_neelrr_en_5.4.0_3.0_1718102561247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_neelrr","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_neelrr", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_neelrr| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|834.6 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline_en.md new file mode 100644 index 00000000000000..178d4b19a26458 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline pipeline XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline` is an English model originally trained by neelrr.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline_en_5.4.0_3.0_1718102667469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline_en_5.4.0_3.0_1718102667469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|834.6 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-hi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_en.md new file mode 100644 index 00000000000000..38b0c7dbb91aa2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_100yen XlmRoBertaForTokenClassification from 100yen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_100yen +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_100yen` is an English model originally trained by 100yen.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_100yen_en_5.4.0_3.0_1718130199192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_100yen_en_5.4.0_3.0_1718130199192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_100yen","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_100yen", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_100yen| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/100yen/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_pipeline_en.md new file mode 100644 index 00000000000000..8a2c8c194eb414 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_100yen_pipeline pipeline XlmRoBertaForTokenClassification from 100yen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_100yen_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_100yen_pipeline` is an English model originally trained by 100yen.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_100yen_pipeline_en_5.4.0_3.0_1718130323017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_100yen_pipeline_en_5.4.0_3.0_1718130323017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_100yen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_100yen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_100yen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/100yen/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_en.md new file mode 100644 index 00000000000000..7f6062fc25fb48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_54data XlmRoBertaForTokenClassification from 54data +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_54data +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_54data` is an English model originally trained by 54data.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_54data_en_5.4.0_3.0_1718113320370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_54data_en_5.4.0_3.0_1718113320370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_54data","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_54data", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_54data| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/54data/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_pipeline_en.md new file mode 100644 index 00000000000000..91edaf08f771b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_54data_pipeline pipeline XlmRoBertaForTokenClassification from 54data +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_54data_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_54data_pipeline` is an English model originally trained by 54data.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_54data_pipeline_en_5.4.0_3.0_1718113447338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_54data_pipeline_en_5.4.0_3.0_1718113447338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_54data_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_54data_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_54data_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/54data/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_en.md new file mode 100644 index 00000000000000..55147169de0811 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_amartyobanerjee XlmRoBertaForTokenClassification from amartyobanerjee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_amartyobanerjee +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_amartyobanerjee` is an English model originally trained by amartyobanerjee.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_en_5.4.0_3.0_1718124457792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_en_5.4.0_3.0_1718124457792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_amartyobanerjee","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_amartyobanerjee", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_amartyobanerjee| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/amartyobanerjee/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline_en.md new file mode 100644 index 00000000000000..d0ddf70bcc8954 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline pipeline XlmRoBertaForTokenClassification from amartyobanerjee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline` is an English model originally trained by amartyobanerjee.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline_en_5.4.0_3.0_1718124566686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline_en_5.4.0_3.0_1718124566686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/amartyobanerjee/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_en.md new file mode 100644 index 00000000000000..942c84cbb42121 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_bluetree99 XlmRoBertaForTokenClassification from bluetree99 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_bluetree99 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_bluetree99` is an English model originally trained by bluetree99.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bluetree99_en_5.4.0_3.0_1718110050944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bluetree99_en_5.4.0_3.0_1718110050944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bluetree99","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bluetree99", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_bluetree99| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/bluetree99/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline_en.md new file mode 100644 index 00000000000000..14f0e7549cdf00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline pipeline XlmRoBertaForTokenClassification from bluetree99 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline` is an English model originally trained by bluetree99.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline_en_5.4.0_3.0_1718110162677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline_en_5.4.0_3.0_1718110162677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/bluetree99/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_en.md new file mode 100644 index 00000000000000..e964ca7ffdb1ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_bobojjhh XlmRoBertaForTokenClassification from bobojjhh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_bobojjhh +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_bobojjhh` is an English model originally trained by bobojjhh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bobojjhh_en_5.4.0_3.0_1718130483717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bobojjhh_en_5.4.0_3.0_1718130483717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bobojjhh","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bobojjhh", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_bobojjhh| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/bobojjhh/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline_en.md new file mode 100644 index 00000000000000..c9269063f20c95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline pipeline XlmRoBertaForTokenClassification from bobojjhh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline` is an English model originally trained by bobojjhh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline_en_5.4.0_3.0_1718130592374.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline_en_5.4.0_3.0_1718130592374.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/bobojjhh/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_en.md new file mode 100644 index 00000000000000..5ed9057d0b4bf5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_cataluna84 XlmRoBertaForTokenClassification from cataluna84 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_cataluna84 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_cataluna84` is an English model originally trained by cataluna84.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cataluna84_en_5.4.0_3.0_1718117840298.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cataluna84_en_5.4.0_3.0_1718117840298.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_cataluna84","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_cataluna84", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_cataluna84| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/cataluna84/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline_en.md new file mode 100644 index 00000000000000..078c1e6871d1ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline pipeline XlmRoBertaForTokenClassification from cataluna84 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline` is an English model originally trained by cataluna84.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline_en_5.4.0_3.0_1718117949402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline_en_5.4.0_3.0_1718117949402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/cataluna84/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_en.md new file mode 100644 index 00000000000000..b4ee629c276ab7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_cyycyy XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_cyycyy +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_cyycyy` is an English model originally trained by cyycyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cyycyy_en_5.4.0_3.0_1718105827382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cyycyy_en_5.4.0_3.0_1718105827382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_cyycyy","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_cyycyy", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_cyycyy| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline_en.md new file mode 100644 index 00000000000000..d564b08922d26f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline pipeline XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline` is an English model originally trained by cyycyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline_en_5.4.0_3.0_1718105939331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline_en_5.4.0_3.0_1718105939331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_en.md new file mode 100644 index 00000000000000..585fba4fb41397 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_derekbear XlmRoBertaForTokenClassification from derekbear +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_derekbear +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_derekbear` is an English model originally trained by derekbear.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_derekbear_en_5.4.0_3.0_1718113929190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_derekbear_en_5.4.0_3.0_1718113929190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_derekbear","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_derekbear", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_derekbear| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/derekbear/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline_en.md new file mode 100644 index 00000000000000..1a0da6fefadfaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline pipeline XlmRoBertaForTokenClassification from derekbear +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline` is an English model originally trained by derekbear.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline_en_5.4.0_3.0_1718114038416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline_en_5.4.0_3.0_1718114038416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/derekbear/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_en.md new file mode 100644 index 00000000000000..f17d820d95e91b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_dkasti XlmRoBertaForTokenClassification from dkasti +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_dkasti +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_dkasti` is an English model originally trained by dkasti.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_dkasti_en_5.4.0_3.0_1718107907652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_dkasti_en_5.4.0_3.0_1718107907652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_dkasti","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_dkasti", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_dkasti| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/dkasti/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline_en.md new file mode 100644 index 00000000000000..52d65ae74bcfd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline pipeline XlmRoBertaForTokenClassification from dkasti +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline` is an English model originally trained by dkasti.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline_en_5.4.0_3.0_1718108023369.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline_en_5.4.0_3.0_1718108023369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/dkasti/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_en.md new file mode 100644 index 00000000000000..132fa88d4c7d5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_drigb XlmRoBertaForTokenClassification from drigb +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_drigb +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_drigb` is an English model originally trained by drigb. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_drigb_en_5.4.0_3.0_1718113318010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_drigb_en_5.4.0_3.0_1718113318010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_drigb","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_drigb", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_drigb| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/drigb/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_pipeline_en.md new file mode 100644 index 00000000000000..5f73c86660250e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_drigb_pipeline pipeline XlmRoBertaForTokenClassification from drigb +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_drigb_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_drigb_pipeline` is an English model originally trained by drigb. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_drigb_pipeline_en_5.4.0_3.0_1718113435296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_drigb_pipeline_en_5.4.0_3.0_1718113435296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_drigb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_drigb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_drigb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/drigb/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_en.md new file mode 100644 index 00000000000000..35922f7776b508 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_gcmsrc XlmRoBertaForTokenClassification from gcmsrc +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_gcmsrc +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_gcmsrc` is an English model originally trained by gcmsrc. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gcmsrc_en_5.4.0_3.0_1718099807507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gcmsrc_en_5.4.0_3.0_1718099807507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_gcmsrc","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_gcmsrc", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_gcmsrc| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/gcmsrc/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline_en.md new file mode 100644 index 00000000000000..2c05022b27647f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline pipeline XlmRoBertaForTokenClassification from gcmsrc +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline` is an English model originally trained by gcmsrc. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline_en_5.4.0_3.0_1718099919437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline_en_5.4.0_3.0_1718099919437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/gcmsrc/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_en.md new file mode 100644 index 00000000000000..649ef3f8edcd94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_gogd XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_gogd +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_gogd` is an English model originally trained by GoGD. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gogd_en_5.4.0_3.0_1718120899037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gogd_en_5.4.0_3.0_1718120899037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_gogd","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_gogd", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_gogd| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_pipeline_en.md new file mode 100644 index 00000000000000..733cfd0ec71d43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_gogd_pipeline pipeline XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_gogd_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_gogd_pipeline` is an English model originally trained by GoGD. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gogd_pipeline_en_5.4.0_3.0_1718121022052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gogd_pipeline_en_5.4.0_3.0_1718121022052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_gogd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_gogd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_gogd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_en.md new file mode 100644 index 00000000000000..664068f7cb4d98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech XlmRoBertaForTokenClassification from h-radiolo-tech +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech` is an English model originally trained by h-radiolo-tech. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_en_5.4.0_3.0_1718128230625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_en_5.4.0_3.0_1718128230625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/h-radiolo-tech/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline_en.md new file mode 100644 index 00000000000000..41b56edddadec1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline pipeline XlmRoBertaForTokenClassification from h-radiolo-tech +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline` is an English model originally trained by h-radiolo-tech. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline_en_5.4.0_3.0_1718128339981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline_en_5.4.0_3.0_1718128339981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/h-radiolo-tech/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_en.md new file mode 100644 index 00000000000000..bfb9bafda0d0e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_hhffxx XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_hhffxx +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_hhffxx` is an English model originally trained by hhffxx. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hhffxx_en_5.4.0_3.0_1718114548530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hhffxx_en_5.4.0_3.0_1718114548530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_hhffxx","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_hhffxx", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_hhffxx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|839.8 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline_en.md new file mode 100644 index 00000000000000..ac2dbf4ca5381f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline pipeline XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline` is an English model originally trained by hhffxx. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline_en_5.4.0_3.0_1718114643972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline_en_5.4.0_3.0_1718114643972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|839.8 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_en.md new file mode 100644 index 00000000000000..048d30fb8ccb9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_isaacp XlmRoBertaForTokenClassification from Isaacp +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_isaacp +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_isaacp` is an English model originally trained by Isaacp. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_isaacp_en_5.4.0_3.0_1718103653693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_isaacp_en_5.4.0_3.0_1718103653693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_isaacp","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_isaacp", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_isaacp| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Isaacp/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline_en.md new file mode 100644 index 00000000000000..36a8f9e1cda187 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline pipeline XlmRoBertaForTokenClassification from Isaacp +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline` is an English model originally trained by Isaacp.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline_en_5.4.0_3.0_1718103763180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline_en_5.4.0_3.0_1718103763180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Isaacp/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_en.md new file mode 100644 index 00000000000000..9a4e58dfc820c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu` is an English model originally trained by LaurentiuStancioiu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_en_5.4.0_3.0_1718114611950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_en_5.4.0_3.0_1718114611950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline_en.md new file mode 100644 index 00000000000000..78f6f3fa9ef30d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline pipeline XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline` is an English model originally trained by LaurentiuStancioiu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718114727595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718114727595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_en.md new file mode 100644 index 00000000000000..330f7caefe4e3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_leotunganh XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_leotunganh +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_leotunganh` is an English model originally trained by LeoTungAnh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_leotunganh_en_5.4.0_3.0_1718124605948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_leotunganh_en_5.4.0_3.0_1718124605948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_leotunganh","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_leotunganh", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_leotunganh| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|815.0 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline_en.md new file mode 100644 index 00000000000000..aec39ab2a81d74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline pipeline XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline` is an English model originally trained by LeoTungAnh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline_en_5.4.0_3.0_1718124729135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline_en_5.4.0_3.0_1718124729135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|815.0 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_en.md new file mode 100644 index 00000000000000..a7ae043c90e949 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_malduwais XlmRoBertaForTokenClassification from malduwais +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_malduwais +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_malduwais` is an English model originally trained by malduwais.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_malduwais_en_5.4.0_3.0_1718125293878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_malduwais_en_5.4.0_3.0_1718125293878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_malduwais","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_malduwais", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_malduwais| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|836.1 MB| + +## References + +https://huggingface.co/malduwais/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline_en.md new file mode 100644 index 00000000000000..6e7ee773a5ce25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline pipeline XlmRoBertaForTokenClassification from malduwais +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline` is an English model originally trained by malduwais.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline_en_5.4.0_3.0_1718125403695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline_en_5.4.0_3.0_1718125403695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|836.1 MB| + +## References + +https://huggingface.co/malduwais/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_en.md new file mode 100644 index 00000000000000..d3d198fa08bcd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_maxnet XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_maxnet +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_maxnet` is an English model originally trained by Maxnet.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_maxnet_en_5.4.0_3.0_1718124012834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_maxnet_en_5.4.0_3.0_1718124012834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_maxnet","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_maxnet", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_maxnet| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline_en.md new file mode 100644 index 00000000000000..2ad0cb98042edf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline pipeline XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline` is an English model originally trained by Maxnet.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline_en_5.4.0_3.0_1718124121656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline_en_5.4.0_3.0_1718124121656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_en.md new file mode 100644 index 00000000000000..a42dc8a440ca62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_monkdalma XlmRoBertaForTokenClassification from MonkDalma +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_monkdalma +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_monkdalma` is an English model originally trained by MonkDalma.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_monkdalma_en_5.4.0_3.0_1718116073674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_monkdalma_en_5.4.0_3.0_1718116073674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_monkdalma","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_monkdalma", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_monkdalma| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/MonkDalma/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline_en.md new file mode 100644 index 00000000000000..ee5478d0732da4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline pipeline XlmRoBertaForTokenClassification from MonkDalma +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline` is an English model originally trained by MonkDalma.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline_en_5.4.0_3.0_1718116183098.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline_en_5.4.0_3.0_1718116183098.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/MonkDalma/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_en.md new file mode 100644 index 00000000000000..0b81074deb0795 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_msrisrujan XlmRoBertaForTokenClassification from Msrisrujan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_msrisrujan +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_msrisrujan` is an English model originally trained by Msrisrujan.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_msrisrujan_en_5.4.0_3.0_1718101574116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_msrisrujan_en_5.4.0_3.0_1718101574116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_msrisrujan","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_msrisrujan", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_msrisrujan| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Msrisrujan/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline_en.md new file mode 100644 index 00000000000000..f5ef9107e91769 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline pipeline XlmRoBertaForTokenClassification from Msrisrujan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline` is an English model originally trained by Msrisrujan.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline_en_5.4.0_3.0_1718101683333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline_en_5.4.0_3.0_1718101683333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Msrisrujan/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_en.md new file mode 100644 index 00000000000000..7fa83d9b9775e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_obong XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_obong +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_obong` is an English model originally trained by obong.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_obong_en_5.4.0_3.0_1718117667802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_obong_en_5.4.0_3.0_1718117667802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_obong","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_obong", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_obong| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_pipeline_en.md new file mode 100644 index 00000000000000..d7d7891734336d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_obong_pipeline pipeline XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_obong_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_obong_pipeline` is an English model originally trained by obong.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_obong_pipeline_en_5.4.0_3.0_1718117786451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_obong_pipeline_en_5.4.0_3.0_1718117786451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_obong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_obong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_obong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_en.md new file mode 100644 index 00000000000000..63096a517d8050 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_patnelt60 XlmRoBertaForTokenClassification from patnelt60 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_patnelt60 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_patnelt60` is an English model originally trained by patnelt60.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_patnelt60_en_5.4.0_3.0_1718133562440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_patnelt60_en_5.4.0_3.0_1718133562440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_patnelt60","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_patnelt60", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_patnelt60| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|815.0 MB| + +## References + +https://huggingface.co/patnelt60/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline_en.md new file mode 100644 index 00000000000000..eebc75b07f692e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline pipeline XlmRoBertaForTokenClassification from patnelt60 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline` is an English model originally trained by patnelt60.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline_en_5.4.0_3.0_1718133690189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline_en_5.4.0_3.0_1718133690189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|815.0 MB| + +## References + +https://huggingface.co/patnelt60/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_en.md new file mode 100644 index 00000000000000..62b7b84bb46466 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_philosucker XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_philosucker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_philosucker` is an English model originally trained by philosucker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_philosucker_en_5.4.0_3.0_1718126516566.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_philosucker_en_5.4.0_3.0_1718126516566.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_philosucker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_philosucker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_philosucker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|838.8 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline_en.md new file mode 100644 index 00000000000000..754a3107ef36b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline pipeline XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline` is an English model originally trained by philosucker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline_en_5.4.0_3.0_1718126610206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline_en_5.4.0_3.0_1718126610206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|838.8 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_en.md new file mode 100644 index 00000000000000..f2b3de49f2d634 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_thkkvui XlmRoBertaForTokenClassification from thkkvui +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_thkkvui +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_thkkvui` is an English model originally trained by thkkvui.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_thkkvui_en_5.4.0_3.0_1718112606952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_thkkvui_en_5.4.0_3.0_1718112606952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_thkkvui","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_thkkvui", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_thkkvui| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/thkkvui/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline_en.md new file mode 100644 index 00000000000000..a1b63ce918a170 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline pipeline XlmRoBertaForTokenClassification from thkkvui +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline` is an English model originally trained by thkkvui.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline_en_5.4.0_3.0_1718112715953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline_en_5.4.0_3.0_1718112715953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/thkkvui/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_en.md new file mode 100644 index 00000000000000..11fe46f7f5fdb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_transformersbook XlmRoBertaForTokenClassification from transformersbook +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_transformersbook +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_transformersbook` is an English model originally trained by transformersbook. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_transformersbook_en_5.4.0_3.0_1718111029428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_transformersbook_en_5.4.0_3.0_1718111029428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_transformersbook","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_transformersbook", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_transformersbook| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/transformersbook/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline_en.md new file mode 100644 index 00000000000000..87cf5e7b1a4b78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline pipeline XlmRoBertaForTokenClassification from transformersbook +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline` is an English model originally trained by transformersbook. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline_en_5.4.0_3.0_1718111148462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline_en_5.4.0_3.0_1718111148462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/transformersbook/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_en.md new file mode 100644 index 00000000000000..6a5a4be4c426b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_tyayoi XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_tyayoi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_tyayoi` is an English model originally trained by tyayoi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_tyayoi_en_5.4.0_3.0_1718115697730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_tyayoi_en_5.4.0_3.0_1718115697730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_tyayoi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_tyayoi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_tyayoi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline_en.md new file mode 100644 index 00000000000000..eeca13669651e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline pipeline XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline` is an English model originally trained by tyayoi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline_en_5.4.0_3.0_1718115819616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline_en_5.4.0_3.0_1718115819616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_en.md new file mode 100644 index 00000000000000..d7d0700b88cd35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_yezune XlmRoBertaForTokenClassification from yezune +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_yezune +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_yezune` is an English model originally trained by yezune. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yezune_en_5.4.0_3.0_1718118486794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yezune_en_5.4.0_3.0_1718118486794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_yezune","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_yezune", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_yezune| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/yezune/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_pipeline_en.md new file mode 100644 index 00000000000000..2fc8d9441e6691 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_yezune_pipeline pipeline XlmRoBertaForTokenClassification from yezune +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_yezune_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_yezune_pipeline` is an English model originally trained by yezune. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yezune_pipeline_en_5.4.0_3.0_1718118606283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yezune_pipeline_en_5.4.0_3.0_1718118606283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_yezune_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_yezune_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_yezune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/yezune/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_en.md new file mode 100644 index 00000000000000..24fea6dbd961e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_tamil_the_neural_networker XlmRoBertaForTokenClassification from the-neural-networker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_tamil_the_neural_networker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_tamil_the_neural_networker` is an English model originally trained by the-neural-networker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_en_5.4.0_3.0_1718103758096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_en_5.4.0_3.0_1718103758096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_tamil_the_neural_networker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_tamil_the_neural_networker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_tamil_the_neural_networker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|837.5 MB| + +## References + +https://huggingface.co/the-neural-networker/xlm-roberta-base-finetuned-panx-ta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline_en.md new file mode 100644 index 00000000000000..a66907a62c9ec9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline pipeline XlmRoBertaForTokenClassification from the-neural-networker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline` is an English model originally trained by the-neural-networker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline_en_5.4.0_3.0_1718103845164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline_en_5.4.0_3.0_1718103845164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|837.5 MB| + +## References + +https://huggingface.co/the-neural-networker/xlm-roberta-base-finetuned-panx-ta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_en.md new file mode 100644 index 00000000000000..bec5386a44539f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_telugu_the_neural_networker XlmRoBertaForTokenClassification from the-neural-networker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_telugu_the_neural_networker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_telugu_the_neural_networker` is an English model originally trained by the-neural-networker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_en_5.4.0_3.0_1718114617598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_en_5.4.0_3.0_1718114617598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_telugu_the_neural_networker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_telugu_the_neural_networker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_telugu_the_neural_networker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.1 MB| + +## References + +https://huggingface.co/the-neural-networker/xlm-roberta-base-finetuned-panx-te \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline_en.md new file mode 100644 index 00000000000000..0a6ef9815e3b61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline pipeline XlmRoBertaForTokenClassification from the-neural-networker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline` is an English model originally trained by the-neural-networker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline_en_5.4.0_3.0_1718114741171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline_en_5.4.0_3.0_1718114741171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.2 MB| + +## References + +https://huggingface.co/the-neural-networker/xlm-roberta-base-finetuned-panx-te + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_en.md new file mode 100644 index 00000000000000..5c25df11296f4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_pnax_german XlmRoBertaForTokenClassification from Almondpeanuts +author: John Snow Labs +name: xlm_roberta_base_finetuned_pnax_german +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_pnax_german` is an English model originally trained by Almondpeanuts. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_pnax_german_en_5.4.0_3.0_1718121928404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_pnax_german_en_5.4.0_3.0_1718121928404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_pnax_german","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_pnax_german", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_pnax_german| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Almondpeanuts/xlm-roberta-base-finetuned-pnax-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_pipeline_en.md new file mode 100644 index 00000000000000..c51896bd43cd5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_pnax_german_pipeline pipeline XlmRoBertaForTokenClassification from Almondpeanuts +author: John Snow Labs +name: xlm_roberta_base_finetuned_pnax_german_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_pnax_german_pipeline` is an English model originally trained by Almondpeanuts. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_pnax_german_pipeline_en_5.4.0_3.0_1718122039975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_pnax_german_pipeline_en_5.4.0_3.0_1718122039975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_pnax_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_pnax_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_pnax_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Almondpeanuts/xlm-roberta-base-finetuned-pnax-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_en.md new file mode 100644 index 00000000000000..82642c41316e07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetunned_panx_german XlmRoBertaForTokenClassification from jhn9803 +author: John Snow Labs +name: xlm_roberta_base_finetunned_panx_german +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetunned_panx_german` is an English model originally trained by jhn9803. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetunned_panx_german_en_5.4.0_3.0_1718114709492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetunned_panx_german_en_5.4.0_3.0_1718114709492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetunned_panx_german","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetunned_panx_german", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetunned_panx_german| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/jhn9803/xlm-roberta-base-finetunned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_pipeline_en.md new file mode 100644 index 00000000000000..2d1b0d1f496d4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetunned_panx_german_pipeline pipeline XlmRoBertaForTokenClassification from jhn9803 +author: John Snow Labs +name: xlm_roberta_base_finetunned_panx_german_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetunned_panx_german_pipeline` is an English model originally trained by jhn9803. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetunned_panx_german_pipeline_en_5.4.0_3.0_1718114797628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetunned_panx_german_pipeline_en_5.4.0_3.0_1718114797628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetunned_panx_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetunned_panx_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetunned_panx_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/jhn9803/xlm-roberta-base-finetunned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_pipeline_xx.md new file mode 100644 index 00000000000000..97f6bba3ae03ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual xlm_roberta_base_ner_silvanus_pipeline pipeline XlmRoBertaForTokenClassification from rollerhafeezh-amikom +author: John Snow Labs +name: xlm_roberta_base_ner_silvanus_pipeline +date: 2024-06-11 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_ner_silvanus_pipeline` is a Multilingual model originally trained by rollerhafeezh-amikom. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ner_silvanus_pipeline_xx_5.4.0_3.0_1718097257703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ner_silvanus_pipeline_xx_5.4.0_3.0_1718097257703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_ner_silvanus_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_ner_silvanus_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ner_silvanus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|832.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/xlm-roberta-base-ner-silvanus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_xx.md new file mode 100644 index 00000000000000..5e5f3f8b609c2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual xlm_roberta_base_ner_silvanus XlmRoBertaForTokenClassification from rollerhafeezh-amikom +author: John Snow Labs +name: xlm_roberta_base_ner_silvanus +date: 2024-06-11 +tags: [xx, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_ner_silvanus` is a Multilingual model originally trained by rollerhafeezh-amikom. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ner_silvanus_xx_5.4.0_3.0_1718097143786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ner_silvanus_xx_5.4.0_3.0_1718097143786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_ner_silvanus","xx") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_ner_silvanus", "xx") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ner_silvanus| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|832.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/xlm-roberta-base-ner-silvanus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_en.md new file mode 100644 index 00000000000000..12beccb36a23a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_pie XlmRoBertaForTokenClassification from Gooogr +author: John Snow Labs +name: xlm_roberta_base_pie +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_pie` is an English model originally trained by Gooogr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pie_en_5.4.0_3.0_1718125418567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pie_en_5.4.0_3.0_1718125418567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_pie","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_pie", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pie| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.9 MB| + +## References + +https://huggingface.co/Gooogr/xlm-roberta-base-pie \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_pipeline_en.md new file mode 100644 index 00000000000000..0154b675eba48b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_pie_pipeline pipeline XlmRoBertaForTokenClassification from Gooogr +author: John Snow Labs +name: xlm_roberta_base_pie_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_pie_pipeline` is an English model originally trained by Gooogr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pie_pipeline_en_5.4.0_3.0_1718125512193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pie_pipeline_en_5.4.0_3.0_1718125512193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_pie_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_pie_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pie_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.9 MB| + +## References + +https://huggingface.co/Gooogr/xlm-roberta-base-pie + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_en.md new file mode 100644 index 00000000000000..508ffdb7c541ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_pii_finetuned XlmRoBertaForTokenClassification from 1-13-am +author: John Snow Labs +name: xlm_roberta_base_pii_finetuned +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_pii_finetuned` is an English model originally trained by 1-13-am. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pii_finetuned_en_5.4.0_3.0_1718100836969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pii_finetuned_en_5.4.0_3.0_1718100836969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_pii_finetuned","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_pii_finetuned", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pii_finetuned| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|819.6 MB| + +## References + +https://huggingface.co/1-13-am/xlm-roberta-base-pii-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..cdc969da76bcd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_pii_finetuned_pipeline pipeline XlmRoBertaForTokenClassification from 1-13-am +author: John Snow Labs +name: xlm_roberta_base_pii_finetuned_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_pii_finetuned_pipeline` is an English model originally trained by 1-13-am. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pii_finetuned_pipeline_en_5.4.0_3.0_1718100991138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pii_finetuned_pipeline_en_5.4.0_3.0_1718100991138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_pii_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_pii_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pii_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|819.6 MB| + +## References + +https://huggingface.co/1-13-am/xlm-roberta-base-pii-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_en.md new file mode 100644 index 00000000000000..130eae93e61e4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_postagging_urdu XlmRoBertaForTokenClassification from Aimlab +author: John Snow Labs +name: xlm_roberta_base_postagging_urdu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_postagging_urdu` is an English model originally trained by Aimlab. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_postagging_urdu_en_5.4.0_3.0_1718102837887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_postagging_urdu_en_5.4.0_3.0_1718102837887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_postagging_urdu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_postagging_urdu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_postagging_urdu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|681.0 MB| + +## References + +https://huggingface.co/Aimlab/xlm-roberta-base-postagging-urdu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_pipeline_en.md new file mode 100644 index 00000000000000..562a0252b39cf7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_postagging_urdu_pipeline pipeline XlmRoBertaForTokenClassification from Aimlab +author: John Snow Labs +name: xlm_roberta_base_postagging_urdu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_postagging_urdu_pipeline` is an English model originally trained by Aimlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_postagging_urdu_pipeline_en_5.4.0_3.0_1718103064313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_postagging_urdu_pipeline_en_5.4.0_3.0_1718103064313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_postagging_urdu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_postagging_urdu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_postagging_urdu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|681.0 MB| + +## References + +https://huggingface.co/Aimlab/xlm-roberta-base-postagging-urdu + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_en.md new file mode 100644 index 00000000000000..1d42962ef6cbd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_panx_uzbek XlmRoBertaForTokenClassification from murodbek +author: John Snow Labs +name: xlm_roberta_panx_uzbek +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_panx_uzbek` is an English model originally trained by murodbek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_panx_uzbek_en_5.4.0_3.0_1718134383603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_panx_uzbek_en_5.4.0_3.0_1718134383603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_panx_uzbek","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_panx_uzbek", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_panx_uzbek| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|836.7 MB| + +## References + +https://huggingface.co/murodbek/xlm-roberta-panx-uz \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_pipeline_en.md new file mode 100644 index 00000000000000..0305698493ccc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_panx_uzbek_pipeline pipeline XlmRoBertaForTokenClassification from murodbek +author: John Snow Labs +name: xlm_roberta_panx_uzbek_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_panx_uzbek_pipeline` is an English model originally trained by murodbek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_panx_uzbek_pipeline_en_5.4.0_3.0_1718134471717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_panx_uzbek_pipeline_en_5.4.0_3.0_1718134471717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_panx_uzbek_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_panx_uzbek_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_panx_uzbek_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|836.7 MB| + +## References + +https://huggingface.co/murodbek/xlm-roberta-panx-uz + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_en.md new file mode 100644 index 00000000000000..68b964d19458a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_base_finetuned_hausa_2e_4 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: xlmr_base_finetuned_hausa_2e_4 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmr_base_finetuned_hausa_2e_4` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_base_finetuned_hausa_2e_4_en_5.4.0_3.0_1718135056261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_base_finetuned_hausa_2e_4_en_5.4.0_3.0_1718135056261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_base_finetuned_hausa_2e_4","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_base_finetuned_hausa_2e_4", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_base_finetuned_hausa_2e_4| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|868.5 MB| + +## References + +https://huggingface.co/grace-pro/xlmr-base-finetuned-hausa-2e-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_pipeline_en.md new file mode 100644 index 00000000000000..3da738b22d0912 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_base_finetuned_hausa_2e_4_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: xlmr_base_finetuned_hausa_2e_4_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmr_base_finetuned_hausa_2e_4_pipeline` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_base_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718135160923.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_base_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718135160923.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_base_finetuned_hausa_2e_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_base_finetuned_hausa_2e_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_base_finetuned_hausa_2e_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|868.5 MB| + +## References + +https://huggingface.co/grace-pro/xlmr-base-finetuned-hausa-2e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_en.md new file mode 100644 index 00000000000000..b715c4750b0e74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_medical XlmRoBertaForTokenClassification from aaaksenova +author: John Snow Labs +name: xlmr_medical +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmr_medical` is an English model originally trained by aaaksenova. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_medical_en_5.4.0_3.0_1718099301347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_medical_en_5.4.0_3.0_1718099301347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_medical","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_medical", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_medical| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/aaaksenova/xlmr_medical \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_pipeline_en.md new file mode 100644 index 00000000000000..d6eec4297a3ed1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_medical_pipeline pipeline XlmRoBertaForTokenClassification from aaaksenova +author: John Snow Labs +name: xlmr_medical_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmr_medical_pipeline` is an English model originally trained by aaaksenova. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_medical_pipeline_en_5.4.0_3.0_1718099367588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_medical_pipeline_en_5.4.0_3.0_1718099367588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_medical_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_medical_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_medical_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/aaaksenova/xlmr_medical + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..d29d4f330d1e5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from ArneD) +author: John Snow Labs +name: xlmroberta_ner_arned_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `ArneD`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_arned_base_finetuned_panx_de_5.4.0_3.0_1718071934527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_arned_base_finetuned_panx_de_5.4.0_3.0_1718071934527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_arned_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_arned_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_ArneD").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_arned_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/ArneD/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..7f7ac64f64892e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_arned_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from ArneD +author: John Snow Labs +name: xlmroberta_ner_arned_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_arned_base_finetuned_panx_pipeline` is a German model originally trained by ArneD. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_arned_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718072021578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_arned_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718072021578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_arned_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_arned_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_arned_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ArneD/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_en.md new file mode 100644 index 00000000000000..8d746630f1093e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_en.md @@ -0,0 +1,113 @@ +--- +layout: model +title: English XLMRobertaForTokenClassification Base Cased model (from tner) +author: John Snow Labs +name: xlmroberta_ner_base_bc5cdr +date: 2024-06-11 +tags: [en, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-bc5cdr` is an English model originally trained by `tner`. 
+ +## Predicted Entities + +`chemical`, `disease` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bc5cdr_en_5.4.0_3.0_1718072396890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bc5cdr_en_5.4.0_3.0_1718072396890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_bc5cdr","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_bc5cdr","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.xlmr_roberta.bc5cdr.base").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_bc5cdr| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|779.7 MB| + +## References + +References + +- https://huggingface.co/tner/xlm-roberta-base-bc5cdr +- https://github.com/asahi417/tner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_pipeline_en.md new file mode 100644 index 00000000000000..3e2f54c300ef6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmroberta_ner_base_bc5cdr_pipeline pipeline XlmRoBertaForTokenClassification from tner +author: John Snow Labs +name: xlmroberta_ner_base_bc5cdr_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_base_bc5cdr_pipeline` is an English model originally trained by tner. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bc5cdr_pipeline_en_5.4.0_3.0_1718072584099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bc5cdr_pipeline_en_5.4.0_3.0_1718072584099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_base_bc5cdr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_bc5cdr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_bc5cdr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|779.7 MB| + +## References + +https://huggingface.co/tner/xlm-roberta-base-bc5cdr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..73bdfa8278ed4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072130572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072130572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline", lang = "sw") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline", lang = "sw") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-amharic-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..738effba8f4942 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_sw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Swahili XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-amharic-finetuned-ner-swahili` is a Swahili model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG`, `DATE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_sw_5.4.0_3.0_1718072036241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_sw_5.4.0_3.0_1718072036241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili","sw") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili","sw") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("sw.ner.xlmr_roberta.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-amharic-finetuned-ner-swahili +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..04ecbf84774610 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072135649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072135649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline", lang = "sw") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline", lang = "sw") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-luo-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..99387e381f7e79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_sw.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_sw_5.4.0_3.0_1718072026461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_sw_5.4.0_3.0_1718072026461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili","sw") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili", "sw") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-luo-finetuned-ner-swahili \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..904e23c78b218f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072733391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072733391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline", lang = "sw") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline", lang = "sw") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-luganda-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..7478afe2726f49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_sw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Swahili XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-luganda-finetuned-ner-swahili` is a Swahili model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`PER`, `DATE`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_sw_5.4.0_3.0_1718072658418.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_sw_5.4.0_3.0_1718072658418.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili","sw") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili","sw") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("sw.ner.xlmr_roberta.base_finetuned_luganda.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-luganda-finetuned-ner-swahili +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..2a89ccf46bc217 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072732776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072732776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline", lang = "sw") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline", lang = "sw") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-naija-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..391e681e214f99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_sw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Swahili XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-naija-finetuned-ner-swahili` is a Swahili model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`PER`, `DATE`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_sw_5.4.0_3.0_1718072656266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_sw_5.4.0_3.0_1718072656266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili","sw") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili","sw") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("sw.ner.xlmr_roberta.base_finetuned_naija.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-naija-finetuned-ner-swahili +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_en.md new file mode 100644 index 00000000000000..b7d6316b89b058 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_en.md @@ -0,0 +1,117 @@ +--- +layout: model +title: English XLMRobertaForTokenClassification Base Cased model (from edwardjross) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_recipe_all +date: 2024-06-11 +tags: [en, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-recipe-all` is an English model originally trained by `edwardjross`. 
+ +## Predicted Entities + +`UNIT`, `DF`, `QUANTITY`, `TEMP`, `SIZE`, `NAME`, `STATE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_recipe_all_en_5.4.0_3.0_1718093425591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_recipe_all_en_5.4.0_3.0_1718093425591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_recipe_all","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_recipe_all","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.xlmr_roberta.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_recipe_all| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|834.4 MB| + +## References + +References + +- https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-recipe-all +- https://github.com/cosylabiiit/recipe-knowledge-mining +- https://arxiv.org/abs/2004.12184 +- https://github.com/cosylabiiit/recipe-knowledge-mining +- https://www.oreilly.com/library/view/natural-language-processing/9781098103231/ +- https://github.com/EdwardJRoss/nlp_transformers_exercises/blob/master/notebooks/ch4-ner-recipe-stanford-crf.ipynb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_pipeline_en.md new file mode 100644 index 00000000000000..f728205ce2ee39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmroberta_ner_base_finetuned_recipe_all_pipeline pipeline XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_recipe_all_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_base_finetuned_recipe_all_pipeline` is an English model originally trained by edwardjross. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_recipe_all_pipeline_en_5.4.0_3.0_1718093515879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_recipe_all_pipeline_en_5.4.0_3.0_1718093515879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_recipe_all_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_recipe_all_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_recipe_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|834.4 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-recipe-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pcm.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pcm.md new file mode 100644 index 00000000000000..742881afa86855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pcm.md @@ -0,0 +1,120 @@ +--- +layout: model +title: Nigerian Pidgin XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_naija +date: 2024-06-11 +tags: [pcm, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: pcm +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-swahili-finetuned-ner-naija` is a Nigerian Pidgin model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`PER`, `LOC`, `DATE`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pcm_5.4.0_3.0_1718072035316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pcm_5.4.0_3.0_1718072035316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_naija","pcm") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_naija","pcm") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("pcm.ner.xlmr_roberta.base_finetuned_swahili.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_naija| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|pcm| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-naija +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://www.apache.org/licenses/LICENSE-2.0 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner +- https://arxiv.org/pdf/2103.11811.pdf +- https://arxiv.org/abs/2103.11811 +- https://arxiv.org/abs/2103.11811 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline_pcm.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline_pcm.md new file mode 100644 index 00000000000000..c8c98075671bfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline_pcm.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Nigerian Pidgin xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline +date: 2024-06-11 +tags: [pcm, open_source, pipeline, onnx] +task: Named Entity Recognition +language: pcm +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP. `xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline` is a Nigerian Pidgin model originally trained by mbeukman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline_pcm_5.4.0_3.0_1718072108386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline_pcm_5.4.0_3.0_1718072108386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
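+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline", lang = "pcm")
+annotations = pipeline.transform(df)  # df: a Spark DataFrame; a "text" input column is assumed
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline", lang = "pcm")
+val annotations = pipeline.transform(df) // df: a Spark DataFrame; a "text" input column is assumed
+
+```
+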
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pcm| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-naija + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline_rw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline_rw.md new file mode 100644 index 00000000000000..2c2d684d1097be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline_rw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Kinyarwanda xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline +date: 2024-06-11 +tags: [rw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: rw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline` is a Kinyarwanda model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline_rw_5.4.0_3.0_1718072120242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline_rw_5.4.0_3.0_1718072120242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
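+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline", lang = "rw")
+annotations = pipeline.transform(df)  # df: a Spark DataFrame; a "text" input column is assumed
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline", lang = "rw")
+val annotations = pipeline.transform(df) // df: a Spark DataFrame; a "text" input column is assumed
+
+```
+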
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|rw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-kinyarwanda + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_rw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_rw.md new file mode 100644 index 00000000000000..8c2178b7bf263d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_rw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Kinyarwanda XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand +date: 2024-06-11 +tags: [rw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: rw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-swahili-finetuned-ner-kinyarwanda` is a Kinyarwanda model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_rw_5.4.0_3.0_1718072039614.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_rw_5.4.0_3.0_1718072039614.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand","rw") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter()\
+    .setInputCols(["document", "token", "ner"])\
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand","rw")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") // requires import spark.implicits._
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("rw.ner.xlmr_roberta.base_finetuned_swahili.by_mbeukman").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|rw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-kinyarwanda +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_luo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_luo.md new file mode 100644 index 00000000000000..02d165c02ef2ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_luo.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Luo (Kenya and Tanzania) XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner +date: 2024-06-11 +tags: [luo, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: luo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-swahili-finetuned-ner-luo` is a Luo (Kenya and Tanzania) model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_luo_5.4.0_3.0_1718072557607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_luo_5.4.0_3.0_1718072557607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner","luo") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter()\
+    .setInputCols(["document", "token", "ner"])\
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner","luo")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") // requires import spark.implicits._
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("luo.ner.xlmr_roberta.base_finetuned_swahili.by_mbeukman").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|luo| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-luo +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline_luo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline_luo.md new file mode 100644 index 00000000000000..e9142fc3b4377a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline_luo.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Dholuo, Luo (Kenya and Tanzania) xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline +date: 2024-06-11 +tags: [luo, open_source, pipeline, onnx] +task: Named Entity Recognition +language: luo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline` is a Dholuo, Luo (Kenya and Tanzania) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline_luo_5.4.0_3.0_1718072623410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline_luo_5.4.0_3.0_1718072623410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
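+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline", lang = "luo")
+annotations = pipeline.transform(df)  # df: a Spark DataFrame; a "text" input column is assumed
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline", lang = "luo")
+val annotations = pipeline.transform(df) // df: a Spark DataFrame; a "text" input column is assumed
+
+```
+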
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|luo| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-luo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline_wo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline_wo.md new file mode 100644 index 00000000000000..b52a026c19a540 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline_wo.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Wolof xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline +date: 2024-06-11 +tags: [wo, open_source, pipeline, onnx] +task: Named Entity Recognition +language: wo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline` is a Wolof model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline_wo_5.4.0_3.0_1718093466874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline_wo_5.4.0_3.0_1718093466874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
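+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline", lang = "wo")
+annotations = pipeline.transform(df)  # df: a Spark DataFrame; a "text" input column is assumed
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline", lang = "wo")
+val annotations = pipeline.transform(df) // df: a Spark DataFrame; a "text" input column is assumed
+
+```
+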
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|wo| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-wolof + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_wo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_wo.md new file mode 100644 index 00000000000000..277d50be428ba4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_wo.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Wolof XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof +date: 2024-06-11 +tags: [wo, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: wo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-swahili-finetuned-ner-wolof` is a Wolof model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_wo_5.4.0_3.0_1718093401811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_wo_5.4.0_3.0_1718093401811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof","wo") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter()\
+    .setInputCols(["document", "token", "ner"])\
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof","wo")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") // requires import spark.implicits._
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("wo.ner.xlmr_roberta.base_finetuned_swahili.by_mbeukman").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|wo| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-wolof +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..da49e9b863f0f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072696880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072696880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
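+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+from sparknlp.pretrained import PretrainedPipeline
+pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline", lang = "sw")
+annotations = pipeline.transform(df)  # df: a Spark DataFrame; a "text" input column is assumed
+
+```
+```scala
+import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
+val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline", lang = "sw")
+val annotations = pipeline.transform(df) // df: a Spark DataFrame; a "text" input column is assumed
+
+```
+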
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-wolof-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..76a70c39f0fed8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_sw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Swahili XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-wolof-finetuned-ner-swahili` is a Swahili model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_sw_5.4.0_3.0_1718072618924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_sw_5.4.0_3.0_1718072618924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili","sw") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter()\
+    .setInputCols(["document", "token", "ner"])\
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili","sw")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") // requires import spark.implicits._
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("sw.ner.xlmr_roberta.base_finetuned_wolof.by_mbeukman").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-wolof-finetuned-ner-swahili +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..90a464e3475e2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718093797916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718093797916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline", lang = "sw") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline", lang = "sw") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-yoruba-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..57a5df026e4ba9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_sw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Swahili XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-yoruba-finetuned-ner-swahili` is a Swahili model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_sw_5.4.0_3.0_1718093731937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_sw_5.4.0_3.0_1718093731937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili","sw") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili","sw") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("sw.ner.xlmr_roberta.base_finetuned_yoruba.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-yoruba-finetuned-ner-swahili +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_en.md new file mode 100644 index 00000000000000..867522a8579c99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_en.md @@ -0,0 +1,113 @@ +--- +layout: model +title: English XLMRobertaForTokenClassification Base Uncased model (from tner) +author: John Snow Labs +name: xlmroberta_ner_base_uncased_all_english +date: 2024-06-11 +tags: [en, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-uncased-all-english` is an English model originally trained by `tner`.
+ +## Predicted Entities + +`actor`, `time`, `corporation`, `ordinal number`, `cardinal number`, `restaurant`, `director`, `rna`, `geopolitical area`, `rating`, `protein`, `percent`, `product`, `plot`, `dna`, `disease`, `cell line`, `law`, `other`, `quote`, `date`, `soundtrack`, `origin`, `amenity`, `chemical`, `event`, `cuisine`, `dish`, `work of art`, `genre`, `cell type`, `location`, `language`, `quantity`, `award`, `character name`, `facility`, `relationship`, `organization`, `opinion`, `group`, `money`, `person` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_uncased_all_english_en_5.4.0_3.0_1718093500047.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_uncased_all_english_en_5.4.0_3.0_1718093500047.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_uncased_all_english","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_uncased_all_english","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.xlmr_roberta.all_english.uncased_base.by_tner").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_uncased_all_english| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|803.9 MB| + +## References + +References + +- https://huggingface.co/tner/xlm-roberta-base-uncased-all-english +- https://github.com/asahi417/tner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_pipeline_en.md new file mode 100644 index 00000000000000..89dad2bcd7cadf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmroberta_ner_base_uncased_all_english_pipeline pipeline XlmRoBertaForTokenClassification from tner +author: John Snow Labs +name: xlmroberta_ner_base_uncased_all_english_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_base_uncased_all_english_pipeline` is an English model originally trained by tner.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_uncased_all_english_pipeline_en_5.4.0_3.0_1718093669223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_uncased_all_english_pipeline_en_5.4.0_3.0_1718093669223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_base_uncased_all_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_uncased_all_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_uncased_all_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|803.9 MB| + +## References + +https://huggingface.co/tner/xlm-roberta-base-uncased-all-english + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline_xx.md new file mode 100644 index 00000000000000..e5817bdbfc15e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline pipeline XlmRoBertaForTokenClassification from cj-mills +author: John Snow Labs +name: xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline +date: 2024-06-11 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline` is a Multilingual model originally trained by cj-mills.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718094020656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718094020656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|859.8 MB| + +## References + +https://huggingface.co/cj-mills/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_xx.md new file mode 100644 index 00000000000000..fc5824794545a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_xx.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Multilingual XLMRobertaForTokenClassification Base Cased model (from cj-mills) +author: John Snow Labs +name: xlmroberta_ner_cj_mills_base_finetuned_panx_all +date: 2024-06-11 +tags: [xx, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-all` is a Multilingual model originally trained by `cj-mills`. 
+ +## Predicted Entities + +`ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_cj_mills_base_finetuned_panx_all_xx_5.4.0_3.0_1718093933639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_cj_mills_base_finetuned_panx_all_xx_5.4.0_3.0_1718093933639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_cj_mills_base_finetuned_panx_all","xx") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_cj_mills_base_finetuned_panx_all","xx") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("xx.ner.xlmr_roberta.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_cj_mills_base_finetuned_panx_all| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|859.8 MB| + +## References + +References + +- https://huggingface.co/cj-mills/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..e7b02ecfbb3fb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from dfsj) +author: John Snow Labs +name: xlmroberta_ner_dfsj_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `dfsj`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dfsj_base_finetuned_panx_de_5.4.0_3.0_1718093332007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dfsj_base_finetuned_panx_de_5.4.0_3.0_1718093332007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_dfsj_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_dfsj_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_dfsj").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_dfsj_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/dfsj/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..2eac378fe59e89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_dfsj_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from dfsj +author: John Snow Labs +name: xlmroberta_ner_dfsj_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_dfsj_base_finetuned_panx_pipeline` is a German model originally trained by dfsj.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dfsj_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718093418856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dfsj_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718093418856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_dfsj_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_dfsj_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_dfsj_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/dfsj/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..a30233ee8013eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from dkasti) +author: John Snow Labs +name: xlmroberta_ner_dkasti_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `dkasti`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dkasti_base_finetuned_panx_de_5.4.0_3.0_1718093695826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dkasti_base_finetuned_panx_de_5.4.0_3.0_1718093695826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_dkasti_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_dkasti_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_dkasti").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_dkasti_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/dkasti/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..1e7e6ad17330ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_dkasti_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from dkasti +author: John Snow Labs +name: xlmroberta_ner_dkasti_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_dkasti_base_finetuned_panx_pipeline` is a German model originally trained by dkasti.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dkasti_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718093782823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dkasti_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718093782823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_dkasti_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_dkasti_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_dkasti_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/dkasti/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..be3b24e55f6e26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from furyhawk) +author: John Snow Labs +name: xlmroberta_ner_furyhawk_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `furyhawk`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_furyhawk_base_finetuned_panx_de_5.4.0_3.0_1718094653555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_furyhawk_base_finetuned_panx_de_5.4.0_3.0_1718094653555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_furyhawk_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_furyhawk_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_furyhawk").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_furyhawk_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/furyhawk/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..e10461957cce32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from furyhawk +author: John Snow Labs +name: xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline` is a German model originally trained by furyhawk. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094742018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094742018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/furyhawk/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..81e204a34817ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from harish3110) +author: John Snow Labs +name: xlmroberta_ner_harish3110_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `harish3110`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_harish3110_base_finetuned_panx_de_5.4.0_3.0_1718094551313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_harish3110_base_finetuned_panx_de_5.4.0_3.0_1718094551313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_harish3110_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_harish3110_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_harish3110").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_harish3110_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/harish3110/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..7a7d89e41ca0b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_harish3110_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from harish3110 +author: John Snow Labs +name: xlmroberta_ner_harish3110_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_harish3110_base_finetuned_panx_pipeline` is a German model originally trained by harish3110. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_harish3110_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094647055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_harish3110_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094647055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_harish3110_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_harish3110_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_harish3110_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/harish3110/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..6d81ae2b7f9a46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from KayKozaronek) +author: John Snow Labs +name: xlmroberta_ner_kaykozaronek_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `KayKozaronek`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_kaykozaronek_base_finetuned_panx_de_5.4.0_3.0_1718094928242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_kaykozaronek_base_finetuned_panx_de_5.4.0_3.0_1718094928242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_kaykozaronek_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_kaykozaronek_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_KayKozaronek").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_kaykozaronek_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/KayKozaronek/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..db44f2601f4890 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from KayKozaronek +author: John Snow Labs +name: xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline` is a German model originally trained by KayKozaronek. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095014624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095014624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/KayKozaronek/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..09e7357963249d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from miyagawaorj) +author: John Snow Labs +name: xlmroberta_ner_miyagawaorj_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `miyagawaorj`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_miyagawaorj_base_finetuned_panx_de_5.4.0_3.0_1718094814679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_miyagawaorj_base_finetuned_panx_de_5.4.0_3.0_1718094814679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_miyagawaorj_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_miyagawaorj_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_miyagawaorj").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_miyagawaorj_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/miyagawaorj/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..48a88ee030274f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from miyagawaorj +author: John Snow Labs +name: xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline` is a German model originally trained by miyagawaorj. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094901754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094901754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/miyagawaorj/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_en.md new file mode 100644 index 00000000000000..146f60829330b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_en.md @@ -0,0 +1,112 @@ +--- +layout: model +title: English XLMRobertaForTokenClassification Base Cased model (from Neha2608) +author: John Snow Labs +name: xlmroberta_ner_neha2608_base_finetuned_panx +date: 2024-06-11 +tags: [en, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-en` is an English model originally trained by `Neha2608`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_neha2608_base_finetuned_panx_en_5.4.0_3.0_1718095594405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_neha2608_base_finetuned_panx_en_5.4.0_3.0_1718095594405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_neha2608_base_finetuned_panx","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_neha2608_base_finetuned_panx","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.xlmr_roberta.xtreme.base_finetuned.by_Neha2608").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_neha2608_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|825.6 MB| + +## References + +References + +- https://huggingface.co/Neha2608/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_pipeline_en.md new file mode 100644 index 00000000000000..7aaa6034c99d49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmroberta_ner_neha2608_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from Neha2608 +author: John Snow Labs +name: xlmroberta_ner_neha2608_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_neha2608_base_finetuned_panx_pipeline` is an English model originally trained by Neha2608. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_neha2608_base_finetuned_panx_pipeline_en_5.4.0_3.0_1718095718736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_neha2608_base_finetuned_panx_pipeline_en_5.4.0_3.0_1718095718736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_neha2608_base_finetuned_panx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_neha2608_base_finetuned_panx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_neha2608_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|825.6 MB| + +## References + +https://huggingface.co/Neha2608/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..7a94a94549ea96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from novarac23) +author: John Snow Labs +name: xlmroberta_ner_novarac23_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `novarac23`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_novarac23_base_finetuned_panx_de_5.4.0_3.0_1718096022977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_novarac23_base_finetuned_panx_de_5.4.0_3.0_1718096022977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_novarac23_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_novarac23_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_novarac23").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_novarac23_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/novarac23/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..32cf17d69683b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_novarac23_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from novarac23 +author: John Snow Labs +name: xlmroberta_ner_novarac23_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_novarac23_base_finetuned_panx_pipeline` is a German model originally trained by novarac23. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_novarac23_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096108829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_novarac23_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096108829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_novarac23_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_novarac23_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_novarac23_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/novarac23/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..ba4ac333efba67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from pdroberts) +author: John Snow Labs +name: xlmroberta_ner_pdroberts_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `pdroberts`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_pdroberts_base_finetuned_panx_de_5.4.0_3.0_1718095555716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_pdroberts_base_finetuned_panx_de_5.4.0_3.0_1718095555716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_pdroberts_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_pdroberts_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_pdroberts").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_pdroberts_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/pdroberts/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..497614a9404610 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from pdroberts +author: John Snow Labs +name: xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline` is a German model originally trained by pdroberts. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095659548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095659548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/pdroberts/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..4e259830f95ee0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from rishiyoung) +author: John Snow Labs +name: xlmroberta_ner_rishiyoung_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `rishiyoung`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_rishiyoung_base_finetuned_panx_de_5.4.0_3.0_1718095992221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_rishiyoung_base_finetuned_panx_de_5.4.0_3.0_1718095992221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_rishiyoung_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_rishiyoung_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_rishiyoung").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_rishiyoung_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/rishiyoung/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..8248a5f7212722 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from rishiyoung +author: John Snow Labs +name: xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline` is a German model originally trained by rishiyoung. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096079724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096079724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/rishiyoung/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline_xx.md new file mode 100644 index 00000000000000..7060f8844f3ca8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline pipeline XlmRoBertaForTokenClassification from robkayinto +author: John Snow Labs +name: xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline +date: 2024-06-11 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline` is a Multilingual model originally trained by robkayinto. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718095764653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718095764653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|861.0 MB| + +## References + +https://huggingface.co/robkayinto/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_xx.md new file mode 100644 index 00000000000000..4ea506dfac5ad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_xx.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Multilingual XLMRobertaForTokenClassification Base Cased model (from robkayinto) +author: John Snow Labs +name: xlmroberta_ner_robkayinto_base_finetuned_panx_all +date: 2024-06-11 +tags: [xx, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-all` is a Multilingual model originally trained by `robkayinto`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_robkayinto_base_finetuned_panx_all_xx_5.4.0_3.0_1718095681806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_robkayinto_base_finetuned_panx_all_xx_5.4.0_3.0_1718095681806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_robkayinto_base_finetuned_panx_all","xx") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_robkayinto_base_finetuned_panx_all","xx") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("xx.ner.xlmr_roberta.base_finetuned_panx_all.by_robkayinto").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_robkayinto_base_finetuned_panx_all| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|861.0 MB| + +## References + +References + +- https://huggingface.co/robkayinto/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..183d6b762dde6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from SimulSt) +author: John Snow Labs +name: xlmroberta_ner_simulst_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `SimulSt`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_simulst_base_finetuned_panx_de_5.4.0_3.0_1718095557497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_simulst_base_finetuned_panx_de_5.4.0_3.0_1718095557497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_simulst_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_simulst_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_SimulSt").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_simulst_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/SimulSt/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..d77b912caa4a50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_simulst_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from SimulSt +author: John Snow Labs +name: xlmroberta_ner_simulst_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_simulst_base_finetuned_panx_pipeline` is a German model originally trained by SimulSt. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_simulst_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095656737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_simulst_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095656737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_simulst_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_simulst_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_simulst_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/SimulSt/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline_xx.md new file mode 100644 index 00000000000000..fb4469d2c6590f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline pipeline XlmRoBertaForTokenClassification from transformersbook +author: John Snow Labs +name: xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline +date: 2024-06-11 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline` is a Multilingual model originally trained by transformersbook. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718095833355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718095833355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|861.0 MB| + +## References + +https://huggingface.co/transformersbook/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_xx.md new file mode 100644 index 00000000000000..9d5eaded52605e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_xx.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Multilingual XLMRobertaForTokenClassification Base Cased model (from transformersbook) +author: John Snow Labs +name: xlmroberta_ner_transformersbook_base_finetuned_panx_all +date: 2024-06-11 +tags: [xx, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-all` is a Multilingual model originally trained by `transformersbook`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_transformersbook_base_finetuned_panx_all_xx_5.4.0_3.0_1718095750213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_transformersbook_base_finetuned_panx_all_xx_5.4.0_3.0_1718095750213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_transformersbook_base_finetuned_panx_all","xx") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_transformersbook_base_finetuned_panx_all","xx") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("xx.ner.xlmr_roberta.wikiann.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_transformersbook_base_finetuned_panx_all| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|861.0 MB| + +## References + +References + +- https://huggingface.co/transformersbook/xlm-roberta-base-finetuned-panx-all +- https://learning.oreilly.com/library/view/natural-language-processing/9781098103231/ +- https://github.com/nlp-with-transformers/notebooks/blob/main/04_multilingual-ner.ipynb +- https://paperswithcode.com/sota?task=Token+Classification&dataset=wikiann \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..3f70bb46ae1f48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from xliu128) +author: John Snow Labs +name: xlmroberta_ner_xliu128_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `xliu128`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xliu128_base_finetuned_panx_de_5.4.0_3.0_1718096617019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xliu128_base_finetuned_panx_de_5.4.0_3.0_1718096617019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xliu128_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xliu128_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_xliu128").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xliu128_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/xliu128/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..337ab6cdc6eec8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_xliu128_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from xliu128 +author: John Snow Labs +name: xlmroberta_ner_xliu128_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_xliu128_base_finetuned_panx_pipeline` is a German model originally trained by xliu128. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xliu128_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096703642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xliu128_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096703642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_xliu128_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_xliu128_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xliu128_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/xliu128/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_ha.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_ha.md new file mode 100644 index 00000000000000..7f004522828d77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_ha.md @@ -0,0 +1,117 @@ +--- +layout: model +title: Hausa Named Entity Recognition (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa +date: 2024-06-11 +tags: [xlm_roberta, ner, token_classification, ha, open_source, onnx] +task: Named Entity Recognition +language: ha +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-ner-hausa` is a Hausa model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`PER`, `ORG`, `LOC`, `DATE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_ha_5.4.0_3.0_1718097060825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_ha_5.4.0_3.0_1718097060825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa","ha") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Ina son Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa","ha") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Ina son Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("ha.ner.xlmr_roberta.base_finetuned_hausa.by_mbeukman").predict("""Ina son Spark NLP""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ha| +|Size:|774.7 MB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-hausa +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://www.apache.org/licenses/LICENSE-2.0 +- https://github.com/Michael-Beukman/NERTransfer +- htt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline_ha.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline_ha.md new file mode 100644 index 00000000000000..0c834d88f02f89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline_ha.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hausa xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline +date: 2024-06-11 +tags: [ha, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ha +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline` is a Hausa model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline_ha_5.4.0_3.0_1718097242413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline_ha_5.4.0_3.0_1718097242413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline", lang = "ha") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline", lang = "ha") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ha| +|Size:|774.7 MB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline_yo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline_yo.md new file mode 100644 index 00000000000000..66de4b4ae4d697 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline_yo.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Yoruba xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline +date: 2024-06-11 +tags: [yo, open_source, pipeline, onnx] +task: Named Entity Recognition +language: yo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline` is a Yoruba model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline_yo_5.4.0_3.0_1718096869540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline_yo_5.4.0_3.0_1718096869540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline", lang = "yo") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline", lang = "yo") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|yo| +|Size:|772.8 MB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-yoruba + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_yo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_yo.md new file mode 100644 index 00000000000000..b1192123208396 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_yo.md @@ -0,0 +1,111 @@ +--- +layout: model +title: Yoruba Named Entity Recognition (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba +date: 2024-06-11 +tags: [xlm_roberta, ner, token_classification, yo, open_source, onnx] +task: Named Entity Recognition +language: yo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Named Entity Recognition model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-ner-yoruba` is a Yoruba model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`PER`, `ORG`, `LOC`, `DATE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_yo_5.4.0_3.0_1718096687333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_yo_5.4.0_3.0_1718096687333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba","yo") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["Mo nifẹ Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba","yo") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("Mo nifẹ Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|yo| +|Size:|772.8 MB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-yoruba +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://www.apache.org/licenses/LICENSE-2.0 +- https://github.com/Michael-Beukman/NERTransfer +- ht \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-12-sent_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-06-12-sent_roberta_base_en.md new file mode 100644 index 00000000000000..61565f34a07c3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-12-sent_roberta_base_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: RoBERTa Base Sentence Embeddings(sent_roberta_base) +author: John Snow Labs +name: sent_roberta_base +date: 2024-06-12 +tags: [sentence_embeddings, en, english, roberta, open_source, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.4 +supported: true +engine: onnx +annotator: RoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English. + +RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. 
+ +More precisely, it was pretrained with the masked language modeling (MLM) objective. Taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. + +This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the RoBERTa model as inputs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roberta_base_en_5.4.0_3.4_1718213024958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roberta_base_en_5.4.0_3.4_1718213024958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = RoBertaSentenceEmbeddings.pretrained("sent_roberta_base", "en") \ + .setInputCols("sentence") \ + .setOutputCol("embeddings") +``` +```scala +val embeddings = RoBertaSentenceEmbeddings.pretrained("sent_roberta_base", "en") + .setInputCols("sentence") + .setOutputCol("embeddings") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roberta_base| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[sentence_embeddings]| +|Language:|en| +|Size:|297.8 MB| + +## References + +https://huggingface.co/FacebookAI/roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_en.md new file mode 100644 index 00000000000000..2d5095ff91165a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_sec10k_embed BGEEmbeddings from pavanmantha +author: John Snow Labs +name: bge_base_english_sec10k_embed +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_sec10k_embed` is an English model originally trained by pavanmantha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_sec10k_embed_en_5.4.0_3.0_1718289495528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_sec10k_embed_en_5.4.0_3.0_1718289495528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_sec10k_embed","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_sec10k_embed","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_sec10k_embed| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/pavanmantha/bge-base-en-sec10k-embed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_pipeline_en.md new file mode 100644 index 00000000000000..67d6f050d959fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_sec10k_embed_pipeline pipeline BGEEmbeddings from pavanmantha +author: John Snow Labs +name: bge_base_english_sec10k_embed_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_sec10k_embed_pipeline` is an English model originally trained by pavanmantha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_sec10k_embed_pipeline_en_5.4.0_3.0_1718289529223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_sec10k_embed_pipeline_en_5.4.0_3.0_1718289529223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_sec10k_embed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_sec10k_embed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_sec10k_embed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/pavanmantha/bge-base-en-sec10k-embed + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_en.md new file mode 100644 index 00000000000000..d728c3137729ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_anikulkar BGEEmbeddings from anikulkar +author: John Snow Labs +name: bge_base_financial_matryoshka_anikulkar +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_anikulkar` is an English model originally trained by anikulkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_anikulkar_en_5.4.0_3.0_1718289693625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_anikulkar_en_5.4.0_3.0_1718289693625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_anikulkar","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_anikulkar","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_anikulkar| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/anikulkar/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_pipeline_en.md new file mode 100644 index 00000000000000..5b6a4ede49033f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_anikulkar_pipeline pipeline BGEEmbeddings from anikulkar +author: John Snow Labs +name: bge_base_financial_matryoshka_anikulkar_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_anikulkar_pipeline` is an English model originally trained by anikulkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_anikulkar_pipeline_en_5.4.0_3.0_1718289728318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_anikulkar_pipeline_en_5.4.0_3.0_1718289728318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_anikulkar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_anikulkar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_anikulkar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/anikulkar/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_en.md new file mode 100644 index 00000000000000..977e8a487ff50e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_hritikmore BGEEmbeddings from Hritikmore +author: John Snow Labs +name: bge_base_financial_matryoshka_hritikmore +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_hritikmore` is an English model originally trained by Hritikmore. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_hritikmore_en_5.4.0_3.0_1718290095984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_hritikmore_en_5.4.0_3.0_1718290095984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_hritikmore","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_hritikmore","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_hritikmore| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Hritikmore/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_pipeline_en.md new file mode 100644 index 00000000000000..d726b895f09fb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_hritikmore_pipeline pipeline BGEEmbeddings from Hritikmore +author: John Snow Labs +name: bge_base_financial_matryoshka_hritikmore_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_hritikmore_pipeline` is an English model originally trained by Hritikmore. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_hritikmore_pipeline_en_5.4.0_3.0_1718290130520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_hritikmore_pipeline_en_5.4.0_3.0_1718290130520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_hritikmore_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_hritikmore_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_hritikmore_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Hritikmore/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_en.md new file mode 100644 index 00000000000000..de37420e20f1b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_thetayne BGEEmbeddings from thetayne +author: John Snow Labs +name: bge_base_financial_matryoshka_thetayne +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_thetayne` is an English model originally trained by thetayne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_thetayne_en_5.4.0_3.0_1718290300674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_thetayne_en_5.4.0_3.0_1718290300674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_thetayne","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_thetayne","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_thetayne| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/thetayne/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_pipeline_en.md new file mode 100644 index 00000000000000..9475b0fd716f4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_thetayne_pipeline pipeline BGEEmbeddings from thetayne +author: John Snow Labs +name: bge_base_financial_matryoshka_thetayne_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_thetayne_pipeline` is an English model originally trained by thetayne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_thetayne_pipeline_en_5.4.0_3.0_1718290335477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_thetayne_pipeline_en_5.4.0_3.0_1718290335477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_thetayne_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_thetayne_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_thetayne_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/thetayne/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_en.md new file mode 100644 index 00000000000000..5dd5caa03e6f1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v7 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v7 +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v7` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v7_en_5.4.0_3.0_1718289608530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v7_en_5.4.0_3.0_1718289608530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v7","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v7","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v7| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|381.5 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_pipeline_en.md new file mode 100644 index 00000000000000..dafe2e0d83dea4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v7_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v7_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v7_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v7_pipeline_en_5.4.0_3.0_1718289645988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v7_pipeline_en_5.4.0_3.0_1718289645988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.6 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v7 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_en.md new file mode 100644 index 00000000000000..75250ec6d622da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v8 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v8 +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v8` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v8_en_5.4.0_3.0_1718289899891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v8_en_5.4.0_3.0_1718289899891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v8","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v8","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v8| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|382.1 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_pipeline_en.md new file mode 100644 index 00000000000000..0c5f37f1adad55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v8_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v8_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v8_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v8_pipeline_en_5.4.0_3.0_1718289937694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v8_pipeline_en_5.4.0_3.0_1718289937694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.1 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v8 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_bionlp2004_en.md b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_bionlp2004_en.md new file mode 100644 index 00000000000000..8e370e593c436a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_bionlp2004_en.md @@ -0,0 +1,113 @@ +--- +layout: model +title: English XLMRobertaForTokenClassification Base Cased model (from tner) +author: John Snow Labs +name: xlmroberta_ner_base_bionlp2004 +date: 2024-06-13 +tags: [en, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-bionlp2004` is an English model originally trained by `tner`. 
+ +## Predicted Entities + +`protein`, `dna`, `cell line`, `rna`, `cell type` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bionlp2004_en_5.4.0_3.0_1718291003301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bionlp2004_en_5.4.0_3.0_1718291003301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_bionlp2004","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_bionlp2004","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.xlmr_roberta.base").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_bionlp2004| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|783.1 MB| + +## References + +References + +- https://huggingface.co/tner/xlm-roberta-base-bionlp2004 +- https://github.com/asahi417/tner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_fa.md b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_fa.md new file mode 100644 index 00000000000000..75ee608d4d033d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_fa.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Persian XLMRobertaForTokenClassification Base Cased model (from BK-V) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_arman +date: 2024-06-13 +tags: [fa, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: fa +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-arman-fa` is a Persian model originally trained by `BK-V`. 
+ +## Predicted Entities + +`pers`, `event`, `org`, `loc`, `pro`, `fac` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_arman_fa_5.4.0_3.0_1718290853102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_arman_fa_5.4.0_3.0_1718290853102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_arman","fa") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_arman","fa") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("fa.ner.xlmr_roberta.arman_xtreme.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_arman| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|fa| +|Size:|841.0 MB| + +## References + +References + +- https://huggingface.co/BK-V/xlm-roberta-base-finetuned-arman-fa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_pipeline_fa.md new file mode 100644 index 00000000000000..2830610a916848 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian xlmroberta_ner_base_finetuned_arman_pipeline pipeline XlmRoBertaForTokenClassification from BK-V +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_arman_pipeline +date: 2024-06-13 +tags: [fa, open_source, pipeline, onnx] +task: Named Entity Recognition +language: fa +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_arman_pipeline` is a Persian model originally trained by BK-V. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_arman_pipeline_fa_5.4.0_3.0_1718290936653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_arman_pipeline_fa_5.4.0_3.0_1718290936653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_arman_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_arman_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_arman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|841.1 MB| + +## References + +https://huggingface.co/BK-V/xlm-roberta-base-finetuned-arman-fa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_kinyarwand_rw.md b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_kinyarwand_rw.md new file mode 100644 index 00000000000000..280bd435ddd73f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_kinyarwand_rw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Kinyarwanda XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_ner_kinyarwand +date: 2024-06-13 +tags: [rw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: rw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-ner-kinyarwanda` is a Kinyarwanda model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_ner_kinyarwand_rw_5.4.0_3.0_1718290999388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_ner_kinyarwand_rw_5.4.0_3.0_1718290999388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_ner_kinyarwand","rw") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_ner_kinyarwand","rw") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("rw.ner.xlmr_roberta.base_finetuned_kinyarwand.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_ner_kinyarwand| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|rw| +|Size:|775.2 MB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-kinyarwanda +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_wolof_wo.md b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_wolof_wo.md new file mode 100644 index 00000000000000..b37c68b784aaf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_wolof_wo.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Wolof XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_ner_wolof +date: 2024-06-13 +tags: [wo, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: wo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-ner-wolof` is a Wolof model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_ner_wolof_wo_5.4.0_3.0_1718290974709.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_ner_wolof_wo_5.4.0_3.0_1718290974709.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_ner_wolof","wo") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_ner_wolof","wo") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("wo.ner.xlmr_roberta.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_ner_wolof| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|wo| +|Size:|772.3 MB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-wolof +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/akrztrk/2024-04-22-mpnet_embeddings_biolord_2023_en.md b/docs/_posts/akrztrk/2024-04-22-mpnet_embeddings_biolord_2023_en.md new file mode 100644 index 00000000000000..b0f57730b904a6 --- /dev/null +++ b/docs/_posts/akrztrk/2024-04-22-mpnet_embeddings_biolord_2023_en.md @@ -0,0 +1,85 @@ +--- +layout: model +title: English BioLORD-2023 MPNetEmbeddings from FremyCompany +author: John Snow Labs +name: mpnet_embeddings_biolord_2023 +date: 2024-04-22 +tags: [mpnet, en, embeddings, biolord, open_source, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.2.2 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mpnet_embeddings_biolord_2023` is an English model originally trained by `FremyCompany`. 
+ +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpnet_embeddings_biolord_2023_en_5.2.2_3.0_1713822166758.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpnet_embeddings_biolord_2023_en_5.2.2_3.0_1713822166758.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("documents") + +embeddings = MPNetEmbeddings.pretrained("mpnet_embeddings_biolord_2023","en")\ + .setInputCols(["documents"])\ + .setOutputCol("mpnet_embeddings") + +pipeline = Pipeline().setStages([document_assembler, embeddings]) + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val embeddings = MPNetEmbeddings + .pretrained("mpnet_embeddings_biolord_2023", "en") + .setInputCols(Array("documents")) + .setOutputCol("mpnet_embeddings") + +val pipeline = new Pipeline().setStages(Array(document_assembler, embeddings)) + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mpnet_embeddings_biolord_2023| +|Compatibility:|Spark NLP 5.2.2+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[MPNet]| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/FremyCompany/BioLORD-2023 \ No newline at end of file diff --git a/docs/api/com/index.html b/docs/api/com/index.html index 2196fab77a52b6..42bd9076f9892d 100644 --- a/docs/api/com/index.html +++ b/docs/api/com/index.html @@ -3,9 +3,9 @@ - Spark NLP 5.3.3 ScalaDoc - com - - + Spark NLP 5.4.0 ScalaDoc - com + + @@ -28,7 +28,7 @@