org.tensorflow.exceptions.TFInvalidArgumentException: indices[0,11] = 28937 is not in [0, 21128) #14277

Open
1 task done
xueyuan1990 opened this issue May 24, 2024 · 3 comments
xueyuan1990 commented May 24, 2024

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

Current Behavior

BertEmbeddings.pretrained() loads successfully.
However, running BertEmbeddings.pretrained("bert_embeddings_chinese_roberta_wwm_ext", "zh") throws the following exception:

2024-05-24 17:23:24.614208: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Exception in thread "main" org.tensorflow.exceptions.TFInvalidArgumentException: indices[0,11] = 28937 is not in [0, 21128)
         [[{{node bert/embeddings/Gather}}]]
        at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:87)
        at org.tensorflow.Session.run(Session.java:850)
        at org.tensorflow.Session.access$300(Session.java:82)
        at org.tensorflow.Session$Runner.runHelper(Session.java:552)
        at org.tensorflow.Session$Runner.runNoInit(Session.java:499)
        at org.tensorflow.Session$Runner.run(Session.java:495)
        at com.johnsnowlabs.ml.ai.Bert.tag(Bert.scala:176)
        at com.johnsnowlabs.ml.ai.Bert.sessionWarmup(Bert.scala:77)
        at com.johnsnowlabs.ml.ai.Bert.<init>(Bert.scala:86)
        at com.johnsnowlabs.nlp.embeddings.BertEmbeddings.setModelIfNotSet(BertEmbeddings.scala:267)
        at com.johnsnowlabs.nlp.embeddings.ReadBertDLModel.readModel(BertEmbeddings.scala:432)
        at com.johnsnowlabs.nlp.embeddings.ReadBertDLModel.readModel$(BertEmbeddings.scala:427)
        at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.readModel(BertEmbeddings.scala:492)
        at com.johnsnowlabs.nlp.embeddings.ReadBertDLModel.$anonfun$$init$$1(BertEmbeddings.scala:444)
        at com.johnsnowlabs.nlp.embeddings.ReadBertDLModel.$anonfun$$init$$1$adapted(BertEmbeddings.scala:444)
        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:50)
        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:49)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:49)
        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:61)
        at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:61)
        at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:38)
        at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
        at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:515)
        at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:507)
        at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:44)
        at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:41)
        at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.com$johnsnowlabs$nlp$embeddings$ReadablePretrainedBertModel$$super$pretrained(BertEmbeddings.scala:492)
        at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained(BertEmbeddings.scala:418)
        at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained$(BertEmbeddings.scala:417)
        at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.pretrained(BertEmbeddings.scala:492)
        at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.pretrained(BertEmbeddings.scala:492)
        at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:47)
        at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:47)
        at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.com$johnsnowlabs$nlp$embeddings$ReadablePretrainedBertModel$$super$pretrained(BertEmbeddings.scala:492)
        at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained(BertEmbeddings.scala:415)
        at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained$(BertEmbeddings.scala:414)
        at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.pretrained(BertEmbeddings.scala:492)
        at com.algo.recom.article_recommender.v20240511.test_spark_nlp$.main(test_spark_nlp.scala:7)
        at com.algo.recom.article_recommender.v20240511.test_spark_nlp.main(test_spark_nlp.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
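
For readers of the trace: the message says that the token ID at position [0,11] of the input is 28937, while the lookup behind `bert/embeddings/Gather` only accepts IDs in [0, 21128). The following is a minimal sketch of that kind of bounds check in plain Scala. It is not Spark NLP or TensorFlow code; the object name, the helper `outOfRange`, and every token ID other than 28937 are assumptions made for illustration.

```scala
// Hypothetical illustration of the range check that the Gather op reports.
object GatherBoundsSketch {

  /** Returns (row, column, id) for every token ID outside [0, vocabSize). */
  def outOfRange(batch: Seq[Seq[Int]], vocabSize: Int): Seq[(Int, Int, Int)] =
    for {
      (row, i) <- batch.zipWithIndex
      (id, j)  <- row.zipWithIndex
      if id < 0 || id >= vocabSize
    } yield (i, j, id)

  def main(args: Array[String]): Unit = {
    val vocabSize = 21128 // exclusive upper bound taken from the exception message

    // One made-up sequence: eleven placeholder IDs, then 28937 at index 11.
    val batch = Seq(Seq.fill(11)(101) :+ 28937)

    outOfRange(batch, vocabSize).foreach { case (i, j, id) =>
      println(s"indices[$i,$j] = $id is not in [0, $vocabSize)")
    }
  }
}
```

Running a check like this over the token IDs produced for a model, with the bound that model accepts, is one way to see whether the IDs and the embedding table disagree before the session is run.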

Expected Behavior

The model downloads and loads successfully.

Steps To Reproduce

import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
BertEmbeddings.pretrained("bert_embeddings_chinese_roberta_wwm_ext","zh") 

spark-submit script:

#!/bin/bash
jar_file="./article_recommender_spark3-2.0-SNAPSHOT.jar"
class_name="com.algo.recom.article_recommender.v20240511.test_spark_nlp"
# Quote the master URL and the variables so the shell neither globs nor word-splits them
/opt/spark3/bin/spark-submit \
--name action_sequence_123 \
--master "local[1]" \
--files /opt/spark3/conf/hive-site.xml \
--class "$class_name" \
--jars hdfs:///apps/recommend/models/jars/xueyuan/mzreader/spark-nlp-assembly-5.3.3.jar \
"$jar_file"

Spark NLP version and Apache Spark

CentOS Linux release 8.4.2105
Spark version 2.2.1
Scala version 2.11.8
Java version 1.8.0_144
Spark NLP: the fat JAR downloaded from https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.3.3.jar

Confirming that the CPU supports the instructions named in the log (AVX2, AVX512F, FMA):

lscpu | grep -i -e avx512f -e avx2 -e fma
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 arat pku ospke

@maziyarpanahi (Member)

This is an issue with the model itself, as I explained in the other thread. I suggest you either find another model with Chinese support, import the same model yourself via ONNX, or import a different model yourself:

Import new model(s):

@xueyuan1990 (Author)

OK, thanks for your help.

This issue is stale because it has been open 180 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

github-actions bot added the Stale label on Nov 24, 2024