I ran the following code snippet. The output includes the word "it", even though pos_kept does not include PRON.
import spacy
import pytextrank

nlp = spacy.load("en_core_web_sm")

# add PyTextRank to the spaCy pipeline
nlp.add_pipe("textrank", config={'pos_kept': ["NOUN", "PROPN", "VERB"]})

text = '''The MCU SDK for WRG1 general firmware has been launched, and it can be automatically generated after creating the product.'''
doc = nlp(text)

for phrase in doc._.phrases[:10]:
    print(phrase.text, phrase.rank, phrase.count, phrase.chunks)

## the output is
# the product 0.12286712485174818 1 [the product]
# WRG1 general firmware 0.10712303413227088 1 [WRG1 general firmware]
# The MCU SDK 0.0834726982382997 1 [The MCU SDK]
# it 0.0 1 [it]
The library considers noun chunks, and apparently spaCy parses the term "it" as one.
The coreference capabilities for spaCy are currently marked "experimental", which is a nice way to say "Good luck installing and running this part in production" :) I've evaluated multiple options for coreference (including the AllenNLP integration) and they each seem to have serious limitations. That said, if these capabilities were available, it would be relatively simple to resolve a pronoun reference within the graph. In that case, the term it would add more weight to The MCU SDK instead.
If you want, it might be good to add the term "it" to the stop words list for your application?
Hi @ceteri, I found that it's not useful to add "it" to the stop words list, and the same goes for other single PRON words. Because pos_kept doesn't include PRON, I don't need to add a single PRON word to the stop words. In the function _collect_phrases in base.py, pytextrank excludes a single PRON word that is not included in pos_kept. So for a single PRON word, its rank will always be 0.0, and all I need to do is filter out the phrases whose rank is equal to zero.
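For anyone landing here, the workaround described above can be sketched roughly as follows. This is only a minimal illustration: `drop_zero_rank` and `FakePhrase` are hypothetical names made up for this example (not part of pytextrank), and the stand-in objects assume nothing beyond the `.text` and `.rank` attributes already used in the snippet at the top of this issue.

```python
from collections import namedtuple

# Hypothetical stand-in for the phrase objects in doc._.phrases;
# only the .text and .rank attributes are assumed here.
FakePhrase = namedtuple("FakePhrase", ["text", "rank"])


def drop_zero_rank(phrases, min_rank=0.0):
    """Keep only phrases whose rank is strictly greater than min_rank."""
    return [p for p in phrases if p.rank > min_rank]


# Values copied from the output reported in this issue.
sample = [
    FakePhrase("the product", 0.12286712485174818),
    FakePhrase("WRG1 general firmware", 0.10712303413227088),
    FakePhrase("The MCU SDK", 0.0834726982382997),
    FakePhrase("it", 0.0),
]

kept = drop_zero_rank(sample)
print([p.text for p in kept])
```

With the real pipeline, the same helper could presumably be applied as `drop_zero_rank(doc._.phrases)`, since it only reads `.rank`. Note that this drops every zero-ranked phrase, not just single pronouns.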