from flashtext import KeywordProcessor

#text = "@苍月轶 再次核实:骆然5月8日持24小时核酸从宜昌回蓉,到成都24小时内核酸一次,9号回泸定,24小时内又做一次核酸,均阴性,健康码绿码。宜昌不是 AB区域。"
text = "成都到北京高铁3小时,郑州到成都2小时"
print(text)

kp = KeywordProcessor()
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))
print(len(kp))
print(kp)

word_index = kp.extract_keywords(text, span_info=True)
print(word_index)
for item in word_index:
    print(text[item[1]:item[2]])
print('finished')
from flashtext import KeywordProcessor

text = "成都到北京高铁3小时,郑州到成都2小时"
kp = KeywordProcessor()
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))
print(len(kp))

keywords_found = kp.extract_keywords(text, span_info=True)
for item in keywords_found:
    print(item)

Output:

2
(('成都', 'ab'), 13, 15)
Reference: https://blog.csdn.net/chen10314/article/details/122048726
This is still not a good solution, because many special characters will appear in our keywords, such as ( ) [ ] and so on.
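For keywords that contain characters such as ( ) [ ], one possible fallback is plain literal substring search, which makes no word-boundary assumptions at all. The sketch below is not part of flashtext and is not the approach from the linked post; `find_keyword_spans` is a hypothetical helper name, and it assumes that exact, case-sensitive matching is acceptable. It reuses the text and keywords from this issue to show which span slices back to the keyword.

```python
from typing import Any, Dict, List, Tuple


def find_keyword_spans(text: str, keywords: Dict[str, Any]) -> List[Tuple[Any, int, int]]:
    """Return (payload, start, end) for every literal occurrence of each keyword.

    Hypothetical helper, not a flashtext API. Uses str.find, so no character
    is treated as special and no word boundaries are required.
    """
    spans = []
    for keyword, payload in keywords.items():
        if not keyword:
            continue  # an empty keyword would match at every position
        start = text.find(keyword)
        while start != -1:
            spans.append((payload, start, start + len(keyword)))
            # Advance by one character so overlapping occurrences are also found.
            start = text.find(keyword, start + 1)
    spans.sort(key=lambda span: (span[1], span[2]))
    return spans


text = "成都到北京高铁3小时,郑州到成都2小时"
keywords = {"到成都": ("成都", "ab"), "宜昌": ("宜昌", "ab")}
spans = find_keyword_spans(text, keywords)
for payload, start, end in spans:
    print(payload, start, end, text[start:end])

# Keywords made of brackets and parentheses are matched literally.
special = find_keyword_spans("see foo(bar) and [x]", {"(bar)": "paren", "[x]": "bracket"})
print(special)
```

With this input the keyword 到成都 is reported at (13, 16), and `text[13:16]` gives back the keyword, whereas slicing with 13 and 15 yields only its first two characters. The trade-off is speed: this scans the text once per keyword, so it gives up the single-pass trie lookup that makes flashtext attractive for large keyword sets.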