“成都”: the two Chinese characters are not recognized #132

GuoPL · 2022-06-08T07:56:17Z

from flashtext import KeywordProcessor

# text = "@苍月轶再次核实:骆然5月8日持24小时核酸从宜昌回蓉，到成都24小时内核酸一次，9号回泸定，24小时内又做一次核酸，均阴性，健康码绿码。宜昌不是AB区域。"
text = "成都到北京高铁3小时，郑州到成都2小时"

print(text)
kp = KeywordProcessor()
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))

print(len(kp))
print(kp)
word_index = kp.extract_keywords(text, span_info=True)
print(word_index)
for item in word_index:
    print(text[item[1]:item[2]])

print('finished')

githublyff · 2022-08-07T04:18:19Z

from flashtext import KeywordProcessor

text = "成都到北京高铁3小时，郑州到成都2小时"
kp = KeywordProcessor()
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))

print(len(kp))
keywords_found = kp.extract_keywords(text, span_info=True)
for item in keywords_found:
    print(item)

Output:

2
(('成都', 'ab'), 13, 15)

Reference: https://blog.csdn.net/chen10314/article/details/122048726

zhangbo2008 · 2023-02-08T13:37:39Z

This is still not a good solution, because many special characters appear in our keywords, such as (), [], etc.
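
For comparison, one way to sidestep both the Chinese-text problem and the special-character problem is plain substring matching with no word-boundary logic. The following is only a minimal sketch and does not use flashtext at all: `find_keywords` is a hypothetical helper name, and the `"a(AB)b"` string in the second call is made up for illustration. `re.escape` is what makes characters such as () and [] match literally.

```python
import re


def find_keywords(text, keywords):
    """Return (clean_name, start, end) for each keyword occurrence in text.

    keywords maps the literal string to search for to the value to report.
    No word-boundary check is applied, so text without spaces works too.
    """
    if not keywords:
        return []
    # Try longer keywords first so a long keyword wins over its own prefix.
    ordered = sorted(keywords, key=len, reverse=True)
    # re.escape turns ( ) [ ] and similar characters into literal matches.
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    return [(keywords[m.group(0)], m.start(), m.end())
            for m in pattern.finditer(text)]


text = "成都到北京高铁3小时，郑州到成都2小时"
keywords = {"到成都": ("成都", "ab"), "宜昌": ("宜昌", "ab")}

for item in find_keywords(text, keywords):
    # Prints the reported value, the span, and the matched slice of the text.
    print(item, text[item[1]:item[2]])

# Keywords containing brackets are matched literally (made-up example).
print(find_keywords("a(AB)b", {"(AB)": "ab-zone"}))
```

The trade-off is that this matches inside longer words as well, which is usually acceptable for Chinese text but may over-match in space-delimited languages, and a regex alternation will not scale to very large keyword lists the way a trie does.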
