The two-character Chinese word “成都” is not recognized #132

Open
GuoPL opened this issue Jun 8, 2022 · 2 comments
Comments

@GuoPL

GuoPL commented Jun 8, 2022

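from flashtext import KeywordProcessor

# Longer sample text, left commented out as in the original report:
# text = "@苍月轶 再次核实:骆然5月8日持24小时核酸从宜昌回蓉,到成都24小时内核酸一次,9号回泸定,24小时内又做一次核酸,均阴性,健康码绿码。宜昌不是AB区域。"
text = "成都到北京高铁3小时,郑州到成都2小时"

print(text)
kp = KeywordProcessor()
# Map each surface keyword to the (name, tag) tuple returned on a match.
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))

print(len(kp))  # number of registered keywords
print(kp)
# With span_info=True each result is a (value, start, end) tuple.
word_index = kp.extract_keywords(text, span_info=True)
print(word_index)
for item in word_index:
    print(text[item[1]:item[2]])

print('finished')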
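
One possible explanation, offered as an assumption rather than a confirmed diagnosis: flashtext accepts a match only when the character that follows it counts as a word boundary, and its default set of non-boundary characters is believed to be ASCII letters, digits and the underscore. In the sample text, "到成都" is immediately followed by the digit "2". The sketch below uses only the standard library and does not call flashtext; `ASSUMED_NON_WORD_BOUNDARIES` and `following_char_report` are hypothetical names introduced for illustration.

```python
import string

# Assumption: mirrors what flashtext is believed to use by default
# (ASCII digits, ASCII letters, underscore). It is not read from the library.
ASSUMED_NON_WORD_BOUNDARIES = set(string.digits + string.ascii_letters + "_")


def following_char_report(text, keyword):
    """Return (start, end, next_char, blocks_match) for each occurrence of keyword."""
    report = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        # Empty string at end of text; it is never in the set, so it never blocks.
        next_char = text[end] if end < len(text) else ""
        blocks_match = next_char in ASSUMED_NON_WORD_BOUNDARIES
        report.append((start, end, next_char, blocks_match))
        start = text.find(keyword, start + 1)
    return report


text = "成都到北京高铁3小时,郑州到成都2小时"
report = following_char_report(text, "到成都")
print(report)
```

If that assumption holds, any keyword directly followed by a digit or an ASCII letter would be skipped, which would also cover "到成都24小时" in the longer sample text.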

@githublyff

from flashtext import KeywordProcessor

text = "成都到北京高铁3小时,郑州到成都2小时"
kp = KeywordProcessor()
kp.add_keyword("到成都", ("成都", "ab"))
kp.add_keyword("宜昌", ("宜昌", "ab"))

print(len(kp))
keywords_found = kp.extract_keywords(text, span_info=True)
for item in keywords_found:
    print(item)

Output:

2
(('成都', 'ab'), 13, 15)

Reference: https://blog.csdn.net/chen10314/article/details/122048726
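
For comparison, here is a minimal sketch of a matcher that ignores word boundaries altogether. It is not the approach from the linked post and not part of flashtext; it is a plain standard-library substring scan, and `extract_keywords_no_boundaries` and `keyword_map` are hypothetical names. It returns the same `(value, start, end)` shape that `span_info=True` produces.

```python
def extract_keywords_no_boundaries(text, keyword_map):
    """Return (value, start, end) for each keyword occurrence, leftmost-longest.

    keyword_map maps a surface keyword to the value reported for it.
    Word boundaries are deliberately ignored.
    """
    # Try longer keywords first so overlapping candidates resolve to the longest.
    # Empty keywords are dropped because they would match without advancing.
    keywords = sorted((k for k in keyword_map if k), key=len, reverse=True)
    found = []
    idx = 0
    while idx < len(text):
        for kw in keywords:
            if text.startswith(kw, idx):
                found.append((keyword_map[kw], idx, idx + len(kw)))
                idx += len(kw)
                break
        else:
            # No keyword starts here; move on by one character.
            idx += 1
    return found


text = "成都到北京高铁3小时,郑州到成都2小时"
keyword_map = {"到成都": ("成都", "ab"), "宜昌": ("宜昌", "ab")}
matches = extract_keywords_no_boundaries(text, keyword_map)
for value, start, end in matches:
    print(value, start, end, text[start:end])
```

The scan checks every keyword at every position, so it costs roughly text length times keyword count and gives up the speed that a trie-based matcher offers on large keyword lists.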

@zhangbo2008

This is still not a good solution, because our keywords contain many special characters, such as (), [], and so on.
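
It is not clear from the comment which step fails on these characters. If the goal is simply to match keywords that contain brackets or parentheses literally, one option outside flashtext is a regular-expression alternation built with `re.escape`. The following is a rough sketch under that assumption; `build_keyword_pattern` and the sample keywords are made up for illustration.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one alternation that matches any of the keywords literally."""
    # Longest first: Python's alternation takes the first alternative that
    # matches, so a short keyword must not shadow a longer one.
    ordered = sorted((k for k in keywords if k), key=len, reverse=True)
    # re.escape neutralizes (), [], and other regex metacharacters.
    return re.compile("|".join(re.escape(k) for k in ordered))


# Hypothetical keywords containing special characters.
keywords = ["成都(四川)", "[AB]区域", "成都"]
pattern = build_keyword_pattern(keywords)

text = "到成都(四川)24小时内核酸一次, 不是[AB]区域, 回成都"
spans = [(m.group(0), m.start(), m.end()) for m in pattern.finditer(text)]
print(spans)
```

Like any plain regex search, this ignores word boundaries, so it matches a keyword even when a digit or letter follows it.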
