Empty list returned when working with Devanagari Script #38
Comments
It's hard for me to fix it without at least basic knowledge of this script.
Yes, splitting on whitespace is the way to go. It would be great if that were
an option: clean out punctuation, then split the sentence into words on
whitespace.
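A minimal sketch of the approach suggested here, assuming punctuation is identified by its Unicode general category (any category starting with P). The helper name split_on_whitespace is hypothetical and is not part of the library's API.

```python
import unicodedata


def split_on_whitespace(text):
    # Replace every punctuation character (Unicode category P*) with a space,
    # then split on runs of whitespace. Combining vowel signs are left alone,
    # so Devanagari words stay intact.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )
    return cleaned.split()


print(split_on_whitespace("hello, world!"))
print(split_on_whitespace("शेवणें आनी शेतकार"))
```

Because no character filtering beyond punctuation is applied, the number of tokens equals the number of whitespace-separated words in the input.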
On Fri, Jun 25, 2021, 12:24 PM Vitaliy ***@***.***> wrote:
It's hard for me to fix it without at least basic knowledge of this script.
I can point you to the problem in the code, though.
There is a regexp, \p{L}+, that processes the input text so that words are
counted properly. It keeps only letters: "hello, world!" is transformed into
"hello world".
When I pass शेवणें आनी शेतकार, it is transformed into शे वणें आनी शे तका र.
It introduces additional spaces that break subsequent logic. In order to
keep it in line with the general logic, it should have stayed as शेवणें
आनी शेतकार.
Maybe we shouldn't use the regexp for this script and should split sentences
on whitespace instead? I have no idea whether this is the right thing to do.
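The splitting described in this comment can be approximated with the standard library. This is a sketch, not the library's code: Python's built-in re module has no \p{L}, so the hypothetical helper runs_of emulates it with unicodedata. Devanagari vowel signs such as U+0947 belong to the Mark (M) categories rather than Letter (L), so a letters-only pattern breaks a word at each one. The sketch drops the marks entirely, so its output is not identical to the transformed string quoted in the comment, though the breaks appear to fall at the same positions.

```python
import unicodedata


def runs_of(text, major_categories=("L",)):
    # Return maximal runs of characters whose Unicode general category
    # starts with one of the given letters, e.g. "L" (letters), "M" (marks).
    runs, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in major_categories:
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


word = "शेतकार"
print([unicodedata.category(ch) for ch in word])
print(runs_of(word))              # letters only: the word breaks at each vowel sign
print(runs_of(word, ("L", "M")))  # letters and marks: the word stays whole
print(runs_of("hello, world!"))   # Latin text is unaffected
```

Accepting marks alongside letters is one possible alternative to whitespace splitting; which of the two fits the library's later steps better is not established here.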
Hi, I'm working with texts in Devanagari script (a popular script used in India, unlike the Latin script used by English and similar languages). When I try to generate keywords, it returns an empty list. The code is below.
from multi_rake import Rake  # assumption: the snippet uses the multi_rake package
full_text="शेवणें आनी शेतकार एक आसलेलो शेतकार तेणें बरें शेत रोयलेलें रोयल्यार कितें जालें थाम वाडलें आनी इल्लें इल्लें करून पोटराक येयलें आनी थोडे दीस वयतकच कुचकुचीत गोट्याचें कणस सुटलें आनी वाऱ्याचेर बरें धोलूंक लागलें शेतकाराक सामकी उमेद जाली आतां म्हण लागलो रोकडेंच आपूण शेत लुंवतलो आनी भात घरा व्ह"
rake = Rake(max_words_unknown_lang=1)
keywords = rake.apply(full_text)