Word segmentation support #16

Open
GrimPixel opened this issue Apr 25, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@GrimPixel

Languages other than Japanese also need word segmentation:
https://polyglotclub.com/wiki/Language/Multiple-languages/Culture/Text-Processing-Tools#Word_Segmentation

Darazaki added the enhancement label Jun 3, 2023
@Darazaki
Owner

Darazaki commented Jun 3, 2023

Hi, and sorry for the wait. This looks like a great resource, thanks! I really underestimated how big a task segmenting words would be.

I was hoping what I suggested in #5 would suffice, but the better approach now seems to be to completely rework the way Spedread reads words.

Maybe something like:

when start_reading_button.pressed:
    chunks = user_text.split_by_language()

    for language, text_chunk in chunks:
        if language.requires_word_segmentation:
            words = language.get_nlp_library().parse(text_chunk)
        else:
            words = text_chunk.split_by_spaces()

What do you think?

I'll also ask one of my colleagues who does NLP work for their opinion next week, to see if that's reasonable.
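
For illustration, here is a minimal runnable Python sketch of the flow in the pseudocode above. Everything in it is an assumption rather than Spedread's actual code: `UNSPACED_RANGES`, `split_by_script`, `naive_segment` and `read_words` are hypothetical names, the "language" of a chunk is approximated by Unicode code point ranges, and the per-character segmenter is only a stand-in for a real NLP library.

```python
from itertools import groupby

# Assumption: scripts written without spaces are detected by code point range.
# These ranges are illustrative, not exhaustive.
UNSPACED_RANGES = (
    (0x0E00, 0x0E7F),  # Thai
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def needs_segmentation(char):
    """Return True if the character belongs to one of the unspaced scripts."""
    code = ord(char)
    return any(low <= code <= high for low, high in UNSPACED_RANGES)


def split_by_script(text):
    """Yield (needs_segmentation, chunk) pairs for consecutive runs of text."""
    for flag, run in groupby(text, key=needs_segmentation):
        yield flag, "".join(run)


def naive_segment(chunk):
    """Placeholder segmenter (hypothetical): one word per character."""
    return list(chunk)


def read_words(text, segment=naive_segment):
    """Split text into words, using `segment` only where spaces are not enough."""
    words = []
    for flag, chunk in split_by_script(text):
        if flag:
            words.extend(segment(chunk))
        else:
            words.extend(chunk.split())
    return words
```

Passing a different `segment` callable is where a real segmentation library would plug in; the rest of the reading loop would not need to change.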

@GrimPixel
Author

Great to hear that!
I think users should be able to choose their own word segmentation engine: just place the engines in a folder and write a file that calls the chosen engine to segment the sentences.

@Darazaki
Owner

Darazaki commented Jun 3, 2023

Good idea! If I end up going that way, I'll work out the best format for these libraries later (maybe .so/.wasm or Python scripts, idk).
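
As a rough sketch of the Python-script option only, the engines folder could be scanned like this. The convention is an assumption made up for the example: each `.py` file in the folder is treated as one engine and is expected to expose a `segment(text)` function returning a list of words. `load_engines` is a hypothetical helper, not part of Spedread.

```python
import importlib.util
from pathlib import Path


def load_engines(folder):
    """Map each script's file stem to its segment() function.

    Assumed convention: an engine is a .py file defining segment(text).
    Files without a callable `segment` are skipped.
    """
    engines = {}
    for path in sorted(Path(folder).glob("*.py")):
        # Load the file as a module without requiring it to be on sys.path.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        segment = getattr(module, "segment", None)
        if callable(segment):
            engines[path.stem] = segment
    return engines
```

With a file such as `engines/ja.py` in place, `load_engines("engines")["ja"](text)` would return that engine's words. Note that loading a script this way executes it, so users should only install engines they trust.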
