Getting the size of a grapheme cluster #97

Open
Kleidukos opened this issue Oct 3, 2023 · 4 comments

Comments

@Kleidukos
Member

I'd like to get the size of a grapheme cluster (from a value of type Text). Is there a function in the library that can help me with it? If not, is it in the scope of the library to provide one?

@vshabanov
Collaborator

I'm not even sure what the "size of a grapheme cluster" means.

There are various ways to normalize text (compose/decompose grapheme clusters) https://hackage.haskell.org/package/text-icu-0.8.0.3/docs/Data-Text-ICU-Normalize2.html

Maybe unorm2_composePair() can help to compose those clusters and get their size.

@Kleidukos
Member Author

Kleidukos commented Oct 5, 2023

> I'm not even sure what the "size of a grapheme cluster" means.

It's the operation that gives the length in graphemes, not code points.
For example, the length of this grapheme cluster: "🤦🏼‍♂️" is 1.

This is an interesting problem; there's a short read about it here: https://tonsky.me/blog/unicode/

@andreasabel
Member

andreasabel commented Oct 7, 2023

@Kleidukos In Agda we use cluster counting as linked below, is that what you are looking for?
https://github.com/agda/agda/blob/4c5501e369b63ff3eabdbb3217db59904baf0e78/src/full/Agda/Interaction/Highlighting/LaTeX/Base.hs#L708-L716
length . ICU.breaks (ICU.breakCharacter ICU.Root)
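
For readers who want to try this, here is a minimal sketch of how that one-liner might be wrapped, assuming the `text-icu` package and its `Data.Text.ICU` module. The names `graphemeLength` and `graphemes` are hypothetical helpers made up for illustration; they are not part of the library. The example string is the emoji from the comment above, written with escapes so the code point count is unambiguous.

```haskell
import           Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.ICU as ICU

-- Hypothetical helper: count grapheme clusters using ICU's
-- character break iterator with the root locale.
graphemeLength :: Text -> Int
graphemeLength = length . ICU.breaks (ICU.breakCharacter ICU.Root)

-- Hypothetical helper: return the clusters themselves.
graphemes :: Text -> [Text]
graphemes = map ICU.brkBreak . ICU.breaks (ICU.breakCharacter ICU.Root)

main :: IO ()
main = do
  -- U+1F926 U+1F3FC U+200D U+2642 U+FE0F (the facepalm emoji from the thread)
  let facepalm = T.pack "\x1F926\x1F3FC\x200D\x2642\xFE0F"
  -- T.length counts code points, so this prints 5.
  print (T.length facepalm)
  -- Counts grapheme clusters; the exact result depends on the Unicode
  -- rules shipped with the installed ICU (recent versions should treat
  -- the whole emoji sequence as a single cluster).
  print (graphemeLength facepalm)
```

Note that `T.length` and `graphemeLength` agree on plain ASCII text and only diverge once combining marks, variation selectors, or ZWJ sequences are involved.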

@Kleidukos
Member Author

Oh yeah definitely! I'm quite surprised it's not offered by the library directly. Thanks @andreasabel!
