term.ljust, center etc. incorrect for sequences containing U+FE0F (Variation Selector-16) #267

dscrofts · 2024-03-20T20:54:35Z

Example:

from blessed import Terminal

term = Terminal()
strings = ["123", "456", "🗣️  "]

print("with term.ljust:")
for string in strings:
    print(f"{term.ljust(string, 5)} 1")

print("without term.ljust:")
for string in strings:
    print(f"{string:<5} 1")

Output (term.ljust adds one additional cell):

with term.ljust:
123   1
456   1
🗣️     1
without term.ljust:
123   1
456   1
🗣️    1

However, this is not consistent across all Unicode sequences. For example, changing strings to ["123", "456", "🤔 "] gives:

Output (term.ljust padding is correct):

with term.ljust:
123   1
456   1
🤔    1
without term.ljust:
123   1
456   1
🤔     1


jquast · 2024-03-20T21:05:30Z

Hello, thanks for the report.

I was aware of this issue, but there was no bug to track it. I could probably add a simple workaround here in blessed, so I will try to do that soon.

I recently added support for Variation Selector-16 (U+FE0F) to wcwidth, but the way blessed uses that library still gets the calculation wrong: it adds together the result of the wcwidth.wcwidth() function for each individual codepoint.
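A minimal sketch of why per-codepoint summation comes up short for this string. It uses only the standard library: `codepoint_width` is a hypothetical stand-in for a per-codepoint lookup such as wcwidth.wcwidth(), built on unicodedata rather than wcwidth's own tables, so treat it as illustrative and not as what blessed actually runs.

```python
import unicodedata


def codepoint_width(ch: str) -> int:
    """Rough per-codepoint cell width (assumption: not wcwidth's real tables)."""
    # Marks and format characters (this includes U+FE0F) take no cell.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide / Fullwidth characters take two cells, the rest one.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def summed_width(text: str) -> int:
    """Add up the width of every codepoint, ignoring its neighbours."""
    return sum(codepoint_width(ch) for ch in text)


text = "\U0001f5e3\ufe0f  "  # SPEAKING HEAD IN SILHOUETTE + VS-16 + two spaces
for ch in text:
    print(hex(ord(ch)), codepoint_width(ch))
print(summed_width(text))  # 3
```

On a terminal that honours VS-16 the emoji is drawn in two cells, so the string is four cells wide. Padding to 5 from a measured width of 3 therefore adds one cell too many, which matches the output in the report.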

I might:

- add the functionality of interpreting terminal sequences directly into the wcwidth library, which blessed would then offload to (see "Should wcwidth provide rjust, ljust, center and textwrap?", wcwidth#93),
- or add a "grapheme clustering" functionality to wcwidth that blessed should use,
- or just make blessed do the "grapheme clustering" necessary to account for these correctly.
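For illustration, the clustering idea could look roughly like the sketch below. This is not blessed's or wcwidth's code: `clusters`, `cluster_width` and `ljust_cells` are hypothetical names, the grouping is far simpler than real grapheme clustering (UAX #29), any cluster containing U+FE0F is assumed to be two cells wide, and terminal escape sequences are not handled at all.

```python
import unicodedata

VS16 = "\ufe0f"


def is_zero_width(ch: str) -> bool:
    """Marks and format characters (assumption: a coarse approximation)."""
    return unicodedata.category(ch) in ("Mn", "Me", "Cf")


def clusters(text: str) -> list[str]:
    """Group each base character with the zero-width codepoints after it."""
    out: list[str] = []
    for ch in text:
        if out and is_zero_width(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def cluster_width(cluster: str) -> int:
    """Cell width of one cluster, measured as a unit."""
    if VS16 in cluster:
        return 2  # assumption: VS-16 always requests a wide emoji glyph
    base = cluster[0]
    if is_zero_width(base):
        return 0
    return 2 if unicodedata.east_asian_width(base) in ("W", "F") else 1


def ljust_cells(text: str, width: int, fillchar: str = " ") -> str:
    """Left-justify by terminal cells instead of by codepoints."""
    used = sum(cluster_width(c) for c in clusters(text))
    return text + fillchar * max(0, width - used)


for string in ["123", "456", "\U0001f5e3\ufe0f  "]:
    print(f"{ljust_cells(string, 5)} 1")
```

With this approach the emoji row receives one fill character instead of two, because the base character and its selector are measured together.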

Correct accounting for emoji that include U+FE0F is difficult. Only 7 terminals supported it at last check; I wrote more about it here: https://www.jeffquast.com/post/ucs-detect-test-results/. I've also gotten pushback from the author of libvte, which is used in terminals like GNOME's; they refuse to support it at all: https://gitlab.gnome.org/GNOME/vte/-/issues/2580. So I've been a bit distracted trying to get terminal emulators to support it rather than having blessed support it, but I will definitely get to it soon.

jquast · 2024-03-20T21:06:44Z

Also, I could tell that this string included U+FE0F by running the following commands:

>>> import unicodedata
>>> list(map(unicodedata.name, '🗣️  '))
['SPEAKING HEAD IN SILHOUETTE', 'VARIATION SELECTOR-16', 'SPACE', 'SPACE']
>>> list(map(hex, map(ord, '🗣️  ')))
['0x1f5e3', '0xfe0f', '0x20', '0x20']

jquast · 2024-03-20T21:08:56Z

Also, Python's built-in formatting gets this horribly wrong: it's not aware of emoji, terminal sequences, or even basic East Asian characters like Chinese or Japanese. In your case it just happens to accidentally get it right :)

I wrote an issue about what it might take to get Python's built-in formatting to account for emoji correctly: jquast/wcwidth#94
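To make the point about built-in formatting concrete, here is a small standard-library sketch. `extra_cells` is a hypothetical helper that only counts East Asian Wide/Fullwidth codepoints, so it is an approximation of what a terminal does and not a full width calculation.

```python
import unicodedata


def extra_cells(text: str) -> int:
    """Count codepoints that occupy a second terminal cell (Wide/Fullwidth)."""
    return sum(unicodedata.east_asian_width(ch) in ("W", "F") for ch in text)


# str.format pads by codepoint count, so two wide characters overflow the field.
cjk = "中文"
padded_cjk = f"{cjk:<5}"
print(len(padded_cjk))                            # 5 codepoints
print(len(padded_cjk) + extra_cells(padded_cjk))  # 7 cells on screen

# The reported string has 4 codepoints, so only one space is added.
speaking = "\U0001f5e3\ufe0f  "
padded_emoji = f"{speaking:<5}"
print(len(padded_emoji))                          # 5 codepoints
```

In the emoji case the zero-width U+FE0F is counted as one codepoint, which happens to make up for the second cell the emoji takes on a terminal that honours VS-16. That coincidence is likely why the plain f-string output lines up in the report.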

jquast · 2024-06-26T21:06:07Z

Just to add, I added some tests in #275 around ZWJ, pointing out that blessed gets it wrong. I will continue to work towards a solution for this. I think the wcwidth library needs a kind of iterative parser to solve this correctly in a way that can be integrated into blessed.
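As a rough idea of what such an iterative pass might do for ZWJ sequences, here is a hedged standard-library sketch. The names `base_width`, `summed_width` and `zwj_aware_width` are hypothetical, the width lookup is a unicodedata approximation and not wcwidth's tables, and the rule "the codepoint after a ZWJ adds no width" is a simplifying assumption.

```python
import unicodedata

ZWJ = "\u200d"


def base_width(ch: str) -> int:
    """Rough per-codepoint cell width (assumption: not wcwidth's real tables)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def summed_width(text: str) -> int:
    """Per-codepoint sum, with no knowledge of joiners."""
    return sum(base_width(ch) for ch in text)


def zwj_aware_width(text: str) -> int:
    """Walk the string once, carrying state from one codepoint to the next."""
    total = 0
    joined = False  # True when the previous codepoint was a ZWJ
    for ch in text:
        if ch == ZWJ:
            joined = True
            continue
        if joined:
            # Assumption: this codepoint merges into the previous glyph.
            joined = False
            continue
        total += base_width(ch)
    return total


family = "\U0001f469\u200d\U0001f469\u200d\U0001f467"  # WOMAN ZWJ WOMAN ZWJ GIRL
print(summed_width(family))     # 6
print(zwj_aware_width(family))  # 2
```

The difference between the two numbers is the over-count that a state-free per-codepoint sum produces for a joined sequence.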

jquast changed the title ~~term.ljust calculating incorrect padding value with some unicode sequences~~ term.ljust, center etc. incorrect for sequences containing U+FE0F (Variation Selector-16) Mar 20, 2024
