Emojis/Grapheme clusters seem to be broken in pyte #131

chubin · 2020-04-03T13:31:09Z

Consider this Python 3 code:

# -*- coding: utf-8 -*-

from __future__ import print_function, unicode_literals

import pyte

if __name__ == "__main__":
    emoji_string = "☁️"
    print(emoji_string.encode("utf-8").hex())
    print("---")

    screen = pyte.Screen(80, 24)
    stream = pyte.Stream(screen)
    stream.feed(emoji_string)
    for character in screen.display[0][:3]:
        print(character.encode("utf-8").hex())

emoji_string contains a single grapheme cluster,
which a terminal, editor, etc. displays as one symbol: ☁️

It is rendered as a single emoji, but it consists of two code points: U+2601 (CLOUD) and U+FE0F (VARIATION SELECTOR-16).
pyte seems to drop the second part of the cluster (or perhaps everything after the first code point?),
so the output of the program looks like this:

e29881efb88f
---
e29881
20
20

We see that efb88f was dropped, and immediately after e29881, spaces follow (20).
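
To make the byte sequences above easier to read, here is a minimal standard-library sketch that maps each code point of the string to its Unicode name, general category, and UTF-8 bytes. The helper name `describe` is hypothetical and is not part of pyte or of the original report.

```python
import unicodedata

# The same string as in the report, written with escapes so that both
# code points are visible in the source.
emoji_string = "\u2601\ufe0f"


def describe(text):
    """Return (code point, name, general category, UTF-8 hex) per character."""
    return [
        (
            "U+{:04X}".format(ord(ch)),
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
            ch.encode("utf-8").hex(),
        )
        for ch in text
    ]


if __name__ == "__main__":
    for row in describe(emoji_string):
        print(*row)
```

The second code point is a nonspacing mark (category Mn) whose bytes are efb88f, which is exactly the part missing from the pyte output.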

Is this a bug in pyte, or is it expected behaviour?
Maybe I've missed some configuration mode?

superbobry · 2020-04-04T11:58:43Z

This is very likely a bug. Feel free to submit a PR ;)

chubin · 2020-04-12T12:17:23Z

I have written a small workaround for this problem. It works fine for me, but I don't think it is a good solution for this bug.

This is how I do it:

  import grapheme  # third-party package, see the note on the new dependency below


  def _fix_graphemes(text):
      """
      Extract long grapheme sequences that can't be handled
      by pyte correctly because of the bug pyte#131.
      Graphemes are omitted from the text and replaced with placeholders,
      and returned as a list.
  
      Returns:
          text_without_graphemes, graphemes
      """
  
      output = ""
      graphemes = []
  
      for gra in grapheme.graphemes(text):
          if len(gra) > 1:
              # Multi-code-point cluster: keep it aside, emit a placeholder.
              character = "!"
              graphemes.append(gra)
          else:
              character = gra
          output += character
  
      return output, graphemes

I extract the graphemes before rendering, like this:

text, graphemes = _fix_graphemes(text)

and then after rendering I put them back.

It works as it should, but I am not sure that this method is (1) general enough or (2) good for pyte, because it introduces a new dependency: grapheme.
