Allow emojis without selector-16 variation character to be recognized #26

oriolbcn · 2020-06-10T14:00:33Z

Why is this change needed?

Our company uses this library to render emojis in chat transcripts uploaded by users. The chats come from different sources (WhatsApp, Facebook, Instagram, Telegram), different platforms (Android, iOS, Windows) and different versions of each.

Over time we have found that emojis are sometimes not composed of strictly correct Unicode sequences. Specifically, Variation Selector 16 (U+FE0F) is sometimes missing. This character indicates that a character which defaults to text presentation should be displayed with emoji presentation instead. To me, the spec is not clear about whether this selector is mandatory. From what I understand, it was added "fairly recently", and some clients may not implement it (some of our users have really old phones). In any case, the spec is one thing and what implementations actually do is another, and some implementations do not add this selector.

When a sequence lacks the selector, the library cannot recognize it as an emoji because it does not match any entry in emojiList.
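To make the mismatch concrete, here is a minimal sketch assuming a lookup keyed by lowercase, hyphen-separated codepoint strings like the ones used in this PR. `toCodepointKey` and the one-entry `lookup` table are hypothetical illustrations, not the library's API or the real contents of emojiList.

```javascript
// Hypothetical helper: converts a string into a lowercase, hyphen-separated
// codepoint key in the same style as the examples in this PR ("2764-fe0f").
function toCodepointKey(str) {
  // Array.from iterates by code point, so astral characters such as U+1F441
  // are not split into surrogate halves.
  return Array.from(str)
    .map((ch) => ch.codePointAt(0).toString(16).padStart(4, "0"))
    .join("-");
}

// Hypothetical lookup containing only the fully qualified form.
const lookup = { "2764-fe0f": ":heart:" };

const withSelector = "\u2764\uFE0F"; // heart followed by VS16
const withoutSelector = "\u2764"; // same heart, selector dropped by the sender

console.log(toCodepointKey(withSelector)); // logs 2764-fe0f
console.log(toCodepointKey(withoutSelector)); // logs 2764
console.log(lookup[toCodepointKey(withoutSelector)]); // logs undefined: no match
```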

What's the change

The idea is to take every emoji that contains the FE0F character and generate all combinations with and without each occurrence of it. For example, for "002a-fe0f-20e3" it generates "002a-fe0f-20e3" (the original) and "002a-20e3". For "1f441-fe0f-200d-1f5e8-fe0f" it generates "1f441-200d-1f5e8-fe0f", "1f441-fe0f-200d-1f5e8", "1f441-200d-1f5e8" and "1f441-fe0f-200d-1f5e8-fe0f".
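A minimal sketch of the combination step, assuming keys in the hyphenated hex form shown above. `fe0fVariants` is a hypothetical name, not the function in this patch: it treats each FE0F occurrence as one bit of a mask and enumerates every keep/drop pattern, using plain loops only.

```javascript
// Hypothetical helper (not the library's API): given a hyphenated codepoint
// key, return every variant obtained by keeping or dropping each FE0F.
function fe0fVariants(key) {
  const parts = key.split("-");
  const selectorCount = parts.filter((p) => p === "fe0f").length;
  const variants = [];

  // Each mask encodes one keep/drop pattern: bit k set means the k-th FE0F is kept.
  for (let mask = 0; mask < 1 << selectorCount; mask++) {
    const kept = [];
    let selectorSeen = 0;
    for (const part of parts) {
      if (part === "fe0f") {
        const keep = (mask >> selectorSeen) & 1;
        selectorSeen++;
        if (!keep) continue; // drop this occurrence of FE0F
      }
      kept.push(part);
    }
    variants.push(kept.join("-"));
  }
  return variants;
}

console.log(fe0fVariants("002a-fe0f-20e3"));
console.log(fe0fVariants("1f441-fe0f-200d-1f5e8-fe0f"));
```

A key with n occurrences of FE0F yields 2^n variants including the original, so the two examples above produce 2 and 4 keys respectively; a key with no FE0F comes back as a single, unchanged entry.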

Then we associate all of those combinations with the same emoji.

There's a small twist. Some emojis, once the FE0F character is removed, become identical to a commonly used ASCII character (these are emojis such as :digit_one:). For those, I added a guard clause that skips them when generating combinations; otherwise a regular "1" character would be transformed into an emoji.
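Here is a sketch of associating the combinations with one shortname while applying the ASCII guard. `isBareAscii`, `registerAliases` and the plain-object `table` are hypothetical, the codepoints assumed for :digit_one: ("0031-fe0f") are an assumption, and the eye-in-speech-bubble shortname is illustrative.

```javascript
// Hypothetical guard: true when a key is a single ASCII code point, such as
// "0031" ("1"), "0023" ("#") or "002a" ("*").
function isBareAscii(key) {
  return !key.includes("-") && parseInt(key, 16) < 0x80;
}

// Hypothetical helper: maps every variant key to the same shortname, skipping
// variants that would turn plain ASCII text into an emoji.
function registerAliases(table, variants, shortname) {
  for (const key of variants) {
    if (isBareAscii(key)) continue; // guard clause described above
    table[key] = shortname;
  }
  return table;
}

const table = {};
// Assumed: ":digit_one:" is "0031-fe0f", so dropping FE0F leaves plain "0031".
registerAliases(table, ["0031-fe0f", "0031"], ":digit_one:");
// Illustrative shortname for the eye-in-speech-bubble sequence.
registerAliases(
  table,
  ["1f441-fe0f-200d-1f5e8-fe0f", "1f441-200d-1f5e8"],
  ":eye_in_speech_bubble:"
);

console.log(table);
```

Like the description, this guard only covers ASCII; Latin-1 characters such as © (00a9) pass the check in this sketch.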

Risks

I understand that this change may seem risky, since it could break the recognition of some emojis. I know it's hard to take just my word for it, but we have been running this modification for a long time (~1 year) without any issues, and we process hundreds of chats per day.

That said, I would understand if this is seen as an edge case that should not always apply. In that case, I would ask you to consider adding it behind an optional flag; I can implement that if there is interest. Having this code integrated in the library (even behind a flag) would be very beneficial for us, since it would make future updates much easier. Not sure if it's relevant, but we have a paid JoyPixels license.

TODO

JSHint complains about a function being created inside a loop. If we move forward with this patch, I will need to figure out how to fix that; it's not a trivial change to the current implementation.
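One common way to satisfy JSHint's warning about functions created inside loops (W083) is to hoist the callback into a named function defined outside the loop and pass it the per-iteration values as arguments. The sketch below shows that pattern on hypothetical data; `addVariants`, `buildTable` and the placeholder shortnames are assumptions, and this is not a claim about how the current implementation is structured.

```javascript
// Hypothetical input: each entry already carries its FE0F variants.
const emojis = [
  { shortname: ":placeholder_asterisk:", variants: ["002a-fe0f-20e3", "002a-20e3"] },
  { shortname: ":placeholder_eye:", variants: ["1f441-fe0f-200d-1f5e8-fe0f", "1f441-200d-1f5e8"] },
];

// Pattern JSHint flags: a callback created inside the loop body that closes
// over the per-iteration variable `emoji`.
//
//   for (var i = 0; i < emojis.length; i++) {
//     var emoji = emojis[i];
//     emoji.variants.forEach(function (v) { table[v] = emoji.shortname; });
//   }

// Hoisted alternative: the helper is defined once, outside any loop, and
// receives everything it needs as arguments.
function addVariants(table, variants, shortname) {
  for (let i = 0; i < variants.length; i++) {
    table[variants[i]] = shortname;
  }
}

function buildTable(entries) {
  const table = {};
  for (let i = 0; i < entries.length; i++) {
    addVariants(table, entries[i].variants, entries[i].shortname);
  }
  return table;
}

console.log(buildTable(emojis));
```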

Commit 8504c70: Allow emojis without selector-16 variation character to be recognized
