Allow emojis without selector-16 variation character to be recognized #26

Open
wants to merge 1 commit into
base: master

oriolbcn

Why is this change needed?

We are using this library in our company to show emojis in text that comes from chats uploaded by users. Chats come from different sources (WhatsApp, Facebook, Instagram, Telegram), different platforms (Android, iOS, Windows) and different versions.

Over time we have found that emojis are sometimes not composed of the strictly correct Unicode characters. Specifically, the Variation Selector-16 (U+FE0F) is sometimes missing. This character is meant to indicate that a character that defaults to text presentation should instead be shown with emoji presentation. To me, the spec is not clear about whether this selector is mandatory. From what I understand, it was added "fairly recently" and some clients may not implement it (some of our users have really old phones). In any case, the spec is one thing and what actual implementations do is another, and some implementations do not add this selector.

When a sequence arrives without the FE0F character, the library cannot recognize it as an emoji because it does not match any entry in emojiList.

What's the change

The idea is to take every emoji that has the FE0F character and generate all possible combinations with and without it. For example, for "002a-fe0f-20e3" it generates both "002a-fe0f-20e3" (the same) and "002a-20e3". For "1f441-fe0f-200d-1f5e8-fe0f" it generates: "1f441-200d-1f5e8-fe0f", "1f441-fe0f-200d-1f5e8", "1f441-200d-1f5e8" and "1f441-fe0f-200d-1f5e8-fe0f".

Then we associate all of those combinations with the same emoji.
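To make the idea concrete, here is a minimal sketch of the generation and association steps. It is not the code from the patch: the helper names `fe0fVariants` and `buildVariantIndex` are hypothetical, and the sample data assumes a simplified shape (shortname mapped to a hyphen-separated code point string), which is not necessarily how emojiList is structured in the library.

```javascript
// Hypothetical helper: returns every variant of a hyphen-separated code point
// sequence in which each "fe0f" is either kept or dropped.
function fe0fVariants(sequence) {
  const parts = sequence.split('-');
  let variants = [[]];
  for (const part of parts) {
    const next = [];
    for (const prefix of variants) {
      next.push(prefix.concat(part)); // variant that keeps this code point
      if (part === 'fe0f') {
        next.push(prefix);            // variant that drops this fe0f
      }
    }
    variants = next;
  }
  return variants.map((v) => v.join('-'));
}

// Hypothetical helper: maps every variant back to the same shortname.
// Assumes `list` is a plain object of shortname -> code point string.
function buildVariantIndex(list) {
  const index = {};
  for (const shortname of Object.keys(list)) {
    for (const variant of fe0fVariants(list[shortname])) {
      index[variant] = shortname;
    }
  }
  return index;
}

console.log(fe0fVariants('002a-fe0f-20e3'));
// [ '002a-fe0f-20e3', '002a-20e3' ]
console.log(fe0fVariants('1f441-fe0f-200d-1f5e8-fe0f').length);
// 4
```

A sequence with n occurrences of FE0F yields 2^n variants, so the index grows quickly only for sequences with several selectors.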

There is one small twist. Some emojis, once the FE0F character is removed, end up identical to a commonly used ASCII character (these are emojis such as :digit_one:). For those, I added a guard clause that skips the stripped combination; otherwise a regular character such as "1" would be turned into an emoji.
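The guard could look roughly like the sketch below. The function names are hypothetical, and the input "0031-fe0f" is only an illustrative sequence for a digit emoji; the library's actual entry for that emoji may be stored differently.

```javascript
// Hypothetical check: true when the sequence is a single code point in the
// ASCII range, i.e. it would collide with an ordinary typed character.
function isPlainAscii(sequence) {
  const parts = sequence.split('-');
  return parts.length === 1 && parseInt(parts[0], 16) < 0x80;
}

// Hypothetical filter: always keeps the original sequence and drops any
// generated variant that collapses to a plain ASCII character.
function safeVariants(original, variants) {
  return variants.filter(
    (variant) => variant === original || !isPlainAscii(variant)
  );
}

console.log(safeVariants('0031-fe0f', ['0031-fe0f', '0031']));
// [ '0031-fe0f' ]
console.log(safeVariants('002a-fe0f-20e3', ['002a-fe0f-20e3', '002a-20e3']));
// [ '002a-fe0f-20e3', '002a-20e3' ]
```

In the second call both variants survive: "002a-20e3" starts with an ASCII code point but is still a two-code-point sequence, so it cannot be confused with a lone "*".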

Risks

I understand that this change may be seen as risky and that it may break the recognition of some emojis. I know it is hard to take just my word for it, but we have been using this modification for a long time (about 1 year) without any issues, and we process hundreds of chats per day.

In any case, I can understand if this is seen as an edge case that should not always be applied. In that case, I would still ask you to consider adding it behind an optional flag. I can do that if there is interest. It would be very beneficial for us to have this code (even behind a flag) integrated into the library, since it would make future updates much easier. Not sure if it's relevant, but we have a paid Joypixels license.

TODO

  • JSHint is complaining about a function being defined inside a loop. If we move forward with this patch, I will need to figure out how to fix it. It's not a trivial change from the current implementation.
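For reference, one common way to address this kind of warning is to move the function out of the loop body and call it from there. The sketch below uses hypothetical names and a made-up `entries` shape, since the patch code is not shown in this description; whether the same restructuring fits the current implementation is an open question.

```javascript
// Hypothetical names throughout; this is not the code from the patch.

// Defined once, outside any loop, so no function is created per iteration.
function registerVariants(index, shortname, variants) {
  for (var i = 0; i < variants.length; i++) {
    index[variants[i]] = shortname;
  }
}

// The loop body now only calls the hoisted function.
function buildIndex(entries) {
  var index = {};
  for (var j = 0; j < entries.length; j++) {
    registerVariants(index, entries[j].shortname, entries[j].variants);
  }
  return index;
}
```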
