feat: commands for special unicode characters in unicode package #1899

Closed
wants to merge 3 commits into from

jodros
Contributor

@jodros jodros commented Oct 21, 2023

It must have the font style set to "normal" because some fonts were showing the dash as italic by default.

…sile-typesetter#1894)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@Omikhleia
Member

... some fonts were showing the dash as italic by default.

Which can be legit actually.

@jodros
Contributor Author

jodros commented Oct 22, 2023

Which can be legit actually.

Of course it can. What I had in mind when I wrote it is that the dash was sometimes printed in italics even when it shouldn't have been. But on reflection, the public version of this command should output only the dash character and nothing else.

@Omikhleia
Member

Omikhleia commented Oct 22, 2023

Which can be legit actually.

Of course it can. What I had in mind when I wrote it is that the dash was sometimes printed in italics even when it shouldn't have been. But on reflection, the public version of this command should output only the dash character and nothing else.

But where does it stop?

  • Why a command for an em-dash?
  • Why not a command for an en-dash?
  • Why not commands for any Unicode character useful in books?
  • EDIT: And why should it be in the "plain" class? re-EDIT: This is not solving any problem or issue. One thing that might be interesting to discuss, possibly, with respect to typography, is dialogues: if an em-dash opens a dialogue line (first character, however produced), one needs to consider how the following space is to be handled. No?

Member

@alerque alerque left a comment

Sorry, but nope ;-)

There is some discussion of this in the manual, but SILE is heavily biased towards just processing Unicode input, not the LaTeX approach of ASCII + commands for everything non-ASCII. I do not want to include this sort of command in the default class(es) because I do not think it is the right way forward at all.

That being said I also don't think it's right for SILE to be so opinionated about how things should be done that we restrict people from doing what they want. You're free to add this command to your documents of course, and free to add this to your 3rd party classes or packages. But if we put it in the SILE default class it's going to get used and abused and the burden of supporting it is on us.

Going down this road is going to make SILE documents harder and harder to convert to other formats. On the other hand, other document formats are very easy to convert to SILE, largely because you don't have to do any special handling: just feed it Unicode. Personally I author things in Markdown and use a mix of --- and — when typing (I can type the latter easily with the Compose key plus three dashes, but other editors in the same docs don't always have that convenience), then rely on input methods and the Pandoc +smart extension to normalize it all to — for typesetting.

Let's take what I think should happen in a couple of stages:

  1. I suggest that this go in a package, not a class. I think the things a class does to the whole document are completely orthogonal to whether people are going to want this sort of Unicode function shorthand, and since I don't plan on this being part of the base/plain class it won't be inherited by default. Inheriting from a special "with-unicode-funcs" class is silly, but loading a package to get more commands is normal.

  2. Such a package should provide more than just one command. Ultimately there are a dozen or so symbols used commonly in English, and a long tail of thousands that it could include. I would at least start with some mapping of super common symbols (e.g. —, –, …, ™, ©, ®, €, £, non-breaking spaces, etc.) and think through the logic of how the long tail of commands might be named.

  3. The last step that I'm not sure about is whether the package should be in the core distribution or just be a 3rd party rock. We could host it in this Org if the latter.

I suggest we go ahead and discuss it here for a bit and maybe even go ahead and code it up as a core package until we decide on that last point, at which point it would be easy to punt externally if we need to.
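For illustration, the "mapping of super common symbols" idea could be sketched as below. This is only a rough Python sketch, not SILE code: the command names (`emdash`, `nbsp`, and so on) and the helper `make_commands` are hypothetical and were not proposed in this thread.

```python
# Hypothetical name -> character table. The command names are illustrative
# assumptions; only the symbols themselves come from the list in the comment.
COMMON_SYMBOLS = {
    "emdash": "\u2014",      # em dash
    "endash": "\u2013",      # en dash
    "ellipsis": "\u2026",    # horizontal ellipsis
    "trademark": "\u2122",   # trade mark sign
    "copyright": "\u00A9",   # copyright sign
    "registered": "\u00AE",  # registered sign
    "euro": "\u20AC",        # euro sign
    "pound": "\u00A3",       # pound sign
    "nbsp": "\u00A0",        # no-break space
}


def make_commands(table):
    """Return one zero-argument function per symbol, keyed by command name."""
    commands = {}
    for name, char in table.items():
        # Bind char as a default argument so each function keeps its own value
        # instead of all of them sharing the loop variable.
        commands[name] = lambda char=char: char
    return commands


commands = make_commands(COMMON_SYMBOLS)
```

Each generated command only returns its character, which matches the earlier point that a public command should output the character and nothing else.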

@alerque alerque marked this pull request as draft October 23, 2023 09:57
@jodros
Contributor Author

jodros commented Oct 29, 2023

I suggest that this go in a package not a class.

Right, creating a package for this makes much more sense indeed.

I would at least start with some mapping of super common symbols (e.g. —, –, …, ™, ©, ®, €, £, non-breaking spaces, etc.) and think through the logic of how the long tail of commands might be named.

I'll write a draft of this.

The last step that I'm not sure about is whether the package should be in the core distribution or just be a 3rd party rock. We could host it in this Org if the latter.

Well, this has to do with #1897: we should have a clear set of criteria for choosing between the two.

@jodros
Contributor Author

jodros commented Nov 3, 2023

@alerque So I wrote a loop to generate functions according to pre-defined unicode symbols in the unichar package.

Unfortunately I can't print every character even using a font like Noto Emoji.

Well, this has little to do with my original idea of just adding an em-dash. But after reading your comments and thinking it over, it would be more logical to programmatically create commands for each useful character.

Going down this road is going to make SILE documents harder and harder to convert to other formats.

I hope this way of declaring the commands won't be a problem when doing conversions.

@alerque
Member

alerque commented Nov 3, 2023

Try Symbola for font coverage of symbols like that. In any case for something like this to work it would need to either inject a font-fallback or explicitly wrap the output in a useful font. Symbola is pretty good but recent versions also have licensing issues. Which font and/or method to use is probably something your package will want to have as a setting.

@alerque
Member

alerque commented Nov 3, 2023

Also you might look into CLDR to see if there is a list of name / codepoint maps that would be useful. If so we can get it added to lua-cldr so we can use it in SILE.

@jodros jodros changed the title from "feat: command to output a dash '―' created" to "feat: commands for special unicode characters in unicode package" Nov 11, 2023
@jodros
Contributor Author

jodros commented Nov 11, 2023

Which font and/or method to use is probably something your package will want to have as a setting.

Should this font ship with SILE, or is the user expected to always install it themselves?

@jodros
Contributor Author

jodros commented Dec 26, 2023

Also you might look into CLDR to see if there is a list of name / codepoint maps that would be useful. If so we can get it added to lua-cldr so we can use it in SILE.

@alerque I looked for it a few minutes ago, and I found this xml. But now, how to filter it?

@Omikhleia
Member

So if I summarize the exchanges, this should rather be a 3rd-party package, right?

@alerque I looked for it a few minutes ago, and I found this xml. But now, how to filter it?

This is a CLDR annotation file -- with language-specific (or rather, locale-dependent) names and aliases.
So English in your link, but there are translations for most major languages: https://github.com/unicode-org/cldr/tree/latest/common/annotations/.
I'm not sure what you were looking for, but https://unicode.org/Public/15.1.0/ucd/UnicodeData.txt has all code points and default short names for Unicode 15.1. It's a huge file...
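As a rough illustration of how that file could be filtered: UnicodeData.txt is semicolon-separated, with the code point (hexadecimal) in the first field and the character name in the second. The Python sketch below parses a few sample lines written in that format instead of downloading the real file. The function name `parse_unicode_data` is hypothetical, and skipping names that start with `<` (control characters and range markers) is an assumption about what a package would want.

```python
# A few sample lines in the UnicodeData.txt format (not the full file).
SAMPLE = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
00A9;COPYRIGHT SIGN;So;0;ON;;;;;N;;;;;
2013;EN DASH;Pd;0;ON;;;;;N;;;;;
2014;EM DASH;Pd;0;ON;;;;;N;;;;;
2026;HORIZONTAL ELLIPSIS;Po;0;ON;<compat> 002E 002E 002E;;;;N;;;;;
"""


def parse_unicode_data(text):
    """Map character names to characters for every named line in text."""
    names = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        codepoint = int(fields[0], 16)  # field 0: code point in hex
        name = fields[1]                # field 1: character name
        # Assumption: entries such as "<control>" or "<..., First>" range
        # markers carry no usable name, so they are skipped.
        if name.startswith("<"):
            continue
        names[name] = chr(codepoint)
    return names


names = parse_unicode_data(SAMPLE)
```

With the real file the same loop would yield tens of thousands of entries, so some filtering by general category (the third field) would probably be needed on top of this.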

@jodros
Contributor Author

jodros commented May 7, 2024

this should rather be a 3rd-party package

You're right!

I'm not sure what you were looking for, but https://unicode.org/Public/15.1.0/ucd/UnicodeData.txt has all code points and default short names for Unicode 15.1. It's a huge file...

Yeah, that's what I'm looking for, many thanks.

@jodros jodros closed this May 7, 2024
@alerque
Member

alerque commented May 28, 2024

Just a random idea to throw out for the package: You could use that list to generate commands with a namespace like \u:bullseye, \u:gears, etc. That way you wouldn't have any that conflicted with existing commands, yet they wouldn't be too burdensome to type.

This would be a huge command set that might cause other issues. I honestly don't know how Lua would hold up to that abuse. Probably fine, but it could cause issues. You'd probably want to wrap the actual data in a custom loader that caches a bytecode version to ease the burden of parsing a massive text file into Lua code on every document render.
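A minimal Python sketch of both ideas, under stated assumptions: the naming rule (lowercase, spaces replaced by hyphens) is a guess at how multi-word names might be handled, the helpers `command_name` and `load_table` are hypothetical, and a JSON file stands in for the cached Lua bytecode mentioned above.

```python
import json
import os
import tempfile

# Stand-in for names parsed from the Unicode data file.
SAMPLE_NAMES = {"BULLSEYE": "\u25CE", "GEAR": "\u2699", "EM DASH": "\u2014"}


def command_name(unicode_name, prefix="u:"):
    """Turn a Unicode character name into a namespaced command name."""
    # Assumed rule: lowercase the name and join words with hyphens.
    return prefix + unicode_name.lower().replace(" ", "-")


def load_table(names, cache_path):
    """Return a command -> character table, reusing the cache file if present."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)
    table = {command_name(name): char for name, char in names.items()}
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(table, fh, ensure_ascii=False)
    return table


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "u-commands.json")
    first = load_table(SAMPLE_NAMES, path)   # builds the table, writes the cache
    second = load_table(SAMPLE_NAMES, path)  # read back from the cache file
```

The sketch never invalidates the cache; a real loader would need to rebuild it when the underlying data file changes.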

3 participants