feat: commands for special unicode characters in unicode package #1899
…sile-typesetter#1894) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Which can be legit actually.
Of course it can. What I had in mind when I wrote it is that sometimes it was being printed as italic even when it shouldn't be. But indeed, on reflection, the public version of this command should have only the dash character and nothing else.
But where does it stop?
Sorry, but nope ;-)
There is some discussion of this in the manual, but SILE is heavily biased towards just processing Unicode input, not the LaTeX approach of ASCII + commands for everything non-ASCII. I do not want to include these sorts of commands in the default class(es) because I do not think it is the right way forward at all.
That being said I also don't think it's right for SILE to be so opinionated about how things should be done that we restrict people from doing what they want. You're free to add this command to your documents of course, and free to add this to your 3rd party classes or packages. But if we put it in the SILE default class it's going to get used and abused and the burden of supporting it is on us.
Going down this road is going to make SILE documents harder and harder to convert to other formats. On the other hand, other document formats are very easy to convert to SILE, largely because you don't have to do any special handling, just feed it Unicode. Personally I author things in Markdown and use a mix of --- and — when typing (I can type the latter easily with Compose key plus three dashes, but other editors in the same docs don't always have that convenience), then rely on input methods and the Pandoc +smart extension to normalize it all to — for typesetting.
Let's take what I think should happen in a couple of stages:
- I suggest that this go in a package, not a class. I think the things a class does to the whole document are completely orthogonal to whether people are going to want these sorts of Unicode function shorthands, and since I don't plan on this being part of the base/plain class it won't be inherited by default. Inheriting from a special "with-unicode-funcs" class is silly, but loading a package to get more commands is normal.
- Such a package should provide more than just one command. Ultimately there are a dozen or so symbols used commonly in English, and a long tail of thousands that it could include. I would at least start with some mapping of super common symbols (e.g. —, –, …, ™, ©, ®, €, £, non-breaking spaces, etc.) and think through the logic of how the long tail of commands might be named.
- The last step, which I'm not sure about, is whether the package should be in the core distribution or just be a 3rd party rock. We could host it in this Org if the latter.
I suggest we go ahead and discuss it here for a bit and maybe even go ahead and code it up as a core package until we decide on that last point, at which point it would be easy to punt externally if we need to.
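To make the "mapping of super common symbols" and the naming question concrete, here is a minimal sketch. It is written in Python purely for illustration (an actual SILE package would be Lua), and the names `COMMON_SYMBOLS`, `command_name`, and `build_command_table` are hypothetical. It assumes one possible naming rule: lowercase the official Unicode character name and join the words with hyphens.

```python
import unicodedata

# Starter set taken from the list above; U+00A0 is the non-breaking space.
COMMON_SYMBOLS = "—–…™©®€£\u00a0"


def command_name(char: str) -> str:
    """Derive a (hypothetical) command name from the Unicode character name."""
    return unicodedata.name(char).lower().replace(" ", "-")


def build_command_table(chars: str) -> dict[str, str]:
    """Map each derived command name to the character it should emit."""
    return {command_name(c): c for c in chars}


COMMANDS = build_command_table(COMMON_SYMBOLS)

for name, char in COMMANDS.items():
    print(f"{name} -> U+{ord(char):04X}")
```

A rule like this scales to the long tail without hand-picking names, at the cost of some verbose ones (the ellipsis becomes `horizontal-ellipsis`), so a real package might want short aliases for the most common symbols.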
Right, creating a package for this makes much more sense indeed.
I'll write a draft of this.
Well, this has to do with #1897; we should have a clear set of criteria when choosing either.
@alerque So I wrote a loop to generate functions according to pre-defined unicode symbols in the unichar package. Unfortunately I can't print every character, even using a font like … Well, this has little to do with my original idea of just adding an em-dash. But after reading your comments and thinking it over, it would be more logical to programmatically create commands for each useful character.
I hope this way of declaring the commands will not be a problem when doing conversions.
Try Symbola for font coverage of symbols like that. In any case, for something like this to work it would need to either inject a font-fallback or explicitly wrap the output in a useful font. Symbola is pretty good but recent versions also have licensing issues. Which font and/or method to use is probably something your package will want to have as a setting.
Also you might look into CLDR to see if there is a list of name / codepoint maps that would be useful. If so we can get it added to |
Should this font come bundled with SILE, or is it expected that the user will always have to install the font?
So if I summarize the exchanges, this should rather be a 3rd-party package, right?
This is a CLDR annotation file -- with language-specific (or rather, locale-dependent) names and aliases.
You're right!
Yeah, that's what I'm looking for, many thanks.
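For reference, a rough sketch of turning such an annotation file into a name-to-character map. It is Python for illustration only, not SILE code. It assumes the file uses `annotation` elements carrying a `cp` attribute, with `type="tts"` marking the single short name; the embedded sample is invented for the example and is not copied from CLDR. `load_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed shape of a CLDR annotation file.
SAMPLE = """
<ldml>
  <annotations>
    <annotation cp="\u2014">dash | em dash</annotation>
    <annotation cp="\u2014" type="tts">em dash</annotation>
    <annotation cp="\u00a9">C | copyright</annotation>
    <annotation cp="\u00a9" type="tts">copyright</annotation>
  </annotations>
</ldml>
"""


def load_names(xml_text: str) -> dict[str, str]:
    """Return {hyphenated-name: character} from the short-name entries only."""
    names = {}
    for node in ET.fromstring(xml_text).iter("annotation"):
        if node.get("type") != "tts":
            continue  # skip the keyword/alias lists
        char = node.get("cp")
        if char and node.text:
            names[node.text.strip().replace(" ", "-")] = char
    return names


NAMES = load_names(SAMPLE)
print(NAMES)
```

Because the names are locale-dependent, a package built this way would need to pick one locale (or a namespace per locale) so that command names stay stable across documents.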
Just a random idea to throw out for the package: You could use that list to generate commands with a namespace like … This would be a huge command set that might cause other issues. I honestly don't know how Lua would hold up to that abuse. Probably fine, but it could cause issues. You'd probably want to wrap the actual data in a custom loader that caches a bytecode version to ease the burden of parsing a massive text file into Lua code on every document render.
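The caching loader idea could look roughly like the sketch below. It is Python for illustration only: a Lua version would cache compiled bytecode, whereas this stand-in pickles the parsed table. It assumes a hypothetical `name;hexcode` line format for the source file, and `parse_table` and `load_cached` are made-up names. The cache is reused only while it is at least as new as the source file.

```python
import os
import pickle
import tempfile


def parse_table(path: str) -> dict[str, str]:
    """Slow path: parse 'name;hexcode' lines into {name: character}."""
    table = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            name, hexcode = line.split(";")
            table[name.strip()] = chr(int(hexcode.strip(), 16))
    return table


def load_cached(path: str) -> dict[str, str]:
    """Return the table, reusing an on-disk cache when it is still fresh."""
    cache_path = path + ".cache"
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(path)):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    table = parse_table(path)
    with open(cache_path, "wb") as fh:
        pickle.dump(table, fh)
    return table


# Usage: the first call parses and writes the cache, the second reads it back.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "symbols.txt")
with open(source, "w", encoding="utf-8") as fh:
    fh.write("# name;codepoint\nem-dash;2014\neuro-sign;20AC\n")

first = load_cached(source)
second = load_cached(source)
print(first == second)
```

Comparing modification times keeps the check cheap; a stricter variant could store a hash of the source file inside the cache instead.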
It must have the font style set to "normal" because some fonts were showing the dash as italic by default.