
CSL support #2082

Merged: 17 commits merged on Nov 26, 2024

@Omikhleia (Member) commented Jun 29, 2024

Closes #2074

It already does nice things (see the screenshots in the referenced issue).

In order to support CSL (Citation Style Language), we need to:

  • Convert the BibTeX entries to CSL format.
  • Use a CSL engine to format the citations and bibliography references. It boils down to:
    • Support CSL locales
    • Support CSL styles
    • Implement the CSL processor/renderer (a "reasonable" subset at least)

Regarding the conversion of BibTeX entries, the mappings are not straightforward, but there is some prior art we can check... None of the implementations I checked did exactly the same thing, so it's likely to be a bit messy...
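To make the mapping problem concrete, here is a minimal Python sketch (not the PR's Lua code) that converts one already-parsed BibTeX entry into a CSL-JSON item. The type and field tables are one plausible choice among the differing prior art, and `bibtex_to_csl` and `parse_names` are hypothetical names. Names are assumed to be in "Family, Given" form, with no brace or LaTeX handling. The `series`/`number` pair goes to CSL's `collection-title`/`collection-number`, as mentioned in the Stage 1 notes.

```python
# Hypothetical BibTeX -> CSL-JSON conversion; the tables are one plausible choice.
TYPE_MAP = {
    "article": "article-journal",
    "book": "book",
    "incollection": "chapter",
    "inproceedings": "paper-conference",
    "phdthesis": "thesis",
    "techreport": "report",
}

FIELD_MAP = {
    "title": "title",
    "journal": "container-title",
    "booktitle": "container-title",
    "publisher": "publisher",
    "address": "publisher-place",
    "volume": "volume",
    "pages": "page",
    "series": "collection-title",
    "doi": "DOI",
    "url": "URL",
}


def parse_names(value):
    """Split "Family, Given and Family, Given" into CSL name objects."""
    names = []
    for part in value.split(" and "):
        family, _, given = part.partition(",")
        name = {"family": family.strip()}
        if given.strip():
            name["given"] = given.strip()
        names.append(name)
    return names


def bibtex_to_csl(key, entry_type, fields):
    """Convert one parsed BibTeX entry (a dict of fields) to a CSL-JSON item."""
    entry_type = entry_type.lower()
    field_names = {k.lower() for k in fields}
    # Unknown entry types fall back to the generic CSL 1.0.2 "document" type.
    item = {"id": key, "type": TYPE_MAP.get(entry_type, "document")}
    for name, value in fields.items():
        name = name.lower()
        if name in ("author", "editor"):
            item[name] = parse_names(value)
        elif name == "year":
            # Assumes a purely numeric year.
            item["issued"] = {"date-parts": [[int(value)]]}
        elif name == "number":
            # BibTeX "number" is an issue for articles, a series number otherwise.
            if entry_type == "article":
                item["issue"] = value
            elif "series" in field_names:
                item["collection-number"] = value
            else:
                item["number"] = value
        elif name in FIELD_MAP:
            item[FIELD_MAP[name]] = value
    return item
```

Everything not in the tables is silently dropped here; a real converter has to decide, field by field, what to do with the leftovers, and that is where the existing implementations diverge.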

Regarding the CSL engine, there are various existing implementations.
Yet, having looked at them, I am not really convinced by their code quality, so I went ahead and implemented the CSL 1.0.2 specification from scratch. Because it's fun, and SILE has the guts to do it. And because I think I can.

Additionally, this would also close several other items.

Closes #2024: the CSL locales take care of it.

Closes #2022: the CSL styles have appropriate fallbacks (substitutes, conditionals, etc.).

Closes #2027: the CSL styles and locales define how to format localized dates in the selected citation or bibliography style.

Closes #2026: some CSL styles sort entries by citation order ("citation-number"), so keeping track of cited entries was needed anyhow.
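Tracking cited entries for "citation-number" can be as simple as numbering each entry on its first citation and reusing that number afterwards. A minimal Python sketch; `CitationRegistry` and its methods are hypothetical, not the API of the PR's Lua engine:

```python
class CitationRegistry:
    """Hypothetical tracker assigning CSL citation-number values."""

    def __init__(self):
        self._numbers = {}  # entry key -> citation-number (1-based)

    def cite(self, key):
        # An entry gets its number on first citation and keeps it afterwards.
        if key not in self._numbers:
            self._numbers[key] = len(self._numbers) + 1
        return self._numbers[key]

    def bibliography_order(self):
        # Cited keys in citation-number order, i.e. order of first citation.
        return sorted(self._numbers, key=self._numbers.get)
```

Calling `cite("b")`, `cite("a")`, then `cite("b")` again yields 1, 2, 1, and `bibliography_order()` returns `["b", "a"]`.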

@Omikhleia Omikhleia requested a review from alerque as a code owner June 29, 2024 06:40
@Omikhleia Omikhleia marked this pull request as draft June 29, 2024 06:40
Review comments (now outdated and resolved) on csl/core/engine.lua and csl/core/locale.lua
@Omikhleia force-pushed the bibliography-csl branch 3 times, most recently from 0f5d659 to 330cd9d, July 14, 2024 18:57
@Omikhleia (Member, Author)

2024/07/14 "Stage 0" milestone: successfully processed 1355 references:

  • with en-US locale and styles chicago-author-date, chicago-fullnote-bibliography and apa.
  • with fr-FR locale and styles chicago-author-date, chicago-author-date-fr, chicago-fullnote-bibliography-fr and apa

@Omikhleia (Member, Author) commented Jul 20, 2024

2024/07/20 "Stage 1" milestone: successfully processed 1508 references:

  • with entry sorting according to the CSL style.
  • tested with fr-FR locale and styles chicago-author-date-fr, chicago-fullnote-bibliography-fr
  • Some entries belong to numbered "series" (a.k.a. collection-title and collection-number in CSL)

@Omikhleia (Member, Author) commented Jul 28, 2024

Leaving for vacation soon, so here are just some progress notes to myself, so that I remember:

  • Stage 2
    • implement subsequent-author-substitute (I had it more or less done this weekend on an experimental basis, but I'm unhappy with the code so I didn't push it... I prioritized working on my bib files, now over 2000 references, and couldn't finish that code properly today...)
    • implement "locators" in citations
  • Stage 3 = implement page-range-delimiter so page ranges look decent...
  • Stage 4 = understand how to properly handle demoting/non-demoting particles in names (I have some of these in my bibliography files, so I guess it's time to dig into the topic...)
  • Stage 5 = review package commands for multiple citations (but then, how to handle locators?) --> I'm going to postpone this item; it needs further discussion.

That's a minimal set. There would still be a few features missing from the CSL spec, but at least all Chicago styles would be covered fairly decently, and a first milestone would be reached.
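As an illustration of the Stage 3 item, a rough Python sketch of page-range reformatting: BibTeX-style ranges such as `30--35` get a proper delimiter (an en dash by default, which is the default of CSL's `page-range-delimiter` term), and the second number can be expanded or collapsed as in CSL's `expanded` and `minimal` page-range formats. `format_page_range` is a hypothetical name; only digit-only ranges are handled, and the `chicago` and `minimal-two` formats are left out.

```python
import re

# Digit-only ranges such as "30-35" or "30--35" (BibTeX often uses "--").
PAGE_RANGE = re.compile(r"(\d+)\s*-{1,2}\s*(\d+)")


def _expanded(start: str, end: str) -> str:
    # "321-8" -> "328": restore the leading digits dropped from the end page.
    if len(end) < len(start):
        return start[: len(start) - len(end)] + end
    return end


def _minimal(start: str, end: str) -> str:
    # "321-328" -> "8": drop the leading digits repeated from the start page.
    end = _expanded(start, end)
    if len(end) != len(start):
        return end
    i = 0
    while i < len(end) - 1 and start[i] == end[i]:
        i += 1
    return end[i:]


def format_page_range(pages: str, fmt: str = "expanded", delimiter: str = "\u2013") -> str:
    """Rewrite every page range in `pages` with `delimiter` (default: en dash)."""
    collapse = _minimal if fmt == "minimal" else _expanded

    def repl(m: re.Match) -> str:
        start, end = m.group(1), m.group(2)
        return start + delimiter + collapse(start, end)

    return PAGE_RANGE.sub(repl, pages)
```

For instance, `format_page_range("321--328", fmt="minimal")` gives "321–8", while the default `expanded` format turns "321-8" back into "321–328".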

@Omikhleia (Member, Author)

Slowly back on track.
I rebased the branch and added a commit with support for #2026 (see the rationale in the main description). Some tests were performed with the "American Chemical Society" (ACS) style, which uses "citation-number".
We are not there yet, but it's progress.
I also quietly included ("in passing") a small refactor/fix for an issue I experienced with the Modern Language Association (MLA) style, which I used (with a few adaptations) for the 2600+ references in the book I made this summer, A bibliography of Tolkien studies in French & English. But there's still some code to clean up and refactor from that work in progress ;)

@Omikhleia (Member, Author) commented Sep 12, 2024

Let's refactor a bit and support locators. It's a refactor, since none of this is released yet.

Chicago style:

[screenshot]

This is demonstrated in \csl:cite[page=30-35]{FullInProceedings}; see also \csl:cite[fig=5, key=FullBook].
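In CSL terms, a locator is an extra value-plus-label pair on the citation item. The following minimal Python sketch turns options like those above into a CSL-JSON citation item. The alias table (e.g. `fig` for `figure`), giving a `key` option precedence over the command content, and rejecting more than one locator per citation are assumptions made for illustration, not necessarily what the Lua implementation does; `cite_item` is a hypothetical name.

```python
# A subset of the CSL 1.0.2 locator types, for illustration.
CSL_LOCATORS = {
    "book", "chapter", "column", "figure", "folio", "line", "note",
    "page", "paragraph", "part", "section", "verse", "volume",
}
# Hypothetical short aliases, e.g. "fig" as in \csl:cite[fig=5, key=FullBook].
ALIASES = {"fig": "figure", "chap": "chapter", "sec": "section", "p": "page"}


def cite_item(options, content=None):
    """Build a CSL citation item {id, locator, label} from command options."""
    key = options.get("key", content)  # assumption: key= overrides the content
    if not key:
        raise ValueError("citation key missing")
    item = {"id": key}
    for name, value in options.items():
        label = ALIASES.get(name, name)
        if label in CSL_LOCATORS:
            if "locator" in item:
                raise ValueError("only one locator per citation")
            item["locator"] = value
            item["label"] = label
    return item
```

With this sketch, the two citations above become `{"id": "FullInProceedings", "locator": "30-35", "label": "page"}` and `{"id": "FullBook", "locator": "5", "label": "figure"}`; the style then decides how the label term and the locator are rendered.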

@Omikhleia force-pushed the bibliography-csl branch 3 times, most recently from c407336 to c775542, September 14, 2024 23:37
@Omikhleia (Member, Author)

I hate names with particles, definitely. 🤣 Doh, it was hard for my tired brain. One checkbox ticked.
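For context, CSL 1.0.2 splits particles into dropping ones, hidden when only the surname is shown, and non-dropping ones, which are kept (the spec's examples are "W. de Koning" and "Jean de La Fontaine"), and the style's `demote-non-dropping-particle` option decides whether a non-dropping particle sorts as part of the family name. A minimal Python sketch of the resulting sort keys, following the spec's name-part sort order; `sort_key` is a hypothetical name, and lowercasing stands in for the locale-aware (ICU) collation a real implementation needs.

```python
def sort_key(name, demote="display-and-sort"):
    """Sort key for a CSL name object, per demote-non-dropping-particle."""
    family = name.get("family", "")
    given = name.get("given", "")
    dropping = name.get("dropping-particle", "")
    non_dropping = name.get("non-dropping-particle", "")
    suffix = name.get("suffix", "")
    if demote == "never":
        # The non-dropping particle stays glued to the family name.
        parts = [f"{non_dropping} {family}".strip(), dropping, given, suffix]
    else:
        # "sort-only" and "display-and-sort": both particles are demoted.
        parts = [family, f"{dropping} {non_dropping}".strip(), given, suffix]
    # Crude stand-in for locale-aware collation.
    return tuple(p.lower() for p in parts)
```

With `demote="never"`, "W. de Koning" sorts under "D" and "Jean de La Fontaine" under "L"; with the default `display-and-sort`, they sort under "K" and "F" respectively.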

@Omikhleia self-assigned this Sep 14, 2024
@Omikhleia (Member, Author)

That is, you can of course "Move CSL support module unter bibtex package", though I don't know what CSL and bibtex have in common :D

@alerque (Member) commented Nov 25, 2024

...though I don't know what CSL and bibtex have in common

Just that CSL is only used in connection with bibliographies, and the only (poorly named) implementation of that we have is our bibtex package. Keeping all the related utilities together makes it much easier to package and maintain. If/when we do have other packages, we can consider whether abstracting the CSL stuff under more general utilities makes sense.

Putting anything in the root of this project comes with some caveats for packaging and distribution and I'd rather not deal with that unless I really understand why that namespacing choice is warranted.

These things look simple on the surface but always take more time than first considered when it comes to making them right.

Sure, but it also isn't that hard. I'm still seriously considering it for this module.

I looked through the sources and it seems like the only real issue is the use of SILE's justenoughicu for casing. That would create a circular dependency. But I think we could easily replace that with e.g. my own decasify Lua Rock. It can handle the lower, upper, title and even sentence casing.

@Omikhleia (Member, Author)

Putting anything in the root of this project comes with some caveats for packaging and distribution

I understand.
No problem, I don't really care where the files end up in the current implementation (I was just answering your question about why I originally put them there). At some point I'll open an "Epic" issue and list a few further ideas for the future of the bibliography support, some of which also depend on the traction this feature gets (vs. the required effort), so I'm eager to see it live and start pondering how to build upon it further 😄

Omikhleia and others added 17 commits November 26, 2024 15:14
Note that these files are licensed under CC-BY-SA 3.0 and are
only included as a default minimal set for testing.
This should even be the default when generating a bibliography.
After citeproc-java, let's check also citeproc-lua.
We can use the bibtex.style setting to help switching implementations
We can also ensure printbibliography works with legacy citations.
This will make deprecations and transition easier.
@alerque (Member) commented Nov 26, 2024

I think I'm going to go ahead with this namespacing (under bibtex) and we can refactor from there. I'd still consider extracting this to an external library, but to do that, the interface between SILE and the CSL engine should be much clearer, and potentially interchangeable with another implementation (e.g. a Rust one). A couple hundred KB of code we're still refactoring can live here until we have a clearly better plan, and moving ahead with this is clearly better than what we have now.

When we get the BCP-47 locale stuff straightened out, the actual bundled styles can probably move to the language module, but that should be a non-breaking change later.

@alerque merged commit 938f5a9 into sile-typesetter:master Nov 26, 2024
19 of 21 checks passed
@Omikhleia (Member, Author)

... potentially interchangeable with another implementation (e.g. a Rust one).

I understand the idea of potentially using a Rust implementation in the future, but I'd like to clarify: is it something that's being actively pursued, or just a long-term consideration?
Given that my time is limited, I want to make sure I'm focusing my efforts where they'll have the most lasting impact, and it would help to know whether additional work on the Lua CSL implementation is still expected, or whether it's best to wait for a possible transition to Rust. This also applies to other parts of the software I'm actively involved in, as I want to make sure my time is spent most effectively across the project.

@alerque (Member) commented Nov 26, 2024

I'm not actively working on this one (CSL), but the tooling for Rust is already in place, so when you look around for existing implementations we can leverage, don't just look at Lua. I haven't looked for CSL engines yet, but before doing more work on any topic, do check crates.io or other Rust sources too. If there are existing libraries that implement some function or type in either language, we can leverage them in SILE. Wrapping an existing Rust library in Lua bindings so we can expose it to our users is fairly trivial. We do want everything accessible from the Lua side (since user tinkering with internals is one of the standout features of SILE), but we have pretty robust bridging at our fingertips now.

@Omikhleia (Member, Author) commented Nov 26, 2024

I haven't looked for CSL engines yet

I have - there's Typst Hayagriva.

(tongue-in-cheek on) Typst has a huge active community, it also has a fairly good indexer, and plenty of advanced modules. (tongue-in-cheek off)

Last time I checked this summer, entry sorting was not language-dependent... checking... yep, the issue is still open. Well, they don't have ICU under the hood yet.

@alerque (Member) commented Nov 26, 2024

I hear you on the hype train. Don't get me wrong: Typst has done some things right. It's fast and does some things very well. But I was quite surprised at the amount of attention it got before it supported basics like footnotes. Even now it doesn't support flexible vertical spaces. Supporting those alone is the cause of much of our nasty pushback problems and a huge chunk of our time, but they are widely used in publishing for good reason. Anyhow, I am more convinced than ever that there is room for more than one approach here. Typst is doubling down on being a sandboxed environment with no tinkering (unless you build your own) and only its own dedicated syntax, while SILE lets you call out to anything external, tinker with anything internal, and process whatever input formats you want.

Anyhooooo, yes, Hayagriva also looks like it took the approach of managing all its data in its own YAML format instead of existing bibliography data formats, so I'd view it more as an alternative backend we could support than as a pathway to our primary support.

@zepinglee

I haven't looked for CSL engines yet

I have - there's Typst Hayagriva.

There is also https://github.com/zotero/citeproc-rs. It implements the complete CSL spec, while Hayagriva only implements fewer than 70% of its features (I guess).

@Omikhleia (Member, Author)

@zepinglee Indeed, thanks for pointing it out! Seeing the online demo again, I'm pretty sure I looked at it too this summer, albeit very briefly, and had some concerns.
Well, there are philosophical and political decisions behind the possible shift to Rust mentioned in the recent exchanges, and behind the use of such components in SILE. A meta-discussion would be welcome at some point, in the appropriate place.
