Markdown files support #3106
Comments
We currently do not support translating any documents, only formats designed for localization. Related to #2592 |
@nijel do you think this will be implemented? |
I think it will be implemented at some point. It should not be hard to implement (we already do something similar for the appstore metadata). Right now it's just not a priority for me, but this can change if somebody comes with funding for this :-). |
@nijel What sort of funding are we talking about here? Asking for a friend. |
@RMStoica-zivver You can use https://www.bountysource.com/issues/81891384-markdown-files-support to put funds on this issue to motivate contributors. |
I have added a bounty on this issue - it's not exactly clear to me what the integration should look like but I trust @nijel to steer the idea in the right direction. Our own requirement is to translate documentation, represented by a set of Markdown files. Since these Markdown files will be stored in the same git repository as our UI i18n files (and the rest of our code), it would be ideal if the Markdown files could be added as an extra component in the Weblate project. |
Depends on translate/translate#3956 which depends on miyuchina/mistletoe#162 |
I think it can be easily supported with some simple script, converting Markdown files to JSON files. |
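A minimal sketch of such a conversion script, assuming blocks are separated by blank lines. The function name `markdown_to_json` and the `block_NNNN` key scheme are hypothetical and chosen only for illustration. This naive split does not understand fenced code blocks or lists that contain blank lines, so it is a starting point rather than a full Markdown parser.

```python
import json
import re


def markdown_to_json(markdown_text: str) -> str:
    """Split Markdown into blank-line-separated blocks and emit a JSON map."""
    # Normalize line endings, then split on one or more blank lines.
    normalized = markdown_text.replace("\r\n", "\n").strip()
    blocks = [b.strip() for b in re.split(r"\n\s*\n", normalized) if b.strip()]
    # Hypothetical key scheme: zero-padded counters that keep blocks in order.
    entries = {f"block_{i:04d}": block for i, block in enumerate(blocks, start=1)}
    return json.dumps(entries, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    sample = "# Title\n\nHello world\n\nBye all\n"
    print(markdown_to_json(sample))
```

The resulting JSON file could then be added as a regular component, with a second script needed to write translated blocks back into a Markdown file.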
If you want to go the conversion route, I recommend using po4a, a 20-year-old project for doing just that. It recently got some key improvements in v0.58. It's used for things like f-droid.org and Fedora documentation. |
@eighthave good to know. thanks! |
About the idea of using po4a here, I think that it makes perfect sense (disclaimer: I'm one of the authors of po4a). We already have an existing Markdown parser, but it's not rock solid, and changing it may be more complex than writing a new parser. What would remain is the surrounding infrastructure of po4a, which makes the conversion between documentation formats and PO files easier, and the tests. My plan to improve the support of Markdown in po4a is to simplify the existing parser (its code is convoluted), and then improve its robustness using, for example, the tests from https://github.com/bobtfish/text-markdown/tree/master/t
Markdown is not very complex compared to other formats we handle pretty well in po4a (e.g., groff for man pages, or XML plus the DocBook and HTML variants). For both formats, we use internal parsers with no dependency on any external tool or library. This is because the kind of parsing that we are doing is specific, so we felt it was easier this way.
The groff parser is interesting in the sense that it really normalizes the input. There are maybe 6 ways to specify the inline formatting (bold, italic), and po4a converts them all to a single form to make life easier for translators. I'm not sure that this will be required for the Markdown parser, but that's something to consider.
The XML parser is interesting because it is difficult to write a line-by-line parser for XML, just as it is for Markdown. So the solution built into the XML parser could be useful for reworking the Markdown parser: instead of the line-by-line parser that we currently have, we could go for a block-by-block approach. That would help with supporting the bits that are currently not supported.
Edit: we also have some format parsers in po4a that use external tools. The POD parser uses a dedicated Perl library, while the SGML parser uses the onsgml external parser.
On another front, I am considering whether asciidoctor could be used as an external parser for the AsciiDoc format. If someone knows a parser for Markdown that works a bit like a SAX parser, that may be an interesting starting point. I'm willing to help any volunteer, but my personal schedule does not allow me to address this issue alone anytime soon. Oh, and po4a is written in Perl. Sorry about that... |
What I mean is that there are a lot of important ideas in po4a that can
be used in a Python/AST implementation:
* the --keep option, e.g. a percentage translated that must be met, or
the document reverts to the source language
* automatic metadata like "markdown-text"
* removing pure syntax strings from the translator's view
* custom YAML Front Matter handling
But in a broader sense, I think a po4a mode might make sense for Weblate
to handle formats like asciidoc, groff/man, etc. I wasn't thinking of
using po4a directly in Weblate to handle Markdown, though that might be a
quick fix for this. I think that having access to the Markdown AST will
enable so many really useful possibilities, it will be worth the work.
|
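The threshold idea behind the --keep option in the list above can be sketched as follows. This is not po4a's code: the function names, the source-text-to-translation dictionary, and the default of 80 percent are assumptions made for illustration only.

```python
from __future__ import annotations


def translated_percentage(units: dict[str, str]) -> float:
    """Percentage of source strings that have a non-empty translation."""
    if not units:
        return 100.0
    done = sum(1 for target in units.values() if target.strip())
    return 100.0 * done / len(units)


def render_document(
    blocks: list[str], translations: dict[str, str], keep: float = 80.0
) -> str:
    """Return the translated document, or the source if too little is translated."""
    # Hypothetical data model: translations maps source block -> translated block.
    relevant = {block: translations.get(block, "") for block in blocks}
    if translated_percentage(relevant) < keep:
        # Below the threshold: revert the whole document to the source language.
        return "\n\n".join(blocks)
    # Above the threshold: untranslated blocks fall back to their source text.
    return "\n\n".join(relevant[block] or block for block in blocks)
```

With two blocks and only one translated, the document stays in the source language at the default threshold, and mixes translated and source blocks once `keep` is lowered to 50.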
If I may, from the point of view of the translator, block-by-block makes a lot of sense, as one block will be one paragraph, or one list, etc. |
Considering how all "inline HTML" is valid markdown, I would suggest approaching markdown files by doing I do apologize in case I've missed something about the problem. Only came across this project on bountysource a few hours back |
The Markdown AST parser libraries will understand the HTML components,
and then let us work directly with the AST.
|
I'm adding 1 Monero (XMR value at the moment: $346 - updated value) to the bounty, to be awarded to the person who resolves this issue. I would have used Bountysource, but the related bounty can only be funded through PayPal. To send the bounty I will need a Monero address (or, if preferred, an address for another cryptocurrency, such as BTC or ETH). |
Reminder that there are two bounties on this issue: $240 + 1XMR. This feature is very much needed. |
Hi, how can I do this now? With po4a? What are the steps? Say I have hello.md with "Hello world \n Bye all" in it, and would like it to be translatable, for example. |
Create a po4a.conf file (name doesn't matter) in a po/ subdirectory with the content
Then run po4a po/po4a.conf
If using git, you can add these sorts of rules into .gitignore:
Now there is also this tool from KDE: https://invent.kde.org/websites/hugo-i18n |
Here are some examples of sites doing this with po4a:
https://gitlab.com/fdroid/fdroid-website/
fsfe/reuse-docs#61
|
I have not tried it but I suspect the main issue with this workflow is that translators get to translate parts of markdown files out of context, no? In OpenRefine we are sadly going to go for Crowdin (for now), because it seems to be the only solution which offers a real markdown editor where you can see the entire file being translated while still working on individual parts. If people are interested in adding a similar Markdown support in Weblate, I could imagine finding some funding for it (the existing bounties will not get us very far I am afraid). Maybe we could pool resources with other projects interested in the feature? |
I wouldn't extend the scope of this issue to include such a nice-to-have feature. |
Is this issue only about getting the strings out of Markdown and translating them? I would suggest supporting something similar to Crowdin's documentation localization offering. |
Last I looked, Crowdin's Markdown support was limited but better than nothing.
The best way would be to actually use the AST (Abstract Syntax Tree). That
means Markdown becomes structured data like JSON, YAML, XML, etc.
|
I'm talking about the fact that you can see the end result in a preview pane while translating. Mozilla's Pontoon also offers such a capability. |
We're translating Markdown articles via Weblate. Initially, we wanted to translate by inserting plain text into Weblate, but Weblate was pretty bad at handling insertions and deletions of paragraphs in the original text. I've looked into po4a and other ways to convert the text into formats that would allow us to easily translate and update the text, but haven't found anything that would be easy to use and wouldn't generate lots of overhead. So I've written a simple Golang package that splits text into paragraphs, compares them to the previous version of the text (if there is one), and produces JSON of a map from keys to paragraphs in a way that keeps the paragraphs in the right order, doesn't change the keys if the text wasn't significantly changed, and handles insertions and deletions in a way that avoids key collisions. It should be pretty easy to write something similar and add it to how Weblate handles plain text; but if anyone's interested, I can add the documentation, examples, etc. to the tool I've written. |
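The key-stability idea described above can be sketched in Python with the standard difflib module. This is not the Golang package mentioned in the comment, whose code is not shown here. The function names, the `pN` key format, and the 0.6 similarity threshold are all assumptions made for illustration.

```python
from __future__ import annotations

import difflib


def similarity(a: str, b: str) -> float:
    """Similarity ratio between two paragraphs, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(a=a, b=b, autojunk=False).ratio()


def assign_keys(
    old: dict[str, str], new_paragraphs: list[str], threshold: float = 0.6
) -> dict[str, str]:
    """Map new paragraphs to keys, reusing old keys for unchanged or lightly edited text."""
    old_keys = list(old)
    old_paragraphs = [old[k] for k in old_keys]
    # Hypothetical key format "p<number>". Fresh keys continue after the highest
    # number in the previous version; a real tool would persist this counter.
    next_id = max((int(k[1:]) for k in old_keys), default=0) + 1

    result: dict[str, str] = {}
    matcher = difflib.SequenceMatcher(a=old_paragraphs, b=new_paragraphs, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Old paragraphs in a changed region are candidates for key reuse.
        candidates = [] if tag == "equal" else list(range(i1, i2))
        for offset, j in enumerate(range(j1, j2)):
            key = None
            if tag == "equal":
                key = old_keys[i1 + offset]
            else:
                best = max(
                    candidates,
                    key=lambda i: similarity(old_paragraphs[i], new_paragraphs[j]),
                    default=None,
                )
                if best is not None and (
                    similarity(old_paragraphs[best], new_paragraphs[j]) >= threshold
                ):
                    key = old_keys[best]
                    candidates.remove(best)
            if key is None:
                key = f"p{next_id}"
                next_id += 1
            result[key] = new_paragraphs[j]
    return result
```

Because Python dictionaries preserve insertion order, serializing the result with `json.dumps` keeps the paragraphs in document order, while inserted paragraphs receive new keys and deleted ones simply disappear from the map.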
Handling of plain text files will work better since 4.13, see #7585 |
Thank you for your report; the issue you have reported has just been fixed.
|
Describe the solution you'd like
Markdown files support (maybe similar to Crowdin system)