Markdown files support #3106

Closed
maicol07 opened this issue Oct 12, 2019 · 28 comments · Fixed by #9597

@maicol07

maicol07 commented Oct 12, 2019

Describe the solution you'd like
Markdown files support (maybe similar to Crowdin system)

@nijel nijel added the enhancement Adding or requesting a new feature. label Oct 13, 2019
@nijel
Member

nijel commented Oct 13, 2019

We currently do not support translating any documents, only formats designed for localization.

Related to #2592

@maicol07
Author

@nijel do you think this will be implemented?

@nijel
Member

nijel commented Oct 14, 2019

I think it will be implemented at some point. It should not be hard to implement (we already do something similar for the appstore metadata). Right now it's just not a priority for me, but this can change if somebody comes up with funding for this :-).

@RMStoica-zivver

@nijel What sort of funding are we talking about here? Asking for a friend.

@nijel
Member

nijel commented Nov 25, 2019

@RMStoica-zivver You can use https://www.bountysource.com/issues/81891384-markdown-files-support to put funds on this issue to motivate contributors.

@wetneb

wetneb commented Mar 26, 2020

I have added a bounty on this issue - it's not exactly clear to me what the integration should look like but I trust @nijel to steer the idea in the right direction.

Our own requirement is to translate documentation, represented by a set of Markdown files. Since these Markdown files will be stored in the same git repository as our UI i18n files (and the rest of our code), it would be ideal if the Markdown files could be added as an extra component in the Weblate project.

@nijel nijel changed the title Markdown files support Markdown files support [$250] Mar 26, 2020
@nijel nijel added the bounty label Mar 26, 2020
@nijel
Member

nijel commented Mar 30, 2020

Depends on translate/translate#3956 which depends on miyuchina/mistletoe#162

@guoyunhe

I think it can be easily supported with some simple script, converting Markdown files to JSON files.
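
As a rough illustration of the "simple script" idea, here is a minimal sketch that splits a Markdown file on blank lines and emits a JSON-style key/value map. The function names and the `block-NNNN` key scheme are hypothetical, chosen only for this example. The split is deliberately naive: it would break a fenced code block that contains blank lines, and positional keys shift whenever a paragraph is inserted.

```python
import json
import re


def markdown_to_units(text: str) -> dict[str, str]:
    """Split Markdown on blank lines and key each chunk by its position."""
    chunks = [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]
    # Zero-padded keys so that sorting the keys restores document order.
    return {f"block-{i:04d}": chunk for i, chunk in enumerate(chunks, start=1)}


def units_to_markdown(units: dict[str, str]) -> str:
    """Reassemble a (possibly translated) unit map into a Markdown document."""
    return "\n\n".join(units[key] for key in sorted(units)) + "\n"


source = "# Hello\n\nHello world\n\nBye all\n"
units = markdown_to_units(source)
print(json.dumps(units, indent=2, ensure_ascii=False))
```

The resulting JSON could then be added as an ordinary JSON component, with `units_to_markdown` run on each translated file to rebuild the document.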

@eighthave
Contributor

If you want to go the conversion route, I recommend using po4a, a 20-year-old project for doing just that. It recently got some key improvements in v0.58. It's used for things like f-droid.org and Fedora documentation.

@guoyunhe

@eighthave good to know. thanks!

@mquinson

mquinson commented Apr 29, 2020

About the idea of using po4a here, I think that it makes perfect sense (disclaimer: I'm one of the authors of po4a).

We already have an existing markdown parser, but it's ... not rock solid, and changing it may be more complex than writing a new parser. What would remain is the surrounding infrastructure of po4a, which makes the conversion between documentation formats and PO files easier, and the tests.

My plan to improve the support of Markdown in po4a is to simplify the existing parser (its code is convoluted), and then improve its robustness using for example the tests from https://github.com/bobtfish/text-markdown/tree/master/t

Markdown is not very complex compared to other formats we handle pretty well in po4a (e.g., groff for man pages, or XML plus the DocBook and HTML variants). For both formats, we use internal parsers with no dependency on any external tool or library. This is because the kind of parsing that we are doing is specific, so we felt it was easier this way.

The groff parser is interesting in the sense that it really normalizes the input. There are maybe 6 ways to specify the inline formatting (bold, italic), and po4a converts them all to a single form to make life easier for translators. I'm not sure that this will be required for the markdown parser, but that's something to consider. The XML parser is interesting because it is difficult to write a line-by-line parser for XML, just as it is for markdown. So the solution built into the XML parser could be useful for reworking the markdown parser: instead of the line-by-line parser that we currently have, we could go for a block-by-block approach. That would help with supporting the bits that are currently not supported.
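
To make the block-by-block idea concrete, here is a minimal Python sketch (po4a itself is Perl, and this is not its parser). It groups lines into heading, fenced-code, and paragraph blocks, and marks code as non-translatable. The `Block` type, the `split_blocks` name, and the three block kinds are assumptions made for this illustration; a real Markdown parser has to handle far more (indented code, nested lists, block quotes, setext headings).

```python
from dataclasses import dataclass

FENCE = "`" * 3  # the three-backtick marker that opens or closes a code fence


@dataclass
class Block:
    kind: str           # "heading", "code", or "paragraph"
    text: str
    translatable: bool


def split_blocks(markdown: str) -> list[Block]:
    blocks: list[Block] = []
    buffer: list[str] = []
    in_fence = False

    def flush_paragraph() -> None:
        if buffer:
            blocks.append(Block("paragraph", "\n".join(buffer), True))
            buffer.clear()

    for line in markdown.splitlines():
        if in_fence:
            # Inside a fence every line, blank or not, belongs to the block.
            buffer.append(line)
            if line.startswith(FENCE):
                blocks.append(Block("code", "\n".join(buffer), False))
                buffer.clear()
                in_fence = False
        elif line.startswith(FENCE):
            flush_paragraph()
            buffer.append(line)
            in_fence = True
        elif line.startswith("#"):
            flush_paragraph()
            blocks.append(Block("heading", line, True))
        elif not line.strip():
            flush_paragraph()  # a blank line ends the current paragraph or list
        else:
            buffer.append(line)

    if in_fence and buffer:
        # Unterminated fence: keep what was collected as code.
        blocks.append(Block("code", "\n".join(buffer), False))
    else:
        flush_paragraph()
    return blocks


doc = (
    "# Title\n\nFirst line\nsecond line\n\n"
    + FENCE + "\ncode here\n\nmore code\n" + FENCE
    + "\n\n- item one\n- item two\n"
)
blocks = split_blocks(doc)
for block in blocks:
    print(block.kind, block.translatable)
```

In this sketch a list with no blank lines between its items ends up as a single paragraph-like block, so a translator would see the whole list as one unit.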

Edit: we also have some format parsers in po4a that use external tools. The POD parser uses a dedicated Perl library, while the SGML parser uses the external onsgmls parser. On another front, I am considering whether asciidoctor could be used as an external parser for the AsciiDoc format. If someone knows a parser for markdown that works a bit like a SAX parser, that may be an interesting starting point.

I'm willing to help any volunteer, but my personal schedule does not allow me to address this issue alone anytime soon.

Oh. And po4a is written in Perl. Sorry about that...

@eighthave
Contributor

eighthave commented Apr 29, 2020 via email

@RMStoica-zivver

If I may: from the translator's point of view, block-by-block makes a lot of sense, as one block will be one paragraph, one list, etc.
Also, I would suggest taking a look at pandoc.

@akumar-xyz

Considering that all "inline HTML" is valid markdown, I would suggest approaching markdown files by converting md -> html and then simply using translate-toolkit's HTML support.
Perhaps another small layer would be needed to handle front matter. That should make it a little more straightforward to implement complete Markdown support.
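
The "small layer to handle front matter" could look roughly like the sketch below. It assumes YAML-style front matter delimited by `---` lines at the very top of the file; the `split_front_matter` name is hypothetical. Only the body would be passed on to the md -> html conversion, and the front matter would be prepended again when writing the translated file.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Return (front_matter, body); front_matter is "" when none is present."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            # Include both delimiter lines in the front matter part.
            return "".join(lines[: index + 1]), "".join(lines[index + 1 :])
    # Opening delimiter without a closing one: treat everything as body.
    return "", text


page = "---\ntitle: Hello\n---\n# Hello\n\nBye all\n"
front, body = split_front_matter(page)
print(repr(front))
print(repr(body))
```

Translatable front matter values such as a title would still need separate handling; this sketch only keeps the block out of the way of the HTML conversion.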

I do apologize in case I've missed something about the problem. Only came across this project on bountysource a few hours back

@eighthave
Contributor

eighthave commented May 5, 2020 via email

@erciccione

erciccione commented Feb 25, 2021

I'm adding 1 Monero (XMR value at the moment: $346 - updated value) to the bounty to be awarded to the person who will resolve this issue.

I would have used Bountysource, but the related bounty can only be funded through PayPal. To send the bounty I will need a Monero address (or, if preferred, an address for another cryptocurrency, like BTC or ETH).

@erciccione

Reminder that there are two bounties on this issue: $240 + 1XMR. This feature is very much needed.

@Svetlana-T
Contributor

Hi, how can I do this now? With po4a? What are the steps? Say I have hello.md with "Hello world \n Bye all" in it, and would like it to be translatable, for example.

@ilmari-lauhakangas

ilmari-lauhakangas commented May 28, 2021

> Hi, how can I do this now? With po4a? What are the steps? Say I have hello.md with "Hello world \n Bye all" in it, and would like it to be translatable, for example.

Create a po4a.conf file (name doesn't matter) in a po/ subdirectory with the content

[po4a_langs] fr es it de
[po4a_paths] po/mysite.pot $lang:po/mysite.$lang.po

[options] opt:"--addendum-charset=UTF-8" opt:"--localized-charset=UTF-8" opt:"--master-charset=UTF-8" opt:"--master-language=en_US" opt:"--msgmerge-opt='--no-wrap'" opt:"--porefs=file" opt:"--wrap-po=newlines"

[po4a_alias:markdown] text opt:"--option markdown" opt:"--option yfm_keys=title" opt:"--addendum-charset=UTF-8" opt:"--localized-charset=UTF-8" opt:"--master-charset=UTF-8" opt:"--keep=0"

[type: markdown] content/hello.md $lang:content/$lang/hello.md
[type: markdown] content/goodbye.md $lang:content/$lang/goodbye.md

Then run

po4a po/po4a.conf

If using git, you can add these sorts of rules into .gitignore:

# no need to translate the source language, but po4a generates this file
po/mysite.en.po

# po4a auto-generated markdown files from translations
content/[a-z][a-z]/*.md
content/[a-z][a-z][a-z]/*.md
content/[a-z][a-z][a-z]_[A-Z]*/*.md
content/[a-z][a-z]_[A-Z]*/*.md

Now there is also this tool from KDE: https://invent.kde.org/websites/hugo-i18n

@eighthave
Contributor

eighthave commented May 28, 2021 via email

@wetneb

wetneb commented May 28, 2021

I have not tried it but I suspect the main issue with this workflow is that translators get to translate parts of markdown files out of context, no?

In OpenRefine we are sadly going to go for Crowdin (for now), because it seems to be the only solution which offers a real markdown editor where you can see the entire file being translated while still working on individual parts.

If people are interested in adding a similar Markdown support in Weblate, I could imagine finding some funding for it (the existing bounties will not get us very far I am afraid). Maybe we could pool resources with other projects interested in the feature?

@ilmari-lauhakangas

> I have not tried it but I suspect the main issue with this workflow is that translators get to translate parts of markdown files out of context, no?
>
> In OpenRefine we are sadly going to go for Crowdin (for now), because it seems to be the only solution which offers a real markdown editor where you can see the entire file being translated while still working on individual parts.

I wouldn't extend the scope of this issue to include such a nice-to-have feature.

@yarons
Contributor

yarons commented Nov 28, 2021

Is this issue only about getting the strings out of Markdown and translating them? I would suggest supporting something similar to Crowdin's documentation localization offering.

@eighthave
Contributor

eighthave commented Nov 29, 2021 via email

@yarons
Contributor

yarons commented Nov 29, 2021

I'm talking about the fact that you can see the end result in a preview pane while translating. Mozilla's Pontoon also offers such a capability.

@Mihonarium

Mihonarium commented May 27, 2022

We're translating markdown articles via Weblate. Initially, we wanted to translate by inserting plain text into Weblate, but Weblate was pretty bad at handling insertions and deletions of paragraphs in the original text.

I've looked into po4a and other ways to convert the text into formats that would allow us to easily translate and update the text, but haven't found anything that would be easy to use and wouldn't generate lots of overhead.

So I've written a simple Golang package that splits the text into paragraphs, compares them to the previous version of the text (if there is one), and produces a JSON map from keys to paragraphs. It keeps the paragraphs in the right order, doesn't change a key if its text wasn't significantly changed, and handles insertions and deletions in a way that avoids key collisions. It should be pretty easy to write something similar and add it to how Weblate handles plain text; but if anyone's interested, I can add documentation, examples, etc. to the tool I've written.
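
The Go package itself is not shown in this thread, so the following is only a Python sketch of the general technique described above, not the author's implementation. It reuses a key when a new paragraph is similar enough to an unmatched paragraph from the previous version, and allocates a fresh key otherwise. The `p<N>` key format, the `assign_keys` name, and the 0.6 similarity threshold are assumptions for this example.

```python
import difflib
import re


def split_paragraphs(text: str) -> list[str]:
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def assign_keys(
    previous: dict[str, str], new_text: str, threshold: float = 0.6
) -> dict[str, str]:
    """Map stable keys to paragraphs, reusing keys from the previous version."""
    unused = dict(previous)  # old paragraphs that have not been matched yet
    # Fresh keys start above the highest old key. A real tool would persist
    # this counter, so that a deleted key is never handed out again later.
    next_id = 1 + max((int(key[1:]) for key in previous), default=0)
    result: dict[str, str] = {}  # insertion order follows the new document
    for paragraph in split_paragraphs(new_text):
        best_key, best_ratio = None, 0.0
        for key, old in unused.items():
            ratio = difflib.SequenceMatcher(None, old, paragraph).ratio()
            if ratio > best_ratio:
                best_key, best_ratio = key, ratio
        if best_key is not None and best_ratio >= threshold:
            del unused[best_key]  # each old key is reused at most once
        else:
            best_key = f"p{next_id}"  # inserted paragraph gets a new key
            next_id += 1
        result[best_key] = paragraph
    return result


v1 = assign_keys({}, "Hello world\n\nBye all")
v2 = assign_keys(v1, "Hello world!\n\nA brand new paragraph in between.\n\nBye all")
print(v1)
print(v2)
```

Here the lightly edited first paragraph keeps its key, the inserted paragraph gets a new one, and the map's insertion order carries the document order. Comparing every new paragraph against every old one is quadratic, which is acceptable for article-sized texts.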

@nijel
Member

nijel commented Jun 6, 2022

Handling of plain text files will work better since 4.13, see #7585

@nijel nijel changed the title Markdown files support [$250] Markdown files support Jun 1, 2023
@nijel nijel added this to the 5.0 milestone Jul 19, 2023
@nijel nijel self-assigned this Jul 19, 2023
@nijel nijel linked a pull request Jul 27, 2023 that will close this issue
@github-actions

github-actions bot commented Aug 1, 2023

Thank you for your report; the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.
