HTML Transform: Erroneous HTML Table Parsing #582
Comments
Thank you for the clear issue report. Very helpful. It appears that there is no agreed way to represent captions for Markdown tables. Given that we use |
Bug Report 🐛
Whenever an HTML table is defined with a caption, the transformation to Markdown yields an invalid Markdown table.
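As a rough illustration of the conversion this report asks for, here is a minimal sketch using only the Python standard library. It is not how markdown-transform works internally. The names `TableToMarkdown` and `html_table_to_markdown` are hypothetical, and emitting the caption as a plain paragraph above the table is just one possible convention, assumed here for the example. The sketch also assumes a single, well-formed table whose first row is the header.

```python
from html.parser import HTMLParser


class TableToMarkdown(HTMLParser):
    """Collect the caption and cell text of a single, well-formed <table>."""

    def __init__(self):
        super().__init__()
        self.caption = ""
        self.rows = []       # list of rows, each a list of cell strings
        self._target = None  # "caption", "cell", or None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._target, self._buffer = "cell", []
        elif tag == "caption":
            self._target, self._buffer = "caption", []

    def handle_data(self, data):
        # Only keep text that sits inside a caption or a cell.
        if self._target:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Collapse whitespace so a cell never spans several lines.
        text = " ".join("".join(self._buffer).split())
        if tag in ("th", "td") and self._target == "cell":
            self.rows[-1].append(text)
            self._target = None
        elif tag == "caption" and self._target == "caption":
            self.caption = text
            self._target = None


def html_table_to_markdown(html):
    """Return Markdown with the caption as a paragraph above the table."""
    parser = TableToMarkdown()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows  # assumption: the first row is the header
    lines = []
    if parser.caption:
        lines += [parser.caption, ""]
    lines.append("| " + " | ".join(header) + " |")
    lines.append("|" + " --- |" * len(header))
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


sample = """
<table>
<caption>Average monthly active recipients of the service (est.)</caption>
<tr><th></th><th>Aug. 2022 - Jan. 2023</th><th>Feb. 2023 - July 2023</th></tr>
<tr><td>
Wikibooks
</td><td>
6,919,000
</td><td>
1,611,000
</td></tr>
</table>
"""
print(html_table_to_markdown(sample))
```

The key point is that each row ends up on a single line, with a delimiter row after the header, regardless of line breaks inside the source cells.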
Expected Behavior
The following HTML table,
should be converted into the following valid Markdown,
which renders as a valid Markdown table:
Current Behavior
Given the previous HTML table, including its caption, the tool transforms the HTML into the following Markdown content,
which is an invalid Markdown table:
Average monthly active recipients of the service, in the EU region over prior 6 months (est.)
| | Aug. 2022 - Jan. 2023
| Feb. 2023 - July 2023
|
| Wikibooks
| 6,919,000
| 1,611,000
|
| Wikidata
| 1,056,000
| 1,051,000
|
| Wikimedia Commons
| 2,845,000
| 3,272,000
|
| Wikinews
| 6,283,000
| 1,035,000
|
| Wikipedia
| 151,556,000 | 151,088,000 |
| Wikiquote
| 6,811,000
| 1,548,000
|
| Wikisource
| 7,106,000
| 1,845,000
|
| Wikispecies
| 29,000
| 37,000
|
| Wikiversity
| 6,360,000
| 1,082,000
|
| Wikivoyage
| 616,000
| 632,000
|
| Wiktionary
| 8,955,000
| 8,425,000
|
| Est. devices per person | 2.4[1] | 2.4[1] |
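Until the transform itself is fixed, output like the above could in principle be repaired after the fact. The sketch below is a hypothetical workaround, not part of the tool: `rejoin_table_rows` is an illustrative name, the column count must be supplied by the caller, and the pipe counting assumes that no cell contains a literal or escaped `|`. It merges row fragments until each row holds the expected number of cells, then adds the delimiter row that the output shown above lacks.

```python
def rejoin_table_rows(lines, columns):
    """Merge row fragments so every row holds `columns` cells on one line.

    Lines that do not start with a pipe (for example the caption) are ignored.
    """
    rows, pending = [], ""
    for line in lines:
        fragment = line.strip()
        if not fragment.startswith("|"):
            continue  # not a table fragment
        pending = f"{pending} {fragment}".strip()
        # A complete row has columns + 1 pipes and ends with the closing pipe.
        if pending.count("|") == columns + 1 and pending.endswith("|"):
            rows.append(pending)
            pending = ""
    # Add the delimiter row that Markdown tables need after the header.
    if rows:
        rows.insert(1, "|" + " --- |" * columns)
    return rows


# Fragments copied from the output in this report.
broken = [
    "| | Aug. 2022 - Jan. 2023",
    "| Feb. 2023 - July 2023",
    "|",
    "| Wikibooks",
    "| 6,919,000",
    "| 1,611,000",
    "|",
    "| Wikipedia",
    "| 151,556,000 | 151,088,000 |",
    "| Est. devices per person | 2.4[1] | 2.4[1] |",
]
print("\n".join(rejoin_table_rows(broken, columns=3)))
```

This only patches the symptom. The caption is still dropped by the helper and would have to be re-emitted separately.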
Steps to Reproduce
npm install -g @accordproject/markdown-cli
wget https://foundation.wikimedia.org/wiki/Legal:EU_DSA_Userbase_Statistics --output-document test.html
markus transform --from html --to markdown --input test.html --output test.md
Open test.md using a Markdown parser to visualise the invalid table parsing.
Context (Environment)
Parsing HTML to Markdown for web archiving.
Desktop