Parser Domains Wiki, XML, HTML, LaTeX
Wikipedia, and MediaWiki-driven wikis in general, allow a mixture of basic wiki markup elements (for lists, links, images, sections, ...), XML, HTML (which in a few cases violates XML well-formedness requirements, e.g. unclosed HR and BR tags), and LaTeX code enclosed in MATH tags. Furthermore, they allow dual notation for some content elements: sections can be marked with "=", "==", "===" or with H1, H2, H3 tags. LaTeX for mathematical expressions in particular follows a completely different grammar than XML dialects or basic wiki markup elements.
A consequence of this mixture of grammars in Wikipedia, Wikiversity, Wikivoyage, ... is that parser domains have to be detected. For example, an NPM MathJax parser could be applied to the LaTeX expressions wrapped in MATH tags. See Wiki Content in the GitHub wiki for wtf_wikipedia.
In the method kill_xml(), mathematical expressions are removed before parsing with wtf_wikipedia. This makes the parsing of wtf_wikipedia robust against content elements that are part of the LaTeX code but could be interpreted differently, namely as part of the basic wiki markup source.
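The idea behind this step can be illustrated with a minimal sketch. It is not the actual kill_xml() code of wtf_wikipedia; the helper name `stripMath` and the regular expression are assumptions made for illustration only. The sketch removes every MATH element so that LaTeX characters such as `{`, `}`, `|` or `=` can no longer be mistaken for wiki markup.

```javascript
// Hypothetical helper, not part of wtf_wikipedia: removes <math>...</math> spans.
// [\s\S] also matches line breaks; *? keeps each match to a single MATH element.
const MATH_TAG = /<math\b[^>]*>[\s\S]*?<\/math>/gi;

function stripMath(wiki) {
  return wiki.replace(MATH_TAG, '');
}

// The LaTeX below contains "{{" and "}}", which a wiki parser
// could otherwise read as template delimiters.
const einstein = "Energy: <math>E = {{m}} c^{2}</math> is [[Albert Einstein]]'s formula.";
console.log(stripMath(einstein));
```

Only the wiki link remains for the wiki parser in this example; the LaTeX code, including its double braces, is gone.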
A parser for REF tags is necessary to extract the literature citations in a wiki document. This REF parser is one main parser domain of wtf_wikipedia.
A parser for MATH tags is necessary to extract the mathematical expressions in a wiki document. This MATH parser does not necessarily have to cross-compile into another output format if the LaTeX code can be used in other output formats as well or be interpreted by a specific LaTeX plugin. This parser, currently (version 5.0) unimplemented, is another parser domain of wtf_wikipedia, even if the MATH tags are removed with kill_xml.js.
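As a rough sketch of how the REF and MATH domains could be separated from the wiki source, the following code collects the raw content of each tag type so that it can be handed to its own domain parser. The function `extractDomain` and the sample text are hypothetical and not part of the wtf_wikipedia API; a regular expression is used here for brevity, and it does not handle nested tags.

```javascript
// Hypothetical sketch, not the wtf_wikipedia API: returns the raw content of
// every <tag>...</tag> element found in the wiki source.
function extractDomain(wiki, tag) {
  // (?<!/)> rejects self-closing tags such as <ref name="x" />,
  // which carry no content of their own.
  const pattern = new RegExp(
    '<' + tag + '\\b[^>]*(?<!/)>([\\s\\S]*?)</' + tag + '>',
    'gi'
  );
  const found = [];
  for (const match of wiki.matchAll(pattern)) {
    found.push(match[1].trim());
  }
  return found;
}

// Made-up sample text; the citation is a placeholder, not a real reference.
const article =
  'Water is a molecule.<ref>Example Author, Example Title</ref> ' +
  'Its formula is <math>H_{2}O</math>.<ref name="example" />';

const domains = {
  ref: extractDomain(article, 'ref'),   // input for a REF parser
  math: extractDomain(article, 'math'), // input for a MATH parser
};
console.log(domains);
```

Keeping the extracted LaTeX as a plain string fits the point made above: the MATH domain does not need to be cross-compiled if the target format can consume LaTeX directly.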
== Water ==
The formula <math> H_{2}O </math> represents the water molecule
with 2 hydrogen atoms and one oxygen atom.
Removing mathematical expressions could make even non-mathematical text difficult to understand. See also ASCII-Math.
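One possible way to keep such formulas available, sketched here under assumptions, is to swap each MATH element for a placeholder token before wiki parsing and to restore the LaTeX afterwards. The functions `protectMath` and `restoreMath` and the `@@MATH0@@` token format are hypothetical and do not exist in wtf_wikipedia. The restored formula is wrapped in `\( ... \)`, the inline delimiter that MathJax recognizes by default.

```javascript
// Hypothetical sketch: hide LaTeX from the wiki parser, then put it back.
function protectMath(wiki) {
  const formulas = [];
  const text = wiki.replace(/<math\b[^>]*>([\s\S]*?)<\/math>/gi, (whole, latex) => {
    formulas.push(latex.trim());
    // The token format is an assumption; it only has to be unlikely in wiki text.
    return '@@MATH' + (formulas.length - 1) + '@@';
  });
  return { text, formulas };
}

function restoreMath(text, formulas) {
  // Replace each token with its formula wrapped in inline math delimiters.
  return text.replace(/@@MATH(\d+)@@/g, (whole, index) =>
    '\\(' + formulas[Number(index)] + '\\)'
  );
}

const water = 'The formula <math> H_{2}O </math> represents the water molecule.';
const shielded = protectMath(water);
console.log(shielded.text);                                   // what the wiki parser would see
console.log(restoreMath(shielded.text, shielded.formulas));   // text with the formula restored
```

With this approach the sentence about water keeps its formula, whereas plain removal would leave "The formula represents the water molecule" without saying which formula is meant.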
- Parsing Concepts are based on Parsoid - https://www.mediawiki.org/wiki/Parsoid
- Output: Based on concepts of Pandoc, the Swiss-army knife of document conversion developed by John MacFarlane - https://www.pandoc.org