Parser Domains Wiki, XML, HTML, LaTeX

Engelbert Niehaus edited this page Aug 24, 2018 · 1 revision

Wikipedia, and MediaWiki-driven wikis in general, allow a mixture of basic wiki markup elements (for lists, links, images, sections, ...), XML, HTML (which in a few cases violates XML coding requirements, e.g. unclosed HR and BR tags), and LaTeX code enclosed in MATH-tags. Furthermore, MediaWiki allows dual notation for the markup of some content elements, e.g. marking sections either with "=", "==", "===" or with H1, H2, H3-tags. LaTeX for mathematical expressions in particular follows a completely different grammar than XML dialects or basic wiki markup elements.

The consequence of this mixture of grammars in Wikipedia, Wikiversity, Wikivoyage, ... is that parser domains have to be detected, so that each domain can be handed to a suitable parser. For example, a MathJax parser from NPM could be applied to the LaTeX expressions wrapped in math-tags. See Wiki Content in the GitHub wiki for wtf_wikipedia.
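The detection of parser domains can be illustrated with a minimal sketch. The function `splitDomains` is a hypothetical helper and not part of wtf_wikipedia. It assumes that math-tags are not nested and only distinguishes two domains, "wiki" and "math".

```javascript
// Hypothetical helper, not part of wtf_wikipedia: splits wiki source into
// consecutive segments, each tagged with the parser domain ("wiki" or "math")
// that should handle it. Assumes math-tags are not nested.
function splitDomains(wikitext) {
  const segments = [];
  const pattern = /<math\b[^>]*>([\s\S]*?)<\/math>/gi;
  let lastIndex = 0;
  let match;
  while ((match = pattern.exec(wikitext)) !== null) {
    // Everything between the previous math block and this one is wiki markup.
    if (match.index > lastIndex) {
      segments.push({ domain: "wiki", source: wikitext.slice(lastIndex, match.index) });
    }
    // The content of the math-tags is LaTeX and belongs to the math domain.
    segments.push({ domain: "math", source: match[1].trim() });
    lastIndex = pattern.lastIndex;
  }
  // Remaining text after the last math block.
  if (lastIndex < wikitext.length) {
    segments.push({ domain: "wiki", source: wikitext.slice(lastIndex) });
  }
  return segments;
}
```

Each "math" segment could then be passed to a LaTeX-aware tool, while the "wiki" segments go to the wiki markup parser.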

In the method kill_xml(), mathematical expressions are removed before parsing with wtf_wikipedia. This makes the parsing in wtf_wikipedia robust against content elements that are part of the LaTeX code but could be interpreted differently as part of the basic wiki markup source.
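An alternative to removing the expressions would be to mask them during wiki parsing and restore them afterwards. The following is only a sketch under assumptions: `maskMath`, `unmaskMath` and the placeholder format `MATHDOMAIN<n>END` are hypothetical and do not describe what kill_xml() does.

```javascript
// Hypothetical helper, not part of wtf_wikipedia: replaces each
// <math>...</math> block with a placeholder token and keeps the LaTeX
// source in a side table, so the wiki parser never sees the LaTeX code.
function maskMath(wikitext) {
  const expressions = [];
  const text = wikitext.replace(/<math\b[^>]*>([\s\S]*?)<\/math>/gi, (match, latex) => {
    const token = "MATHDOMAIN" + expressions.length + "END";
    expressions.push(latex.trim());
    return token;
  });
  return { text, expressions };
}

// Puts the stored LaTeX source back, wrapped in math-tags again.
function unmaskMath(text, expressions) {
  return text.replace(/MATHDOMAIN(\d+)END/g, (match, index) => {
    return "<math>" + expressions[Number(index)] + "</math>";
  });
}
```

With this approach, LaTeX fragments such as `''` or `[[` inside a formula cannot be mistaken for wiki markup, and the expressions are still available after parsing.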

Parsing Domain REF-tags - References

A parser for REF-tags is necessary to extract the literature citations in a wiki document. This REF-parser is one main parser domain of wtf_wikipedia.
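What such a REF-parser has to recognize can be sketched as follows. The function `extractRefs` is a hypothetical helper and not the wtf_wikipedia implementation. It assumes the two common forms, `<ref ...>citation</ref>` and the self-closing `<ref name="..." />`, and ignores nested tags.

```javascript
// Hypothetical helper, not the wtf_wikipedia implementation: collects all
// REF-tags of a wiki source as { name, content } objects.
function extractRefs(wikitext) {
  const refs = [];
  // Matches <ref ... /> (reuse of a named reference) or <ref ...>...</ref>.
  const pattern = /<ref\b([^>]*?)(?:\/>|>([\s\S]*?)<\/ref>)/gi;
  let match;
  while ((match = pattern.exec(wikitext)) !== null) {
    // The name attribute may be quoted or unquoted.
    const nameMatch = /name\s*=\s*(?:"([^"]*)"|([^\s\/>]+))/i.exec(match[1]);
    let name = null;
    if (nameMatch) {
      name = nameMatch[1] !== undefined ? nameMatch[1] : nameMatch[2];
    }
    refs.push({
      name: name,
      // Self-closing tags carry no citation text of their own.
      content: match[2] !== undefined ? match[2].trim() : null
    });
  }
  return refs;
}
```

The `\b` after `ref` keeps the pattern from matching a `<references />` tag.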

Parsing Domain MATH-tags - Mathematical Expressions

A parser for MATH-tags is necessary to extract the mathematical expressions in a wiki document. This MATH-parser does not necessarily have to cross-compile into another output format if the LaTeX code can be used in other output formats as well or can be interpreted by a specific LaTeX plugin. This parser, currently (version 5.0) unimplemented, is another parser domain of wtf_wikipedia, even though the MATH-tags are removed with kill_xml.js.

 == Water ==
 The formula <math> H_{2}O </math> represents the water molecule 
 with 2 hydrogen atoms and one oxygen atom.  

Removing mathematical expressions could make even non-mathematical text difficult to understand. See also ASCII-Math.
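Keeping the expressions instead of removing them can be sketched for the Water example above. The function `convertMath` and its `format` values are hypothetical and not part of wtf_wikipedia. The sketch passes the LaTeX source through unchanged and only swaps the delimiters: `$...$` for LaTeX output and `\(...\)` for HTML output, which MathJax recognizes as inline math in its default configuration.

```javascript
// Hypothetical helper, not part of wtf_wikipedia: rewrites every
// <math>...</math> block for a target format instead of deleting it.
// The LaTeX source itself is not cross-compiled, only re-delimited.
function convertMath(wikitext, format) {
  return wikitext.replace(/<math\b[^>]*>([\s\S]*?)<\/math>/gi, (match, latex) => {
    const source = latex.trim();
    switch (format) {
      case "latex":
        // Inline math in a LaTeX document.
        return "$" + source + "$";
      case "html":
        // Inline math delimiters for a LaTeX plugin such as MathJax.
        return "\\(" + source + "\\)";
      default:
        // Unknown format: leave the math-tags untouched.
        return match;
    }
  });
}

const water = "The formula <math> H_{2}O </math> represents the water molecule";
console.log(convertMath(water, "latex"));
console.log(convertMath(water, "html"));
```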