Math
Mathematical expressions appear in much scientific and educational content, not only in core mathematical disciplines such as Calculus, Algebra, Geometry and Statistics. They are used whenever content can be described precisely as a finite combination of symbols that is well-formed according to rules that depend on the context in which the expression is used. A precise description of a methodology may include the mathematical expressions by which the results are determined. Removing the mathematical expressions from MediaWiki content may therefore leave an incomprehensible text fragment.
This section describes the basic principles of handling mathematical expressions. The export functions are defined in the library math.js in the subdirectory /src/02-section/start-to-end/.
Important remark: The export functions are already defined, but the export of the parsed syntax tree of the document does not call them yet. A regular expression must distinguish inline math from block math.
The following code finds all math tags:
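// sample wiki source with one INLINE and one BLOCK expression
const scripttext = "before text <math> f(x) = x^2 </math> middle text \n:<math> g(x) = sin(x)+cos(x) </math> \n after text";
// all math tags: [^>]* also accepts attributes such as display="block",
// and the non-greedy (.*?) stops at the first closing tag on the same line
const re_all = /<math[^>]*>(.*?)<\/math>/gi;
// block RE: newline "\n" followed by one or more indent symbols ":".
// ":" placed directly after a newline shifts the mathematical expression in the block to the right
const re_block = /\n:+<math[^>]*>(.*?)<\/math>/gi;
let match;
while ((match = re_all.exec(scripttext)) !== null) {
  // the full match is in match[0]; the captured group (the LaTeX math expression) is in match[1]
  console.log(match[1]);
}
while ((match = re_block.exec(scripttext)) !== null) {
  console.log('BLOCK:', match[1]);
}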
Output rendering is now defined with plugins. This is a major step towards decomposing the parsing and processing tasks. The location src/output is no longer valid and has been replaced by the plugin concept.
In the German Wikiversity article about mathematical norms and topology you will recognize
- (INLINE) inline mathematical expressions in the text (e.g. <math display="inline">\sum_{n=1}^\infty 3^{-n}</math>) and
- (BLOCK) separate mathematical expressions on a single line (e.g. <math display="block">f(x):=\sum_{n=0}^\infty c_n \cdot (x-x_0)^{-n}</math>).
These two uses of mathematical expressions can be distinguished by a leading colon ":" in the first column of the Wiki Markdown article or, preferably, if the author uses it in Wikipedia/Wikiversity, by the display attribute set to inline or block. In the following example, f(x) is a mathematical expression in LaTeX syntax (see the Wikipedia help page on displaying a formula).
This expression <math> f(x) </math> is a mathematical INLINE expression.
The next line is a BLOCK expression in a separate line.
:<math> f(x) </math>
This is the text below the BLOCK expression.
The following functions are defined in the library /src/output/latex/math.js. Both export the mathematical expression passed in the parameter pMath in LaTeX syntax.
// handle inline mathematical expression
const doMathInline = (pMath, options) => {
  // pMath is internal LaTeX code for the mathematical expression e.g. "f(x)"
  // pMath does not contain the wrapped <math>-tags from the MediaWiki source
  let out = '$' + pMath + '$';
  return out;
};
// handle mathematical expression displayed in a separate line
const doMathBlock = (pMath, options) => {
  // '\\[' is required in a JavaScript string literal: '\[' would yield just '['
  let out = '\\[' + pMath + '\\]';
  return out + ' ';
};
Every export format has a subdirectory in /src/output/, and every subdirectory contains a math.js library with two main functions:
- doMathInline(pMath, options) to handle inline mathematical expressions, and
- doMathBlock(pMath, options) to handle mathematical expressions on separate lines as a block.
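The per-format layout can be pictured as a lookup table keyed by export format. This is a minimal sketch under assumptions, not the repository code: the object mathExporters and the dispatcher renderMath are hypothetical names, and the delimiters follow the conventions described on this page (LaTeX: $ and \[ \], HTML with MathJax: \( \) and \[ \], Markdown with KaTeX: $ and $$).

```javascript
// Hypothetical registry: one pair of math handlers per export format,
// mirroring the math.js libraries in the format subdirectories.
const mathExporters = {
  latex: {
    doMathInline: (pMath, options) => '$' + pMath + '$',
    doMathBlock: (pMath, options) => '\\[' + pMath + '\\]',
  },
  html: {
    // MathJax delimiters
    doMathInline: (pMath, options) => '\\(' + pMath + '\\)',
    doMathBlock: (pMath, options) => '\\[' + pMath + '\\]',
  },
  markdown: {
    // KaTeX delimiters
    doMathInline: (pMath, options) => '$' + pMath + '$',
    doMathBlock: (pMath, options) => '$$' + pMath + '$$',
  },
};

// Hypothetical dispatcher: pick the handler for a format and a display mode.
function renderMath(format, display, pMath, options = {}) {
  const exporter = mathExporters[format];
  if (!exporter) {
    throw new Error('No math exporter for format: ' + format);
  }
  return display === 'block'
    ? exporter.doMathBlock(pMath, options)
    : exporter.doMathInline(pMath, options);
}

console.log(renderMath('html', 'inline', ' f(x) ')); // \( f(x) \)
console.log(renderMath('markdown', 'block', ' f(x) ')); // $$ f(x) $$
```

Keeping the delimiters in one table means adding a new export format only requires one more entry with its two handlers.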
If we decompose all tasks of wtf_wikipedia into the 3 main tasks
- wtf_fetch to download the wiki markdown from the MediaWiki API,
- wtf_parse to create an Abstract Syntax Tree (AST) from the wiki markdown, and
- wtf_output to generate different output formats from the AST (similar to PanDoc),
then the handling of math expressions in wtf_wikipedia addresses
- wtf_parse, with the detection of whether math expressions are INLINE or BLOCK, and
- wtf_output, with the rendering of the LaTeX code of the math expression into a specific output format.
Regular expressions can be used to determine which display format the authors of the Wiki Markdown article intended for a mathematical expression.
- (BLOCK) First determine the DisplayMath with a regular expression consisting of
  - a newline, a colon and the opening math tag of the block,
  - the mathematical expression in LaTeX syntax itself, and
  - the closing math tag.
- (INLINE) After the BLOCK expressions have been replaced, the remaining mathematical expressions wrapped in math tags can be treated as INLINE math.
The export functions are defined in /src/output/ under the respective export formats, e.g.
- /src/output/latex for LaTeX,
- /src/output/html for HTML,
- /src/output/markdown for Markdown (with KaTeX for rendering the mathematical expressions).
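The two-pass detection described above (BLOCK first, then INLINE) can be sketched as follows. The function name classifyMath is hypothetical, and this sketch only applies the colon rule; it does not inspect the display attribute.

```javascript
// BLOCK: start of text or newline, one or more ":" and an opening math tag.
const RE_BLOCK = /(^|\n):+[ \t]*<math[^>]*>([\s\S]*?)<\/math>/g;
// INLINE: any math tag that is left after the BLOCK pass.
const RE_INLINE = /<math[^>]*>([\s\S]*?)<\/math>/g;

// Hypothetical helper: returns the LaTeX code of all BLOCK and INLINE expressions.
function classifyMath(wikiText) {
  const block = [];
  const inline = [];
  // Pass 1: collect BLOCK expressions and cut them out of the text,
  // keeping the newline so the line structure is preserved.
  const rest = wikiText.replace(RE_BLOCK, (whole, lineStart, latex) => {
    block.push(latex.trim());
    return lineStart;
  });
  // Pass 2: every remaining math tag is treated as INLINE.
  for (const m of rest.matchAll(RE_INLINE)) {
    inline.push(m[1].trim());
  }
  return { block, inline };
}

const src = 'This expression <math> f(x) </math> is INLINE.\n:<math> g(x) </math>\nText below.';
console.log(classifyMath(src)); // { block: [ 'g(x)' ], inline: [ 'f(x)' ] }
```

Running the BLOCK pass first matters: the INLINE pattern would also match block tags, so they have to be removed before the second pass.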
In line with the export design based on helper functions, the mathematical expressions can be processed
- with a helper function (similar to sentences, ...; see the folder /src/output in the wtf_wikipedia repository) (NOT IMPLEMENTED), or
- as a workaround, by preprocessing the Wiki Markdown sources before performing the export, so that the mathematical expressions are not removed.
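The preprocessing workaround can be sketched as a pair of functions that swap each math tag for a plain-text placeholder before parsing and put the formulas back after the export. The names protectMath, restoreMath and the placeholder format MATHPLACEHOLDER<n>X are hypothetical; the sketch assumes the parser and exporter leave such plain words untouched.

```javascript
const RE_MATH = /<math[^>]*>([\s\S]*?)<\/math>/g;

// Hypothetical: replace every math tag by a placeholder and remember the LaTeX code.
function protectMath(wikiText) {
  const formulas = [];
  const text = wikiText.replace(RE_MATH, (whole, latex) => {
    formulas.push(latex);
    return 'MATHPLACEHOLDER' + (formulas.length - 1) + 'X';
  });
  return { text, formulas };
}

// Hypothetical: put the formulas back; render(latex, index) chooses the output syntax.
function restoreMath(text, formulas, render) {
  return text.replace(/MATHPLACEHOLDER(\d+)X/g, (whole, i) =>
    render(formulas[Number(i)], Number(i)));
}

const { text, formulas } = protectMath('Let <math>x^2</math> be given.');
// ... text would be parsed and exported here ...
console.log(restoreMath(text, formulas, (tex) => '$' + tex + '$')); // Let $x^2$ be given.
```

The trailing X closes each placeholder so that it stays a single word even when it is directly followed by letters or digits.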
The easiest export format is LaTeX, because the mathematical expressions in the Wiki Markdown article are already written in LaTeX syntax. Therefore no cross-compilation of the LaTeX syntax is necessary.
- (INLINE) inline mathematical expressions are wrapped with a single dollar symbol on each side ($ ... $), which replaces the opening and closing math tags.
- (BLOCK) separate mathematical expressions are wrapped with a backslash followed by an opening or closing square bracket (\[ ... \]).
The following LaTeX code shows the converted Wiki Markdown text in LaTeX syntax:
This expression $ f(x) $ is a mathematical INLINE expression.
The next line is a BLOCK expression in a separate line.
\[ f(x) \]
This is the text below the BLOCK expression.
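The LaTeX conversion can be sketched with two regular-expression replacements, BLOCK first and INLINE afterwards. The function name wikiMathToLatex is hypothetical and operates directly on the wiki source rather than on the parsed syntax tree.

```javascript
// Hypothetical converter from wiki math tags to LaTeX delimiters.
function wikiMathToLatex(wikiText) {
  return wikiText
    // BLOCK: keep the newline, drop the ":" indentation, wrap with \[ ... \]
    .replace(/(^|\n):+[ \t]*<math[^>]*>([\s\S]*?)<\/math>/g,
      (whole, lineStart, latex) => lineStart + '\\[' + latex + '\\]')
    // INLINE: the remaining tags are wrapped with single dollar signs
    .replace(/<math[^>]*>([\s\S]*?)<\/math>/g,
      (whole, latex) => '$' + latex + '$');
}

const wiki = 'This expression <math> f(x) </math> is a mathematical INLINE expression.\n'
  + 'The next line is a BLOCK expression in a separate line.\n'
  + ':<math> f(x) </math>\n'
  + 'This is the text below the BLOCK expression.';
console.log(wikiMathToLatex(wiki));
```

Applied to the wiki example from the beginning of this page, it prints the four LaTeX lines listed above.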
The easiest way to export a MediaWiki markdown article into HTML with mathematical expressions is MathJax, because MathJax can render mathematical expressions written in LaTeX syntax, which is how they are written in the Wiki Markdown article. Therefore no cross-compilation of the LaTeX syntax is necessary. To allow mathematical expression rendering with MathJax
- insert the MathJax library as a script tag in the output HTML file (for online use, link a remote CDN; for offline use, download a MathJax ZIP copy and unzip it on your computer), and
- replace the opening and closing tags for INLINE and BLOCK with the following symbols:
  - (INLINE) inline mathematical expressions are wrapped with a backslash followed by an opening or closing parenthesis (\( ... \)).
  - (BLOCK) line-separated mathematical expressions are wrapped with a backslash followed by an opening or closing square bracket (\[ ... \]).
The following code shows the converted Wiki Markdown text in HTML syntax, including the HTML header.
<!DOCTYPE html>
<html>
<head>
<title>MathJax TeX Test Page</title>
<script type="text/javascript" async
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.4/latest.js?config=TeX-MML-AM_CHTML">
</script>
</head>
<body>
This expression \( f(x) \) is a mathematical INLINE expression.
The next line is a BLOCK expression in a separate line.
\[ f(x) \]
This is the text below the BLOCK expression.
</body>
</html>
With a local MathJax installation on your hard drive or server, replace the MathJax CDN link with the appropriate path. For example, if MathJax is stored in a subfolder mathjax of the folder containing the HTML file, the script tag source will be:
<script type="text/javascript" async
src="mathjax/latest.js?config=TeX-MML-AM_CHTML">
</script>
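The HTML export can be sketched as a delimiter conversion plus a page template whose script source is either the CDN URL or a local path. The names wikiMathToHtml, mathJaxPage and escapeHtml are hypothetical. Only the LaTeX code and the title are HTML-escaped here; escaping of the surrounding text is left out of this sketch.

```javascript
const MATHJAX_CDN = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.4/latest.js';

// Escape characters that would otherwise be read as HTML markup;
// the browser decodes them again before MathJax reads the formula.
const escapeHtml = (s) =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

// Hypothetical converter from wiki math tags to MathJax delimiters.
function wikiMathToHtml(wikiText) {
  return wikiText
    // BLOCK: newline + ":" indentation + math tag -> \[ ... \]
    .replace(/(^|\n):+[ \t]*<math[^>]*>([\s\S]*?)<\/math>/g,
      (whole, lineStart, latex) => lineStart + '\\[' + escapeHtml(latex) + '\\]')
    // INLINE: remaining math tags -> \( ... \)
    .replace(/<math[^>]*>([\s\S]*?)<\/math>/g,
      (whole, latex) => '\\(' + escapeHtml(latex) + '\\)');
}

// mathjaxSrc: CDN URL by default, or a local path such as 'mathjax/latest.js'
function mathJaxPage(title, bodyHtml, mathjaxSrc = MATHJAX_CDN) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    '<head>',
    '<title>' + escapeHtml(title) + '</title>',
    '<script type="text/javascript" async',
    'src="' + mathjaxSrc + '?config=TeX-MML-AM_CHTML">',
    '</script>',
    '</head>',
    '<body>',
    bodyHtml,
    '</body>',
    '</html>',
  ].join('\n');
}

const wiki = 'This expression <math> f(x) </math> is a mathematical INLINE expression.\n'
  + ':<math> f(x) </math>';
console.log(mathJaxPage('MathJax TeX Test Page', wikiMathToHtml(wiki), 'mathjax/latest.js'));
```

Passing the script source as a parameter lets the same exporter produce pages for online (CDN) and offline (local MathJax copy) use.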
The easiest way to export a MediaWiki article into Markdown with mathematical expressions is the KaTeX library, because KaTeX can render mathematical expressions in LaTeX syntax. The mathematical expressions in the wiki article are just wrapped with dollar symbols, and KaTeX renders them nicely in your Markdown output. Therefore no cross-compilation of the LaTeX syntax is necessary if you use KaTeX. Replace the opening and closing tags for INLINE and BLOCK with the following symbols:
- (INLINE) inline mathematical expressions are wrapped with ONE dollar symbol on each side.
- (BLOCK) line-separated mathematical block expressions are wrapped with TWO dollar symbols on each side.
The following Markdown code shows the converted Wiki Markdown text:
This expression $ f(x) $ is a mathematical INLINE expression.
The next line is a BLOCK expression in a separate line.
$$ f(x) $$
This is the text below the BLOCK expression.
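A sketch of the Markdown conversion for KaTeX follows; the function name wikiMathToMarkdown is hypothetical. It deliberately uses replacer functions: in a replacement string, "$$" is a special pattern of String.prototype.replace that inserts a single "$", so a string such as '$$$1$$' would produce single dollar signs instead of the intended block delimiters.

```javascript
// Hypothetical converter from wiki math tags to KaTeX-style Markdown.
function wikiMathToMarkdown(wikiText) {
  return wikiText
    // BLOCK: newline + ":" indentation + math tag -> $$ ... $$
    .replace(/(^|\n):+[ \t]*<math[^>]*>([\s\S]*?)<\/math>/g,
      (whole, lineStart, latex) => lineStart + '$$' + latex + '$$')
    // INLINE: remaining math tags -> $ ... $
    .replace(/<math[^>]*>([\s\S]*?)<\/math>/g,
      (whole, latex) => '$' + latex + '$');
}

// Pitfall demo: with a replacement *string*, "$$" collapses to one "$".
console.log('<m>'.replace(/<m>/, '$$')); // $
```

Values returned from a replacer function are inserted literally, which is why the function form keeps "$$" intact.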
- In version 5.0 the method kill_xml() in /src/document/preProcess/kill_xml.js removes mathematical expressions wrapped in math tags.
- kill_xml() is called during preprocessing, so any handler for mathematical expressions has nothing left to process.
- Version 7.* parses mathematical expressions before kill_xml() is called, and the parsed mathematical expressions are stored in the JSON object Document generated after parsing the wiki source.
The first step of handling math expressions is to extract the mathematical expressions in LaTeX syntax from the wiki source text.
- See the concept of a Tokenizer for how the location in the source text where the wtf_wikipedia parser found a mathematical expression can be remembered.
- Convert a mathematical expression into a tree node in the Abstract Syntax Tree (AST).
- An export to a specific output format has to decide, based on options, whether the mathematical expressions are kept in or removed from the output (e.g. plain text).
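The tokenizer idea can be sketched as a scan that turns every math tag into an AST-like node with its source offsets, before any preprocessing such as kill_xml() removes the tags. The function tokenizeMath and the node shape { type, display, latex, start, end } are hypothetical; here a node counts as BLOCK if its line starts with ":" indentation or if the author set display="block".

```javascript
const RE_MATH_TAG = /<math([^>]*)>([\s\S]*?)<\/math>/g;

// Hypothetical tokenizer: one node per math tag, with offsets into the source.
function tokenizeMath(wikiText) {
  const nodes = [];
  for (const m of wikiText.matchAll(RE_MATH_TAG)) {
    const start = m.index;
    // text between the beginning of the current line and the tag
    const lineStart = wikiText.lastIndexOf('\n', start - 1) + 1;
    const prefix = wikiText.slice(lineStart, start);
    const colonIndent = /^:+[ \t]*$/.test(prefix);
    const displayBlock = /display\s*=\s*["']?block\b/.test(m[1]);
    nodes.push({
      type: 'math',
      display: colonIndent || displayBlock ? 'block' : 'inline',
      latex: m[2].trim(),
      start,                    // offset of "<math" in the source
      end: start + m[0].length, // offset just after "</math>"
    });
  }
  return nodes;
}

const src = 'Norm <math display="inline">\\|x\\|</math> and\n:<math> f(x) </math> end';
console.log(tokenizeMath(src).map((n) => n.display + ':' + n.latex));
```

Because each node keeps start and end, an exporter can later decide per option whether to render the formula at its original position or drop it (e.g. for plain text).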
- Parsing concepts are based on Parsoid - https://www.mediawiki.org/wiki/Parsoid
- Output: based on concepts of PanDoc, the swiss-army knife of document conversion developed by John MacFarlane - https://www.pandoc.org