Citations
The citations and references embedded in the Wiki markdown are parsed by wtf_wikipedia. The aggregated references can be accessed and exported from the generated document doc with doc.json() or doc.toJSON().
wtf_wikipedia does a great job of extracting citations and references from a given wiki markdown source.
The following description is not implemented in wtf_wikipedia 5.0 and serves as a basis for the software design process. Please feel free to adapt the content of this GitHub wiki and improve the description prior to implementation of the solution. In the sense of Agile Software Development, some code is inserted here to check the basic concept, but the code should be regarded as more or less pseudocode for understanding the proposal. The code is not meant to be a copy-and-paste resource for the implementation.
The parsed result is a JSON document doc that contains all parsed citations from the Wiki Markdown article in doc.citations, which is an array of all citations collected during the wtf.parse(...) call.
This is a text about Swarm
Intelligence<ref>Swarm Behaviour, SomeAuthor,
(2009), SomeEditor, Publisher</ref> and other content.
Parsing the document extracts the content inside the ref-tags and pushes the content into the doc.citations array. The desired output should be in plain text.
This is a text about Swarm Intelligence [1] and other content.
or, depending on the citation style:
This is a text about Swarm Intelligence [SomeAuthor2009] and other content.
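The extraction step described above could be sketched as follows. This is a minimal sketch and not the wtf_wikipedia implementation: the function name extractCitations and the shape of the returned object are assumptions made for illustration, and only plain ref-tags without attributes are handled.

```javascript
// Hypothetical helper, not part of wtf_wikipedia: collects the content of
// <ref>...</ref> tags and leaves a numeric marker such as [1] in the text.
function extractCitations(wiki) {
  const citations = [];
  // [\s\S] also matches line breaks inside a reference
  const text = wiki.replace(/ ?<ref>([\s\S]*?)<\/ref>/g, function (match, content) {
    // normalise whitespace so that multi-line references become one line
    citations.push(content.replace(/\s+/g, " ").trim());
    return " [" + citations.length + "]";
  });
  return { text: text, citations: citations };
}

const source = "This is a text about Swarm\n" +
  "Intelligence<ref>Swarm Behaviour, SomeAuthor,\n" +
  "(2009), SomeEditor, Publisher</ref> and other content.";
const result = extractCitations(source);
console.log(result.text);
console.log(result.citations);
```

The numeric marker is only one option; an author-year label such as [SomeAuthor2009] would require the reference content to be parsed into fields first.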
References are added to a wiki document in a different way than you might expect from citation management software like Zotero. Classically, a bibliographic citation is a reference to a database record that defines a book, article, web page, or other published item by attributes that allow other scientists to validate statements and make scientific results reproducible. Authors in Wikipedia, Wikiversity, ... do not create a link to a database entry of a book or article; they add the complete reference at the location in the wiki document where the citation is needed. MediaWiki aggregates these references and lists the bibliography at the end of the document (by default) or at a location where a marker for injecting the bibliography is placed.
- wtf_wikipedia extracts the references into JSON, which allows citation management within a database.
- As a consequence, it is necessary to place a unique marker at each position where a ref-tag defined a reference to a book or article.
In the sense of an Abstract Syntax Tree (AST), a citation marker is a tree node Cite that contains a unique identifier for a reference in the JSON database of citations. The unique identifier could be
- the array index (not recommended if citations are sorted alphabetically); it is better to store a counter value that is incremented for every parsed reference, as in cite.id = cite_counter().
- The ref-tags are removed by the parser, so a Cite node must be inserted in the AST at the sentence level of parsing (see the ContentList proposal).
- As an interim solution, the citation could be replaced by a marker (e.g. ___CITE_1___ or ___CITE_SomeAuthor2009___) that does not create conflicts with the parsing processes that follow.
- Another alternative is to use the standard citation markers, e.g. in a Wikipedia source. If a citation has already been inserted in a wiki article with a ref-tag, wiki authors may need to reference the same book again in that article. To avoid listing one book or article multiple times in the reference list at the end of the document, a named ref-tag (ref-name) is used in the wiki source.
The citation marker injection was not implemented in wtf_wikipedia release 5.0; its software design and parsing issues can be defined in this GitHub wiki prior to implementation. A solid software design could reduce the implementation workload, especially for Spencer Kelly, who implemented most of it.
The result of the citation marker injection of the example mentioned above would look like this:
This is a text about Swarm Intelligence ___CITE_1___ and other content.
or
This is a text about Swarm Intelligence ___CITE_SomeAuthor2009___ and other content.
A citation marker would look like this: ___CITE_label___ with a label consisting of A-Za-z0-9\- , so that the citation markers can be replaced later in the output by a reference to a book or article in the appropriate citation style (see PanDoc Citation Management).
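The later replacement of the markers could look like the following sketch. The function name replaceCiteMarkers and the render callback are assumptions for illustration; they are not part of wtf_wikipedia.

```javascript
// Hypothetical helper: replaces ___CITE_label___ markers with the output of
// a caller-supplied render function. Labels consist of A-Za-z0-9 and "-".
function replaceCiteMarkers(text, render) {
  return text.replace(/___CITE_([A-Za-z0-9-]+)___/g, function (match, label) {
    return render(label);
  });
}

const marked = "This is a text about Swarm Intelligence " +
  "___CITE_SomeAuthor2009___ and other content.";
// render the label in a simple bracket style
const plain = replaceCiteMarkers(marked, function (label) {
  return "[" + label + "]";
});
console.log(plain);
```

Passing the rendering as a callback keeps the marker detection independent of the citation style, so the same function could serve several output formats.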
If we use the standard ref-name citation marker in the wiki source, it will look like this:
This is a text about Swarm
Intelligence <ref name="SomeAuthor2009"/> and other content.
Remark: This type of citation marker is currently removed by the function kill_xml() in the file /src/document/preProcess/kill_xml.js.
Remark: The marker injection is currently necessary because the parsing of references needs to preserve the location in the text where the literature was cited, and the currently designed parsers take strings as input. Non-conflicting markers seem to be a workaround for this until the parsing of the Abstract Syntax Tree allows tree node generation at the time of citation and reference detection. There might be a better interim solution. Please discuss and propose alternatives in this wiki prior to implementation, to minimize the implementation workload and dead ends of development.
The parsing of references/citations is called in /src/section/index.js in the doSection() method:
const doSection = function(section, wiki, options) {
...
//parse the <ref></ref> tags
wiki = parse.references(section, wiki, options);
...
The current version (5.0) removes citation markers in the kill_xml() method (see /src/document/preProcess/kill_xml.js). If you want to play around with citation marker replacement, you must comment out the citation marker removal that eliminates citations of the form
<ref name="SomeAuthor2009"></ref> or <ref name="SomeAuthor2009"/>
See /src/document/preProcess/kill_xml.js
l.10:
//only kill ref tags if they are selfclosing
wiki = wiki.replace(/ ?< ?(ref) [a-zA-Z0-9=" ]{2,100}\/ ?> ?/g, ' ');
// removes tags like <ref name="asd"/> but not <ref name='asd'/> in 5.0 - not used/allowed??
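Instead of removing the self-closing ref-tags, they could be turned into citation markers. The following is a minimal sketch under that assumption: the function name namedRefsToMarkers is hypothetical, the regular expression is not the one used in kill_xml.js, and only names consisting of A-Za-z0-9 and "-" are matched. Unlike the expression above, it accepts both double and single quotes.

```javascript
// Hypothetical alternative to removing self-closing ref tags: turn
// <ref name="label"/> or <ref name='label'/> into ___CITE_label___.
function namedRefsToMarkers(wiki) {
  // (["']) captures the opening quote, \1 requires the same closing quote
  const selfClosingRef = /<\s*ref\s+name\s*=\s*(["'])([A-Za-z0-9-]+)\1\s*\/\s*>/g;
  return wiki.replace(selfClosingRef, function (match, quote, label) {
    return "___CITE_" + label + "___";
  });
}

const doubleQuoted = namedRefsToMarkers(
  'Intelligence <ref name="SomeAuthor2009"/> and other content.');
const singleQuoted = namedRefsToMarkers(
  "Intelligence <ref name='SomeAuthor2009'/> and other content.");
console.log(doubleQuoted);
console.log(singleQuoted);
```

Ref names that contain other characters (for example spaces) are left untouched by this sketch and would need a mapping to a valid label.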
If you want to analyse how wtf_wikipedia parses the citations, you should look at /src/section/references.js. The parsing process is part of the section parsing method for the section body, doSection(section, wiki, options) in /src/section/index.js, via the call of
wiki = parse.references(section, wiki, options);
The method for parsing the references is defined in /src/section/references.js (see also the basic structure of the parsing method in Parsing Wiki Source).
- Use the template mechanism of MediaWiki to render output in a specific format. wtf_wikipedia is able to resolve a template.
- (Alternative) https://citation.js.org/demo/ shows how to convert citations with a specific style into an output format.
- (Alternative) HandleBarsJS as a template engine might be helpful to convert JSON data about a citation into a specific output format.
How citations are handled depends on the output format and the preferences of the user of wtf_wikipedia:
- export all citations in BibTeX format (use FileSaver.js by Eli Grey to offer the generated BibTeX files as a download in the browser).
- generate a bibliography and inject this bibliography into the output text at the marker {{Reflist|2}}.
\bibliography{mybib}{}
\bibliographystyle{apalike}
The replacement inserts all cited literature in the bibliography.
- create a citation helper function that is called whenever a citation is found. It determines what to inject at the location where the citation is found in the Wiki markdown text.
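Such a citation helper could be sketched as follows. The function name citationHelper, the format names "latex" and "text", and the fallback name "Anonymous" are assumptions for illustration; the record is expected in the converted format with id, year and an author array.

```javascript
// Hypothetical citation helper: decides what to inject at the position of a
// citation, depending on the output format. Format names are assumptions.
function citationHelper(citation, format) {
  const first = citation.author && citation.author[0];
  // fall back to a placeholder if the record has no author
  const name = first ? first.family : "Anonymous";
  switch (format) {
    case "latex":
      // LaTeX resolves the key against the bibliography database
      return "\\cite{" + citation.id + "}";
    default:
      // plain text in an author-year style
      return "(" + name + ", " + citation.year + ")";
  }
}

const book = {
  id: "C1D20180327T1503",
  year: 1999,
  author: [{ given: "Eric", family: "Bonabeau" }]
};
console.log(citationHelper(book, "latex"));
console.log(citationHelper(book, "text"));
```

Further output formats could be added as additional cases without changing the callers.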
Conclusion: A solution for citation management could be a citation.js for each output format in /src/output (e.g. /src/output/latex/citation.js). This library
- (RefList) processes the reference list at the marker {{Reflist|2}} or at the end of the document (see https://www.mediawiki.org/wiki/Template:Reflist ) and
- (Ref-Tag) replaces all citations with an appropriate marker that handles the citation in the corresponding output format, e.g. (OER, 2013) (see also https://www.mediawiki.org/wiki/Extension:Cite ).
<ref>{{cite web|title=What is OER?|url=http://wiki.creativecommons.org/What_is_OER|work=wiki.creativecommons.org|publisher=Creative Commons|accessdate=18 April 2013}}</ref>
Replace the citation in Wiki Markdown with a cite-command in LaTeX that uses the id
of the citation record.
citations : [
{
"id": "C1D20180327T1503",
"type": "book",
"title": "Swarm Intelligence: From Natural to Artificial Systems",
"author":[
{
"given": "Eric",
"family": "Bonabeau",
},
{
"given": "Marco",
"family": "Dorigo",
},
{
"given": "Guy",
"family": "Theraulaz",
}
],
"year": 1999,
"isbn": "0-19-513159-2"
},
....
]
The citation mechanism of BibTeX will work if the citations in the JSON array are part of the BibTeX database of your LaTeX environment. So alteration and/or export of the citations collected by wtf_wikipedia is necessary.
\cite{C1D20180327T1503}
The cite command will be replaced by LaTeX according to your selected citation style (e.g. APA with (Bonabeau, 1999)
).
The citations in the JSON parsed by wtf_wikipedia.js need some post-processing. The current JSON format of the citation array is a result of how citations are stored in the Wiki markdown language:
citations : [
{
"cite": "book",
"title": "Swarm Intelligence: From Natural to Artificial Systems",
"first1": "Eric",
"last1": "Bonabeau",
"first2": "Marco",
"last2": "Dorigo",
"first3": "Guy",
"last3": "Theraulaz",
"year": 1999,
"isbn": "0-19-513159-2"
},
{
"cite": "journal",
"last1": "Bertin",
"first1": "E.",
"last2": "Droz",
"first2": "M.",
"last3": "Grégoire",
"first3": "G.",
"year": 2009,
"arxiv": 907.4688,
"title": "Hydrodynamic equations for self-propelled particles: microscopic derivation and stability analysis",
"journal": "[[J. Phys. A]]",
"volume": 42,
"issue": 44,
"page": 445001,
"doi": "10.1088/1751-8113/42/44/445001",
"bibcode": "2009JPhA...42R5001B"
}
]
must be converted into the following format:
citations : [
{
"id": "C1D20180327T1503",
"type": "book",
"title": "Swarm Intelligence: From Natural to Artificial Systems",
"author":[
{
"given": "Eric",
"family": "Bonabeau",
},
{
"given": "Marco",
"family": "Dorigo",
},
{
"given": "Guy",
"family": "Theraulaz",
}
],
"year": 1999,
"isbn": "0-19-513159-2"
},
{
"id": "C2D20180327T1503",
"type": "journal",
"author":[
{
"family": "Bertin",
"given": "E.",
},
{
"family": "Droz",
"given": "M.",
},
{
"family": "Grégoire",
"given": "G.",
}
],
"year": 2009,
"arxiv": 907.4688,
"title": "Hydrodynamic equations for self-propelled particles: microscopic derivation and stability analysis",
"journal": "[[J. Phys. A]]",
"volume": 42,
"issue": 44,
"page": 445001,
"doi": "10.1088/1751-8113/42/44/445001",
"bibcode": "2009JPhA...42R5001B"
}
]
After this conversion is done, the citations can be cross-compiled in the output format with a template or added to a BibTeX-file that is used for creating a LaTeX document.
- Create an attribute author (an array of authors) in all bibliographic records of the array citations,
var c = data.citations;
for (var i = 0; i < c.length; i++) {
  var b = c[i];
  // add an author array to the bibitem record b=c[i]
  b["author"] = [];
  // add a unique ID to the bibitem record b=c[i]
  if (!(b.hasOwnProperty("id"))) {
    // if bibitem has no id-key add a unique id
    const now = new Date();
    b["id"] = "T" + now.getTime() + "R" + i;
    // e.g. T1508330494000R2
  }
  var family = "";
  var given = "";
  var delkeys = [];
  var key = "";
  // an author number cannot exceed the number of keys in the record
  var maxcount = Object.keys(b).length;
  for (var count = 1; count <= maxcount; count++) {
    key = "first" + count;
    if (b.hasOwnProperty(key)) {
      // store given name
      given = b[key];
      // store the key for delete
      delkeys.push(key);
    } else {
      given = "";
    }
    key = "last" + count;
    if (b.hasOwnProperty(key)) {
      // store family name
      family = b[key];
      // store the key for delete
      delkeys.push(key);
      // add author to author array with family and given name
      b["author"].push({"family": family, "given": given});
    }
  }
  // clean up key/value pairs
  // remove first1, last1, ... as key/value pairs from bibitem record b=c[i]
  for (var j = 0; j < delkeys.length; j++) {
    // delete keys first1, last1, first2, last2, ... if they exist
    delete b[delkeys[j]];
  }
}
- HandleBarsJS can be used to generate the citation in a specific format. E.g. the content of data.citations[i]["title"] will replace the key marker {{title}} in an HTML template. The wrapping HTML tags will render the title in italics.
... <i>{{title}}</i>, ({{year}}), {{journal}} ...
In the Wiki Markdown, the references are rendered either at the very end of the Wiki markdown text or at the reference marker {{Reflist|2}}, as a two-column reference list of all citations found in the Wiki markdown article. The citations in the parsed JSON of wtf_wikipedia can be converted, e.g. with a HandleBarsJS template, into an appropriate output format. A citation reference (Bertin2009) will be inserted that links to an HTML page anchor in the reference list.
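The link between a citation reference and its anchor in the reference list could be generated as in the following sketch. The helper names, the label scheme (family name of the first author plus year) and the "ref-" anchor prefix are assumptions for illustration, and no HTML escaping is performed.

```javascript
// Hypothetical helpers; the label scheme (family name + year) and the
// "ref-" anchor prefix are assumptions for illustration.
function citeLabel(citation) {
  return citation.author[0].family + citation.year;
}

// inline citation that links to the anchor in the reference list
function htmlCitation(citation) {
  const label = citeLabel(citation);
  return '<a href="#ref-' + label + '">(' + label + ")</a>";
}

// reference list entry that carries the anchor as its id
function htmlReference(citation) {
  const names = citation.author.map(function (a) {
    return a.family + ", " + a.given;
  }).join("; ");
  return '<li id="ref-' + citeLabel(citation) + '">' + names +
    " (" + citation.year + "). <i>" + citation.title + "</i></li>";
}

const article = {
  title: "Hydrodynamic equations for self-propelled particles",
  year: 2009,
  author: [
    { family: "Bertin", given: "E." },
    { family: "Droz", given: "M." }
  ]
};
console.log(htmlCitation(article));
console.log(htmlReference(article));
```

Two records by the same first author in the same year would produce the same label, so a real implementation would need a disambiguation step or the unique id of the record.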
LaTeX has its own citation management system, BibTeX. If you want to use BibTeX, convert the collected citations in the array data.citations into the BibTeX format.
wtf.from_api("Swarm intelligence", 'en', function (wikimarkdown, page_identifier, lang_or_wikiid) {
var options = {
page_identifier:page_identifier,
lang_or_wikiid:lang_or_wikiid
};
var data = wtf.parse(wikimarkdown,options);
console.log(JSON.stringify(data, null, 2));
});
The JSON hash data contains an array with all parsed citations from the Wiki Markdown article. Loop over data.citations and convert all bibitem records from the array of all collected citations into the BibTeX format (e.g. with HandleBarsJS).
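The conversion of one record into a BibTeX entry could be sketched without a template engine as follows. The function name toBibTeX is hypothetical, the mapping of the type "journal" to the BibTeX entry type article is an assumption, only a few fields are handled, and no escaping of special LaTeX characters is done. The record is expected in the converted format with id, type and an author array.

```javascript
// Hypothetical converter for records in the converted format (id, type,
// author array). Only a few fields are handled, no BibTeX escaping is done.
function toBibTeX(record) {
  // assumption: wiki "cite journal" records map to the BibTeX type "article"
  const type = record.type === "journal" ? "article" : record.type;
  // BibTeX separates several authors with " and "
  const authors = record.author.map(function (a) {
    return a.family + ", " + a.given;
  }).join(" and ");
  const fields = { author: authors, title: record.title, year: record.year };
  if (record.journal) fields.journal = record.journal;
  if (record.isbn) fields.isbn = record.isbn;
  const lines = Object.keys(fields).map(function (key) {
    return "  " + key + " = {" + fields[key] + "}";
  });
  return "@" + type + "{" + record.id + ",\n" + lines.join(",\n") + "\n}";
}

const bookRecord = {
  id: "C1D20180327T1503",
  type: "book",
  title: "Swarm Intelligence: From Natural to Artificial Systems",
  author: [
    { given: "Eric", family: "Bonabeau" },
    { given: "Marco", family: "Dorigo" },
    { given: "Guy", family: "Theraulaz" }
  ],
  year: 1999,
  isbn: "0-19-513159-2"
};
console.log(toBibTeX(bookRecord));
```

Applied to every element of data.citations and joined with blank lines, the result could be saved as a .bib file.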
Without BibTeX, it is possible to render the citations in the array data.citations into bibitem entries in the bibliography. This is the same procedure without a database and without an explicit list of collected citations, similar to the direct approach mentioned for HTML. The bibliography can be added to the end of the LaTeX file to add the citations.
(see Bibliography in LaTeX )
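Rendering the records directly into bibitem entries could look like the following sketch. The function names and the layout of each entry are assumptions for illustration; the records are expected in the converted format with id and an author array, and no escaping of special LaTeX characters is done.

```javascript
// Hypothetical helper: renders one record as a \bibitem line.
function toBibitem(record) {
  const authors = record.author.map(function (a) {
    return a.given + " " + a.family;
  }).join(", ");
  return "\\bibitem{" + record.id + "} " + authors +
    ". \\emph{" + record.title + "}. " + record.year + ".";
}

// Wraps all entries in a thebibliography environment; the argument {99}
// is the widest label and reserves space for two-digit numbers.
function toTheBibliography(records) {
  return "\\begin{thebibliography}{99}\n" +
    records.map(toBibitem).join("\n") +
    "\n\\end{thebibliography}";
}

const records = [{
  id: "C1D20180327T1503",
  title: "Swarm Intelligence: From Natural to Artificial Systems",
  author: [
    { given: "Eric", family: "Bonabeau" },
    { given: "Marco", family: "Dorigo" },
    { given: "Guy", family: "Theraulaz" }
  ],
  year: 1999
}];
console.log(toTheBibliography(records));
```

With this approach the \cite{C1D20180327T1503} command still works, because LaTeX resolves the key against the bibitem entries in the document itself.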
In the Wiki markdown syntax, the citation is inserted in the Wiki text at the position where the citation is mentioned. Later, in the HTML output generated by MediaWiki, the collected citations are listed at the very end of the document or (if applicable) at the marker position (e.g. {{Reflist|2}}) in the Wiki markdown source.
In LaTeX this marker can be replaced by the appropriate LaTeX command (see http://www.bibtex.org/Using/ )
\bibliography{mybib}{}
\bibliographystyle{plain}
- Parsing concepts are based on Parsoid - https://www.mediawiki.org/wiki/Parsoid
- Output: based on concepts of PanDoc, the swiss-army knife of document conversion developed by John MacFarlane - https://www.pandoc.org