Pandoc Tricks
Here are some tricks that pandoc allows but that are not obvious at first sight.
- From Markdown, To Markdown
- Template Snippet
- YAML Metadata for Any Format
- Left-aligning Tables in LaTeX
- GFM Task Lists with Pandoc
- Today in date metadata
- Definition list terms on their own line in LaTeX
- Level 4 and 5 headings on their own line in LaTeX
- Globbing input files in the right order
From Markdown, To Markdown
Using pandoc -f markdown... -t markdown... can have surprisingly useful applications. As a demo, this file is generated by
pandoc -f gfm -t gfm --atx-headers \
--reference-location=block --toc -s -o temp-github.md temp.md
Be careful with @, though: you need to escape it, since pandoc treats it as a citation.
As shown in issue #2814, rendering a document to itself can be used to clean up / normalize your markdown file.
For example, if you have a long markdown file on GitHub and want a TOC, you can use pandoc -s -t gfm --toc -o example-with-toc.md example.md
This is a useful workaround for updating the TOC of very long documents, but beware: if you use this trick to write over the input file, you’ll end up stacking TOCs, with each new Table of Contents generated above the previously built ones and indexing them too. The technique works best when the source and output files are different.
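One way to guard against the stacking problem is a small wrapper that refuses to run when the input and output names are the same. This is only a sketch: the function name toc_to is hypothetical, and the check is a plain string comparison, so it will not notice two different spellings of the same path.

```shell
# Hypothetical helper: regenerate a TOC only into a different file.
# Usage: toc_to INPUT OUTPUT
toc_to() {
    input=$1
    output=$2
    # Plain string comparison; ./a.md and a.md would not be caught.
    if [ "$input" = "$output" ]; then
        echo "refusing to overwrite $input: TOCs would stack" >&2
        return 1
    fi
    pandoc -s -t gfm --toc -o "$output" "$input"
}
```

Calling toc_to example.md example-with-toc.md runs the same command as above, while toc_to example.md example.md stops with an error before pandoc is invoked.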
Also, you can add a title to the TOC using the toc-title variable, but only if you use a markdown template, as explained below.
Did you know that you can use pandoc templates with markdown too?
Ask pandoc to write-out the default template for markdown:
pandoc --print-default-template=markdown > template.markdown
And now let’s peek at the template we got:
$if(titleblock)$
$titleblock$
$endif$
$for(header-includes)$
$header-includes$
$endfor$
$for(include-before)$
$include-before$
$endfor$
$if(toc)$
$toc$
$endif$
$body$
$for(include-after)$
$include-after$
$endfor$
As you can see, there are plenty of conditional statements to play with, allowing for additional control over the output markdown file.
You can also use the toc-title template variable to tell pandoc to add a title on top of the generated TOC. Change the template’s toc block like this:
$if(toc)$
$if(toc-title)$
# $toc-title$
$endif$
$toc$
$endif$
And now invoke pandoc like this:
pandoc --toc -V toc-title:"Table of Contents" --template=template.markdown -o example-with-toc.md example.md
And you’ll see in the example-with-toc.md file an auto-generated Table of Contents with a # Table of Contents title over it.
NOTE: if you also include some extra markdown contents with the --include-before-body option (e.g. --include-before-body=somefile.md), the contents of the included file will go before the TOC (at least, with the template used in this example) and any headings it contains will not be included in the TOC; i.e. the TOC only indexes what comes after the $toc$ template tag. This is useful if you’d like to include an Abstract before the TOC.
The manual says:
Note: the --webtex option will affect Markdown output as well as HTML.
This can be used to put math in pure markdown, for example when you want math directly in a README.md on GitHub.
For example, in temp.md:
# Important Discovery!
$1+2\neq3!$
Run this:
pandoc --atx-headers --webtex=https://latex.codecogs.com/png.latex? -s -o temp-codecogs.md temp.md
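In the output, the formula is replaced by an image whose URL is the codecogs address given to --webtex followed by the URL-encoded TeX source.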
Say, in your source markdown file pipe.md:
| testing | pandoc | tables |
|-------------|-------------------|---------|
| simple cell | no multiline cell | and |
| so | on | no list |
On the command line:
pandoc -t markdown-simple_tables-multiline_tables-pipe_tables -s -o grid.md pipe.md
In the output grid.md:
+--------------------------+--------------------------+--------------------------+
| testing                  | pandoc                   | tables                   |
+==========================+==========================+==========================+
| simple cell              | no multiline cell        | and                      |
+--------------------------+--------------------------+--------------------------+
| so                       | on                       | no list                  |
+--------------------------+--------------------------+--------------------------+
If you use auto-identifiers for the headers, and different files contain headers with the same name, you’d want to concatenate the files, and pandoc can do this for you:
pandoc file1.md file2.md ...
But if the same footnote anchors are repeated in both files, you need to use the --file-scope option, which parses each file individually (so the footnote anchors are “local” to the individual file):
pandoc file1.md file2.md --file-scope ...
What if the two files have both problems, i.e. headers with the same names (hence the same ID from the auto-identifier) and footnotes with the same anchors across the files? Either approach alone gives you problems.
In this case, you can use “to markdown from markdown” to write an intermediate markdown file using --file-scope, which handles the colliding footnote anchors for you, then generate the final document from that intermediate markdown file and let the auto-identifiers handle the headers for you:
pandoc --file-scope -o intermediate.md file1.md file2.md
pandoc intermediate.md ...
Template Snippet
Suppose you wrote a template snippet that does not form a complete template. The -H, -B, or -A options would not help, because pandoc would insert your snippet as is and would not process it as a template; i.e. the snippet is included after the template is processed.
A trick mentioned by @cagix in jgm/pandoc-templates#220 is this:
pandoc --template=template_snippet.tex document.md -o processed_snippet
pandoc ... -H processed_snippet document.md -o document.<toFormat>
# Or shorter but bash only (process substitution)
SNIPPET=template_snippet.tex; INPUT=document.md; OUTPUT=document.<toFormat>
pandoc ... -H <(pandoc --template=$SNIPPET $INPUT) $INPUT -o $OUTPUT
The first line will process your template snippet according to the properties of the document, but since your snippet (probably) does not have $body$, the body will not be in the output. Now the snippet is processed and can then be included as is through -H in the second line.
YAML Metadata for Any Format
YAML metadata is only defined for pandoc’s markdown syntax. See jgm/pandoc#1960.
Currently, there is a workaround like this (while the YAML metadata only accepts markdown syntax):
pandoc -f markdown -t native -s metadata.yml | sed '$ d' > metadata.native
pandoc -t native -o document.native document.<fromFormat>
pandoc -f native -s -o document.<toFormat> metadata.native document.native
# Or shorter but bash only (process substitution)
YAML=metadata.yml; INPUT=document.<fromFormat>; OUTPUT=document.<toFormat>
pandoc ... -f native -s -o $OUTPUT <(pandoc -f markdown -t native -s $YAML | sed '$ d') <(pandoc -t native $INPUT)
Explanation:
The sed in the first line is needed because metadata.yml is regarded as a markdown document with no body, so the last line of the metadata in native format is [], which you need to remove. Another way of removing it is head -n -1 (which would not work with Mac’s default head). From my test it seems the meta in native is always on one line; if that is true, then head -n1 will work (and it also works on Mac).
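The difference between the two ways of trimming can be tried without pandoc. In the sketch below, the two functions print made-up stand-ins for the native output (placeholders, not real pandoc output): one with the metadata on a single line, and one where it is assumed to span several lines.

```shell
# Made-up stand-ins for the output of
#   pandoc -f markdown -t native -s metadata.yml
# (placeholders, not real pandoc output).
one_line_meta() { printf '%s\n' 'Pandoc (Meta {...})' '[]'; }
multi_line_meta() { printf '%s\n' 'Pandoc' '  (Meta {...})' '  []'; }

one_line_meta | sed '$ d'      # drops the last line, keeps the Meta line
one_line_meta | head -n 1      # same result in the one-line case

multi_line_meta | sed '$ d'    # keeps everything except the last line
multi_line_meta | head -n 1    # keeps only "Pandoc", losing the metadata
```

If the one-line assumption ever fails, sed '$ d' still removes only the trailing empty body, which makes it the safer of the two.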
Left-aligning Tables in LaTeX
Based on this pandoc-discuss exchange and this TeX StackExchange topic, it is possible to left-align all tables in a document (in the PDF output from LaTeX) with this single invocation in the YAML header block of the markdown document:
---
header-includes:
  - |
    ```{=latex}
    \usepackage[margins=raggedright]{floatrow}
    ```
...
This applies to all floats, and fine-grained control may be achieved with the options outlined in the documentation for the floatrow LaTeX package.
GFM Task Lists with Pandoc
Task lists are part of pandoc as of v2.6. Syntax is the same as GFM.
Today in date metadata
In a POSIX shell, add this to the pandoc command you use:
-M date="$(date "+%B %e, %Y")"
In the Windows command prompt (cmd.exe), add this to the pandoc command you use:
-M date=%date%
Syntax tested based on a comment to issue #3778. Verified on Pandoc v2.18.
For an ISO 8601 date in UTC, add this to the pandoc command you use:
-M date="`date -u '+%Y-%m-%d'`"
Syntax contributed by a comment to issue #2865.
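One detail of the %e format is worth knowing: it pads single-digit days with a space, so a date early in the month comes out with two spaces after the month name. A minimal sketch of one way to tidy that up follows; the variable name today is just an illustration, and LC_ALL=C is used here on the assumption that English month names are wanted.

```shell
# Build a long-form date and squeeze the double space that %e
# produces for days 1-9.
today=$(LC_ALL=C date "+%B %e, %Y" | tr -s ' ')
echo "$today"
```

The result can then be passed as -M date="$today" in place of the inline command substitution.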
Definition list terms on their own line in LaTeX
Most tools, including most Web browsers, render definition lists with the (first) definition on a separate, indented line below the term:
Term
Definition A.
Second line of definition.
Definition B.
LaTeX instead sets the term in bold and the first definition run-in on the same line, which doesn't look good if you have space between paragraphs as Pandoc does:
**Term** Definition A.
Second line of definition.
Definition B.
It is easy to fix this without loading any extra package. Just make sure the following is in your LaTeX preamble:
% "Clone" the original \item command
\let\originalitem\item
% Redefine the \item command using the "clone"
\makeatletter
\renewcommand{\item}[1][\@nil]{%
\def\tmp{#1}%
\ifx\tmp\@nnil\originalitem\else\originalitem[#1]\hfill\par\fi}
\makeatother
This still leaves the term in boldface. To get the term in the normal typeface, change the invocation of \originalitem[#1] to
\originalitem[\textnormal{#1}]
Put this in your custom template or add a header-includes field to your document metadata:
---
header-includes:
  - |
    ````{=latex}
    % insert the fix here
    ````
...
Level 4 and 5 headings on their own line in LaTeX
In LaTeX, level 4 headings are rendered with the \paragraph command and level 5 headings are rendered with the \subparagraph command. These commands set the (first) paragraph after the heading run-in with the heading. There is an easy way to fix this. Make sure to include the following in your LaTeX preamble:
% Make "clones" of the commands
\let\originalparagraph\paragraph
\let\originalsubparagraph\subparagraph
% Redefine the commands using the "clones"
\renewcommand{\paragraph}[1]%
{\originalparagraph{#1}\hfill}
\renewcommand{\subparagraph}[1]%
{\originalsubparagraph{#1}\hfill}
Note that unlike the similar fix for definition list terms, there should not be any \par after the \hfill here!
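If you would rather not edit a template or the document metadata, one option is to keep the fix in a small file and hand it to pandoc with -H. The sketch below only writes that file; the name heading-fix.tex is an arbitrary choice.

```shell
# Write the heading fix to a file; the quoted 'EOF' keeps the shell
# from touching the backslashes.
cat > heading-fix.tex <<'EOF'
\let\originalparagraph\paragraph
\let\originalsubparagraph\subparagraph
\renewcommand{\paragraph}[1]%
{\originalparagraph{#1}\hfill}
\renewcommand{\subparagraph}[1]%
{\originalsubparagraph{#1}\hfill}
EOF
```

The file can then be added to a LaTeX or PDF conversion with -H heading-fix.tex, which places its contents in the preamble.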
Globbing input files in the right order
As you probably know, you can pass multiple input files to pandoc on the command line and they will be treated as a single long file, with blank lines inserted between them:
$ pandoc this.md that.md other.md -o all.html
You can even glob a whole gang of files. This will concatenate all files with an .md extension in the current directory (aka folder):
$ pandoc *.md -o all.html
There is a snag with globbing like this, however: pandoc will get the list of file names sorted in ASCII order (or, strictly speaking, in the sorting order of your shell's current locale). ASCII order is similar to alphabetical order, with A-Z and a-z each in alphabetical sequence but with all of A-Z before all of a-z. In any case, this is possibly not the order in which the files are supposed to come in the text: given the three files from the first example, the *.md glob pattern is equivalent to
other.md
that.md
this.md
There is however an age-old workaround, which actually exploits the glob sorting feature:
If you want the files in a specific order give them names starting with a zero-padded number:
00-intro.md
01-this.md
02-that.md
03-other.md
...
09-something.md
10-anything.md
11-more.md
The leading 0 in 01..09 makes sure that they sort before 10... This is necessary because the shell has no concept of numeric sorting: it sorts globbed file names character by character in ASCII order, where 0 comes before 1, which comes before 2 and so on, and all digits come before all letters.
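The effect is easy to see in a scratch directory. The following sketch uses made-up file names and simply records what the *.md glob expands to, first without and then with zero-padding.

```shell
# Compare glob order with and without zero-padded prefixes.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

touch 1-intro.md 2-this.md 10-anything.md
unpadded=$(echo *.md)    # 10-anything.md sorts before 2-this.md

rm -- *.md
touch 01-intro.md 02-this.md 10-anything.md
padded=$(echo *.md)      # 01-intro.md 02-this.md 10-anything.md

printf 'unpadded: %s\npadded:   %s\n' "$unpadded" "$padded"

# Clean up the scratch directory.
cd - >/dev/null && rm -r "$demo_dir"
```

Without padding, file 10 lands ahead of file 2; with padding, the glob order matches the intended reading order.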
It is usually a good idea to add an extra trailing 0 as well (or more if you have a lot of files or are going to move files around a lot):
000-intro.md
010-this.md
020-that.md
030-other.md
...
090-something.md
100-anything.md
110-more.md
This way, if you want to move a part of the text around or add a part between existing parts, you don't need to renumber all the files; you can just give the files containing the moved parts suitable intermediate numbers:
000-intro.md
010-this.md
015-other.md # used to be 030
020-that.md
...
090-something.md
100-anything.md
105-additional.md # new file!
110-more.md
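When the intermediate numbers eventually run out, the files can be spread back onto multiples of ten. The function below is a hypothetical helper, not part of pandoc: it assumes it is run inside the directory with bare file names of the form NUMBER-name.md, and it renames the files it is given, in the order given, to 000-, 010-, 020- and so on.

```shell
# Hypothetical helper: renumber files in steps of 10, keeping their order.
# Usage (inside the directory): renumber *.md
renumber() {
    n=0
    for f in "$@"; do
        base=${f#[0-9]*-}                    # strip the old numeric prefix
        new=$(printf '%03d-%s' "$n" "$base")
        if [ "$f" != "$new" ]; then
            if [ -e "$new" ]; then
                echo "skipping $f: $new already exists" >&2
            else
                mv -- "$f" "$new"
            fi
        fi
        n=$((n + 10))
    done
}
```

For example, running renumber *.md on 000-intro.md, 010-this.md, 015-other.md and 020-that.md leaves the first two alone and renames the others to 020-other.md and 030-that.md.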
Using this technique, the numbering of the file names may get out of sync with the numbering of sections/chapters in the text, but that is OK; you should generally rely on the names/labels of chapters/sections as identifiers and let Pandoc itself, LaTeX and/or pandoc-crossref handle the actual section numbering. The numbers in the file names are file numbers, in a format which is good for the shell, and as human friendly as possible.