Page label feature (#1188)
* Doc: Add Polish tutorial (#1166)

- Translate existing one-page tutorials (tuto1, tuto2, tuto3, tuto4, tuto5, tuto6) into Polish language.

* add DarekRepos as a contributor for translation (#1167)

* update README.md [skip ci]

* update .all-contributorsrc [skip ci]

---------

Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>

* Adding scripts/compare-changed-pdfs.py (#1134)

* page label feature

* update reference file

* create get_page_label()

* pylint. improve typing

* Apply suggestions from code review

Co-authored-by: Lucas Cimon <[email protected]>

* draft TableOfContents implementation

* update toc

* add toc extra pages test

* improve documentation

* add page labels documentation

* add page label test

* fix reference file

* update documentation and add changelog entry

* formatting

* implement reviewer suggestions

* implement reviewer suggestions

* add links on changelog and fix for method rename

* new reference files after creation data change

---------

Co-authored-by: Darek <[email protected]>
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: Lucas Cimon <[email protected]>
4 people authored Nov 21, 2024
1 parent 5b7fce3 commit fb1b01d
Showing 17 changed files with 929 additions and 68 deletions.
2 changes: 1 addition & 1 deletion .all-contributorsrc
@@ -1350,4 +1350,4 @@
"contributorsPerLine": 7,
"skipCi": true,
"commitType": "docs"
}
}
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -21,6 +21,7 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
* new optional parameter `border` for table cells [issue #1192](https://github.com/py-pdf/fpdf2/issues/1192) users can define specific borders (left, right, top, bottom) for individual cells
* [`FPDF.write_html()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.write_html): now parses `<title>` tags to set the [document title](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.set_title). By default, it is added as PDF metadata, but not rendered in the document body. However, this can be enabled by passing `render_title_tag=True` to `FPDF.write_html()`.
* support for LZWDecode compression [issue #1271](https://github.com/py-pdf/fpdf2/issues/1271)
* support for [page labels](https://py-pdf.github.io/fpdf2/PageLabels.html) and a [reference table of contents](https://py-pdf.github.io/fpdf2/DocumentOutlineAndTableOfContents.html) implementation
### Fixed
* support for `align=` in [`FPDF.table()`](https://py-pdf.github.io/fpdf2/Tables.html#setting-table-column-widths). Due to this correction, tables are now properly horizontally aligned on the page by default. This was always specified in the documentation, but was not in effect until now. You can revert to have left-aligned tables by passing `align="LEFT"` to `FPDF.table()`.
* `FPDF.set_text_shaping(False)` was broken since version 2.7.8 and is now working properly - [issue #1287](https://github.com/py-pdf/fpdf2/issues/1287)
87 changes: 66 additions & 21 deletions docs/DocumentOutlineAndTableOfContents.md
@@ -1,39 +1,82 @@
# Document outline & table of contents #
# Document Outline & Table of Contents

Quoting [Wikipedia](https://en.wikipedia.org/wiki/Table_of_contents), a **table of contents** is:
> a list, usually found on a page before the start of a written work, of its chapter or section titles or brief descriptions with their commencing page numbers.
## Overview

This document explains how to implement and customize the Document Outline (also known as Bookmarks) and Table of Contents (ToC) features in `fpdf2`.

---

Now quoting the 6th edition of the PDF format reference (v1.7 - 2006) :
## Document Outline (Bookmarks)

Document outlines allow users to navigate quickly through sections in the PDF by creating a hierarchical structure of clickable links.

Quoting the 6th edition of the PDF format reference (v1.7 - 2006) :
> A PDF document may optionally display a **document outline** on the screen, allowing the user to navigate interactively
> from one part of the document to another. The outline consists of a tree-structured hierarchy of outline items
> (sometimes called bookmarks), which serve as a visual table of contents to display the document’s structure to the user.
For example, here is what a document outline looks like in [Sumatra PDF Reader](https://www.sumatrapdfreader.org/free-pdf-reader.html):

![](document-outline.png)
![Document Outline Example](document-outline.png)

Since `fpdf2` 2.3.3, both features are supported through the use of the [`start_section`](fpdf/fpdf.html#fpdf.fpdf.FPDF.start_section) method,
that adds an entry in the internal "outline" table used to render both features.
Since `fpdf2` 2.3.3, you can use the [`start_section`](fpdf/fpdf.html#fpdf.fpdf.FPDF.start_section) method to add entries to the internal "outline" table, which is used to render both the outline and ToC.

Note that by default, calling `start_section` only records the current position in the PDF and renders nothing.
However, you can configure **global title styles** by calling [`set_section_title_styles`](fpdf/fpdf.html#fpdf.fpdf.FPDF.set_section_title_styles),
after which call to `start_section` will render titles visually using the styles defined.
However, you can configure **global title styles** by calling [`set_section_title_styles`](fpdf/fpdf.html#fpdf.fpdf.FPDF.set_section_title_styles), after which calls to `start_section` will render titles visually using the styles defined.

To provide a document outline to the PDF you generate, you just have to call the `start_section` method for every hierarchical section you want to define.

### Nested outlines

Outlines can be nested by specifying different levels. Higher-level outlines (e.g., level 0) appear at the top, while sub-levels (e.g., level 1, level 2) are indented.

```python
pdf.start_section(name="Chapter 1: Introduction", level=0)
pdf.start_section(name="Section 1.1: Background", level=1)
```

---

## Table of Contents

Quoting [Wikipedia](https://en.wikipedia.org/wiki/Table_of_contents), a **table of contents** is:
> a list, usually found on a page before the start of a written work, of its chapter or section titles or brief descriptions with their commencing page numbers.
### Inserting a Table of Contents

Use the [`insert_toc_placeholder`](fpdf/fpdf.html#fpdf.fpdf.FPDF.insert_toc_placeholder) method to define a placeholder for the ToC. A page break is triggered after inserting the ToC.

To provide a document outline to the PDF you generate, you just have to call the `start_section` method
for every hierarchical section you want to define.
**Parameters:**
- **render_toc_function**: Function called to render the ToC, receiving two parameters: `pdf`, an FPDF instance, and `outline`, a list of `fpdf.outline.OutlineSection`.
- **pages**: The number of pages that the ToC will span, including the current one. A page break occurs for each page specified.
- **allow_extra_pages**: If `True`, allows unlimited additional pages to be added to the ToC as needed. These extra ToC pages are initially created at the end of the document and then reordered when the final PDF is produced.

If you also want to insert a table of contents somewhere,
call [`insert_toc_placeholder`](fpdf/fpdf.html#fpdf.fpdf.FPDF.insert_toc_placeholder)
wherever you want to put it.
Note that a page break will always be triggered after inserting the table of contents.
**Note**: Enabling `allow_extra_pages` may affect page numbering in headers or footers. Since extra ToC pages are added after the document content, they might cause page numbers to appear out of sequence. To maintain consistent numbering, use [Page Labels](PageLabels.md) to assign a specific numbering style to the ToC pages. When using Page Labels, any extra ToC pages will follow the numbering style of the first ToC page.
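
As a rough illustration of what a `render_toc_function` has to do with the `outline` it receives, the sketch below builds one text line per entry, indented by level and padded with dot leaders. It is deliberately independent of `fpdf2`: `Section` is a hypothetical stand-in that only assumes each outline entry exposes `name`, `level` and `page_number`, and `format_toc_lines` is an illustrative helper, not part of the library.

```python
from typing import List, NamedTuple


class Section(NamedTuple):
    """Hypothetical stand-in for an outline entry (assumed fields only)."""

    name: str
    level: int
    page_number: int


def format_toc_lines(outline: List[Section], width: int = 40) -> List[str]:
    """Build one text line per section: indented title, dot leaders, page number."""
    lines = []
    for section in outline:
        # Two spaces of indentation per nesting level
        title = "  " * section.level + section.name
        page = str(section.page_number)
        # Keep at least three dots, even when the title is long
        dots = "." * max(3, width - len(title) - len(page) - 2)
        lines.append(f"{title} {dots} {page}")
    return lines


outline = [
    Section("Chapter 1: Introduction", level=0, page_number=2),
    Section("Section 1.1: Background", level=1, page_number=3),
]
for line in format_toc_lines(outline):
    print(line)
```

In an actual `render_toc_function`, each line would then be written to the page through the `pdf` argument, for instance with `pdf.multi_cell()`.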

## With HTML ##
### Reference Implementation

When using [`FPDF.write_html`](HTML.md), a document outline is automatically built.
You can insert a table of content with the special `<toc>` tag.
_New in [:octicons-tag-24: 2.8.2](https://github.com/py-pdf/fpdf2/blob/master/CHANGELOG.md)_

The `fpdf.outline.TableOfContents` class provides a reference implementation of the ToC, which can be used as-is or subclassed.

```python
from fpdf import FPDF
from fpdf.outline import TableOfContents

pdf = FPDF()
pdf.add_page()
toc = TableOfContents()
pdf.insert_toc_placeholder(toc.render_toc, allow_extra_pages=True)
```

---

## Using Outlines and ToC with HTML

When using [`FPDF.write_html`](HTML.md), a document outline is automatically generated, and a ToC can be added with the `<toc>` tag.

To customize ToC styling, override the `render_toc` method in a subclass:

Custom styling of the table of contents can be achieved by overriding the `render_toc` method
in a subclass of `FPDF`:
```python
from fpdf import FPDF, HTML2FPDF

@@ -59,7 +102,9 @@ pdf.write_html("""<toc></toc>
pdf.output("html_toc.pdf")
```

## Code samples ##
---

## Additional Code Samples

The regression tests are a good place to find code samples.

103 changes: 103 additions & 0 deletions docs/PageLabels.md
@@ -0,0 +1,103 @@
# Page Labels

_New in [:octicons-tag-24: 2.8.2](https://github.com/py-pdf/fpdf2/blob/master/CHANGELOG.md)_

## Overview

In a PDF document, each page is identified by an integer page index, representing the page's position within the document. Optionally, a document can also define **page labels** to visually display page identifiers.

**Page labels** can be customized. For example, a document might begin with front matter numbered in roman numerals and transition to arabic numerals for the main content. In this case:
- The first page (index `0`) would have the label `i`
- The twelfth page (index `11`) would have the label `xii`
- The thirteenth page (index `12`) would have the label `1`

The most popular PDF readers, such as Sumatra PDF and Adobe Acrobat Reader, will accurately display page labels as configured in the PDF. However, not all PDF readers support this feature, and some may not honor or display page labels correctly. In particular, browser-based PDF viewers, like those in Chrome and Edge, currently do not display page labels and will only show default page numbering.

![Page Labels in Sumatra and Acrobat](page-labels.png)

---

## Page Label Components

A **page label** consists of three main parts: `Style`, `Prefix`, and `Start`.

### 1. Style
The style defines the numbering format for the numeric portion of each page label. Available styles are:

- **"D"**: Decimal Arabic numerals (1, 2, 3, ...)
- **"R"**: Uppercase Roman numerals (I, II, III, ...)
- **"r"**: Lowercase Roman numerals (i, ii, iii, ...)
- **"A"**: Uppercase letters (A to Z, then AA to ZZ, and so on)
- **"a"**: Lowercase letters (a to z, then aa to zz, and so on)

### 2. Prefix
The prefix is an optional string added before the numeric portion of each page label. For instance, a prefix of `"Appendix-"` with a style of `"D"` might result in labels like "Appendix-1", "Appendix-2", etc.

### 3. Start
The starting number for the first page of a labeled section. This is the initial numeric value applied to the first page of the label range.
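
To make the interplay of the three components concrete, here is a minimal pure-Python sketch of how a viewer could derive a label from `Style`, `Prefix` and `Start`. It does not use `fpdf2`; `to_roman`, `to_letters` and `format_page_label` are hypothetical helper names, and the letter styles follow the PDF specification's reading, where the 27th value is `AA` and the 28th is `BB`.

```python
def to_roman(number: int) -> str:
    """Convert a positive integer to uppercase Roman numerals."""
    symbols = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    result = ""
    for value, symbol in symbols:
        while number >= value:
            result += symbol
            number -= value
    return result


def to_letters(number: int) -> str:
    """A to Z for 1-26, then AA to ZZ for 27-52, and so on."""
    index = number - 1
    return chr(ord("A") + index % 26) * (index // 26 + 1)


def format_page_label(style: str, prefix: str, start: int, offset: int) -> str:
    """Label of the page located `offset` pages after the start of its label range."""
    number = start + offset
    if style == "D":
        numeric = str(number)
    elif style == "R":
        numeric = to_roman(number)
    elif style == "r":
        numeric = to_roman(number).lower()
    elif style == "A":
        numeric = to_letters(number)
    elif style == "a":
        numeric = to_letters(number).lower()
    else:
        raise ValueError(f"Unknown page label style: {style!r}")
    return prefix + numeric


print(format_page_label("r", "", 1, 11))          # twelfth page of the front matter
print(format_page_label("D", "Appendix-", 1, 1))  # second page of an appendix
```

The `offset` argument is the page's position within its own label range, which is why a new range can restart numbering at any `start` value.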

---

## Using Page Labels in `fpdf2`

You can add page labels directly when adding a new page using the `add_page()` method or update them later using `set_page_label()`.

### Adding a Page with Labels in `add_page()`

When adding a page, you can specify the values for `label_style`, `label_prefix`, and `label_start` to define the page label. Here’s how to do it:

```python
from fpdf import FPDF

pdf = FPDF()

# Add a page with specific label parameters
pdf.add_page(
    label_style="r",          # Lowercase Roman numerals
    label_prefix="Preface-",  # Prefix for the label
    label_start=1,            # Start numbering at 1
)
pdf.output("document_with_labels.pdf")
```

### Modifying Page Labels with `set_page_label()`

You can also modify page labels after a page has been added by using `set_page_label()`. This is useful for setting a new label after adding a ToC placeholder or another action that automatically triggers a page break. Keep in mind that `set_page_label()` always takes effect after the page header has been rendered, so if the new label must appear on that page, render it in the footer only.

```python
# Set a page label with style, prefix, and start value
pdf.set_page_label(
    label_style="D",          # Decimal Arabic numerals
    label_prefix="Chapter-",  # Prefix for the label
    label_start=1,            # Start numbering at 1
)
```

### Retrieving the Current Page Label with `get_page_label()`

If you need to get the current page label, for example, to display it in a header or footer, you can use the `get_page_label()` method.

---

## Example Usage

Below is a complete example that demonstrates adding multiple pages with different page label styles and prefixes:

```python
from fpdf import FPDF

pdf = FPDF()

# Adding front matter with lowercase Roman numerals
pdf.add_page(label_style="r", label_start=1) # Starts with "i", "ii", "iii", etc.

# Adding main content with decimal numbers and a prefix
pdf.add_page(label_style="D", label_prefix="Chapter-", label_start=1) # "Chapter-1", "Chapter-2", etc.

# Adding an appendix section with uppercase letters
pdf.add_page(label_style="A", label_prefix="Appendix-", label_start=1) # "Appendix-A", "Appendix-B", etc.

pdf.output("labeled_document.pdf")
```

This example creates a document with three sections, each using a different labeling style and prefix.
Binary file added docs/page-labels.png
34 changes: 28 additions & 6 deletions fpdf/enums.py
@@ -19,7 +19,7 @@ class CoerciveEnum(Enum):
"An enumeration that provides a helper to coerce strings into enumeration members."

@classmethod
def coerce(cls, value):
def coerce(cls, value, case_sensitive=False):
"""
Attempt to coerce `value` into a member of this enumeration.
@@ -48,7 +48,7 @@ def coerce(cls, value):
except ValueError:
pass
try:
return cls[value.upper()]
return cls[value] if case_sensitive else cls[value.upper()]
except KeyError:
pass

@@ -193,7 +193,7 @@ class Align(CoerciveEnum):
"Justify text"

@classmethod
def coerce(cls, value):
def coerce(cls, value, case_sensitive=False):
if value == "":
return cls.L
return super(cls, cls).coerce(value)
@@ -213,7 +213,7 @@ class VAlign(CoerciveEnum):
"Place text at the bottom of the cell, but obey the cells padding"

@classmethod
def coerce(cls, value):
def coerce(cls, value, case_sensitive=False):
if value == "":
return cls.M
return super(cls, cls).coerce(value)
@@ -400,7 +400,7 @@ class TableCellFillMode(CoerciveEnum):
"Fill only table cells in even columns"

@classmethod
def coerce(cls, value):
def coerce(cls, value, case_sensitive=False):
"Any class that has a .should_fill_cell() method is considered a valid 'TableCellFillMode' (duck-typing)"
if callable(getattr(value, "should_fill_cell", None)):
return value
@@ -472,7 +472,7 @@ def is_fill(self):
return self in (self.F, self.DF)

@classmethod
def coerce(cls, value):
def coerce(cls, value, case_sensitive=False):
if not value:
return cls.D
if value == "FD":
@@ -1009,6 +1009,28 @@ class TextDirection(CoerciveEnum):
"bottom to top"


class PageLabelStyle(CoerciveEnum):
"Style of the page label"

NUMBER = intern("D")
"decimal arabic numerals"

UPPER_ROMAN = intern("R")
"uppercase roman numerals"

LOWER_ROMAN = intern("r")
"lowercase roman numerals"

UPPER_LETTER = intern("A")
"uppercase letters A to Z, AA to ZZ, AAA to ZZZ and so on"

LOWER_LETTER = intern("a")
"lowercase letters a to z, aa to zz, aaa to zzz and so on"

NONE = None
"no label"
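
The single-letter values of `PageLabelStyle` differ only by case (`"R"` vs `"r"`, `"A"` vs `"a"`), which is presumably why `coerce()` gains a `case_sensitive` flag in this commit. The standalone sketch below shows how a value-then-name lookup behaves with such a flag; `LabelStyle` and this free-standing `coerce` are simplified, hypothetical stand-ins, not the `fpdf2` implementation.

```python
from enum import Enum


class LabelStyle(Enum):
    """Hypothetical stand-in mirroring the string values of PageLabelStyle."""

    NUMBER = "D"
    UPPER_ROMAN = "R"
    LOWER_ROMAN = "r"
    UPPER_LETTER = "A"
    LOWER_LETTER = "a"


def coerce(enum_cls, value, case_sensitive=False):
    """Simplified sketch of a value-then-name lookup (not the fpdf2 code)."""
    if isinstance(value, enum_cls):
        return value
    try:
        return enum_cls(value)  # exact, case-preserving match on the member value
    except ValueError:
        pass
    try:
        # Name lookup: upper-case the input unless case sensitivity is requested
        return enum_cls[value] if case_sensitive else enum_cls[value.upper()]
    except KeyError:
        raise ValueError(f"{value!r} is not a valid {enum_cls.__name__}") from None


print(coerce(LabelStyle, "r"))            # matched by value
print(coerce(LabelStyle, "lower_roman"))  # matched by name, case-insensitively
```

With `case_sensitive=True`, the second call would raise `ValueError`, because only the exact member name `LOWER_ROMAN` is accepted.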


class Duplex(CoerciveEnum):
"The paper handling option that shall be used when printing the file from the print dialog."

