Page label feature #1188

Merged
merged 24 commits into from
Nov 21, 2024
24 commits
5a9c8af
Doc: Add Polish tutorial (#1166)
DarekRepos May 13, 2024
cfb598f
add DarekRepos as a contributor for translation (#1167)
allcontributors[bot] May 13, 2024
41280fc
Adding scripts/compare-changed-pdfs.py (#1134)
Lucas-C May 14, 2024
319d1e3
page label feature
andersonhc May 30, 2024
9c3927e
update reference file
andersonhc May 30, 2024
1911650
create get_page_label()
andersonhc May 30, 2024
ccdb16a
pylint. improve typing
andersonhc May 30, 2024
5e436b1
Apply suggestions from code review
andersonhc Jun 13, 2024
856e89d
Merge remote-tracking branch 'upstream/master' into page-number
andersonhc Nov 1, 2024
715dad2
draft TableOfContents implementation
andersonhc Nov 4, 2024
89b754a
update toc
andersonhc Nov 5, 2024
4a00a89
add toc extra pages test
andersonhc Nov 7, 2024
891f0c2
improve documentation
andersonhc Nov 7, 2024
a79ea4a
add page labels documentation
andersonhc Nov 7, 2024
4657fa3
add page label test
andersonhc Nov 7, 2024
d550067
fix reference file
andersonhc Nov 7, 2024
d834e6f
update documentation and add changelog entry
andersonhc Nov 7, 2024
563ecfd
formatting
andersonhc Nov 7, 2024
a4fbdf6
implement reviewer suggestions
andersonhc Nov 8, 2024
1248bfb
implement reviewer suggestions
andersonhc Nov 8, 2024
6f1035f
Merge remote-tracking branch 'upstream/master' into page-number
andersonhc Nov 11, 2024
418a39b
add links on changelog and fix for method rename
andersonhc Nov 11, 2024
4312a5d
Merge remote-tracking branch 'upstream/master' into page-number
andersonhc Nov 21, 2024
34e8b0e
new reference files after creation data change
andersonhc Nov 21, 2024
2 changes: 1 addition & 1 deletion .all-contributorsrc
@@ -1350,4 +1350,4 @@
"contributorsPerLine": 7,
"skipCi": true,
"commitType": "docs"
}
}
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -21,6 +21,7 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
* new optional parameter `border` for table cells [issue #1192](https://github.com/py-pdf/fpdf2/issues/1192) users can define specific borders (left, right, top, bottom) for individual cells
* [`FPDF.write_html()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.write_html): now parses `<title>` tags to set the [document title](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.set_title). By default, it is added as PDF metadata, but not rendered in the document body. However, this can be enabled by passing `render_title_tag=True` to `FPDF.write_html()`.
* support for LZWDecode compression [issue #1271](https://github.com/py-pdf/fpdf2/issues/1271)
* support for [page labels](https://py-pdf.github.io/fpdf2/PageLabels.html) and a [reference table of contents](https://py-pdf.github.io/fpdf2/DocumentOutlineAndTableOfContents.html) implementation
### Fixed
* support for `align=` in [`FPDF.table()`](https://py-pdf.github.io/fpdf2/Tables.html#setting-table-column-widths). Due to this correction, tables are now properly horizontally aligned on the page by default. This was always specified in the documentation, but was not in effect until now. You can revert to have left-aligned tables by passing `align="LEFT"` to `FPDF.table()`.
* `FPDF.set_text_shaping(False)` was broken since version 2.7.8 and is now working properly - [issue #1287](https://github.com/py-pdf/fpdf2/issues/1287)
87 changes: 66 additions & 21 deletions docs/DocumentOutlineAndTableOfContents.md
@@ -1,39 +1,82 @@
# Document outline & table of contents #
# Document Outline & Table of Contents

Quoting [Wikipedia](https://en.wikipedia.org/wiki/Table_of_contents), a **table of contents** is:
> a list, usually found on a page before the start of a written work, of its chapter or section titles or brief descriptions with their commencing page numbers.
## Overview

This document explains how to implement and customize the Document Outline (also known as Bookmarks) and Table of Contents (ToC) features in `fpdf2`.

---

Now quoting the 6th edition of the PDF format reference (v1.7 - 2006) :
## Document Outline (Bookmarks)

Document outlines allow users to navigate quickly through sections in the PDF by creating a hierarchical structure of clickable links.

Quoting the 6th edition of the PDF format reference (v1.7 - 2006) :
> A PDF document may optionally display a **document outline** on the screen, allowing the user to navigate interactively
> from one part of the document to another. The outline consists of a tree-structured hierarchy of outline items
> (sometimes called bookmarks), which serve as a visual table of contents to display the document’s structure to the user.
For example, here is what a document outline looks like in [Sumatra PDF Reader](https://www.sumatrapdfreader.org/free-pdf-reader.html):

![](document-outline.png)
![Document Outline Example](document-outline.png)

Since `fpdf2.3.3`, both features are supported through the use of the [`start_section`](fpdf/fpdf.html#fpdf.fpdf.FPDF.start_section) method,
that adds an entry in the internal "outline" table used to render both features.
Since `fpdf2.3.3`, you can use the [`start_section`](fpdf/fpdf.html#fpdf.fpdf.FPDF.start_section) method to add entries in the internal "outline" table, which is used to render both the outline and ToC.

Note that by default, calling `start_section` only records the current position in the PDF and renders nothing.
However, you can configure **global title styles** by calling [`set_section_title_styles`](fpdf/fpdf.html#fpdf.fpdf.FPDF.set_section_title_styles),
after which call to `start_section` will render titles visually using the styles defined.
However, you can configure **global title styles** by calling [`set_section_title_styles`](fpdf/fpdf.html#fpdf.fpdf.FPDF.set_section_title_styles), after which calls to `start_section` will render titles visually using the styles defined.

To provide a document outline to the PDF you generate, you just have to call the `start_section` method for every hierarchical section you want to define.

### Nested outlines

Outlines can be nested by specifying different levels. Higher-level outlines (e.g., level 0) appear at the top, while sub-levels (e.g., level 1, level 2) are indented.

```python
pdf.start_section(name="Chapter 1: Introduction", level=0)
pdf.start_section(name="Section 1.1: Background", level=1)
```

---

## Table of Contents

Quoting [Wikipedia](https://en.wikipedia.org/wiki/Table_of_contents), a **table of contents** is:
> a list, usually found on a page before the start of a written work, of its chapter or section titles or brief descriptions with their commencing page numbers.
### Inserting a Table of Contents

Use the [`insert_toc_placeholder`](fpdf/fpdf.html#fpdf.fpdf.FPDF.insert_toc_placeholder) method to define a placeholder for the ToC. A page break is triggered after inserting the ToC.

To provide a document outline to the PDF you generate, you just have to call the `start_section` method
for every hierarchical section you want to define.
**Parameters:**
- **render_toc_function**: Function called to render the ToC, receiving two parameters: `pdf`, an FPDF instance, and `outline`, a list of `fpdf.outline.OutlineSection`.
- **pages**: The number of pages that the ToC will span, including the current one. A page break occurs for each page specified.
- **allow_extra_pages**: If `True`, allows unlimited additional pages to be added to the ToC as needed. These extra ToC pages are initially created at the end of the document and then reordered when the final PDF is produced.

If you also want to insert a table of contents somewhere,
call [`insert_toc_placeholder`](fpdf/fpdf.html#fpdf.fpdf.FPDF.insert_toc_placeholder)
wherever you want to put it.
Note that a page break will always be triggered after inserting the table of contents.
**Note**: Enabling `allow_extra_pages` may affect page numbering for headers or footers. Since extra ToC pages are added after the document content, they might cause page numbers to appear out of sequence. To maintain consistent numbering, use [Page Labels](PageLabels.md) to assign a specific numbering style to the ToC pages. When using Page Labels, any extra ToC pages will follow the numbering style of the first ToC page.
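
The reordering described for `allow_extra_pages` can be pictured with a small, purely illustrative sketch. It does not reproduce how `fpdf2` handles pages internally; the helper `reorder_toc_pages` and its parameters are hypothetical names chosen for this example. It only shows why pages created at the end of the document end up directly after the reserved ToC pages, and why their creation order differs from their final position.

```python
def reorder_toc_pages(pages, toc_start, toc_reserved, extra_count):
    """Move the last `extra_count` pages right after the reserved ToC pages.

    Hypothetical helper for illustration only, not part of fpdf2.
    """
    if extra_count == 0:
        return list(pages)
    body, extras = pages[:-extra_count], pages[-extra_count:]
    insert_at = toc_start + toc_reserved
    return body[:insert_at] + extras + body[insert_at:]


# Creation order: one ToC page was reserved, two extra ones were appended last.
created = ["cover", "toc-1", "ch1", "ch2", "toc-2", "toc-3"]
final = reorder_toc_pages(created, toc_start=1, toc_reserved=1, extra_count=2)
print(final)
```

In this sketch the extra ToC pages were created fifth and sixth but end up third and fourth, which is why a page number written at creation time can look out of sequence.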

## With HTML ##
### Reference Implementation

When using [`FPDF.write_html`](HTML.md), a document outline is automatically built.
You can insert a table of content with the special `<toc>` tag.
_New in [:octicons-tag-24: 2.8.2](https://github.com/py-pdf/fpdf2/blob/master/CHANGELOG.md)_

The `fpdf.outline.TableOfContents` class provides a reference implementation of the ToC, which can be used as-is or subclassed.

```python
from fpdf import FPDF
from fpdf.outline import TableOfContents

pdf = FPDF()
pdf.add_page()
toc = TableOfContents()
pdf.insert_toc_placeholder(toc.render_toc, allow_extra_pages=True)
```

---

## Using Outlines and ToC with HTML

When using [`FPDF.write_html`](HTML.md), a document outline is automatically generated, and a ToC can be added with the `<toc>` tag.

To customize ToC styling, override the `render_toc` method in a subclass:

Custom styling of the table of contents can be achieved by overriding the `render_toc` method
in a subclass of `FPDF`:
```python
from fpdf import FPDF, HTML2FPDF

@@ -59,7 +102,9 @@ pdf.write_html("""<toc></toc>
pdf.output("html_toc.pdf")
```

## Code samples ##
---

## Additional Code Samples

The regression tests are a good place to find code samples.

103 changes: 103 additions & 0 deletions docs/PageLabels.md
@@ -0,0 +1,103 @@
# Page Labels

_New in [:octicons-tag-24: 2.8.2](https://github.com/py-pdf/fpdf2/blob/master/CHANGELOG.md)_

## Overview

In a PDF document, each page is identified by an integer page index, representing the page's position within the document. Optionally, a document can also define **page labels** to visually display page identifiers.

**Page labels** can be customized. For example, a document might begin with front matter numbered in roman numerals and transition to arabic numerals for the main content. In this case:
- The first page (index `0`) would have a label `i`
- The twelfth page (index `11`) would have label `xii`
- The thirteenth page (index `12`) would start with label `1`

The most popular PDF readers, such as Sumatra PDF and Adobe Acrobat Reader, will accurately display page labels as configured in the PDF. However, not all PDF readers support this feature, and some may not honor or display page labels correctly. In particular, browser-based PDF viewers, like those in Chrome and Edge, currently do not display page labels and will only show default page numbering.

![Page Labels in Sumatra and Acrobat](page-labels.png)

---

## Page Label Components

A **page label** consists of three main parts: `Style`, `Prefix`, and `Start`.

### 1. Style
The style defines the numbering format for the numeric portion of each page label. Available styles are:

- **"D"**: Decimal Arabic numerals (1, 2, 3, ...)
- **"R"**: Uppercase Roman numerals (I, II, III, ...)
- **"r"**: Lowercase Roman numerals (i, ii, iii, ...)
- **"A"**: Uppercase letters (A to Z, then AA to ZZ, and so on)
- **"a"**: Lowercase letters (a to z, then aa to zz, and so on)

### 2. Prefix
The prefix is an optional string added before the numeric portion of each page label. For instance, a prefix of `"Appendix-"` with a style of `"D"` might result in labels like "Appendix-1", "Appendix-2", etc.

### 3. Start
The starting number for the first page of a labeled section. This is the initial numeric value applied to the first page of the label range.
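
As a rough sketch of how the three components combine, the function below builds a label from a style, a prefix, and a start value. It is an independent illustration of the rules listed above, not `fpdf2`'s implementation; `format_label`, `to_roman`, `to_letters`, and the `offset` parameter are hypothetical names introduced here.

```python
def to_roman(number: int) -> str:
    """Return `number` (>= 1) as uppercase roman numerals."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def to_letters(number: int) -> str:
    """A..Z for 1..26, then AA..ZZ for 27..52, and so on."""
    letter = chr(ord("A") + (number - 1) % 26)
    return letter * ((number - 1) // 26 + 1)


def format_label(style, prefix="", start=1, offset=0):
    """Label of the page located `offset` pages after the start of a label range."""
    number = start + offset
    if style is None:
        numeric = ""  # no numeric portion: the label is the prefix alone
    elif style == "D":
        numeric = str(number)
    elif style == "R":
        numeric = to_roman(number)
    elif style == "r":
        numeric = to_roman(number).lower()
    elif style == "A":
        numeric = to_letters(number)
    elif style == "a":
        numeric = to_letters(number).lower()
    else:
        raise ValueError(f"unknown page label style: {style!r}")
    return prefix + numeric


print(format_label("r", offset=11))               # twelfth page of the front matter
print(format_label("D", "Appendix-", offset=1))   # second page of an appendix
print(format_label("A", offset=26))               # 27th page of a lettered range
```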

---

## Using Page Labels in `fpdf2`

You can add page labels directly when adding a new page using the `add_page()` method or update them later using `set_page_label()`.

### Adding a Page with Labels in `add_page()`

When adding a page, you can specify the values for `label_style`, `label_prefix`, and `label_start` to define the page label. Here’s how to do it:

```python
from fpdf import FPDF

pdf = FPDF()

# Add a page with specific label parameters
pdf.add_page(
label_style="r", # Lowercase Roman numerals
label_prefix="Preface-", # Prefix for the label
label_start=1 # Start numbering at 1
)
pdf.output("document_with_labels.pdf")
```

### Modifying Page Labels with `set_page_label()`

You can also modify page labels after a page has been added by using `set_page_label()`. This is helpful for setting a new label after adding a ToC placeholder or any other action that automatically adds a page break. Keep in mind that `set_page_label()` always takes effect after the header has been rendered, so if you rely on it, print the label in the footer only.

```python
# Set a page label with style, prefix, and start value
pdf.set_page_label(
label_style="D", # Decimal Arabic numerals
label_prefix="Chapter-", # Prefix for the label
label_start=1 # Start numbering at 1
)
```

### Retrieving the Current Page Label with `get_page_label()`

If you need to get the current page label, for example, to display it in a header or footer, you can use the `get_page_label()` method.

---

## Example Usage

Below is a complete example that demonstrates adding multiple pages with different page label styles and prefixes:

```python
from fpdf import FPDF

pdf = FPDF()

# Adding front matter with lowercase Roman numerals
pdf.add_page(label_style="r", label_start=1) # Starts with "i", "ii", "iii", etc.

# Adding main content with decimal numbers and a prefix
pdf.add_page(label_style="D", label_prefix="Chapter-", label_start=1) # "Chapter-1", "Chapter-2", etc.

# Adding an appendix section with uppercase letters
pdf.add_page(label_style="A", label_prefix="Appendix-", label_start=1) # "Appendix-A", "Appendix-B", etc.

pdf.output("labeled_document.pdf")
```

This example creates a document with three sections, each using a different labeling style and prefix.
Binary file added docs/page-labels.png
34 changes: 28 additions & 6 deletions fpdf/enums.py
@@ -19,7 +19,7 @@ class CoerciveEnum(Enum):
"An enumeration that provides a helper to coerce strings into enumeration members."

@classmethod
def coerce(cls, value):
def coerce(cls, value, case_sensitive=False):
"""
Attempt to coerce `value` into a member of this enumeration.
@@ -48,7 +48,7 @@ def coerce(cls, value):
except ValueError:
pass
try:
return cls[value.upper()]
return cls[value] if case_sensitive else cls[value.upper()]
except KeyError:
pass
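
The effect of the new `case_sensitive` flag can be sketched with a simplified stand-in. Only part of `coerce` is visible in the hunks above, so the value-first lookup and the error handling below are assumptions, and `CoerciveEnumSketch` and `LabelStyleSketch` are hypothetical names. The sketch illustrates how name lookups behave with and without upper-casing when an enumeration has values that differ only by case, such as `"R"` and `"r"`.

```python
from enum import Enum


class CoerciveEnumSketch(Enum):
    """Simplified stand-in for CoerciveEnum (illustrative only, expects str input)."""

    @classmethod
    def coerce(cls, value, case_sensitive=False):
        if isinstance(value, cls):
            return value
        # 1) Look up by member value ("R", "r", ...); this is always case-sensitive.
        try:
            return cls(value)
        except ValueError:
            pass
        # 2) Fall back to a lookup by member name, upper-casing unless told otherwise.
        try:
            return cls[value] if case_sensitive else cls[value.upper()]
        except KeyError:
            raise ValueError(f"{value!r} is not a valid {cls.__name__}") from None


class LabelStyleSketch(CoerciveEnumSketch):
    UPPER_ROMAN = "R"
    LOWER_ROMAN = "r"


print(LabelStyleSketch.coerce("R"))            # matched by value
print(LabelStyleSketch.coerce("r"))            # matched by value
print(LabelStyleSketch.coerce("lower_roman"))  # matched by name after upper-casing
```

With `case_sensitive=True`, the last call would raise `ValueError` in this sketch, because `"lower_roman"` is not an exact member name.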

@@ -193,7 +193,7 @@ class Align(CoerciveEnum):
"Justify text"

@classmethod
def coerce(cls, value):
def coerce(cls, value, case_sensitive=False):
if value == "":
return cls.L
return super(cls, cls).coerce(value)
@@ -213,7 +213,7 @@ class VAlign(CoerciveEnum):
"Place text at the bottom of the cell, but obey the cells padding"

@classmethod
def coerce(cls, value):
def coerce(cls, value, case_sensitive=False):
if value == "":
return cls.M
return super(cls, cls).coerce(value)
@@ -400,7 +400,7 @@ class TableCellFillMode(CoerciveEnum):
"Fill only table cells in even columns"

@classmethod
def coerce(cls, value):
def coerce(cls, value, case_sensitive=False):
"Any class that has a .should_fill_cell() method is considered a valid 'TableCellFillMode' (duck-typing)"
if callable(getattr(value, "should_fill_cell", None)):
return value
@@ -472,7 +472,7 @@ def is_fill(self):
return self in (self.F, self.DF)

@classmethod
def coerce(cls, value):
def coerce(cls, value, case_sensitive=False):
if not value:
return cls.D
if value == "FD":
@@ -1009,6 +1009,28 @@ class TextDirection(CoerciveEnum):
"bottom to top"


class PageLabelStyle(CoerciveEnum):
    "Style of the page label"

    NUMBER = intern("D")
    "decimal arabic numerals"

    UPPER_ROMAN = intern("R")
    "uppercase roman numerals"

    LOWER_ROMAN = intern("r")
    "lowercase roman numerals"

    UPPER_LETTER = intern("A")
    "uppercase letters A to Z, AA to ZZ, AAA to ZZZ and so on"

    LOWER_LETTER = intern("a")
    "lowercase letters a to z, aa to zz, aaa to zzz and so on"

    NONE = None
    "no label"

class Duplex(CoerciveEnum):
"The paper handling option that shall be used when printing the file from the print dialog."
