Adds a fontsize definition to the Glossary and updates relevant links. #2679

Merged on Sep 21, 2023 (2 commits).
4 changes: 2 additions & 2 deletions docs/annot.rst
@@ -236,7 +236,7 @@ There is a parent-child relationship between an annotation and its page. If the

:arg str name: the new name.

.. caution:: If you set the name of a 'Stamp' annotation, then this will **not change** the rectangle, nor will the text be layouted in any way. If you choose a standard text from :ref:`StampIcons` (the **exact** name piece after `"STAMP_"`), you should receive the original layout. An **arbitrary text** will not be changed to upper case, but be written in font "Times-Bold" as is, horizontally centered in **one line** and be shortened to fit. To get your text fully displayed, its length using fontsize 20 must not exceed 190 pixels. So please make sure that the following inequality is true: `fitz.get_text_length(text, fontname="tibo", fontsize=20) <= 190`.
.. caution:: If you set the name of a 'Stamp' annotation, then this will **not change** the rectangle, nor will the text be layouted in any way. If you choose a standard text from :ref:`StampIcons` (the **exact** name piece after `"STAMP_"`), you should receive the original layout. An **arbitrary text** will not be changed to upper case, but be written in font "Times-Bold" as is, horizontally centered in **one line** and be shortened to fit. To get your text fully displayed, its length using :data:`fontsize` 20 must not exceed 190 pixels. So please make sure that the following inequality is true: `fitz.get_text_length(text, fontname="tibo", fontsize=20) <= 190`.
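
The inequality in the caution can be wrapped in a small check. This is a minimal sketch, assuming a hypothetical helper `stamp_text_fits` that receives the measuring function as an argument (in real use that would be `fitz.get_text_length`). The stand-in `rough_measure` is a made-up fixed-width estimate for illustration only, not a PyMuPDF function.

```python
from typing import Callable

MAX_STAMP_WIDTH = 190  # limit stated in the caution
STAMP_FONTSIZE = 20    # fontsize the limit refers to


def stamp_text_fits(text: str, measure: Callable[..., float]) -> bool:
    """Return True if `text` should be displayed in full on a stamp.

    `measure` is expected to follow the calling convention
    measure(text, fontname=..., fontsize=...) -> float.
    """
    width = measure(text, fontname="tibo", fontsize=STAMP_FONTSIZE)
    return width <= MAX_STAMP_WIDTH


def rough_measure(text: str, fontname: str = "tibo", fontsize: float = 11) -> float:
    # Hypothetical stand-in: pretends every glyph is half the fontsize wide.
    # Pass fitz.get_text_length instead for real measurements.
    return len(text) * fontsize * 0.5


print(stamp_text_fits("DRAFT", rough_measure))
print(stamp_text_fits("X" * 40, rough_measure))
```

Passing the measuring function in keeps the check usable without importing the library in tests.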

.. method:: set_rect(rect)

@@ -328,7 +328,7 @@ There is a parent-child relationship between an annotation and its page. If the

:arg float opacity: *(new in v1.16.14)* **valid for all annotation types:** change or set the annotation's transparency. Valid values are *0 <= opacity < 1*.
:arg str blend_mode: *(new in v1.16.14)* **valid for all annotation types:** change or set the annotation's blend mode. For valid values see :ref:`BlendModes`.
:arg float fontsize: change font size of the text. 'FreeText' annotations only.
:arg float fontsize: change :data:`fontsize` of the text. 'FreeText' annotations only.
:arg sequence,float text_color: change the text color. 'FreeText' annotations only.
:arg sequence,float border_color: change the border color. 'FreeText' annotations only.
:arg sequence,float fill_color: the fill color.
2 changes: 1 addition & 1 deletion docs/document.rst
@@ -188,7 +188,7 @@ For details on **embedded files** refer to Appendix 3.

:arg float height: may used together with *width* as an alternative to *rect* to specify layout information.

:arg float fontsize: the default fontsize for reflowable document types. This parameter is ignored if none of the parameters *rect* or *width* and *height* are specified. Will be used to calculate the page layout.
:arg float fontsize: the default :data:`fontsize` for reflowable document types. This parameter is ignored if none of the parameters *rect* or *width* and *height* are specified. Will be used to calculate the page layout.

:raises TypeError: if the *type* of any parameter does not conform.
:raises FileNotFoundError: if the file / path cannot be found. Re-implemented as subclass of `RuntimeError`.
6 changes: 3 additions & 3 deletions docs/font.rst
@@ -213,7 +213,7 @@ A Font object also contains useful general information, like the font bbox, the

.. method:: glyph_bbox(chr, language=None, script=0)

The glyph rectangle relative to fontsize 1.
The glyph rectangle relative to :data:`fontsize` 1.

:arg int chr: *ord()* of the character.

@@ -241,7 +241,7 @@ A Font object also contains useful general information, like the font bbox, the

:arg str text: a text string, UTF-8 encoded.

:arg float fontsize: the fontsize.
:arg float fontsize: the :data:`fontsize`.

:rtype: float

@@ -265,7 +265,7 @@ A Font object also contains useful general information, like the font bbox, the

:arg str text: a text string, UTF-8 encoded.

:arg float fontsize: the fontsize.
:arg float fontsize: the :data:`fontsize`.

:rtype: tuple

10 changes: 5 additions & 5 deletions docs/functions.rst
@@ -34,7 +34,7 @@ Yet others are handy, general-purpose utilities.
:meth:`EMPTY_RECT` return the (standard) empty / invalid rectangle
:meth:`get_pdf_now` return the current timestamp in PDF format
:meth:`get_pdf_str` return PDF-compatible string
:meth:`get_text_length` return string length for a given font & fontsize
:meth:`get_text_length` return string length for a given font & :data:`fontsize`
:meth:`glyph_name_to_unicode` return unicode from a glyph name
:meth:`image_profile` return a dictionary of basic image properties
:meth:`INFINITE_IRECT` return the (only existing) infinite rectangle
@@ -361,11 +361,11 @@ Yet others are handy, general-purpose utilities.

* New in version 1.14.7

Calculate the length of text on output with a given **builtin** font, fontsize and encoding.
Calculate the length of text on output with a given **builtin** font, :data:`fontsize` and encoding.

:arg str text: the text string.
:arg str fontname: the fontname. Must be one of either the :ref:`Base-14-Fonts` or the CJK fonts, identified by their "reserved" fontnames (see table in :meth:`Page.insert_font`).
:arg float fontsize: the fontsize.
:arg float fontsize: the :data:`fontsize`.
:arg int encoding: the encoding to use. Besides 0 = Latin, 1 = Greek and 2 = Cyrillic (Russian) are available. Relevant for Base-14 fonts "Helvetica", "Courier" and "Times" and their variants only. Make sure to use the same value as in the corresponding text insertion.
:rtype: float
:returns: the length in points the string will have (e.g. when used in :meth:`Page.insert_text`).
@@ -568,7 +568,7 @@ Yet others are handy, general-purpose utilities.
- 1: Stroked text -- equivalent to `1 Tr`, only the character borders are shown.
- 3: Ignored text -- equivalent to `3 Tr` (hidden text).

3. Line width in this context is important only for processing `span["type"] != 0`: it determines the thickness of the character's border line. This value may not be provided at all with the text data. In this case, a value of 5% of the fontsize (`span["size"] * 0,05`) is generated. Often, an "artificial" bold text in PDF is created by `2 Tr`. There is no equivalent span type for this case. Instead, respective text is represented by two consecutive spans -- which are identical in every aspect, except for their types, which are 0, resp 1. It is your responsibility to handle this type of situation - in :meth:`Page.get_text`, MuPDF is doing this for you.
3. Line width in this context is important only for processing `span["type"] != 0`: it determines the thickness of the character's border line. This value may not be provided at all with the text data. In this case, a value of 5% of the :data:`fontsize` (`span["size"] * 0,05`) is generated. Often, an "artificial" bold text in PDF is created by `2 Tr`. There is no equivalent span type for this case. Instead, respective text is represented by two consecutive spans -- which are identical in every aspect, except for their types, which are 0, resp 1. It is your responsibility to handle this type of situation - in :meth:`Page.get_text`, MuPDF is doing this for you.
4. For data compactness, the character's unicode is provided here. Use built-in function `chr()` for the character itself.
5. The alpha / opacity value of the span's text, `0 <= opacity <= 1`, 0 is invisible text, 1 (100%) is intransparent. Depending on `span["type"]`, interpret this value as *fill* opacity or, resp. *stroke* opacity.
6. *(Changed in v1.19.0)* This value is equal or close to `char["bbox"]` of "rawdict". In particular, the bbox **height** value is always computed as if **"small glyph heights"** had been requested.
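
The fallback described in note 3 can be sketched as follows, reading `0,05` as the decimal 0.05. The helper name `effective_line_width` is hypothetical, and the dictionary key `"linewidth"` is an assumption about how the span stores the value; only `span["size"]` is taken from the text.

```python
def effective_line_width(span: dict) -> float:
    """Line width of a span, falling back to 5% of its font size."""
    width = span.get("linewidth")  # assumed key name, not confirmed by the text
    if width is None:
        # No width came with the text data: use 5% of the fontsize.
        return span["size"] * 0.05
    return width


print(effective_line_width({"size": 12}))
print(effective_line_width({"size": 12, "linewidth": 1.5}))
```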
@@ -703,7 +703,7 @@ Yet others are handy, general-purpose utilities.
:arg int limit: limits the number of returned entries. The default of 256 is enforced for all fonts that only support 1-byte characters, so-called "simple fonts" (checked by this method). All :ref:`Base-14-Fonts` are simple fonts.

:rtype: list
:returns: a list of *limit* tuples. Each character *c* has an entry *(g, w)* in this list with an index of *ord(c)*. Entry *g* (integer) of the tuple is the glyph id of the character, and float *w* is its normalized width. The actual width for some fontsize can be calculated as *w * fontsize*. For simple fonts, the *g* entry can always be safely ignored. In all other cases *g* is the basis for graphically representing *c*.
:returns: a list of *limit* tuples. Each character *c* has an entry *(g, w)* in this list with an index of *ord(c)*. Entry *g* (integer) of the tuple is the glyph id of the character, and float *w* is its normalized width. The actual width for some :data:`fontsize` can be calculated as *w * fontsize*. For simple fonts, the *g* entry can always be safely ignored. In all other cases *g* is the basis for graphically representing *c*.
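
As an illustration of the *w * fontsize* rule, the sketch below sums normalized widths over a string. It assumes a list of *(g, w)* tuples shaped as described above; the table built here is made up (every glyph 0.5 wide) and the helper name `text_width` is hypothetical.

```python
def text_width(text: str, widths: list, fontsize: float) -> float:
    """Sum the normalized widths of all characters and scale by fontsize.

    `widths[ord(c)]` must be a (glyph_id, normalized_width) tuple.
    Characters with ord(c) >= len(widths) raise IndexError.
    """
    return sum(widths[ord(c)][1] for c in text) * fontsize


# Made-up table for a "simple font": 256 entries, each glyph 0.5 wide.
fake_widths = [(i, 0.5) for i in range(256)]

print(text_width("abc", fake_widths, 10))
```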

This function calculates the pixel width of a string called *text*::

5 changes: 5 additions & 0 deletions docs/glossary.rst
@@ -138,6 +138,11 @@ Glossary

Abbreviation for cross-reference number: this is an integer unique identification for objects in a PDF. There exists a cross-reference table (which may physically consist of several separate segments) in each PDF, which stores the relative position of each object for quick lookup. The cross-reference table is one entry longer than the number of existing object: item zero is reserved and must not be used in any way. Many PyMuPDF classes have an *xref* attribute (which is zero for non-PDFs), and one can find out the total number of objects in a PDF via :meth:`Document.xref_length` *- 1*.


.. data:: fontsize

When referring to font size this metric is measured in points where 1 inch = 72 points.
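
A short sketch of the conversion implied by this definition. The helper names are hypothetical and only the ratio 1 inch = 72 points comes from the text.

```python
POINTS_PER_INCH = 72  # 1 inch = 72 points


def points_to_inches(points: float) -> float:
    """Convert a length in points (e.g. a fontsize) to inches."""
    return points / POINTS_PER_INCH


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to points."""
    return inches * POINTS_PER_INCH


print(points_to_inches(36))   # a 36 point margin in inches
print(inches_to_points(1))    # one inch in points
```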

.. data:: resolution

Images and :ref:`Pixmap` objects may contain resolution information provided as "dots per inch", dpi, in each direction (horizontal and vertical). When MuPDF reads an image from a file or from a PDF object, it will parse this information and put it in :attr:`Pixmap.xres`, :attr:`Pixmap.yres`, respectively. If it finds no meaningful information in the input (like non-positive values or values exceeding 4800), it will use "sane" defaults instead. The usual default value is 96, but it may also be 72 in some cases (e.g. for JPX images).
6 changes: 3 additions & 3 deletions docs/module.rst
@@ -429,7 +429,7 @@ Extract text from arbitrary :ref:`supported documents<Supported_File_Types>` to

After each page of the output file, a formfeed character, `hex(12)` is written -- even if the input page has no text at all. This behavior can be controlled via options.

.. note:: For "layout" mode, **only horizontal, left-to-right, top-to bottom** text is supported, other text is ignored. In this mode, text is also ignored, if its fontsize is too small.
.. note:: For "layout" mode, **only horizontal, left-to-right, top-to bottom** text is supported, other text is ignored. In this mode, text is also ignored, if its :data:`fontsize` is too small.

"Simple" and "blocks" mode in contrast output **all text** for any text size or orientation.

@@ -459,7 +459,7 @@ Command::
-skip-empty suppress pages with no text (default False)
-output OUTPUT store text in this file (default inputfilename.txt)
-grid GRID merge lines if closer than this (default 2)
-fontsize FONTSIZE only include text with a larger fontsize (default 3)
-fontsize FONTSIZE only include text with a larger :data:`fontsize` (default 3)

.. note:: Command options may be abbreviated as long as no ambiguities are introduced. So the following do the same:

@@ -475,7 +475,7 @@ Command::
* **noformfeed:** (bool) instead of `hex(12)` (formfeed), write linebreaks `\n` at end of output pages.
* **skip-empty:** (bool) skip pages with no text.
* **grid:** lines with a vertical coordinate difference of no more than this value (in points) will be merged into the same output line. Only relevant for "layout" mode. **Use with care:** 3 or the default 2 should be adequate in most cases. If **too large**, lines that are *intended* to be different in the original may be merged and will result in garbled and / or incomplete output. If **too low**, artifact separate output lines may be generated for some spans in the input line, just because they are coded in a different font with slightly deviating properties.
* **fontsize:** include text with fontsize larger than this value only (default 3). Only relevant for "layout" option.
* **fontsize:** include text with :data:`fontsize` larger than this value only (default 3). Only relevant for "layout" option.


.. highlight:: python
4 changes: 2 additions & 2 deletions docs/page.rst
@@ -190,7 +190,7 @@ In a nutshell, this is what you can do with PyMuPDF:
:arg rect_like rect: the rectangle into which the text should be inserted. Text is automatically wrapped to a new line at box width. Lines not fitting into the box will be invisible.

:arg str text: the text. *(New in v1.17.0)* May contain any mixture of Latin, Greek, Cyrillic, Chinese, Japanese and Korean characters. The respective required font is automatically determined.
:arg float fontsize: the font size. Default is 12.
:arg float fontsize: the :data:`fontsize`. Default is 12.
:arg str fontname: the font name. Default is "Helv". Accepted alternatives are "Cour", "TiRo", "ZaDb" and "Symb". The name may be abbreviated to the first two characters, like "Co" for "Cour". Lower case is also accepted. *(Changed in v1.16.0)* Bold or italic variants of the fonts are **no longer accepted**. A user-contributed script provides a circumvention for this restriction -- see section *Using Buttons and JavaScript* in chapter :ref:`FAQ`. *(New in v1.17.0)* The actual font to use is now determined on a by-character level, and all required fonts (or sub-fonts) are automatically included. Therefore, you should rarely ever need to care about this parameter and let it default (except you insist on a serifed font for your non-CJK text parts).
:arg sequence,float text_color: *(new in v1.16.0)* the text color. Default is black.

@@ -279,7 +279,7 @@ In a nutshell, this is what you can do with PyMuPDF:
)
page.add_redact_annot(..., fontname="newname")

:arg float fontsize: *(New in v1.16.12)* the fontsize to use for the replacing text. If the text is too large to fit, several insertion attempts will be made, gradually reducing the fontsize to no less than 4. If then the text will still not fit, no text insertion will take place at all.
:arg float fontsize: *(New in v1.16.12)* the :data:`fontsize` to use for the replacing text. If the text is too large to fit, several insertion attempts will be made, gradually reducing the :data:`fontsize` to no less than 4. If then the text will still not fit, no text insertion will take place at all.

:arg int align: *(New in v1.16.12)* the horizontal alignment for the replacing text. See :meth:`insert_textbox` for available values. The vertical alignment is (approximately) centered if a PDF built-in font is used (CJK or :ref:`Base-14-Fonts`).

2 changes: 1 addition & 1 deletion docs/recipes-text.rst
@@ -322,7 +322,7 @@ Output some text lines on a page::

doc.save("text.pdf")

With this method, only the **number of lines** will be controlled to not go beyond page height. Surplus lines will not be written and the number of actual lines will be returned. The calculation uses a line height calculated from the fontsize and 36 points (0.5 inches) as bottom margin.
With this method, only the **number of lines** will be controlled to not go beyond page height. Surplus lines will not be written and the number of actual lines will be returned. The calculation uses a line height calculated from the :data:`fontsize` and 36 points (0.5 inches) as bottom margin.

Line **width is ignored**. The surplus part of a line will simply be invisible.

2 changes: 1 addition & 1 deletion docs/shape.rst
@@ -544,7 +544,7 @@ Common Parameters

**fontsize** (*float*)

Font size of text.
Font size of text, see: :data:`fontsize`.

----

4 changes: 2 additions & 2 deletions docs/textpage.rst
@@ -279,7 +279,7 @@ chars (only for :meth:`extractRAWDICT`) *list* of character dictionari

*(New in version 1.16.0):* *"color"* is the text color encoded in sRGB (int) format, e.g. 0xFF0000 for red. There are functions for converting this integer back to formats (r, g, b) (PDF with float values from 0 to 1) :meth:`sRGB_to_pdf`, or (R, G, B), :meth:`sRGB_to_rgb` (with integer values from 0 to 255).

*(New in v1.18.5):* *"ascender"* and *"descender"* are font properties, provided relative to fontsize 1. Note that descender is a negative value. The following picture shows the relationship to other values and properties.
*(New in v1.18.5):* *"ascender"* and *"descender"* are font properties, provided relative to :data:`fontsize` 1. Note that descender is a negative value. The following picture shows the relationship to other values and properties.

.. image:: images/img-asc-desc.*
:scale: 60
@@ -294,7 +294,7 @@ These numbers may be used to compute the minimum height of a character (or span)
>>> r.y0 = r.y1 - span["size"]
>>> # r now is a rectangle of height 'fontsize'

.. caution:: The above calculation may deliver a **larger** height! This may e.g. happen for OCRed documents, where the risk of all sorts of text artifacts is high. MuPDF tries to come up with a reasonable bbox height, independently from the fontsize found in the PDF. So please ensure that the height of `span["bbox"]` is **larger** than `span["size"]`.
.. caution:: The above calculation may deliver a **larger** height! This may e.g. happen for OCRed documents, where the risk of all sorts of text artifacts is high. MuPDF tries to come up with a reasonable bbox height, independently from the :data:`fontsize` found in the PDF. So please ensure that the height of `span["bbox"]` is **larger** than `span["size"]`.

.. note:: You may request PyMuPDF to do all of the above automatically by executing `fitz.TOOLS.set_small_glyph_heights(True)`. This sets a global parameter so that all subsequent text searches and text extractions are based on reduced glyph heights, where meaningful.
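
The last step of the calculation, together with the guard from the caution, can be sketched with plain tuples. This is a simplified illustration: the helper name `shrink_to_fontsize` is hypothetical, and it leaves the bottom edge `y1` untouched because the part of the calculation that adjusts it is not shown in this hunk.

```python
def shrink_to_fontsize(bbox: tuple, size: float) -> tuple:
    """Reduce the height of `bbox` (x0, y0, x1, y1) to `size`.

    The bbox is returned unchanged when it is not taller than `size`,
    which covers the situation described in the caution.
    """
    x0, y0, x1, y1 = bbox
    if y1 - y0 <= size:
        return (x0, y0, x1, y1)
    # Same step as `r.y0 = r.y1 - span["size"]`: keep the bottom, raise the top.
    return (x0, y1 - size, x1, y1)


span = {"bbox": (50.0, 100.0, 150.0, 113.5), "size": 10.0}  # made-up values
print(shrink_to_fontsize(span["bbox"], span["size"]))
```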
