Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Render high-res partial page views when falling back to CSS zoom #19128

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

nicolo-ribaudo
Copy link
Contributor

@nicolo-ribaudo nicolo-ribaudo commented Nov 29, 2024

Commit:

Render high-res partial page views when falling back to CSS zoom

When rendering big PDF pages at high zoom levels, we currently fall back to CSS zoom to avoid rendering canvases with too many pixels. This causes zoomed in PDF to look blurry, and the text to be potentially unreadable.

This commit adds support for rendering part of a page (called PDFPageDetailView in the code), so that we can render portion of a page in a smaller canvas without hiting the maximun canvas size limit.

Specifically, we render an area of that page that is slightly larger than the area that is visible on the screen (100% larger in each direction, unless we have to limit it due to the maximum canvas size). As the user scrolls around the page, we re-render a new area centered around what is currently visible.

To play around with this patch, you need to use a large PDF (the one I'm using is https://www.gtt.to.it/cms/risorse/urbana/mappa/mapparete.pdf) and zoom in: in the current release the text becomes blurry, while with this patch it remains sharp. Also, if you set maxCanvasPixels to a smaller value (for example, 8M) you'll more clearly see the effects also at lower zoom levels. Note however that if maxCanvasPixels is set to less than 9 time the number of pixels visible in your window you'll get more frequent re-renders as you scroll.

The PDFPageDetailView class still process every single operation of the PDF, even if many of them will actually be outside of the canvas and thus not be actually rendered. This is acceptable because we are only rendering one "detail view" per page at a time, and we are not splitting the page in multiple tiles that might appear at the same time on the screen. In the future however #19043 could improve the performance of this patch, and it would enable (for example) tiling a page into multiple detail views when printing it.

There are some very minor TODOs that I left in the code, for some areas I need review feedback for. Also, I still need to find the best way to write tests.

Fixes #14147

Before:
image

After:
image

https://www.gtt.to.it/cms/risorse/urbana/mappa/mapparete.pdf at 400% zoom

Before:
image

After:
image

web/pdf_page_view.js Fixed Show fixed Hide fixed
web/pdf_page_view.js Outdated Show resolved Hide resolved
@nicolo-ribaudo
Copy link
Contributor Author

After discussing this patch with @calixteman, I updated it to prioritize rendering the full css-zoomed canvas rather than the high-resolution one. This means that, when zooming in, you'll first see the css-zoomed canvas, and then it will be replaced by the high-res one once it's ready. It's a slightly worse experience, however it guarantees that we do not regress in cases where the user starts moving around before that we are done rendering the background canvas (because if only the high-res canvas is there, they'll see just white until when the new high-res canvas is rendered).

I also changed the logic that decides when to re-render the high-res canvas to not only do it once the user scrolls past its edges, but when the user is close to scrolling past the edges.

web/pdf_page_view.js Outdated Show resolved Hide resolved
// enable the feature?
const ENABLE_ZOOM_DETAIL = true;

class Recorder {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention to remove this class once the PR is ready to land, since the patch is already large and complex enough without this code?

Not to mention that there's a bunch logging that I don't believe is desirable in e.g. production-builds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I plan to remove the second commit :) It's currently here because it has been very helpful when playing around with the PR, but it's definitely not something for production.

web/pdf_page_view.js Show resolved Hide resolved
web/app.js Outdated Show resolved Hide resolved
@nicolo-ribaudo nicolo-ribaudo force-pushed the draw-page-portion branch 2 times, most recently from 69a98fc to 3408061 Compare December 11, 2024 13:30
@nicolo-ribaudo
Copy link
Contributor Author

Unfortunately this approach has a problem right now: since the various SVG paths for drawings are inserted inside the .canvasWrapper, they are now behind the detail layer and thus not visible. Marking as draft until this is fixed.

@nicolo-ribaudo nicolo-ribaudo marked this pull request as draft December 11, 2024 14:20
Comment on lines +220 to +224
this.#enableHWA =
#enableHWA in options ? options.#enableHWA : options.enableHWA || false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simplified e.g. as follows?

Suggested change
this.#enableHWA =
#enableHWA in options ? options.#enableHWA : options.enableHWA || false;
this.#enableHWA = options.#enableHWA ?? (options.enableHWA || false);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately no: if #enableHWA is not there (in the super() call from the PDFPageView constructor), options.#enableHWA will throw.

web/pdf_page_view.js Outdated Show resolved Hide resolved
@nicolo-ribaudo
Copy link
Contributor Author

Fixing the drawing problem required changing the new canvas to be in the existing .canvasWrapper, rather than in a separate layer. This actually ended up simplifying the code a bit, and the canvasWrapper management logic is now fully contained in PDFPageView.

@nicolo-ribaudo nicolo-ribaudo marked this pull request as ready for review December 11, 2024 16:18
Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving a few more comments, mostly for the first patch since I've only begun looking at the Render high-res partial page views when falling back to CSS zoom commit (and that one requires careful checking).

Also, is this new functionality anything that we could "easily" test with e.g. integration-tests?

web/pdf_page_view.js Outdated Show resolved Hide resolved
web/pdf_page_view.js Outdated Show resolved Hide resolved
web/pdf_page_view.js Outdated Show resolved Hide resolved
web/pdf_page_view.js Outdated Show resolved Hide resolved
web/pdf_page_view.js Outdated Show resolved Hide resolved
web/pdf_viewer.js Outdated Show resolved Hide resolved
web/pdf_page_view.js Outdated Show resolved Hide resolved
web/pdf_rendering_queue.js Outdated Show resolved Hide resolved
web/pdf_viewer.js Outdated Show resolved Hide resolved
web/pdf_viewer.js Outdated Show resolved Hide resolved
This base class contains the generic logic for:
- Creating a canvas and showing when appropriate
- Rendering in the canvas
- Keeping track of the rendering state
Comment on lines 124 to 136
for (let i = 0; i < numVisible; i++) {
const view = visibleViews[i].view;
if (!this.isViewFinished(view)) {
return view;
}
}

for (let i = 0; i < numVisible; i++) {
const { detailView } = visibleViews[i].view;
if (detailView && !this.isViewFinished(detailView)) {
return detailView;
}
}

Copy link
Collaborator

@Snuffleupagus Snuffleupagus Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another observation/question that I have here is about the order of rendering.

Would it make sense to attempt to render both a page and its detailView before moving onto the other visible pages, such that the most visible page will always render "fully" without having to wait for (potentially) a lot of other pages to finish rendering first?

Basically something like the following, where we combine the loops.
However, please note that I'm not saying that we must do this but I rather just want us to have considered/discussed it!

Suggested change
for (let i = 0; i < numVisible; i++) {
const view = visibleViews[i].view;
if (!this.isViewFinished(view)) {
return view;
}
}
for (let i = 0; i < numVisible; i++) {
const { detailView } = visibleViews[i].view;
if (detailView && !this.isViewFinished(detailView)) {
return detailView;
}
}
for (let i = 0; i < numVisible; i++) {
const view = visibleViews[i].view;
if (!this.isViewFinished(view)) {
return view;
}
const { detailView } = view;
if (detailView && !this.isViewFinished(detailView)) {
return detailView;
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer is maybe.

The main use case I am thinking about for this patch is maps, because they are large documents where you usually want to zoom in a lot to the particular point that you are interested in.

The assumption is that the normal usage flow is:

  • you zoom in a lot to roughly the point you are interested in
  • you scroll a bit to adjust what's visible, so that you can actually see what you want
  • you might scroll more in any direction, for example to follow a street or to see what is close to the point you were looking at

It's thus important that when you scroll around the page there is something visible, and so this patch prioritizes rendering the "background view" over the "detail view" (a second reason is that it makes it easier to integrate #19043 because it means that when we try to render the detail view we already have the bounding boxes, but there are also other ways to do it). This however matters within a page: it's probably unlikely that you zoom in to the wrong page, and then you need to scroll to another one. So your suggestion would be reasonable and not in conflict with the use case.

However, there is a problem: the visible pages are rendered first-to-last, meaning that if you have 10px of page 1 visible and 500px of page 2, it will first render page 1 and then page 2. Rendering page 1, then the detail of page 1, and then page 2 increase the total time for seeing something meaningful on screen. So we would first need to change getHighestPriority to prioritize pages by how much they are visible, and then we can do something like that you are suggesting.

wait for (potentially) a lot of other pages

Note that it's very likely that there are at most two pages on screen when the detail view is relevant: the pages must be so large that are above maxCanvasPixels, which makes it difficult to fit more than part of a page in the top half of the screen and part of it in the bottom half.

The only realistic case I can think of is when there are two pages side-by-side, so that we have four pages in total (one per "corner").

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we would first need to change getHighestPriority to prioritize pages by how much they are visible, and then we can do something like that you are suggesting.

Isn't that already the case though, given the sortByVisibility parameter?

pdf.js/web/ui_utils.js

Lines 423 to 425 in 2df5d8f

* @property {boolean} sortByVisibility - If `true`, the returned elements are
* sorted in descending order of the percent of their padding box that is
* visible. The default value is `false`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you are right, I missed it while reading through the code.

Then it's just a question of what we want to prioritize: render the largest chunk well first or render everything not-so-well first.

I have a slight preference for the second, because some PDFs can take very long (30-60 seconds per page) and so the quicker we show something the better it is, but ultimately I don't feel strongly about it.

Copy link
Collaborator

@Snuffleupagus Snuffleupagus Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, my intention here is only that we make an informed decision (whatever it may be) by considering the various cases at hand.

I have a slight preference for the second, because some PDFs can take very long (30-60 seconds per page) and so the quicker we show something the better it is, but ultimately I don't feel strongly about it.

There's one additional complication if there are multiple pages visible and they contain very large image resources.

We have API caching of operatorLists and image (and other) resources, however for complex pages those caches will be cleared (with a delay) after rendering completes. So by delaying rendering of the detailView until other pages have finished there's a risk that the caches have been cleared, and that we'll need to re-parse the page on the worker-thread for the detailView rendering.

See e.g. the following API code (and also the _maybeCleanupAfterRender property):

pdf.js/src/display/api.js

Lines 1780 to 1787 in 2df5d8f

/**
* Attempts to clean up if rendering is in a state where that's possible.
* @param {boolean} [delayed] - Delay the cleanup, to e.g. improve zooming
* performance in documents with large images.
* The default value is `false`.
* @returns {boolean} Indicates if clean-up was successfully run.
*/
#tryCleanup(delayed = false) {

When rendering big PDF pages at high zoom levels, we currently fall back
to CSS zoom to avoid rendering canvases with too many pixels. This
causes zoomed in PDF to look blurry, and the text to be potentially
unreadable.

This commit adds support for rendering _part_ of a page (called
`PDFPageDetailView` in the code), so that we can render portion of a
page in a smaller canvas without hiting the maximun canvas size limit.

Specifically, we render an area of that page that is slightly larger
than the area that is visible on the screen (100% larger in each
direction, unless we have to limit it due to the maximum canvas size).
As the user scrolls around the page, we re-render a new area centered
around what is currently visible.
@nicolo-ribaudo
Copy link
Contributor Author

I'm working on adding some meaningful tests.

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we reset the "main" page and cancel its rendering here:

pdf.js/web/pdf_page_view.js

Lines 674 to 686 in a1c884c

reset({
keepAnnotationLayer = false,
keepAnnotationEditorLayer = false,
keepXfaLayer = false,
keepTextLayer = false,
keepCanvasWrapper = false,
} = {}) {
this.cancelRendering({
keepAnnotationLayer,
keepAnnotationEditorLayer,
keepXfaLayer,
keepTextLayer,
});

don't we also need to reset and cancel a pending detailView rendering as well?

this.pageView = pageView;
this.renderingId = "detail" + this.id;

this.detailArea = null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Unless something "outside" of the class depends on this, can it be a private field instead?

// If there is already the lower resolution canvas behind,
// we don't show the new one until when it's fully ready.
this.pageView.renderingState === RenderingStates.FINISHED ||
initialRenderingState === RenderingStates.FINISHED
Copy link
Collaborator

@Snuffleupagus Snuffleupagus Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but can we actually reach this spot if the initial RenderingState is FINISHED?

Comment on lines +1484 to +1490
this.eventBus.dispatch("pagerendered", {
source: this,
pageNumber: this.id,
cssTransform: false,
timestamp: performance.now(),
error: this._renderError,
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should introduce helper-methods in the base-class for dispatching this event, to reduce duplication?

Comment on lines +1496 to +1499
this.eventBus.dispatch("pagerender", {
source: this,
pageNumber: this.id,
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should introduce helper-methods in the base-class for dispatching this event, to reduce duplication?

@Snuffleupagus
Copy link
Collaborator

Btw, does this PR also fix issue #14193?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PDF is blurry
3 participants