Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Printer Status: Performance improvements #1124

Merged
merged 15 commits into from
Oct 12, 2023
Merged

Printer Status: Performance improvements #1124

merged 15 commits into from
Oct 12, 2023

Conversation

tresf
Copy link
Contributor

@tresf tresf commented May 1, 2023

Introduce PrintService caching to workaround https://bugs.openjdk.org/browse/JDK-7001133

Upstream: JDK-7001133
BellSoft: QZ-14 (private)

Related: #1181
Closes #1116
Closes #1189

@Vzor-
Copy link
Contributor

Vzor- commented May 1, 2023

A lot of the slowdown with #1116 was due to printService.getName() which was reaching out to native and was called dozens of time for each status returned. I added a cached value that has a TTL of 10 seconds. I want to take a look at the various places getname is used to make sure this wont lead to any issues.

@Vzor-
Copy link
Contributor

Vzor- commented May 1, 2023

Quick test on my system results in high CPU utilization for 4 seconds after 10 prints to a usb label printer. Prior it would take 1:30s with the same test. I think the actual print time improved too, though I don't have an easy way to measure that.

@Vzor-
Copy link
Contributor

Vzor- commented Jun 7, 2023

Both getName() and lookupDefaultPrintService() are pretty expensive under certain circumstances. Statuses usually come in bundles of 5-10 events, depending on the type of printer. Each event calls PrintServiceMatcher.matchPrinter(), which calls lookupDefaultPrintService once and getName n times, where n is the number of printers on the system. This means getName is called more often than lookupDefaultPrintService.

In a stand alone test, I made 100 calls to each and recorded the execution time.

Mac Linux Windows
printService 6000ms 10000ms 75ms
getName 255ms 1ms 0ms

I cached the values for both and tested the cached version on arm Window, Mac and Linux and tested how fast Tray could match 10 arbitrary printer names. On Mac, both helped. On linux, only defaultPrintService caching helped, and on windows none of this seemed to matter.

Mac Linux Windows
No caching 1350ms 1500ms 25ms
Name caching 950ms 1500ms 25ms
Service caching 800ms 560ms 25ms
Both caching 410ms 550ms 30ms

@Vzor-
Copy link
Contributor

Vzor- commented Jun 8, 2023

FYI this last commit is 100% untested. I just needed to commit it.

src/qz/common/CachedObject.java Outdated Show resolved Hide resolved
src/qz/common/CachedObject.java Outdated Show resolved Hide resolved
src/qz/common/CachedObject.java Outdated Show resolved Hide resolved
src/qz/common/CachedObject.java Outdated Show resolved Hide resolved
@tresf
Copy link
Contributor Author

tresf commented Jun 9, 2023

I really like the introduction of the cached object handling. I have a few cosmetic requests but I think this is ready for testing.

@Vzor-
Copy link
Contributor

Vzor- commented Jun 30, 2023

I have been load testing this with a handful of virtual printer. Under high status traffic, sanitizePrinterName slows down on mac. I am investigating and will have a commit for it soon.

@Vzor-
Copy link
Contributor

Vzor- commented Jun 30, 2023

After further examination I found the slowdown was occurring when we called PrintServiceLookup.lookupPrintServices(null, null).

In a stand-alone test, I was only able to recreate the issue after setting "sun.java2d.print.polling" to false as we do in TrayManager, and conversely, removing this line fixes the slowdown in our code.

I dug into the JRE and found that internally, java is adding PrintServices to an arraylist and calling .indexOf on that list. This calls equals() many times, which itself calls getName, the original slowdown associated with this PR. It seems that not only is this function expensive, but it becomes slower the more it is called. I have demonstrated the following stand-alone test.

https://gist.github.com/Vzor-/0a4d5f2883dcfc4539426b6948c029ca

@tresf
Copy link
Contributor Author

tresf commented Sep 11, 2023

After further examination I found the slowdown was occurring when we called PrintServiceLookup.lookupPrintServices(null, null).

In a stand-alone test, I was only able to recreate the issue after setting "sun.java2d.print.polling" to false as we do in TrayManager, and conversely, removing this line fixes the slowdown in our code.

I dug into the JRE and found that internally, java is adding PrintServices to an arraylist and calling .indexOf on that list. This calls equals() many times, which itself calls getName, the original slowdown associated with this PR. It seems that not only is this function expensive, but it becomes slower the more it is called. I have demonstrated the following stand-alone test.

https://gist.github.com/Vzor-/0a4d5f2883dcfc4539426b6948c029ca

A friendly reminder, this MUST be filed upstream (first with our JDK support provider, and then they'll report it to OpenJDK).

In regards to the caching improvements, the benchmarks above are helpful, but hard to understand. I think they should be better quantified in terms of API calls. For example, startListening with 100 events BEFORE/AFTER. Then again with 1,000 events, then 10,000 events. This way we can decide if complicating the codebase for this upstream bug is warranted or not.

@Vzor-
Copy link
Contributor

Vzor- commented Sep 12, 2023

Screenshot 2023-09-12 at 2 31 16 AM

WIP, I am running a test with statuses now, and I think I will run 1 more where there is a 5 second delay between prints, so each one starts with a expired cache.

@Vzor-

This comment was marked as outdated.

@Vzor-
Copy link
Contributor

Vzor- commented Sep 25, 2023

The printer list is falling back on [lpstat, -l, -p] until CachedObject<PrintService> cachedDefault goes stale and refreshes. ill look into that now

@Vzor-
Copy link
Contributor

Vzor- commented Sep 27, 2023

So the results with the newest commit are very good, much better than expected.
Screenshot 2023-09-27 at 2 01 19 AM
As we can see the "Wrapped Cache" is so fast, we can't really see it. Here it is with the uncached line removed.
Screenshot 2023-09-27 at 2 03 00 AM
The entire time of execution is 15 seconds, as opposed to 247 seconds with cached. Since our lifespan is only 5 seconds, that means we only updated the cache 3 times, which is why we see virtually no slowdown. I will run a test overnight that delays each print by 5 seconds to maximize the cache updates.

tresf added a commit that referenced this pull request Oct 4, 2023
Fix thread deadlock
Related #1116, #1124

Co-authored-by: Tres Finocchiaro <[email protected]>
@tresf tresf added the upstream label Oct 4, 2023
Caching for CUPS status performance

---------

Co-authored-by: Vzor- <[email protected]>
@tresf
Copy link
Contributor Author

tresf commented Oct 10, 2023

This code should be good to be merged. Just some final testing.

@Vzor-
Copy link
Contributor

Vzor- commented Oct 10, 2023

Tests:
If a service is stale, and you attempt to print to it, we need to verify that this failure is graceful.

  • Windows (shouldn't be affected)
  • Mac
  • Linux

Simple congestion test, lots of prints, lots of statuses

  • Windows (shouldn't be affected)
  • Mac
  • Linux*

[*] Job statuses aren't working, but it seems to be an issue on master as well. We will consider this a pass and open a separate issue.

@Vzor-
Copy link
Contributor

Vzor- commented Oct 11, 2023

Do not merge yet. I think i found an issue on linux but I will have to confirm it tomorrow.

Resolved

@tresf
Copy link
Contributor Author

tresf commented Oct 11, 2023

FYI, if a job is sent to Linux after a printer has been removed, but before the cache has refreshed, this may throw the following error: (this is a bit misleading, but I consider it a rather graceful catch).

[ERROR] 2023-10-11T14:57:08,332 @ qz.utils.PrintingUtilities:214
    Failed to print
sun.print.PrintJobFlavorException: invalid flavor
    at sun.print.UnixPrintJob.print(UnixPrintJob.java:347) ~[?:?]
    at qz.printer.action.PrintRaw.waitForPrint(PrintRaw.java:470) ~[production/:?]
    at qz.printer.action.PrintRaw.printToPrinter(PrintRaw.java:427) ~[production/:?]
    at qz.printer.action.PrintRaw.print(PrintRaw.java:354) ~[production/:?]
    at qz.utils.PrintingUtilities.processPrintRequest(PrintingUtilities.java:204) ~[production/:?]
    at qz.ws.PrintSocketClient.processMessage(PrintSocketClient.java:311) ~[production/:?]
    at qz.ws.PrintSocketClient.lambda$onMessage$0(PrintSocketClient.java:169) ~[production/:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
[ERROR] 2023-10-11T15:08:41,401 @ qz.utils.PrintingUtilities:214
    Failed to print
java.awt.print.PrinterException: javax.print.PrintException: java.awt.print.PrinterIOException
    at sun.print.RasterPrinterJob.spoolToService(RasterPrinterJob.java:1476) ~[?:?]
    at sun.print.RasterPrinterJob.print(RasterPrinterJob.java:1562) ~[?:?]
    at qz.printer.action.PrintPixel.printCopies(PrintPixel.java:132) ~[production/:?]
    at qz.printer.action.PrintImage.print(PrintImage.java:168) ~[production/:?]
    at qz.utils.PrintingUtilities.processPrintRequest(PrintingUtilities.java:204) ~[production/:?]
    at qz.ws.PrintSocketClient.processMessage(PrintSocketClient.java:311) ~[production/:?]
    at qz.ws.PrintSocketClient.lambda$onMessage$0(PrintSocketClient.java:169) ~[production/:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]

@Vzor-
Copy link
Contributor

Vzor- commented Oct 11, 2023

All of my tests passed. Looks ready to merge from my end.

@tresf tresf merged commit 186ac38 into qzind:master Oct 12, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

printers.putAll(...) clears NativePrinterMap MacOS: Unusually high CPU when listening for printer events
2 participants