[Request] Improve plain text performance (getting bold and italics) #132

KennyChenBasis · 2024-04-09T15:06:06Z

When taking the plain text of the Belgium page, more than half of the time is spent in parsed.get_bolds_and_italics, with most of that time spent in BOLD_ITALIC_FINDITER. It seems like an obvious bottleneck, so it would be great to speed it up - either with a better regex, or a non-regex solution (maybe port the PHP code?).
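For anyone who wants to reproduce this kind of measurement, here is a minimal profiling sketch that uses only the standard library. `profile_call` and `slow_step` are hypothetical names made up for illustration and are not part of the library; in practice the profiled call would be the plain-text extraction on the page's wikitext.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time,
    which is how a hot spot such as a slow regex call would show up.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_step(n):
    # Hypothetical stand-in workload; replace with the real call to measure.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_step, 10_000)
print(report)
```

Sorting by cumulative time makes it easy to see which callee dominates a high-level call, which is the kind of breakdown described above.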

5j9 · 2024-04-09T15:39:22Z

IIRC, parsing bolds and italics has some odd edge cases that make the processing slow. I'm not sure if I'll be able to improve it much. For now, if you don't mind bold and italic marks not being removed from the result, you can try adding the replace_bolds_and_italics=False parameter to your plain_text calls.

(There's a trade-off: the situation can certainly be improved for plain_text by moving the main processing steps of bold and italics to the initial parsing stage, but that would slow down all other functions that don't rely on bold/italic formatting.)
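A minimal sketch of the workaround mentioned above, assuming the library under discussion is importable as `wikitextparser` (the thread itself does not name the package, so that import is an assumption). `to_plain_text` is a hypothetical helper; only the `replace_bolds_and_italics` parameter of `plain_text` comes from the comment above.

```python
try:
    # Assumption: the library discussed in this thread is wikitextparser.
    import wikitextparser as wtp
except ImportError:
    wtp = None


def to_plain_text(wikitext: str, strip_marks: bool = True) -> str:
    """Return the plain text of `wikitext`.

    With strip_marks=False the bold/italic quote marks ('' and ''')
    are left in the output, skipping the expensive bold/italic pass.
    """
    if wtp is None:
        raise RuntimeError("wikitextparser is not installed")
    parsed = wtp.parse(wikitext)
    return parsed.plain_text(replace_bolds_and_italics=strip_marks)


if wtp is not None:
    sample = "'''Belgium''' is a country in ''Europe''."
    # Faster path: the quote marks stay in the result.
    print(to_plain_text(sample, strip_marks=False))
```

If the leftover quote marks are unwanted, they would have to be stripped in a separate post-processing step, which may or may not handle the edge cases the library's own bold/italic parsing covers.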

This branch was intended to improve bolds/italics performance (#132), but my test results do not show a meaningful enough improvement to convince me to merge it into the main branch.

5j9 · 2024-04-12T12:25:22Z

Closing as I could not think of other clever ways to improve the situation. I'm of course open to suggestions or PRs. #133 helped a lot and is released as v0.55.12.

KennyChenBasis · 2024-04-12T15:02:58Z

Thanks for taking the time to look into it!

KennyChenBasis mentioned this issue Apr 10, 2024

Improve getting bolds and italics performance #133

Merged

5j9 closed this as completed Apr 12, 2024
