[Request] Improve plain text performance (getting bold and italics) #132

Closed
KennyChenBasis opened this issue Apr 9, 2024 · 3 comments

Comments

@KennyChenBasis
Contributor

When taking the plain text of the Belgium page, more than half of the time is spent in parsed.get_bolds_and_italics, with most of that time spent in BOLD_ITALIC_FINDITER. It seems like an obvious bottleneck, so it would be great to speed it up - either with a better regex, or a non-regex solution (maybe port the PHP code?).
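A measurement like the one described above can be reproduced with the standard-library profiler. The following is only a minimal sketch: `profile_call` is a hypothetical helper name, not part of the library, and it works on any callable.

```python
import cProfile
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, rows).

    rows is a list of (cumulative_seconds, function_name) tuples,
    sorted so that the most expensive functions come first.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    rows = [
        (cumulative, name)
        for (_file, _line, name), (_cc, _nc, _tt, cumulative, _callers)
        in stats.stats.items()
    ]
    rows.sort(reverse=True)
    return result, rows[:top]
```

Assuming `parsed` is the parsed page object, calling `profile_call(parsed.plain_text)` should list `get_bolds_and_italics` near the top if it dominates the run time as reported here.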

@5j9
Copy link
Owner

5j9 commented Apr 9, 2024

IIRC, parsing bolds and italics has some odd edge cases that make the processing slow. I'm not sure if I'll be able to improve it much. For now, if you don't mind bold and italic marks not being removed from the result, you can try adding the replace_bolds_and_italics=False parameter to your plain_text calls.

(There's a trade-off: the situation can certainly be improved for plain_text by moving the main processing steps of bold and italics to the initial parsing stage, but that would slow down all other functions that don't rely on bold/italic formatting.)
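To check how much the suggested workaround saves on a given page, the two settings can be timed side by side. This is a rough sketch: `compare_plain_text` is a hypothetical helper, and it only assumes that `parsed` has a `plain_text` method accepting the `replace_bolds_and_italics` keyword mentioned above.

```python
import timeit


def compare_plain_text(parsed, repeat=3):
    """Time parsed.plain_text() with and without bold/italic replacement.

    Returns (seconds_marks_removed, seconds_marks_kept), each the best
    of `repeat` single runs.
    """
    # Default behaviour: bold and italic marks are stripped from the result.
    marks_removed = min(
        timeit.repeat(lambda: parsed.plain_text(), number=1, repeat=repeat)
    )
    # Workaround from this thread: skip bold/italic processing entirely.
    marks_kept = min(
        timeit.repeat(
            lambda: parsed.plain_text(replace_bolds_and_italics=False),
            number=1,
            repeat=repeat,
        )
    )
    return marks_removed, marks_kept
```

With the workaround, the `''` and `'''` marks stay in the output, so it only suits callers that can tolerate or strip them afterwards.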

5j9 added a commit that referenced this issue Apr 11, 2024
This branch was intended to improve bolds/italics performance (#132),
but my test results do not show meaningful enough performance improvements
to convince me to merge it into the main branch.
@5j9
Owner

5j9 commented Apr 12, 2024

Closing as I could not think of other clever ways to improve the situation. I'm of course open to suggestions or PRs. #133 helped a lot and is released as v0.55.12.

@5j9 5j9 closed this as completed Apr 12, 2024
@KennyChenBasis
Contributor Author

Thanks for taking the time to look into it!
