Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dtml-in batching accesses the complete list #75

Closed
d-maurer opened this issue Sep 27, 2024 · 6 comments · Fixed by #76
Closed

dtml-in batching accesses the complete list #75

d-maurer opened this issue Sep 27, 2024 · 6 comments · Fixed by #76

Comments

@d-maurer
Copy link
Contributor

DocumentTemplate 4.0

"zopefoundation/Products.ZCatalog#155" reports that Products.ZCatalog's manage_catalogView takes excessive times for large catalogs.
"zopefoundation/Products.ZCatalog#155 (comment)" contains corresponding profile results. They suggest that dtml-in accesses each element in the list even when called with batching parameters.

@d-maurer
Copy link
Contributor Author

The behavior is caused by the list conversion in "

sequence = list(sequence)
".
It is there to support iterators in addition to true sequences but of course it destroys lazy access to a sequence (highly wished for batching).

I do not yet know how to distinguish safely between a true sequence (not requiring a list conversion) and something requiring a list conversion (e.g. a general iterator).

@d-maurer
Copy link
Contributor Author

@sauzher found out that the list conversion has been introduced by @hannosch in "5716eb8". Unfortunately, the prevents lazy batching.

@sauzher
Copy link

sauzher commented Sep 27, 2024

For what I see, there are a lot of places where sequence gets unpacked: to-list conversions and key-lookup access:

List conversions:

Key lookups:

Key lookups I think are unnecessary because all the try-execeps blocks could be rewritten as simple if-then-else checks on start/end values.

List conversions perhaps could be refactored to convert in list only the firsts "batch_size * Page number" elements.

my 2 cent

@d-maurer
Copy link
Contributor Author

I found out that dtml-in (as it is now) does not support arbitrary iterators:

uses not sequence and this will be False for general iterators even if the iterator does not give any values. The list conversion comes too late to remedy this.

@d-maurer
Copy link
Contributor Author

The original implementation tried hard to avoid the computation of the sequence length (by e.g. using try: sequence[end] except: to check whether there is an element at end) because it wanted support for lazy batching and for some lazy sequences length computation is expensive. Of course, converting the sequence into a list destroys this aim (at it computes the length as a side effect).

I propose to retain the support for lazy batching as follows:
We check whether the sequence has __len__ and __getitem__ but lacks keys; this first indicates that it may be designed as a Sequence, the second that it likely is not designed as a Mapping. We use the sequence as is if the check succeeds; otherwise we replace it by its list conversion.
@hannosch What do you think?

@d-maurer
Copy link
Contributor Author

d-maurer commented Sep 27, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants