Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues searching through channel with large amount of entries #376

Open
wblommaert opened this issue Jul 24, 2023 · 3 comments
Labels

Comments

@wblommaert
Copy link

wblommaert commented Jul 24, 2023

Hi, we're avid users (and a fan) of your plugin in any project that involves locations. In a more recent project, we applied the Map field to our companies section, which consists of about 151.000 entries. Our front-end users are able to search through these entries using a search term and a location (powered by Google Maps auto-completion). In our templates this combined search would look something like this:

{% set companiesQuery = craft.entries().section('companies').with(['logo', 'sectors']).search('title:' ~ querySearch ~ ' OR ' ~ 'sectors:' ~ querySearch).address({
  location: queryRegion,
  radius: 20,
  unit: 'km',
}).orderBy('hasPaidPlan desc, score desc' ~ (queryRegion ? ', distance asc' : '')).limit(20) %}

This query object is then passed on to a paginator using Craft CMS's native {% paginate %} tag. As soon as the queryRegion parameter is passed, the search process slows down completely, sometimes hitting 504 Gateway Timeout errors after 30 seconds (our max runtime on the load balancer). If we leave off the queryRegion part and only search by term, the search is fast and performant.

I used the Craft CMS debug toolbar to analyse the DB queries that are being run in both situations and noticed that when we're searching by location, a JOIN occurs to add the location information to the main query. Upon inspecting this further using the MySQL EXPLAIN command, I noticed that as soon as the JOIN occurs, it triggers an entry query that checks 147.000 ish rows. If I leave the search by location off, this does not occur and there is no query slowdown in the profiler. The affected query is shown below. Total query time is between 24 and 30 seconds.

Screenshot 2023-07-24 at 15 44 04

Because searching by location is an essential part of the website, we'd like to keep using existing logic instead of building a custom search index around it. What I noticed is that, if I pas the same IN condition that is present in the main query to the subquery that is added via SimpleMap, the entire query becomes really fast, as shown below. Total query time here is now between 120 and 400 milliseconds.

Screenshot 2023-07-24 at 15 56 17

This difference is huge since the query no longer has to scan the entire section, even more so considering the amount of returned rows is the exact same.

I had a look in the SimpleMap source code to see where I could potentially optimize this JOIN to reproduce the performance I gained in the raw SQL query, and found the modifyElementsQuery function in the MapService class that is used by the MapField class whenever a search occurs with a location parameter. However, I could not obtain the information from the main query (the IN part) using the query object that is passed to this function. Unfortunately it seems this is more of a limitation imposed by Craft CMS, rather than something SimpleMap could work around. I'd be more than happy to further investigate this and to answer any questions you might have.

Thanks for looking into this!

@alexjcollins
Copy link
Member

@wblommaert I really appreciate the effort and detail here.
We're going to discuss and do some investigating ourselves this week. I'll come back to you shortly if we have any questions or suggestions for you to try with your large dataset. Thanks!

@Tam
Copy link
Member

Tam commented Aug 18, 2023

@wblommaert Those ID's appear to be inserted by the search index, which is done after Maps has had a chance to modify the query. It also looks like the element query passed to Maps isn't in a state where we could do a separate search lookup so I think this is something Craft would have to change.

@wblommaert
Copy link
Author

@wblommaert Those ID's appear to be inserted by the search index, which is done after Maps has had a chance to modify the query. It also looks like the element query passed to Maps isn't in a state where we could do a separate search lookup so I think this is something Craft would have to change.

I was afraid this would be the case after my initial look at it, but thanks for digging into it and confirming. In our specific case we have suggested to our client that we should move the search to an external tool with geospatial support such as ElasticSearch or Algolia, should they wish better performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants