From 472db60a71c11d50b817bd27ee781191c7b3a72d Mon Sep 17 00:00:00 2001
From: Tomas D'Stefano
Date: Tue, 17 Dec 2024 16:10:12 +0000
Subject: [PATCH] Remove includes course sites on provider courses page

Including course sites here makes Rails generate a default query whose
main nested loop processes far more rows than expected. The query takes
more than 500ms on average and sometimes more than 2 seconds.

Taking a look at the query:

```
EXPLAIN ANALYZE
SELECT "course_site".*
FROM "course_site"
INNER JOIN "site" ON "site"."id" = "course_site"."site_id"
INNER JOIN "course_site" AS "site_statuses" ON "site_statuses"."site_id" = "site"."id"
WHERE "site_statuses"."status" IN ('N', 'R')
  AND "course_site"."course_id" IN ( );
```

If you run the query above you will find some interesting info.

To summarise my interpretation (this still needs deeper investigation):
the Rails default includes generates the query above, and that query
does not use the composite index on course_site.course_id and
course_site.site_id.

Details of the explain output:

1. The Nested Loop join iterates over the results of the first part of
   the join (the outer loop), which comes from scanning course_site. For
   each row in that result, the system performs an inner scan (the inner
   loop) to match records from site_statuses and PK_ucas_campus. This
   can produce a high number of rows if the joined tables are large or
   if many rows qualify.

2. Row count (rows=5397 / rows=483361) too high! The actual row count
   suggests that the query planner underestimated the number of rows
   returned from the join. This can happen when there are many different
   combinations of records between the joined tables. Here
   site_statuses is course_site joined again on site_id alone, so each
   course_site row can match multiple site_statuses rows, and the
   number of output rows is multiplied by the number of matches.

3. Repeated access to site (5775 loops): the inner scans over site and
   site_statuses are executed 5775 times (loops), once per row returned
   by the first index scan (course_site), which increases the overall
   row count.
---
 app/controllers/publish/courses_controller.rb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/app/controllers/publish/courses_controller.rb b/app/controllers/publish/courses_controller.rb
index 6dff10b42c..6f23eb1565 100644
--- a/app/controllers/publish/courses_controller.rb
+++ b/app/controllers/publish/courses_controller.rb
@@ -117,7 +117,7 @@ def fetch_course
     def provider
       @provider ||= recruitment_cycle.providers
-                                     .includes(courses: %i[sites site_statuses enrichments provider])
+                                     .includes(courses: %i[site_statuses enrichments provider])
                                      .find_by!(provider_code: params[:provider_code])
     end
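
Note on the row multiplication described in the commit message: the sketch below is a plain-Ruby illustration, not code from this repository. The `CourseSite` struct, the sample rows, and the `self_join` helper are all hypothetical, and the intermediate join through `site` is left out for brevity. It only mimics the shape of the generated SQL, where course_site is joined to itself (as site_statuses) on site_id alone.

```ruby
# Hypothetical in-memory rows; not data from the real database.
CourseSite = Struct.new(:course_id, :site_id, :status)

course_sites = [
  CourseSite.new(1, 10, "R"),
  CourseSite.new(2, 10, "R"),
  CourseSite.new(3, 10, "N"),
  CourseSite.new(1, 11, "R"),
  CourseSite.new(4, 11, "S")
]

# Mimics the generated SQL: every course_site row for the requested courses
# is paired with every course_site row (aliased site_statuses) that shares
# its site_id and has status N or R.
def self_join(rows, course_ids)
  by_site = rows.group_by(&:site_id)
  rows.select { |cs| course_ids.include?(cs.course_id) }.flat_map do |cs|
    by_site.fetch(cs.site_id, [])
           .select { |status_row| %w[N R].include?(status_row.status) }
           .map { |status_row| [cs, status_row] }
  end
end

joined = self_join(course_sites, [1])
outer_rows = course_sites.count { |cs| cs.course_id == 1 }

puts "outer rows:  #{outer_rows}"
puts "joined rows: #{joined.size}"
```

In this toy data, course 1 has two course_site rows, but the join returns more rows than that because other courses share site 10. The same effect, at production scale, would be consistent with the gap between the estimated and actual row counts quoted above.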