Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

view_sum cache for courses is miscalculated in CourseCacheManager #5911

Open
gabina opened this issue Jul 26, 2024 · 4 comments
Open

view_sum cache for courses is miscalculated in CourseCacheManager #5911

gabina opened this issue Jul 26, 2024 · 4 comments

Comments

@gabina
Copy link
Member

gabina commented Jul 26, 2024

What is happening?

There is a pretty counter-intuitive behavior related to ActiveRecord distinct and sum, which makes the view_sum cache incorrect. See issue #33082 in rails repo for more details on how this actually works.

The main method for CourseCacheManager class is update_cache, which updates the different caches.
The view_sum cache is updated through the following private method:

  def update_view_sum
    @course.view_sum = @course.articles_courses.tracked.live.sum(:view_count)
  end

This method is associated to the following SQL query:

SELECT SUM(DISTINCT `articles_courses`.`view_count`) FROM `articles_courses` INNER JOIN `articles` ON `articles`.`id` = `articles_courses`.`article_id` WHERE `articles_courses`.`course_id` = 16 AND `articles_courses`.`tracked` = TRUE AND `articles`.`deleted` = FALSE

This means that if the value of view_count is the same for different article courses, the repeated values are ignored, generating an incorrect calculation of the total sum of views.

This is caused by a not very intuitive behavior when using distinct and sum in ActiveRecord (note that tracked and live articles courses scopes use distinct).

To Reproduce

Steps to reproduce the behavior:

  1. Have a course downloaded locally with several articles courses, and make the view_count for those articles courses not unique. Example:
    image
  1. Move the update_view_sum private method out of the private methods section so that you can call directly
  2. Open a rails console
  3. Run the following code:
course = Course.find(16)
ccm = CourseCacheManager.new(course)
ccm.update_view_sum

view_sum = 0
course.articles_courses.tracked.live.each { |ac| view_sum+= ac.view_count }
  1. See that course.view_sum is 4049 (doesn't take the repeated 220 into account), while view_sum is 4269 (4049+220).

Expected behavior

view_sum field for Course should have the sum of view_count for all tracked live articles courses, no matter if the view_count value is not unique .

The SQL query should be:
SELECT DISTINCT `articles_courses`.* FROM `articles_courses` INNER JOIN `articles` ON `articles`.`id` = `articles_courses`.`article_id` WHERE `articles_courses`.`course_id` = 16 AND `articles_courses`.`tracked` = TRUE AND `articles`.`deleted` = FALSE

One option is to use the following definition, but I' m not sure if this could be less performant.

  def update_view_sum
    @course.view_sum = @course.articles_courses.tracked.live.sum(&:view_count)
  end

Additional context

It is possible that this same behavior is causing problems in other parts of the code. We should review all the code when we fix this.

@ragesoss
Copy link
Member

Wow, nice find.

@Abishekcs
Copy link
Contributor

Working on this.

@Abishekcs
Copy link
Contributor

After going through this issue and #33082 , I found that the only way forward, as mentioned in one of the comments, is: We are now forced to write more code as a workaround, like plucking the IDs in one call and then putting those IDs in a where clause of a second call to get the correct sum. That’s not satisfying at all.

Most suggested solutions are similar to that comment.

Here are a few examples I ran:

Code

For the Course with articles:

WDG - AF 2018 Florence (305 Articles)
WDG - AF 2018 Florence output

Psychology Capstone (5 Articles)
Psychology Capstone articles

Time Taken:

For Psychology Capstone (5 Articles):

a) update_view_sum (Currently in use but miscalculates): 0.7ms (343 views)

b) update_view_sum_with_ampersand (Correct Calculation) : 2.0ms (459 views)

c) update_view_sum_with_group (Correct Calculation): 0.9ms (459 views)

d) update_view_sum_with_pluck (miscalculated #Just wanted to try out): 0.9ms (343 views)

Psychology Capstone

For WDG - AF 2018 Florence (305 Articles):

a) update_view_sum (Currently in use but miscalculates): 3.5ms (12880276 views)

b) update_view_sum_with_ampersand (TypeError: nil can't be coerced into Integer) : 13.0 ms

c) update_view_sum_with_group (Correct Calculation): 3.5ms (12888880 views)

WDG - AF 2018 Florence
WDG - AF 2018 Florence Type Error

After testing this out, I considered using the version with group, as shown below. However, I still have some doubts about its usefulness.

@ragesoss @gabina , do you have any thoughts on this?

def update_view_sum_with_group
    @course.view_sum = @course.articles_courses.tracked.live.group(:id).sum(:view_count).values.sum
end

@ragesoss
Copy link
Member

ragesoss commented Oct 7, 2024

Hmm... the group workaround seems to be the main one that came out of the discussion in rails#33082, and it seems to be similar performance-wise. What are your doubts about that?

One other thing to investigate might be whether those scopes could be rewritten to have the same intended behavior (returning only one result per unique Article record, rather than one per unique ArticlesCourses record) without using .distinct. If that's possible and performance is fine, it would provide some assurance that we don't have any similar bugs elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants