Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix slow query caused by GET /articles #414

Merged
merged 3 commits into from
Nov 16, 2023
Merged

Fix slow query caused by GET /articles #414

merged 3 commits into from
Nov 16, 2023

Conversation

retroinspect
Copy link
Contributor

@retroinspect retroinspect commented Sep 21, 2023

  • Mitigate two slow queries reported from sentry

  • Add composite index for core_article.(created_at, parent_board_id) to avoid full scan due to order by from following slow query:

SELECT `core_article`.`id`, `core_article`.`created_at`, `core_article`.`updated_at`,
  `core_article`.`deleted_at`, `core_article`.`title`, `core_article`.`content`,
  `core_article`.`content_text`, `core_article`.`name_type`, `core_article`.`is_content_sexual`,
  `core_article`.`is_content_social`, `core_article`.`hit_count`, `core_article`.`comment_count`,
  `core_article`.`report_count`, `core_article`.`positive_vote_count`,
  `core_article`.`negative_vote_count`, `core_article`.`migrated_hit_count`,
  `core_article`.`migrated_positive_vote_count`, `core_article`.`migrated_negative_vote_count`,
  `core_article`.`created_by_id`, `core_article`.`parent_topic_id`, `core_article`.`parent_board_id`,
  `core_article`.`commented_at`, `core_article`.`url`, `core_article`.`content_updated_at`,
  `core_article`.`hidden_at`, `core_article`.`topped_at`
FROM `core_article`
WHERE (
  `core_article`.`deleted_at` = '0001-01-01 00:00:00.000000' AND
  `core_article`.`parent_board_id` = 1
)
ORDER BY `core_article`.`created_at` DESC
LIMIT 1
  • Add index for core_article.name_type to make the following slow query faster:
SELECT COUNT(*) AS `__count`
FROM `core_article`
WHERE (
  `core_article`.`deleted_at` = '0001-01-01 00:00:00.000000' AND NOT (`core_article`.`created_by_id` IN (
    SELECT U0.`user_id`
    FROM `core_block` U0
    WHERE (
      U0.`deleted_at` = '0001-01-01 00:00:00.000000' AND U0.`blocked_by_id` = %s
    )
  ) AND `core_article`.`name_type` = %s)
)
  • Add custom paginator to use more efficient COUNT query for pagination.

Query after optimization consists of two parts to not use NOT IN statement:

  1. Getting number of all articles
SELECT COUNT(*) AS `__count` FROM `core_article` WHERE `core_article`.`deleted_at` = '0001-01-01 00:00:00'
  1. Getting number of articles written by blocked users
SELECT COUNT(*) AS `__count` FROM `core_article` WHERE (`core_article`.`deleted_at` = '0001-01-01 00:00:00' AND `core_article`.`created_by_id` IN (SELECT U0.`user_id` FROM `core_block` U0 WHERE (U0.`deleted_at` = '0001-01-01 00:00:00' AND U0.`blocked_by_id` = 1)) AND `core_article`.`name_type` = 2)
  • Refer to card for more info

@notion-workspace
Copy link

Slow Query 해결

@retroinspect retroinspect self-assigned this Sep 21, 2023
@retroinspect retroinspect linked an issue Sep 21, 2023 that may be closed by this pull request
@retroinspect retroinspect changed the title Add indices to fix slow query Fix slow query caused by GET /articles Oct 26, 2023
@injoonH injoonH added enhancement New feature or request refactor labels Nov 2, 2023
Comment on lines +161 to +167
indexes = [
models.Index(
fields=["created_at", "parent_board_id"],
name="created_at_parent_board_id_idx",
)
]

Copy link
Contributor Author

@retroinspect retroinspect Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Composite index of (created_at, parent_board_id) is needed
MySQL order by optimization doc

In this query, key_part1 is constant, so all rows accessed through the index are in key_part2 order, and an index on (key_part1, key_part2) avoids sorting if the WHERE clause is selective enough to make an index range scan cheaper than a table scan:

SELECT * FROM t1
  WHERE key_part1 = constant
  ORDER BY key_part2;

Comment on lines +93 to +105
count = (
queryset.count()
- queryset.filter(
created_by__id__in=self.request.user.block_set.values("user"),
name_type=NameType.ANONYMOUS,
).count()
)

# exclude article written by blocked users in anonymous board
queryset = queryset.exclude(
created_by__id__in=self.request.user.block_set.values("user"),
name_type=NameType.ANONYMOUS,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocked user filtering을 한 번만 하게 바꾸는 게 어떤가요?

Suggested change
count = (
queryset.count()
- queryset.filter(
created_by__id__in=self.request.user.block_set.values("user"),
name_type=NameType.ANONYMOUS,
).count()
)
# exclude article written by blocked users in anonymous board
queryset = queryset.exclude(
created_by__id__in=self.request.user.block_set.values("user"),
name_type=NameType.ANONYMOUS,
)
# exclude article written by blocked users in anonymous board
queryset = queryset.exclude(
created_by__id__in=self.request.user.block_set.values("user"),
name_type=NameType.ANONYMOUS,
)
count = queryset.count()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • count = queryset.count() 로 하면 현재와 동일한 쿼리를 사용하게 됨 (다음 문제의 쿼리)
SELECT COUNT(*) AS `__count`
FROM `core_article`
WHERE (
  `core_article`.`deleted_at` = '0001-01-01 00:00:00.000000' AND NOT (`core_article`.`created_by_id` IN (
    SELECT U0.`user_id`
    FROM `core_block` U0
    WHERE (
      U0.`deleted_at` = '0001-01-01 00:00:00.000000' AND U0.`blocked_by_id` = %s
    )
  ) AND `core_article`.`name_type` = %s)
)
  • 위 쿼리가 느린 이유는 NOT IN 절로 예상됨
  • 위 쿼리를 유지한 상태에서 속도를 개선하는 방법은 찾지 못했으며, 코드상의 쿼리처럼 둘로 나누어 처리하면 쿼리가 Slow Query로 잡히지 않을 정도로 빨라지는 것을 확인함

Copy link
Member

@injoonH injoonH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@injoonH injoonH merged commit c6f976b into develop Nov 16, 2023
1 check passed
@injoonH injoonH deleted the fix/slow-query branch November 16, 2023 13:39
@injoonH injoonH mentioned this pull request Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix Slow Query
2 participants