
Query for existing issues might fail silently and a new issue be created for every issue detected by the task #2270

Open · Tracked by #2273
Archaeopteryx opened this issue Jul 23, 2024 · 4 comments
Labels: teklia-2024 (Issue for Teklia work in 2024)

@Archaeopteryx (Contributor)

Log demonstrating the issue:

2024-07-23 06:00:42.000212 [INFO    ] [info     ] Checking for existing issues in the backend base_revision_changeset=ac4a1f84adfa69b77ccec3589f2a28ec7089fe10
2024-07-23 06:17:48.000023 [INFO    ] [info     ] Found 780 new issues (over 780 total detected issues) task=ZIJe3nlqQ4CvIivkOYtMNg

It took 17 min 06 s to query for the known issues, yet the task did not detect a single known issue.

The elapsed time is close to 16 min 40 s, i.e. 1000 s, which looks like a timeout being hit.

If a performance issue causes the retrieval of the known issues to fail, the subsequent creation of a ticket for every detected issue will further degrade the performance of the code review server.

Should the bulk of the known issues be served from a downloaded artifact, with only the newest known issues retrieved through an incremental query? A rough sketch of that split follows below.

@La0 @marco-c
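A rough sketch of that split, assuming a hypothetical artifact of known issue hashes and a hypothetical since query parameter (the artifact URL, its layout, and the parameter are all made up for illustration; none of them exist in the current backend):

import requests

# Hypothetical: the artifact URL, its layout (a JSON object with a "date"
# and a list of "hashes"), and the `since` parameter are assumptions
# illustrating the proposed split, not existing backend APIs.
ARTIFACT_URL = "https://example.org/artifacts/known-issue-hashes.json"
BACKEND_URL = "http://localhost:8000"


def load_known_hashes():
    # Bulk of the known issues: downloaded once per task run as a static artifact.
    snapshot = requests.get(ARTIFACT_URL, timeout=60).json()
    return set(snapshot["hashes"]), snapshot["date"]


def fetch_incremental_hashes(repo, since):
    # Only the issues recorded after the artifact snapshot are queried live.
    resp = requests.get(
        f"{BACKEND_URL}/v1/{repo}/issues/",
        params={"since": since},
        timeout=60,
    )
    resp.raise_for_status()
    return {issue["hash"] for issue in resp.json()}


known_hashes, snapshot_date = load_known_hashes()
known_hashes |= fetch_incremental_hashes("mozilla-central", snapshot_date)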

@Archaeopteryx (Contributor, Author)

Creating a ticket for a new issue takes between 0.5 and 5 seconds per ticket.

@marco-c marco-c added the teklia-2024 Issue for Teklia work in 2024 label Aug 2, 2024
@La0 (Collaborator) commented Sep 5, 2024

The bot code only iterates over all issue paths and queries the list_repo_issues endpoint.

We could look into performance, and even into whether the whole output is needed (the bot only consumes the hashes).

Or even build a new endpoint that directly checks whether a hash for a specific path + repo is known: it may be much faster to query in the DB. A minimal sketch follows.
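A minimal sketch of such an endpoint as a plain Django view; the import path, the URL routing, and the relation names (issue_links__revision__base_repository__slug) are assumptions based on this thread, not the backend's actual schema:

from django.http import JsonResponse

from code_review_backend.issues.models import Issue  # assumed import path


def issue_known(request, repo_slug):
    # Return whether an issue hash is already known for a given path + repo.
    path = request.GET.get("path")
    issue_hash = request.GET.get("hash")
    # .exists() runs a cheap SELECT ... LIMIT 1 instead of serializing rows.
    known = Issue.objects.filter(
        path=path,
        hash=issue_hash,
        issue_links__revision__base_repository__slug=repo_slug,
    ).exists()
    return JsonResponse({"known": known})

With indexes on path and hash, each check should then be a single indexed lookup instead of paging through the full issue list.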

@La0 La0 self-assigned this Sep 5, 2024
@La0 (Collaborator) commented Sep 9, 2024

I just started a manual backup on Heroku so we can test locally for performance issues.

@La0 (Collaborator) commented Sep 10, 2024

I was able to restore the backup and test API queries. The list issues endpoint is indeed very slow (taking several seconds per hit...).

I noticed a few immediate issues:

  • no index on Revision.head_changeset and Issue.path, which are used to filter the endpoint
  • we only need to serialize the issue id and hash (so we only need to load those fields in the queryset)
  • the main slow query joins twice on IssueLink simply because of multiple chained .filter ORM calls: by aggregating all filters into a dict and calling .filter once, the ORM becomes smarter and only makes a single join (see the sketch after this list)
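To illustrate the last two points, a sketch of the aggregated-filter pattern, with relation names assumed from this thread. In Django, each chained .filter() call that spans a multi-valued relation such as IssueLink can add its own join, while conditions passed in a single call share one join:

from code_review_backend.issues.models import Issue  # assumed import path


def known_issues(head_changeset, path):
    # All conditions go through a single .filter() call so the ORM
    # produces one join on IssueLink instead of one join per chained call.
    filters = {
        "issue_links__revision__head_changeset": head_changeset,
        "path": path,
    }
    # Load only what the bot consumes: the issue id and hash.
    return Issue.objects.filter(**filters).only("id", "hash")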

I used the following test code and payload, but you can also simply hit the corresponding URL directly.

import json
from datetime import datetime

from code_review_bot import taskcluster
from code_review_bot.backend import BackendAPI

# Point the bot at the local backend restored from the Heroku backup.
taskcluster.secrets = {
    "backend": {
        "url": "http://localhost:8000",
        "username": "bot",
        "password": "Teklia12345",
    }
}

current_date = datetime.now().strftime("%Y-%m-%d")
api = BackendAPI()

with open("payload.json") as f:
    payload = json.load(f)

# Query the list_repo_issues endpoint once per path, as the bot does.
for path in payload["paths"]:
    print(path)
    out = api.list_repo_issues(
        "mozilla-central",
        date=current_date,
        revision_changeset=payload["revision_changeset"],
        path=path,
    )
    print(out)

payload.json (attached)
