Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contest: subcases: handle retry cases #43

Merged
merged 1 commit into from
Nov 3, 2024

Conversation

matttbe
Copy link
Member

@matttbe matttbe commented Oct 30, 2024

In case of failure, some tests are restarted. That's great, but when parsing the subcases (nested tests), the previous results were overridden. Also, when displaying the results, the 'retry' field from the main test was re-used in each subcases: quite confusing for the reader.

Now, in case of retry, a new dictionary under the 'results-retry' key is stored in the 'json_full' entry in the DB. This dictionary uses the subcase titles as keys, and the results as values. Thanks to that, when querying the subcases results, this dictionary is used to properly set the 'retry' field for each subcase.

Important note: I'm sorry, I didn't set up a DB to do the tests locally. I checked the code twice, and I hope I didn't break anything :)

@matttbe matttbe requested a review from kuba-moo October 30, 2024 17:06
@kuba-moo
Copy link
Contributor

Thanks a lot for taking care of this, and sorry for deficiencies when it comes to testing this stuff :) My best idea so far is what I have done in crash.py - add some unit tests so that at least we can exercise that the functions do the right thing. But it may be hard here, absolutely not a requirement.

In terms of the implementation -- would it be possible to combine the retry and the results together? Instead of adding a key at the .results[] level, add retry within .results[].results[]? You already seem to collect the retry results as a dict, so just walk the "main" table of subcases and give each other a: if case["test"] in retry_dict: case["retry"] = retry_dict[case["test"]] ? That way we avoid having to make changes to any other component, like query.

I updated the "schema" on the wiki: https://github.com/linux-netdev/nipa/wiki/Netdev-CI-system#results but I can't link to the exact row..

contest/remote/vmksft-p.py Outdated Show resolved Hide resolved
@matttbe
Copy link
Member Author

matttbe commented Oct 31, 2024

My best idea so far is what I have done in crash.py - add some unit tests so that at least we can exercise that the functions do the right thing. But it may be hard here, absolutely not a requirement.

Ah yes, I should have done that when adding the parsing of the subtests. But here, that was the DB that needed new tests I think. We could also have tests inserting dummy results in a DB and checking the expected results. But well, I think we would need time to do all this :)

In terms of the implementation -- would it be possible to combine the retry and the results together? Instead of adding a key at the .results[] level, add retry within .results[].results[]? You already seem to collect the retry results as a dict, so just walk the "main" table of subcases and give each other a: if case["test"] in retry_dict: case["retry"] = retry_dict[case["test"]] ? That way we avoid having to make changes to any other component, like query.

Ah yes, thank you, I don't know why I had in mind that I should not touch the previous subcases results. I gave this a try, and simplify the code to handle the retry in the parsing of the subtests directly.

I updated the "schema" on the wiki: https://github.com/linux-netdev/nipa/wiki/Netdev-CI-system#results but I can't link to the exact row..

Great, thank you! That reflects what I did here.

Copy link
Contributor

@kuba-moo kuba-moo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nits, otherwise LGTM!

contest/remote/vmksft-p.py Outdated Show resolved Hide resolved
contest/remote/vmksft-p.py Outdated Show resolved Hide resolved
In case of failure, some tests are restarted. That's great, but when
parsing the subcases (nested tests), the previous results were
overridden. Also, when displaying the results, the 'retry' field from
the main test was re-used in each subcases: quite confusing for the
reader.

Now, in case of retry, the previous 'results' list will be modified to
add a new 'retry' entry for each test that has been re-validated. If the
previous test with the same name cannot be found, that could be because
there was major issue before and some subcases have not been executed
(or the names are not fixed). In this case, a new entry with a skipped
first attempt will be added to the list. When querying the subcases
results, the new 'retry' entry will be used, or none if it is not there.

Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
@kuba-moo kuba-moo merged commit c5eb5d6 into linux-netdev:main Nov 3, 2024
1 check passed
@kuba-moo
Copy link
Contributor

kuba-moo commented Nov 3, 2024

Thank you! Deploying now, will be used by runners starting with net-next-2024-11-03--18-00

@matttbe matttbe deleted the subcases-retry branch November 4, 2024 09:14
@matttbe
Copy link
Member Author

matttbe commented Nov 4, 2024

Thank you! Deploying now, will be used by runners starting with net-next-2024-11-03--18-00

Thanks! No retry so far, but I will keep monitoring!

Something strange I just noticed is that if I look at previous results, I can still see some subcases with a retry key e.g. from https://netdev.bots.linux.dev/query/results?branch-name=net-next-2024-10-29--21-00&format=l2

      {
        "test": "mptcp-join-sh.mptcp-join-no-JOIN",
        "group": "selftests-net-mptcp",
        "result": "pass",
        "link": "https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/837261/1-mptcp-join-sh",
        "retry": "pass",
        "time": 17
      },

That should no longer be the case with:

# in case of retry, the subtest might not have been re-executed
if "retry" in data:
del data["retry"]

Or maybe this has no effect on previous results, because there are some caches being used here?

@kuba-moo
Copy link
Contributor

kuba-moo commented Nov 5, 2024

My bad, the tester services automatically restart themselves when they see the repo is updated. But the query service is flask and I forgot to restart it. Should be good now?

@matttbe
Copy link
Member Author

matttbe commented Nov 5, 2024

No problem!

My bad, the tester services automatically restart themselves when they see the repo is updated. But the query service is flask and I forgot to restart it. Should be good now?

No problem!

When I look at some previous results, I can still see the same "retry": "pass" for subcases which at that time should not have any retry (I guess the retry comes from the parent test):

      {
        "test": "mptcp-join-sh.mptcp-join-no-JOIN",
        "group": "selftests-net-mptcp",
        "result": "pass",
        "link": "https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/837261/1-mptcp-join-sh",
        "retry": "pass",
        "time": 17
      },

But I'm sure that's fine: new results should have the right thing set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants