Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High memory usage - Suspected cleanup not done per testcase #569

Open
gridhead opened this issue Sep 21, 2024 · 11 comments
Open

High memory usage - Suspected cleanup not done per testcase #569

gridhead opened this issue Sep 21, 2024 · 11 comments

Comments

@gridhead
Copy link

For testing our application, Loadouts for Genshin Impact, we see a very high memory usage of up to 4.0 GiB even when the actual run of the application takes somewhere in the neighbourhood of 200 MiB of memory. This starts from a meagre 100 MiB and with the run of each testcase associated with pytest-qt, it adds around 100 MiB of memory which doesn't get offloaded until the end of the entire test run. We currently have around 842 tests out of which around half are associated with pytest-qt. Ideally, if the cleanups were done per testcase, we expect the run to stay well under 500 MiB of memory usage.

Here's what the conftest module looks like where our fixture is located.

import pytest

from gi_loadouts.face.wind.main import MainWindow


@pytest.fixture
def runner(qtbot):
    testwind = MainWindow()
    qtbot.addWidget(testwind)
    return testwind
@nicoddemus
Copy link
Member

nicoddemus commented Sep 21, 2024

Hi @gridhead,

Each widget added via qtbot.addWidget gets cleaned up after each test, if there are widgets being created and not added to it (you only need to add the top-most widget, child widgets will get deleted automatically).

At work we have many test suites with over 3000ks+ tests, with many of the tests (more than 50%) use pytest-qt, and we do not see abnormal memory usage, so I'm fairly certain it is not a bug on pytest-qt but something somewhere in your test suite and/or application.

@gridhead
Copy link
Author

Ah, I do not claim it to be - apologies if it sounded that way.

I could use your assistance in understanding how I can lower it. From what it seems from my fixture named runner, it should have taken care of the MainWindow() instance but for some reason, it does not seem to be. I also tried explicitly removing the testwind object but I decided against it as it is anti-pattern to how qtbot operates and it did not yield any better results.

@nicoddemus
Copy link
Member

Ah, I do not claim it to be - apologies if it sounded that way.

It did to me but now worries -- not offended at all, just pointing that out because chasing that direction would not be useful.

I can show where this is handled in pytest-qt in case it helps:

We call close_widgets at each test teardown:

def pytest_runtest_teardown(item):
"""
Hook called after each test tear down, to process any pending events and
avoiding leaking events to the next test. Also, if exceptions have
been captured during fixtures teardown, fail the test.
"""
_process_events()
_close_widgets(item)

_close_widgets explicitly calls close() and deleteLater() in all widgets being tracked in the qt_widgets attribute of the pytest.Item:

def _close_widgets(item):
"""
Close all widgets registered in the pytest item.
"""
widgets = getattr(item, "qt_widgets", None)
if widgets:
for w, before_close_func in item.qt_widgets:
w = w()
if w is not None:
if before_close_func is not None:
before_close_func(w)
w.close()
w.deleteLater()
del item.qt_widgets

Which we track when qtbot.addWidget is called:

def _add_widget(item, widget, *, before_close_func=None):
"""
Register a widget into the given pytest item for later closing.
"""
qt_widgets = getattr(item, "qt_widgets", [])
qt_widgets.append((weakref.ref(widget), before_close_func))
item.qt_widgets = qt_widgets

My guess is that MainWindow is initializing some global state each time it is being instantiated, and not freeing it properly when closed... another possibility is that close() is not actually closing the MainWindow, given that it is possible to overwrite closeEvent and ignore the event.

@gridhead
Copy link
Author

The snippets help us understand the deeper workings of pytest-qt - Thanks! ❤️

We have tried some fixes but it has mostly been shotgun debugging as we were very confused as to why things were not working as intended.

Try 1 - We tried to remove the deconstructor of the MainWindow class (ref. https://github.com/gridhead/gi-loadouts/blob/7ed8ff2ad47f85c2375a973cf1feeaf24cafed94/gi_loadouts/face/wind/main.py#L19-L20) because we thought that it might be messing with the way qtbot handles context for an object. We have the deconstructor to ensure that we can remove the temporary files that we create at the start of the application.

Try 2 - We tried to move the fixture named runner (ref. https://github.com/gridhead/gi-loadouts/blob/7ed8ff2ad47f85c2375a973cf1feeaf24cafed94/test/conftest.py#L6-L10) to each of the test modules because we suspected that maybe the context is retained across multiple test runs. That ended up not being the case as our tests were as predictable as they would run on a fresh instance of the application.

Try 3 - We tried to modify the fixture named runner (ref. https://github.com/gridhead/gi-loadouts/blob/7ed8ff2ad47f85c2375a973cf1feeaf24cafed94/test/conftest.py#L6-L10) to include destruction of the objects by converting the fixture to a generator and explicitly calling for close(), deleteLater() and del, the example of which you can find below.

@pytest.fixture
def runner(qtbot):
    testwind = MainWindow()
    qtbot.addWidget(testwind)
    yield testwind
    testwind.close()
    testwind.deleteLater()
    del testwind

Try 4 - We tried commenting out the codebase of creation of temporary data (to check if that was the cause of this problem) and the static resources (ref. https://github.com/gridhead/gi-loadouts/tree/test/gi_loadouts/face/rsrc) converted from QResource files (to confirm if those lingered on in the memory after the context) but either they were imported anyway or this action is unrelated to the actual problem.

Try 5 - We tried to observe the memory consumption per module using tracemalloc (ref. https://docs.python.org/3/library/tracemalloc.html) in our tests. However, the most consumption we got to see according to tracemalloc was in the order of magnitude of B and MB while the actual memory consumption was in MB - so we were not sure if this is workable information.

Top 10 memory consumers (Arranged in descending order)
<frozen importlib._bootstrap_external>:753: size=380 KiB (+380 KiB), count=3175 (+3175), average=123 B
/home/altruism/Projects/gi-loadouts/gi_loadouts/face/wind/wind.py:30: size=141 KiB (+141 KiB), count=3 (+3), average=46.9 KiB
/home/altruism/Projects/gi-loadouts/venv/lib/python3.12/site-packages/pytestqt/qtbot.py:697: size=24.9 KiB (+24.9 KiB), count=316 (+316), average=81 B
/usr/lib/python3.12/linecache.py:139: size=22.9 KiB (+22.9 KiB), count=218 (+218), average=108 B
/usr/lib/python3.12/unittest/mock.py:2152: size=18.0 KiB (+18.0 KiB), count=308 (+308), average=60 B
/usr/lib/python3.12/enum.py:596: size=14.9 KiB (+14.9 KiB), count=41 (+41), average=372 B
/home/altruism/Projects/gi-loadouts/gi_loadouts/face/wind/wind.py:1436: size=8136 B (+8136 B), count=31 (+31), average=262 B
<frozen importlib._bootstrap>:488: size=6774 B (+6774 B), count=91 (+91), average=74 B
/usr/lib/python3.12/unittest/mock.py:434: size=5466 B (+5466 B), count=19 (+19), average=288 B
/usr/lib/python3.12/enum.py:269: size=5000 B (+5000 B), count=34 (+34), average=147 B

Try 6 - We tried renaming the fixture named runner (ref. ref. https://github.com/gridhead/gi-loadouts/blob/7ed8ff2ad47f85c2375a973cf1feeaf24cafed94/test/conftest.py#L6-L10) because we suspected that we might be shadowing an internal identifier with the same name that is expected to live as long as the entire test run. That, of course, was a shot in the dark.

Observation - We did observe that the __del__ method or the deconstructor of the MainWindow (ref. https://github.com/gridhead/gi-loadouts/blob/7ed8ff2ad47f85c2375a973cf1feeaf24cafed94/gi_loadouts/face/wind/main.py#L19-L20) was not being called as evidenced by the cache file staying back in the temporary directory. While the following contextualization of MainWindow class in the subsequent testcases did ensure that the make_temp_file method (ref. https://github.com/gridhead/gi-loadouts/blob/7ed8ff2ad47f85c2375a973cf1feeaf24cafed94/gi_loadouts/face/rsrc/__init__.py#L15-L47) ran and that included a part where we also cleanup residual data from the previous executions that did not exit cleanly - there was no cleanup done after the last testcase finished. This meant that the deconstructor which had the cleanup instruction did not run.

@gridhead
Copy link
Author

I did some more digging in by trying to run the tests on Windows and I quickly discovered that the MainWindow instances were not removed after the tests were done with them. Take a look at the following screenshots.

Task manager

Residue

Task switcher

Untitled

The qtbot fixture has been properly made use of but even then such a problem is faced.

@gridhead
Copy link
Author

Similar is the case for Fedora Workstation 40.

I have temporarily disabled Xvfb to see things running as they are and I nearly wished I did not.

{8287109B-7150-4961-AC8A-0AE2F0254C39}

@nicoddemus
Copy link
Member

nicoddemus commented Sep 25, 2024

One thing you can do as sanity check is to edit your local pytest-qt copy and add an assert False here:

w.close()
w.deleteLater()

This should blow up and demonstrate that close is being called, and the MainWindow is not closing for some reason.

We did observe that the del method or the deconstructor of the MainWindow (ref. https://github.com/gridhead/gi-loadouts/blob/7ed8ff2ad47f85c2375a973cf1feeaf24cafed94/gi_loadouts/face/wind/main.py#L19-L20) was not being called as evidenced by the cache file staying back in the temporary directory.

Also note that having a __del__ method might prevent some objects from being properly collected by the garbage collector, in case there are references to MainWindow already in place.

Also note that closeEvent might cause close() to do nothing.

@gridhead
Copy link
Author

I gave it a try and you're right, the close and deleteLater are being called. We got rid of the __del__ method previously too and we tried doing it again for the heck of it but that did not result in any difference. I have scanned the entire code to check if there are any places where I am overriding the closeEvent behaviour but even that does not seem to be the case.

I am considering grouping my tests in classes and restricting the context of the fixture but I have limited confidence on whether this would end up working (like https://stackoverflow.com/a/62176555). Would you know if testing QApplication instead of QMainWindow could lead to differences in terms of the memory utilization and the lifecycle of the object of the MainWindow class?

N.B. A friend of mine @sdglitched suggested that maybe the close method is not working because we have a QDialogBox with a modality instance active which is not dismissed. While that might be true in normal interaction, I suspected qtbot would be able to close things regardless of whether the said widget is active or not. Could you confirm the same to be the case?

@nicoddemus
Copy link
Member

Would you know if testing QApplication instead of QMainWindow could lead to differences in terms of the memory utilization and the lifecycle of the object of the MainWindow class?

I doubt it, our test suites at work use QMainWindow classes without problems.

Could you confirm the same to be the case?

Perhaps... ideally modal dialogs should be mocked in tests given they block the test flow, see https://pytest-qt.readthedocs.io/en/latest/note_dialogs.html.

@gridhead
Copy link
Author

The grouping of tests with classes did not work, as I previously thought. Interestingly enough, the memory leak does not show up (or at least in a distinctive manner) in the actual instance of the running application, mostly because there is only one instance of the MainWindow class. Honestly, I am at my wit's end - I am probably looking at the wrong aspect of the application code to fix because I seem to have tried everything I could.

@gridhead
Copy link
Author

I was reading more into memory leaks and why some objects would outlive their utility, and the two things that stuck out to me are one, where signals/slots are not properly disconnected and the other, where the usage of lambda functions is ill-advised.

Take this snippet, for example.

    def initialize_events(self) -> None:
        """
        Initialize events on the user interface

        :return:
        """
        self.head_scan.clicked.connect(self.show_output_window)
        self.head_char_elem.currentTextChanged.connect(self.handle_elem_data)
        self.head_char_name.currentTextChanged.connect(self.handle_char_data)
        self.head_char_levl.currentTextChanged.connect(self.handle_char_data)
        self.head_char_name.currentTextChanged.connect(self.format_weapon_by_char_change)
        self.weap_area_type.currentTextChanged.connect(self.convey_weapon_type_change)
        self.weap_area_name.currentTextChanged.connect(self.convey_weapon_name_change)
        self.weap_area_levl.currentTextChanged.connect(self.convey_weapon_levl_change)
        self.weap_area_refn.currentTextChanged.connect(self.convey_refinement_change)
        for item in [
            (self.arti_fwol_levl, self.arti_fwol_type, self.arti_fwol_rare, self.arti_fwol_name_main, self.arti_fwol_data_main, "fwol", self.arti_fwol_type_name, self.arti_fwol_head_area, self.arti_fwol_head_icon),
            (self.arti_pmod_levl, self.arti_pmod_type, self.arti_pmod_rare, self.arti_pmod_name_main, self.arti_pmod_data_main, "pmod", self.arti_pmod_type_name, self.arti_pmod_head_area, self.arti_pmod_head_icon),
            (self.arti_sdoe_levl, self.arti_sdoe_type, self.arti_sdoe_rare, self.arti_sdoe_name_main, self.arti_sdoe_data_main, "sdoe", self.arti_sdoe_type_name, self.arti_sdoe_head_area, self.arti_sdoe_head_icon),
            (self.arti_gboe_levl, self.arti_gboe_type, self.arti_gboe_rare, self.arti_gboe_name_main, self.arti_gboe_data_main, "gboe", self.arti_gboe_type_name, self.arti_gboe_head_area, self.arti_gboe_head_icon),
            (self.arti_ccol_levl, self.arti_ccol_type, self.arti_ccol_rare, self.arti_ccol_name_main, self.arti_ccol_data_main, "ccol", self.arti_ccol_type_name, self.arti_ccol_head_area, self.arti_ccol_head_icon),
        ]:
            item[1].currentTextChanged.connect(lambda _, a_type=item[1], a_rare=item[2], a_name=item[6], a_back=item[8], a_id=item[5]: self.change_rarity_backdrop_by_changing_type(a_type, a_rare, a_name, a_back, a_id))
            item[1].currentTextChanged.connect(lambda _, a_type=item[1], a_id=item[5]: self.change_artifact_team_by_changing_type(a_type, a_id))
            item[1].currentTextChanged.connect(lambda _, a_type=item[1], a_id=item[5]: self.remove_artifact(a_type, a_id))
            item[0].currentTextChanged.connect(lambda _, a_levl=item[0], a_type=item[1], a_rare=item[2], a_name=item[3], a_data=item[4], a_id=item[5]: self.change_data_by_changing_level_or_stat(a_levl, a_type, a_rare, a_name, a_data, a_id))
            item[3].currentTextChanged.connect(lambda _, a_levl=item[0], a_type=item[1], a_rare=item[2], a_name=item[3], a_data=item[4], a_id=item[5]: self.change_data_by_changing_level_or_stat(a_levl, a_type, a_rare, a_name, a_data, a_id))
            item[2].currentTextChanged.connect(lambda _, a_rare=item[2], a_name=item[3], a_id=item[5]: self.change_artifact_substats_by_changing_rarity_or_mainstat(a_rare, a_name, a_id))
            item[3].currentTextChanged.connect(lambda _, a_rare=item[2], a_name=item[3], a_id=item[5]: self.change_artifact_substats_by_changing_rarity_or_mainstat(a_rare, a_name, a_id))
            item[2].currentTextChanged.connect(lambda _, a_levl=item[0], a_back=item[7], a_rare=item[2]: self.change_levels_backdrop_by_changing_rarity(a_levl, a_back, a_rare))
        for part in ["fwol", "pmod", "sdoe", "gboe", "ccol"]:
            for alfa in ["a", "b", "c", "d"]:
                drop, text = getattr(self, f"arti_{part}_name_{alfa}"), getattr(self, f"arti_{part}_data_{alfa}")
                drop.currentTextChanged.connect(lambda _, a_drop=drop, a_text=text: self.render_lineedit_readonly_when_none(a_drop, a_text))
                text.textChanged.connect(self.validate_lineedit_userdata)
        for part in ["fwol", "pmod", "sdoe", "gboe", "ccol"]:
            getattr(self, f"arti_{part}_scan").clicked.connect(lambda _, a_part=part: self.show_scan_dialog(a_part))
            getattr(self, f"arti_{part}_load").clicked.connect(lambda _, a_part=part: self.arti_load(a_part))
            getattr(self, f"arti_{part}_save").clicked.connect(lambda _, a_part=part: self.arti_save(a_part))
            getattr(self, f"arti_{part}_wipe").clicked.connect(lambda _, a_part=part: self.wipe_artifact(a_part))
        self.head_load.clicked.connect(self.team_load)
        self.head_save.clicked.connect(self.team_save)
        self.head_wipe.clicked.connect(self.wipe_team)
        self.weap_head_load.clicked.connect(self.weap_load)
        self.weap_head_save.clicked.connect(self.weap_save)
        self.char_head_lumi.clicked.connect(lambda _, a_char=CharName.lumine: self.select_char_from_dropdown(a_char))
        self.char_head_aeth.clicked.connect(lambda _, a_char=CharName.aether: self.select_char_from_dropdown(a_char))
        self.side_head.clicked.connect(lambda _, a_link=__homepage__: self.open_link(a_link))
        self.side_tckt.clicked.connect(lambda _, a_link=__issutckt__: self.open_link(a_link))
        self.side_cash.clicked.connect(lambda _, a_link=__donation__: self.open_link(a_link))
        self.side_info.clicked.connect(self.show_info_dialog)
        self.side_lcns.clicked.connect(self.show_lcns_dialog)

I thought that qtbot.deleteLater() under normal circumstances should take care of decontextualizing the elements associated with the object of the MainWindow type and that in turn should disconnect the signals and slots but let me know if this is incorrect. If this holds, I will start working on the teardown function for the test that disconnects the signals and slots that would run per testcase by making the runner fixture a generator and cleaning up right after.

Also, please feel free to let me know if this is the wrong direction for pursuing the memory persistence problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants