[bugfix][6.6]Sync 2 patches about folio_test_lru() from upstream #507

wojiaohanliyang · 2024-11-28T11:15:43Z

No description provided.

commit 33dfe9204f29b415bbc0abb1a50642d1ba94f5e9 upstream.

If a large amount of CMA memory is configured in the system (for example, CMA memory accounts for 50% of system memory), starting a virtual machine with device passthrough calls pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin memory. Normally, if a page is present and in the CMA area, pin_user_pages_remote() migrates the page from the CMA area to a non-CMA area because of the FOLL_LONGTERM flag. But the current code causes the migration to fail due to unexpected page refcounts, and eventually the virtual machine fails to start.

Adding a page to an LRU batch increases its refcount by one; removing the page from the LRU batch decreases it by one. Page migration requires that the page is not referenced by anyone other than the page mapping. Before migrating a page, we should try to drain it from the LRU batch in case it is there. However, folio_test_lru() is not sufficient to tell whether the page is in an LRU batch or not, and if the page is in an LRU batch, the migration will fail.

To solve the problem above, we modify the logic of adding to the LRU batch. Before adding a page to an LRU batch, we clear the LRU flag of the page so that folio_test_lru(page) can tell us whether the page is in an LRU batch. This is quite valuable, because we likely don't want to blindly drain the LRU batch simply because there is some unexpected reference on a page, as described above.

This change makes the LRU flag of a page invisible for longer, which may impact some programs. For example, as long as a page is on an LRU batch, we cannot isolate it, and we cannot check whether it is an LRU page. Further, a page can now only be on exactly one LRU batch. This doesn't seem to matter much, because when a new page is allocated from buddy and added to the LRU batch, or is isolated, its LRU flag may also be invisible for a long time.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9a4e9f3 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Signed-off-by: yangge <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: hanliyang <[email protected]>
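
For readers who want to see the refcount and flag interplay described in the commit message above, here is a minimal userspace sketch. It is not kernel code and not the upstream patch: `toy_folio`, `toy_batch`, `TOY_BATCH_SIZE`, and every `toy_*` function are hypothetical names invented for illustration, and the model only covers a folio that is already on the LRU when it is queued on a batch. It assumes the migration precondition can be reduced to "refcount equals 1" (the mapping's reference only).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct folio (hypothetical): only the two fields
 * the commit message talks about. */
struct toy_folio {
    int refcount;   /* 1 means "referenced by the page mapping only" */
    bool lru_flag;  /* models what folio_test_lru() would report */
};

#define TOY_BATCH_SIZE 15  /* arbitrary size chosen for this sketch */

struct toy_batch {
    struct toy_folio *folios[TOY_BATCH_SIZE];
    size_t nr;
};

/* Drain the batch: drop the batch's reference on each folio and make
 * the LRU flag visible again. */
static void toy_batch_drain(struct toy_batch *b)
{
    for (size_t i = 0; i < b->nr; i++) {
        b->folios[i]->lru_flag = true;
        b->folios[i]->refcount--;
    }
    b->nr = 0;
}

/* Queue a folio on the batch. The LRU flag is cleared first, so a
 * folio whose flag is already clear (on a batch or isolated) is
 * rejected and can never sit on two batches at once. */
static bool toy_batch_add(struct toy_batch *b, struct toy_folio *f)
{
    if (!f->lru_flag)
        return false;
    if (b->nr == TOY_BATCH_SIZE)
        toy_batch_drain(b);
    assert(b->nr < TOY_BATCH_SIZE);
    f->lru_flag = false;
    f->refcount++;              /* the batch holds one reference */
    b->folios[b->nr++] = f;
    return true;
}

/* Migration check: drain only when the cleared LRU flag hints that the
 * folio may be on a batch, then require that no extra reference
 * remains. */
static bool toy_try_migrate(struct toy_batch *b, struct toy_folio *f)
{
    if (!f->lru_flag)
        toy_batch_drain(b);
    return f->refcount == 1;
}
```

In this model, a folio queued with `toy_batch_add()` has refcount 2 and a cleared flag; `toy_try_migrate()` sees the cleared flag, drains the batch, and the refcount returns to 1 so the check passes. A folio that carries an unexpected reference for some other reason keeps its flag set, so no drain is triggered for it.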

commit 0885ef4705607936fc36a38fd74356e1c465b023 upstream.

I found a regression on mm-unstable during my swap stress test, which uses tmpfs to compile Linux. The test hits OOM very soon after make spawns many cc processes.

It bisects down to this change: 33dfe9204f29b415bbc0abb1a50642d1ba94f5e9 (mm/gup: clear the LRU flag of a page before adding to LRU batch)

Yu Zhao proposed the fix: "I think this is one of the potential side effects -- Hugh mentioned earlier about isolate_lru_folios():"

I tested that with the fix applied, the swap stress test no longer hits OOM.

Link: https://lore.kernel.org/r/CAOUHufYi9h0kz5uW3LHHS3ZrVwEq-kKp8S6N-MZUmErNAXoXmw@mail.gmail.com/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch")
Signed-off-by: Chris Li <[email protected]>
Suggested-by: Yu Zhao <[email protected]>
Suggested-by: Hugh Dickins <[email protected]>
Closes: https://lore.kernel.org/all/CAF8kJuNP5iTj2p07QgHSGOJsiUfYpJ2f4R1Q5-3BN9JiD9W_KA@mail.gmail.com/
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: hanliyang <[email protected]>

deepin-ci-robot · 2024-11-28T11:16:15Z

Hi @wojiaohanliyang. Thanks for your PR.

I'm waiting for a deepin-community member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

deepin-ci-robot · 2024-11-28T11:16:29Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign matrix-wsk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

deepin/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

yangge and others added 2 commits November 28, 2024 15:33

deepin-ci-robot added the needs-ok-to-test label Nov 28, 2024

deepin-ci-robot requested a review from winnscode November 28, 2024 11:16

Avenger-285714 merged commit 0c448d8 into deepin-community:linux-6.6.y Nov 28, 2024
3 of 4 checks passed
