mm/thp: Split huge pmds/puds if they're pinned when fork()
Pinned pages shouldn't be write-protected when fork() happens, because
follow-up copy-on-write on these pages could cause the pinned pages to
be replaced by random, newly allocated pages.

For huge PMDs, we split the huge pmd if pinning is detected, so that
future handling will be done at the PTE level (with our latest changes,
each of the small pages will be copied).  We achieve this by letting
copy_huge_pmd() return -EAGAIN for pinned pages, so that we fall
through in copy_pmd_range() and finally land in the next
copy_pte_range() call.
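
For context, here is a simplified, paraphrased sketch of the caller in
copy_pmd_range() (mm/memory.c around this kernel version; trimmed for
illustration, not code added by this commit) showing how that -EAGAIN
turns into a PTE-level copy:

	next = pmd_addr_end(addr, end);
	if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd) ||
	    pmd_devmap(*src_pmd)) {
		int err = copy_huge_pmd(dst_mm, src_mm, dst_pmd,
					src_pmd, addr, vma);
		if (err == -ENOMEM)
			return -ENOMEM;
		if (!err)		/* huge entry copied as a whole */
			continue;
		/* -EAGAIN: the huge pmd was just split, fall through */
	}
	if (pmd_none_or_clear_bad(src_pmd))
		continue;
	if (copy_pte_range(dst_mm, src_mm, dst_pmd, src_pmd,
			   vma, addr, next))
		return -ENOMEM;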

Huge PUDs are even more special - so far they do not support anonymous
pages.  But they can be handled the same way as huge PMDs, even if
splitting a huge PUD means erasing the PUD entry altogether.  That
still guarantees that the follow-up fault-ins will remap the same pages
in both parent and child later.
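
Roughly speaking (paraphrasing __split_huge_pud_locked() in
mm/huge_memory.c at this point in time, not code added by this commit),
a huge PUD has nothing smaller to downgrade into, so the "split" just
zaps the entry and relies on later faults to refill the range:

	/* Nothing to split into: clear the entry, let faults remap it */
	count_vm_event(THP_SPLIT_PUD);
	pudp_huge_clear_flush_notify(vma, haddr, pud);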

This might not be the most efficient way, but it should be easy and
clean enough.  It should be fine, since we're tackling a very rare case
just to make sure userspace that pinned some THPs will still work after
fork() even without MADV_DONTFORK.
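
For completeness, a hypothetical userspace sketch (function and
variable names are illustrative only) of the still-recommended
MADV_DONTFORK approach that this commit merely acts as a fallback for:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Allocate a buffer a driver will later pin for DMA.  Advising
 * MADV_DONTFORK means a child created by fork() never maps it, so no
 * copy-on-write can ever move the pinned pages out from under the
 * device. */
static void *alloc_dma_buffer(size_t len)
{
	void *buf = NULL;

	if (posix_memalign(&buf, 2UL << 20, len))	/* 2MB-align: THP-friendly */
		return NULL;

	if (madvise(buf, len, MADV_DONTFORK) != 0)
		perror("madvise(MADV_DONTFORK)");	/* kernel fallback still applies */

	return buf;	/* hand off to the driver, which pins it */
}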

Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
xzpeter authored and torvalds committed Sep 27, 2020
1 parent 70e806e commit d042035
Showing 1 changed file with 28 additions and 0 deletions.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
@@ -1074,6 +1074,24 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	src_page = pmd_page(pmd);
 	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+
+	/*
+	 * If this page is a potentially pinned page, split and retry the fault
+	 * with smaller page size. Normally this should not happen because the
+	 * userspace should use MADV_DONTFORK upon pinned regions. This is a
+	 * best effort that the pinned pages won't be replaced by another
+	 * random page during the coming copy-on-write.
+	 */
+	if (unlikely(is_cow_mapping(vma->vm_flags) &&
+		     atomic_read(&src_mm->has_pinned) &&
+		     page_maybe_dma_pinned(src_page))) {
+		pte_free(dst_mm, pgtable);
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
+		return -EAGAIN;
+	}
+
 	get_page(src_page);
 	page_dup_rmap(src_page, true);
 	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
@@ -1177,6 +1195,16 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		/* No huge zero pud yet */
 	}
 
+	/* Please refer to comments in copy_huge_pmd() */
+	if (unlikely(is_cow_mapping(vma->vm_flags) &&
+		     atomic_read(&src_mm->has_pinned) &&
+		     page_maybe_dma_pinned(pud_page(pud)))) {
+		spin_unlock(src_ptl);
+		spin_unlock(dst_ptl);
+		__split_huge_pud(vma, src_pud, addr);
+		return -EAGAIN;
+	}
+
 	pudp_set_wrprotect(src_mm, addr, src_pud);
 	pud = pud_mkold(pud_wrprotect(pud));
 	set_pud_at(dst_mm, addr, dst_pud, pud);
