- phys_to_virt() - Translates a physical address to a virtual one.
- virt_to_phys() - Translates a virtual address to a physical one.
- virt_to_page() - Retrieves the struct page that describes the page containing the specified virtual address.
- page_to_pfn() - Converts a struct page to its corresponding Page Frame Number (PFN.)
- page_to_phys() - Retrieves the physical address of the page described by the specified struct page.
- pfn_to_page() - Converts a Page Frame Number (PFN) to its corresponding struct page.
- __va() - Translates a physical address to a virtual one.
- __pa() - Translates a virtual address to a physical one.
- page_address() - Returns the virtual address of a physical struct page.
- pmd_to_page() - Returns the struct page associated with the PMD page the specified PMD entry belongs to.
- virt_addr_valid() - Determines if a specified virtual address is a valid kernel virtual address.
- pfn_valid() - Determines if a specified Page Frame Number (PFN) represents a valid physical address.
- pte_same() - Determines whether the two specified PTE entries refer to the same physical page and flags.
- flush_tlb() - Flushes the current struct mm_struct's TLB entries.
- flush_tlb_all() - Flushes all processes' TLB entries.
- flush_tlb_mm() - Flushes the specified struct mm_struct's TLB entries.
- flush_tlb_page() - Flushes a single struct vm_area_struct-specified page's TLB entry.
- flush_tlb_range() - Flushes the TLB entries for a range of struct vm_area_struct-specified addresses.
- flush_tlb_mm_range() - Flushes a range of a specified struct mm_struct's TLB entries.
- flush_tlb_kernel_range() - Flushes a range of kernel pages.
- flush_tlb_others() - Flushes a range of a specified struct mm_struct's TLB entries on other CPUs.
void *phys_to_virt(phys_addr_t address)
phys_to_virt() translates the physical address address to a kernel-mapped virtual one.
Wrapper around __va().
Arguments:
- address - Physical address to be translated.
Returns: Kernel-mapped virtual address.
phys_addr_t virt_to_phys(volatile void *address)
virt_to_phys() translates the kernel-mapped virtual address address to a physical one.
Wrapper around __pa().
Arguments:
- address - Kernel-mapped virtual address to be translated.
Returns: Physical address associated with the specified virtual address.
void *__va(phys_addr_t address)
__va() does the heavy lifting for phys_to_virt(), converting a physical memory address to a kernel-valid virtual one.
This function simply adds the kernel memory offset PAGE_OFFSET, 0xffff880000000000 for x86-64. Looking at the x86-64 memory map, you can see this simply provides a virtual address that is part of the kernel's complete direct mapping of all physical memory between 0xffff880000000000 and 0xffffc7ffffffffff.
The function performs no checks and assumes the supplied physical address is valid and lies within the 64TiB of memory allowed by current x86-64 Linux kernel implementations.
NOTE: Macro, inferring function signature.
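To make the arithmetic concrete, here is a minimal userspace sketch of the offset addition described above; the PAGE_OFFSET value is the classic non-KASLR x86-64 base quoted above and the code is purely illustrative, not the kernel's actual macro:

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed value: the classic (non-KASLR) x86-64 direct mapping base. */
#define SKETCH_PAGE_OFFSET 0xffff880000000000UL

/* Model of __va(): in the direct mapping, virtual = physical + PAGE_OFFSET. */
void *va_sketch(uint64_t phys)
{
	return (void *)(phys + SKETCH_PAGE_OFFSET);
}

int main(void)
{
	/* e.g. physical 0x1000 maps to virtual 0xffff880000001000. */
	printf("%p\n", va_sketch(0x1000));
	return 0;
}
```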
phys_addr_t __pa(volatile void *address)
__pa() does the heavy lifting for virt_to_phys(), converting a kernel virtual memory address to a physical one. The function is a wrapper around __phys_addr().
The function isn't quite as simple as __va(), as it has to determine whether the supplied virtual address is part of the kernel's direct mapping of all physical memory between 0xffff880000000000 and 0xffffc7ffffffffff, or whether it's part of the kernel's 'text' mapping from __START_KERNEL_map onwards (0xffffffff80000000 for x86-64), and offsets the supplied virtual address accordingly.
In the case of an address originating from the kernel's 'text', the physical offset of the kernel, phys_base, is taken into account, which allows the kernel to be loaded at a different physical location, e.g. when kdumping.
NOTE: Macro, inferring function signature.
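As a rough illustration of the branch described above, the following userspace sketch models the __phys_addr() decision under the same non-KASLR assumptions; phys_base is shown as a plain variable and none of the kernel's debug checks are reproduced:

```c
#include <stdint.h>

#define SKETCH_PAGE_OFFSET      0xffff880000000000UL
#define SKETCH_START_KERNEL_MAP 0xffffffff80000000UL

/* Physical offset the kernel was loaded at (non-zero when e.g. kdumping). */
uint64_t sketch_phys_base;

/* Model of __pa()/__phys_addr(): pick the mapping the address belongs to. */
uint64_t pa_sketch(uint64_t virt)
{
	if (virt >= SKETCH_START_KERNEL_MAP)
		/* Kernel 'text' mapping: offset by __START_KERNEL_map plus phys_base. */
		return virt - SKETCH_START_KERNEL_MAP + sketch_phys_base;

	/* Otherwise assume the direct mapping of all physical memory. */
	return virt - SKETCH_PAGE_OFFSET;
}
```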
void *page_address(const struct page *page)
page_address() determines the virtual address of the specified physical struct page.
In x86-64 the implementation is straightforward and provided by lowmem_page_address() (as we have no high memory to worry about.) We simply obtain the PFN of the page via page_to_pfn(), translate it to a physical page via PFN_PHYS() and return the kernel-mapped virtual address via __va().
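A minimal sketch of that chain, assuming the same non-KASLR PAGE_OFFSET and 4KiB pages as above and reducing struct page to a placeholder; it mirrors the PFN -> physical -> virtual steps rather than reproducing lowmem_page_address() itself:

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12
#define SKETCH_PAGE_OFFSET 0xffff880000000000UL

/* Placeholder struct page and vmemmap base - illustrative values only. */
struct sketch_page { unsigned long flags; };
#define SKETCH_VMEMMAP ((struct sketch_page *)0xffffea0000000000UL)

/* page -> PFN -> physical address -> kernel-mapped virtual address. */
void *page_address_sketch(const struct sketch_page *page)
{
	uint64_t pfn  = (uint64_t)(page - SKETCH_VMEMMAP); /* page_to_pfn() */
	uint64_t phys = pfn << SKETCH_PAGE_SHIFT;          /* PFN_PHYS()    */

	return (void *)(phys + SKETCH_PAGE_OFFSET);        /* __va()        */
}
```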
Arguments:
- page - The physical page whose virtual address we desire.
Returns: The virtual address mapped to the specified physical struct page.
struct page *pmd_to_page(pmd_t *pmd)
pmd_to_page() returns the struct page describing the physical page containing the PMD page in which the PMD entry pmd resides.
The function works by masking out the offset of the entry within the page using the mask ~(PTRS_PER_PMD * sizeof(pmd_t) - 1), which ultimately page-aligns the virtual address.
Note that we're dealing with the address of the entry, utterly ignoring the entry's contents, so the virtual address refers to the PMD page rather than what the PMD entry is pointing at.
Finally, the function returns the struct page associated with the PMD page via virt_to_page().
The function is used by the split page table lock functionality in the kernel.
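A sketch of the masking step, assuming the usual x86-64 values of 512 PMD entries of 8 bytes each (so the mask clears the low 12 bits, i.e. page-aligns the entry's address); the real function then hands the result to virt_to_page():

```c
#include <stdint.h>

#define SKETCH_PTRS_PER_PMD 512UL

typedef struct { uint64_t pmd; } sketch_pmd_t;

/* Page-align the address of a PMD entry to find its containing PMD page. */
unsigned long pmd_page_vaddr_sketch(sketch_pmd_t *pmd)
{
	/* 512 entries * 8 bytes = 4096, so this masks off the low 12 bits. */
	unsigned long mask = ~(SKETCH_PTRS_PER_PMD * sizeof(sketch_pmd_t) - 1);

	return (unsigned long)pmd & mask;
}
```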
Arguments:
- pmd - The PMD entry whose PMD page we want to obtain the struct page for.
Returns: The struct page describing the PMD page the specified PMD entry belongs to.
struct page *virt_to_page(unsigned long kaddr)
virt_to_page() determines the physical address of the specified kernel virtual address, then the Page Frame Number (PFN) of the physical page that contains it, and finally passes this to pfn_to_page() to retrieve the struct page which describes the physical page.
Important: As per the code comment above the function, the returned pointer is valid if and only if virt_addr_valid(kaddr) returns true.
NOTE: Macro, inferring function signature.
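Putting the pieces together, a hedged sketch of that composition (kernel virtual address -> physical address -> PFN -> struct page), again using illustrative constants and a placeholder struct page rather than the kernel's real macros, and handling only the direct-mapping case:

```c
#include <stdint.h>

#define SKETCH_PAGE_OFFSET 0xffff880000000000UL
#define SKETCH_PAGE_SHIFT  12

struct sketch_page { unsigned long flags; };
#define SKETCH_VMEMMAP ((struct sketch_page *)0xffffea0000000000UL)

/* kernel virtual -> physical (direct mapping only) -> PFN -> struct page. */
struct sketch_page *virt_to_page_sketch(unsigned long kaddr)
{
	uint64_t phys = kaddr - SKETCH_PAGE_OFFSET; /* __pa(), direct map case */
	uint64_t pfn  = phys >> SKETCH_PAGE_SHIFT;  /* PFN of containing page  */

	return SKETCH_VMEMMAP + pfn;                /* pfn_to_page()           */
}
```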
Arguments:
- kaddr - The kernel virtual address whose struct page we desire.
Returns: The struct page describing the physical page the specified virtual address resides in.
unsigned long page_to_pfn(struct page *page)
page_to_pfn() returns the Page Frame Number (PFN) that is associated with the specified struct page.
The PFN of a physical address is simply the (masked) address's value shifted right by the number of page-offset bits, which in a standard x86-64 configuration is 12 bits (matching the default 4KiB page size), so pfn = masked_phys_addr >> 12.
How the PFN is determined varies depending on the memory model; in x86-64 UMA this is __page_to_pfn() under CONFIG_SPARSEMEM_VMEMMAP - the memory map is virtually contiguous at vmemmap (0xffffea0000000000, see the x86-64 memory map.)
This makes the implementation of the function straightforward - simply subtract vmemmap from the page pointer (being careful with typing to have pointer arithmetic take into account sizeof(struct page) for you.)
NOTE: Macro, inferring function signature.
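To illustrate the point about typing, the sketch below (using a placeholder struct page and the vmemmap base assumed above) shows that subtracting a typed vmemmap pointer divides by sizeof(struct page) implicitly, which is exactly what explicit byte arithmetic would have to do by hand:

```c
#include <assert.h>
#include <stdint.h>

struct sketch_page { unsigned long flags; unsigned long private; };
#define SKETCH_VMEMMAP ((struct sketch_page *)0xffffea0000000000UL)

unsigned long page_to_pfn_sketch(struct sketch_page *page)
{
	/* Typed pointer arithmetic scales by sizeof(struct sketch_page)... */
	unsigned long pfn = (unsigned long)(page - SKETCH_VMEMMAP);

	/* ...which matches dividing the raw byte difference by hand. */
	assert(pfn == ((uintptr_t)page - (uintptr_t)SKETCH_VMEMMAP) /
		      sizeof(struct sketch_page));

	return pfn;
}
```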
Arguments:
- page - The struct page whose corresponding Page Frame Number (PFN) is desired.
Returns: The PFN of the specified struct page.
dma_addr_t page_to_phys(struct page *page)
page_to_phys() returns a physical address for the page described by the specified struct page.
Oddly it seems to return a dma_addr_t, possibly due to its use for device I/O (it is declared in arch/x86/include/asm/io.h), however on x86-64 it makes no difference as dma_addr_t and phys_addr_t are the same - unsigned long (64-bit.)
NOTE: Macro, inferring function signature.
Arguments:
- page - The struct page whose physical start address we desire.
Returns: The physical address of the start of the page which the specified struct page describes.
struct page *pfn_to_page(unsigned long pfn)
pfn_to_page() returns the struct page that is associated with the specified Page Frame Number (PFN.)
The PFN of a physical address is simply the (masked) address's value shifted right by the number of page-offset bits, which in a standard x86-64 configuration is 12 bits (matching the default 4KiB page size), so pfn = masked_phys_addr >> 12.
How the struct page is located varies depending on the memory model; in x86-64 UMA this is __pfn_to_page() under CONFIG_SPARSEMEM_VMEMMAP - the memory map is virtually contiguous at vmemmap (0xffffea0000000000, see the x86-64 memory map.)
This makes the implementation of the function straightforward - simply offset the PFN by vmemmap (being careful with typing to have pointer arithmetic take into account sizeof(struct page) for you, as illustrated in the sketch under page_to_pfn() above; this is simply the inverse operation.)
NOTE: Macro, inferring function signature.
Arguments:
- pfn - The Page Frame Number (PFN) whose corresponding struct page is desired.
Returns: The struct page that describes the physical page with the specified PFN.
bool virt_addr_valid(unsigned long kaddr)
virt_addr_valid() determines whether the specified virtual address kaddr is actually a valid, non-vmalloc'd kernel address.
The function is a wrapper for __virt_addr_valid() which, once it has checked that the virtual address is in a valid range, checks that it has a valid corresponding physical PFN via pfn_valid().
NOTE: Macro, inferring function signature.
Arguments:
- kaddr - Virtual address which we want to determine is a valid non-vmalloc'd kernel address or not.
Returns: true if the address is valid, false if not.
int pfn_valid(unsigned long pfn)
pfn_valid() determines whether the specified Page Frame Number (PFN) is valid, i.e. on x86-64 whether it refers to a valid 46-bit physical address and whether there is actually physical memory mapped at that location.
NOTE: Macro, inferring function signature.
Arguments:
- pfn - PFN whose validity we wish to determine.
Returns: Truthy (non-zero) if the PFN is valid, 0 if not.
int pte_same(pte_t a, pte_t b)
pte_same() determines whether the two specified PTE entries refer to the same physical page AND share the same flags.
On x86-64 it's as simple as a.pte == b.pte.
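A tiny sketch of the comparison with a placeholder pte_t; because the raw values are compared, two entries that point at the same physical page but differ in a flag bit (e.g. one read-write, one read-only) are not considered the same:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t pte; } sketch_pte_t;

/* Compare raw PTE values: both the physical address bits and the flag bits
 * must match for the entries to be considered identical. */
bool pte_same_sketch(sketch_pte_t a, sketch_pte_t b)
{
	return a.pte == b.pte;
}
```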
Arguments:
- a - The first PTE entry whose physical page address and flags we want to compare.
- b - The second PTE entry whose physical page address and flags we want to compare.
Returns: 1 if the PTE entries' physical address and flags are the same, 0 if not.
void flush_tlb(void)
flush_tlb() is a wrapper around flush_tlb_current_task() which flushes the current struct mm_struct's TLB mappings.
flush_tlb_current_task() checks whether any other CPUs use the current struct mm_struct, and if so invokes flush_tlb_others() to flush the TLB entries for those CPUs too.
Ultimately the flushing is performed by local_flush_tlb(), which is a wrapper around __flush_tlb(), which itself wraps __native_flush_tlb(), which flushes the CPU's TLB by simply reading and writing back the contents of the cr3 register.
NOTE: Macro, inferring function signature.
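The cr3 read/write at the bottom of that call chain looks roughly like the following GCC-style inline assembly sketch, along the lines of __native_flush_tlb(); it needs ring 0 to actually execute and omits version-specific details such as preemption handling:

```c
/* Flush non-global TLB entries by rewriting cr3 with its current value.
 * Sketch only - compiles with GCC/Clang on x86-64, requires ring 0 to run. */
static inline void flush_tlb_sketch(void)
{
	unsigned long cr3;

	asm volatile("mov %%cr3, %0" : "=r"(cr3));
	asm volatile("mov %0, %%cr3" : : "r"(cr3) : "memory");
}
```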
Arguments: N/A
Returns: N/A
void flush_tlb_all(void)
flush_tlb_all() flushes the TLB entries for all processes on all CPUs.
Note that this function also causes mappings that have _PAGE_GLOBAL set to be evicted, using the invpcid instruction on modern CPUs (via __flush_tlb_all(), __flush_tlb_global() and subsequently invpcid_flush_all().)
Arguments: N/A
Returns: N/A
void flush_tlb_mm(struct mm_struct *mm)
flush_tlb_mm() is simply a wrapper around flush_tlb_mm_range(), specifying that the whole memory address space mm describes is to be flushed, with no flags specified.
See the description of flush_tlb_mm_range() below for details of how the flush is implemented.
NOTE: Macro, inferring function signature.
Arguments:
- mm - The struct mm_struct whose TLB entries we want flushed.
Returns: N/A
void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
flush_tlb_page() flushes a single page's TLB mapping at the specified address.
If the struct vm_area_struct specified by vma does not refer to the active struct mm_struct, then the operation is a no-op on this CPU, as there's nothing to flush.
If the current process is a kernel thread (i.e. current->mm == NULL), which indicates the current CPU is in lazy TLB mode, we invoke leave_mm() to switch out the struct mm_struct we are 'borrowing' and we're done.
Otherwise, ultimately the invlpg instruction is invoked (ignoring the paravirtualised case where a hypervisor function is called directly) via __flush_tlb_one(), __flush_tlb_single() and __native_flush_tlb_single().
Intel's documentation on the invlpg instruction indicates that the specified address does not need to be page-aligned, and that in the case of pages larger than 4KiB with multiple TLB entries for that page, all will be safely flushed. Additionally, under certain circumstances more or even all TLB entries may be flushed, though this is presumably unlikely.
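For reference, the single-page flush at the bottom of that chain boils down to an invlpg on the supplied address, roughly as in this privileged-mode sketch (again GCC-style inline assembly, not the kernel's exact helper):

```c
/* Invalidate the TLB entry (or entries) for the page containing addr.
 * Must run in ring 0; the address need not be page-aligned. */
static inline void flush_tlb_one_sketch(unsigned long addr)
{
	asm volatile("invlpg (%0)" : : "r"(addr) : "memory");
}
```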
Arguments:
- vma - The struct vm_area_struct which contains the struct mm_struct which in turn describes the page whose TLB entry we want to flush.
- start - A virtual address contained in the page we want to flush.
Returns: N/A
void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end)
flush_tlb_range() is a wrapper around flush_tlb_mm_range(); it uses the struct mm_struct belonging to the specified struct vm_area_struct vma, i.e. vma->vm_mm, as well as the VMA's flags, i.e. vma->vm_flags, and simply passes these on:
```c
#define flush_tlb_range(vma, start, end) \
	flush_tlb_mm_range(vma->vm_mm, start, end, vma->vm_flags)
```
See flush_tlb_mm_range() below for more details as to how the range flush is achieved.
NOTE: Macro, inferring function signature.
Arguments:
- vma - The struct vm_area_struct which contains the range of addresses we wish to TLB flush.
- start - The start of the range of virtual addresses we want to TLB flush.
- end - The exclusive upper bound of the range of virtual addresses we wish to TLB flush.
Returns: N/A
void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
unsigned long end, unsigned long vmflag)
flush_tlb_mm_range() causes a TLB flush to occur in the specified virtual address range (note that end is an exclusive bound.)
If the current process's struct task_struct's active_mm field doesn't match the specified mm argument, there's nothing to do, so we exit.
If the current process is a kernel thread (i.e. current->mm == NULL), which indicates the current CPU is in lazy TLB mode, we invoke leave_mm() to switch out the struct mm_struct we are 'borrowing' and we're done.
If the vmflag flag field has its VM_HUGETLB bit set, i.e. the page range includes huge pages, or the range is specified to be a full flush (i.e. end == TLB_FLUSH_ALL), then a full flush is performed.
Otherwise, the number of pages to flush is compared to the tlb_single_page_flush_ceiling variable, which is set to 33 by default and is tunable via /sys/kernel/debug/x86/tlb_single_page_flush_ceiling. If this value is set to 0, page-granularity flushes are not performed at all.
If the number of pages to flush exceeds this value, then a full flush is performed instead. The Documentation/x86/tlb.txt documentation goes into more detail as to the trade-off between a full flush and individual flushes, and the code comment above this value explains:
```c
/*
* See Documentation/x86/tlb.txt for details. We choose 33
* because it is large enough to cover the vast majority (at
* least 95%) of allocations, and is small enough that we are
* confident it will not cause too much overhead. Each single
* flush is about 100 ns, so this caps the maximum overhead at
* _about_ 3,000 ns.
*
* This is in units of pages.
 */
```
If any of the conditions for a full flush are met, local_flush_tlb() is called.
Otherwise, each page is flushed via __flush_tlb_single(), and ultimately the x86 invlpg instruction is invoked to perform the page-level flushing, unless paravirtualisation (e.g. Xen) is in place, in which case a hypervisor function is called directly.
If the struct mm_struct is in use by other CPUs, flush_tlb_others() is invoked to perform a TLB flush for those CPUs also.
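The decision flow described above can be summarised in the following self-contained sketch; every sketch_-prefixed type and helper is an illustrative stand-in, and the real function additionally handles preemption, event accounting and the paravirtualised cases:

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT    12
#define SKETCH_PAGE_SIZE     (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_TLB_FLUSH_ALL (~0UL)
#define SKETCH_VM_HUGETLB    0x00400000UL

struct sketch_mm { int id; };
struct sketch_task { struct sketch_mm *mm, *active_mm; };

struct sketch_task *sketch_current;
unsigned long sketch_single_page_flush_ceiling = 33;

void sketch_leave_mm(void) { /* drop the borrowed mm (lazy TLB mode) */ }
void sketch_local_flush_tlb(void) { /* full flush: rewrite cr3 */ }
void sketch_flush_tlb_single(unsigned long addr) { (void)addr; /* invlpg */ }
bool sketch_mm_used_elsewhere(struct sketch_mm *mm) { (void)mm; return false; }
void sketch_flush_tlb_others(struct sketch_mm *mm, unsigned long start,
			     unsigned long end) { (void)mm; (void)start; (void)end; }

void sketch_flush_tlb_mm_range(struct sketch_mm *mm, unsigned long start,
			       unsigned long end, unsigned long vmflag)
{
	bool full = true;

	if (sketch_current->active_mm != mm)
		goto out;                /* Not our address space: nothing local. */

	if (!sketch_current->mm) {
		sketch_leave_mm();       /* Kernel thread in lazy TLB mode. */
		goto out;
	}

	/* Per-page flushes only for small, non-huge, bounded ranges. */
	if (end != SKETCH_TLB_FLUSH_ALL && !(vmflag & SKETCH_VM_HUGETLB) &&
	    (end - start) >> SKETCH_PAGE_SHIFT <= sketch_single_page_flush_ceiling)
		full = false;

	if (full) {
		sketch_local_flush_tlb();               /* One global flush. */
	} else {
		unsigned long addr;

		for (addr = start; addr < end; addr += SKETCH_PAGE_SIZE)
			sketch_flush_tlb_single(addr);  /* invlpg per page. */
	}
out:
	/* Mirror the flush on any other CPUs using this mm. */
	if (sketch_mm_used_elsewhere(mm))
		sketch_flush_tlb_others(mm, start, end);
}
```

In the kernel itself, when a full local flush is chosen the range forwarded to flush_tlb_others() is also widened to a full flush.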
Arguments:
- mm - The struct mm_struct which contains the range of addresses we wish to TLB flush.
- start - The start of the range of virtual addresses we want to TLB flush.
- end - The exclusive upper bound of the range of virtual addresses we wish to TLB flush.
- vmflag - The struct vm_area_struct flags associated with this region of memory, used only to determine if VM_HUGETLB is set.
Returns: N/A
void flush_tlb_kernel_range(unsigned long start, unsigned long end)
flush_tlb_kernel_range() TLB flushes the specified range of kernel virtual addresses.
If the end argument is set to TLB_FLUSH_ALL, or the specified address range exceeds the tlb_single_page_flush_ceiling variable (see the description of flush_tlb_mm_range() above for more details on this), a global flush is performed on each CPU via __flush_tlb_all(). This is necessary as kernel mappings are marked _PAGE_GLOBAL.
Otherwise, individual pages are flushed one-by-one via __flush_tlb_single().
Arguments:
- start - The start of the range of kernel virtual addresses we want to TLB flush.
- end - The exclusive upper bound of the range of kernel virtual addresses we wish to TLB flush.
Returns: N/A
void flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm, unsigned long start,
unsigned long end)
flush_tlb_others() flushes the TLB for CPUs other than the one invoking the call. It's used by the other TLB flush functions to ensure that TLB flushes are performed across all pertinent CPUs (i.e. each CPU which references a given struct mm_struct or, in the case of full flushes, all CPUs.)
Other CPUs are made to perform a flush via Inter-Processor Interrupts (IPIs) using smp_call_function_many(), each of which invokes flush_tlb_func().
flush_tlb_func() operates as follows:
If the current->active_mm struct mm_struct is not the same as the one requested to be flushed, the function exits. Additionally, if the CPU TLB state is set to lazy TLB, leave_mm() is invoked to switch out the struct mm_struct we are 'borrowing' and we're done.
Otherwise, it is checked whether a full flush is requested; if so, this is performed via local_flush_tlb(). If a full flush is not requested, individual pages are evicted via __flush_tlb_single(). Interestingly, no check against tlb_single_page_flush_ceiling is performed, presumably because flush_tlb_others() is only used by the other TLB flushing functions, which will already have taken the ceiling into account.
flush_tlb_func() has some tricky timing concerns because of the invocation of an IPI and the possibility of a struct mm_struct being switched out from under the function via switch_mm(), as described by its comments:
```c
/*
* The flush IPI assumes that a thread switch happens in this order:
* [cpu0: the cpu that switches]
* 1) switch_mm() either 1a) or 1b)
* 1a) thread switch to a different mm
* 1a1) set cpu_tlbstate to TLBSTATE_OK
* Now the tlb flush NMI handler flush_tlb_func won't call leave_mm
* if cpu0 was in lazy tlb mode.
* 1a2) update cpu active_mm
* Now cpu0 accepts tlb flushes for the new mm.
* 1a3) cpu_set(cpu, new_mm->cpu_vm_mask);
* Now the other cpus will send tlb flush ipis.
* 1a4) change cr3.
* 1a5) cpu_clear(cpu, old_mm->cpu_vm_mask);
* Stop ipi delivery for the old mm. This is not synchronized with
* the other cpus, but flush_tlb_func ignore flush ipis for the wrong
* mm, and in the worst case we perform a superfluous tlb flush.
* 1b) thread switch without mm change
* cpu active_mm is correct, cpu0 already handles flush ipis.
* 1b1) set cpu_tlbstate to TLBSTATE_OK
* 1b2) test_and_set the cpu bit in cpu_vm_mask.
* Atomically set the bit [other cpus will start sending flush ipis],
* and test the bit.
* 1b3) if the bit was 0: leave_mm was called, flush the tlb.
* 2) switch %%esp, ie current
*
* The interrupt must handle 2 special cases:
* - cr3 is changed before %%esp, ie. it cannot use current->{active_,}mm.
* - the cpu performs speculative tlb reads, i.e. even if the cpu only
* runs in kernel space, the cpu could load tlb entries for user space
* pages.
*
* The good news is that cpu_tlbstate is local to each cpu, no
* write/read ordering problems.
*/
/*
* TLB flush funcation:
* 1) Flush the tlb entries if the cpu uses the mm that's being flushed.
* 2) Leave the mm if we are in the lazy tlb mode.
 */
```
NOTE: Macro, inferring function signature.
Arguments:
- cpumask - The mask representing the CPUs which are to be TLB flushed.
- mm - The struct mm_struct which contains the range of addresses we wish to TLB flush.
- start - The start of the range of virtual addresses we want to TLB flush.
- end - The exclusive upper bound of the range of virtual addresses we wish to TLB flush.
Returns: N/A