Bpftool sync 2024-03-26 #138

Merged
merged 17 commits into from
Mar 26, 2024

Commits on Mar 26, 2024

  1. mirror: Add u16 definition to types.h

    We'll have a value cast as a u16 in src/kernel/bpf/disasm.c in a future
    commit. Add the type definition to the relevant header.
    
    Signed-off-by: Quentin Monnet <[email protected]>
    qmonnet committed Mar 26, 2024
    933ac5c
  2. sync: Update libbpf submodule

    Pull latest libbpf from mirror.
    Libbpf version: 1.4.0
    Libbpf commit:  20ea95b4505c477af3b6ff6ce9d19cee868ddc5d
    
    Signed-off-by: Quentin Monnet <[email protected]>
    qmonnet committed Mar 26, 2024
    2d219fb
  3. bpftool: rename is_internal_mmapable_map into is_mmapable_map

    It's not restricted to working with "internal" maps; it cares about any
    map that can be mmap'ed. Reflect that in a more succinct and generic name.
    
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Acked-by: Quentin Monnet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin KaFai Lau <[email protected]>
    anakryiko authored and qmonnet committed Mar 26, 2024
    75beab8
  4. bpf: Introduce bpf_arena.

    Introduce bpf_arena, which is a sparse shared memory region between the bpf
    program and user space.
    
    Use cases:
    1. User space mmap-s bpf_arena and uses it as a traditional mmap-ed
       anonymous region, like memcached or any key/value storage. The bpf
       program implements an in-kernel accelerator. XDP prog can search for
       a key in bpf_arena and return a value without going to user space.
    2. The bpf program builds arbitrary data structures in bpf_arena (hash
       tables, rb-trees, sparse arrays), while user space consumes it.
    3. bpf_arena is a "heap" of memory from the bpf program's point of view.
       User space may mmap it, but the bpf program will not convert pointers
       to the user base at run-time, which improves bpf program speed.
    
    Initially, the kernel vm_area and user vma are not populated. User space
    can fault in pages within the range. While servicing a page fault,
    bpf_arena logic will insert a new page into the kernel and user vmas. The
    bpf program can allocate pages from that region via
    bpf_arena_alloc_pages(). This kernel function will insert pages into the
    kernel vm_area. The subsequent fault-in from user space will populate that
    page into the user vma. The BPF_F_SEGV_ON_FAULT flag at arena creation time
    can be used to prevent fault-in from user space. In such a case, if a page
    is not allocated by the bpf program and not present in the kernel vm_area,
    the user process will segfault. This is useful for use cases 2 and 3 above.
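The fault-in rules above can be modelled in plain user-space C. This is a minimal sketch, not kernel code: `struct arena_model`, `arena_user_fault()` and the `segv_on_fault` field are hypothetical names standing in for the arena's page bookkeeping and the BPF_F_SEGV_ON_FAULT flag, and a "page" is just a flag in an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARENA_PAGES 8 /* hypothetical arena size, in pages */

/* Hypothetical model of an arena: one "present" flag per page of the
 * kernel vm_area, plus whether BPF_F_SEGV_ON_FAULT was set at creation. */
struct arena_model {
	bool present[ARENA_PAGES];
	bool segv_on_fault;
};

enum fault_result { FAULT_MAPPED, FAULT_SEGV };

/* Model of a user-space page fault on page index pgoff of the arena. */
static enum fault_result arena_user_fault(struct arena_model *a, size_t pgoff)
{
	if (pgoff >= ARENA_PAGES)
		return FAULT_SEGV;   /* outside the mapping */
	if (a->present[pgoff])
		return FAULT_MAPPED; /* page already allocated, e.g. by the bpf program */
	if (a->segv_on_fault)
		return FAULT_SEGV;   /* fault-in disabled: the process segfaults */
	a->present[pgoff] = true;    /* insert a new page into kernel and user vmas */
	return FAULT_MAPPED;
}
```

With the flag set, user space can only touch pages the bpf program has already allocated, which is what use cases 2 and 3 want.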
    
    bpf_arena_alloc_pages() is similar to user space mmap(). It allocates pages
    either at a specific address within the arena or allocates a range with the
    maple tree. bpf_arena_free_pages() is analogous to munmap(), which frees
    pages and removes the range from the kernel vm_area and from user process
    vmas.
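The mmap()/munmap()-like behaviour can be illustrated with a toy page allocator. The kernel tracks ranges with a maple tree; this sketch swaps that for a plain bitmap with a first-fit search to stay short, and works on page indices instead of addresses. `model_alloc_pages()` and `model_free_pages()` are hypothetical and are not the kfunc signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16 /* hypothetical arena size, in pages */

static bool used[NPAGES]; /* stands in for the kernel's maple tree */

/* Allocate cnt pages. want < 0 means "any free range" (first fit);
 * otherwise allocate exactly at page index want. Returns the first
 * page index, or -1 if no suitable free range exists. */
static long model_alloc_pages(long want, size_t cnt)
{
	long first, last;

	if (cnt == 0 || cnt > NPAGES)
		return -1;
	first = want < 0 ? 0 : want;
	last = want < 0 ? (long)(NPAGES - cnt) : want;

	for (long s = first; s <= last; s++) {
		size_t i;

		for (i = 0; i < cnt; i++)
			if (s + (long)i >= NPAGES || used[s + i])
				break;
		if (i < cnt)
			continue; /* range not free, try the next start */
		for (i = 0; i < cnt; i++)
			used[s + i] = true;
		return s;
	}
	return -1;
}

/* munmap()-like: release cnt pages starting at page index s. */
static void model_free_pages(long s, size_t cnt)
{
	for (size_t i = 0; s >= 0 && i < cnt && s + (long)i < NPAGES; i++)
		used[s + i] = false;
}
```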
    
    bpf_arena can be used as a bpf program "heap" of up to 4GB when the speed
    of the bpf program matters more than ease of sharing with user space. This
    is use case 3. In such a case, the BPF_F_NO_USER_CONV flag is recommended.
    It will tell the verifier to treat the rX = bpf_arena_cast_user(rY)
    instruction as a 32-bit move wX = wY, which will improve bpf prog
    performance. Otherwise, bpf_arena_cast_user is translated by JIT to
    conditionally add the upper 32 bits of user vm_start (if the pointer is not
    NULL) to arena pointers before they are stored into memory. This way, user
    space sees them as valid 64-bit pointers.
    
    Diff llvm/llvm-project#84410 enables the LLVM BPF backend to generate the
    bpf_addr_space_cast() instruction to cast pointers between
    address_space(1), which is reserved for bpf_arena pointers, and the
    default address space zero. All arena pointers in a bpf program written in
    C are tagged as __attribute__((address_space(1))). Hence, clang provides
    helpful diagnostics when pointers cross address spaces. Libbpf and the
    kernel support only address_space == 1. All other address space
    identifiers are reserved.
    
    rX = bpf_addr_space_cast(rY, /* dst_as */ 1, /* src_as */ 0) tells the
    verifier that rX->type = PTR_TO_ARENA. Any further operations on
    PTR_TO_ARENA register have to be in the 32-bit domain. The verifier will
    mark load/store through PTR_TO_ARENA with PROBE_MEM32. JIT will generate
    them as kern_vm_start + 32bit_addr memory accesses. The behavior is similar
    to copy_from_kernel_nofault() except that no address checks are necessary.
    The address is guaranteed to be in the 4GB range. If the page is not
    present, the destination register is zeroed on read, and the operation is
    ignored on write.
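To make the PROBE_MEM32 semantics concrete, here is a user-space model rather than the JIT's actual output: an arena access uses only the low 32 bits of the register as an offset from kern_vm_start, a read from a missing page yields zero, and a write to one is dropped. `probe_mem32_load()`, `probe_mem32_store()`, the 4 KiB page size and the four-entry page table are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096u /* assumed page size */
#define NPAGES 4u     /* assumed number of modelled pages */

/* pages[i] == NULL models a page not present in the kernel vm_area. */
struct arena_vm {
	uint8_t *pages[NPAGES];
};

/* Model of an 8-byte PROBE_MEM32 load: kern_vm_start + (u32)reg.
 * Assumes 8-byte aligned accesses, so one never straddles two pages. */
static uint64_t probe_mem32_load(const struct arena_vm *vm, uint64_t reg)
{
	uint32_t off = (uint32_t)reg; /* upper 32 bits are ignored */
	uint32_t pg = off / PAGE_SZ;
	uint64_t val = 0;             /* zeroed if the page is absent */

	if (pg < NPAGES && vm->pages[pg])
		memcpy(&val, vm->pages[pg] + off % PAGE_SZ, sizeof(val));
	return val;
}

/* Model of an 8-byte PROBE_MEM32 store: ignored if the page is absent. */
static void probe_mem32_store(struct arena_vm *vm, uint64_t reg, uint64_t val)
{
	uint32_t off = (uint32_t)reg;
	uint32_t pg = off / PAGE_SZ;

	if (pg < NPAGES && vm->pages[pg])
		memcpy(vm->pages[pg] + off % PAGE_SZ, &val, sizeof(val));
}
```

No bounds check beyond the page lookup is needed because, as described above, any 32-bit offset lands inside the 4GB range.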
    
    rX = bpf_addr_space_cast(rY, 0, 1) tells the verifier that rX->type =
    unknown scalar. If arena->map_flags has BPF_F_NO_USER_CONV set, then the
    verifier converts such cast instructions to mov32. Otherwise, JIT will emit
    native code equivalent to:
    rX = (u32)rY;
    if (rY)
      rX |= clear_lo32_bits(arena->user_vm_start); /* replace hi32 bits in rX */
    
    After such conversion, the pointer becomes a valid user pointer within
    bpf_arena range. The user process can access data structures created in
    bpf_arena without any additional computations. For example, a linked list
    built by a bpf program can be walked natively by user space.
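The two lowerings of bpf_addr_space_cast(rY, 0, 1) can be written out as plain C over 64-bit register values. This is a sketch of the semantics described above, not JIT code; `cast_user()`, `cast_user_no_conv()` and the example user_vm_start value are illustrative. One assumption: the NULL check is done on the truncated 32-bit value, which is what an arena pointer carries.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the upper 32 bits of v (name taken from the pseudocode). */
static uint64_t clear_lo32_bits(uint64_t v)
{
	return v & ~(uint64_t)0xffffffffu;
}

/* Default lowering: rX = (u32)rY; a non-NULL pointer gets the hi32 bits
 * of user_vm_start so user space sees a valid 64-bit pointer. */
static uint64_t cast_user(uint64_t ry, uint64_t user_vm_start)
{
	uint64_t rx = (uint32_t)ry;

	if (rx) /* NULL stays NULL */
		rx |= clear_lo32_bits(user_vm_start);
	return rx;
}

/* With BPF_F_NO_USER_CONV, the verifier turns the cast into mov32. */
static uint64_t cast_user_no_conv(uint64_t ry)
{
	return (uint32_t)ry;
}
```

For an assumed user_vm_start of 0x7f0000000000, the arena offset 0x1000 becomes 0x7f0000001000 with conversion and stays 0x1000 without it.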
    
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Reviewed-by: Barret Rhoden <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Alexei Starovoitov authored and qmonnet committed Mar 26, 2024
    6aa4b2e
  5. bpf: Disasm support for addr_space_cast instruction.

    LLVM generates rX = addr_space_cast(rY, dst_addr_space, src_addr_space)
    instruction when pointers in non-zero address space are used by the bpf
    program. Recognize this insn in uapi and in bpf disassembler.
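As a sketch of what the disassembler prints for this insn, the function below formats the decoded operands. It assumes the encoding used by the kernel's disassembler, with the destination address space in the upper 16 bits of imm and the source address space in the lower 16 bits (extracted with a u16 cast); `print_addr_space_cast()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format "rD = addr_space_cast(rS, dst_as, src_as)" into buf.
 * Assumed encoding: imm = (dst_as << 16) | src_as. */
static int print_addr_space_cast(char *buf, size_t len, unsigned int dst_reg,
				 unsigned int src_reg, int32_t imm)
{
	unsigned int dst_as = (uint32_t)imm >> 16;
	unsigned int src_as = (uint16_t)imm;

	return snprintf(buf, len, "r%u = addr_space_cast(r%u, %u, %u)",
			dst_reg, src_reg, dst_as, src_as);
}
```

Under that encoding, imm == 1 << 16 is a cast into the arena (address space 1) and imm == 1 is a cast back to address space 0.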
    
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Alexei Starovoitov authored and qmonnet committed Mar 26, 2024
    8940e67
  6. bpftool: Recognize arena map type

    Teach bpftool to recognize arena map type.
    
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Quentin Monnet <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Alexei Starovoitov authored and qmonnet committed Mar 26, 2024
    3964895
  7. libbpf: Recognize __arena global variables.

    LLVM automatically places __arena variables into the ".arena.1" ELF
    section. In order to use such global variables, a bpf program must include
    a definition of the arena map in the ".maps" section, like:
    struct {
           __uint(type, BPF_MAP_TYPE_ARENA);
           __uint(map_flags, BPF_F_MMAPABLE);
           __uint(max_entries, 1000);         /* number of pages */
           __ulong(map_extra, 2ull << 44);    /* start of mmap() region */
    } arena SEC(".maps");
    
    libbpf recognizes both uses of arena and creates a single `struct bpf_map *`
    instance in libbpf APIs.
    The ".arena.1" ELF section data is used as the initial data image, which is
    exposed to the user through the skeleton and bpf_map__initial_value(), in
    case they need to tune it before the load phase. During the load phase,
    this initial image is copied over into the mmap()'ed region corresponding
    to the arena, and then discarded.
    
    A few small checks here and there had to be added to make sure this
    approach works with bpf_map__initial_value(), mostly due to the hard-coded
    assumption that map->mmaped is set up with the mmap() syscall and should be
    munmap()'ed. For arena, .arena.1 can be (much) smaller than the maximum
    arena size, so this smaller data size has to be tracked separately.
    Given that only one arena is allowed per bpf_object instance, we just keep
    it in a separate field. This can be generalized later if necessary.
    
    All global variables from ".arena.1" section are accessible from user space
    via skel->arena->name_of_var.
    
    For bss/data/rodata, the skeleton/libbpf perform the following sequence:
    1. addr = mmap(MAP_ANONYMOUS)
    2. user space optionally modifies global vars
    3. map_fd = bpf_create_map()
    4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
    5. mmap(addr, MAP_FIXED, map_fd)
    After step 5, user space sees the values it wrote at step 2 at the same addresses.
    
    Arena doesn't support update_map_elem, hence the skeleton/libbpf do:
    1. addr = malloc(sizeof SEC ".arena.1")
    2. user space optionally modifies global vars
    3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
    4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
    5. memcpy(real_addr, addr) // this will fault-in and allocate pages
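The arena steps above can be approximated with ordinary user-space calls to show the ordering. In this sketch an anonymous MAP_SHARED mapping stands in for mmap()'ing the arena map fd at map_extra, so no BPF syscall is made; `init_arena_image()` and the 4 KiB arena size are assumptions.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define ARENA_SZ 4096 /* hypothetical arena size */

/* Copy the initial image (steps 1-2 happen in the caller's malloc()'ed
 * buffer) into the "arena" region. Returns the region, or NULL. */
static void *init_arena_image(const void *init, size_t init_sz)
{
	void *real_addr;

	if (init_sz > ARENA_SZ)
		return NULL;
	/* Step 4 stand-in: real code maps the arena map fd at
	 * map->map_extra with MAP_SHARED | MAP_FIXED. */
	real_addr = mmap(NULL, ARENA_SZ, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (real_addr == MAP_FAILED)
		return NULL;
	/* Step 5: the copy faults in (allocates) the touched pages. */
	memcpy(real_addr, init, init_sz);
	return real_addr;
}
```

The caller frees its malloc()'ed image right after the copy, matching the "copied over and discarded" behaviour.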
    
    In the end, the look and feel of global data vs __arena global data is the
    same from the bpf prog's point of view.
    
    Another complication is:
    struct {
      __uint(type, BPF_MAP_TYPE_ARENA);
    } arena SEC(".maps");
    
    int __arena foo;
    int bar;
    
      ptr1 = &foo;   // relocation against ".arena.1" section
      ptr2 = &arena; // relocation against ".maps" section
      ptr3 = &bar;   // relocation against ".bss" section
    
    For the kernel, ptr1 and ptr2 point to the same arena's map_fd,
    while ptr3 points to a different global array's map_fd.
    For the verifier:
    ptr1->type == unknown_scalar
    ptr2->type == const_ptr_to_map
    ptr3->type == ptr_to_map_value
    
    After verification, from the JIT's point of view, all 3 ptr-s are normal ld_imm64 insns.
    
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Quentin Monnet <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    anakryiko authored and qmonnet committed Mar 26, 2024
    f1f54c6
  8. bpftool: Cast pointers for shadow types explicitly.

    According to a report, skeletons fail to assign shadow pointers when
    compiled as part of C++ programs. Unlike C, which implicitly converts void
    pointers, C++ requires an explicit cast.
    
    To support C++, we explicitly cast each shadow pointer.
    
    Also add struct_ops_module.skel.h to test_cpp to validate C++
    compilation as part of BPF selftests.
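The language difference is easy to see in isolation. The snippet below is a hypothetical stand-in for generated skeleton code (`struct shadow_ops`, `get_initial_value()` and `assign_shadow()` are made up): assigning the `void *` result without a cast is valid C but rejected by C++, while the explicit cast compiles as both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shadow type, standing in for a generated struct_ops type. */
struct shadow_ops {
	int val;
};

static struct shadow_ops storage = { 7 };

/* Hypothetical accessor returning untyped memory. */
static void *get_initial_value(void)
{
	return &storage;
}

static struct shadow_ops *assign_shadow(void)
{
	/* "struct shadow_ops *p = get_initial_value();" is fine in C but an
	 * error in C++; the explicit cast is accepted by both languages. */
	struct shadow_ops *p = (struct shadow_ops *)get_initial_value();

	return p;
}
```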
    
    Signed-off-by: Kui-Feng Lee <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Yonghong Song <[email protected]>
    Acked-by: Quentin Monnet <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    ThinkerYzu1 authored and qmonnet committed Mar 26, 2024
    2a37aa0
  9. bpftool: Fix missing pids during link show

    The current 'bpftool link' command does not show pids, e.g.,
      $ tools/build/bpftool/bpftool link
      ...
      4: tracing  prog 23
            prog_type lsm  attach_type lsm_mac
            target_obj_id 1  target_btf_id 31320
    
    Apply the following hack to enable normal libbpf debug output:
      --- a/tools/bpf/bpftool/pids.c
      +++ b/tools/bpf/bpftool/pids.c
      @@ -121,9 +121,9 @@ int build_obj_refs_table(struct hashmap **map, enum bpf_obj_type type)
              /* we don't want output polluted with libbpf errors if bpf_iter is not
               * supported
               */
      -       default_print = libbpf_set_print(libbpf_print_none);
      +       /* default_print = libbpf_set_print(libbpf_print_none); */
              err = pid_iter_bpf__load(skel);
      -       libbpf_set_print(default_print);
      +       /* libbpf_set_print(default_print); */
    
    Rerun the above bpftool command:
      $ tools/build/bpftool/bpftool link
      libbpf: prog 'iter': BPF program load failed: Permission denied
      libbpf: prog 'iter': -- BEGIN PROG LOAD LOG --
      0: R1=ctx() R10=fp0
      ; struct task_struct *task = ctx->task; @ pid_iter.bpf.c:69
      0: (79) r6 = *(u64 *)(r1 +8)          ; R1=ctx() R6_w=ptr_or_null_task_struct(id=1)
      ; struct file *file = ctx->file; @ pid_iter.bpf.c:68
      ...
      ; struct bpf_link *link = (struct bpf_link *) file->private_data; @ pid_iter.bpf.c:103
      80: (79) r3 = *(u64 *)(r8 +432)       ; R3_w=scalar() R8=ptr_file()
      ; if (link->type == bpf_core_enum_value(enum bpf_link_type___local, @ pid_iter.bpf.c:105
      81: (61) r1 = *(u32 *)(r3 +12)
      R3 invalid mem access 'scalar'
      processed 39 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2
      -- END PROG LOAD LOG --
      libbpf: prog 'iter': failed to load: -13
      ...
    
    The 'file->private_data' field has a 'void *' type, and this caused the subsequent
    'link->type' access (insn #81) to fail verification.
    
    To fix the issue, restore the previous BPF_CORE_READ so old kernels can also work.
    With this patch, the 'bpftool link' runs successfully with 'pids'.
      $ tools/build/bpftool/bpftool link
      ...
      4: tracing  prog 23
            prog_type lsm  attach_type lsm_mac
            target_obj_id 1  target_btf_id 31320
            pids systemd(1)
    
    Fixes: 44ba7b30e84f ("bpftool: Use a local copy of BPF_LINK_TYPE_PERF_EVENT in pid_iter.bpf.c")
    Signed-off-by: Yonghong Song <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Tested-by: Quentin Monnet <[email protected]>
    Reviewed-by: Quentin Monnet <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Yonghong Song authored and qmonnet committed Mar 26, 2024
    7752997
  10. bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs

    Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF
    aware variants). This brings them up to par w.r.t. BPF cookie usage
    with classic tracepoint and fentry/fexit programs.
    
    Acked-by: Stanislav Fomichev <[email protected]>
    Acked-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and qmonnet committed Mar 26, 2024
    7ca158b
  11. bpftool: Enable libbpf logs when loading pid_iter in debug mode

    When trying to load the pid_iter BPF program used to iterate over the
    PIDs of the processes holding file descriptors to BPF links, we would
    unconditionally silence libbpf in order to keep the output clean if the
    kernel does not support iterators and loading fails.
    
    Although this is the desirable behaviour in most cases, this may hide
    bugs in the pid_iter program that prevent it from loading, and it makes
    it hard to debug such load failures, even in "debug" mode. Instead, it
    makes more sense to print libbpf's logs when we pass the -d|--debug flag
    to bpftool, so that users get the logs to investigate failures without
    having to edit bpftool's source code.
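The behaviour change can be sketched with a print-callback setter that, like libbpf_set_print(), returns the previously installed callback. Everything else here is a stand-in rather than bpftool's code (`set_print()`, `print_none()`, `load_pid_iter()`, `load_with_logs()` and the `verbose` flag are hypothetical): logs are silenced around the load only when debug mode is off, and the previous callback is restored afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*print_fn)(const char *msg);

static int log_lines; /* counts messages that reached the "terminal" */

static int print_none(const char *msg) { (void)msg; return 0; }
static int print_default(const char *msg) { (void)msg; return ++log_lines; }

static print_fn cur_print = print_default;

/* Mimics libbpf_set_print(): install fn, return the previous callback. */
static print_fn set_print(print_fn fn)
{
	print_fn old = cur_print;

	cur_print = fn;
	return old;
}

/* Stand-in for loading pid_iter: emits one log line and fails. */
static int load_pid_iter(void)
{
	cur_print("prog 'iter': failed to load");
	return -13;
}

static int load_with_logs(bool verbose)
{
	print_fn default_print = NULL;
	int err;

	if (!verbose) /* keep output clean unless -d|--debug was passed */
		default_print = set_print(print_none);
	err = load_pid_iter();
	if (!verbose)
		set_print(default_print);
	return err;
}
```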
    
    Signed-off-by: Quentin Monnet <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Alexei Starovoitov <[email protected]>
    qmonnet committed Mar 26, 2024
    6c7fc19
  12. bpftool: Remove unnecessary source files from bootstrap version

    Commit d510296d331a ("bpftool: Use syscall/loader program in "prog load"
    and "gen skeleton" command.") added new files to the list of objects to
    compile in order to build the bootstrap version of bpftool. As far as I
    can tell, these objects are unnecessary and were added by mistake; maybe
    a draft version intended to add support for loading loader programs from
    the bootstrap version. Anyway, we can remove these object files from the
    list to make the bootstrap bpftool binary a tad smaller and faster to
    build.
    
    Fixes: d510296d331a ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.")
    Signed-off-by: Quentin Monnet <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Alexei Starovoitov <[email protected]>
    qmonnet committed Mar 26, 2024
    bea9bb8
  13. bpftool: Clean up HOST_CFLAGS, HOST_LDFLAGS for bootstrap bpftool

    Bpftool's Makefile uses $(HOST_CFLAGS) to build the bootstrap version of
    bpftool, in order to pick the flags for the host (where we run the
    bootstrap version) and not for the target system (where we plan to run
    the full bpftool binary). But we pass too much information through this
    variable.
    
    In particular, we set HOST_CFLAGS by copying most of the $(CFLAGS); but
    we do this after the feature detection for bpftool, which means that
    $(CFLAGS), hence $(HOST_CFLAGS), contain all macro definitions for using
    the different optional features. For example, -DHAVE_LLVM_SUPPORT may be
    passed to the $(HOST_CFLAGS), even though the LLVM disassembler is not
    used in the bootstrap version, and the related library may even be
    missing for the host architecture.
    
    A similar thing happens with the $(LDFLAGS), which we use unchanged for
    linking the bootstrap version even though they may contain flags to
    link against additional libraries.
    
    To address the $(HOST_CFLAGS) issue, we move the definition of
    $(HOST_CFLAGS) earlier in the Makefile, before the $(CFLAGS) update
    resulting from the feature probing, none of which is relevant to the
    bootstrap version. To clean up the $(LDFLAGS) for the bootstrap version,
    we introduce a dedicated $(HOST_LDFLAGS) variable that we base on
    $(LDFLAGS), before the feature probing as well.
    
    On my setup, the following macros and libraries are removed from the
    compiler invocation to build bpftool after this patch:
    
      -DUSE_LIBCAP
      -DHAVE_LLVM_SUPPORT
      -I/usr/lib/llvm-17/include
      -D_GNU_SOURCE
      -D__STDC_CONSTANT_MACROS
      -D__STDC_FORMAT_MACROS
      -D__STDC_LIMIT_MACROS
      -lLLVM-17
      -L/usr/lib/llvm-17/lib
    
    Another advantage of cleaning up these flags is that displaying
    available features with "bpftool version" becomes more accurate for the
    bootstrap bpftool, and no longer reflects the features detected (and
    available only) for the final binary.
    
    Cc: Jean-Philippe Brucker <[email protected]>
    Signed-off-by: Quentin Monnet <[email protected]>
    Acked-by: Jiri Olsa <[email protected]>
    Message-ID: <[email protected]>
    Signed-off-by: Alexei Starovoitov <[email protected]>
    qmonnet committed Mar 26, 2024
    d6b80c6
  14. bpf: Sync uapi bpf.h to tools directory

    There is a difference between kernel uapi bpf.h and tools
    uapi bpf.h. There is no functional difference, but let
    us sync properly to make later bpf.h updates easier.
    
    Signed-off-by: Yonghong Song <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Yonghong Song authored and qmonnet committed Mar 26, 2024
    5a86201
  15. libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM

    The selftests use __attribute__((address_space(1))), behind the __arena
    tag, to tell LLVM about special pointers. For LLVM there is nothing
    "arena" about them. They are simply pointers in a different address space.
    Hence LLVM diff llvm/llvm-project#85161 renamed:
    . macro __BPF_FEATURE_ARENA_CAST -> __BPF_FEATURE_ADDR_SPACE_CAST
    . global variables in __attribute__((address_space(N))) are now
      placed in a section named ".addr_space.N" instead of ".arena.N".
    
    Adjust libbpf, bpftool, and selftests to match LLVM.
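A header can follow the rename by keying off the new macro name. This is a minimal sketch, and the exact guard is illustrative rather than copied from the selftests: when clang targeting BPF defines `__BPF_FEATURE_ADDR_SPACE_CAST`, `__arena` tags memory with address space 1; other compilers get an empty tag so the code still builds. The `arena_sec_name` constant is a hypothetical helper recording the renamed section for address space 1.

```c
#include <assert.h>
#include <string.h>

#if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
/* LLVM BPF backend with address-space cast support: tag arena memory. */
#define __arena __attribute__((address_space(1)))
#else
/* Other compilers/targets: the tag expands to nothing. */
#define __arena
#endif

/* Section for address_space(1) globals after the LLVM change
 * (previously ".arena.1"). */
static const char arena_sec_name[] = ".addr_space.1";
```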
    
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Stanislav Fomichev <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Alexei Starovoitov authored and qmonnet committed Mar 26, 2024
    a2282a6
  16. sync: Pull latest bpftool changes from kernel

    Syncing latest bpftool commits from kernel repository.
    Baseline bpf-next commit:   e63985ecd22681c7f5975f2e8637187a326b6791
    Checkpoint bpf-next commit: 14bb1e8c8d4ad5d9d2febb7d19c70a3cf536e1e5
    Baseline bpf commit:        2487007aa3b9fafbd2cb14068f49791ce1d7ede5
    Checkpoint bpf commit:      443574b033876c85a35de4c65c14f7fe092222b2
    
    Alexei Starovoitov (4):
      bpf: Introduce bpf_arena.
      bpf: Disasm support for addr_space_cast instruction.
      bpftool: Recognize arena map type
      libbpf, selftests/bpf: Adjust libbpf, bpftool, selftests to match LLVM
    
    Andrii Nakryiko (3):
      bpftool: rename is_internal_mmapable_map into is_mmapable_map
      libbpf: Recognize __arena global variables.
      bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
    
    Kui-Feng Lee (1):
      bpftool: Cast pointers for shadow types explicitly.
    
    Quentin Monnet (3):
      bpftool: Enable libbpf logs when loading pid_iter in debug mode
      bpftool: Remove unnecessary source files from bootstrap version
      bpftool: Clean up HOST_CFLAGS, HOST_LDFLAGS for bootstrap bpftool
    
    Yonghong Song (2):
      bpftool: Fix missing pids during link show
      bpf: Sync uapi bpf.h to tools directory
    
     docs/bpftool-map.rst        |  2 +-
     include/uapi/linux/bpf.h    | 20 ++++++++++++++++++--
     src/Makefile                | 14 ++++++--------
     src/gen.c                   | 34 ++++++++++++++++++++++++----------
     src/kernel/bpf/disasm.c     | 10 ++++++++++
     src/map.c                   |  2 +-
     src/pids.c                  | 19 ++++++++++++-------
     src/skeleton/pid_iter.bpf.c |  4 ++--
     8 files changed, 74 insertions(+), 31 deletions(-)
    
    Signed-off-by: Quentin Monnet <[email protected]>
    qmonnet committed Mar 26, 2024
    2e7a423
  17. mirror: Update expected diff with kernel sources

    A recent patch has touched some portions of bpftool's Makefile that
    differ between the kernel's and the mirror's sources. Let's update the
    diff with the expected differences accordingly, to smooth future sync-ups.
    
    Signed-off-by: Quentin Monnet <[email protected]>
    qmonnet committed Mar 26, 2024
    6c10403