Bpftool sync 2023-08-29 #113

Merged
merged 29 commits into from
Aug 29, 2023

Commits on Jun 22, 2023

  1. asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch

    Now that we specify the minimal version of GCC as 5.1 and Clang/LLVM as 11.0.0
    in Documentation/process/changes.rst, __CHAR_BIT__ and __SIZEOF_LONG__ are
    usable, so it is probably fine to unify the definition of __BITS_PER_LONG as
    (__CHAR_BIT__ * __SIZEOF_LONG__) in asm-generic uapi bitsperlong.h.
    
    To stay safe and avoid regressions, only unify uapi bitsperlong.h
    for some archs such as arm64, riscv and loongarch, which use newer
    toolchains that have the definitions of __CHAR_BIT__ and __SIZEOF_LONG__.
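
As a rough user-space illustration of the unified definition (not part of the patch), the same product can be computed with ctypes. The sketch assumes CHAR_BIT is 8, which holds on the platforms CPython supports; the helper name bits_per_long is made up for this example.

```python
import ctypes

# Assumption: CHAR_BIT is 8 on this platform (true wherever CPython runs).
CHAR_BIT = 8


def bits_per_long() -> int:
    """Mirror (__CHAR_BIT__ * __SIZEOF_LONG__) using this platform's C 'long'."""
    return CHAR_BIT * ctypes.sizeof(ctypes.c_long)


print(bits_per_long())
```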
    
    Suggested-by: Xi Ruoyao <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]/
    Suggested-by: Arnd Bergmann <[email protected]>
    Link: https://lore.kernel.org/linux-arch/[email protected]/
    Signed-off-by: Tiezhu Yang <[email protected]>
    Signed-off-by: Arnd Bergmann <[email protected]>
    Tiezhu Yang authored and qmonnet committed Jun 22, 2023
    938e3a6

Commits on Jun 28, 2023

  1. bpf: Add kernel/bpftool asm support for new instructions

    Add asm support for new instructions so kernel verifier and bpftool
    xlated insn dumps can have proper asm syntax for new instructions.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Acked-by: Quentin Monnet <[email protected]>
    Signed-off-by: Yonghong Song <[email protected]>
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Yonghong Song authored and qmonnet committed Jun 28, 2023
    a33fafc

Commits on Jul 9, 2023

  1. bpf: Support ->fill_link_info for kprobe_multi

    With the addition of support for fill_link_info to the kprobe_multi link,
    users gain the ability to inspect it conveniently using
    `bpftool link show`. This enhancement provides valuable information to the
    user, including the count of probed functions and their respective
    addresses. Note that if the kptr_restrict setting does not permit it, the
    probed addresses will not be exposed, for security reasons.
    
    Signed-off-by: Yafang Shao <[email protected]>
    Acked-by: Jiri Olsa <[email protected]>
    Acked-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    laoar authored and qmonnet committed Jul 9, 2023
    ae4a112
  2. bpftool: Dump the kernel symbol's module name

    If the kernel symbol is in a module, we will dump the module name as
    well. The square brackets around the module name are trimmed.
    
    Signed-off-by: Yafang Shao <[email protected]>
    Reviewed-by: Quentin Monnet <[email protected]>
    Acked-by: Jiri Olsa <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    laoar authored and qmonnet committed Jul 9, 2023
    da69013
  3. bpftool: Show kprobe_multi link info

    Show the already exposed kprobe_multi link info in bpftool. The result is
    as follows:
    
    $ tools/bpf/bpftool/bpftool link show
    91: kprobe_multi  prog 244
            kprobe.multi  func_cnt 7
            addr             func [module]
            ffffffff98c44f20 schedule_timeout_interruptible
            ffffffff98c44f60 schedule_timeout_killable
            ffffffff98c44fa0 schedule_timeout_uninterruptible
            ffffffff98c44fe0 schedule_timeout_idle
            ffffffffc075b8d0 xfs_trans_get_efd [xfs]
            ffffffffc0768a10 xfs_trans_get_buf_map [xfs]
            ffffffffc076c320 xfs_trans_get_dqtrx [xfs]
            pids kprobe_multi(188367)
    92: kprobe_multi  prog 244
            kretprobe.multi  func_cnt 7
            addr             func [module]
            ffffffff98c44f20 schedule_timeout_interruptible
            ffffffff98c44f60 schedule_timeout_killable
            ffffffff98c44fa0 schedule_timeout_uninterruptible
            ffffffff98c44fe0 schedule_timeout_idle
            ffffffffc075b8d0 xfs_trans_get_efd [xfs]
            ffffffffc0768a10 xfs_trans_get_buf_map [xfs]
            ffffffffc076c320 xfs_trans_get_dqtrx [xfs]
            pids kprobe_multi(188367)
    
    $ tools/bpf/bpftool/bpftool link show -j
    [{"id":91,"type":"kprobe_multi","prog_id":244,"retprobe":false,"func_cnt":7,"funcs":[{"addr":18446744071977586464,"func":"schedule_timeout_interruptible","module":null},{"addr":18446744071977586528,"func":"schedule_timeout_killable","module":null},{"addr":18446744071977586592,"func":"schedule_timeout_uninterruptible","module":null},{"addr":18446744071977586656,"func":"schedule_timeout_idle","module":null},{"addr":18446744072643524816,"func":"xfs_trans_get_efd","module":"xfs"},{"addr":18446744072643578384,"func":"xfs_trans_get_buf_map","module":"xfs"},{"addr":18446744072643592992,"func":"xfs_trans_get_dqtrx","module":"xfs"}],"pids":[{"pid":188367,"comm":"kprobe_multi"}]},{"id":92,"type":"kprobe_multi","prog_id":244,"retprobe":true,"func_cnt":7,"funcs":[{"addr":18446744071977586464,"func":"schedule_timeout_interruptible","module":null},{"addr":18446744071977586528,"func":"schedule_timeout_killable","module":null},{"addr":18446744071977586592,"func":"schedule_timeout_uninterruptible","module":null},{"addr":18446744071977586656,"func":"schedule_timeout_idle","module":null},{"addr":18446744072643524816,"func":"xfs_trans_get_efd","module":"xfs"},{"addr":18446744072643578384,"func":"xfs_trans_get_buf_map","module":"xfs"},{"addr":18446744072643592992,"func":"xfs_trans_get_dqtrx","module":"xfs"}],"pids":[{"pid":188367,"comm":"kprobe_multi"}]}]
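
The JSON output carries addresses as decimal integers, while the plain output prints them in hexadecimal. The following sketch (not part of bpftool) shows one way a consumer could map the JSON back to the plain layout. The sample is trimmed from the output above to two functions, with func_cnt adjusted to match; the helper name format_funcs is hypothetical.

```python
import json

# Trimmed from the JSON output above: link 91 reduced to two functions.
sample = """[{"id":91,"type":"kprobe_multi","prog_id":244,"retprobe":false,
"func_cnt":2,"funcs":[
{"addr":18446744071977586464,"func":"schedule_timeout_interruptible","module":null},
{"addr":18446744072643524816,"func":"xfs_trans_get_efd","module":"xfs"}]}]"""


def format_funcs(link: dict) -> list:
    """Return 'addr func [module]' rows; the module is shown only when present."""
    rows = []
    for f in link.get("funcs", []):
        row = f"{f['addr']:016x} {f['func']}"
        if f["module"] is not None:
            row += f" [{f['module']}]"
        rows.append(row)
    return rows


for link in json.loads(sample):
    kind = "kretprobe.multi" if link["retprobe"] else "kprobe.multi"
    print(f"{link['id']}: {link['type']}  prog {link['prog_id']}")
    print(f"        {kind}  func_cnt {link['func_cnt']}")
    for row in format_funcs(link):
        print(f"        {row}")
```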
    
    When kptr_restrict is 2, the result is:
    
    $ tools/bpf/bpftool/bpftool link show
    91: kprobe_multi  prog 244
            kprobe.multi  func_cnt 7
    92: kprobe_multi  prog 244
            kretprobe.multi  func_cnt 7
    
    Signed-off-by: Yafang Shao <[email protected]>
    Reviewed-by: Quentin Monnet <[email protected]>
    Acked-by: Jiri Olsa <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    laoar authored and qmonnet committed Jul 9, 2023
    adb2fe1
  4. bpf: Support ->fill_link_info for perf_event

    By introducing support for ->fill_link_info to the perf_event link, users
    gain the ability to inspect it using `bpftool link show`. While the current
    approach involves accessing this information via `bpftool perf show`,
    consolidating link information for all link types in one place offers
    greater convenience. Additionally, this patch extends support to the
    generic perf event, which is not currently accommodated by
    `bpftool perf show`. Only the perf type and config are exposed to
    userspace; other attributes such as sample_period and sample_freq are
    ignored. Note that if kptr_restrict does not permit it, the
    probed address will not be exposed, for security reasons.
    
    A new enum bpf_perf_event_type is introduced to help the user understand
    which struct is relevant.
    
    Signed-off-by: Yafang Shao <[email protected]>
    Acked-by: Jiri Olsa <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    laoar authored and qmonnet committed Jul 9, 2023
    9626287
  5. bpftool: Add perf event names

    Add new functions and macros to get perf event names. These names, except
    for perf_type_name, are all copied from
    tools/perf/util/{parse-events,evsel}.c, so that in the future we will
    have a good chance to use the same code.
    
    Suggested-by: Jiri Olsa <[email protected]>
    Signed-off-by: Yafang Shao <[email protected]>
    Reviewed-by: Quentin Monnet <[email protected]>
    Acked-by: Jiri Olsa <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    laoar authored and qmonnet committed Jul 9, 2023
    ca14739
  6. bpftool: Show perf link info

    Enhance bpftool to display comprehensive information about exposed
    perf_event links, covering uprobe, kprobe, tracepoint, and generic perf
    event. The resulting output will include the following details:
    
    $ tools/bpf/bpftool/bpftool link show
    3: perf_event  prog 14
            event software:cpu-clock
            bpf_cookie 0
            pids perf_event(19483)
    4: perf_event  prog 14
            event hw-cache:LLC-load-misses
            bpf_cookie 0
            pids perf_event(19483)
    5: perf_event  prog 14
            event hardware:cpu-cycles
            bpf_cookie 0
            pids perf_event(19483)
    6: perf_event  prog 19
            tracepoint sched_switch
            bpf_cookie 0
            pids tracepoint(20947)
    7: perf_event  prog 26
            uprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
            bpf_cookie 0
            pids uprobe(21973)
    8: perf_event  prog 27
            uretprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
            bpf_cookie 0
            pids uprobe(21973)
    10: perf_event  prog 43
            kprobe ffffffffb70a9660 kernel_clone
            bpf_cookie 0
            pids kprobe(35275)
    11: perf_event  prog 41
            kretprobe ffffffffb70a9660 kernel_clone
            bpf_cookie 0
            pids kprobe(35275)
    
    $ tools/bpf/bpftool/bpftool link show -j
    [{"id":3,"type":"perf_event","prog_id":14,"event_type":"software","event_config":"cpu-clock","bpf_cookie":0,"pids":[{"pid":19483,"comm":"perf_event"}]},{"id":4,"type":"perf_event","prog_id":14,"event_type":"hw-cache","event_config":"LLC-load-misses","bpf_cookie":0,"pids":[{"pid":19483,"comm":"perf_event"}]},{"id":5,"type":"perf_event","prog_id":14,"event_type":"hardware","event_config":"cpu-cycles","bpf_cookie":0,"pids":[{"pid":19483,"comm":"perf_event"}]},{"id":6,"type":"perf_event","prog_id":19,"tracepoint":"sched_switch","bpf_cookie":0,"pids":[{"pid":20947,"comm":"tracepoint"}]},{"id":7,"type":"perf_event","prog_id":26,"retprobe":false,"file":"/home/dev/waken/bpf/uprobe/a.out","offset":4920,"bpf_cookie":0,"pids":[{"pid":21973,"comm":"uprobe"}]},{"id":8,"type":"perf_event","prog_id":27,"retprobe":true,"file":"/home/dev/waken/bpf/uprobe/a.out","offset":4920,"bpf_cookie":0,"pids":[{"pid":21973,"comm":"uprobe"}]},{"id":10,"type":"perf_event","prog_id":43,"retprobe":false,"addr":18446744072485508704,"func":"kernel_clone","offset":0,"bpf_cookie":0,"pids":[{"pid":35275,"comm":"kprobe"}]},{"id":11,"type":"perf_event","prog_id":41,"retprobe":true,"addr":18446744072485508704,"func":"kernel_clone","offset":0,"bpf_cookie":0,"pids":[{"pid":35275,"comm":"kprobe"}]}]
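
The JSON entries differ by link flavor: generic events carry event_type and event_config, tracepoints carry tracepoint, uprobes carry file and offset, and kprobes carry func and addr. A small sketch (not part of bpftool) of how a consumer could rebuild the one-line plain description from those keys follows. The sample is trimmed from the output above (pids dropped); the function name describe is hypothetical, and treating a missing or zero addr as hidden is an assumption.

```python
import json

# Four entries trimmed from the JSON output above (the "pids" arrays are dropped).
sample = """[
{"id":3,"type":"perf_event","prog_id":14,"event_type":"software","event_config":"cpu-clock","bpf_cookie":0},
{"id":6,"type":"perf_event","prog_id":19,"tracepoint":"sched_switch","bpf_cookie":0},
{"id":7,"type":"perf_event","prog_id":26,"retprobe":false,"file":"/home/dev/waken/bpf/uprobe/a.out","offset":4920,"bpf_cookie":0},
{"id":11,"type":"perf_event","prog_id":41,"retprobe":true,"addr":18446744072485508704,"func":"kernel_clone","offset":0,"bpf_cookie":0}
]"""


def describe(link: dict) -> str:
    """Pick the description format from the keys present in one link entry."""
    if "event_type" in link:
        return f"event {link['event_type']}:{link['event_config']}"
    if "tracepoint" in link:
        return f"tracepoint {link['tracepoint']}"
    ret = "ret" if link.get("retprobe") else ""
    if "file" in link:
        return f"u{ret}probe {link['file']}+{link['offset']:#x}"
    if "func" in link:
        # Assumption: a missing or zero addr means the address is hidden.
        addr = link.get("addr")
        shown = f"{addr:x} " if addr else ""
        return f"k{ret}probe {shown}{link['func']}"
    return "unknown"


for link in json.loads(sample):
    print(f"{link['id']}: {link['type']}  prog {link['prog_id']}")
    print(f"        {describe(link)}")
```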
    
    For generic perf events, the displayed information in bpftool is limited to
    the type and configuration, while other attributes such as sample_period,
    sample_freq, etc., are not included.
    
    The kernel function address won't be exposed if it is not permitted by
    kptr_restrict. The result is as follows when kptr_restrict is 2:
    
    $ tools/bpf/bpftool/bpftool link show
    3: perf_event  prog 14
            event software:cpu-clock
    4: perf_event  prog 14
            event hw-cache:LLC-load-misses
    5: perf_event  prog 14
            event hardware:cpu-cycles
    6: perf_event  prog 19
            tracepoint sched_switch
    7: perf_event  prog 26
            uprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
    8: perf_event  prog 27
            uretprobe /home/dev/waken/bpf/uprobe/a.out+0x1338
    10: perf_event  prog 43
            kprobe kernel_clone
    11: perf_event  prog 41
            kretprobe kernel_clone
    
    Signed-off-by: Yafang Shao <[email protected]>
    Reviewed-by: Quentin Monnet <[email protected]>
    Acked-by: Jiri Olsa <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    laoar authored and qmonnet committed Jul 9, 2023
    86820ac

Commits on Jul 12, 2023

  1. bpftool: Use "fallthrough;" keyword instead of comments

    After using "__fallthrough;" in a switch/case block in bpftool's
    btf_dumper.c [0], and then turning it into a comment [1] to prevent a
    merge conflict in linux-next when the keyword was changed into just
    "fallthrough;" [2], we can now drop the comment and use the new keyword,
    no underscores.
    
    Also update the other occurrence of "/* fallthrough */" in bpftool.
    
    [0] commit 9fd496848b1c ("bpftool: Support inline annotations when dumping the CFG of a program")
    [1] commit 4b7ef71ac977 ("bpftool: Replace "__fallthrough" by a comment to address merge conflict")
    [2] commit f7a858bffcdd ("tools: Rename __fallthrough to fallthrough")
    
    Signed-off-by: Quentin Monnet <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    qmonnet committed Jul 12, 2023
    6052c9a

Commits on Jul 19, 2023

  1. bpf: Add generic attach/detach/query API for multi-progs

    This adds a generic layer called bpf_mprog which can be reused by different
    attachment layers to enable multi-program attachment and dependency resolution.
    In-kernel users of the bpf_mprog don't need to care about the dependency
    resolution internals; they can just consume it with a few API calls.
    
    The initial idea of having a generic API sparked out of discussion [0] from an
    earlier revision of this work where tc's priority was reused and exposed via
    BPF uapi as a way to coordinate dependencies among tc BPF programs, similar
    to how it is done for classic tc BPF. The feedback was that priority provides
    a bad user experience and is hard to use [1], e.g.:
    
      I cannot help but feel that priority logic copy-paste from old tc, netfilter
      and friends is done because "that's how things were done in the past". [...]
      Priority gets exposed everywhere in uapi all the way to bpftool when it's
      right there for users to understand. And that's the main problem with it.
    
      The user don't want to and don't need to be aware of it, but uapi forces them
      to pick the priority. [...] Your cover letter [0] example proves that in
      real life different service pick the same priority. They simply don't know
      any better. Priority is an unnecessary magic that apps _have_ to pick, so
      they just copy-paste and everyone ends up using the same.
    
    The course of the discussion showed more and more the need for a generic,
    reusable API where the "same look and feel" can be applied for various other
    program types beyond just tc BPF. For example, XDP today does not have
    multi-program support in the kernel, and there was also interest in this API
    for improving management of cgroup program types. Such a common multi-program
    management concept is useful for BPF management daemons or user space BPF
    applications coordinating internally about their attachments.
    
    From both the Cilium and Meta sides [2], we've collected the following
    requirements for a generic attach/detach/query API for multi-progs, which
    have been implemented as part of this work:
    
      - Support prog-based attach/detach and link API
      - Dependency directives (can also be combined):
        - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none}
          - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that user
            space application does not need CAP_SYS_ADMIN to retrieve foreign fds
            via bpf_*_get_fd_by_id()
          - BPF_F_LINK flag as {prog,link} toggle
          - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and
            BPF_F_AFTER will just append for attaching
          - Enforced only at attach time
        - BPF_F_REPLACE with replace_bpf_fd which can be prog, links have their
          own infra for replacing their internal prog
        - If no flags are set, then it's default append behavior for attaching
      - Internal revision counter and optionally being able to pass expected_revision
      - User space application can query current state with revision, and pass it
        along for attachment to assert current state before doing updates
      - Query also gets extension for link_ids array and link_attach_flags:
        - prog_ids are always filled with program IDs
        - link_ids are filled with link IDs when link was used, otherwise 0
        - {prog,link}_attach_flags for holding {prog,link}-specific flags
      - Must be easy to integrate/reuse for in-kernel users
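
The ordering and revision rules listed above can be pictured with a toy user-space model. This is only a sketch for intuition, not the kernel implementation: the class and parameter names are made up, the starting revision value and the "bump on every successful attach" rule are assumptions, a relative program is assumed to mean "directly before/after it", and the sketch handles a single directive at a time even though the real flags can be combined.

```python
class ExpectedRevisionMismatch(Exception):
    """Raised when the caller's expected_revision is stale."""


class MprogModel:
    """Toy model of an ordered multi-program attachment list."""

    def __init__(self):
        self.progs = []    # program ids in execution order
        self.revision = 1  # assumption: starting value chosen for the sketch

    def attach(self, prog_id, before=False, after=False,
               relative=None, expected_revision=None):
        # Assert the state the caller observed earlier, if requested.
        if expected_revision is not None and expected_revision != self.revision:
            raise ExpectedRevisionMismatch(self.revision)
        if before and after:
            raise ValueError("this sketch handles one directive at a time")
        if prog_id in self.progs:
            raise ValueError(f"program {prog_id} is already attached")
        if relative is not None and relative not in self.progs:
            raise LookupError(f"relative program {relative} is not attached")

        if before:
            # No relative program: prepend. Otherwise insert right before it.
            idx = 0 if relative is None else self.progs.index(relative)
        elif after and relative is not None:
            idx = self.progs.index(relative) + 1
        else:
            # No flags, or 'after' without a relative program: append.
            idx = len(self.progs)

        self.progs.insert(idx, prog_id)
        self.revision += 1  # assumption: every successful update bumps it
        return self.revision


m = MprogModel()
m.attach(10)                            # default: append
m.attach(20)
m.attach(30, before=True)               # prepend
m.attach(40, after=True, relative=30)   # directly after program 30
seen = m.revision                       # state observed by a "query"
m.attach(50, before=True, relative=20, expected_revision=seen)
print(m.progs, m.revision)
```

Passing a stale expected_revision makes the attach fail without touching the list, which is the "assert current state before doing updates" idea from the requirements.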
    
    The uapi-side changes needed for supporting bpf_mprog are rather minimal,
    consisting of the additions of the attachment flags, revision counter, and
    expanding existing union with relative_{fd,id} member.
    
    The bpf_mprog framework consists of a bpf_mprog_entry object which holds
    an array of bpf_mprog_fp (fast-path structure). The bpf_mprog_cp (control-path
    structure) is part of bpf_mprog_bundle. Both have been separated, so that
    fast-path gets efficient packing of bpf_prog pointers for maximum cache
    efficiency. Also, array has been chosen instead of linked list or other
    structures to remove unnecessary indirections for a fast point-to-entry in
    tc for BPF.
    
    The bpf_mprog_entry comes as a pair via bpf_mprog_bundle so that in case of
    updates the peer bpf_mprog_entry is populated and then just swapped which
    avoids additional allocations that could otherwise fail, for example, in
    detach case. bpf_mprog_{fp,cp} arrays are currently static, but they could
    be converted to dynamic allocation if necessary at some point in the future.
    Locking is deferred to the in-kernel user of bpf_mprog, for example, in case
    of tcx which uses this API in the next patch, it piggybacks on rtnl.
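
The pair-and-swap idea described above can be sketched as a toy model. It is purely structural: Python allocates freely behind the scenes, and the class name, method names and capacity are hypothetical. The point is only that both fixed-size entries exist up front, an update fills the inactive peer, and a single index flip makes it live.

```python
class BundleModel:
    """Toy model: two preallocated entries; updates fill the peer, then swap."""

    MAX_PROGS = 4  # hypothetical capacity chosen for this sketch

    def __init__(self):
        # Both entries exist up front, so later updates need no new storage.
        self.entries = [[None] * self.MAX_PROGS, [None] * self.MAX_PROGS]
        self.active = 0

    def current(self):
        """Programs in the live entry, in order."""
        return [p for p in self.entries[self.active] if p is not None]

    def _commit(self, progs):
        if len(progs) > self.MAX_PROGS:
            raise OverflowError("static program array is full")
        peer = self.entries[1 - self.active]
        for i in range(self.MAX_PROGS):
            peer[i] = progs[i] if i < len(progs) else None
        self.active = 1 - self.active  # swap: the peer becomes the live entry

    def attach(self, prog):
        self._commit(self.current() + [prog])

    def detach(self, prog):
        self._commit([p for p in self.current() if p != prog])


bundle = BundleModel()
bundle.attach("prog_a")
bundle.attach("prog_b")
bundle.detach("prog_a")
print(bundle.current(), bundle.active)
```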
    
    An extensive test suite for checking all aspects of this API for prog-based
    attach/detach and link API comes as BPF selftests in this series.
    
    Thanks also to Andrii Nakryiko for early API discussions wrt Meta's BPF prog
    management.
    
      [0] https://lore.kernel.org/bpf/[email protected]
      [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com
      [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
    
    Signed-off-by: Daniel Borkmann <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    borkmann authored and qmonnet committed Jul 19, 2023
    46ee985
  2. bpf: Add fd-based tcx multi-prog infra with link support

    This work refactors and adds a lightweight extension ("tcx") to the tc BPF
    ingress and egress data path side for allowing BPF program management based
    on fds via bpf() syscall through the newly added generic multi-prog API.
    The main goal behind this work which we also presented at LPC [0] last year
    and a recent update at LSF/MM/BPF this year [3] is to support long-awaited
    BPF link functionality for tc BPF programs, which allows for a model of safe
    ownership and program detachment.
    
    Given the rise in tc BPF users in cloud native environments, this becomes
    necessary to avoid hard-to-debug incidents either through stale leftover
    programs or 3rd party applications accidentally stepping on each other's toes.
    As a recap, a BPF link represents the attachment of a BPF program to a BPF
    hook point. The BPF link holds a single reference to keep BPF program alive.
    Moreover, hook points do not reference a BPF link, only the application's
    fd or pinning does. A BPF link holds meta-data specific to attachment and
    implements operations for link creation, (atomic) BPF program update,
    detachment and introspection. The motivation for BPF links for tc BPF programs
    is multi-fold, for example:
    
      - From Meta: "It's especially important for applications that are deployed
        fleet-wide and that don't "control" hosts they are deployed to. If such
        application crashes and no one notices and does anything about that, BPF
        program will keep running draining resources or even just, say, dropping
        packets. We at FB had outages due to such permanent BPF attachment
        semantics. With fd-based BPF link we are getting a framework, which allows
        safe, auto-detachable behavior by default, unless application explicitly
        opts in by pinning the BPF link." [1]
    
      - From Cilium-side the tc BPF programs we attach to host-facing veth devices
        and phys devices build the core datapath for Kubernetes Pods, and they
        implement forwarding, load-balancing, policy, EDT-management, etc, within
        BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently
        experienced hard-to-debug issues in a user's staging environment where
        another Kubernetes application using tc BPF attached to the same prio/handle
        of cls_bpf, accidentally wiping all Cilium-based BPF programs from underneath
        it. The goal is to establish a clear/safe ownership model via links which
        cannot accidentally be overridden. [0,2]
    
    BPF links for tc can co-exist with non-link attachments, and the semantics are
    in line also with XDP links: BPF links cannot replace other BPF links, BPF
    links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
    lastly only non-BPF links can replace non-BPF links. In the case of Cilium,
    this would solve the mentioned issue of a safe ownership model, as 3rd party
    applications would not be able to accidentally wipe Cilium programs, even if
    they are not BPF link aware.
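
The four replacement rules above reduce to a small truth table. The sketch below encodes them as a predicate, reading "non-BPF links" as attachments made without a BPF link; the function name can_replace is made up for this example.

```python
def can_replace(existing_is_link: bool, new_is_link: bool) -> bool:
    """Replacement is allowed only when neither attachment is a BPF link."""
    return not existing_is_link and not new_is_link


for existing in (True, False):
    for new in (True, False):
        old_kind = "link" if existing else "non-link"
        new_kind = "link" if new else "non-link"
        verdict = "allowed" if can_replace(existing, new) else "refused"
        print(f"{new_kind} replacing {old_kind}: {verdict}")
```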
    
    Earlier attempts [4] have tried to integrate BPF links into core tc machinery
    to solve cls_bpf, which has been intrusive to the generic tc kernel API with
    extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could
    be wiped from the qdisc also. Locking a tc BPF program in place this way is
    getting into layering hacks given the two object models are vastly different.
    
    We instead implemented the tcx (tc 'express') layer which is an fd-based tc BPF
    attach API, so that the BPF link implementation blends in naturally similar to
    other link types which are fd-based and without the need for changing core tc
    internal APIs. BPF programs for tc can then be successively migrated from classic
    cls_bpf to the new tc BPF link without needing to change the program's source
    code; only the BPF loader mechanics for attaching need to change.
    
    For the current tc framework, there is no change in behavior with this change
    and neither does this change touch on tc core kernel APIs. The gist of this
    patch is that the ingress and egress hook have a lightweight, qdisc-less
    extension for BPF to attach its tc BPF programs, in other words, a minimal
    entry point for tc BPF. The name tcx has been suggested from discussion of
    earlier revisions of this work as a good fit, and to more easily differentiate between
    the classic cls_bpf attachment and the fd-based one.
    
    For the ingress and egress tcx points, the device holds a cache-friendly array
    with program pointers which is separated from control plane (slow-path) data.
    Earlier versions of this work used priority to determine ordering and expression
    of dependencies similar as with classic tc, but it was challenged that for
    something more future-proof a better user experience is required. Hence this
    resulted in the design and development of the generic attach/detach/query API
    for multi-progs. See prior patch with its discussion on the API design. tcx is
    the first user and later we plan to integrate also others, for example, one
    candidate is multi-prog support for XDP which would benefit and have the same
    'look and feel' from API perspective.
    
    The goal with tcx is to have maximum compatibility to existing tc BPF programs,
    so they don't need to be rewritten specifically. Compatibility to call into
    classic tcf_classify() is also provided in order to allow successive migration
    or both to cleanly co-exist where needed given it's all one logical tc layer and
    the tcx plus classic tc cls/act build one logical overall processing pipeline.
    
    tcx supports the simplified return codes TCX_NEXT which is non-terminating (go
    to next program) and terminating ones with TCX_PASS, TCX_DROP, TCX_REDIRECT.
    The fd-based API is behind a static key, so that when unused the code is also
    not entered. The struct tcx_entry's program array is currently static, but
    could be made dynamic if necessary at some point in the future. The a/b pair swap
    design has been chosen so that for detachment there are no allocations which
    otherwise could fail.
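
The return-code semantics described above (TCX_NEXT continues with the next program, the other codes terminate) can be sketched as a toy runner. This is not the kernel code: the enum values are symbolic placeholders rather than the uapi constants, and programs are modeled as plain callables. What happens when every program returns TCX_NEXT is outside the scope of the sketch, so the runner simply reports TCX_NEXT.

```python
from enum import Enum


class Tcx(Enum):
    # Symbolic placeholders; these are not the numeric uapi values.
    NEXT = "TCX_NEXT"
    PASS = "TCX_PASS"
    DROP = "TCX_DROP"
    REDIRECT = "TCX_REDIRECT"


def run_programs(progs, packet):
    """Run programs in order until one returns a terminating verdict."""
    for prog in progs:
        verdict = prog(packet)
        if verdict is not Tcx.NEXT:
            return verdict  # PASS, DROP and REDIRECT end the walk
    return Tcx.NEXT         # no program produced a terminating verdict


calls = []


def observer(pkt):
    calls.append("observer")
    return Tcx.NEXT


def dropper(pkt):
    calls.append("dropper")
    return Tcx.DROP


def never_reached(pkt):
    calls.append("never_reached")
    return Tcx.PASS


print(run_programs([observer, dropper, never_reached], b"\x00"), calls)
```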
    
    The work has been tested with the tc-testing selftest suite, all of which
    passes, as well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB.
    
    Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews
    of this work.
    
      [0] https://lpc.events/event/16/contributions/1353/
      [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com
      [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog
      [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf
      [4] https://lore.kernel.org/bpf/[email protected]
    
    Signed-off-by: Daniel Borkmann <[email protected]>
    Acked-by: Jakub Kicinski <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    borkmann authored and qmonnet committed Jul 19, 2023
    ae4b11a
  3. bpftool: Extend net dump with tcx progs

    Add support to dump fd-based attach types via bpftool. This includes both
    the tc BPF link and attach ops programs. Dumped information contains the
    attach location, function entry name, program ID and link ID when applicable.
    
    Example with tc BPF link:
    
      # ./bpftool net
      xdp:
    
      tc:
      bond0(4) tcx/ingress cil_from_netdev prog_id 784 link_id 10
      bond0(4) tcx/egress cil_to_netdev prog_id 804 link_id 11
    
      flow_dissector:
    
      netfilter:
    
    Example with tc BPF attach ops:
    
      # ./bpftool net
      xdp:
    
      tc:
      bond0(4) tcx/ingress cil_from_netdev prog_id 654
      bond0(4) tcx/egress cil_to_netdev prog_id 672
    
      flow_dissector:
    
      netfilter:
    
    Currently, permanent flags are not yet supported, so 'unknown' ones are dumped
    via NET_DUMP_UINT_ONLY(), and once we do have permanent ones, we will dump
    them as human-readable strings.
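
For scripts that consume the plain output shown in the examples above, a tcx line can be split into the fields the commit lists (attach location, entry name, program ID, optional link ID). The sketch below is not part of bpftool; the regular expression is derived only from the sample lines above and may not cover other output variants, and the name parse_tcx_line is hypothetical.

```python
import re

# Derived from the sample lines above; other output shapes are not handled.
LINE_RE = re.compile(
    r"^(?P<dev>\S+)\((?P<ifindex>\d+)\)\s+(?P<where>tcx/\w+)\s+(?P<name>\S+)"
    r"\s+prog_id\s+(?P<prog_id>\d+)(?:\s+link_id\s+(?P<link_id>\d+))?\s*$"
)


def parse_tcx_line(line: str):
    """Return a dict for a tcx line, or None for headers and other lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    return {
        "dev": d["dev"],
        "ifindex": int(d["ifindex"]),
        "where": d["where"],
        "name": d["name"],
        "prog_id": int(d["prog_id"]),
        # link_id is present only for link-based attachments.
        "link_id": int(d["link_id"]) if d["link_id"] is not None else None,
    }


output = """tc:
bond0(4) tcx/ingress cil_from_netdev prog_id 784 link_id 10
bond0(4) tcx/egress cil_to_netdev prog_id 672
"""

for line in output.splitlines():
    parsed = parse_tcx_line(line)
    if parsed is not None:
        print(parsed)
```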
    
    Signed-off-by: Daniel Borkmann <[email protected]>
    Reviewed-by: Quentin Monnet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    borkmann authored and qmonnet committed Jul 19, 2023
    5c3e113
  4. bpf: sync tools/ uapi header with

    Seeing the following:
    
    Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'
    
    ...so sync tools version missing some list_node/rb_tree fields.
    
    Fixes: c3c510ce431c ("bpf: Add 'owner' field to bpf_{list,rb}_node")
    Signed-off-by: Alan Maguire <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    alan-maguire authored and qmonnet committed Jul 19, 2023
    94b64a4

Commits on Jul 20, 2023

  1. bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign

    Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
    sockets. This means we can't use the helper to steer traffic to Envoy,
    which configures SO_REUSEPORT on its sockets. In turn, we're blocked
    from removing TPROXY from our setup.
    
    The reason that bpf_sk_assign refuses such sockets is that the
    bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
    one of the reuseport sockets is selected by hash. This could cause
    dispatch to the "wrong" socket:
    
        sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
        bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed
    
    Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
    helpers unfortunately. In the tc context, L2 headers are at the start
    of the skb, while SK_REUSEPORT expects L3 headers instead.
    
    Instead, we execute the SK_REUSEPORT program when the assigned socket
    is pulled out of the skb, further up the stack. This creates some
    trickiness with regards to refcounting as bpf_sk_assign will put both
    refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
    freed. We can infer that the sk_assigned socket is RCU freed if the
    reuseport lookup succeeds, but convincing yourself of this fact isn't
    straightforward. Therefore we defensively check refcounting on the
    sk_assign sock even though it's probably not required in practice.
    
    Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign")
    Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
    Co-developed-by: Daniel Borkmann <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Cc: Joe Stringer <[email protected]>
    Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
    Reviewed-by: Kuniyuki Iwashima <[email protected]>
    Signed-off-by: Lorenz Bauer <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin KaFai Lau <[email protected]>
    lmb authored and qmonnet committed Jul 20, 2023
    786bfb6

Commits on Jul 21, 2023

  1. netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link

    This commit adds support for enabling IP defrag using pre-existing
    netfilter defrag support. Basically all the flag does is bump a refcnt
    while the link is active. Checks are also added to ensure the prog
    requesting defrag support is run _after_ netfilter defrag hooks.
    
    We also take care to avoid any issues w.r.t. module unloading -- while
    defrag is active on a link, the module is prevented from unloading.
    
    Signed-off-by: Daniel Xu <[email protected]>
    Reviewed-by: Florian Westphal <[email protected]>
    Link: https://lore.kernel.org/r/5cff26f97e55161b7d56b09ddcf5f8888a5add1d.1689970773.git.dxu@dxuuu.xyz
    Signed-off-by: Alexei Starovoitov <[email protected]>
    danobi authored and qmonnet committed Jul 21, 2023
    81d76fa

Commits on Jul 28, 2023

  1. bpf: Support new sign-extension load insns

    Add interpreter/jit support for new sign-extension load insns,
    which add a new mode (BPF_MEMSX).
    Also add verifier support to recognize these insns and to
    verify them properly. In the verifier, besides deducing proper
    bounds for the dst_reg, probed memory access is also properly
    handled.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Yonghong Song <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Yonghong Song authored and qmonnet committed Jul 28, 2023
    31ec51a

Commits on Jul 31, 2023

  1. bpf: Fix an array-index-out-of-bounds issue in disasm.c

    syzbot reported an array-index-out-of-bounds when printing out bpf
    insns. Further investigation shows the insn is illegal but
    is printed out due to log level 1 or 2 before actual insn verification
    in do_check().
    
    This particular illegal insn is a MOVSX insn with offset value 2.
    The legal offset values for MOVSX are 8, 16 and 32.
    The disasm sign-extension-size array index is calculated as
     (insn->off / 8) - 1
    and offset value 2 gives an out-of-bounds index of -1.
    
    Tighten the checking for MOVSX insn in disasm.c to avoid the
    array-index-out-of-bounds issue.
    
    Reported-by: [email protected]
    Fixes: f835bb622299 ("bpf: Add kernel/bpftool asm support for new instructions")
    Signed-off-by: Yonghong Song <[email protected]>
    Acked-by: Eduard Zingerman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Yonghong Song authored and qmonnet committed Jul 31, 2023
    ff2cbfb

Commits on Aug 3, 2023

  1. bpf: change bpf_alu_sign_string and bpf_movsx_string to static

    The bpf_alu_sign_string and bpf_movsx_string introduced in commit
    f835bb622299 ("bpf: Add kernel/bpftool asm support for new instructions")
    are only used in disasm.c now, so change them to static.
    
    Signed-off-by: Yang Yingliang <[email protected]>
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
    Acked-by: Yonghong Song <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin KaFai Lau <[email protected]>
    Yang Yingliang authored and qmonnet committed Aug 3, 2023
    554a1b2

Commits on Aug 7, 2023

  1. bpf: Add support for bpf_get_func_ip helper for uprobe program

    Adding support for bpf_get_func_ip helper for uprobe program to return
    probed address for both uprobe and return uprobe.
    
    We discussed this in [1] and agreed that uprobe can have special use
    of bpf_get_func_ip helper that differs from kprobe.
    
    The kprobe bpf_get_func_ip returns:
      - address of the function if the probe is attached on function entry
        for both kprobe and return kprobe
      - 0 if the probe is not attached on function entry
    
    The uprobe bpf_get_func_ip returns:
      - address of the probe for both uprobe and return uprobe
    
    The reason for this semantic change is that the kernel can't really
    tell if the probed user space address is a function entry.
    
    The uprobe program is actually a kprobe type program attached as uprobe.
    One of the consequences of this design is that uprobes do not have their
    own set of helpers, but share them with kprobes.
    
    As we need different functionality for bpf_get_func_ip helper for uprobe,
    I'm adding the bool value to the bpf_trace_run_ctx, so the helper can
    detect that it's executed in uprobe context and call specific code.
    
    The is_uprobe bool is set to true in bpf_prog_run_array_sleepable, which
    is currently used only for executing bpf programs in uprobe.
    
    Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe
    to reflect that it's only used for uprobes and that it sets the
    run_ctx.is_uprobe as suggested by Yafang Shao.
    
    Suggested-by: Andrii Nakryiko <[email protected]>
    Tested-by: Alan Maguire <[email protected]>
    [1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/
    Signed-off-by: Jiri Olsa <[email protected]>
    Tested-by: Viktor Malik <[email protected]>
    Acked-by: Yonghong Song <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin KaFai Lau <[email protected]>
    olsajiri authored and qmonnet committed Aug 7, 2023
    f5ad0df

Commits on Aug 9, 2023

  1. bpf: Add multi uprobe link

    Adding a new multi uprobe link that allows attaching a BPF program
    to multiple uprobes.
    
    Uprobes to attach are specified via the new uprobe_multi member of
    the link_create union:
    
      struct {
        __aligned_u64   path;
        __aligned_u64   offsets;
        __aligned_u64   ref_ctr_offsets;
        __u32           cnt;
        __u32           flags;
      } uprobe_multi;
    
    Uprobes are defined for a single binary specified in path and multiple
    call sites specified in the offsets array, with optional reference
    counters specified in the ref_ctr_offsets array. All specified arrays
    have a length of 'cnt'.
    
    The 'flags' field supports a single bit for now, which marks the
    uprobe as a return probe.
    
    Acked-by: Andrii Nakryiko <[email protected]>
    Acked-by: Yafang Shao <[email protected]>
    Signed-off-by: Jiri Olsa <[email protected]>
    Acked-by: Yonghong Song <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    olsajiri authored and qmonnet committed Aug 9, 2023
    c84d14f
  2. bpf: Add cookies support for uprobe_multi link

    Adding support to specify a cookies array for the uprobe_multi link.
    
    The cookies array shares indexes and length with the other uprobe_multi
    arrays (offsets/ref_ctr_offsets).
    
    The cookies[i] value defines the cookie for the i-th uprobe and will be
    returned by the bpf_get_attach_cookie helper when called from an eBPF
    program hooked to that specific uprobe.
    
    Acked-by: Andrii Nakryiko <[email protected]>
    Acked-by: Yafang Shao <[email protected]>
    Signed-off-by: Jiri Olsa <[email protected]>
    Acked-by: Yonghong Song <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    olsajiri authored and qmonnet committed Aug 9, 2023
    1a3a19d
  3. bpf: Add pid filter support for uprobe_multi link

    Adding support to specify a pid for the uprobe_multi link, so that the
    uprobes are created only for the task with the given pid value.
    
    Using the consumer.filter callback for that, so the task gets
    filtered during the uprobe installation.
    
    We still need to check the task during runtime in the uprobe handler,
    because the handler could get executed if there's another system
    wide consumer on the same uprobe (thanks Oleg for the insight).
    
    Cc: Oleg Nesterov <[email protected]>
    Reviewed-by: Oleg Nesterov <[email protected]>
    Signed-off-by: Jiri Olsa <[email protected]>
    Acked-by: Yonghong Song <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    olsajiri authored and qmonnet committed Aug 9, 2023
    a8e6b8e
  4. bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enum

    Switching the BPF_F_KPROBE_MULTI_RETURN macro to an anonymous enum,
    so it'd show up in vmlinux.h. There's no functional change
    compared to having this as a macro.
    
    Acked-by: Yafang Shao <[email protected]>
    Suggested-by: Andrii Nakryiko <[email protected]>
    Signed-off-by: Jiri Olsa <[email protected]>
    Acked-by: Yonghong Song <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    olsajiri authored and qmonnet committed Aug 9, 2023
    d780634

Commits on Aug 11, 2023

  1. bpftool: fix perf help message

    Currently, the bpftool perf subcommand has a typo in its help message.
    
        $ tools/bpf/bpftool/bpftool perf help
        Usage: bpftool perf { show | list }
               bpftool perf help }
    
    Since the help message of the bpftool perf subcommand has an extra
    bracket, this commit fixes the typo by removing it.
    
    Signed-off-by: Daniel T. Lee <[email protected]>
    Reviewed-by: Quentin Monnet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin KaFai Lau <[email protected]>
    DanielTimLee authored and qmonnet committed Aug 11, 2023
    ffd6e4d

Commits on Aug 16, 2023

  1. bpftool: Implement link show support for tcx

    Add support to dump tcx link information to bpftool. This adds a
    common helper show_link_ifindex_{plain,json}() which can be reused
    also for other link types. The plain text and json device output is
    the same format as in bpftool net dump.
    
    Below shows an example link dump output along with a cgroup link
    for comparison:
    
      # bpftool link
      [...]
      10: cgroup  prog 1977
            cgroup_id 1  attach_type cgroup_inet6_post_bind
      [...]
      13: tcx  prog 2053
            ifindex enp5s0(3)  attach_type tcx_ingress
      14: tcx  prog 2080
            ifindex enp5s0(3)  attach_type tcx_egress
      [...]
    
    Equivalent json output:
    
      # bpftool link --json
      [...]
      {
        "id": 10,
        "type": "cgroup",
        "prog_id": 1977,
        "cgroup_id": 1,
        "attach_type": "cgroup_inet6_post_bind"
      },
      [...]
      {
        "id": 13,
        "type": "tcx",
        "prog_id": 2053,
        "devname": "enp5s0",
        "ifindex": 3,
        "attach_type": "tcx_ingress"
      },
      {
        "id": 14,
        "type": "tcx",
        "prog_id": 2080,
        "devname": "enp5s0",
        "ifindex": 3,
        "attach_type": "tcx_egress"
      }
      [...]
    
    Suggested-by: Yafang Shao <[email protected]>
    Signed-off-by: Daniel Borkmann <[email protected]>
    Reviewed-by: Quentin Monnet <[email protected]>
    Acked-by: Yafang Shao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin KaFai Lau <[email protected]>
    borkmann authored and qmonnet committed Aug 16, 2023
    c0bb967
  2. bpftool: Implement link show support for xdp

    Add support to dump XDP link information to bpftool. This reuses the
    recently added show_link_ifindex_{plain,json}(). The XDP link info only
    exposes the ifindex.
    
    Below shows an example link dump output, and a cgroup link is included
    for comparison, too:
    
      # bpftool link
      [...]
      10: cgroup  prog 2466
            cgroup_id 1  attach_type cgroup_inet6_post_bind
      [...]
      16: xdp  prog 2477
            ifindex enp5s0(3)
      [...]
    
    Equivalent json output:
    
      # bpftool link --json
      [...]
      {
        "id": 10,
        "type": "cgroup",
        "prog_id": 2466,
        "cgroup_id": 1,
        "attach_type": "cgroup_inet6_post_bind"
      },
      [...]
      {
        "id": 16,
        "type": "xdp",
        "prog_id": 2477,
        "devname": "enp5s0",
        "ifindex": 3
      }
      [...]
    
    Signed-off-by: Daniel Borkmann <[email protected]>
    Reviewed-by: Quentin Monnet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Martin KaFai Lau <[email protected]>
    borkmann authored and qmonnet committed Aug 16, 2023
    54fcada

Commits on Aug 29, 2023

  1. sync: Update libbpf submodule

    Pull latest libbpf from mirror.
    Libbpf version: 1.3.0
    Libbpf commit:  5a46421ad837e876197295844696884c8587852a
    
    Signed-off-by: Quentin Monnet <[email protected]>
    qmonnet committed Aug 29, 2023
    634eeb5
  2. sync: Pull latest bpftool changes from kernel

    Syncing latest bpftool commits from kernel repository.
    Baseline bpf-next commit:   a3e7e6b17946f48badce98d7ac360678a0ea7393
    Checkpoint bpf-next commit: 9e3b47abeb8f76c39c570ffc924ac0b35f132274
    Baseline bpf commit:        496720b7cfb6574a8f6f4d434f23e3d1e6cfaeb9
    Checkpoint bpf commit:      23d775f12dcd23d052a4927195f15e970e27ab26
    
    Alan Maguire (1):
      bpf: sync tools/ uapi header with
    
    Daniel Borkmann (5):
      bpf: Add generic attach/detach/query API for multi-progs
      bpf: Add fd-based tcx multi-prog infra with link support
      bpftool: Extend net dump with tcx progs
      bpftool: Implement link show support for tcx
      bpftool: Implement link show support for xdp
    
    Daniel T. Lee (1):
      bpftool: fix perf help message
    
    Daniel Xu (1):
      netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
    
    Jiri Olsa (5):
      bpf: Add support for bpf_get_func_ip helper for uprobe program
      bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enum
      bpf: Add multi uprobe link
      bpf: Add cookies support for uprobe_multi link
      bpf: Add pid filter support for uprobe_multi link
    
    Lorenz Bauer (1):
      bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
    
    Quentin Monnet (1):
      bpftool: Use "fallthrough;" keyword instead of comments
    
    Tiezhu Yang (1):
      asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch
    
    Yafang Shao (6):
      bpf: Support ->fill_link_info for kprobe_multi
      bpftool: Dump the kernel symbol's module name
      bpftool: Show kprobe_multi link info
      bpf: Support ->fill_link_info for perf_event
      bpftool: Add perf event names
      bpftool: Show perf link info
    
    Yang Yingliang (1):
      bpf: change bpf_alu_sign_string and bpf_movsx_string to static
    
    Yonghong Song (3):
      bpf: Support new sign-extension load insns
      bpf: Add kernel/bpftool asm support for new instructions
      bpf: Fix an array-index-out-of-bounds issue in disasm.c
    
     docs/bpftool-net.rst                   |  26 +-
     include/uapi/asm-generic/bitsperlong.h |  14 +-
     include/uapi/linux/bpf.h               | 150 +++++++-
     src/btf_dumper.c                       |   2 +-
     src/feature.c                          |   2 +-
     src/kernel/bpf/disasm.c                |  58 ++-
     src/link.c                             | 476 ++++++++++++++++++++++++-
     src/net.c                              |  98 ++++-
     src/netlink_dumper.h                   |   8 +
     src/perf.c                             |   2 +-
     src/xlated_dumper.c                    |   6 +-
     src/xlated_dumper.h                    |   2 +
     12 files changed, 796 insertions(+), 48 deletions(-)
    
    Signed-off-by: Quentin Monnet <[email protected]>
    qmonnet committed Aug 29, 2023
    0fbbac2
  3. mirror: Update "fallthrough" keyword definition

    To align with the rest of the kernel code, we changed the "__fallthrough"
    keyword to simply "fallthrough". The change needs the corresponding
    definition in the headers.
    
    Signed-off-by: Quentin Monnet <[email protected]>
    qmonnet committed Aug 29, 2023
    a8dbc3d