improve CPU cores affinity to NUMA nodes v9 #12359

Closed

Conversation

lukashino
Contributor

Follow-up of #12336

Redmine tickets:
https://redmine.openinfosecfoundation.org/issues/7036
https://redmine.openinfosecfoundation.org/issues/2321

This work allows more precise thread assignment, done either:

  • automatically - by picking cores from the worker-cpu-set list that share NUMA locality with the interface
  • manually - you can specify interface-specific settings in the threading section

This works with AF-PACKET and DPDK (and should work with other capture modes as well). The primary target is the workers runmode, but it also works with autofp receive threads.
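
For the automatic path, a minimal sketch of what this could look like, assuming threading.autopin takes a simple yes/no value (the option name comes from this PR description; the value type and the eth0 interface name are illustrative):

threading:
  autopin: yes  # assumption: yes/no toggle; picks cores local to each interface's NUMA node
  cpu-affinity:
    worker-cpu-set:
      cpu: [ "all" ]
      mode: "exclusive"

af-packet:
  - interface: eth0  # illustrative interface name
    threads: auto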

To try this PR:

  1. Compile Suricata from this branch.
  2. Modify the newly generated suricata.yaml according to your setup.
  3. Run as usual.

Describe changes:

v9:

  • docs update per PRv8 comments
  • CPU affinity fix for af-packet when autopinning is enabled and threads are oversubscribed
  • addressed PR comments

v8:

  • interface-specific settings no longer require the hwloc dependency
  • CPU affinity refactoring changes split out into a separate commit
  • added docs - upgrade docs and general threading docs
  • a small change in the title

v7:

  • hwloc is now an optional dependency - it can be enabled with --enable-hwloc in the configure step,
  • autopinning / auto-optimizing is configurable via the threading.autopin suricata.yaml option,
  • making CI happy (and with that fixing a few bugs)

v6:

  • autofp mode is now supported - receive-cpu-set can now contain per-interface CPU affinity settings (see the sketch after this list) - one open question: should the worker-cpu-set in autofp be somehow tied to the receive-cpu-set threads?
  • YAML node "per-iface" changed to "interface-specific-cpu-set"
  • list form of (management/worker/...)-cpu-set was changed to simple tree nodes
  • cpu/prio/mode properties of (management/worker/...)-cpu-set nodes moved one YAML layer up
  • DPDK (net_bonding / PCIe address / ... ) / AFP interfaces are supported
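
A sketch of the autofp receive side mentioned above, assuming receive-cpu-set accepts the same interface-specific-cpu-set structure as worker-cpu-set (interface names and core numbers are placeholders):

threading:
  cpu-affinity:
    receive-cpu-set:
      cpu: [ 0, 2 ]
      interface-specific-cpu-set:
        - interface: "ens1f0"
          cpu: [ 0 ]
        - interface: "ens1f1"
          cpu: [ 2 ]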

On YAML changes:

The CPU affinity section in suricata.yaml has been changed to the following (see suricata.yaml.in for the full picture):

threading:
  cpu-affinity:
    worker-cpu-set:
      cpu: [ "all" ]
      mode: "exclusive"
      interface-specific-cpu-set:
        - interface: "net_bonding0" # 0000:3b:00.0 # net_bonding0 # ens1f0
          cpu: [ "all" ]
        - interface: "net_bonding1"
          cpu: [ "all" ]

Below is the reasoning for it.

When starting this work I wanted to follow the same structure as the existing CPU set nodes, that is:

threading:
  cpu-affinity:
    - worker-cpu-set:
        cpu: [ "all" ]
        mode: "exclusive"

Converted to JSON, it looks like:

{
    "threading": {
        "cpu-affinity": [
            {
                "worker-cpu-set": {
                    "cpu": [
                        "all"
                    ],
                    "mode": "exclusive"
                }
            }
        ]
    }
}

So worker-cpu-set is an object of its own: the list item holds a single key named "worker-cpu-set", and only that key contains the child properties (cpu, prio, etc.).

But when adding the interface-specific node, I found out that using interface names directly as keys of the object storing the properties (cpu, prio, etc.) can change the name to something else, in this case to net-bonding0.

threading:
  cpu-affinity:
    - worker-cpu-set:
        cpu: [ "all" ]
        mode: "exclusive"
        interface-specific-cpu-set:
          - net_bonding0:
              cpu: [ "all" ]
          - net_bonding1:
              cpu: [ "all" ]

Adding the interface name as a value instead:

threading:
  cpu-affinity:
    - worker-cpu-set:
        cpu: [ "all" ]
        mode: "exclusive"
        interface-specific-cpu-set:
          - interface: net_bonding0
              cpu: [ "all" ]
          - interface: net_bonding1
              cpu: [ "all" ]

yields an invalid YAML construct.

Therefore, for interface-specific CPU affinity settings, I ended up putting the properties on the same level as the interface name itself. This follows the structure already established in e.g. the capture-mode interface definitions (dpdk.interfaces, af-packet).
Note: as part of this, and as part of Ticket 2321, I moved the base *-cpu-set nodes out of the list items.
The interface-specific CPU affinity design I ended up with is:

threading:
  cpu-affinity:
    worker-cpu-set:
      cpu: [ "all" ]
      mode: "exclusive"
      interface-specific-cpu-set:
        - interface: "net_bonding0" # 0000:3b:00.0 # net_bonding0 # ens1f0
          cpu: [ "all" ]
        - interface: "net_bonding1"
          cpu: [ "all" ]

JSON visualization of this:

{
    "threading": {
        "cpu-affinity": {
            "worker-cpu-set": {
                "cpu": [
                    "all"
                ],
                "mode": "exclusive",
                "interface-specific-cpu-set": [
                    {
                        "interface": "net_bonding0",
                        "cpu": [
                            "all"
                        ]
                    },
                    {
                        "interface": "net_bonding1",
                        "cpu": [
                            "all"
                        ]
                    }
                ]
            }
        }
    }
}

This leaves the base {worker,receive,management}-cpu-set nodes in their original place, as we know them. The internal code can handle this because the names of these nodes are known beforehand, so it can "step down" to the children level to inspect the cpu-affinity properties. The dynamic part - interface-specific-cpu-set - keeps all properties on the same level.

Other unsuccessful YAML designs

Specifying the NUMA affinity in the individual capture modes

Example

dpdk:
  - interface: "3b:00.0"
    threads: [ 2, 4, 6, 8]

But management / other threads would still need to be configured elsewhere. I liked the proposed approach more because all affinity-related settings are centralized.

Not having list nodes in the YAML

threading:
  worker-cpu-set:
    net_bonding0:
      cpu: [ 1, 3, 5 ]

The YAML loader changes net_bonding to net-bonding when the configuration is loaded, so this is a no-go.
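
For illustration, after loading, the node above would effectively be keyed by the mangled name (a sketch of the effect, not actual Suricata output):

threading:
  worker-cpu-set:
    net-bonding0:  # the underscore is rewritten to a dash by the loader
      cpu: [ 1, 3, 5 ]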

Having interface-related properties as children

threading:
  worker-cpu-set:
    - interface: net_bonding0
        cpu: [ 1, 3, 5 ]

This is a disallowed construct in YAML, so another no-go. A similar pattern is currently used with the *-cpu-set nodes, but the difference here is that the parent list item has a value (in this case net_bonding0), and that is what causes problems. Interface-related properties therefore need to sit right under "interface".

PRO TIP: A good way to visualize different YAML structures is by using YAML-to-JSON converters. The current design creates an object like:

[
  {
    "interface": "3b:00.0",
    "mode": "exclusive"
  }
]

Lukas Sismis added 10 commits January 8, 2025 20:49
Split the code into multiple functions for easier readability.
Part of Ticket 2321 work to remove unnecessary lists from
the config file.

Ticket: 2321
Provide backward compatibility with the previous configuration
format to allow a smooth transition to the new format.
The commit adds docs about the new format and the introduced changes.
Using the new configuration format, it is now possible to set CPU affinity
settings per interface.

The threading.autopin option has been added to automatically use CPUs from the
same NUMA node as the interface. The autopin option requires
hwloc-devel / hwloc-dev to be installed and the --enable-hwloc flag in the
configure script.

Ticket: 7036
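
Since the commit above keeps backward compatibility, both forms should be accepted; a sketch under that assumption, reusing the legacy list form shown earlier in this description:

# legacy list form (kept for a smooth transition)
threading:
  cpu-affinity:
    - worker-cpu-set:
        cpu: [ "all" ]
        mode: "exclusive"

# new tree form
threading:
  cpu-affinity:
    worker-cpu-set:
      cpu: [ "all" ]
      mode: "exclusive"
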
@lukashino
Contributor Author

@ct0br0 commented:

skimming through and had a few questions.

it looks like if you use all then anything on that numa will be used?

It depends. By default, if you have e.g. af-packet.interfaces[interface=eth0].threads set to auto and "all" in the CPU affinity settings, then it behaves exactly like the current implementation. With threading.autopin enabled, it first assigns threads from the local NUMA node, then from the other NUMA nodes.
If you, for instance, set eth0 to only 4 threads and your NUMA node has more than 4 cores, then only cores from that NUMA node will be used. DPDK has extra logic for "auto" mode to split the cores so they don't overlap, but it is really best effort at the moment. I didn't want to interfere with other modes too much - I didn't know the usual deployment scenarios.
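
A hedged sketch of that scenario (the eth0 name, thread count, and value type of autopin are placeholders/assumptions):

af-packet:
  - interface: eth0
    threads: 4  # fewer threads than cores on eth0's local NUMA node

threading:
  autopin: yes  # assumption: yes/no toggle
  cpu-affinity:
    worker-cpu-set:
      cpu: [ "all" ]  # with autopin, the 4 threads are pinned to cores from eth0's local NUMA node first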

does it automatically, or could you specify, not to use other say management threads? !management or just !0 any core you want?

No, this PR doesn't add extra expression logic like you suggest. In DPDK it is implemented to exclude management threads, but in other capture modes I didn't feel comfortable making the change.
So that's a good point for discussion. I view this work primarily as a way to add manual CPU affinity settings for individual interfaces. The "autopin" feature is a bit exploratory but, of course, it tries to be helpful as well.

could this work for pcap mode too? specify say numa0

Well, I think what you are suggesting here is to be more expressive with the CPU lists. This would allow a user to configure affinity without detailed knowledge of the system at the level of individual CPU cores.

worker-cpu-set:
  cpu: [ "numa0,!management-cpu-set" ]

This work has not focused on that; I view it as a follow-up.

kind of a nit, "all" doesn't really sound direct, may seem like you're just going to use every core you can. "auto" or "numa" or some combination may be more specific.

@jlucovsky commented:

on src/util-affinity.c

This is called from a while loop in one instance. Could the CPU set for the NUMA node be retrieved once and then compared multiple times instead?

It is possible; I guess I wanted to keep fewer structures around. I can rework it if that is the preferred approach.

@suricata-qa

Information: QA ran without warnings.

Pipeline 24125

@lukashino
Contributor Author

Reloaded in #12369

@lukashino closed this Jan 9, 2025