Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance and efficiency improvements in daemon/server mode #1154

Merged
merged 13 commits into from
Sep 15, 2023

Commits on Sep 13, 2023

  1. k8sclient: make InClusterK8sClient() call GetK8sClient()

    We want the in-cluster client that the multus server uses to use
    the same client config (QPS, protobuf, grpc, etc) as the regular
    client.
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 13, 2023
    Configuration menu
    Copy the full SHA
    fff8519 View commit details
    Browse the repository at this point in the history
  2. k8sclient: bump QPS to 50

    Multus is a pretty critical piece of infrastructure, so it shouldn't
    be subject to the same lower QPS limits as most components are.
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 13, 2023
    Configuration menu
    Copy the full SHA
    752f28c View commit details
    Browse the repository at this point in the history
  3. server: make CmdAdd/Del/Check struct member functions

    Then we can just use the Server struct kube client and exec rather
    than passing them through the function parameters.
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 13, 2023
    Configuration menu
    Copy the full SHA
    6b8d24c View commit details
    Browse the repository at this point in the history
  4. vendor: add client-go and more apimachinery modules

    We'll need these for the next commit.
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 13, 2023
    Configuration menu
    Copy the full SHA
    7c68481 View commit details
    Browse the repository at this point in the history
  5. daemon: remove unused done channel

    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 13, 2023
    Configuration menu
    Copy the full SHA
    1605ffc View commit details
    Browse the repository at this point in the history
  6. server: simplify server start

    Move server start code to a common function that both regular
    and test code can use. Also shut down the server from the
    testcases.
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 13, 2023
    Configuration menu
    Copy the full SHA
    cec1a53 View commit details
    Browse the repository at this point in the history

Commits on Sep 14, 2023

  1. server: use a shared informer pod cache rather than direct apiserver …

    …access
    
    When running in server mode we can use a shared informer to listen for
    Pod events from the apiserver, and grab pod info from that cache rather
    than doing direct apiserver requests each time.
    
    This reduces apiserver load and retry latency, since multus can poll
    the local cache more frequently than it should do direct apiserver
    requests.
    
    Oddly static pods don't show up in the informer by the timeout and
    require a direct apiserver request. Since static pods are not common
    and are typically long-running, it should not be a big issue to
    fall back to direct apiserver access for them.
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 14, 2023
    Configuration menu
    Copy the full SHA
    50c0357 View commit details
    Browse the repository at this point in the history
  2. server/config: simplify ConfigManager creation

    A couple of the setup variables for NewManager*() are already in the
    multus config that it gets passed, so use those instead of passing
    explicitly.
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 14, 2023
    Configuration menu
    Copy the full SHA
    4ade856 View commit details
    Browse the repository at this point in the history
  3. server/config: consolidate ConfigManager start and fsnotify watching

    Simplify setup by moving the post-creation operations like
    GenerateConfig() and PersistMultusConfig() into a new Start() function
    that also begins watching the configuration directory. This better
    encapsulates the manager functionality in the object.
    
    We can also get rid of the done channel passed to the config
    manager and just use the existing WaitGroup to determine when to
    exit the daemon main().
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 14, 2023
    Configuration menu
    Copy the full SHA
    8539a47 View commit details
    Browse the repository at this point in the history
  4. server/config: fix MonitorPluginConfiguration test

    The test was comparing the same configuration to itself, since
    nothing in the changed CNI configuration is used in the written
    multus configuration.
    
    Instead make sure the updated CNI config contains something
    that will be reflected in the written multus configuration,
    and while we're there use a more robust way to wait for the
    config to be written via gomega.Eventually().
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 14, 2023
    Configuration menu
    Copy the full SHA
    c2add82 View commit details
    Browse the repository at this point in the history
  5. server/config: un-export some functions no longer used outside the mo…

    …dule
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 14, 2023
    Configuration menu
    Copy the full SHA
    fb4f4aa View commit details
    Browse the repository at this point in the history
  6. server/config: use filepath.Join()

    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 14, 2023
    Configuration menu
    Copy the full SHA
    b0df7dd View commit details
    Browse the repository at this point in the history
  7. server: don't set CNI config readinessindicatorfile when using Config…

    …Manager
    
    For whatever reason calling os.Stat() on the readiness indicator file
    from CmdAdd()/CmdDel() when multus is running in server mode and is
    containerized often returns "file not found", which triggers the
    polling behavior of GetReadinessIndicatorFile(). This greatly delays
    CNI operations that should be pretty quick. Even if an exponential
    backoff is used, os.Stat() can still return "file not found"
    multiple times, even though the file clearly exists.
    
    But it turns out we don't need to check the readiness file in server
    mode when running with MultusConfigFile == "auto". In this mode the
    server starts the ConfigManager which (a) waits until the file exists
    and (b) fsnotify watches the readiness and (c) exits the daemon
    immediately if the file is deleted or moved.
    
    This means we can assume that while the daemon is running and the
    server is handling CNI requests that the readiness file exists;
    otherwise the daemon would have exited. Thus CmdAdd/CmdDel don't
    need to run a lot of possibly failing os.Stat() calls in the CNI
    hot paths.
    
    Signed-off-by: Dan Williams <[email protected]>
    dcbw committed Sep 14, 2023
    Configuration menu
    Copy the full SHA
    d9c06e9 View commit details
    Browse the repository at this point in the history