Add metrics to acs for eni provisioning workflow monitoring #4443

Open
wants to merge 5 commits into
base: master

24 changes: 24 additions & 0 deletions ecs-agent/acs/session/session.go
@@ -99,6 +99,8 @@ type session struct {
	disconnectJitter               time.Duration
	inactiveInstanceReconnectDelay time.Duration
	lastConnectedTime              time.Time
	firstACSConnectionTime         time.Time
	firstDiscoverPollEndpointTime  time.Time
}

// NewSession creates a new Session.
@@ -158,6 +160,8 @@ func NewSession(containerInstanceARN string,
		disconnectJitter:               wsclient.DisconnectJitterMax,
		inactiveInstanceReconnectDelay: inactiveInstanceReconnectDelay,
		lastConnectedTime:              time.Time{},
		firstACSConnectionTime:         time.Time{},
		firstDiscoverPollEndpointTime:  time.Time{},
	}
}

@@ -234,14 +238,20 @@ func (s *session) Start(ctx context.Context) error {
// startSessionOnce creates a session with ACS and handles requests using the passed
// in arguments.
func (s *session) startSessionOnce(ctx context.Context) error {
	if s.GetFirstDiscoverPollEndpointTime().IsZero() {
		s.firstDiscoverPollEndpointTime = time.Now()
	}

	acsEndpoint, err := s.ecsClient.DiscoverPollEndpoint(s.containerInstanceARN)

	if err != nil {
		logger.Error("ACS: Unable to discover poll endpoint", logger.Fields{
			"containerInstanceARN": s.containerInstanceARN,
			field.Error:            err,
		})
		return err
	}
	s.metricsFactory.New(metrics.DiscoverPollEndpointDurationName).WithGauge(s.ecsClient.GetDiscoverPollEndpointDuration()).Done(nil)

	client := s.clientFactory.New(
		s.acsURL(acsEndpoint),
@@ -253,6 +263,7 @@ func (s *session) startSessionOnce(ctx context.Context) error {

	// Invoke Connect method as soon as we create client. This will ensure all the
	// request handlers to be associated with this client have a valid connection.
	acsConnectionMetric := s.metricsFactory.New(metrics.ACSConnectionMetricDurationName)
	disconnectTimer, err := client.Connect(metrics.ACSDisconnectTimeoutMetricName, s.disconnectTimeout,
		s.disconnectJitter)
	if err != nil {
@@ -262,8 +273,13 @@ func (s *session) startSessionOnce(ctx context.Context) error {
		})
		return err
	}
	acsConnectionMetric.Done(err)

This ACS metric won't fire if client.Connect returns a non-nil error. Is that intentional? If so, this behavior is inconsistent with the DiscoverPollEndpoint metric.

Contributor

Because ACS infinitely retries its connection, is this metric going to be too noisy if we fire a failure metric on every failed connection?

Author

I don't think we would want to fire metrics on failure. To be consistent with DiscoverPollEndpoint, I think I will only fire the DiscoverPollEndpoint metric for successful calls.
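
For reference, the two orderings discussed in this thread can be compared with a small standalone sketch. It does not use the agent's metrics package: `recorder`, `connectSuccessOnly`, `connectAlways`, and `simulate` are hypothetical names introduced only for illustration, and `done(err)` merely imitates the shape of a `Done(err)` call. With the metric finished after the error check (as in the diff), failed attempts never produce a data point; finishing it before the check records every attempt, which is where the noise concern comes from when connections are retried indefinitely.

```go
package main

import "errors"

// recorder is a hypothetical stand-in for a metrics sink; it only counts calls.
type recorder struct {
	successes int
	failures  int
}

// done imitates the shape of a Done(err) call: it records one data point
// and classifies it by whether err is nil.
func (r *recorder) done(err error) {
	if err != nil {
		r.failures++
		return
	}
	r.successes++
}

// connectSuccessOnly mirrors the ordering in the diff: the metric is
// finished only after the error check, so failures are never recorded.
func connectSuccessOnly(r *recorder, connect func() error) error {
	err := connect()
	if err != nil {
		return err
	}
	r.done(nil)
	return nil
}

// connectAlways finishes the metric before the error check, so every
// attempt produces a data point, including failed ones.
func connectAlways(r *recorder, connect func() error) error {
	err := connect()
	r.done(err)
	return err
}

// simulate runs three failed attempts followed by one successful attempt
// through the given strategy and returns what was recorded.
func simulate(attempt func(*recorder, func() error) error) *recorder {
	r := &recorder{}
	errRefused := errors.New("connection refused")
	outcomes := []error{errRefused, errRefused, errRefused, nil}
	for _, outcome := range outcomes {
		outcome := outcome // capture the per-iteration value for the closure
		_ = attempt(r, func() error { return outcome })
	}
	return r
}
```

Calling `simulate(connectSuccessOnly)` records a single success and no failures, while `simulate(connectAlways)` records one success and three failures for the same sequence of attempts.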

	defer disconnectTimer.Stop()

	if s.GetFirstACSConnectionTime().IsZero() {
		s.firstACSConnectionTime = time.Now()
	}

	// Record the timestamp of the last connection to ACS.
	s.lastConnectedTime = time.Now()

@@ -475,3 +491,11 @@ func formatDockerVersion(dockerVersionValue string) string {
func (s *session) GetLastConnectedTime() time.Time {
	return s.lastConnectedTime
}

func (s *session) GetFirstACSConnectionTime() time.Time {
	return s.firstACSConnectionTime
}

func (s *session) GetFirstDiscoverPollEndpointTime() time.Time {
	return s.firstDiscoverPollEndpointTime
}
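
The "set only if still zero" pattern used for `firstACSConnectionTime` and `firstDiscoverPollEndpointTime` can be isolated into a small helper. The sketch below is not part of this PR: `firstSeen`, `markAt`, `get`, and `demo` are hypothetical names, and the mutex is an assumption, since the diff does not show whether these fields are read from another goroutine while the session loop writes them. If they are, guarding the check and the write together avoids a data race.

```go
package main

import (
	"sync"
	"time"
)

// firstSeen is a hypothetical helper that remembers the first timestamp
// it is given and ignores all later ones.
type firstSeen struct {
	mu sync.Mutex
	at time.Time
}

// markAt stores ts only if no timestamp has been recorded yet and
// reports whether this call was the one that set it.
func (f *firstSeen) markAt(ts time.Time) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if !f.at.IsZero() {
		return false
	}
	f.at = ts
	return true
}

// get returns the recorded timestamp, or the zero time.Time if unset.
func (f *firstSeen) get() time.Time {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.at
}

// demo marks the helper twice with different timestamps and reports
// whether each call set the value and whether the first one was kept.
func demo() (first, second, keptFirst bool) {
	var f firstSeen
	t1 := time.Unix(100, 0)
	t2 := time.Unix(200, 0)
	first = f.markAt(t1)
	second = f.markAt(t2)
	keptFirst = f.get().Equal(t1)
	return first, second, keptFirst
}
```

In `demo`, only the first `markAt` call sets the value; the second is ignored, so a reconnect would not overwrite the original timestamp.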