Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Telemetry] Track event loop delays on the server #101580

Merged
merged 22 commits into from
Jun 22, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
495edc9
initial implementation
Bamieh Jun 8, 2021
a338637
Merge branch 'master' of github.com:elastic/kibana into telemetry/eve…
Bamieh Jun 8, 2021
cd643c7
update telemetry schema files
Bamieh Jun 8, 2021
c710703
update snapshot
Bamieh Jun 8, 2021
e85560e
fully test the new collector
Bamieh Jun 9, 2021
5f26398
Merge branch 'master' of github.com:elastic/kibana into telemetry/eve…
Bamieh Jun 9, 2021
69f01fc
code review changes
Bamieh Jun 10, 2021
4a454e6
Merge branch 'master' of github.com:elastic/kibana into telemetry/eve…
Bamieh Jun 10, 2021
cdfe45e
code review + integration test
Bamieh Jun 10, 2021
73a92ad
Merge branch 'master' of github.com:elastic/kibana into telemetry/eve…
Bamieh Jun 10, 2021
37f57ec
Merge branch 'master' of github.com:elastic/kibana into telemetry/eve…
Bamieh Jun 16, 2021
63b0710
Merge branch 'master' of github.com:elastic/kibana into telemetry/eve…
Bamieh Jun 17, 2021
3c53b05
add unit tests for delete
Bamieh Jun 17, 2021
7ef5929
schema update
Bamieh Jun 17, 2021
0ad3bdb
type check
Bamieh Jun 17, 2021
764a6db
Merge branch 'master' of github.com:elastic/kibana into telemetry/eve…
Bamieh Jun 21, 2021
0992e24
make type hidden + final fixes
Bamieh Jun 21, 2021
dc5f98b
Merge branch 'master' of github.com:elastic/kibana into telemetry/eve…
Bamieh Jun 21, 2021
fb82ae2
add timestamp
Bamieh Jun 21, 2021
8dcd50e
Merge branch 'master' of github.com:elastic/kibana into telemetry/eve…
Bamieh Jun 21, 2021
dfb140e
update test snapshots
Bamieh Jun 21, 2021
0cb3650
Merge branch 'master' of github.com:elastic/kibana into telemetry/eve…
Bamieh Jun 21, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/

/**
* Roll daily indices every 24h
*/
export const ROLL_DAILY_INDICES_INTERVAL = 24 * 60 * 60 * 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional nit: it's easy to forget updating the comment, so self-docuemented code FTW:

const Min = 60 * 1000;
const Hour = 60 * Min;
/** * Start rolling indices */
export const ROLL_DAILY_INDICES_INTERVAL = 24 * Hour
export const ROLL_INDICES_START = 5 * Min;


/**
* Start rolling indices after 5 minutes up
*/
export const ROLL_INDICES_START = 5 * 60 * 1000;

/**
* Reset the event loop delay historgram every 1 hour
*/
export const MONITOR_EVENT_LOOP_DELAYS_INTERVAL = 1 * 60 * 60 * 1000;

/**
* Reset the event loop delay historgram every 24h
*/
export const MONITOR_EVENT_LOOP_DELAYS_RESET = 24 * 60 * 60 * 1000;

/**
* Start monitoring the event loop delays after 1 minute
*/
export const MONITOR_EVENT_LOOP_DELAYS_START = 1 * 60 * 1000;

/**
* Event loop monitoring sampling rate in milliseconds.
*/
export const MONITOR_EVENT_LOOP_DELAYS_RESOLUTION = 10;
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/
import moment from 'moment';
import type { IntervalHistogram } from './event_loop_delays';

export const mockMonitorEnable = jest.fn();
export const mockMonitorPercentile = jest.fn();
export const mockMonitorReset = jest.fn();
export const mockMonitorDisable = jest.fn();
export const monitorEventLoopDelay = jest.fn().mockReturnValue({
enable: mockMonitorEnable,
percentile: mockMonitorPercentile,
disable: mockMonitorDisable,
reset: mockMonitorReset,
});

jest.doMock('perf_hooks', () => ({
monitorEventLoopDelay,
}));

function createMockHistogram(overwrites: Partial<IntervalHistogram> = {}): IntervalHistogram {
const now = moment();

return {
min: 9093120,
max: 53247999,
mean: 11993238.600747818,
exceeds: 0,
stddev: 1168191.9357543814,
fromTimestamp: now.startOf('day').toISOString(),
lastUpdatedAt: now.toISOString(),
percentiles: {
'50': 12607487,
'75': 12615679,
'95': 12648447,
'99': 12713983,
},
...overwrites,
};
}

export const mocked = {
createHistogram: createMockHistogram,
};
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/

import { Subject } from 'rxjs';

import {
mockMonitorEnable,
mockMonitorPercentile,
monitorEventLoopDelay,
mockMonitorReset,
mockMonitorDisable,
} from './event_loop_delays.mocks';
import { savedObjectsRepositoryMock } from '../../../../../core/server/mocks';
import { startTrackingEventLoopDelaysUsage, EventLoopDelaysCollector } from './event_loop_delays';

describe('EventLoopDelaysCollector', () => {
jest.useFakeTimers('modern');
const mockNow = jest.getRealSystemTime();
jest.setSystemTime(mockNow);

beforeEach(() => jest.clearAllMocks());
afterAll(() => jest.useRealTimers());

test('#constructor enables monitoring', () => {
new EventLoopDelaysCollector();
expect(monitorEventLoopDelay).toBeCalledWith({ resolution: 10 });
expect(mockMonitorEnable).toBeCalledTimes(1);
});

test('#collect returns event loop delays histogram', () => {
const eventLoopDelaysCollector = new EventLoopDelaysCollector();
const histogramData = eventLoopDelaysCollector.collect();
expect(mockMonitorPercentile).toHaveBeenNthCalledWith(1, 50);
expect(mockMonitorPercentile).toHaveBeenNthCalledWith(2, 75);
expect(mockMonitorPercentile).toHaveBeenNthCalledWith(3, 95);
expect(mockMonitorPercentile).toHaveBeenNthCalledWith(4, 99);

expect(Object.keys(histogramData)).toMatchInlineSnapshot(`
Array [
"min",
"max",
"mean",
"exceeds",
"stddev",
"fromTimestamp",
"lastUpdatedAt",
"percentiles",
]
`);
});
test('#reset resets histogram data', () => {
const eventLoopDelaysCollector = new EventLoopDelaysCollector();
eventLoopDelaysCollector.reset();
expect(mockMonitorReset).toBeCalledTimes(1);
});
test('#stop disables monitoring event loop delays', () => {
const eventLoopDelaysCollector = new EventLoopDelaysCollector();
eventLoopDelaysCollector.stop();
expect(mockMonitorDisable).toBeCalledTimes(1);
});
});

describe('startTrackingEventLoopDelaysUsage', () => {
const mockInternalRepository = savedObjectsRepositoryMock.create();
const stopMonitoringEventLoop$ = new Subject<void>();

beforeAll(() => jest.useFakeTimers('modern'));
beforeEach(() => jest.clearAllMocks());
afterEach(() => stopMonitoringEventLoop$.next());

it('initializes EventLoopDelaysCollector and starts timer', () => {
const collectionStartDelay = 1000;
startTrackingEventLoopDelaysUsage(
mockInternalRepository,
stopMonitoringEventLoop$,
collectionStartDelay
);

expect(monitorEventLoopDelay).toBeCalledTimes(1);
expect(mockMonitorPercentile).toBeCalledTimes(0);
jest.advanceTimersByTime(collectionStartDelay);
expect(mockMonitorPercentile).toBeCalled();
});

it('stores event loop delays every collectionInterval duration', () => {
const collectionStartDelay = 100;
const collectionInterval = 1000;
startTrackingEventLoopDelaysUsage(
mockInternalRepository,
stopMonitoringEventLoop$,
collectionStartDelay,
collectionInterval
);

expect(mockInternalRepository.create).toBeCalledTimes(0);
jest.advanceTimersByTime(collectionStartDelay);
expect(mockInternalRepository.create).toBeCalledTimes(1);
jest.advanceTimersByTime(collectionInterval);
expect(mockInternalRepository.create).toBeCalledTimes(2);
jest.advanceTimersByTime(collectionInterval);
expect(mockInternalRepository.create).toBeCalledTimes(3);
});

it('resets histogram every histogramReset duration', () => {
const collectionStartDelay = 0;
const collectionInterval = 1000;
const histogramReset = 5000;
startTrackingEventLoopDelaysUsage(
mockInternalRepository,
stopMonitoringEventLoop$,
collectionStartDelay,
collectionInterval,
histogramReset
);

expect(mockMonitorReset).toBeCalledTimes(0);
jest.advanceTimersByTime(collectionInterval * 5);
expect(mockMonitorReset).toBeCalledTimes(1);
jest.advanceTimersByTime(collectionInterval * 5);
expect(mockMonitorReset).toBeCalledTimes(2);
});

it('stops monitoring event loop delays once stopMonitoringEventLoop$.next is called', () => {
startTrackingEventLoopDelaysUsage(mockInternalRepository, stopMonitoringEventLoop$);

expect(mockMonitorDisable).toBeCalledTimes(0);
stopMonitoringEventLoop$.next();
expect(mockMonitorDisable).toBeCalledTimes(1);
});
});
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/

import type { EventLoopDelayMonitor } from 'perf_hooks';
import { monitorEventLoopDelay } from 'perf_hooks';
import { takeUntil, finalize, map } from 'rxjs/operators';
import { Observable, timer } from 'rxjs';
import type { ISavedObjectsRepository } from 'kibana/server';
import {
MONITOR_EVENT_LOOP_DELAYS_START,
MONITOR_EVENT_LOOP_DELAYS_INTERVAL,
MONITOR_EVENT_LOOP_DELAYS_RESET,
MONITOR_EVENT_LOOP_DELAYS_RESOLUTION,
} from './constants';
import { storeHistogram } from './saved_objects';

export interface IntervalHistogram {
fromTimestamp: string;
lastUpdatedAt: string;
min: number;
max: number;
mean: number;
exceeds: number;
stddev: number;
percentiles: {
50: number;
75: number;
95: number;
99: number;
};
}

export class EventLoopDelaysCollector {
private readonly loopMonitor: EventLoopDelayMonitor;
private fromTimestamp: Date;

constructor() {
const monitor = monitorEventLoopDelay({
resolution: MONITOR_EVENT_LOOP_DELAYS_RESOLUTION,
});
monitor.enable();
this.fromTimestamp = new Date();
this.loopMonitor = monitor;
}

public collect(): IntervalHistogram {
const { min, max, mean, exceeds, stddev } = this.loopMonitor;

return {
min,
max,
mean,
exceeds,
stddev,
fromTimestamp: this.fromTimestamp.toISOString(),
lastUpdatedAt: new Date().toISOString(),
percentiles: {
50: this.loopMonitor.percentile(50),
75: this.loopMonitor.percentile(75),
95: this.loopMonitor.percentile(95),
99: this.loopMonitor.percentile(99),
},
};
}

public reset() {
this.loopMonitor.reset();
this.fromTimestamp = new Date();
}

public stop() {
this.loopMonitor.disable();
}
}

/**
* The monitoring of the event loop starts immediately.
* The first collection of the histogram happens after 1 minute.
* The daily histogram data is updated every 1 hour.
*/
export function startTrackingEventLoopDelaysUsage(
internalRepository: ISavedObjectsRepository,
stopMonitoringEventLoop$: Observable<void>,
collectionStartDelay = MONITOR_EVENT_LOOP_DELAYS_START,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thus, we start collecting data 1 min after the start lifecycle of kibana_usage_collection plugin.

Copy link
Member Author

@Bamieh Bamieh Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The monitoring of the event loop starts immediately. The first collection of the histogram happens after 1 minute. The daily histogram data is updated every 1 hour.

collectionInterval = MONITOR_EVENT_LOOP_DELAYS_INTERVAL,
histogramReset = MONITOR_EVENT_LOOP_DELAYS_RESET
) {
const eventLoopDelaysCollector = new EventLoopDelaysCollector();

const resetOnCount = Math.ceil(histogramReset / collectionInterval);
timer(collectionStartDelay, collectionInterval)
.pipe(
map((i) => (i + 1) % resetOnCount === 0),
takeUntil(stopMonitoringEventLoop$),
finalize(() => eventLoopDelaysCollector.stop())
)
.subscribe(async (shouldReset) => {
const histogram = eventLoopDelaysCollector.collect();
if (shouldReset) {
eventLoopDelaysCollector.reset();
}
await storeHistogram(histogram, internalRepository);
Comment on lines +105 to +107
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: technically, being inside a subscription, I think this await does nothing. Not sure it would even protect against unresolved rejections?

Copy link
Member Author

@Bamieh Bamieh Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it is exactly the same as without await in this case since it is inside a .subscribe(..). I've just added it to explicitly denote that a Promise is expected here, also if we want to add some logic in the future under this function it might be easy to miss adding await

});
}
Loading