
[securitySolution] Jest memory leaks #117255

Closed
tylersmalley opened this issue Nov 3, 2021 · 38 comments · Fixed by #120324
Assignees
Labels
• bug - Fixes for quality problems that affect the customer experience
• impact:high - Addressing this issue will have a high level of impact on the quality/strength of our product.
• loe:small - Small Level of Effort
• OLM Sprint
• Team:Core - Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
• Team:Defend Workflows - “EDR Workflows” sub-team of Security Solution
• Team: SecuritySolution - Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.

Comments

@tylersmalley
Contributor

In #117188 we're working to run Jest tests by their associated config file, parallelizing the work ourselves instead of relying on Jest workers (which is what FB does internally). This change is a blocker for per-PR code coverage reporting. We see instability with the workers, and they hide some underlying issues. You can see OOM's in the current Jest tests for any CI run, as the Jest worker is automatically restarted and picks up where it left off. We shouldn't OOM, and we shouldn't have memory leaks, as they could point to an issue in the application code.

The issue we're facing is with the x-pack/plugins/security_solution/jest.config.js config, as it seems to have a memory leak.

There is a pretty good writeup here: https://chanind.github.io/javascript/2019/10/12/jest-tests-memory-leak.html

node --max-old-space-size=10000 --expose-gc ./node_modules/.bin/jest --runInBand --logHeapUsage --config x-pack/plugins/security_solution/jest.config.js
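The mechanism that writeup describes (a dependency monkey-patching a shared core module, so each test file's module registry stays reachable) can be sketched roughly as follows; this is a hypothetical illustration, not code from Kibana or the linked article:

// leaky_patch.ts: Jest gives each test file its own module registry, but core
// modules such as `fs` are shared across all of them. If an imported dependency
// patches such a module, the patch (and everything it closes over) survives
// every registry reset, so each test file leaves its module graph behind.
const fs: typeof import('fs') = require('fs');

// A large object owned by this test file's module registry.
const retained = { payload: Buffer.alloc(10 * 1024 * 1024) };

const originalExistsSync = fs.existsSync;
fs.existsSync = (path) => {
  // The closure keeps `retained` (and this file's whole module graph) reachable
  // from the shared `fs` module, so it can never be garbage collected.
  void retained;
  return originalExistsSync(path);
};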

Memory leaks

...
 PASS  x-pack/plugins/security_solution/server/lib/timeline/saved_object/timelines/index.test.ts (4004 MB heap size)                                                                                  [202/1833]
 PASS  x-pack/plugins/security_solution/server/lib/machine_learning/authz.test.ts (4042 MB heap size)                                                                                                           
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/migrations/finalize_migration.test.ts (4076 MB heap size)                                                                                   
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/rules/create_rules.test.ts (4085 MB heap size)                                                                                              
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/migrations/update_migration_saved_object.test.ts (4106 MB heap size)                                                                        
 PASS  x-pack/plugins/security_solution/server/search_strategy/security_solution/factory/matrix_histogram/anomalies/index.test.ts (4108 MB heap size)                                                           
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/notifications/schedule_throttle_notification_actions.test.ts (4142 MB heap size)                                                            
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/migrations/create_migration_saved_object.test.ts (4165 MB heap size)                                                                        
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/migrations/get_migration_saved_objects_by_id.test.ts (4185 MB heap size)                                                                    
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/migrations/create_migration.test.ts (4214 MB heap size)                                                                                     
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/notifications/utils.test.ts (4228 MB heap size)                                                                                             
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/migrations/find_migration_saved_objects.test.ts (4250 MB heap size)                                                                         
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/rule_actions/legacy_create_rule_actions_saved_object.test.ts (4271 MB heap size)                                                            
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/migrations/get_migration_saved_objects_by_index.test.ts (4294 MB heap size)                                                                 
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/rule_actions/legacy_get_bulk_rule_actions_saved_object.test.ts (4315 MB heap size)                                                          
 PASS  x-pack/plugins/security_solution/server/lib/detection_engine/rule_actions/legacy_get_rule_actions_saved_object.test.ts (4336 MB heap size)                                                               
 PASS  x-pack/plugins/security_solution/server/search_strategy/security_solution/factory/hosts/authentications/helpers.test.ts (4342 MB heap size)
...
 PASS  x-pack/plugins/security_solution/server/endpoint/services/artifacts/artifact_client.test.ts (21.214 s, 5667 MB heap size)                                                                                
 PASS  x-pack/plugins/security_solution/server/endpoint/services/metadata/endpoint_metadata_service.test.ts (7.851 s, 5784 MB heap size)
 PASS  x-pack/plugins/security_solution/server/endpoint/routes/trusted_apps/service.test.ts (44.125 s, 5850 MB heap size)
 PASS  x-pack/plugins/security_solution/common/endpoint/service/trusted_apps/validations.test.ts (27.839 s, 5780 MB heap size)

 RUNS  x-pack/plugins/security_solution/server/endpoint/lib/policy/license_watch.test.ts

<--- Last few GCs --->

[3668992:0x5fd7290]  5117861 ms: Mark-sweep (reduce) 5809.3 (7146.1) -> 5808.9 (7100.8) MB, 4519.9 / 0.0 ms  (average mu = 0.007, current mu = 0.005) last resort GC in old space requested
[3668992:0x5fd7290]  5122451 ms: Mark-sweep (reduce) 5808.9 (7100.8) -> 5808.9 (7087.6) MB, 4590.0 / 0.0 ms  (average mu = 0.003, current mu = 0.000) last resort GC in old space requested
@tylersmalley tylersmalley added bug Fixes for quality problems that affect the customer experience Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. labels Nov 3, 2021
@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@kevinlog
Contributor

kevinlog commented Nov 3, 2021

@paul-tavares - can you take a look at the above from our team's perspective? A few files from the OLM team show up here.

 PASS  x-pack/plugins/security_solution/server/endpoint/services/artifacts/artifact_client.test.ts (21.214 s, 5667 MB heap size)                                                                                
 PASS  x-pack/plugins/security_solution/server/endpoint/services/metadata/endpoint_metadata_service.test.ts (7.851 s, 5784 MB heap size)
 PASS  x-pack/plugins/security_solution/server/endpoint/routes/trusted_apps/service.test.ts (44.125 s, 5850 MB heap size)
 PASS  x-pack/plugins/security_solution/common/endpoint/service/trusted_apps/validations.test.ts (27.839 s, 5780 MB heap size)

@paul-tavares
Contributor

@kevinlog Sorry I missed this. I have added this to our board and also added my name to the assignee list. I will take a look

@FrankHassanabad
Contributor

FrankHassanabad commented Nov 5, 2021

@paul-tavares, @rylnd, @tylersmalley in case it helps, I spent the afternoon reading through possible reasons for the memory leak. There are a lot of them.

However, I think some test files leak and some do not.

I made this small utility script, which copies a targeted test file multiple times and then runs all the copies, to see whether that one particular test leaks. It lets me target specific files after narrowing down to a particular directory:

#!/bin/bash

# Set your kibana home here
KIBANA_HOME=~/projects/kibana
MAX_MEMORY=800

# Usage if no arguments
if [ $# -eq 0 ]; then
  echo "Usage is ./copy_tests.sh <test name>"
  exit 1
fi

# Check if file exists or not
if [[ ! -f $1 ]]; then
  echo "File does not exist"
  exit 1
fi

# Gets the directory of your first argument
DIR="$(dirname "${1}")"

# Loops max times and copy the single test in place
for i in {1..100}
do
  cp "$1" "$DIR/script_copy_test_$i.test.ts"
done

pushd $KIBANA_HOME;
node --max-old-space-size=$MAX_MEMORY --expose-gc ./node_modules/.bin/jest --runInBand --logHeapUsage --detectOpenHandles --no-cache --config x-pack/plugins/security_solution/jest.config.js $DIR/script_copy_test_*.ts;
popd;

# Remove all the copied scripts
rm $DIR/script_copy_test_*.test.ts

For example, I think a lot of tests in the signals folder have leaks, like build_event_type_signal.test.ts:

./copy_tests.sh ~/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/signals/build_event_type_signal.test.ts

But if I run this instead against create_migration.test.ts I don't see leaks:

./copy_tests.sh ~/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/migrations/create_migration.test.ts

You can add --inspect-brk to the script above and then use Chrome dev tools to examine a single test's leak. It's probably easier to run the loops over copies of the same file than to hunt for a leak across different test files.
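As an alternative to attaching Chrome dev tools, a heap snapshot can also be written from inside the test run and opened later in the DevTools Memory tab; a rough sketch (hook placement and logging are illustrative, not part of the script above):

// Assumes the suite runs with --expose-gc, as in the jest invocation above.
import { writeHeapSnapshot } from 'v8';

afterAll(() => {
  const maybeGc = (globalThis as { gc?: () => void }).gc; // defined only with --expose-gc
  if (maybeGc) {
    maybeGc(); // collect first, so the snapshot only contains what truly survives
  }
  const file = writeHeapSnapshot(); // with no argument, writes a generated Heap-*.heapsnapshot in the cwd
  // eslint-disable-next-line no-console
  console.log(`heap snapshot written to ${file}`);
});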

@paul-tavares paul-tavares added OLM Sprint Team:Defend Workflows “EDR Workflows” sub-team of Security Solution labels Nov 8, 2021
@elasticmachine
Contributor

Pinging @elastic/security-onboarding-and-lifecycle-mgt (Team:Onboarding and Lifecycle Mgt)

@paul-tavares paul-tavares assigned academo and unassigned paul-tavares Nov 8, 2021
@paul-tavares
Contributor

FYI - @academo was already looking into some issues with Jest last week, and after speaking with him, he will continue on this issue and follow through on the memory leaks.

@FrankHassanabad
Contributor

FrankHassanabad commented Nov 8, 2021

I've updated my script above with the gist version now.

I isolated one memory leak coming from alerting. Take this simple gist file: sample_example.ts

You can put that file in x-pack/plugins/alerting/server/, run it 100+ times, and it will blow up the memory. Looking at it a bit, it seems that somewhere we are not cleaning up memory: the modules keep getting loaded again and again.

For security_solution, as a way to fix this within files such as here:

where we import from the top-level alerting plugin, we can easily change that code to:

import type {
  AlertInstanceContext,
  AlertInstanceState,
  AlertServices,
} from '../../../../../alerting/server';

// Import this not from above or you will get memory leaks from Jest tests
import { parseDuration } from '../../../../../alerting/common/parse_duration';

Then we get type erasure, since we're using import type instead of import, and we're pulling in parseDuration directly from common, which doesn't appear to leak memory.

Performance is also hindered, and it is directly related to the sheer number of modules we pull in with our tests. Our tests will do better memory- and performance-wise if we minimize the imports. But... I can't say why this is leaking memory on the alerting side, and if you look at even well-behaved heaps, we are pulling ReactJS and other modules into memory even for server-side tests, and we do it 1 to 10 times even for non-leaking tests. I even used the garbage-collection tricks and lowered my memory limit to confirm that in a lot of cases we are loading a lot of memory and resetting it between test runs, which is a big performance impact.

For example, with sample_example.ts, if you time it (with a regular stopwatch, not counting initial startup time), you will see a 50% performance improvement if you avoid pulling parseDuration in from the top-level alerting index.

Here are well-behaved runs of sample_example.ts: even when it doesn't leak, it still has a lot of modules loaded in memory that cannot be garbage collected, and we're sitting at ~200 MB:

Stable when it first starts:
[screenshot: heap snapshot]

Stable after a while of running:
[screenshot: heap snapshot]

Here, when sample_example.ts is not well behaved, it keeps loading the modules into memory:

First heap snapshot, already large:
[screenshot: heap snapshot]

After a while, the heap snapshot grows even though I invoke the garbage collector (modules cannot be garbage collected):
[screenshot: heap snapshot]

@pmuellr
Member

pmuellr commented Nov 8, 2021

I just happened to see this issue, so thought I'd mention a leak we just noticed. We noticed it doing some alerting stress testing in the 7.16.0 BC's - #116636. Caused by a new feature in node v16 in the perf_hooks function performance.mark() (marks never get cleaned up automagically). It took ~10hrs under pretty heavy load to OOM, so I thought of it as a fairly slow leak, and I'm guessing it's not the cause here, but thought I'd mention it in case folks weren't already aware.

@pmuellr
Member

pmuellr commented Nov 8, 2021

What happens if you don't use --runInBand when you run the tests? My understanding is that the option basically runs tests in the same "vm", whereas if you don't use it, they're more "separate". I use --runInBand when debugging tests with vscode, but I don't think I use it with other jest invocations.

I think that, combined with all the jest.resetModules() calls, is going to cause a large swath of modules to be reloaded in the same runtime - probably a lot of alerting code, and that code pulling in other plugin/core code. If I had to guess, we're OOMing on code! And it sounds like from ^^^ that there is a lot of code in memory.

This doesn't seem like something alerting has done - explicitly or implicitly - to cause this, but obviously we want to know if it is. Is there something specific we should be looking at? Maybe we need to organize our imports better to allow this kind of usage? For instance, is it that accessing the parseDuration function directly from common means fewer modules are being pulled in, compared to accessing it from server, and we should do more of that?
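For what it's worth, the reload effect described above is easy to see in isolation; a minimal sketch (the required module path is hypothetical):

describe('jest.resetModules() reloads the module graph', () => {
  it('returns a fresh module instance after a reset', () => {
    const first = require('./some_module'); // hypothetical module
    jest.resetModules(); // drops the module registry for this test file
    const second = require('./some_module'); // re-evaluates the module and its transitive imports
    expect(second).not.toBe(first);
  });
});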

@FrankHassanabad
Contributor

FrankHassanabad commented Nov 9, 2021

Is there something specific we should be looking at?

I would look at alerting/server/index.ts and what happens when you run tests that import from there. You can duplicate the sample.test.ts that accesses it, either manually or with the script above.

Everywhere you can, use import type rather than import to encourage more TypeScript erasure, such as here:
https://github.com/elastic/kibana/blob/main/x-pack/plugins/actions/server/actions_client.mock.ts#L9

That would help reduce, and maybe even remove, the issues; if nothing else, we could tell by looking at a file whether it is pulling runtime imports from another plugin or just using its types.
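A hedged before/after sketch of that suggestion (the relative path and import depth are illustrative, not the actual contents of actions_client.mock.ts):

// Before: a value import compiles to require('../../../actions/server'), pulling the
// whole barrel and everything it re-exports into the test's module registry.
// import { ActionsClient } from '../../../actions/server';

// After: a type-only import is erased at compile time, so nothing is required at
// runtime and the barrel's transitive imports never load in the test.
import type { ActionsClient } from '../../../actions/server';

export type ActionsClientMock = jest.Mocked<ActionsClient>;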

is it that accessing the parseDuration function directly from common means fewer modules are being pulled in, compared to accessing it from server, and we should do more of that?

Yes, a lot fewer modules, and the memory leak doesn't show up.

What happens if you don't use --runInBand

For debugging memory issues I think everyone recommends running in band. You can remove the jest.resetModules() if you want and change around the sample.test.ts, but if you run that sample around 10-30 times with memory set to around 500 megs, it always seems to blow up (again, I don't know yet why).

Even starting up it consumes a lot of memory.

@FrankHassanabad
Contributor

FrankHassanabad commented Nov 9, 2021

Here's a second one, which I think I found when importing x-pack/plugins/actions/server/mocks.ts:
https://gist.github.com/FrankHassanabad/68cef0d806f8cd9b2af50fa815af92c1

I think I can patch security_solution with the alternative imports. For this one instead of importing:

import { actionsClientMock } from './mocks';

I am going to import like so directly:

import { actionsClientMock } from './actions_client.mock';

And early results show that it doesn't leak.

@tylersmalley
Contributor Author

tylersmalley commented Nov 9, 2021

@pmuellr, --runInBand makes the tests run serially, instead of running in workers. You can see OOM's in the current Jest tests for any CI run, as the Jest worker is automatically restarted and picks up where it left off. Running with workers is resulting in a lot of instability, which runInBand should resolve.

@academo
Contributor

academo commented Nov 9, 2021

I was checking this test in particular, x-pack/plugins/security_solution/server/endpoint/services/artifacts/artifact_client.test.ts, using @FrankHassanabad's script and Chrome memory snapshots.

Doing a kind of bisect by commenting out and re-enabling parts of the test here and there, I found that the memory leak only happens when this imported method is used:

import { getInternalArtifactMock } from '../../schemas/artifacts/saved_objects.mock';
// ... code code code
const artifact = await getInternalArtifactMock('linux', 'v1'); // <-- this creates a memory leak.

I have to comment out or remove the code from the test (skip is not enough) to stop the leak. It is also this specific line that causes the leak in this specific test: I removed all other lines, left the test with only this one, and it still happened.

Thinking that this specific mock code was creating the memory leak, I went to the source and replaced it so it returned a simple object instead of the regular code, but it kept leaking. I then replaced the whole file with only an export of this method that merely returned an object, and there was no memory leak.

Doing more bisecting here and there in that specific method, I found the leak only occurred when you import this module:

import { ArtifactConstants } from '../../lib/artifacts';

That module itself exports an object of strings (not an enum).

@FrankHassanabad
Contributor

FrankHassanabad commented Nov 9, 2021

It's hard to see where, but if you trace through, you eventually find the same curious case: if your test, either directly or transitively, imports a function through the pattern:

x-pack/plugins/${PLUGIN_NAME}/server/index.ts 

It will do two things:

  • Take at least 2x longer to load compared to other tests that do not import from this pattern
  • Take 2x-10x more memory and also leak memory.

Here is a small test case which shows us leaking memory when we import from:

x-pack/plugins/fleet/server/index.ts 
// Put this file at "x-pack/plugins/fleet/server" and then modify "copy_tests.sh"
// to have the setting "KIBANA_PROJECT=x-pack/plugins/fleet/jest.config.js"
// And then run it:
// ./copy_tests.sh ~/projects/kibana/x-pack/plugins/fleet/server/test_sample.test.ts
// And you will see the memory leak unless you use the import outside of the pattern
// "x-pack/plugins/fleet/server"
// Any function we import directly from ${PLUGIN_NAME}/server seems to start the memory leak.

// This will be a memory leak if we import from "x-pack/plugins/fleet/server"
// Comment this out and uncomment the line underneath it to remove the memory leak
import { relativeDownloadUrlFromArtifact } from '.';

// This will not be a memory leak if we import from within the mappings folder
// import { relativeDownloadUrlFromArtifact } from './services/artifacts/mappings';

describe('memory leak', () => {
  test('it has a memory leak by just iterating over an import', () => {
    // You just have to ensure you're touching the import, not actually invoke it to see the leak
    Object.keys(relativeDownloadUrlFromArtifact);
  });
});

Notice that I'm not running the function; I'm just touching it to ensure jest knows to load it. You could call the function with the same effect, but this makes it clear that it's something with the imports and not specific code. You can change your jest settings, add cleanup routines, and expose the garbage collector from NodeJS and run it like I did earlier, but it leaks the same.

Once you import the file directly from the underlying x-pack/server/services/artifacts/mappings, the leak stops. But at this point I don't think we should be encouraging people to import from what are, in some cases, restricted zones, or telling them they cannot import directly from the server? I haven't dug into what the special difference is just yet.

// This will not be a memory leak if we import from within the mappings folder
import { relativeDownloadUrlFromArtifact } from './services/artifacts/mappings';

@lukeelmers
Member

Can confirm I was able to reproduce the memory leak following Frank's instructions. I captured a few heap snapshots and spent a while digging through them, but was not able to find anything conclusive (though my analysis was certainly not exhaustive).

In particular I am interested in Frank's theory about whether there's something going on pulling in a lot of things from bootstrap.ts and server/root/index.ts -- perhaps we could test a few more scenarios around this. cc @elastic/kibana-core

@academo
Contributor

academo commented Nov 16, 2021

@lukeelmers I did code bisecting yesterday with @FrankHassanabad's script and narrowed down the problem to two pieces of code:

file1: src/core/server/bootstrap.ts:

const root = new Root(rawConfigService, env, onRootShutdown);

file2: src/core/server/index.ts logging object:

export const config = {
  elasticsearch: {
    schema: elasticsearchConfigSchema,
  },
  // this needs commenting
  //logging: {
   // appenders: appendersSchema as Type<AppenderConfigType>,
  //},
};

Removing both of those pieces of code stops all memory leaking; removing only one of them has no effect.

I didn't investigate the logging or Root objects further, but this might give you a clue where to start looking.

Note that merely commenting out the code is enough, even if doing so creates type or syntax errors.

@lukeelmers
Member

Thanks @academo! This is super helpful. I'll go ahead and label this issue for the core team to investigate further. We'll keep you posted.

@lukeelmers lukeelmers added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Nov 16, 2021
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Nov 17, 2021
…600 megs (elastic#118734)

## Summary

This addresses parts of elastic#117255

By introducing top level mocks for:
* `core/server/index.ts`
* `task_manager/server/index.ts`
* `alerting/server/index.ts`
* `actions/server/index.ts`

These top level mocks add the few required functions we use sparingly and adds them from the "restricted zones" to avoid giant typescript imports from happening from the server side which also pulls in the memory leaks. 

```ts
moduleNameMapper: {
    'core/server$': '<rootDir>/x-pack/plugins/security_solution/server/__mocks__/core.mock.ts',
    'task_manager/server$':
      '<rootDir>/x-pack/plugins/security_solution/server/__mocks__/task_manager.mock.ts',
    'alerting/server$': '<rootDir>/x-pack/plugins/security_solution/server/__mocks__/alert.mock.ts',
    'actions/server$': '<rootDir>/x-pack/plugins/security_solution/server/__mocks__/action.mock.ts',
  },
```

For testing this you can now run:
```sh
node --max-old-space-size=600 --expose-gc ./node_modules/.bin/jest --runInBand --logHeapUsage --detectOpenHandles --no-cache --config x-pack/plugins/security_solution/jest.config.dev.js x-pack/plugins/security_solution/server
```

And the server side tests will be able to complete in less than 600 megs of memory. The memory leaks and memory consumption issues are mitigated through the layers but this doesn't guarantee that in the future these won't show up again. The root of the issue(s) with the memory leaks from `core/server` aren't addressed here as those are separate concerns at this point but this at least mitigates the amount of leakage from our side for now.

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
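For illustration, one of the moduleNameMapper targets above might look roughly like this; this is a hypothetical sketch only, the real mock contents live in #118734:

// x-pack/plugins/security_solution/server/__mocks__/core.mock.ts (sketch)
// Re-export only the handful of values these server tests actually need from
// 'src/core/server', so requiring the barrel no longer loads its whole module
// graph (or the leak that comes with it).
export const ServiceStatusLevels = {
  available: 'available',
  degraded: 'degraded',
  unavailable: 'unavailable',
  critical: 'critical',
};

export const DEFAULT_APP_CATEGORIES = {};
export const config = {};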
kibanamachine added a commit that referenced this issue Nov 17, 2021
…600 megs (#118734) (#118930)


Co-authored-by: Frank Hassanabad <[email protected]>
@pgayvallet
Contributor

@FrankHassanabad I see you merged #118734. Was that blocking you for something? As all test suites are impacted, I wonder if that kind of local patching / workaround is really a good idea?

Also FWIW, this issue is planned for our next sprint (which starts next week)

TinLe pushed a commit to TinLe/kibana that referenced this issue Nov 20, 2021
…600 megs (elastic#118734)

@pgayvallet
Contributor

Gonna take some time to take a second look at this.

Thanks @FrankHassanabad for the script btw, helps a lot.

A remark though: in the provided sample test (the one importing parseDuration either from . or from ../common/parse_duration), I observe that even with the deep import the heap does increase, just far less than when importing from ".".

 PASS  x-pack/plugins/alerting/server/script_copy_test_17.test.ts (77 MB heap size)
  memory leak as soon as we pull in and use "parseDuration" from "alerting
    ✓ parseDuration (63 ms)

 PASS  x-pack/plugins/alerting/server/script_copy_test_65.test.ts (93 MB heap size)
  memory leak as soon as we pull in and use "parseDuration" from "alerting
    ✓ parseDuration (71 ms)

 PASS  x-pack/plugins/alerting/server/script_copy_test_75.test.ts (105 MB heap size)
  memory leak as soon as we pull in and use "parseDuration" from "alerting
    ✓ parseDuration (74 ms)

 PASS  x-pack/plugins/alerting/server/script_copy_test_91.test.ts (116 MB heap size)
  memory leak as soon as we pull in and use "parseDuration" from "alerting
    ✓ parseDuration (78 ms)

Even an empty test, without any import, does trigger a heap increase

 PASS  x-pack/plugins/alerting/server/script_copy_test_17.test.ts (76 MB heap size)
  memory leak as soon as we pull in and use "parseDuration" from "alerting
    ✓ parseDuration (64 ms)

 PASS  x-pack/plugins/alerting/server/script_copy_test_75.test.ts (93 MB heap size)
  memory leak as soon as we pull in and use "parseDuration" from "alerting
    ✓ parseDuration (72 ms)

 PASS  x-pack/plugins/alerting/server/script_copy_test_65.test.ts (104 MB heap size)
  memory leak as soon as we pull in and use "parseDuration" from "alerting
    ✓ parseDuration (66 ms)

 PASS  x-pack/plugins/alerting/server/script_copy_test_81.test.ts (115 MB heap size)
  memory leak as soon as we pull in and use "parseDuration" from "alerting
    ✓ parseDuration (70 ms)

A 10 MB increase per file seems enormous for just an empty test file?

@pgayvallet
Contributor

Okay, never mind, that's the time jest takes to start all the parallel runners in the process. Once all runners are up, the memory stabilizes when running the empty test.

@pgayvallet
Contributor

I spent a few hours and didn't really find anything relevant, but here are some observations:

Memory consumption on an empty test

Okay, so... The first thing I don't get is that the memory consumption is different depending on whether the debugger is attached or not (i.e. whether --inspect-brk is used).

Using a plain empty test:

// no imports at all 

describe("I've no idea what I'm doing", () => {
  beforeEach(() => {
    jest.clearAllMocks();
    jest.resetAllMocks();
    jest.restoreAllMocks();
    jest.resetModules();
    jest.clearAllMocks();
  });

  afterEach(() => {
    jest.clearAllMocks();
    jest.resetAllMocks();
    jest.restoreAllMocks();
    jest.resetModules();
  });

  test('parseDuration', () => {
    if (global.gc) {
      // console.log('I am able to GC...');
      global.gc();
    } else {
      // console.log('I cannot GC...');
    }
    expect(true).toEqual(true);
  });
});

Running the script with CHROME_INSPECT=true (so with the --inspect-brk parameter), the memory takes some time to stabilize (not even sure why, I thought running in band was not doing any kind of parallelizing), but it does stabilize.

Pierres-MBP-2:kibana pgayvallet$ ./memory_leak.sh x-pack/plugins/alerting/server/sample_test.ts
~/DEV/workspaces/elastic/kibana ~/DEV/workspaces/elastic/kibana
Debugger listening on ws://127.0.0.1:9229/8d3a6d68-43da-44ff-b5a5-d6797db416fb
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.
 PASS  x-pack/plugins/alerting/server/script_copy_test_17.test.ts (79 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_75.test.ts (93 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_91.test.ts (102 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_65.test.ts (110 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_81.test.ts (119 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_28.test.ts (127 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_38.test.ts (136 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_5.test.ts (144 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_53.test.ts (153 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_43.test.ts (160 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_88.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_98.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_31.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_21.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_42.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_52.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_20.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_30.test.ts (161 MB heap size)

Now, running the script with CHROME_INSPECT=false (so without the --inspect-brk parameter), the heap memory grows

Pierres-MBP-2:kibana pgayvallet$ ./memory_leak.sh x-pack/plugins/alerting/server/sample_test.ts
~/DEV/workspaces/elastic/kibana ~/DEV/workspaces/elastic/kibana
 PASS  x-pack/plugins/alerting/server/script_copy_test_17.test.ts (77 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_75.test.ts (93 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_65.test.ts (105 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_91.test.ts (116 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_81.test.ts (127 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_28.test.ts (138 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_38.test.ts (149 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_5.test.ts (160 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_53.test.ts (172 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_43.test.ts (183 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_88.test.ts (195 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_98.test.ts (206 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_31.test.ts (217 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_21.test.ts (228 MB heap size)

Memory leak and core import

Now, the second thing is, I can't reproduce the memory not leaking when commenting out the lines in core as specified in #117255 (comment)

With this test

// `x-pack/plugins/alerting/server/sample_test.ts`

import { parseDuration } from '.';

describe('memory leak as soon as we pull in and use "parseDuration" from "alerting', () => {
  beforeEach(() => {
    jest.clearAllMocks();
    jest.resetAllMocks();
    jest.restoreAllMocks();
    jest.resetModules();
    jest.clearAllMocks();
  });

  afterEach(() => {
    jest.clearAllMocks();
    jest.resetAllMocks();
    jest.restoreAllMocks();
    jest.resetModules();
  });

  test('parseDuration', () => {
    if (global.gc) {
      // console.log('I am able to GC...');
      global.gc();
    } else {
      // console.log('I cannot GC...');
    }
    parseDuration('5s');
    expect(true).toEqual(true);
  });
});

and these patches (the ones suggested as solving the leak in #117255 (comment)):

Index: src/core/server/bootstrap.ts
===================================================================
diff --git a/src/core/server/bootstrap.ts b/src/core/server/bootstrap.ts
--- a/src/core/server/bootstrap.ts	(revision 14ed0cb89952fd6ff904f723b2e0a435961a5aa2)
+++ b/src/core/server/bootstrap.ts	(date 1637839868421)
@@ -8,7 +8,7 @@
 
 import chalk from 'chalk';
 import { CliArgs, Env, RawConfigService } from './config';
-import { Root } from './root';
+// import { Root } from './root';
 import { CriticalError } from './errors';
 
 interface BootstrapArgs {
@@ -44,7 +44,7 @@
   const rawConfigService = new RawConfigService(env.configs, applyConfigOverrides);
   rawConfigService.loadConfig();
 
-  const root = new Root(rawConfigService, env, onRootShutdown);
+  // const root = new Root(rawConfigService, env, onRootShutdown);
 
   process.on('SIGHUP', () => reloadConfiguration());
Index: src/core/server/index.ts
===================================================================
diff --git a/src/core/server/index.ts b/src/core/server/index.ts
--- a/src/core/server/index.ts	(revision 14ed0cb89952fd6ff904f723b2e0a435961a5aa2)
+++ b/src/core/server/index.ts	(date 1637839900052)
@@ -54,7 +54,7 @@
 import { CapabilitiesSetup, CapabilitiesStart } from './capabilities';
 import { MetricsServiceSetup, MetricsServiceStart } from './metrics';
 import { StatusServiceSetup } from './status';
-import { AppenderConfigType, appendersSchema, LoggingServiceSetup } from './logging';
+// import { AppenderConfigType, appendersSchema, LoggingServiceSetup } from './logging';
 import { CoreUsageDataStart, CoreUsageDataSetup } from './core_usage_data';
 import { I18nServiceSetup } from './i18n';
 import { DeprecationsServiceSetup, DeprecationsClient } from './deprecations';
@@ -579,6 +579,6 @@
     schema: elasticsearchConfigSchema,
   },
   logging: {
-    appenders: appendersSchema as Type<AppenderConfigType>,
+    // appenders: appendersSchema as Type<AppenderConfigType>,
   },
 };

With CHROME_INSPECT=false, and commented code in core

Pierres-MBP-2:kibana pgayvallet$ ./memory_leak.sh x-pack/plugins/alerting/server/sample_test.ts
~/DEV/workspaces/elastic/kibana ~/DEV/workspaces/elastic/kibana
 PASS  x-pack/plugins/alerting/server/script_copy_test_75.test.ts (37.586 s, 175 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_17.test.ts (255 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_65.test.ts (310 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_91.test.ts (371 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_81.test.ts (431 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_28.test.ts (492 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_38.test.ts (552 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_5.test.ts (613 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_53.test.ts (673 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_43.test.ts (734 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_88.test.ts (796 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_98.test.ts (856 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_31.test.ts (916 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_21.test.ts (977 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_42.test.ts (1037 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_52.test.ts (1097 MB heap size)

With CHROME_INSPECT=true, and commented code in core

Pierres-MBP-2:kibana pgayvallet$ ./memory_leak.sh x-pack/plugins/alerting/server/sample_test.ts
~/DEV/workspaces/elastic/kibana ~/DEV/workspaces/elastic/kibana
Debugger listening on ws://127.0.0.1:9229/cbc45010-c80d-4d83-b5b9-b826e22bc294
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.
 PASS  x-pack/plugins/alerting/server/script_copy_test_17.test.ts (76.659 s, 183 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_75.test.ts (5.984 s, 258 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_65.test.ts (310 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_91.test.ts (367 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_81.test.ts (423 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_28.test.ts (5.097 s, 480 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_38.test.ts (536 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_5.test.ts (593 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_53.test.ts (649 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_43.test.ts (706 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_88.test.ts (763 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_98.test.ts (820 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_31.test.ts (876 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_21.test.ts (933 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_42.test.ts (989 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_52.test.ts (5.418 s, 1045 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_20.test.ts (5.384 s, 1102 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_30.test.ts (5.585 s, 1159 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_99.test.ts (5.622 s, 1215 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_89.test.ts (5.229 s, 1271 MB heap size)

Even when being very aggressive with the cuts from src/core/server/index:

// src/core/server/index.ts
export { DEFAULT_APP_CATEGORIES, APP_WRAPPER_CLASS } from '../utils';
export { ServiceStatusLevels } from './status/types';
export const config = {};
// yup, that's all

I observe the same leak (here with CHROME_INSPECT=true)

Pierres-MBP-2:kibana pgayvallet$ ./memory_leak.sh x-pack/plugins/alerting/server/sample_test.ts
~/DEV/workspaces/elastic/kibana ~/DEV/workspaces/elastic/kibana
Debugger listening on ws://127.0.0.1:9229/aa5031c2-8750-4f03-a54f-91ebe957dccc
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.
 PASS  x-pack/plugins/alerting/server/script_copy_test_17.test.ts (53.18 s, 154 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_75.test.ts (214 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_65.test.ts (255 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_91.test.ts (300 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_81.test.ts (344 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_28.test.ts (389 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_38.test.ts (434 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_5.test.ts (478 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_53.test.ts (523 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_43.test.ts (568 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_88.test.ts (613 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_98.test.ts (658 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_31.test.ts (703 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_21.test.ts (747 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_42.test.ts (792 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_52.test.ts (837 MB heap size)

But I do confirm that using the direct import for parseDuration solves the leak:

-- import { parseDuration } from '.';
++ import { parseDuration } from '../common/parse_duration';

gives the following output

Pierres-MBP-2:kibana pgayvallet$ ./memory_leak.sh x-pack/plugins/alerting/server/sample_test.ts
~/DEV/workspaces/elastic/kibana ~/DEV/workspaces/elastic/kibana
Debugger listening on ws://127.0.0.1:9229/6d7d055a-172c-40c0-83ef-271ba8821f8d
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.
 PASS  x-pack/plugins/alerting/server/script_copy_test_17.test.ts (79 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_65.test.ts (93 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_75.test.ts (102 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_91.test.ts (110 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_81.test.ts (119 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_28.test.ts (127 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_38.test.ts (136 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_5.test.ts (144 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_53.test.ts (153 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_43.test.ts (160 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_88.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_98.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_31.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_21.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_42.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_52.test.ts (161 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_20.test.ts (162 MB heap size)

Importing directly from src/core/server from the test

I then tried to adapt the test to import something directly from src/core/server/index.

With all files reverted to main's state, and then importing only from src/core/server from the test

// x-pack/plugins/alerting/server/sample_test.ts
-- import { parseDuration } from '.';
++ import '../../../../src/core/server';

When --inspect-brk is not provided, I observe a leak of +- 30mb per run.

Now, the best part, when adding --inspect-brk and attaching, the process terminates with a segfault...

~/DEV/workspaces/elastic/kibana ~/DEV/workspaces/elastic/kibana
Debugger listening on ws://127.0.0.1:9229/a764d1c8-a626-4aba-942f-f0f6f843ef4d
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.
 PASS  x-pack/plugins/alerting/server/script_copy_test_17.test.ts (32.868 s, 131 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_75.test.ts (172 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_65.test.ts (195 MB heap size)
 PASS  x-pack/plugins/alerting/server/script_copy_test_91.test.ts (196 MB heap size)
 RUNS  x-pack/plugins/alerting/server/script_copy_test_81.test.ts
./memory_leak.sh: line 54: 70673 Segmentation fault: 11  node --max-old-space-size=$MAX_MEMORY --inspect-brk --expose-gc ./node_modules/.bin/jest --verbose --runInBand --logHeapUsage --detectOpenHandles --no-cache --config $KIBANA_PROJECT $DIR/script_copy_test_*.ts

If I change the import to

import '../../../../src/core/server/elasticsearch/client';

It's the same thing: when the debugger is not attached, I observe a 23mb/run leak. But when the debugger is attached, I get a segfault after a few runs.

I can't explain the fact that the segfault does not occur when importing indirectly (e.g. when the test uses import { parseDuration } from '.';)...

Observation on the memory usage

Going back to the initial state of the code and the test (the test using import { parseDuration } from '.';),

First thing is, if I trust https://chanind.github.io/javascript/2019/10/12/jest-tests-memory-leak.html, in case of a leak caused by monkey patching, the growing memory is caused by the objects that can't be collected, which are in the JSArrayBufferData or array category of the memory.

However, in a memory dump of 1200 MB, 900 MB of the memory used is in the (string) category.

For example, the react-dom.development.js file's source (taking 2m) is present 25 times (which is the number of times the test ran before the memory snapshot).

[screenshot: heap snapshot]

Same for esprima.js

[screenshot: heap snapshot]

And a lot of others, such as enzyme, moment or even internal source files.

In summary

  • I'm really not sure the leak is really coming (only) from core
  • I don't understand why most of the memory is used by sources when it's not what is described by https://chanind.github.io/javascript/2019/10/12/jest-tests-memory-leak.html
  • not sure why I encountered segfaults in some cases; maybe it's just the debugger interaction? Maybe it's something more?
  • I have no idea what I'm doing

@pgayvallet
Contributor

pgayvallet commented Nov 25, 2021

Okay. I've been doing all my tests with --max-old-space-size=2000, and with the recent changes in node's GC (and the fact that apparently global.gc does nothing, at least in tests), this was causing the memory to increase because node doesn't bother GC'ing until memory is around 50+% of the max allocation (when hooked to the debugger, the debugger forces GCs before snapshots, which explains the difference; at least that's my assumption). By using 300 or 500 as max-old-space-size, everything is way better, as GCs are performed during the batch of tests.
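One rough way to separate deferred GC from a real leak is to force a collection and log the used heap against V8's configured limit after each test; a sketch assuming --expose-gc, as in the commands earlier in this issue:

import { getHeapStatistics } from 'v8';

afterEach(() => {
  const maybeGc = (globalThis as { gc?: () => void }).gc; // only defined with --expose-gc
  if (maybeGc) {
    maybeGc();
  }
  const usedMb = Math.round(process.memoryUsage().heapUsed / 1024 / 1024);
  const limitMb = Math.round(getHeapStatistics().heap_size_limit / 1024 / 1024);
  // Numbers that keep climbing after forced GCs point at a real leak; growth that
  // only shows up without GCs is just V8 postponing collection under a large limit.
  // eslint-disable-next-line no-console
  console.log(`heap used after GC: ${usedMb} MB (limit ${limitMb} MB)`);
});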

Now, I went back to my tests (all tests include an Object.keys(importedStuff) call to be sure that the import is effective):

  • with import { parseDuration } from '.' leaks 🔴
  • with import { parseDuration } from '.' with src/core/server/index.ts being empty leaks 🔴
  • with import { parseDuration } from '../common/parse_duration'; passes 🟢
  • with import { config } from '../../../../src/core/server' passes 🟢 (with or without the changes in [securitySolution] Jest memory leaks #117255 (comment))
  • with deep import { appendersSchema } from '../../../../src/core/server/logging'; passes 🟢

FWIW, this is the empty src/core/server/index I used:

export const ServiceStatusLevels = { available: '', degraded: '', unavailable: '', critical: '' };
export const config = {};
export const APP_WRAPPER_CLASS = '';
export const DEFAULT_APP_CATEGORIES = {};

So unless someone can come up with a scenario where importing indirectly from core causes a leak when importing directly doesn't, plus the fact that the leak occurs when src/core/server doesn't import/export anything, I think we can exclude core as the source of the leak?

@academo
Contributor

academo commented Nov 29, 2021

@pgayvallet thanks for taking the time and running the tests. Did you check the part I reported about the Root class and the logging?

dmlemeshko pushed a commit that referenced this issue Nov 29, 2021
…600 megs (#118734)

@FrankHassanabad
Contributor

FrankHassanabad commented Nov 30, 2021

@FrankHassanabad I see you merged #118734. Was that blocking you for something? As all test suites are impacted, I wonder if that kind of local patching / workaround is really a good idea?

@pgayvallet That's only for tests within security_solutions. Engineers were complaining that they couldn't run the tests without using a lot of memory, and it was also to ensure that if someone from operations wanted to re-combine the tests, they could.

That PR does reduce the leaks. I can remove it once the leaks are fixed. One thing of note is that the performance of the tests improved too since we cut away those parts (as an aside).

@FrankHassanabad
Contributor

FrankHassanabad commented Dec 1, 2021

++ to @pgayvallet running our tests... 🙏 thank you for looking at this.

So unless someone can come with a scenario where importing indirectly from core would cause a leak when importing directly doesn't, plus the fact that the leak occurs when src/core/server doesn't import/export anything, I think we can exclude core as being the source of the leak?

@pgayvallet, I think maybe a big pro of what I did with #118734 is that I can test whether a particular plugin such as core/server no longer has a memory leak by removing the line that mocks the imports pulling in that code.

For example, in this draft PR I removed the core/server mock from all the server-side tests:
https://github.com/elastic/kibana/pull/120051/files

and then if I run this command:

node --max-old-space-size=1000 ./node_modules/.bin/jest --runInBand --logHeapUsage --no-cache --config x-pack/plugins/security_solution/jest.config.dev.js x-pack/plugins/security_solution/server

With 1000 MB of memory it still leaks enough to blow up with an OOM error, whereas if I keep the core/server mock I can run the tests at 600, which still seems to point to a memory leak coming from code that imports from core/server.

@FrankHassanabad
Contributor

FrankHassanabad commented Dec 1, 2021

In addition to my advice above with the draft PR, you can re-use the ./copy_tests.sh on individual tests within core/server to find ones that have leaks. For example I found that when I use the ./copy_tests.sh file with these settings:

KIBANA_PROJECT=src/core/jest.config.js
MAX_MEMORY=220

and run it like so:

./copy_tests.sh ~/projects/kibana/src/core/server/server.test.ts

I see it blow up with an OOM after ~32 iterations, which seems to indicate that at least that test, which is isolated to core/server, has a memory leak. I'm not saying there aren't other leaks in the other plugins, but it seems like something is going on within Kibana core (or an interaction between its tests and jest).

You can change the iterations and the memory size to see that the test appears to be leaking memory. For example, I changed the loop to 400 iterations, like so: for i in {1..400}, then increased MAX_MEMORY=300, and I see it blow up at around iteration ~106.

@pgayvallet
Contributor

pgayvallet commented Dec 1, 2021

did you check the part I reported about the Root class and the logging?

Yea, as described in #117255 (comment), I performed a test in the alerting plugin importing parseDuration, both with a deep import and from the root index of the plugin, and I do not observe the same thing. Even with src/core/server/index.ts being empty (or just with your suggested edits), the memory was leaking.

I just checked @FrankHassanabad's scenario regarding src/core/server/server.test.ts leaking, and I can confirm that I reproduce a heap out of memory with

MAX_MEMORY=250
KIBANA_PROJECT=src/core/jest.config.js
TEST_FILE=/src/core/server/server.test.ts
ITERATIONS=100

around the 43rd iteration. The leak is so small that you can't see it just by looking at the heap size variation between runs though, which is going to make it incredibly fun to progress further, as it takes a LOT of time to observe whether or not the leak is present.

Now, given that

  1. the import tree of src/core/server/server.test.ts leaks, according to the previous test

and

  2. a test importing x-pack/plugins/alerting/server/index.ts does leak even when the root index file of core/server is empty, according to my tests in [securitySolution] Jest memory leaks #117255 (comment) (please feel free to double-check my observations just to make sure)

and

  3. (I would love to be able to really confirm or refute this one) such leaks in a jest environment can only be caused by patching native modules

my conclusion is that the source of the leak is coming from an import performed both from the import tree of src/core/server/server.test.ts and of x-pack/plugins/alerting/server/index.ts, which is likely to be either one of our packages or (more likely) one of our dependencies.

@academo do you remember which test you used when the changes you described in #117255 (comment) got rid of the leak? I would love to have a reproduction scenario with a (single) test file where the leak (and non-leak) is more pronounced per run, to avoid having to wait 10 minutes between each bisection.

@academo
Copy link
Contributor

academo commented Dec 1, 2021

@pgayvallet I used a dummy test inside fleet. Literally a new file with one describe and one test that only imported core/server, called Object.keys on it, and asserted expect(true).toBe(true).
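For concreteness, a minimal sketch of what that dummy test would look like (the file location and relative path are assumptions; the original file isn't shown in the thread):

// x-pack/plugins/fleet/server/core_import_leak.test.ts, hypothetical reconstruction
import * as coreServer from '../../../../src/core/server';

describe('importing core/server from a dummy fleet test', () => {
  test('leaks even though the test does nothing', () => {
    Object.keys(coreServer);
    expect(true).toBe(true);
  });
});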

@pgayvallet
Copy link
Contributor

pgayvallet commented Dec 2, 2021

After thinking a bit about the src/core/server/server.test.ts test crashing with a heap out of memory error, I wonder whether that is really a memory leak and not just the memory increasing because of the metrics collected by jest. I mean, it's only a +-1 MB increase per run, which could very well be caused by the test runner collecting data.

From my understanding of https://chanind.github.io/javascript/2019/10/12/jest-tests-memory-leak.html, a leak caused by patching native modules would be way more significant. The example they use to demonstrate the leak is quite simple, and its dependency tree is way smaller than what we get just from what src/core/server/server.test.ts imports, and yet they observe a 30 MB increase per run. Given that this is quite close to what I observe with my test importing x-pack/alerting/server/index (#117255 (comment)), I'm really starting to wonder whether the <1 MB increase in the core test is just the runner's collected data.

I tried to attach the debugger to inspect a memory dump while running to see if I could find anything, but as observed in #117255 (comment), while the suite completes without --inspect-brk, I get a segfault when running the exact same scenario with the debugger attached after using --inspect-brk. I don't know what to think about this, to be honest: on the one hand, encountering a segfault while searching for a leak could be a confirmation that we're heading in the right direction; on the other hand, we don't encounter the problem when not attached to the debugger, so this could also be a problem either with debug mode or with what the Chrome debugger does...

Just to keep a record of it, I installed segfault-handler locally to get more info about the segfault, and the crash is caused by some post-hook in node's GC itself. Not sure what to do with this info though.

PID 51656 received SIGSEGV for address: 0x0
0   segfault-handler.node               0x0000000113996fb0 _ZL16segfault_handleriP9__siginfoPv + 304
1   libsystem_platform.dylib            0x00007fff20377d7d _sigtramp + 29
2   ???                                 0x0000000000000004 0x0 + 4
3   node                                0x000000010eff3020 _ZN2v88internal13GlobalHandles4Node31PostGarbageCollectionProcessingEPNS0_7IsolateE + 128
4   node                                0x000000010eff3aa7 _ZN2v88internal13GlobalHandles31PostGarbageCollectionProcessingENS0_16GarbageCollectorENS_15GCCallbackFlagsE + 199
5   node                                0x000000010f03a8f5 _ZN2v88internal4Heap14CollectGarbageENS0_15AllocationSpaceENS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE + 2453
6   node                                0x000000010f03bf9f _ZN2v88internal4Heap24PreciseCollectAllGarbageEiNS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE + 95
7   node                                0x000000010ef03dd9 _ZN2v88internal25FunctionCallbackArguments4CallENS0_15CallHandlerInfoE + 265
8   node                                0x000000010ef038a6 _ZN2v88internal12_GLOBAL__N_119HandleApiCallHelperILb0EEENS0_11MaybeHandleINS0_6ObjectEEEPNS0_7IsolateENS0_6HandleINS0_10HeapObjectEEESA_NS8_INS0_20FunctionTemplateInfoEEENS8_IS4_EENS0_16BuiltinArgumentsE + 550
9   node                                0x000000010ef0301f _ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE + 255
10  node                                0x000000010f773eb9 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit + 57
11  node                                0x000000010f7068aa Builtins_InterpreterEntryTrampoline + 202
./memory_leak.sh: line 54: 51656 Segmentation fault: 11
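For reference, wiring segfault-handler up only takes a couple of lines; this is a minimal sketch (where exactly it was registered locally isn't shown in the thread, and the log file name is arbitrary):

// e.g. required from a jest setup file before the leaking code runs
// eslint-disable-next-line @typescript-eslint/no-var-requires
const SegfaultHandler = require('segfault-handler');

// On SIGSEGV, prints a native stack trace like the one above and writes it to the file
SegfaultHandler.registerHandler('segfault_crash.log');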

@academo

I used a dummy test inside fleet. Literally a new file with a describe, one test that only imported the server/core and then Object.keys and expect(true).toBe(true)

I double-checked just to be sure, but I can confirm I do not arrive at the same observations.

With this test run from x-pack/alerting/server,

import { parseDuration } from '.';

describe('memory leak as soon as we pull in and use "parseDuration" from "alerting', () => {
  test('parseDuration', () => {
    Object.keys(parseDuration);
    expect(true).toEqual(true);
  });
});

The memory leaks (~45 MB per run) with or without this patch applied.

To go further, I also tried to completely mock src/core/server,

by adding this to x-pack/plugins/alerting/jest.config.js:

  moduleNameMapper: {
    'core/server$': '<rootDir>/x-pack/plugins/alerting/core.mock.ts',
  }

with x-pack/plugins/alerting/core.mock.ts being:

export const bootstrap = 'foo';
export const ServiceStatusLevels = { available: '', degraded: '', unavailable: '', critical: '' };
export const config = {};
export const APP_WRAPPER_CLASS = '';
export const DEFAULT_APP_CATEGORIES = {};

and this is the test file, adapted just to be sure that the module name mapper is used:

import { parseDuration } from '.';
import { bootstrap } from '../../../../src/core/server';

describe('memory leak as soon as we pull in and use "parseDuration" from "alerting', () => {
  test('parseDuration', () => {
    expect(bootstrap).toEqual('foo');
    Object.keys(parseDuration);
  });
});

and I observe the same ~45 MB per-run leak.

Now, this is where it gets interesting. If I also stub all of alerting's direct dependencies in x-pack/plugins/alerting/jest.config.js via

  moduleNameMapper: {
    'core/server$': '<rootDir>/x-pack/plugins/alerting/core.mock.ts',
    'actions/server$': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',
    'spaces/server$': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',
    'security/server$': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',
    'licensing/server$': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',
    'task_manager/server$': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',
    'event_log/server$': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',
    'encrypted_saved_objects/server$': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',
    'features/server$': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',
    'usage_collection/server$': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',
    'kibana_utils/server$': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',
  },

The memory is stable.

So this really makes me think that the 1 MB memory increase observed in src/core/server/server.test.ts is a wrong lead, likely caused just by the increasing memory usage of jest's reporting. What is certain, at least, is that there is a much more significant leak (much more in line with what https://chanind.github.io/javascript/2019/10/12/jest-tests-memory-leak.html describes) somewhere in the dependency tree of alerting, and that is probably what we should focus on, in my opinion.
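For completeness, the dep.mock.ts stub referenced in the moduleNameMapper above isn't included in the thread. Presumably it only needs to give jest something to resolve instead of the real plugin entry points, so the real dependency tree (and anything it patches at import time) is never loaded; a hypothetical minimal version could be as small as:

// x-pack/plugins/alerting/dep.mock.ts, hypothetical; the actual stub isn't shown here.
// A couple of named exports may be needed if the importing code reads them at runtime.
export const plugin = () => ({ setup: () => ({}), start: () => ({}) });
export default {};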

@pgayvallet
Copy link
Contributor

pgayvallet commented Dec 2, 2021

So, I started commenting out the stubs in moduleNameMapper one by one to identify the direct dependencies that are causing the leak. To be sure to catch every dependency causing a leak in case there were several, I did not bisect and instead did this one line at a time.

FWIW, I used 250MB max memory and 100 runs, and assumed there was no leak if the runs completed successfully.

NOT LEAKING when using real module:

  • kibana_utils
  • usage_collection
  • features
  • encrypted_saved_objects
  • event_log
  • task_manager
  • licensing
  • security
  • spaces

LEAKING when using real module:

  • actions

Even with all other dependencies mocked, the leak is obvious and 'clean' (approx. 15-20 MB per-run increase). Just to be sure, I also ran a test with all dependencies using the real implementation except for actions, which was mocked, and there is no leak in that scenario. Then I made another attempt by adding import '../../actions/server'; directly to the test with actions not mocked (while the rest was still stubbed), and the leak is present.

All this means the cause is somewhere within actions or its dependency tree. Yay, one step closer.

cc @elastic/kibana-alerting-services this doesn't mean it is coming directly from actions, it could also be from one of its dependencies (either a plugin or a module). I will continue the investigation, but just so you know.

@pgayvallet
Copy link
Contributor

pgayvallet commented Dec 2, 2021

So, actions does not have any additional plugin dependencies compared to alerting, which means that the leak is coming either from actions itself or from its packages or node_modules imports, but not from other parts of the Kibana codebase.

All the following tests were performed in the same 'sandbox' as other tests (with src/core/server and all registered plugin dependencies being mocked via moduleNameMapper in the jest configuration, see #117255 (comment))

The first thing I did was check whether importing actions/server/plugin.ts directly causes the leak. It does.

Then, I used bisections to try to get closer to the source. And here's something interesting:

I isolated a reproduction scenario with only two imports:

import { getAuthorizationModeBySource } from '../../actions/server/authorization/get_authorization_mode_by_source';
import { ensureSufficientLicense } from '../../actions/server/lib/ensure_sufficient_license';

meaning that this reproduces the leak:

import { getAuthorizationModeBySource } from '../../actions/server/authorization/get_authorization_mode_by_source';
import { ensureSufficientLicense } from '../../actions/server/lib/ensure_sufficient_license';

describe('memory leak are hard', () => {
  test('this leaks', () => {
    Object.keys(getAuthorizationModeBySource);
    Object.keys(ensureSufficientLicense);
  });
});

However, the problem seems to be caused by some interaction between the two files (or their nested import trees), because using only one import or the other does not reproduce the leak.

This does not leak

// import { getAuthorizationModeBySource } from '../../actions/server/authorization/get_authorization_mode_by_source';
import { ensureSufficientLicense } from '../../actions/server/lib/ensure_sufficient_license';

describe('memory leak are hard', () => {
  test('this does not leak', () => {
    // Object.keys(getAuthorizationModeBySource);
    Object.keys(ensureSufficientLicense);
  });
});

This does not either

import { getAuthorizationModeBySource } from '../../actions/server/authorization/get_authorization_mode_by_source';
// import { ensureSufficientLicense } from '../../actions/server/lib/ensure_sufficient_license';

describe('memory leak are hard', () => {
  test('this does not leak', () => {
    Object.keys(getAuthorizationModeBySource);
    // Object.keys(ensureSufficientLicense);
  });
});

Not 100% sure how to progress further. I guess I will try commenting out imports in these files and see what that does.

Also note that given that these two imports are causing the leak, there may be other combination(s) of imports from actions/server/plugin that reproduce the leak too. But we should probably focus on trying to understand this reproduction scenario first.

@mshustov
Copy link
Contributor

mshustov commented Dec 2, 2021

However, the problem seems to be caused by some interactions between the two files (or their nested import trees), because using only one import, or the other, does not reproduce the leak.

Did you check whether there are circular imports between the files?

@pgayvallet
Copy link
Contributor

I isolated the imports in the two files that are causing this conflict-ish situation

In x-pack/plugins/actions/server/authorization/get_authorization_mode_by_source.ts, the import is

import { ActionExecutionSource, isSavedObjectExecutionSource } from '../lib';

in x-pack/plugins/actions/server/lib/ensure_sufficient_license.ts, the import is

import { ServerLogActionTypeId, IndexActionTypeId } from '../builtin_action_types';

Having only these lines in each file causes the leak. Commenting out one (or both) removes it.

I then tried to reproduce the leak by importing these two dependencies directly, but this does NOT cause a leak:

import '../../actions/server/builtin_action_types';
import '../../actions/server/lib';

describe('memory leak are hard', () => {
  test('because we can', () => {
    expect('leak').toBeTruthy();
  });
});

At this point I agree with @mshustov: this looks like a circular dependency between x-pack/plugins/actions/server/authorization/get_authorization_mode_by_source.ts and x-pack/plugins/actions/server/lib/ensure_sufficient_license.ts, probably passing through both actions/server/builtin_action_types and actions/server/lib.

I apologize for the very verbose reporting, by the way, but I want to keep track of the whole thought process in case we eventually need to do this again elsewhere.

@pgayvallet
Copy link
Contributor

pgayvallet commented Dec 2, 2021

So... I tried identifying a circular dependency passing through both those files, but unfortunately without any success.
What I did:

  • manually following the import tree hoping to find something, but given the number of files, it is like looking for a needle in a haystack
  • using madge (reports +- 50 internal circular imports in the actions plugin)
  • using an adapted version of src/dev/run_find_plugins_with_circular_deps.ts (reports +- 20 internal circular imports in the actions plugin)

To be honest, I had issues with both scripts.

  • madge

Even after manually changing import to import type in a lot of places (so that the dependency tree parser ignores type-only imports) and specifying the path to the tsconfig, the parser doesn't match a folder to its index file (i.e. it doesn't convert imports such as from '../foo' to from '../foo/index'), which is quite problematic as a huge part of the tree is ignored. I manually changed some imports to point explicitly to the index file instead of its folder, but I couldn't do it everywhere, so the report is probably incomplete. (A programmatic madge sketch is included at the end of this comment.)

  • src/dev/run_find_plugins_with_circular_deps.ts

Couldn't find a way to have it ignore type imports, but even when following them, the script reports a lot fewer circular imports than madge, so...

Also, in theory, circular dependencies passing through a type import should be cut at runtime, as they are only used by TS. However I'm not 100% sure this is true for jest's module loader, as I remember that in core we had to move some imports around at some point to make code that worked in production behave correctly in the jest environment, so that doesn't help either.

I can't find a proper way to progress further from here, and am currently stuck.
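For what it's worth, madge can also be driven programmatically, and if I read its docs right it has a detective option to skip type-only imports, which might avoid some of the manual import type rewriting. A sketch under those assumptions (file paths are illustrative):

// find_action_cycles.ts, hypothetical sketch using madge's programmatic API
import madge from 'madge';

madge('x-pack/plugins/actions/server/plugin.ts', {
  fileExtensions: ['ts'],
  tsConfig: 'tsconfig.json',
  // If these detective options behave as documented, type-only imports are ignored,
  // which is closer to what actually exists at runtime.
  detectiveOptions: { ts: { skipTypeImports: true } },
}).then((res) => {
  // res.circular() returns the list of detected cycles
  console.log(res.circular());
});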

@pgayvallet
Copy link
Contributor

pgayvallet commented Dec 2, 2021

So, as I couldn't think of anything better, I just decided to manually follow the leak from the x-pack/plugins/actions/server/authorization/get_authorization_mode_by_source.ts file via bisection. This was a great moment of fun, but I think I got it.

  • x-pack/plugins/actions/server/authorization/get_authorization_mode_by_source.ts
    -> import { ActionExecutionSource, isSavedObjectExecutionSource } from '../lib';
  • x-pack/plugins/actions/server/lib/index.ts
    -> export { ActionExecutor } from './action_executor';
  • x-pack/plugins/actions/server/lib/action_executor.ts
    -> import { withSpan } from '@kbn/apm-utils';
  • packages/kbn-apm-utils/src/index.ts
    -> import agent from 'elastic-apm-node';

This package imports the apm agent, which, if I'm not mistaken, patches native modules at import time. I'm not sure why the leak was appearing only when importing both files in the test, but...

To make sure, I reverted all my local changes in the actions plugin and just commented out this single line (external plugins and core dependencies still being stubbed):

// x-pack/plugins/actions/server/lib/action_executor.ts
-- import { withSpan } from '@kbn/apm-utils';

No more leak for

import { getAuthorizationModeBySource } from '../../actions/server/authorization/get_authorization_mode_by_source';
import { ensureSufficientLicense } from '../../actions/server/lib/ensure_sufficient_license';

describe('memory leak are hard', () => {
  test('because we can', () => {
    Object.keys(getAuthorizationModeBySource);
    Object.keys(ensureSufficientLicense);
  });
});

No more leak for

import { plugin } from '../../actions/server';

describe('memory leak are hard', () => {
  test('because we can', () => {
    Object.keys(plugin);
  });
});

Now... I then proceeded to remove all the moduleNameMapper entries from x-pack/plugins/alerting/jest.config.js to unstub all of actions' external dependencies, and... well, I observe that the leak is present if I remove either core's stub or all the plugins' stubs (I did not try bisecting the plugins yet).

Not sure I get it, to be honest, as I previously tested running a test importing alerting/server/index.ts with only actions being mocked and that wasn't leaking, so this is very strange.

@pgayvallet
Copy link
Contributor

Removed all the stubs from moduleNameMapper of the jest config, and stubbed elastic-apm-node instead:

 'elastic-apm-node': '<rootDir>/x-pack/plugins/alerting/dep.mock.ts',

That's it. No more leak.
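If a dedicated stub ever turns out to be needed instead of the generic dep.mock.ts, a no-op agent should presumably be enough for code that only reaches the agent through @kbn/apm-utils. This is a hypothetical sketch; the members listed are assumptions about what the importing code might touch:

// apm_agent.mock.ts, hypothetical file mapped to 'elastic-apm-node' via moduleNameMapper
const noopAgent = {
  isStarted: () => false,
  startTransaction: () => null,
  startSpan: () => null,
  currentTransaction: null,
  currentSpan: null,
};

export default noopAgent;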

Now, we just have to decide how we handle this.

Could we potentially add this elastic-apm-node entry to the moduleNameMapper of the @kbn/test jest preset that all our jest configurations use, @elastic/kibana-operations?
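If we go that route, it would presumably boil down to a single additional entry in the shared preset's moduleNameMapper, along these lines (the preset file and mock path are placeholders, not existing files):

  moduleNameMapper: {
    'elastic-apm-node': '<rootDir>/packages/kbn-test/mocks/apm_agent.mock.ts',
  }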
