
Inconsistent results in Indexer Vulnerability Detection performance tests #5667

Closed
fcaffieri opened this issue Aug 9, 2024 · 6 comments

fcaffieri commented Aug 9, 2024

Description

While analyzing the Release 4.9.0 - Beta 1 - Vulnerability Detection performance tests, it was found that the Indexer 2 CPU plot appears incorrect or inconsistent.
The rest of the plots suggest that this indexer was not working or went silent, yet no related errors were found in the logs.

Detail:

artifacts.tgz.zip

[Indexer CPU usage plot]

4.8.1:

[Indexer CPU usage plot]

4.9.0:

[Indexer CPU usage plot]

Further investigation is required.

@MARCOSD4

After investigating the possible cause of this issue, it has been found that the problem is not consistent. These Vulnerability Detection performance tests have been run numerous times since 4.8.0, and the problem appears without any apparent pattern across the different test cases (medium, high, and very high). For example, it was reported for the medium activity case in 4.9.0-beta1 but did not appear in the recent 4.9.0-beta2 tests. In addition, I re-ran the medium activity test with the same parameters and the problem did not occur either:

Build: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/625/
Artifacts: artifacts.zip

This leads us to conclude that this is not a consistent problem and that its cause is more complicated to determine.

This behavior had already been reported in this issue, which aimed to unify the processes that appear split when monitoring them. That split makes the results inconsistent and difficult to compare and analyze. Therefore, to improve the accuracy and usability of the analysis, a parameter was added to unify the generated threads into a single one. This parameter should be used in these tests, since it was introduced precisely for this purpose, and it will also make the comparison more accurate.
To check this, I re-launched the tests for the very high activity case, which was the only one where the problem appeared in 4.9.0-beta2, enabling the option to unify the processes. The results should be analyzed tomorrow: https://ci.wazuh.info/job/CLUSTER-Workload_benchmarks_metrics/627/
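As a rough illustration of what unifying the processes achieves, the sketch below sums the per-process CPU samples belonging to the same binary into a single series before plotting. The CSV file name and the column names (`timestamp`, `process_name`, `cpu_percent`) are hypothetical; the actual output of the monitoring used by the benchmark pipeline may differ.

```python
# Minimal sketch: collapse the CPU samples of all child processes of the same
# binary into one series per binary before plotting.
# NOTE: file name and column names are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("indexer_process_metrics.csv", parse_dates=["timestamp"])

# Treat forks/threads such as "opensearch", "opensearch (child)" as one binary.
df["binary"] = df["process_name"].str.split().str[0]

unified = (
    df.groupby(["binary", "timestamp"])["cpu_percent"]
      .sum()                 # one value per binary per sample
      .unstack("binary")     # one column per unified binary
)

unified.plot(title="Indexer CPU usage (unified processes)")
plt.ylabel("CPU (%)")
plt.savefig("indexer_cpu_unified.png")
```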

@MARCOSD4

Moved to On hold until the build is complete.

@wazuhci moved this from In progress to On hold in Release 4.9.0 Aug 13, 2024

rauldpm commented Aug 13, 2024

@MARCOSD4 could it be that there is a master node and two workers? That would explain why one node has lower resource usage (the master) and why the other two have high usage (the workers), but we need to confirm whether this behavior is expected with that workload.

@MARCOSD4

Conclusion

Indeed, as @rauldpm points out, one indexer's low resource usage is because it acts as a coordinating node and delegates the operations to the rest of the indexers. This happens when the default configuration is kept during the Wazuh installation, as in these tests: OpenSearch automatically chooses a coordinating node, and the remaining indexers perform the data ingestion operations. It can also happen that all the nodes distribute the operations among themselves and there is no coordinator as such; in that case, all the indexers show acceptable resource usage values, as has also been seen in other tests.
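For reference, the role of each node (and which indexer is carrying the load) can be checked with the standard OpenSearch `_cat/nodes` API. The sketch below uses placeholder host, port, and credentials; adapt them to the actual Wazuh indexer deployment (HTTPS with basic auth by default).

```python
# Sketch: list node roles and current load to see which indexer is mostly
# coordinating and which ones are doing the ingestion work.
# NOTE: host, port and credentials are placeholders.
import requests

resp = requests.get(
    "https://localhost:9200/_cat/nodes?v&h=name,node.role,cpu,heap.percent",
    auth=("admin", "admin"),   # placeholder credentials
    verify=False,              # self-signed certificates in test environments
)
print(resp.text)
```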

Therefore, the behavior reported in this issue is expected and should be taken into account in these tests in the future. In addition, it would be advisable to launch the tests with the UNIFY_BINARY_PROCESSES_FOR_PLOTTING pipeline option to avoid confusion with the child processes and make the analysis more accurate, as mentioned above.

@wazuhci moved this from In progress to Pending review in Release 4.9.0 Aug 14, 2024
@wazuhci moved this from Pending review to In review in Release 4.9.0 Aug 14, 2024
@santipadilla

LGTM


rauldpm commented Aug 14, 2024

Good job @MARCOSD4! We will take this into account in the next round of testing:

  1. We must identify which node is the master in both versions and compare those nodes; the workers can be compared at random, since we cannot identify them individually (see the sketch after this list)
  2. We will set the unify option in the pipeline
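A minimal sketch of point 1, assuming the standard OpenSearch `_cat` endpoints are reachable on the indexer nodes; the host and credentials are placeholders:

```python
# Sketch: identify which node currently holds the cluster manager (master)
# role so that the same node can be compared across versions.
# NOTE: host and credentials are placeholders.
import requests

AUTH = ("admin", "admin")        # placeholder credentials
BASE = "https://localhost:9200"

# OpenSearch 2.x endpoint; older clusters expose the equivalent /_cat/master
resp = requests.get(f"{BASE}/_cat/cluster_manager?v", auth=AUTH, verify=False)
print(resp.text)
```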
