Unstable Reliability test results #4386
Comments
Updated reliability test results: 3/5 tests failed. Failures:
Testing: Verify that no worker disconnects from the master once they are connected.
Testing: Look for any error messages in the logs of the cluster nodes.
It seems related to: The main difference from the mentioned issue is that this behavior occurred throughout the activity of all 25 workers. It was not related to the OS.
The detected ERROR was always the following log:
Testing: Check that cluster logs appear in the expected order.
This pattern happens in every worker.
It seems that the log messages in Integrity check and Agent-groups recv have changed; however, it is clear that the messages have the same meaning.
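As an illustration of what an order check of this kind involves, here is a minimal sketch. It is not the implementation used by the reliability tests: the function name `first_missing_step` and the sample log lines are hypothetical, and the only assumption is that each expected message can be described by a regex and must appear after the previous one.

```python
import re
from typing import Iterable, Optional, Sequence


def first_missing_step(lines: Iterable[str], expected: Sequence[str]) -> Optional[str]:
    """Return the first expected pattern that is not found in order.

    Returns None when every pattern in `expected` matches some line,
    each one after the line matched by the previous pattern.
    """
    patterns = [re.compile(p) for p in expected]
    idx = 0  # index of the next pattern we are waiting for
    for line in lines:
        if idx == len(patterns):
            break
        if patterns[idx].search(line):
            idx += 1
    return None if idx == len(patterns) else expected[idx]


# Hypothetical log lines, made up for this example only.
sample_log = [
    "Integrity check: starting",
    "Integrity check: finished",
    "Agent-groups recv: starting",
]
```

Matching on loose regexes rather than on exact strings is one way to keep such a check stable when the wording of a message changes but its meaning does not.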
Update: Tests were conducted to check whether unstable results could be observed; however, the results remained consistent. The attached reports show the same outcomes as the 'new updated' results reported in this issue. Here are the links to the reports:
It still appears peculiar that, when running specific searches against the old report results, some log entries from before the update were not found in each log. However, it is evident that the same artifacts were used, since the dates of the logs recorded in the test results remain consistent. The only variation observed was in the Python and Linux versions of the old report compared to the new ones. It will be necessary to keep monitoring test reports over time and watch for repetitions to identify a common factor associated with the result variations.
I was not able to reproduce the first error
but I was able to reproduce the second one
It seems like some error lines from the first iteration appear in the attached artifacts, but the expected behavior is the one that I obtained in this attempt. This seems to be expected because the worker logs contain the first successful connection, as well as the connection loss and connection attempts. Besides, this commit 2bde406 should also fix the error message
and the next regex
Update: Tests were executed from a different venv to check the test's stability under different environment settings. No differences were detected from the last report. A new related issue was opened:
Update: Unfortunately, after running the tests in different environments and on several occasions, we have been unable to reproduce the error.
Update: Moving ETA from Aug 4 to Aug 8 because release-related tasks were carried out while this issue was in progress.
Update: Moving ETA from Aug 8 to Aug 9 because release-related tasks were carried out while this issue was in review.
LGTM! Great analysis @pro-akim @roronoasins
Description
Different results were found running reliability tests over the same artifacts obtained from the workload benchmark test.
Research should be performed.
Details
Performed: wazuh/wazuh#18017
A member of the framework team tried to recreate #4364
A different reliability report was detected.
Running pytest on test_cluster_logs/test_cluster_connection reported:
Meanwhile, the first test reported:
Therefore, using the same artifact downloaded from 274
and running the same test one more time,
the results of the two reports are different.
Previous report
New report
Checking the data from the logs, it is possible to see that both runs worked with the same data (the dates are the same in reliability/test_cluster/test_cluster_logs/test_cluster_worker_logs_order/test_cluster_worker_logs_order.py::test_check_logs_order_workers).
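Beyond comparing dates, one way to confirm that two runs really consumed the same artifacts is to hash every file in both artifact directories and list the ones that differ. The sketch below is only an illustration under that assumption: the helper names `digest_tree` and `differing_files` are hypothetical and not part of the test suite, and the directory layout is whatever was downloaded for each run.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest_tree(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(run_a: Path, run_b: Path) -> List[str]:
    """Return relative paths that are missing from one run or differ in content."""
    a, b = digest_tree(run_a), digest_tree(run_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from `differing_files` would mean the two directories are byte-for-byte identical, which would point the investigation toward the environment (for example the Python or Linux version) rather than the data.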