Fix flakiness on TestFilestreamMetadataUpdatedOnRename #42213

@belimawr (Contributor) commented Jan 3, 2025

Proposed commit message

For some reason this test became flaky. The root of the flakiness is not in the test itself but in how a rename operation is detected. Even though this test uses os.Rename, the rename does not seem to be an atomic operation: https://www.man7.org/linux/man-pages/man2/rename.2.html does not make it clear whether 'renameat' (used by os.Rename) is atomic.

On a flaky execution, the file is perceived as removed and then a new file is perceived as created, both with the same inode. This happens on a system that does not reuse inodes as soon as they're freed. Because the file is detected as removed, its state is also removed. Then, when more data is added, the registry tracks only the offset of the new data, causing the test to fail.
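
To illustrate the failure mode, here is a toy model of a path-based watcher. It is not Filebeat's actual scanner: `snapshot` and `diff` are hypothetical names, and the sketch assumes a scan can land in a window where neither the old nor the new name is visible, which is what the flaky runs suggest but which is not confirmed.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a hypothetical record of one directory scan: path -> inode.
type snapshot map[string]uint64

// diff reports the events a watcher would infer from two consecutive scans.
// A path that vanished counts as a rename only if its inode shows up under
// another path in the very next scan; otherwise it counts as a remove.
func diff(prev, curr snapshot) []string {
	prevInodes := make(map[uint64]bool, len(prev))
	for _, ino := range prev {
		prevInodes[ino] = true
	}
	currByInode := make(map[uint64]string, len(curr))
	for path, ino := range curr {
		currByInode[ino] = path
	}

	var events []string
	for path, ino := range prev {
		if _, still := curr[path]; still {
			continue
		}
		if newPath, moved := currByInode[ino]; moved {
			events = append(events, "rename "+path+" -> "+newPath)
		} else {
			events = append(events, "remove "+path)
		}
	}
	for path, ino := range curr {
		if _, existed := prev[path]; existed {
			continue
		}
		// An inode already known from the previous scan was reported as a rename.
		if !prevInodes[ino] {
			events = append(events, "create "+path)
		}
	}
	sort.Strings(events) // deterministic order for printing
	return events
}

func main() {
	before := snapshot{"a.log": 42}
	after := snapshot{"b.log": 42}

	// Both names resolved across two adjacent scans: the rename is recognised.
	fmt.Println(diff(before, after))

	// A scan lands in between and sees neither name: remove, then create,
	// even though the inode is the same.
	gap := snapshot{}
	fmt.Println(diff(before, gap))
	fmt.Println(diff(gap, after))
}
```

In this model the same inode appears on both sides of the gap, but because the intermediate scan saw nothing, the link between the two names is lost.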

A workaround for this is to keep the state when the file is removed, hence clean_removed: false is set in the test config.
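
A minimal sketch of why keeping the state helps, using a toy state store keyed by inode. The `registry` type, its methods, and `replay` are hypothetical and only mimic the effect described above; they are not the filestream registry's real API.

```go
package main

import "fmt"

// registry is a toy stand-in for an input's state store: inode -> bytes read.
type registry struct {
	cleanRemoved bool
	offsets      map[uint64]int64
}

func newRegistry(cleanRemoved bool) *registry {
	return &registry{cleanRemoved: cleanRemoved, offsets: make(map[uint64]int64)}
}

// onData records n newly read bytes for the file with the given inode.
func (r *registry) onData(ino uint64, n int64) {
	r.offsets[ino] += n
}

// onRemove drops the file's state only when cleanRemoved is set.
func (r *registry) onRemove(ino uint64) {
	if r.cleanRemoved {
		delete(r.offsets, ino)
	}
}

// replay models the flaky sequence: 10 bytes are read, the rename is
// misdetected as a removal, then 5 more bytes are appended and read.
// It returns the offset the registry ends up with.
func replay(cleanRemoved bool) int64 {
	r := newRegistry(cleanRemoved)
	const ino = 42
	r.onData(ino, 10)
	r.onRemove(ino)
	r.onData(ino, 5)
	return r.offsets[ino]
}

func main() {
	fmt.Println(replay(true))  // 5: only the new data is tracked
	fmt.Println(replay(false)) // 15: the original offset survives
}
```

With the state kept, a misdetected removal no longer resets the offset, which is the behaviour the test asserts on.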

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Run TestFilestreamMetadataUpdatedOnRename and ensure it does not fail

cd filebeat
go test -tags integration -run=TestFilestreamMetadataUpdatedOnRename -v -count=100 ./input/filestream/.

Related issues

I first saw this test failing on #41954; however, the issue/fix does not seem related to that PR, so I'm creating a PR with the standalone fix so that it can be easily backported to 8.x.

Buildkite failure

@belimawr added the Team:Elastic-Agent-Data-Plane label Jan 3, 2025
@belimawr self-assigned this Jan 3, 2025
@belimawr requested a review from a team as a code owner January 3, 2025 21:59
@belimawr requested review from faec and VihasMakwana January 3, 2025 21:59
@elasticmachine (Collaborator)

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic bot added and then removed the needs_team label Jan 3, 2025
mergify bot (Contributor) commented Jan 3, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

mergify bot (Contributor) commented Jan 3, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify bot added the backport-8.x label Jan 3, 2025
@belimawr (Contributor, Author) commented Jan 6, 2025

main already has this fix, so I'm closing this PR in favour of #42221, which was created directly against 8.x.

@belimawr closed this Jan 6, 2025