[osquery 5.13.1] osqueryd crashing on Fedora 40 host with distributed queries #20594

Closed
dherder opened this issue Jul 18, 2024 · 14 comments
Labels
bug Something isn't working as documented customer-redwine #g-endpoint-ops Endpoint ops product group ~osquery core Relates to a change in osquery core. P2 Prioritize as urgent :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. ~released bug This bug was found in a stable release.

@dherder
Contributor

dherder commented Jul 18, 2024

Fleet version: 4.53.1


💥 Actual behavior

When running fleetd on a Fedora 40 host, osqueryd crashes repeatedly.

πŸ§‘β€πŸ’» Β Steps to reproduce

Crash logs:
https://drive.google.com/file/d/1oazsLdWMmoMLRMYPbINV409n199Fj_sj/view?usp=drive_link

@dherder dherder added bug Something isn't working as documented :reproduce Involves documenting reproduction steps in the issue :incoming New issue in triage process. customer-redwine labels Jul 18, 2024
@dherder
Contributor Author

dherder commented Jul 18, 2024

For whoever looks at this issue: we have reproduced it in our cloud eval environment. Please reach out to me for access credentials.

@sharon-fdm sharon-fdm added :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. #g-endpoint-ops Endpoint ops product group and removed :reproduce Involves documenting reproduction steps in the issue labels Jul 19, 2024
@sharon-fdm
Collaborator

Reproduced by @dherder; removing the reproduce label.

@dherder
Contributor Author

dherder commented Jul 19, 2024

@sharon-fdm this is a high priority issue for prospect-redwine. Should we add the "P2" label?

@lucasmrod lucasmrod added the ~osquery core Relates to a change in osquery core. label Jul 19, 2024
@lucasmrod
Member

Seems related to kolide/launcher#1773

And see public discussion here.

@sharon-fdm
Collaborator

sharon-fdm commented Jul 19, 2024

@dherder, makes sense to me to add P2.
We can swap it with other items in the sprint (@noahtalerman maybe swap with #19561).
@lukeheath, please approve.

@lukeheath lukeheath added the P2 Prioritize as urgent label Jul 19, 2024
@lukeheath
Member

@dherder @sharon-fdm Agreed, upgrading to P2.

@sharon-fdm
Collaborator

Thanks @lukeheath

@lukeheath lukeheath added the ~released bug This bug was found in a stable release. label Jul 19, 2024
@sharon-fdm sharon-fdm added this to the 4.56.0-tentative milestone Jul 23, 2024
@lukeheath lukeheath modified the milestones: 4.56.0-tentative, 4.55.0-tentative Jul 23, 2024
@lucasmrod lucasmrod removed this from the 1.29.0-fleetd milestone Jul 24, 2024
@lucasmrod lucasmrod removed the :incoming New issue in triage process. label Jul 24, 2024
@lucasmrod
Member

lucasmrod commented Jul 24, 2024

UPDATE:

How to reproduce

We've reproduced the issue on Fedora 38 and 40 (it most likely affects Fedora 39 too). You can reproduce it by running the following query on such systems: select 1 from rpm_packages rp, os_version ov where rp.name = "foo-fedora-playbooks" AND ov.name = "Fedora Linux"; (it crashes osquery roughly 3 out of 5 times).
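
Because the crash is intermittent, it can help to run the reproduction query several times and count the failures. The following is a minimal sketch, not something from the osquery or Fleet codebases. It assumes `osqueryi` is on the PATH, that the crash also occurs when the query runs through the `osqueryi` shell (the reports in this issue are for osqueryd), and that a crash shows up as the process being killed by a signal. The helper names `run_osqueryi` and `count_crashes` are hypothetical.

```python
import subprocess
from typing import Callable

# Reproduction query quoted from this issue.
QUERY = (
    'select 1 from rpm_packages rp, os_version ov '
    'where rp.name = "foo-fedora-playbooks" AND ov.name = "Fedora Linux";'
)


def run_osqueryi(query: str) -> int:
    """Run one query through osqueryi and return the process exit code.

    Assumption: osqueryi is installed and on the PATH.
    """
    proc = subprocess.run(
        ["osqueryi", "--json", query],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return proc.returncode


def count_crashes(attempts: int, runner: Callable[[str], int] = run_osqueryi) -> int:
    """Count attempts that ended with a negative exit code.

    On POSIX, subprocess reports a negative code when the child was
    terminated by a signal (for example -11 for SIGSEGV).
    """
    return sum(1 for _ in range(attempts) if runner(QUERY) < 0)
```

On an affected host, calling `count_crashes(5)` would report how many of five runs were terminated by a signal. The `runner` parameter exists only so the counting logic can be exercised without osquery installed.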

What's happening

I've opened an osquery issue that describes the findings so far: osquery/osquery#8384.

Next steps

Stefano from the osquery team will try to upgrade librpm in osquery from 4.18.0 to 4.18.2. 4.18.2 has the following fix related to the segfault: https://patchwork.yoctoproject.org/project/oe-core/patch/[email protected]. We believe that upgrading may fix the crash.

@JoStableford
Contributor

@sharon-fdm sharon-fdm added this to the 4.55.0-tentative milestone Jul 31, 2024
@lucasmrod lucasmrod changed the title osqueryd crashing on Fedora 40 host with distributed queries [osquery 5.13.1/5.14.0 (TBD)] osqueryd crashing on Fedora 40 host with distributed queries Aug 1, 2024
@lucasmrod lucasmrod removed this from the 4.55.0 milestone Aug 1, 2024
@lucasmrod
Member

Removing Fleet's milestone because this is an osquery core bug (the fix will land in 5.13.1 or 5.14.0, TBD).

@lucasmrod
Member

I can confirm that the update of librpm from 4.18.0 to 4.18.2 fixed the issue (no more crashes when querying rpm_packages). (Verified by downloading artifacts from today's master builds which contain the bug fix.)

@lucasmrod lucasmrod changed the title [osquery 5.13.1/5.14.0 (TBD)] osqueryd crashing on Fedora 40 host with distributed queries [osquery 5.13.1] osqueryd crashing on Fedora 40 host with distributed queries Aug 9, 2024
@xpkoala
Contributor

xpkoala commented Aug 19, 2024

Confirmed no crashes on Fedora 38.

@sharon-fdm sharon-fdm added this to the 4.56.0-tentative milestone Aug 19, 2024
@fleet-release
Contributor

Fedora host finds calm,
Crashing clouds part with each query,
Fleet sails smooth, no harm.


7 participants