
feat(events tracking): add abstract class and logging implementation #80117

Merged
victoria-yining-huang merged 4 commits into master from vic/add_logging_module on Nov 20, 2024

Conversation

victoria-yining-huang (Contributor) commented Oct 31, 2024

[design doc](https://www.notion.so/sentry/Conversion-rate-of-ingest-transactions-to-save-trx-1298b10e4b5d801ab517c8e2218d13d5)

We need to track the completion of each stage to 1) compute event conversion rates and 2) gain debugging visibility into where events are being dropped.

Usage will be heavily sampled so as not to blow up traffic.

This PR only adds the REDIS_PUT stage; subsequent PRs will add the other stages listed in the EventStageStatus class.

**!!!!!IMPORTANT!!!!!!**
hash-based sampling
Here's a [blog post](https://www.rsyslog.com/doc/tutorials/hash_sampling.html) explaining hash-based sampling, which provides "all or nothing" logging for the sampled events across the entire pipeline. That's the idea I want to implement.

The hashing algorithm used must be consistent and uniformly distributed for all-or-nothing sampling to work.
I cannot find references stating that MD5 is consistent and evenly distributed other than various [Stack Overflow pages](https://crypto.stackexchange.com/questions/14967/distribution-for-a-subset-of-md5); the official sources are too academic and long for me to follow.
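
A minimal sketch of this hash-based sampling idea, assuming MD5 over the event ID. The names (should_track, track_sampled_event, TransactionStageStatus) mirror ones that appear elsewhere in this PR, but the enum values and the sample-rate plumbing below are illustrative, not the exact merged code:

```python
import hashlib
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class TransactionStageStatus(Enum):
    # Only the stage added in this PR; later PRs add the remaining stages.
    REDIS_PUT = "redis_put"


def should_track(event_id: str, sample_rate: float) -> bool:
    # Hash-based sampling: MD5(event_id) is deterministic, so every stage of
    # the pipeline makes the same keep/drop decision for a given event,
    # which gives "all or nothing" logging across the pipeline.
    if sample_rate <= 0:
        return False
    digest = hashlib.md5(event_id.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits onto [0, 1) and compare against the rate.
    return int(digest[:8], 16) / 0x100000000 < sample_rate


def track_sampled_event(event_id: str, event_type: str, status: TransactionStageStatus) -> None:
    # The real code would read the sample rate from an option; it is hardcoded
    # here only to keep the sketch self-contained.
    if should_track(event_id, sample_rate=0.01):
        logger.info(
            "event_tracker.stage",
            extra={"event_id": event_id, "event_type": event_type, "status": status.value},
        )
```

A nice property of the threshold form is that raising the sample rate only adds events to the sampled set, so an event that was already being logged keeps being logged at every stage.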


for reviewers:
please review with an eye toward how this can be generalized to other pipelines as well, such as errors

github-actions bot added the Scope: Backend label (automatically applied to PRs that change backend components) on Oct 31, 2024
src/sentry/utils/event_tracker.py (outdated review comments, resolved)
codecov bot commented Nov 1, 2024

Codecov Report

Attention: Patch coverage is 92.85714% with 2 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/sentry/utils/event_tracker.py | 92.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #80117      +/-   ##
==========================================
- Coverage   78.48%   78.48%   -0.01%     
==========================================
  Files        7210     7207       -3     
  Lines      319532   319607      +75     
  Branches    43963    43989      +26     
==========================================
+ Hits       250797   250841      +44     
- Misses      62348    62371      +23     
- Partials     6387     6395       +8     

@@ -202,6 +203,10 @@ def process_event(
    else:
        with metrics.timer("ingest_consumer._store_event"):
            cache_key = processing_store.store(data)
            track_sampled_event(
                data["event_id"], data.get("type"), TransactionStageStatus.REDIS_PUT
            )
Member
I think you want the pipeline name, not data.get("type"), here. They are not always the same.

Contributor Author

I'm using data.get("type") to make it generalized, so when this is eventually extended for errors, it will work too. Do you have any concerns about this?

Contributor Author

hardcoded it to only take transactions for now

Member

Yes, you cannot use data.get("type") in errors - there are many different types going through the errors pipeline. The pipeline name should be the generalized thing.
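
As a hypothetical sketch of what this suggestion amounts to at the call site (the Pipeline enum below is illustrative and not part of this PR):

```python
from enum import Enum


class Pipeline(str, Enum):
    # Illustrative only: the errors pipeline carries many event types,
    # so the pipeline name, not data.get("type"), is the stable identifier.
    TRANSACTIONS = "transactions"
    ERRORS = "errors"


# Instead of inferring the pipeline from the payload:
#   track_sampled_event(data["event_id"], data.get("type"), TransactionStageStatus.REDIS_PUT)
# the call would name it explicitly:
#   track_sampled_event(data["event_id"], Pipeline.TRANSACTIONS, TransactionStageStatus.REDIS_PUT)
```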

src/sentry/options/defaults.py (outdated review comments, resolved)
src/sentry/utils/event_tracker.py (review comments, resolved)
victoria-yining-huang (Contributor Author) commented Nov 16, 2024

> The reason originally was that they were planned to be stored in Redis; smaller enums mean a higher sampling rate for the same memory usage. As we are not storing them in Redis, there is no reason for an int enum. Please turn them into strings.

@fpacifici wouldn't an int in logging still be cheaper than a string in Google logs? Or is that negligible?
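
For context, a hedged sketch of the two variants being debated here; neither is necessarily the exact definition that ended up being merged:

```python
from enum import Enum, IntEnum


class TransactionStageStatusInt(IntEnum):
    # Compact to store or log, but opaque when reading the logs.
    REDIS_PUT = 1


class TransactionStageStatusStr(str, Enum):
    # A few bytes larger per log line, but self-describing; the cheaper int
    # form mainly mattered while the values were planned to live in Redis.
    REDIS_PUT = "redis_put"
```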

add updated list of enums

add sampling

add redis put

add sampling logic

add extra

remove class

add .value

change enum value to int

use IntEnum

add test first pass

add hash sampling

update status enum

docstring

comment

add wip

wip

tests pass

change to should_track

add TransactionStageStatus

return if rate 0

add unit test

remove old test

add comments

add TODO

add event type

use options automator

update comment

option to 0

only use options

override in tests another way

victoria-yining-huang merged commit e5c6492 into master Nov 20, 2024
50 checks passed
victoria-yining-huang deleted the vic/add_logging_module branch November 20, 2024 20:50
Comment on lines +32 to +33
if __name__ == "__main__":
    unittest.main()
Member

why was this needed?

harshithadurai pushed a commit that referenced this pull request Nov 25, 2024
evanh pushed a commit that referenced this pull request Nov 25, 2024
Labels
Scope: Backend (automatically applied to PRs that change backend components), Scope: Frontend (automatically applied to PRs that change frontend components)
7 participants