feat(subscriptions): Use RPC in subscriptions pipeline #6499

Merged
shruthilayaj merged 34 commits into master from shruthi/feat/add-create-subscriptions-endpoint on Nov 26, 2024

Conversation

Member

@shruthilayaj shruthilayaj commented Nov 4, 2024

Updates the subscription pipeline to support RPCSubscriptionData.


codecov bot commented Nov 4, 2024

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2535 | 1 | 2534 | 5 |
View the top 1 failed tests by shortest run time
tests.subscriptions.test_subscription.TestEAPSpansRPCSubscriptionCreator::test[EAP spans RPC subscription]
Stack Traces | 0.184s run time
Traceback (most recent call last):
  File ".../tests/subscriptions/test_subscription.py", line 397, in test
    identifier = creator.create(subscription, self.timer)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../snuba/subscriptions/subscription.py", line 36, in create
    data.validate()
  File ".../snuba/subscriptions/data.py", line 189, in validate
    raise InvalidSubscriptionError("Multiple project IDs not supported.")
snuba.datasets.entity_subscriptions.validators.InvalidSubscriptionError: Multiple project IDs not supported.


Comment on lines 370 to 373
rounded_ts = (
    int(timestamp.replace(tzinfo=UTC).timestamp() / self.time_window_sec)
    * self.time_window_sec
)
Member Author

@shruthilayaj shruthilayaj Nov 8, 2024

Hack for now: we want to round this to the lowest available granularity (15s). More context here.
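
As a rough illustration of the rounding discussed in this comment, the sketch below floors a timestamp to the start of its window. The helper name `round_down_to_window` and the sample timestamp are made up for this example; only the arithmetic mirrors the snippet under review, and a naive datetime is assumed to represent UTC.

```python
from datetime import datetime, timezone


def round_down_to_window(timestamp: datetime, window_sec: int) -> int:
    """Return the epoch second at the start of the window containing timestamp.

    Hypothetical helper: the timestamp is treated as UTC, as in the
    replace(tzinfo=UTC) call in the reviewed snippet.
    """
    epoch_sec = timestamp.replace(tzinfo=timezone.utc).timestamp()
    # Integer-divide by the window size, then scale back up.
    return int(epoch_sec / window_sec) * window_sec


# Sample input chosen only for illustration.
ts = datetime(2024, 11, 8, 21, 29, 47)
rounded_15 = round_down_to_window(ts, 15)    # start of the 15s bucket
rounded_300 = round_down_to_window(ts, 300)  # start of the 5min bucket
```

With a 300-second window the result lands on a five-minute boundary, while a fixed 15-second divisor would instead align to the finest granularity mentioned above.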

Comment on lines 348 to 353
if (self.request_name, self.request_version) not in REQUEST_TYPE_ALLOWLIST:
    raise InvalidSubscriptionError(
        f"{self.request_name} {self.request_version} not supported."
    )

# TODO: Validate no group by, having, order by etc
Member Author

I'll add more validation in a follow-up so this is easier to review.
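
For readers skimming the thread, here is a minimal standalone sketch of the allowlist check in the snippet above. The allowlist entry `("TimeSeriesRequest", "v1")` and the free function `validate_request_type` are assumptions for illustration; the real allowlist contents are not shown in this conversation, and `InvalidSubscriptionError` here is a local stand-in for the snuba exception.

```python
class InvalidSubscriptionError(Exception):
    """Local stand-in for the exception raised in the reviewed code."""


# Hypothetical allowlist entry; the actual contents are not shown in this PR thread.
REQUEST_TYPE_ALLOWLIST = {("TimeSeriesRequest", "v1")}


def validate_request_type(request_name: str, request_version: str) -> None:
    """Reject any (name, version) pair that is not explicitly allowed."""
    if (request_name, request_version) not in REQUEST_TYPE_ALLOWLIST:
        raise InvalidSubscriptionError(
            f"{request_name} {request_version} not supported."
        )


validate_request_type("TimeSeriesRequest", "v1")  # passes silently

try:
    validate_request_type("SomeOtherRequest", "v1")
    rejected = False
except InvalidSubscriptionError as exc:
    rejected = True
    message = str(exc)
```

Keying the allowlist on the (name, version) pair means a new version of an already supported request type has to be opted in explicitly.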

@shruthilayaj shruthilayaj marked this pull request as ready for review November 8, 2024 21:29
@shruthilayaj shruthilayaj requested review from a team as code owners November 8, 2024 21:29
Member

@volokluev volokluev left a comment

Why can the create subscription RPC call not be its own PR?

@shruthilayaj
Member Author

shruthilayaj commented Nov 8, 2024

> Why can the create subscription RPC call not be its own PR?

@volokluev Creating a subscription requires validation, so I needed to build and run the request before storing it. That introduces the RPCSubscriptionData class, and once I introduced it in one place, I had to chase the resulting typing issues and handle it everywhere.

@shruthilayaj shruthilayaj changed the base branch from master to shruthi/feat/create-rpc-subscriptions November 18, 2024 16:56
@shruthilayaj shruthilayaj changed the title feat(subscriptions): Add create subscriptions RPC feat(subscriptions): Use RPC in subscriptions pipeline Nov 18, 2024
Base automatically changed from shruthi/feat/create-rpc-subscriptions to master November 20, 2024 18:43
@@ -95,7 +104,7 @@ def from_string(cls, value: str) -> SubscriptionIdentifier:


@dataclass(frozen=True, kw_only=True)
-class SubscriptionData(ABC):
+class _SubscriptionData(ABC, Generic[TRequest]):
Member Author

build_request returns the generic type TRequest and run_query accepts TRequest.
So for SnQLSubscriptionData, build_request returns Request and run_query accepts Request,
and for RPCSubscriptionData, build_request returns TimeSeriesRequest and run_query accepts TimeSeriesRequest.

Member

@volokluev volokluev left a comment

I have some cleanup comments, but overall this is good to go.

Comment on lines +220 to +221
# TODO: update it to round to the lowest granularity
# rounded_ts = int(timestamp.replace(tzinfo=UTC).timestamp() / 15) * 15
Member

is this TODO still relevant?

Member Author

Yup, waiting on https://github.com/getsentry/projects/issues/364 to be finished

entity_key,
PartitionId(partition_index),
)
entity = get_entity(EntityKey.EVENTS)
Member

Why is this line here?

Member Author

oops, bad copy paste

PartitionId(partition_index),
)
entity = get_entity(EntityKey.EVENTS)
store.create(
Member

In your tests, you should be creating subscriptions through the API call, not through the implementation detail as you are doing here.

Member Author

Comment on lines 389 to 390
def setup_method(self) -> None:
    self.dataset = get_dataset("metrics")
Member

why?

Comment on lines 80 to 150


def build_rpc_subscription(resolution: timedelta, org_id: int) -> Subscription:
    return Subscription(
        SubscriptionIdentifier(PartitionId(1), uuid.uuid4()),
        RPCSubscriptionData.from_proto(
            CreateSubscriptionRequestProto(
                time_series_request=TimeSeriesRequest(
                    meta=RequestMeta(
                        project_ids=[1],
                        organization_id=org_id,
                        cogs_category="something",
                        referrer="something",
                    ),
                    aggregations=[
                        AttributeAggregation(
                            aggregate=Function.FUNCTION_SUM,
                            key=AttributeKey(
                                type=AttributeKey.TYPE_FLOAT, name="test_metric"
                            ),
                            label="sum",
                            extrapolation_mode=ExtrapolationMode.EXTRAPOLATION_MODE_SAMPLE_WEIGHTED,
                        ),
                    ],
                    filter=TraceItemFilter(
                        comparison_filter=ComparisonFilter(
                            key=AttributeKey(type=AttributeKey.TYPE_STRING, name="foo"),
                            op=ComparisonFilter.OP_NOT_EQUALS,
                            value=AttributeValue(val_str="bar"),
                        )
                    ),
                ),
                time_window_secs=300,
                resolution_secs=int(resolution.total_seconds()),
            ),
            EntityKey.EAP_SPANS,
        ),
    )


@pytest.fixture
def expected_rpc_subs() -> MutableSequence[Subscription]:
    return [
        build_rpc_subscription(timedelta(minutes=1), 2)
        for count in range(randint(1, 50))
    ]


@pytest.fixture
def extra_rpc_subs() -> MutableSequence[Subscription]:
    return [
        build_rpc_subscription(timedelta(minutes=3), 1)
        for count in range(randint(1, 50))
    ]


@patch("snuba.settings.SLICED_STORAGE_SETS", {"events_analytics_platform": 3})
@patch(
    "snuba.settings.LOGICAL_PARTITION_MAPPING",
    {"events_analytics_platform": {0: 0, 1: 1, 2: 2}},
)
def test_filter_rpc_subscriptions(expected_rpc_subs, extra_rpc_subs) -> None:  # type: ignore
    importlib.reload(scheduler)

    filtered_subs = filter_subscriptions(
        subscriptions=expected_rpc_subs + extra_rpc_subs,
        entity_key=EntityKey.EAP_SPANS,
        metrics=DummyMetricsBackend(strict=True),
        slice_id=2,
    )
    assert filtered_subs == expected_rpc_subs
Member

this test is not necessary.

  1. eap_spans is not a sliced storage
  2. this is far in the weeds of the implementation details of the subscription scheduler

@@ -3,6 +3,7 @@
should be stored. These do not require individual physical partitions but allow
for repartitioning with less code changes per physical change.
"""

Member

🔪

@@ -98,15 +116,20 @@ def decode(self, value: KafkaPayload) -> ScheduledSubscriptionTask:

entity_key = EntityKey(scheduled_subscription_dict["entity"])

data = scheduled_subscription_dict["task"]["data"]
Member

FYI: the docstring of this class should probably change

Member Author

@volokluev
Member

When you go to merge this, please notify the oncall in #discuss-eng-sns; this change touches a lot of crucial infra. It's good that you have done an end-to-end test locally. We should make sure that it doesn't break things in S4S when it's rolled out.

@shruthilayaj
Member Author

> When you go to merge this, please notify the oncall in #discuss-eng-sns; this change touches a lot of crucial infra. It's good that you have done an end-to-end test locally. We should make sure that it doesn't break things in S4S when it's rolled out.

Will do, I'll address your comments and merge tomorrow 👍

@shruthilayaj shruthilayaj merged commit ffa50b5 into master Nov 26, 2024
30 checks passed
@shruthilayaj shruthilayaj deleted the shruthi/feat/add-create-subscriptions-endpoint branch November 26, 2024 16:56