feat: weekly digest #26337
Conversation
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label.
@joethreepwood once this is in we will start sending the events to posthog. You and I should iterate on the actual email copy, then we can turn it on for a percent of people and see how it performs.
posthog/tasks/scheduled.py (Outdated)
@@ -130,6 +132,14 @@ def setup_periodic_tasks(sender: Celery, **kwargs: Any) -> None:
        name="update quota limiting",
    )

    # Send all periodic digest reports
    weekly_digest_end_date = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
The weekly_digest_end_date will be evaluated when the code is first loaded/instantiated, not when the task runs. This means the datetime will be fixed to whatever time the code was initialized, which is probably not what you want for a weekly report.
You could move the datetime calculation inside the function:
def send_all_periodic_digest_reports(end_date=None):
    if end_date is None:
        end_date = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
    # rest of your report logic...
sender.add_periodic_task(
    crontab(hour="9", minute="0", day_of_week="mon"),
    send_all_periodic_digest_reports.s(),  # Remove the end_date parameter
    name="send all weekly digest reports",
)
I think this is the only blocker from my comments.
I think I do want to define the time when it's called. That way we can be flexible with the date. Say the task fails and we want to resend it on a later date, but still only get the updates for the prior week. We need to define the end date at the beginning in that case. Similar to usage reports.
No, but this won't work because the datetime will be fixed to the time the server started, so it'll get reset each time there is a deployment. My suggested change would make it use the time when the function is called based on the cron.
OH
The date is still flexible with an optional end date arg
Yeah sorry I was just being dumb, good call-out, will fix
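For reference, a minimal sketch of the pattern being settled on here, assuming a Celery shared_task (the decorator and exact signature are assumptions, not copied from the PR): the default end date is computed when the task actually runs, while an explicit end_date can still be passed to re-send a digest for an earlier week.

    from datetime import datetime

    from celery import shared_task

    @shared_task
    def send_all_periodic_digest_reports(end_date: datetime | None = None) -> None:
        # Computed at call time (when the cron fires the task), not at import/deploy time.
        if end_date is None:
            end_date = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
        # ...gather the week's activity ending at end_date and send the digest events...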
) -> None:
    if timestamp and isinstance(timestamp, str):
        try:
            timestamp = parser.isoparse(timestamp)
        except ValueError:
            timestamp = None

    if not organization_id and not team_id:
        raise ValueError("Either organization_id or team_id must be provided")

    if is_cloud():
I find if statements like this hard to read. I just ran this through Claude to clean it up, and it might be a better pattern to do something like this (not tested and not a blocker):
def get_distinct_ids_for_team(team_id: int) -> list[str]:
    """Get distinct IDs for all users with access to a team."""
    team = Team.objects.get(id=team_id)
    return [user.distinct_id for user in team.all_users_with_access()]


def get_distinct_ids_for_organization(organization_id: int) -> list[str]:
    """Get distinct IDs for all organization members."""
    return list(
        OrganizationMembership.objects.filter(organization_id=organization_id)
        .values_list("user__distinct_id", flat=True)
    )


def get_org_distinct_id(organization_id: int) -> str:
    """Get distinct ID for organization owner or fallback."""
    org_owner = get_org_owner_or_first_user(organization_id)
    return org_owner.distinct_id if org_owner and org_owner.distinct_id else f"org-{organization_id}"


def capture_cloud_event(
    name: str,
    properties: dict,
    organization_id: int | None,
    team_id: int | None,
    send_for_all_members: bool,
    timestamp: datetime,
) -> None:
    """Capture analytics event for cloud deployment."""
    distinct_ids = []

    # Determine organization_id if not provided
    if not organization_id and team_id:
        team = Team.objects.get(id=team_id)
        organization_id = team.organization_id

    # Get distinct_ids based on send_for_all_members flag
    if send_for_all_members:
        if organization_id:
            distinct_ids = get_distinct_ids_for_organization(organization_id)
        elif team_id:
            distinct_ids = get_distinct_ids_for_team(team_id)
    else:
        distinct_ids = [get_org_distinct_id(organization_id)] if organization_id else []

    # Capture events for all distinct_ids
    for distinct_id in distinct_ids:
        pha_client.capture(
            distinct_id,
            name,
            {**properties, "scope": "user"},
            groups={"organization": organization_id, "instance": settings.SITE_URL},
            timestamp=timestamp,
        )

    if organization_id:
        pha_client.group_identify("organization", organization_id, properties)


def capture_self_hosted_event(
    name: str,
    properties: dict,
    timestamp: datetime,
) -> None:
    """Capture analytics event for self-hosted deployment."""
    pha_client.capture(
        get_machine_id(),
        name,
        {**properties, "scope": "machine"},
        groups={"instance": settings.SITE_URL},
        timestamp=timestamp,
    )
    pha_client.group_identify("instance", settings.SITE_URL, properties)


def capture_analytics_event(
    name: str,
    properties: dict,
    organization_id: int | None = None,
    team_id: int | None = None,
    send_for_all_members: bool = False,
    timestamp: datetime | None = None,
) -> None:
    """Main entry point for capturing analytics events."""
    if is_cloud():
        capture_cloud_event(
            name,
            properties,
            organization_id,
            team_id,
            send_for_all_members,
            timestamp,
        )
    else:
        capture_self_hosted_event(name, properties, timestamp)
I still need to check and fail somewhere if one or the other isn't provided, though. Otherwise this will fail silently.
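A minimal sketch of where that guard could live in the refactored entry point (the function and helper names follow the suggestion above; this is illustrative, not the final code):

    def capture_analytics_event(
        name: str,
        properties: dict,
        organization_id: int | None = None,
        team_id: int | None = None,
        send_for_all_members: bool = False,
        timestamp: datetime | None = None,
    ) -> None:
        # Fail loudly up front rather than silently capturing nothing downstream.
        if not organization_id and not team_id:
            raise ValueError("Either organization_id or team_id must be provided")

        if is_cloud():
            capture_cloud_event(name, properties, organization_id, team_id, send_for_all_members, timestamp)
        else:
            capture_self_hosted_event(name, properties, timestamp)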
9c37ddb to 3b0189e
I still think the task end date is wrong but looks good outside of that.
Problem
I'd like to start letting people know what is happening in their PostHog org/data every week. This is a first stab at that.
Changes
On Mondays, this gathers up a bunch of things that have happened in a project, like new feature flags created or surveys launched, and sends an event to PostHog with that info. The event will be consumed by customer.io and an email sent out with the deets.
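Purely as an illustration of the shape of that event (the event name, property keys, and values below are hypothetical, not the ones defined in this PR, and the call reuses the capture helper signature suggested earlier in the review):

    # Hypothetical digest event; the real name and properties come from the digest task in this PR.
    capture_analytics_event(
        name="weekly digest report",
        properties={
            "new_feature_flags": ["checkout-redesign"],   # flags created during the week
            "new_surveys_launched": ["nps-q4"],           # surveys launched during the week
            "period_end": "2024-11-25T00:00:00",          # Monday 00:00 cutoff for the week
        },
        team_id=2,
        send_for_all_members=True,
    )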
Todo:
👉 Stay up-to-date with PostHog coding conventions for a smoother review.
Does this work well for both Cloud and self-hosted?
How did you test this code?
Added a test, but need to move it when I abstract this out