
feat: weekly digest #26337

Merged · 16 commits merged into master from feat/weekly-digest-emails on Dec 12, 2024

Conversation

raquelmsmith
Member

@raquelmsmith raquelmsmith commented Nov 21, 2024

Problem

I'd like to start letting people know what is happening in their posthog org/data every week. This is a first stab at that.

Changes

On Mondays, this gathers up a bunch of things that have happened in a project, like new feature flags created or surveys launched, and sends an event to PostHog with that info. The event will be consumed by Customer.io and an email sent out with the deets.
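To make the mechanism concrete, here is a minimal sketch of the kind of digest payload the task described above could assemble before capturing it as an event. The field names, the `build_weekly_digest` helper, and the example data are all assumptions for illustration, not taken from the PR:

```python
from datetime import datetime, timedelta

def build_weekly_digest(new_flags: list, new_surveys: list, end_date: datetime) -> dict:
    """Assemble a digest payload for the week ending at end_date (field names assumed)."""
    start_date = end_date - timedelta(days=7)
    return {
        "template_name": "weekly digest",        # assumed; for routing in Customer.io
        "period_start": start_date.isoformat(),
        "period_end": end_date.isoformat(),
        "new_feature_flags": new_flags,          # e.g. flags created this week
        "new_surveys_launched": new_surveys,     # e.g. surveys launched this week
    }

digest = build_weekly_digest(["my-flag"], ["NPS survey"], datetime(2024, 12, 9))
```

A payload shaped like this would then be captured as a single PostHog event and picked up downstream for the email send.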

Todo:

  • Add survey response data - I'm actually gonna do this later


Does this work well for both Cloud and self-hosted?

How did you test this code?

Added a test, but need to move it when I abstract this out

@posthog-bot
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanently keep it open, use the waiting label.

@raquelmsmith raquelmsmith requested review from joethreepwood and a team November 29, 2024 22:20
@raquelmsmith
Member Author

@joethreepwood once this is in we will start sending the events to posthog. You and I should iterate on the actual email copy, then we can turn it on for a percent of people and see how it performs.

@posthog-bot posthog-bot removed the stale label Dec 2, 2024
posthog/tasks/test/test_usage_report.py (outdated; resolved)
@@ -130,6 +132,14 @@ def setup_periodic_tasks(sender: Celery, **kwargs: Any) -> None:
name="update quota limiting",
)

# Send all periodic digest reports
weekly_digest_end_date = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
Contributor

The weekly_digest_end_date will be evaluated when the code is first loaded/instantiated, not when the task runs. This means the datetime will be fixed to whatever time the code was initialized, which is probably not what you want for a weekly report.

You could move the datetime calculation inside the function:

def send_all_periodic_digest_reports(end_date=None):
    if end_date is None:
        end_date = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
    # rest of your report logic...

sender.add_periodic_task(
    crontab(hour="9", minute="0", day_of_week="mon"),
    send_all_periodic_digest_reports.s(),  # Remove the end_date parameter
    name="send all weekly digest reports",
)

Contributor

I think this is the only blocker from my comments.

Member Author

I think I do want to define the time when it's called. That way we can be flexible with the date. Say the task fails and we want to resend it on a later date, but still only get the updates for the prior week. We need to define the end date at the beginning in that case. Similar to usage reports.

Contributor

No, but this won't work, because the datetime will be fixed to the time the server started, so it'll get reset each time there is a deployment. My suggested change would make it use the time when the function is called, based on the cron.
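The load-time-versus-call-time distinction above can be shown with a minimal, self-contained sketch (hypothetical helpers, not the PR's code). A value computed in a default argument is evaluated once at definition time, just as an argument passed to `add_periodic_task` at module load is evaluated once at startup:

```python
import time
from datetime import datetime

def frozen_end_date(end_date=datetime.now()):   # default evaluated ONCE, at definition
    return end_date

def fresh_end_date(end_date=None):              # evaluated at every call
    return end_date if end_date is not None else datetime.now()

a = frozen_end_date()
time.sleep(0.01)
b = frozen_end_date()
assert a == b            # frozen: both calls see the definition-time value

c = fresh_end_date()
time.sleep(0.01)
d = fresh_end_date()
assert d > c             # fresh: each call gets the current time
```

The `end_date=None` pattern is the same shape as the suggested fix: the task keeps an optional override for manual resends while defaulting to the current time on each scheduled run.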

Member Author

OH

Contributor

The date is still flexible with an optional end date arg

Member Author

@raquelmsmith raquelmsmith Dec 10, 2024


Yeah sorry I was just being dumb, good call-out, will fix

) -> None:
if timestamp and isinstance(timestamp, str):
try:
timestamp = parser.isoparse(timestamp)
except ValueError:
timestamp = None

if not organization_id and not team_id:
raise ValueError("Either organization_id or team_id must be provided")

if is_cloud():
Contributor

I find if statements like this hard to read. I just ran this through Claude to clean it up, and it might be a better pattern to do something like this (not tested and not a blocker):

def get_distinct_ids_for_team(team_id: int) -> list[str]:
    """Get distinct IDs for all users with access to a team."""
    team = Team.objects.get(id=team_id)
    return [user.distinct_id for user in team.all_users_with_access()]

def get_distinct_ids_for_organization(organization_id: int) -> list[str]:
    """Get distinct IDs for all organization members."""
    return list(
        OrganizationMembership.objects.filter(organization_id=organization_id)
        .values_list("user__distinct_id", flat=True)
    )

def get_org_distinct_id(organization_id: int) -> str:
    """Get distinct ID for organization owner or fallback."""
    org_owner = get_org_owner_or_first_user(organization_id)
    return org_owner.distinct_id if org_owner and org_owner.distinct_id else f"org-{organization_id}"

def capture_cloud_event(
    name: str,
    properties: dict,
    organization_id: int | None,
    team_id: int | None,
    send_for_all_members: bool,
    timestamp: datetime,
) -> None:
    """Capture analytics event for cloud deployment."""
    distinct_ids = []
    
    # Determine organization_id if not provided
    if not organization_id and team_id:
        team = Team.objects.get(id=team_id)
        organization_id = team.organization_id

    # Get distinct_ids based on send_for_all_members flag
    if send_for_all_members:
        if organization_id:
            distinct_ids = get_distinct_ids_for_organization(organization_id)
        elif team_id:
            distinct_ids = get_distinct_ids_for_team(team_id)
    else:
        distinct_ids = [get_org_distinct_id(organization_id)] if organization_id else []

    # Capture events for all distinct_ids
    for distinct_id in distinct_ids:
        pha_client.capture(
            distinct_id,
            name,
            {**properties, "scope": "user"},
            groups={"organization": organization_id, "instance": settings.SITE_URL},
            timestamp=timestamp,
        )

    if organization_id:
        pha_client.group_identify("organization", organization_id, properties)

def capture_self_hosted_event(
    name: str,
    properties: dict,
    timestamp: datetime,
) -> None:
    """Capture analytics event for self-hosted deployment."""
    pha_client.capture(
        get_machine_id(),
        name,
        {**properties, "scope": "machine"},
        groups={"instance": settings.SITE_URL},
        timestamp=timestamp,
    )
    pha_client.group_identify("instance", settings.SITE_URL, properties)

def capture_analytics_event(
    name: str,
    properties: dict,
    organization_id: int | None = None,
    team_id: int | None = None,
    send_for_all_members: bool = False,
    timestamp: datetime | None = None,
) -> None:
    """Main entry point for capturing analytics events."""
    if is_cloud():
        capture_cloud_event(
            name,
            properties,
            organization_id,
            team_id,
            send_for_all_members,
            timestamp,
        )
    else:
        capture_self_hosted_event(name, properties, timestamp)

Member Author

I still need to check and fail somewhere if one or the other isn't provided, though. Otherwise this will fail silently.

posthog/tasks/periodic_digest.py (resolved)
@raquelmsmith raquelmsmith force-pushed the feat/weekly-digest-emails branch from 9c37ddb to 3b0189e on December 10, 2024 16:19
Contributor

@zlwaterfield zlwaterfield left a comment

I still think the task end date is wrong but looks good outside of that.

@raquelmsmith raquelmsmith merged commit f8c4cc2 into master Dec 12, 2024
89 checks passed
@raquelmsmith raquelmsmith deleted the feat/weekly-digest-emails branch December 12, 2024 04:17