Sprint 1.45.1 2/2 - Feb 20 to March 3 #14253

Closed
lharries opened this issue Feb 15, 2023 · 6 comments
Labels
sprint Sprint planning

Comments

@lharries
Contributor

lharries commented Feb 15, 2023

Global Sprint Planning

3 things that might take us down

  1. Hot partition - pipeline actively working on this
  2. Postgres - we're in a temporary state and need to migrate, but we're blocked by the Postgres dependency in capture.
  3. Persons and person distinct id is filled

Retro: What can we do better next sprint?

Support hero this sprint

Week 1: Raquel
Week 2: Marius

Team sprint planning

For your team sprint planning copy this template into a comment below for each team.

# Team ___

## Retro

<!-- Grab the high and low priority items from last time and add whether that item was completed or not -->

- 

## Hang over items from previous sprint

<!-- For each item, decide to re-prioritise (and add below) or deprioritise -->

- Item 1. prioritised/deprioritised

## OKR

1. OKR, status (red/yellow/green) and action points if yellow/red


### High priority

-

### Low priority / side quests

-

@lharries lharries added the sprint Sprint planning label Feb 15, 2023
@mariusandra
Collaborator

mariusandra commented Feb 15, 2023

Team Product Analytics

Retro & hang over

  • Work with team infra to get US ClickHouse less bespoke @macobo
    • Concluded with a fail. Infra team lacks capabilities to make this happen, and introducing new tooling for configuration management is out of scope. Everything got documented.
  • PostHog 3000 prerequisites: continue yeeting ants @Twixes
    • Spent A LOT (!!!) of time on test coverage.
    • Built a couple of new components to replace antd ones. Still building some important ones this week.
    • Work on yeeting will continue on an ongoing basis, and will shift to PH3000.
  • HogQL unleashed & data exploration loose ends @mariusandra
  • Bonus: Paul shipped a lot of stuff based off a user interview.

OKR & Strategy

OKRs

  • Objective 1: Ship PostHog 3000 UX. 10 happy ICP beta customers.
  • Objective 2: Make PostHog performance frustration free for our 10 largest customers
  • Objective: Systematically prevent regressions across PostHog Part 2

Strategy

  • Improve the core UX to be more focused on product engineers:
    • A slick experience
    • More powerful querying than our competitors can offer (for example, SQL access) that answers the long tail of questions
    • PostHog 3000 UX = a design uplift including dark mode to encourage more word of mouth

Sprint planning 🏮🏮🏮

People

  • @Twixes off 20-22 (Week 1 Mon-Wed)
  • @macobo off 24-03 (Week 2, plus Week 1 Friday)
  • @yakkomajuri off 20-21 (Week 1 Mon-Wed)
  • Johannes support secondary Week 1
  • @pauldambra support secondary Week 2

Plan next sprint

  • Data Exploration @thmsobrmlr @pauldambra

    • Converting the last missing components. Data fetching via dataNodeLogic
    • Dashboards with queries on them, insight active view.
    • Why? Making insights and dashboards as flexible as we need them for PostHog 3000 and HogQL.
  • PostHog 3000 @Twixes

    • Nail the layout paradigm.
    • Why? Starting from the outer shell.
  • HogQL @mariusandra

    • Direct SQL querying in beta
    • Why? Requested by a lot of users.
  • Performance @macobo @yakkomajuri

    • Map and clarify high level potential wins
    • Wrap up all the refresh work.
    • Why? Performance is paramount. Refreshes needed a refresh.

Sidequests / parking lot

  • Editor panels PostHog 3000

@benjackwhite benjackwhite pinned this issue Feb 15, 2023
@benjackwhite
Contributor

benjackwhite commented Feb 15, 2023

Team Check-yourself-before-you-RECord-yourself

People

  • Alex out 2/20-21
  • Ben out 2/27-3/3

Retro

Hang over items from previous sprint

  • Alpha release of Mobile Recordings
    • At some point we just have to go for this... Maybe like we did with network recordings, talk about it in Slack and try and get some initial users to try it out with us
  • Testing upgraded rrweb
    • We spent this sprint doing prereq work for upgrading to rrweb2, and we'll dogfood and test it in preparation for release next sprint.

OKR

  • O1: Eliminate the biggest reasons product engineers would choose a competitor over PostHog Session Recording
    • 🟡 KR1: Release iOS Mobile Recordings to 5 happy beta testers
    • 🟢 KR2: Release the ‘Network Tab’ to 10 happy customers
  • O2 Recordings work and are available
    • 🟢 KR1: Rollout RRWeb2 and improve our ability to fix / improve
    • 🔴 KR2: Move to new S3-backed storage for cheaper and longer storage

High priority

  • @benjackwhite MVP of alternative ingestion for Session Recordings
    why? - If we can get this running in parallel with our current system, at least for one team (us), we can validate a lot of our initial assumptions (cost, effort, etc.)
  • @alexkim205 Notes on recordings
    why? - Users have asked for a way to leave custom metadata on recordings

Low priority / side quests

  • Optimizing recordings query
  • Persons modal -> Recordings playlist

@raquelmsmith
Member

raquelmsmith commented Feb 15, 2023

Team CA$H 💸

Retro

Lost much of this week to the offsite, so lots of stuff will roll over to next sprint.

High priority

  • Hook up pricing page to billing server @raquelmsmith
    • Status: Technically not launched, but should be done by the end of the week
  • Rate limiting with special guest @benjackwhite & @kappa90
    • On the final lap. Might be fully merged early next week, but for safety we won't rate limit yet, just log for now
  • Defaulting users into a paid plan on signup @kappa90
    • We have done all the background grunt work; we still need to add something in the frontend to explain to users what's going on
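A minimal sketch of the "log only, don't enforce yet" rollout mentioned for rate limiting above. Everything here is hypothetical and for illustration only: the `FixedWindowLimiter` class, its parameters, and the fixed-window counting strategy are assumptions, not the actual PostHog implementation.

```python
import logging
import time
from typing import Callable, Dict, Tuple

logger = logging.getLogger("rate_limit_sketch")  # hypothetical logger name


class FixedWindowLimiter:
    """Counts requests per key in fixed time windows.

    With enforce=False (the default), over-limit requests are only logged
    and counted, never rejected, so limits can be tuned safely first.
    """

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        enforce: bool = False,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.enforce = enforce
        self.clock = clock
        # key -> (window index, number of requests seen in that window)
        self._counts: Dict[str, Tuple[int, int]] = {}
        # How many requests would have been rejected if enforcing
        self.would_block = 0

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        prev_window, count = self._counts.get(key, (window, 0))
        if prev_window != window:
            count = 0  # a new window has started, reset the counter
        count += 1
        self._counts[key] = (window, count)

        if count <= self.limit:
            return True

        self.would_block += 1
        logger.warning("rate limit exceeded for %s: %d > %d", key, count, self.limit)
        # In log-only mode the request is still allowed through
        return not self.enforce
```

Switching `enforce` to `True` later would turn the same counter into a real limiter, and `would_block` gives a rough idea beforehand of how many requests that would affect.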

Low priority

  • Verify email addresses @raquelmsmith
    • Just addressing some reviews
  • Billing emails @kappa90
    • Nothing has been done, prioritized for next sprint
  • Improve onboarding flow to better explain session recordings and autocapture @raquelmsmith

OKR

  1. Objective 1: Feel confident in our definitions and metrics for all our areas of responsibility

  2. Objective 2: Improve conversion to paid

    • Status: 🍋
    • Metric we set at the beginning is not relevant (we don't know where the number came from)
    • Currently at ~1% conversion rate from signup to paid, and ~3% from just activation (Discoveries) to paid
    • Action point: run the experiments we have planned, all work done so far is going to enable experimenting much faster
  3. Objective 3: We can iterate quickly on pricing

    • Status: 🍏
    • We have refactored the whole billing service to use Plans that can be quickly iterated upon to experiment on both pricing and feature allowances
    • We just need to finalize the last cherries on the pie

High priority

  • Rate limiting, if it's not done by the end of this sprint @kappa90
  • Defaulting users into a trial of the paid plan on signup @kappa90
  • Use the billing Plans API to gatekeep features @raquelmsmith
  • Add a new paid plan with no free allotment @raquelmsmith
  • Billing emails @kappa90

Low priority

@ellie
Contributor

ellie commented Feb 15, 2023

🚀 Infra Team 🚀

Retro

  • We put off scaling our monitoring due to other priorities, but it needs doing now. Prioritised the work
  • Similar to the above, a few historically low priority tasks have made themselves high priority
  • Goodbye Guido 😢

@ellie is OOO next week (20th-24th). Driving back to the UK + moving house, out of London.

OKRs

Objective: Reduce TOIL

  • KR: Migrate postgres
    80% of the way there 🙌 Just need to jump over to Aurora. Waiting on improvements elsewhere to de-risk, after the last migration incident
  • KR: Consolidate infrastructure into the correct AWS accounts (specifically US)
    Lack of progress, will need to carry over to next Q. More data migrations. Was a stretch even with a 3 person team.

Objective: Support other teams and enable other teams to self-serve

  • KR: Most of the pipeline team has written/contributed to their own alerting system
    Alerting + monitoring DX improvements: VictoriaMetrics change should help greatly there
  • Metric to track: % of non-infra hero time supporting other teams
    As a % of team time, we now spend ~50% on supporting other teams.

Objective: Infra spend flat at $120k:
Forecast EOM is $118k, down 11% from January.
We are currently running 2x the databases, which are expensive. Should continue to decrease.
Account consolidation will reduce this further

Plan

  • Finish migrating to VictoriaMetrics from Prometheus (dashboards + alerting switched over, etc) @ellie
  • Set up read replicas on Aurora + use them in our app @ellie
  • Separate deployments for /decide/, /s/ @ellie
  • Set up AWS SSO + IAM codification @danielxnj
  • Continuing with SOC 2 @danielxnj

@EDsCODE
Member

EDsCODE commented Feb 15, 2023

Retro:

  • Eliminate the biggest reasons product engineers would choose a competitor over PostHog Feature Flags
  • Make the experience of creating a feature flag in PostHog slicker
    • N/A
    • Originally was going to go for environments but priority is secondary

Planning:

  • Address feature flag instrumentation clarity: there are a handful of ways that flags can be used and not a lot of in-app guidance on how to handle different situations @liyiy

    • Features
    • Question wizard that helps you get started on a flag
    • Updated snippets/guides on how to use various
    • Tutorials?
    • Quick start feature flag/experiments
    • Follow up with growth team and marketing team on how to emphasize flags
  • Provide ready-to-go analysis after flag deployment: we have a motley collection of tools that users can use to analyze flags, but they're slow/unclear at the moment (insights on flags, recordings on flags) @EDsCODE

    • Exposures?
    • Premade insights
    • Faster recordings load/player load
    • @neilkakkar all events on insights

@fuziontech
Member

fuziontech commented Feb 15, 2023

🚰 Team Pipeline 🚰

Retro

People

@hazzadous - Pretty hectic sprint. Hot partitions. Postgres/PGB pain. Some other stuff. A lot of fire fighting.
@xvello - current operational load is unsustainable: capture & ingestion incidents, export baby-sitting… We need to regroup
@xvello - adding proper Prometheus instrumentation requires initial fundamental work (push gateway for celery tasks done, multi-process collection for plugin-server not done) before we can pick up the pace
@tomasfarias - developing overflow consumption required a refactor of some of the plugin-server code. This was well overdue. Thanks Harry!
@lharries - No comment. Plead the 5th

Last sprint progress:

  • PoE “Person Overrides” read & write path shipped and queries enabled for teams who already use PoE @hazzadous
    why: Query Performance while maintaining Data Quality

  • Postgres incident follow-ups @hazzadous

    • why: Data trust, operational safety
    • Status:
      • Backfill is done
    • TODO:
      • Events/Decide endpoints - making sure these are both ok with a bit of downtime
      • Any other work that needs to be done for Aurora switch
  • Exports problems (slow S3 & Snowflake) & metrics @fuziontech

    • why: Customers are actively depending on this and it's not working well
    • Status:
      • Automatic detection and repartitioning is ✅
      • Not quite fully fixed ⌛
    • TODO
      • Need to dig further into it and fix a few more bugs 🐛
      • Observability so that we can diagnose progress
      • Testing
      • UI rework

Hang over items from previous sprint

  • Overflow partition work? - TBD - monitoring
  • Phantom S3 export? - They have full set of data, just need to repartition by hour instead of day

OKR

O1: Performance

  • 🟡 KR1 (Person on Events): Pushed back due to ops issues, de-prioritized to focus on exports & capture reliability this sprint
  • 🔴 KR2 (Capture efficiency): dropped thanks to infra-level cost savings, lightweight capture will reduce costs as a nice side-effect

O2: Reliability

  • 🟢 KR1 (Prometheus migration): Good progress on main alerts last sprint, to be continued
  • 🟡 KR2 (Runbook coverage): On hold behind other priorities
  • 🟢 KR3 (Capture spike management): Good progress last sprint, need to monitor, and design phase 2
  • 🔴 KR4 (App error handling): Not started, on hold below other priorities

High priority

Support secondary (Luigi): Harry

  • Historical exports: make them prod-ready: confirm recent fixes helped, improve monitoring, add ability to stop exports (wishlist: resume them on failure)

    • why: a pain point for customers (slowness, duplicates), while the feature is table-stakes
    • who: Tomas & James
  • Lightweight capture (remove PG dependency): enable token resolution at plugin-server for all messages, phase out PG lookup at capture, still keep some token validation at capture (best effort PG lookup, or just validate the shape?)

    • why: reduce impact of the second prod-us Postgres switch next month
    • who: Xavier & Tiina
  • Person on Events?

    • why: Gotta go fast!
    • who: Yakko and Luigi ;)

Low priority / side quests

  • Monitor the state of our local spike detection at capture: make sure it does not trigger too much
  • Design global version of spike detection: use the new partition stats table + redis set?

Not doing

  • Multiple webhook destinations
  • Assist team recordings with new storage work

@Twixes Twixes unpinned this issue Mar 10, 2023
@Twixes Twixes closed this as completed Mar 20, 2023

8 participants