Matching Engine Analytics: Percentage of users with availability data #311

Open
3 tasks
vingkan opened this issue Jul 22, 2022 · 2 comments

Comments

@vingkan
Collaborator

vingkan commented Jul 22, 2022

Goal

Calculate what percentage of users in a community have filled out their availability.

This will help us track whether people are eligible for schedule matches at all. Currently, there is no way for users to edit their schedule, so there is no real data to test this feature with. You will need to write good unit tests to ensure that it works correctly when the real data arrives.

Definition of Done

  • Write a function that takes in a list of User objects and returns a tuple of two counts
    • The first count in the tuple is the number of users who have availability data
    • The second count in the tuple is the total number of users
  • Write unit tests for the above function
    • Why a tuple? It will be more reliable to test the numerator and denominator, rather than the percentage
  • Integrate the above function into the matching pipeline
    • Call your function in the display_internal_matching_metrics() task, which is used in the matching flow
    • Add a line to log the percentage of users who have filled out their availability, using the tuple output
    • Run the matching flow with run flow matching to make sure that it succeeds and logs the correct information
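As a rough illustration of the first item above, the counting function could look something like the sketch below. The function name `count_users_with_availability` and the simplified `User` and `Availability` dataclasses are assumptions made so the example runs on its own; only the `schedule` field is taken from this issue, and the real classes in the codebase should be used instead.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Availability:
    """Hypothetical stand-in for the project's Availability type."""
    day: str
    hour: int


@dataclass
class User:
    """Hypothetical stand-in for the project's User class.

    Only the schedule field mirrors the one described in this issue.
    """
    email: str
    schedule: List[Availability] = field(default_factory=list)


def count_users_with_availability(users: List[User]) -> Tuple[int, int]:
    """Return (number of users with availability data, total number of users)."""
    # A user counts as having availability data if their schedule is non-empty.
    n_with_availability = sum(1 for user in users if user.schedule)
    return n_with_availability, len(users)


users = [
    User(email="a@example.com", schedule=[Availability(day="Mon", hour=9)]),
    User(email="b@example.com"),  # schedule defaults to an empty list
    User(email="c@example.com", schedule=[]),
]
print(count_users_with_availability(users))  # (1, 3)
```

Returning the two counts, rather than a ratio, also sidesteps division by zero when the user list is empty.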

Code Pointers

Implementation and Tests

  • Write your own function (no need for a class) to calculate this tuple
  • You can create a new file pipeline/transform/schedule.py for your function
  • You can create a new file pipeline/transform/schedule_test.py for your unit tests
  • Similar to your core project, read examples from the codebase to see how to use functions, tests, and types
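A possible shape for the unit tests, written as plain pytest-style functions, is sketched below. Everything here is illustrative: the `User` stand-in, the function name, and the test names are assumptions, and in the repository the function would be imported from pipeline/transform/schedule.py rather than defined next to the tests.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class User:
    # Hypothetical stand-in; the real User class lives in the codebase
    # and its schedule holds Availability objects, not strings.
    schedule: List[str] = field(default_factory=list)


def count_users_with_availability(users: List[User]) -> Tuple[int, int]:
    # Hypothetical function under test, inlined only to keep this sketch
    # self-contained.
    return sum(1 for user in users if user.schedule), len(users)


def test_no_users():
    # Empty community: both counts are zero.
    assert count_users_with_availability([]) == (0, 0)


def test_no_one_has_availability():
    # Users exist, but none has filled out a schedule.
    assert count_users_with_availability([User(), User()]) == (0, 2)


def test_some_users_have_availability():
    users = [
        User(schedule=["Mon 9am"]),
        User(),
        User(schedule=["Tue 1pm", "Wed 3pm"]),
    ]
    # A user with several entries is still counted once.
    assert count_users_with_availability(users) == (2, 3)
```

Asserting on the numerator and denominator separately makes each failure easy to read, which is the reason for the tuple return type.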

User Availability Data

Just like in your schedule match generator, you can check whether a user has availability data by accessing the schedule field of the User class:

schedule: List[Availability] = field(default_factory=list)

Pipeline Integration

  • The function display_internal_matching_metrics() is a Prefect task, which runs as part of the matching flow
  • It takes calculated metrics and then displays them by writing them to the pipeline logs.
  • You can get the list of users in the community from the MatchingOutput parameter by accessing output.users
  • Call your function, then log the percentage of users who have availability data
  • If you want, you can add a helper function to format a tuple into a percentage

@task
def display_internal_matching_metrics(
    output: MatchingOutput, metrics: MatchingMetrics
):
    """Task to display matching engine metrics."""
    logger = prefect.context.get("logger")
    proposed_matches_per_user = render_counts_per_user(
        output.users, metrics.n_proposed_matches_per_user
    )
    matched_user_emails = render_user_emails(output.users)
    logger.info(f"\nMatches Proposed per User:\n{proposed_matches_per_user}")
    logger.info(f"\nEmails of Matched Users:\n{matched_user_emails}")
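For the optional helper that formats the tuple into a percentage, one minimal sketch is shown below. The name `format_percentage`, the output format, and the "n/a" handling for an empty community are all assumptions, not something the codebase prescribes.

```python
from typing import Tuple


def format_percentage(counts: Tuple[int, int]) -> str:
    """Format a (numerator, denominator) tuple as a percentage string."""
    numerator, denominator = counts
    if denominator == 0:
        # Avoid dividing by zero when the community has no users.
        return "n/a (0 of 0)"
    return f"{numerator / denominator:.1%} ({numerator} of {denominator})"


print(format_percentage((1, 3)))  # 33.3% (1 of 3)
print(format_percentage((0, 0)))  # n/a (0 of 0)

# Inside display_internal_matching_metrics(), the extra lines might then
# look like this (count_users_with_availability is a hypothetical name):
#   counts = count_users_with_availability(output.users)
#   logger.info(f"\nUsers with Availability Data: {format_percentage(counts)}")
```

Including the raw counts next to the percentage keeps the log line easy to check against the tuple returned by the counting function.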

@vingkan vingkan added labels enhancement (New feature or request), pipeline (Related to the offline pipelines), data (Related to data or types) Jul 22, 2022
@vingkan vingkan changed the title from "Matching Engine Analytics: Percentage of users with schedule data" to "Matching Engine Analytics: Percentage of users with availability data" Jul 22, 2022
@rbrooks6
Collaborator

@vingkan Emma is done with her core project now and is ready to start the stretch assignment! 🎉 What did you have in mind for this assignment in terms of details? Could you please add details or explain it to me so that I can add details to the issue?

@vingkan
Collaborator Author

vingkan commented Jul 30, 2022

@rbrooks6 Thanks for the reminder! I updated this issue with more details. The instructions for #312 will be very similar, so if @emmadiamon finishes this task quickly, she can follow a similar approach for that task.
