Improving the Yelp Bean matching algorithm #300
…nto jenny_debin_2023-11-08
Incorporating more user attributes in the matching mechanism will require more columns in the postgres user table. To start, we will manually alter and update the table to add attributes such as language, location, and manager id. In the future, we will have to figure out a programmatic way to fill the user table: either by editing the current cron job (if its source has the fields we need), or by creating another cron job that pulls data from the coreAPI and updates each user record.
This is an interesting decision to talk through. My previous assumption was that we were getting this level of matching / user segmentation by creating the right subscriptions to split folks up by, say, office, location, or interests. With this change that all becomes murky and makes me think we are trying to move to a place where we have one or very few subscriptions. Is that accurate?
We are not trying to change the number of subscriptions. The idea here is that we want to avoid matching people who are in the same organization or have the same manager, since the point of Beans is to connect with people across Yelp. It would be awkward to talk to your teammate through a Beans match when you see and work with each other every day.
Does it not work by applying rules? E.g. https://github.com/Yelp/beans/blob/master/api/yelp_beans/matching/pair_match.py#L23 can be used to avoid matching people in the same org.
Yes, rules can avoid matching people with the exact same attribute. However, this change aims to increase the "interesting-ness" of the pairs by maximizing the diversity within each pair. IIUC, the current subscription mechanism is based on available meeting time and interest. I do see value in matching people who are more different within each subscription; this can spice up the conversation and enable more cross-background learning and discussion. This is how I imagine the feature working: during my ML bean time, I want to be matched with people who work in domains different from mine.
Some additional context that could help here: I set Beans up at Twitch (I have since left) and folks have been using it to meet each other. Many different subscriptions exist there, from a company-wide one, to specific locations, to meetings within an org, to 1-on-1 setups within a team. Each of these has different expectations for the criteria to enforce for matches. E.g. for a location-wide subscription people don't want to be matched with someone on the same team, but for the within-team subscription that is exactly what folks want. Is there a way to make these code changes work using the rules system so we can preserve the flexibility this affords each meeting subscription?
Oh, these code changes function alongside the existing rules and subscription set. The algorithm respects the existing matching rules and each subscription's user pool. We are only re-shaping how pairs are created (currently completely random) under each subscription. As an example, when we generate pairs for UK tea time, the high level steps are:
It's worth noting that the code is a marginal improvement on how users are matched; it is not trying to change the flow of the current match process.
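The list of steps did not survive in this thread, so the following is only a minimal sketch of the behavior described above: take one subscription's user pool, drop pairs that a rule forbids, then prefer the most different pairs. The names `same_manager`, `location_distance`, and `generate_pairs` are hypothetical, and the greedy pairing is a stand-in for illustration, not the matching routine Beans actually uses.

```python
from itertools import combinations


def same_manager(user_a, user_b):
    """Hypothetical rule: forbid pairs who report to the same manager."""
    return user_a["manager_id"] == user_b["manager_id"]


def location_distance(user_a, user_b):
    """Hypothetical distance: 1.0 if locations differ, else 0.0."""
    return 0.0 if user_a["location"] == user_b["location"] else 1.0


def generate_pairs(users, distance, blocked=same_manager):
    """Greedily pair one subscription's users, most different pairs first.

    Greedy selection is an assumption made for brevity; it does not
    guarantee the globally best set of pairs.
    """
    candidates = [
        (distance(a, b), a["id"], b["id"])
        for a, b in combinations(users, 2)
        if not blocked(a, b)  # existing rules still apply
    ]
    # Largest distance first; ids break ties so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    matched, pairs = set(), []
    for _, id_a, id_b in candidates:
        if id_a not in matched and id_b not in matched:
            pairs.append((id_a, id_b))
            matched.update((id_a, id_b))
    return pairs


# Made-up pool for a single subscription.
users = [
    {"id": 1, "manager_id": 10, "location": "London"},
    {"id": 2, "manager_id": 10, "location": "London"},
    {"id": 3, "manager_id": 11, "location": "London"},
    {"id": 4, "manager_id": 12, "location": "Hamburg"},
]
pairs = generate_pairs(users, location_distance)
```

Users 1 and 2 share a manager, so the rule removes that pair before any distance is considered; the distance only decides which of the remaining pairs to prefer.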
Thanks for the back and forth and for explaining your thoughts. I think what counts as an optimal pairing is a pretty subjective decision, but I'm good with this for a v1.
    return float(intersection) / union


def get_pairwise_distance(
Would it be possible to make the attributes used configurable? I think it'd be great if the choice of attributes could be configured differently for different subscriptions.
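One way this could look, as a rough sketch only: each subscription names the attributes it cares about, and the distance sums one term per configured attribute. Everything here is hypothetical (`ATTRIBUTE_DISTANCES`, `configurable_distance`, and the `location` and `languages` attribute names); it is not the signature of `get_pairwise_distance` in this PR. The Jaccard helper assumes the `intersection / union` ratio in the diff is a Jaccard-style similarity.

```python
def jaccard_distance(values_a, values_b):
    """1 - |A intersect B| / |A union B| for two collections of values."""
    set_a, set_b = set(values_a), set(values_b)
    union = len(set_a | set_b)
    if union == 0:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(set_a & set_b) / union


def categorical_distance(value_a, value_b):
    """0.0 when the values match, 1.0 otherwise."""
    return 0.0 if value_a == value_b else 1.0


def tenure_distance(days_a, days_b):
    """Absolute difference in days since start, as in the tenure term."""
    return abs(int(days_a) - int(days_b))


# Hypothetical registry mapping attribute names to distance functions.
ATTRIBUTE_DISTANCES = {
    "location": categorical_distance,
    "languages": jaccard_distance,
    "days_since_start": tenure_distance,
}


def configurable_distance(user_a_attributes, user_b_attributes, attributes):
    """Sum the distance terms for the attributes a subscription selects."""
    distance = 0.0
    for name in attributes:
        term = ATTRIBUTE_DISTANCES[name]
        distance += term(user_a_attributes[name], user_b_attributes[name])
    return distance


user_a = {"location": "SF", "languages": ["en", "fr"], "days_since_start": "100"}
user_b = {"location": "London", "languages": ["en"], "days_since_start": "400"}

# A cross-office subscription might ignore tenure entirely.
cross_office = configurable_distance(user_a, user_b, ["location", "languages"])
# Another subscription might only care about tenure.
tenure_only = configurable_distance(user_a, user_b, ["days_since_start"])
```

Note that the terms are on different scales here (tenure is in days, the others lie between 0 and 1), so a real configuration would probably also need per-attribute scaling or weights.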
    distance += dist_2

    # tenure
    dist_3 = abs(int(user_a_attributes["days_since_start"]) - int(user_b_attributes["days_since_start"]))
Tenure is a bit subjective. I don't have strong opinions here as long as it doesn't lead to starvation. The underlying assumption is that tenured folks already know each other, so we optimize for them meeting newer, less tenured people.
I think this works for v1 but I'll be curious to hear feedback on whether folks not getting matched with similarly tenured people gets noticed. Perhaps eventually we should get to a place where we can ask users to tell us their preferences for matching
…into jenny_debin_2023-11-08
We modified match_utils so that the meeting weight between two users is calculated from their attributes instead of being uniformly set to 1.
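To make the difference concrete, here is a small sketch contrasting the two weighting schemes. The helper names (`uniform_weights`, `attribute_weights`, `tenure_distance`) and the sample data are assumptions for illustration and are not taken from match_utils.

```python
from itertools import combinations


def uniform_weights(user_ids):
    """Previous behavior as described: every possible pair has weight 1."""
    return {pair: 1 for pair in combinations(sorted(user_ids), 2)}


def attribute_weights(attributes_by_user, distance):
    """New behavior as described: weight each pair by an attribute distance."""
    weights = {}
    for id_a, id_b in combinations(sorted(attributes_by_user), 2):
        weights[(id_a, id_b)] = distance(
            attributes_by_user[id_a], attributes_by_user[id_b]
        )
    return weights


def tenure_distance(attrs_a, attrs_b):
    """Hypothetical single-attribute distance based on days since start."""
    return abs(int(attrs_a["days_since_start"]) - int(attrs_b["days_since_start"]))


# Made-up attributes keyed by a hypothetical user id.
attributes_by_user = {
    "a": {"days_since_start": "30"},
    "b": {"days_since_start": "400"},
    "c": {"days_since_start": "1000"},
}

before = uniform_weights(attributes_by_user)
after = attribute_weights(attributes_by_user, tenure_distance)
```

With uniform weights no pair is preferred over another, which is why pairing was effectively random; with attribute weights, a matcher that maximizes total weight will favor the most different pairs.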