Roll up stats by patch frequently #5
base: main
Conversation
def upsert_to_mongo(target, docs):
    for doc in docs:
new upsert logic in loop
How many records are going to be upserted?
Either way, this should be batched so that it isn't hammering the DB with x requests. I'm not sure about the Python way to do this, but there are plenty of examples in the JavaScript code for upserting.
Only 65 records per patch! That's gotta be a pretty light lift, no? Although keep in mind I'm new to Mongo...
That's not too bad, but definitely bulk upsert. Don't want to update one at a time.
# programmatically generate elo bins within the query:
def generate_elo_bins(elo_cutoffs):
    branch_list = []
elo bins generated from constant
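A sketch of how the branch list might be built from the cutoff constant, producing `$switch` branches for the aggregation. The `"$rating"` field name and the label format are assumptions, not taken from the diff; the last cutoff is treated as an open-ended bin.

```python
def generate_elo_bins(elo_cutoffs):
    """Build $switch branches bucketing a rating field by a sorted cutoff list.

    elo_cutoffs is assumed sorted ascending, e.g. [0, 1000, 1500, 2000];
    "$rating" is a placeholder for whatever field the pipeline matches on.
    """
    branch_list = []
    for i, low in enumerate(elo_cutoffs):
        if i + 1 < len(elo_cutoffs):
            high = elo_cutoffs[i + 1]
            branch_list.append({
                "case": {"$and": [
                    {"$gte": ["$rating", low]},
                    {"$lt": ["$rating", high]},
                ]},
                "then": f"{low}-{high}",
            })
        else:
            # open-ended top bin
            branch_list.append({
                "case": {"$gte": ["$rating", low]},
                "then": f"{low}+",
            })
    return branch_list
```

The returned list slots into `{"$switch": {"branches": branch_list, "default": "unrated"}}` inside an `$addFields` or `$group` key expression.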
'$match': {
    'gameMode': '1V1_SUPREMACY',
    'matchDate': {
        '$gte': patch_start,
each query just grabs matches within one patch
So if this runs every hour, are you saying that it will scan every match for a given patch every time? Ideally, I think we would only want to scan the matches that have been inserted since the last run and add them to the calculations.
Yep, it will roll up every match for the patch each hour. I'm not sure how much of a burden on the db that would be. The trade-off is simplicity. If we want to only grab matches this pipeline hasn't rolled-up yet, the pipeline has to become aware of when it was last run (and if that run was for all patches or just the most recent).
I'd definitely only query for matches since the last run. Should be a simple way to do it. That way, we can use an index on date to easily find the new matches to add to the calculation.
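One way the pipeline could become aware of its last run, as suggested above, is a small metadata document keyed by pipeline name. This is a sketch under assumed names (`meta_coll`, `last_run`, the `stats_by_patch` key are all hypothetical); the `$gte`/`$lt` range on `matchDate` is what an index on that field would serve efficiently.

```python
from datetime import datetime, timezone


def build_incremental_match(meta_coll, pipeline_name="stats_by_patch"):
    """Return a $match covering only matches inserted since the last run.

    meta_coll is an assumed metadata collection with one doc per pipeline
    recording its last successful run; falls back to the epoch on first run.
    """
    state = meta_coll.find_one({"_id": pipeline_name})
    since = state["last_run"] if state else datetime(1970, 1, 1, tzinfo=timezone.utc)
    now = datetime.now(timezone.utc)
    match_stage = {"$match": {
        "gameMode": "1V1_SUPREMACY",
        # a half-open window; an index on matchDate makes this a cheap range scan
        "matchDate": {"$gte": since, "$lt": now},
    }}
    return match_stage, now


def record_run(meta_coll, ran_at, pipeline_name="stats_by_patch"):
    """Persist the run time so the next invocation resumes from here."""
    meta_coll.update_one(
        {"_id": pipeline_name},
        {"$set": {"last_run": ran_at}},
        upsert=True,
    )
```

Recording `now` only after the rollup succeeds means a failed run is simply retried over the same window next hour.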
extract-stats/stats_by_patch.py
Outdated
# example event
# event = {
#     "ingest_all": False,
will be set to true for initial ingestion
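The flag could gate the scope of the rollup roughly like this. The `handler` name and the `patches` list are hypothetical, not taken from the diff; the sketch only shows the branching the comment describes.

```python
def handler(event, patches):
    """Choose which patches to roll up based on the event's ingest_all flag.

    event mirrors the example above; patches is assumed to be a list of
    patch identifiers ordered oldest to newest.
    """
    if event.get("ingest_all", False):
        # initial ingestion: roll up every patch once
        return list(patches)
    # routine hourly run: only the most recent patch
    return patches[-1:]
```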