Roll up stats by patch frequently #5

Open
wants to merge 6 commits into base: main

Conversation

jdduprey (Collaborator)

No description provided.



def upsert_to_mongo(target, docs):
    for doc in docs:

jdduprey (Collaborator, Author)

new upsert logic in loop

Owner

How many records are going to be upserted?

Either way, this should be batched so that it isn't hammering the DB with x requests. I'm not sure about the Python way to do this, but there are plenty of examples in the JavaScript code for upserting.

jdduprey (Collaborator, Author)

Only 65 records per patch! That's gotta be a pretty light lift, no? Although keep in mind I'm new to Mongo...

Owner

That's not too bad, but definitely bulk upsert. Don't want to update one at a time.
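
For reference, a minimal sketch of what a batched version could look like using pymongo's bulk_write; the '_id' match key is an assumption about the rolled-up document schema, not the actual one:

from pymongo import UpdateOne

def upsert_to_mongo(target, docs):
    # Build one UpdateOne per document and send them all in a single round trip
    # instead of issuing len(docs) separate update calls.
    ops = [
        UpdateOne({'_id': doc['_id']}, {'$set': doc}, upsert=True)
        for doc in docs
    ]
    if ops:
        target.bulk_write(ops, ordered=False)

ordered=False lets Mongo apply the operations in any order, which should be fine here as long as each doc targets a distinct key.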


# programmatically generate elo bins within the query:
def generate_elo_bins(elo_cutoffs):
    branch_list = []

jdduprey (Collaborator, Author)

elo bins generated from constant
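
As a rough illustration of what that could look like, assuming elo_cutoffs is an ascending list of lower bounds (e.g. a hypothetical ELO_CUTOFFS constant) and the bins feed a $switch expression over an 'elo' field; the field name and bin labels are guesses, not the actual implementation:

def generate_elo_bins(elo_cutoffs):
    branch_list = []
    # one closed bin per consecutive pair of cutoffs
    for low, high in zip(elo_cutoffs, elo_cutoffs[1:]):
        branch_list.append({
            'case': {'$and': [{'$gte': ['$elo', low]}, {'$lt': ['$elo', high]}]},
            'then': f'{low}-{high}',
        })
    # open-ended top bin for everything at or above the last cutoff
    branch_list.append({
        'case': {'$gte': ['$elo', elo_cutoffs[-1]]},
        'then': f'{elo_cutoffs[-1]}+',
    })
    return branch_list

The returned list would then plug into something like {'$switch': {'branches': generate_elo_bins(ELO_CUTOFFS), 'default': 'unranked'}} inside the aggregation.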

'$match': {
    'gameMode': '1V1_SUPREMACY',
    'matchDate': {
        '$gte': patch_start,

jdduprey (Collaborator, Author)

each query just grabs matches within one patch
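
In other words, the pipeline is presumably shaped roughly like this, with the patch window coming from patch metadata (patch_end, the collection name, and the downstream stages are illustrative, not the actual code):

pipeline = [
    {'$match': {
        'gameMode': '1V1_SUPREMACY',
        'matchDate': {'$gte': patch_start, '$lt': patch_end},
    }},
    # ...followed by stages that bucket matches by elo bin and accumulate the stats
]
results = list(db.matches.aggregate(pipeline))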

Owner

So if this runs every hour, are you saying that it will scan every match for a given patch every time? Ideally, I think we would only want to scan the matches that have been inserted since the last run and add them to the calculations.

jdduprey (Collaborator, Author)

Yep, it will roll up every match for the patch each hour. I'm not sure how much of a burden on the db that would be. The trade-off is simplicity. If we want to only grab matches this pipeline hasn't rolled-up yet, the pipeline has to become aware of when it was last run (and if that run was for all patches or just the most recent).

Owner

I'd definitely only query for matches since the last run. Should be a simple way to do it. That way, we can use an index on date to easily find the new matches to add to the calculation.
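
A sketch of that incremental approach, assuming a small metadata document that records the last successful run (the collection and field names here are placeholders):

from datetime import datetime, timezone

def get_matches_since_last_run(db, patch_start):
    meta = db.rollup_meta.find_one({'_id': 'stats_rollup'}) or {}
    # first run (or a fresh patch) falls back to the start of the patch
    last_run = meta.get('last_run', patch_start)
    now = datetime.now(timezone.utc)

    # an index on matchDate keeps this scan limited to the newly inserted matches:
    # db.matches.create_index('matchDate')
    new_matches = db.matches.find({
        'gameMode': '1V1_SUPREMACY',
        'matchDate': {'$gte': last_run, '$lt': now},
    })

    db.rollup_meta.update_one(
        {'_id': 'stats_rollup'},
        {'$set': {'last_run': now}},
        upsert=True,
    )
    return new_matches

The new matches would then be folded into the existing rolled-up totals rather than recomputing the whole patch every hour.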


# example event
# event = {
#     "ingest_all": False,

jdduprey (Collaborator, Author)

will be set to true for initial ingestion
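
For illustration, the handler could branch on that flag along these lines (get_all_patches, get_current_patch, and roll_up_patch are hypothetical helpers, not functions from this PR):

def handler(event, context):
    if event.get('ingest_all', False):
        # initial ingestion: roll up every patch from scratch
        patches = get_all_patches()
    else:
        # scheduled hourly run: only the current patch
        patches = [get_current_patch()]
    for patch in patches:
        roll_up_patch(patch)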
