Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cronjob to backfill corporate partner user groups' memberships and classifications #57

Merged
merged 15 commits into from
Apr 18, 2024
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 31 additions & 0 deletions .github/workflows/manual_corp_user_group_sync.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
name: Manually Sync Eras Corporate User Group Sync

on:
workflow_dispatch:

jobs:
manual_sync:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/[email protected]
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved

- name: Login to GitHub Container Registry
uses: docker/login-action@v2
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- uses: azure/login@v1
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved
with:
creds: ${{ secrets.AZURE_AKS }}

- name: Set the target AKS cluster
uses: Azure/aks-set-context@v3
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved
with:
cluster-name: microservices
resource-group: kubernetes

- name: Modify & apply template
run: kubectl create -f kubernetes/manual_corp_user_group_sync.yml
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved
2 changes: 1 addition & 1 deletion config/credentials/production.yml.enc
Original file line number Diff line number Diff line change
@@ -1 +1 @@
lCfkIMPH/fwl6Cf1K+EWfeiHK/ITstJhGzRXMTS7XqvUG2odRAmlIlbRDibfLoGPj7kPMJzwvnpHgZN2ey09RS1dp3ZzrorY6JC3gnnsrB9rX0+dLZihtj+Tlvtzwd8H4Yv3OM3um0jWr9DD3sOq4N3a7kSm1Wqkr+UqqGrWLusESvEeDV/c6sJkjjI8q0Sbv34Mc3inSOZVcIk1DocxneYA1dSWDOWqbhF1vd2p53ubqGJIq3RPQEt7881IECek3iDuBXPPVaw6H2KaerBv0utnMhmST3mcNtAeAEzWcw/j8TztMqvaFowmp5+lo0jNJR4wNOUQaT7XiOtRFZfA6i9T0gUtqAz+gZxJshRarYpbKUb703PvQ4WAsh2Fs9g+v0PPuW2sSK6tfEzn84c9EQUCeQVKTeATsQlXAGuO9vYdKTw0ahnJzdFfql5LXUH7VJ+12cNbOanjVFjykcs3qY2N69PK1/ESAPqkyK8Fe0fosfAepUVm3rd7VLNMc2qGdWAxTAfdj59xYNaGU97+xRSATBMUdsGNoFcyv7WTUUxcn01Qz9WqyLdCzyIABEwvcXWtc3pNJ3U7bwC0fsqQkL+zqt883eYGDxU4Mn0H5IX21OfqTYDuPB3mQltuQe2KUq2KZcCLtBVce0nRGKx82dEhnjP+PRWk7moCsG8eyNqqi1i1ZqZcDemPTXL48+IgCLfmMYju7uDWTM5OWG9e+TWEqUowwUV+Gp5AoGKUt/DtfityIQmdJde7+6YaxYDsDEKZfP6na4C2Zyt6qXtCWYIBXo8sEuEE7j38lrWFgNODc8k53PgSBp6R0lPjzw5/U9HPzBKmkn705JdfRN5kYbKVpCPOQqMpjYU1jSWpqdO4ZgobEWT5n3XtHCuV6oO/P5t2lbwm+mVuVotrjv3p7/x+CbN+nTvA9Kma7mOdK8FMrQlqXD/FpnSvSLuO5/3Q7LltdKF3njmmUmsqFnbpXOaxKpgDb+ijeuwndEZ1hvyLepcxahBevbOS+ZOtdWHmfGhTlZ4uCgbgAXJ04FH8xFrf1/VG4GtMHuapT6tlJcxqBer4--8e7ThvhXjefdpI+0--QG+VUWhyp9cydSvpzydc0w==
hD5l4bBd4gVw4DS/oeGO3XDXcwPUCSVCt+TFbmQsy+Hx4mBSTLF22nRljM/KDXPSU2b2RqbOl3AcDTeZWlNHqK0OUL68CPjrOWK0KFOqKPgqsh+ydEcjpCWkm1A5mQdxCO4GQ2zjZwZxeQvZbvGiPz1sc4WpVd5zILUcYeiK2P8lLqcLdSHroQbG9+L/0VT8d+0B1mNsdJ339S3cI0y+4Cp9evwi3l7fjIRpHN10tPFJJkuKd90syGOif1tDvtScl1bfkK3f7xmNL2lzDBMEGq2BzhFte/QZmBKYnHEeAolmMthLMQqQdZTftGUrxaZeh4OtzcKwerox81ZmUjVkubTGJEsSrB2Q8iR5RMZNscmjbzem6YEiT4mef7HeAECasZG5IbwHAMMh+s/BnKVWUlS0Srit5Zt40Judyde1K1UgjxsCsqWJS4JYcfhkmBzPi7x1paxGlWcezsEAJZFqjiqzIzvc4cvDoyCaEO1FMfbvOlHWgTH5A2xV2xnCiwE2DgkrVR9a9vaCkwxAdNV44Esq8npeVkytF39khDuaWWsn2dh+3j6vcnzGyI2gbIgyYAv7aV0+JEPBGRlPHHhVt3nUzYQUHaJHUL+yhjkIuMKxIilkOig+RhZDjdujR+kZdqP8KyVXuAg8Ui/h+/kG2XMtQR8YhHIQZFhZk9TVQQK2IC7PKmqJEo/V8zh53q37vaODthUNzACp9ww7UsZ6/35oconmuVpa1Ucijahw5Fiv1WEGG2oI0QPEhEQToG5dWHi2DbYwmQe1d/deG/CdFKMhdGaCGeOC/fZWTtUqKfDp70eytfic8I1jwsK1p3smO3/PZYNXwo78VXaZVKepqhp8IgVxPoNIY4YtmaEpigcHMmm409quT7/mX3fTpdGqcUPBVxx94w5xYVACmStGIGP2AesIgfdbxz0HzDf1aNMj8iEZuyo1gQ4rUiD6YygbzdDXOjuVa3Aq8ZvO3ixkJWFB65w6+KsLmt93zodHndH4N62PnmUw6eAJCmIxbBaJqU1kim8IQd0zXv5Bx3hMq7GVulMZccsIzZoCWDACJic+jsYM3Rp7UBP1WYsGpfOSWC14PXQGVwmXKhG3hn3a/Lim/nkMKUclx1Hy2W+qpDPKDl27uQPxF9wt4ZwkAyBdKXgDCpq11GPI2Kkj5Zvhrx+labnulixV6HfZpQJdcVk11Rqf5O8l6v/AHjSaQ6qrEztNLZL/IE5cqeWiZZZOIxUQuYVa6aldsEH9+niHzD6/U6JvHVP0dRj5PNF0TepNtvFkzUKZI+fTG6R6OYyMSBcOOW7i6brSCNM1I+Yx3+F7eACovJQhl9spb0YEbxvENYeHStlWYUnxFlqslbPHIikB/2J1BMzpscVC7AR2nJgBifoGcrMX2w9cCH2SgGW2HFkr21b51YuXQo99vBrk0dUudDC7s1f4bnudtaLZOUjrCcY0DudT8RUcDrVfJpkxOvGgcI6zxIzoXLu9SW3bS27zJWfhhZxXqljfczHgXThFWqlQd518XUGmfxGrg9C+fP2ix6ngAaIAkywavbLvfltYe6mkJGrTIjVAns6GX3QtroLJYvvCq2i0BHRJSvISJVfzZ7E6JYLdVJS9ur3C3jrIGl4iiiVJeuwjcJA7NFlrey9aN/MR/4jwj7TlMqi+47bQGThT1nzyZmvkB2EJpO271Vw1dSbUsaj5hz5dKiypav3puMXHq7IguQv12gUaRX5ccMaNqDbyXhC0jWWa9xoFsisAEqNCNCrwosivND18fbaaNUZ2G0QuuPFjYlTiEKvXuaaip/QOmYlQ8q99arps7rn6SK8rhRX3ymhxLXyx5yIjtI+BXMvylXM05OVMWmhm7ST3p7ecqENCS0Y2cA119mjUd6TG+ulLyw4l3UGIBAwLrUcraNvpiYNUQcx+y07OfHAjct4lIdseaCqnqPB37W4k9p6aYo+CRlaE/uKtckdkBCtlHxxeoREyQyepTgtI2uq+i+UEeYmqQrHIAEsT+C76cSaeuwINRYQ6z6AjrmvgvQ4LjteyV2ky1M7ckBGZMnh1mLcB6jN0zEsqwgs74yYeZnmEVN0nwOc+3M9OHwZEjIIbgedHBXhOF/y+Aa9dSomYaHrmVDzrRvUmF2z0gqNQAcbL7TVo9FmwLxBYUFqORXLlf0B6An5vpDzEslw3sKSx4+hbclYDAGKsLuPPeFD1O2D3v+WluUqwCSdkVXNNnPD0rZUT/8jVVSBDb11Vvbtl603cTaZJcRujvynJllx1jky3RiVuTkG7HBGTLIozy9Na6eQzATBZazRQC7cVobySMsyf4CKtHcAH97Vl5fSrYcBtTkgTbWrZrVfMJwT6ufnSyDXHkOXWhi37J9J5Ig/w6c4G1yQcjOa12jsHsWhRTNt8XY6b9r2vW1kqA2+bbtW9Z8o8NcskNCoPKOKzeRlEJ2cPvAFV5Fh+H9FZN6UpCxMDEuggBjAB/J3EYjFG6XEVqEOhfXMVOmRJ9ywsp7whxtk1F5kxhRDXomOsCxydd3gGAyIdwF0NHIptjaa7DIrxbCB+u+BtmytZzE00QwoeffVmLhoa8JyTMHGdK7ion+bfuFnEiP8/m1O8uvFesnfDzWp6+/F8Au4m2GACcZcjn0TsoNW7iYy6xMXn32X7rKcs12baJFbLsH4xJfdOPM0RlH16E79n6g8p7uUb8RSzrJ0cUsLC5dRVhw2hQWNbzQudEMtciIvvfGAkDBpUcpu/4ZkmS8RdblmJ+BDT0yFKddWdD217EYzrPbrpXXomPU95kEq9z9i0EKA5e8hjGt1roAgGh51/sezhrLVGmAeIPodxbxbWOWH3UTt+HUP5QpnRq8k2qILk4FuiFRlHUTNLvVtSdk1Z8XybLwmYMdoGTSQjD1mSmVSouA==--UPEZvym9PPXcBjaM--xrk0OyBCCPOBgaRp2gjxIg==
28 changes: 28 additions & 0 deletions kubernetes/corp_user_groups_cron_sync.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
apiVersion: batch/v1
kind: CronJob
metadata:
name: eras-corp-partners-cron-sync-job
spec:
schedule: "0 3 * * *"
jobTemplate:
spec:
template:
metadata:
name: eras-corp-partners-sync
spec:
containers:
- name: eras-corp-partners-sync
image: ghcr.io/zooniverse/eras
env:
- name: RAILS_LOG_TO_STDOUT
value: "true"
- name: RAILS_ENV
value: production
- name: RAILS_MASTER_KEY
valueFrom:
secretKeyRef:
name: eras-production
key: rails-master-key
command: ['ruby', './scripts/user_group_membership_classification_backfill.rb']
restartPolicy: Never
backoffLimit: 2
25 changes: 25 additions & 0 deletions kubernetes/manual_corp_user_group_sync.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
apiVersion: batch/v1
kind: Job
metadata:
generateName: eras-corp-partners-sync-
spec:
template:
metadata:
name: eras-corp-partners-sync
spec:
containers:
- name: eras-corp-partners-sync
image: ghcr.io/zooniverse/eras
env:
- name: RAILS_LOG_TO_STDOUT
value: "true"
- name: RAILS_ENV
value: production
- name: RAILS_MASTER_KEY
valueFrom:
secretKeyRef:
name: eras-production
key: rails-master-key
command: ['ruby', './scripts/user_group_membership_classification_backfill.rb']
restartPolicy: Never
backoffLimit: 2
7 changes: 7 additions & 0 deletions scripts/backfill_classifications.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,10 @@
##
## This script was used in VM when first introducing ERAS. We needed to backfill classifications into ERAS db.
## Unfortunately there was too much data to do a straight copy from panoptes db to copy to eras db.
## The script was followed up with save_classifications_chunk_in_files.py and copy_classifications_from_files.py, which copies from panoptes
## DB to csvs and then csvs to Eras DB. See PR: https://github.com/zooniverse/eras/pull/40
##

import os
import psycopg
from datetime import datetime
Expand Down
5 changes: 5 additions & 0 deletions scripts/backfill_talk_comments.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
##
## This script was used in VM when first introducing ERAS. We needed to backfill talk comments into ERAS db.
## This script is a straight COPY FROM Talk DB to COPY TO ERAS DB.
##

import os
import psycopg
from datetime import datetime
Expand Down
8 changes: 8 additions & 0 deletions scripts/copy_classifications_from_files.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,11 @@
##
## This script along with save_classifications_chunk_in_files.py was used in VM when first introducing ERAS.
## We needed to backfill classifications into ERAS db.
## The script was preluded with backfll_classifications.py which does a straight copy from panoptes db to copy to eras db.
## There was too much data to do a straight copy from panoptes db to copy to eras db, so we had to chunk in files.
## See PR: https://github.com/zooniverse/eras/pull/40
##

import os
import psycopg
from datetime import datetime
Expand Down
52 changes: 52 additions & 0 deletions scripts/panoptes_membership_client.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# frozen_string_literal: true

require 'pg'
require '../config/environment'

class PanoptesMembershipClient
def user_ids_not_in_user_group(user_group_id, domain_formats)
conn.exec(
"SELECT id FROM users
WHERE email ILIKE ANY(STRING_TO_ARRAY('#{domain_formats.join(',')}', ','))
AND id NOT IN (SELECT user_id FROM memberships where user_group_id=#{user_group_id})
"
).entries.map { |res| res['id'].to_i }
end

def insert_memberships(user_group_id, user_ids)
memberships_to_create = user_memberships(user_group_id, user_ids)

member_creation_sql_query = memberships_insert_query(memberships_to_create)

conn.exec_params(member_creation_sql_query, memberships_to_create.flatten)
end

private

def conn
@conn ||= PG.connect(Rails.application.credentials.panoptes_db_uri, sslmode: 'require')
end

def user_memberships(user_group_id, user_ids)
memberships_to_create = []
user_ids.each do |user_id|
# membership in array order: user_id, user_group_id, state, roles
membership = [
user_id,
user_group_id,
0,
'{"group_member"}'
]
memberships_to_create << membership
end
memberships_to_create
end

def memberships_insert_query(memberships_to_create)
# Values is a string that will look like ($1, $2, $3, $4), ($5, $6, $7, $8), ..etc..
values = Array.new(memberships_to_create.length) do |i|
"($#{(4 * i) + 1}, $#{(4 * i) + 2}, $#{(4 * i) + 3}, $#{(4 * i) + 4})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like kind of a kludgy way to build a parameterized query. Since this doesn't take any input, I don't feel like you gotta use conn.exec_params instead of just building the query string directly. Even though it took me second to parse this, I'm still only like 3/10 on it.

end.join(',')
"INSERT INTO memberships (user_id, user_group_id, state, roles) VALUES #{values}"
end
end
8 changes: 8 additions & 0 deletions scripts/save_classifications_chunk_in_files.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,11 @@
##
## This script along with copy_classifications_from_files.py was used in VM when first introducing ERAS.
## We needed to backfill classifications into ERAS db.
## The script was preluded with backfll_classifications.py which does a straight copy from panoptes db to copy to eras db.
## There was too much data to do a straight copy from panoptes db to copy to eras db, so we had to chunk in files.
## See PR: https://github.com/zooniverse/eras/pull/40
##

import os
import psycopg
from datetime import datetime
Expand Down
3 changes: 1 addition & 2 deletions scripts/user_group_membership_classification_backfill.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@

current_time = now.strftime("%H:%M:%S")
print("BEFORE Time =", current_time)

parser = argparse.ArgumentParser()
parser.add_argument("-ug", "--user_group_id", type=int)
parser.add_argument('email_domain_formats')
Expand Down Expand Up @@ -47,7 +46,7 @@
panoptes_db_conn.commit()

# eras get classification_events of not_in_group_yet_user_ids that does not have user_group_id within their user_group_ids classification_event
eras_cursor.execute("SELECT classification_id, event_time, session_time, project_id, user_id, workflow_id, created_at, updated_at, user_group_ids from classification_events WHERE user_id = ANY(%s) AND %s!=ANY(user_group_ids)", (not_in_group_yet_user_ids, user_group_id))
eras_cursor.execute("SELECT classification_id, event_time, session_time, project_id, user_id, workflow_id, created_at, updated_at, user_group_ids from classification_events WHERE user_id IN %s AND %s!=ANY(user_group_ids)", (not_in_group_yet_user_ids, user_group_id))
classification_events_to_backfill = eras_cursor.fetchall()

# create classification_user_group
Expand Down
74 changes: 74 additions & 0 deletions scripts/user_group_membership_classification_backfill.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
# frozen_string_literal: true

require '../config/environment'
require './panoptes_membership_client'
require 'json'

corporate_user_groups_str = Rails.application.credentials.corporate_user_groups
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved
corporate_partners = JSON.parse(corporate_user_groups_str)

puts 'Starting Classification and Membership Backfill...'

panoptes_client = PanoptesMembershipClient.new

corporate_partners.each do |corporate_partner|
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved
puts "Geting Ids of users that are not in group yet for #{corporate_partner['corp_name']}..."
not_yet_member_user_ids = panoptes_client.user_ids_not_in_user_group(corporate_partner['user_group_id'], corporate_partner['domain_formats'])

puts "Query found #{not_yet_member_user_ids.length} users not in the #{corporate_partner['corp_name']} user_group..."

next unless not_yet_member_user_ids.length.positive?

puts "Creating Memberships for #{corporate_partner['corp_name']}..."
panoptes_client.insert_memberships(corporate_partner['user_group_id'], not_yet_member_user_ids)

puts 'Querying Eras ClassificationEvents of newly created members...'
classification_events_to_backfill = ClassificationEvent.where('user_id IN (?)', not_yet_member_user_ids)

next unless classification_events_to_backfill.length.positive?

puts 'Creating Classification User Groups...'
classification_user_groups_to_create = []
classification_events_to_backfill.each do |classification|
classification_user_group = {
classification_id: classification.classification_id,
event_time: classification.event_time,
project_id: classification.project_id,
workflow_id: classification.workflow_id,
user_id: classification.user_id,
session_time: classification.session_time,
user_group_id: corporate_partner['user_group_id']
}
classification_user_groups_to_create << classification_user_group
end

ClassificationUserGroup.upsert_all(classification_user_groups_to_create,
unique_by: %i[classification_id event_time user_group_id user_id])

puts 'ClassificationUserGroup Upsert Finished...'
end

today = Date.today.to_s
two_days_ago = (Date.today - 2).to_s
puts 'Classification and Membership Backfill Finished. Starting CA Refresh...'
puts 'Refreshing Continuous Aggregates dealing with User Groups...'

puts 'Refreshing Daily Group Classifications Count And Time...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time', '#{two_days_ago}', '#{today}')")

puts 'Refreshing Daily Group Classifications Count And Time Per Project...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time_per_project', '#{two_days_ago}', '#{today}')")

puts 'Refreshing Daily Group Classifications Count And Time Per User...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time_per_user', '#{two_days_ago}', '#{today}')")

puts 'Refreshing Daily Group Classifications Count And Time Per User And Project...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time_per_user_per_project', '#{two_days_ago}', '#{today}')")

puts 'Refreshing Daily Group Classifications Count And Time Per User And Workflow...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time_per_user_per_workflow', '#{two_days_ago}', '#{today}')")

puts 'Refreshing Daily Group Classifications Count And Time Per Workflow...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time_per_workflow', '#{two_days_ago}', '#{today}')")

puts 'Stats User Group Membership and Classification Backfill Completed'
Loading