Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cronjob to backfill corporate partner user groups' memberships and classifications #57

Merged
merged 15 commits into from
Apr 18, 2024
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion config/credentials/production.yml.enc
Original file line number Diff line number Diff line change
@@ -1 +1 @@
lCfkIMPH/fwl6Cf1K+EWfeiHK/ITstJhGzRXMTS7XqvUG2odRAmlIlbRDibfLoGPj7kPMJzwvnpHgZN2ey09RS1dp3ZzrorY6JC3gnnsrB9rX0+dLZihtj+Tlvtzwd8H4Yv3OM3um0jWr9DD3sOq4N3a7kSm1Wqkr+UqqGrWLusESvEeDV/c6sJkjjI8q0Sbv34Mc3inSOZVcIk1DocxneYA1dSWDOWqbhF1vd2p53ubqGJIq3RPQEt7881IECek3iDuBXPPVaw6H2KaerBv0utnMhmST3mcNtAeAEzWcw/j8TztMqvaFowmp5+lo0jNJR4wNOUQaT7XiOtRFZfA6i9T0gUtqAz+gZxJshRarYpbKUb703PvQ4WAsh2Fs9g+v0PPuW2sSK6tfEzn84c9EQUCeQVKTeATsQlXAGuO9vYdKTw0ahnJzdFfql5LXUH7VJ+12cNbOanjVFjykcs3qY2N69PK1/ESAPqkyK8Fe0fosfAepUVm3rd7VLNMc2qGdWAxTAfdj59xYNaGU97+xRSATBMUdsGNoFcyv7WTUUxcn01Qz9WqyLdCzyIABEwvcXWtc3pNJ3U7bwC0fsqQkL+zqt883eYGDxU4Mn0H5IX21OfqTYDuPB3mQltuQe2KUq2KZcCLtBVce0nRGKx82dEhnjP+PRWk7moCsG8eyNqqi1i1ZqZcDemPTXL48+IgCLfmMYju7uDWTM5OWG9e+TWEqUowwUV+Gp5AoGKUt/DtfityIQmdJde7+6YaxYDsDEKZfP6na4C2Zyt6qXtCWYIBXo8sEuEE7j38lrWFgNODc8k53PgSBp6R0lPjzw5/U9HPzBKmkn705JdfRN5kYbKVpCPOQqMpjYU1jSWpqdO4ZgobEWT5n3XtHCuV6oO/P5t2lbwm+mVuVotrjv3p7/x+CbN+nTvA9Kma7mOdK8FMrQlqXD/FpnSvSLuO5/3Q7LltdKF3njmmUmsqFnbpXOaxKpgDb+ijeuwndEZ1hvyLepcxahBevbOS+ZOtdWHmfGhTlZ4uCgbgAXJ04FH8xFrf1/VG4GtMHuapT6tlJcxqBer4--8e7ThvhXjefdpI+0--QG+VUWhyp9cydSvpzydc0w==
hD5l4bBd4gVw4DS/oeGO3XDXcwPUCSVCt+TFbmQsy+Hx4mBSTLF22nRljM/KDXPSU2b2RqbOl3AcDTeZWlNHqK0OUL68CPjrOWK0KFOqKPgqsh+ydEcjpCWkm1A5mQdxCO4GQ2zjZwZxeQvZbvGiPz1sc4WpVd5zILUcYeiK2P8lLqcLdSHroQbG9+L/0VT8d+0B1mNsdJ339S3cI0y+4Cp9evwi3l7fjIRpHN10tPFJJkuKd90syGOif1tDvtScl1bfkK3f7xmNL2lzDBMEGq2BzhFte/QZmBKYnHEeAolmMthLMQqQdZTftGUrxaZeh4OtzcKwerox81ZmUjVkubTGJEsSrB2Q8iR5RMZNscmjbzem6YEiT4mef7HeAECasZG5IbwHAMMh+s/BnKVWUlS0Srit5Zt40Judyde1K1UgjxsCsqWJS4JYcfhkmBzPi7x1paxGlWcezsEAJZFqjiqzIzvc4cvDoyCaEO1FMfbvOlHWgTH5A2xV2xnCiwE2DgkrVR9a9vaCkwxAdNV44Esq8npeVkytF39khDuaWWsn2dh+3j6vcnzGyI2gbIgyYAv7aV0+JEPBGRlPHHhVt3nUzYQUHaJHUL+yhjkIuMKxIilkOig+RhZDjdujR+kZdqP8KyVXuAg8Ui/h+/kG2XMtQR8YhHIQZFhZk9TVQQK2IC7PKmqJEo/V8zh53q37vaODthUNzACp9ww7UsZ6/35oconmuVpa1Ucijahw5Fiv1WEGG2oI0QPEhEQToG5dWHi2DbYwmQe1d/deG/CdFKMhdGaCGeOC/fZWTtUqKfDp70eytfic8I1jwsK1p3smO3/PZYNXwo78VXaZVKepqhp8IgVxPoNIY4YtmaEpigcHMmm409quT7/mX3fTpdGqcUPBVxx94w5xYVACmStGIGP2AesIgfdbxz0HzDf1aNMj8iEZuyo1gQ4rUiD6YygbzdDXOjuVa3Aq8ZvO3ixkJWFB65w6+KsLmt93zodHndH4N62PnmUw6eAJCmIxbBaJqU1kim8IQd0zXv5Bx3hMq7GVulMZccsIzZoCWDACJic+jsYM3Rp7UBP1WYsGpfOSWC14PXQGVwmXKhG3hn3a/Lim/nkMKUclx1Hy2W+qpDPKDl27uQPxF9wt4ZwkAyBdKXgDCpq11GPI2Kkj5Zvhrx+labnulixV6HfZpQJdcVk11Rqf5O8l6v/AHjSaQ6qrEztNLZL/IE5cqeWiZZZOIxUQuYVa6aldsEH9+niHzD6/U6JvHVP0dRj5PNF0TepNtvFkzUKZI+fTG6R6OYyMSBcOOW7i6brSCNM1I+Yx3+F7eACovJQhl9spb0YEbxvENYeHStlWYUnxFlqslbPHIikB/2J1BMzpscVC7AR2nJgBifoGcrMX2w9cCH2SgGW2HFkr21b51YuXQo99vBrk0dUudDC7s1f4bnudtaLZOUjrCcY0DudT8RUcDrVfJpkxOvGgcI6zxIzoXLu9SW3bS27zJWfhhZxXqljfczHgXThFWqlQd518XUGmfxGrg9C+fP2ix6ngAaIAkywavbLvfltYe6mkJGrTIjVAns6GX3QtroLJYvvCq2i0BHRJSvISJVfzZ7E6JYLdVJS9ur3C3jrIGl4iiiVJeuwjcJA7NFlrey9aN/MR/4jwj7TlMqi+47bQGThT1nzyZmvkB2EJpO271Vw1dSbUsaj5hz5dKiypav3puMXHq7IguQv12gUaRX5ccMaNqDbyXhC0jWWa9xoFsisAEqNCNCrwosivND18fbaaNUZ2G0QuuPFjYlTiEKvXuaaip/QOmYlQ8q99arps7rn6SK8rhRX3ymhxLXyx5yIjtI+BXMvylXM05OVMWmhm7ST3p7ecqENCS0Y2cA119mjUd6TG+ulLyw4l3UGIBAwLrUcraNvpiYNUQcx+y07OfHAjct4lIdseaCqnqPB37W4k9p6aYo+CRlaE/uKtckdkBCtlHxxeoREyQyepTgtI2uq+i+UEeYmqQrHIAEsT+C76cSaeuwINRYQ6z6AjrmvgvQ4LjteyV2ky1M7ckBGZMnh1mLcB6jN0zEsqwgs74yYeZnmEVN0nwOc+3M9OHwZEjIIbgedHBXhOF/y+Aa9dSomYaHrmVDzrRvUmF2z0gqNQAcbL7TVo9FmwLxBYUFqORXLlf0B6An5vpDzEslw3sKSx4+hbclYDAGKsLuPPeFD1O2D3v+WluUqwCSdkVXNNnPD0rZUT/8jVVSBDb11Vvbtl603cTaZJcRujvynJllx1jky3RiVuTkG7HBGTLIozy9Na6eQzATBZazRQC7cVobySMsyf4CKtHcAH97Vl5fSrYcBtTkgTbWrZrVfMJwT6ufnSyDXHkOXWhi37J9J5Ig/w6c4G1yQcjOa12jsHsWhRTNt8XY6b9r2vW1kqA2+bbtW9Z8o8NcskNCoPKOKzeRlEJ2cPvAFV5Fh+H9FZN6UpCxMDEuggBjAB/J3EYjFG6XEVqEOhfXMVOmRJ9ywsp7whxtk1F5kxhRDXomOsCxydd3gGAyIdwF0NHIptjaa7DIrxbCB+u+BtmytZzE00QwoeffVmLhoa8JyTMHGdK7ion+bfuFnEiP8/m1O8uvFesnfDzWp6+/F8Au4m2GACcZcjn0TsoNW7iYy6xMXn32X7rKcs12baJFbLsH4xJfdOPM0RlH16E79n6g8p7uUb8RSzrJ0cUsLC5dRVhw2hQWNbzQudEMtciIvvfGAkDBpUcpu/4ZkmS8RdblmJ+BDT0yFKddWdD217EYzrPbrpXXomPU95kEq9z9i0EKA5e8hjGt1roAgGh51/sezhrLVGmAeIPodxbxbWOWH3UTt+HUP5QpnRq8k2qILk4FuiFRlHUTNLvVtSdk1Z8XybLwmYMdoGTSQjD1mSmVSouA==--UPEZvym9PPXcBjaM--xrk0OyBCCPOBgaRp2gjxIg==
28 changes: 28 additions & 0 deletions kubernetes/cron_sync.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
apiVersion: batch/v1
kind: CronJob
metadata:
name: eras-corp-partners-cron-sync-job
spec:
schedule: "0 3 * * *"
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved
jobTemplate:
spec:
template:
metadata:
name: eras-corp-partners-sync
spec:
containers:
- name: eras-corp-partners-sync
image: ghcr.io/zooniverse/eras
env:
- name: RAILS_LOG_TO_STDOUT
value: "true"
- name: RAILS_ENV
value: production
- name: RAILS_MASTER_KEY
valueFrom:
secretKeyRef:
name: eras-production
key: rails-master-key
command: ['ruby', './scripts/user_group_membership_classification_backfill.rb']
restartPolicy: Never
backoffLimit: 2
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved
7 changes: 7 additions & 0 deletions scripts/backfill_classifications.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,10 @@
##
## This script was used in VM when first introducing ERAS. We needed to backfill classifications into ERAS db.
## Unfortunately there was too much data to do a straight copy from panoptes db to copy to eras db.
## The script was followed up with save_classifications_chunk_in_files.py and copy_classifications_from_files.py, which copies from panoptes
## DB to csvs and then csvs to Eras DB. See PR: https://github.com/zooniverse/eras/pull/40
##

import os
import psycopg
from datetime import datetime
Expand Down
5 changes: 5 additions & 0 deletions scripts/backfill_talk_comments.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
##
## This script was used in VM when first introducing ERAS. We needed to backfill talk comments into ERAS db.
## This script is a straight COPY FROM Talk DB to COPY TO ERAS DB.
##

import os
import psycopg
from datetime import datetime
Expand Down
8 changes: 8 additions & 0 deletions scripts/copy_classifications_from_files.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,11 @@
##
## This script along with save_classifications_chunk_in_files.py was used in VM when first introducing ERAS.
## We needed to backfill classifications into ERAS db.
## The script was preluded with backfll_classifications.py which does a straight copy from panoptes db to copy to eras db.
## There was too much data to do a straight copy from panoptes db to copy to eras db, so we had to chunk in files.
## See PR: https://github.com/zooniverse/eras/pull/40
##

import os
import psycopg
from datetime import datetime
Expand Down
51 changes: 51 additions & 0 deletions scripts/panoptes_membership_client.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# frozen_string_literal: true

require 'pg'
require '../config/environment'

class PanoptesMembershipClient
def user_ids_not_in_user_group(user_group_id, domain_formats)
conn.exec(
"SELECT id FROM users
WHERE email ILIKE ANY(STRING_TO_ARRAY('#{domain_formats.join(',')}', ','))
AND id NOT IN (SELECT user_id FROM memberships where user_group_id=#{user_group_id})
"
).entries.map { |res| res['id'].to_i }
end

def insert_memberships(user_group_id, user_ids)
memberships_to_create = user_memberships(user_group_id, user_ids)

member_creation_sql_query = memberships_insert_query(memberships_to_create)

conn.exec_params(member_creation_sql_query, memberships_to_create.flatten)
end

private

def conn
@conn ||= PG.connect(Rails.application.credentials.panoptes_db_uri, sslmode: 'require')
end

def user_memberships(user_group_id, user_ids)
memberships_to_create = []
user_ids.each do |user_id|
# membership in array order: user_id, user_group_id, state, roles
membership = [
user_id,
user_group_id,
0,
'{"group_member"}'
]
memberships_to_create << membership
end
memberships_to_create
end

def memberships_insert_query(memberships_to_create)
# Values is part of sql query will look like ($1, $2, $3, $4), ($5, $6, $7, $8), ..etc..
values = memberships_to_create.length.times.map { |i| "($#{(4 * i) + 1}, $#{(4 * i) + 2}, $#{(4 * i) + 3}, $#{(4 * i) + 4})" }.join(',')

"INSERT INTO memberships (user_id, user_group_id, state, roles) VALUES #{values}"
end
end
8 changes: 8 additions & 0 deletions scripts/save_classifications_chunk_in_files.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,11 @@
##
## This script along with copy_classifications_from_files.py was used in VM when first introducing ERAS.
## We needed to backfill classifications into ERAS db.
## The script was preluded with backfll_classifications.py which does a straight copy from panoptes db to copy to eras db.
## There was too much data to do a straight copy from panoptes db to copy to eras db, so we had to chunk in files.
## See PR: https://github.com/zooniverse/eras/pull/40
##

import os
import psycopg
from datetime import datetime
Expand Down
3 changes: 1 addition & 2 deletions scripts/user_group_membership_classification_backfill.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@

current_time = now.strftime("%H:%M:%S")
print("BEFORE Time =", current_time)

parser = argparse.ArgumentParser()
parser.add_argument("-ug", "--user_group_id", type=int)
parser.add_argument('email_domain_formats')
Expand Down Expand Up @@ -47,7 +46,7 @@
panoptes_db_conn.commit()

# eras get classification_events of not_in_group_yet_user_ids that does not have user_group_id within their user_group_ids classification_event
eras_cursor.execute("SELECT classification_id, event_time, session_time, project_id, user_id, workflow_id, created_at, updated_at, user_group_ids from classification_events WHERE user_id = ANY(%s) AND %s!=ANY(user_group_ids)", (not_in_group_yet_user_ids, user_group_id))
eras_cursor.execute("SELECT classification_id, event_time, session_time, project_id, user_id, workflow_id, created_at, updated_at, user_group_ids from classification_events WHERE user_id IN %s AND %s!=ANY(user_group_ids)", (not_in_group_yet_user_ids, user_group_id))
classification_events_to_backfill = eras_cursor.fetchall()

# create classification_user_group
Expand Down
74 changes: 74 additions & 0 deletions scripts/user_group_membership_classification_backfill.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
# frozen_string_literal: true

require '../config/environment'
require './panoptes_membership_client'
require 'json'

corporate_user_groups_str = Rails.application.credentials.corporate_user_groups
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved
corporate_partners = JSON.parse(corporate_user_groups_str)

puts 'Starting Classification and Membership Backfill...'

panoptes_client = PanoptesMembershipClient.new

corporate_partners.each do |corporate_partner|
yuenmichelle1 marked this conversation as resolved.
Show resolved Hide resolved
puts "Geting Ids of users that are not in group yet for #{corporate_partner['corp_name']}..."
not_yet_member_user_ids = panoptes_client.user_ids_not_in_user_group(corporate_partner['user_group_id'], corporate_partner['domain_formats'])

puts "Query found #{not_yet_member_user_ids.length} users not in the #{corporate_partner['corp_name']} user_group..."

next unless not_yet_member_user_ids.length.positive?

puts "Creating Memberships for #{corporate_partner['corp_name']}..."
panoptes_client.insert_memberships(corporate_partner['user_group_id'], not_yet_member_user_ids)

puts 'Querying Eras ClassificationEvents of newly created members...'
classification_events_to_backfill = ClassificationEvent.where('user_id IN (?)', not_yet_member_user_ids)

next unless classification_events_to_backfill.length.positive?

puts 'Creating Classification User Groups...'
classification_user_groups_to_create = []
classification_events_to_backfill.each do |classification|
classification_user_group = {
classification_id: classification.classification_id,
event_time: classification.event_time,
project_id: classification.project_id,
workflow_id: classification.workflow_id,
user_id: classification.user_id,
session_time: classification.session_time,
user_group_id: corporate_partner['user_group_id']
}
classification_user_groups_to_create << classification_user_group
end

ClassificationUserGroup.upsert_all(classification_user_groups_to_create,
unique_by: %i[classification_id event_time user_group_id user_id])

puts 'ClassificationUserGroup Upsert Finished...'
end

today = Date.today.to_s
two_days_ago = (Date.today - 2).to_s
puts 'Classification and Membership Backfill Finished. Starting CA Refresh...'
puts 'Refreshing Continuous Aggregates dealing with User Groups...'

puts 'Refreshing Daily Group Classifications Count And Time...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time', '#{two_days_ago}', '#{today}')")

puts 'Refreshing Daily Group Classifications Count And Time Per Project...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time_per_project', '#{two_days_ago}', '#{today}')")

puts 'Refreshing Daily Group Classifications Count And Time Per User...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time_per_user', '#{two_days_ago}', '#{today}')")

puts 'Refreshing Daily Group Classifications Count And Time Per User And Project...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time_per_user_per_project', '#{two_days_ago}', '#{today}')")

puts 'Refreshing Daily Group Classifications Count And Time Per User And Workflow...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time_per_user_per_workflow', '#{two_days_ago}', '#{today}')")

puts 'Refreshing Daily Group Classifications Count And Time Per Workflow...'
ActiveRecord::Base.connection.exec_query("CALL refresh_continuous_aggregate('daily_group_classification_count_and_time_per_workflow', '#{two_days_ago}', '#{today}')")

puts 'Stats User Group Membership and Classification Backfill Completed'
Loading