Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore lost data - 3rd attempt #1473

Closed
mahalakshme opened this issue Jan 15, 2024 · 7 comments
Closed

Restore lost data - 3rd attempt #1473

mahalakshme opened this issue Jan 15, 2024 · 7 comments
Assignees

Comments

@mahalakshme
Copy link

mahalakshme commented Jan 15, 2024

Expected:

select * from individual where observations->>'24dabc3a-6562-4521-bd42-5fff11ea5c46' is null and created_date_time > '2023-11-27'::timestamptz and created_date_time < '2023-12-14'::timestamptz;;
  -- no of records created between the dates when the issue was in prod - only 18 rows

select * from individual where observations->>'24dabc3a-6562-4521-bd42-5fff11ea5c46' is null;
-- only 4 rows in oct 29 snapshot is null

So if we had had the right backups - at max we would lose only data for 22 individuals. So we should work towards achieving this.

Actual - Current

select * from individual where observations->>'24dabc3a-6562-4521-bd42-5fff11ea5c46' is null;

But currently 219 rows dont have houshold number(considering this field since it is a mandatory field)

What so far done:

Previously - data was restored from prerelease - it had data from 30th November prod data. But since the issue was introduced on 27th November there were updates to 600 individuals because of which it had already lost data.
Image

Next attempt was made to restore data from data shared from client(downloaded from reports). But still around 200 individuals didint have even primary data, and even for those updated, we were able to restore only the data available in the downloaded report and not all data of the individuals.

The above had led to major issues for the client.

What can be done:

We have a backup of prod taken on 29th October. So that will have the old data other than the those created after it.

select * from individual where observations->>'24dabc3a-6562-4521-bd42-5fff11ea5c46' is null and created_date_time > '2023-10-28'::timestamptz and created_date_time < '2023-12-15'::timestamptz;;
 -- created between based on backup - only 35 rows

So at max if we restore from the above we should lose only 35+4 = data for 39 individuals. Considering we have also restored from 30th Nov prerelease backup, that might have the data for the ones missed by the restore of Oct-30 backup.

Acceptance criteria:

  • Restore data(individual, encounter, program encounter, program enrolment) for calcutta kids and IHMP from the Oct-29 prod snapshot. This data should not override if any value for a field exists already(since this will be the latest) since the data from snapshot will be old.
    -- I have restored the data from the snapshot to a RDS instance(host name: oct-29-snapshot-db.avniproject.org) which can be accessed via staging EC2 instance. staging env will continue to work as usual connecting to its staging db.
  • As part of this card(Restore lost data for orgs inherited from org1 avni-server#668) looks like only data created from 27-Nov was restored.. Make sure from 30th Nov dump as well, data created from 30th October to 30th November is restored for CK and IHMP. And the data from here will be the latest and need to override data from 29th October data dump.
  • Also update somewhere with no of individuals missing the mandatory fields for both orgs before and after the fix.
@mahalakshme mahalakshme converted this from a draft issue Jan 15, 2024
@mahalakshme mahalakshme moved this from In Analysis Review to Ready in Avni Product Jan 15, 2024
@petmongrels petmongrels moved this from Ready to In Progress in Avni Product Jan 15, 2024
@petmongrels petmongrels self-assigned this Jan 15, 2024
@petmongrels petmongrels moved this from In Progress to Code Review Ready in Avni Product Jan 15, 2024
@mahalakshme mahalakshme moved this from Code Review Ready to In Code Review in Avni Product Jan 15, 2024
@mahalakshme
Copy link
Author

Was unable to review completely. Will review after 5pm since need to be away for a work. But found the below:

  • same will be run for ihmp also?
select * from program_encounter where encounter_type_id=35 and observations->>'f90bb5df-c7fa-4266-8141-3534f3e47e6c' is null and encounter_date_time is not null order by last_modified_date_time desc;

oct 30 backup had 200 less data with nulls for the program encounter. If we have lost data of encounters outside of the date range (27thNov - 14th Dec - we need to try restore them).

@petmongrels
Copy link

not planning to run for encounters in CK also

@mahalakshme
Copy link
Author

@petmongrels will analyse if need to run for encounters, we can run for individual and enrolment for now.

Regarding this point:
As part of this card(Restore lost data for orgs inherited from org1 avni-server#668) looks like only data created from 27-Nov was restored.. Make sure from 30th Nov dump as well, data created from 30th October to 30th November is restored for CK and IHMP. And the data from here will be the latest and need to override data from 29th October data dump.

  • there was an issue with addition of created_data_time in SQL in update SQL and we ran for ck right. we ran it for ihmp also?

Adding some data here for later reference:
In prod: -- individual
set role calcutta_kids;
select count(*) from public.individual where observations->>'07cf0ae9-9738-418f-b1bd-bf69ebd20e3c' is null and created_date_time < '2023-10-29'::timestamptz and created_date_time > '2019-10-29'; -- 773

in oct29 backup:
select count(*) from individual where observations->>'07cf0ae9-9738-418f-b1bd-bf69ebd20e3c' is null and created_date_time < '2023-10-29'::timestamptz and created_date_time > '2019-10-29'::timestamptz; -- 0

In prod: --enrolment
select count(*) from program_enrolment where program_id in (2) and observations->>'1ae1338b-fa7f-40c0-b11a-232531eb5919' is null and program_enrolment.created_date_time < '2023-10-29'::timestamptz and created_date_time > '2019-10-29'::timestamptz;--629

in oct29 backup:
select count(*) from program_enrolment where program_id in (2) and observations->>'1ae1338b-fa7f-40c0-b11a-232531eb5919' is null and program_enrolment.created_date_time < '2023-10-29'::timestamptz and created_date_time > '2019-10-29'::timestamptz; --0

@mahalakshme mahalakshme moved this from In Code Review to QA Ready in Avni Product Jan 16, 2024
@petmongrels petmongrels moved this from QA Ready to In Progress in Avni Product Jan 16, 2024
@petmongrels petmongrels moved this from In Progress to Code Review Ready in Avni Product Jan 16, 2024
@petmongrels petmongrels moved this from Code Review Ready to In Progress in Avni Product Jan 16, 2024
@petmongrels petmongrels moved this from In Progress to Code Review Ready in Avni Product Jan 16, 2024
@petmongrels petmongrels moved this from Code Review Ready to In QA in Avni Product Jan 16, 2024
@mahalakshme
Copy link
Author

mahalakshme commented Jan 18, 2024

@petmongrels from the below queries I think for ihmp it will help to restore data both for registeration and enrolment

check for IHMP resgisteration form - individual subject type - contact number concept :
current prod data:
select count(*) from public.individual where observations->>'82fa0dbb-92f9-4ec2-9263-49054e64d909' is null and created_date_time < '2023-10-29'::timestamptz and created_date_time > '2019-10-29'
and subject_type_id=1; -- 45357

oct 29 backup:
select count(*) from public.individual where observations->>'82fa0dbb-92f9-4ec2-9263-49054e64d909' is null and created_date_time < '2023-10-29'::timestamptz and created_date_time > '2019-10-29'
and subject_type_id=1; --45077

check for eligible couple enrolment - gravida concept:
current prod data:
select count(*) from program_enrolment where observations->>'dc2c23e9-19ad-471f-81d1-213069ccc975' is null and created_date_time < '2023-10-29'::timestamptz and created_date_time > '2019-10-29'
and program_id=11; -- 356

oct 29 backup:
select count(*) from program_enrolment where observations->>'dc2c23e9-19ad-471f-81d1-213069ccc975' is null and created_date_time < '2023-10-29'::timestamptz and created_date_time > '2019-10-29'
and program_id=11; -- 41

@petmongrels
Copy link

petmongrels commented Jan 18, 2024 via email

@mahalakshme
Copy link
Author

@petmongrels no I have added created_date_time filter in the query to consider the same count of data for both. also verified the same by running the below query in both prod and oct 29 dumps - returned the same result:

select count(*) from public.individual where created_date_time < '2023-10-29'::timestamptz and created_date_time > '2019-10-29'
                                         and subject_type_id=1; -- 101271

@mahalakshme
Copy link
Author

@petmongrels also from the below queries found that for enrolment data for program id 11, we will not be able to restore only data created outside the data loss period(except for one) from the below query which is a win for us.

executed on prod:

select count(*) from program_enrolment where observations->>'dc2c23e9-19ad-471f-81d1-213069ccc975' is null and created_date_time < '2023-10-29'::timestamptz and created_date_time > '2019-10-29'
and program_id=11; -- 356

select count(*) from program_enrolment where observations->>'dc2c23e9-19ad-471f-81d1-213069ccc975' is null and created_date_time < '2023-12-16'::timestamptz and created_date_time > '2019-10-29'
                                         and program_id=11; -- 419

select count(*) from program_enrolment where observations->>'dc2c23e9-19ad-471f-81d1-213069ccc975' is null and created_date_time < '2023-12-15'::timestamptz and created_date_time > '2023-11-27'
                                         and program_id=11; -- 62

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

3 participants