Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUMULUS-3806 update Orca Recon Report to use postgres as source of truth #3794

Open
wants to merge 17 commits into
base: feature/es-phase-2
Choose a base branch
from

Conversation

Jkovarik
Copy link
Member

@Jkovarik Jkovarik commented Sep 6, 2024

Summary: Summary of changes

Addresses Internal/Orca reconciliation report components from CUMULUS-3806

Changes

  • CUMULUS-3806
    • Update @cumulus/db/lib/granule.getGranulesByApiPropertiesQuery to
      be parameterized and include a modifier on temporalBoundByCreatedAt
    • Remove endpoint call to and all tests for Internal Reconciliation Reports
      and updated API to throw an error if report is requested
    • Update Orca reconciliation reports to pull granules for comparison from
      postgres via getGranulesByApiPropertiesQuery

PR Checklist

  • Update CHANGELOG
  • Unit tests
  • Ad-hoc testing - Deploy changes and test manually
  • Integration tests

@Jkovarik Jkovarik changed the base branch from master to feature/es-phase-2 September 6, 2024 19:10
@@ -612,114 +623,6 @@ describe('When there are granule differences and granule reconciliation is run',
});
});

// TODO: the internal report functionality will be removed after collections/granules is changed to no longer use ES
xdescribe('Create an Internal Reconciliation Report to monitor internal discrepancies', () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is removed as internal reports are going away.


const pgGranulesIterator = new QuerySearchClient(
granulesSearchQuery,
100 // arbitrary limit on how items are fetched at once
Copy link
Member Author

@Jkovarik Jkovarik Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewers: looking for opinions on value/approach here. Given we're explicitly bounding this on performance testing per team decision/given existing limitations in the tool, I'm not sure if that veers into YAGNI given this might need refactored. We're using this value for the other recon reports, however....

@Jkovarik Jkovarik changed the title CUMULUS-3806 update Orca Recon Report to nominally use postgres CUMULUS-3806 update Orca Recon Report to use postgres as source of truth Sep 11, 2024
const {
granules: granulesTable,
collections: collectionsTable,
providers: providersTable,
} = TableNames;
const temporalColumn = temporalBoundByCreatedAt ? 'created_at' : 'updated_at';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have index on granules.created_at. Do we need to create one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jennyhliu that's a complicated question. We should discuss with the PO re: perf requirements.

CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we update this if 'internal' reconciliation report is no longer needed?

Copy link
Member Author

@Jkovarik Jkovarik Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question - this should just be removed along with it's units/etc. I'd been originally planning on doing that in a separate PR, but at this point it can just be included here.

Copy link
Member Author

@Jkovarik Jkovarik Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jennyhliu - after doing work on downstream parts of this ticket, I'm going to punt that to PR#3 in this ticket.

const nextOrcaId = `${nextOrcaItem.id}:${nextOrcaItem.collectionId}`;
if (nextCumulusId < nextOrcaId) {
// Found an item that is only in Cumulus and not in ORCA.
addGranuleToReport({
// eslint-disable-next-line no-await-in-loop
await addGranuleToReport({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so addGranuleToReport is likely called for every granule to retrieve files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jennyhliu correct - this will increase the number of database calls, but by making them per-granule given how the rec reports are currently setup this should effectively disperse that load.

We absolutely need to improve the overall setup for these tests, that work should neatly fall under the 'improve rec reports' epic.

@Jkovarik
Copy link
Member Author

Jkovarik commented Oct 1, 2024

@jennyhliu I believe I've addressed your comments, thanks for the review! 🙇🏻 feature has also been merged in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants