Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't re-index unchanged entries #116

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

joshwlewis
Copy link
Member

Another effort to reduce the number of pulls: check the database to see if we have the exact entry already. We don't need to refetch the upstream image when the image address (which includes the sha) hasn't changed.

select exists(
select name
from buildpacks
where buildpacks.namespace = $1
Copy link
Contributor

@edmorley edmorley Jun 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super-familiar with how indexes work for Postgres, so I could be completely mis-remembering/making this up, but does the WHERE column order need to match the column order in the index for it to work? Or does that only affect when using eg GROUP BY?

ie: The existing index has the "name" column first:
https://github.com/buildpacks/registry-api/blob/80f586fee0b869a6d1c59fc4ba61fc1522926a2e/db/schema.rb#L34C14-L34C44

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My experience tells me that postgres will figure it out, but I don't mind reordering just in case it helps in some small bit.

@edmorley
Copy link
Contributor

edmorley commented Jul 19, 2023

@joshwlewis I took a look at the indexer logs in production, and whilst we're no longer hitting any rate limits after #115,
the whole indexing process takes ~5-6mins from start to finish with the current "pull everything" approach. This will worsen over time, as the number of buildpack/version combinations naturally grows.

As such, I think it would still be worth merging this PR, to both lower that 5-6 mins (which affects the latency of publishing a new buildpack release) and also protect against future rate limit issues.

As an added bonus, after this PR merges the updated_at field returned by the API will actually match the last time an entry's metadata changed, and not the last time the metadata was needlessly refreshed :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants