Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add artifact docker image data source #9521

Merged

Conversation

dibunker
Copy link
Contributor

@dibunker dibunker commented Nov 26, 2023

Closes hashicorp/terraform-provider-google#16557 and similar to hashicorp/terraform-provider-google#4317, which requested the same kind of data source but for gcr, although gcr is deprecated and that issue is still open.

This PR adds a data source that makes it easier to obtain information about docker images stored in an Artifact Registry repo.

I've added:

  • Handwritten docs for the new data source

Acceptance test showing the data source can retrieve the docker image information when provided the following scenarios

  • An image name
  • An image name and tag
  • An image name and digest
  • An image name that does not have URL safe characters.

Release Note Template for Downstream PRs (will be copied)

`google_artifact_registry_docker_image`

Copy link

google-cla bot commented Nov 26, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@modular-magician
Copy link
Collaborator

Hello! I am a robot. It looks like you are a: Community Contributor Googler Core Contributor. Tests will require approval to run.

@roaks3, a repository maintainer, has been assigned to review your changes. If you have not received review feedback within 2 business days, please leave a comment on this PR asking them to take a look.

You can help make sure that review is quick by doing a self-review and by running impacted tests locally.

@modular-magician modular-magician added the awaiting-approval Pull requests that need reviewer's approval to run presubmit tests label Nov 26, 2023
@dibunker
Copy link
Contributor Author

Hey @roaks3 when you get a chance, can you take a look at my PR?

@modular-magician modular-magician removed the awaiting-approval Pull requests that need reviewer's approval to run presubmit tests label Dec 5, 2023
@modular-magician
Copy link
Collaborator

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

Terraform GA: Diff ( 4 files changed, 575 insertions(+))
Terraform Beta: Diff ( 4 files changed, 575 insertions(+))

@roaks3
Copy link
Contributor

roaks3 commented Dec 5, 2023

@dibunker sorry for the delay on this. I've taken a first pass and this mostly looks good. Running tests now, but I'm going to need to spend more time on this one verifying the fields and patterns being used, since this is a handwritten datasource.

@modular-magician
Copy link
Collaborator

Tests analytics

Total tests: 3263
Passed tests 2931
Skipped tests: 332
Affected tests: 0

$\textcolor{green}{\textsf{All tests passed in REPLAYING mode.}}$
View the build log

Copy link
Contributor

@roaks3 roaks3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a first pass (and haven't looked at tests yet), but the main callout I have is the field names being used, since they don't exactly match the API. We should just be careful we are using intuitive terms.

Optional: true,
Description: `Project ID of the project.`,
},
"repository": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you find precedence for this sort of naming? The API itself uses a mix of terms: repository_name, location (and registry_location), and docker_image (and image_id and digest), and I mostly want to make sure that these fields will be clear to users, especially if we are parsing an API field into multiple parts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, there is a lot of similar field names going on. I was thinking about it this morning, would it be better to replace the project, repository, and region with something like artifact_registry_repository_id and have users reference the id attribute from an google_artifact_registry_repository datasource or resource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @roaks3 I wrote a commit that removes the project and location fields and changes the repository field to be the fully-qualified identity of the repository.

This is inline with how the repository field is being used in the google_artifact_registry_repository resource, when they use repository within the upstream_policies block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely reasonable, but I think I would stick with the original split, simply because after looking through a large number of GCP's datasources, this seems to be the far more common pattern. I think in particular it benefits users that have a project/region set up at the provider level. So I would suggest project, location, and repository_id. IIRC location implies a region or zone can be used, so you may want to check if a zone is valid, and then decide between location and region. repository_id matches the repository resource field name.

You could still make an argument for the artifact_registry_repository_id you're suggesting, but for consistency I think this is worth changing.

For the docker_image field, that name seems fine to me (image_name might also work), but I'm a little concerned about the digest field. It seems that term is more often used for the whole name + sha, so could be ambiguous in this datasource, but TBH I'm not familiar enough with how these terms are used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Subserial as a member of the service team, could you weigh in on the field names used here? Do they reasonably match the API?

Copy link
Contributor

@Subserial Subserial Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing this to the repository data source, project, location and repository_id are best for consistency.

image, digest and tag describe the image, but I think it would be best as mentioned below to use a single image_name value, then if necessary parsing out the tag to get the digest.

This additionally removes the ambiguity with the computed tags field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roaks3 After changing to image_name field from the conversation below, I've split apart the repository argument to be as @Subserial described: a project, location, and repository_id.

Optional: true,
Description: `The image digest to fetch. This cannot be used if tag is provided.`,
},
"tag": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the field names, this one in particular is a bit ambiguous, because the resource already has a list of tags, and this is really more like a query/filter parameter. You may be able to find precedence for something like this as well, but if not I'll try to look around for similar data source patterns in the provider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, tag and tags would easily be mixed up. The intent of the issue was to have a data source that would return the URI of a particular image:tag docker image, so I think it's fine to keep those two field names. I added the tags simply because it was available information from the REST API -- it wasn't specifically asked for in the issue, but I could see someone using the tags information when populating the description field for a compute instance or cloud run service.

I'd think renaming the tags field to something less ambiguous would be good, maybe all_image_tags or all_tags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roaks3 I found that the google_compute_regions uses the status field as a filter parameter. Could this be sufficient precedence for the tag field used in this data source?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think that makes sense in terms of some precedence. I don't see this come up very much in existing datasources, so the tag + tags might be the best we can do in terms of modeling.

However, looking at https://registry.terraform.io/providers/calxus/docker/latest/docs/data-sources/image, we may want to consider using just image_name, and having the tag and digest inferred from that value. This would be particularly appealing if we don't need to parse anything client-side, and can send the name up to the server for each case, because then we can rely on the server logic and the client has less ways to break or surprise users. Based on your current implementation though, I suspect the server doesn't recognize a tag in the image name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roaks3 Sounds good! I've implemented the logic to infer tags and digest from image_name, and removed those arguments from the data source.

return fmt.Errorf("Error setting api endpoint")
}

urlRequest, err = transport_tpg.AddQueryParams(urlRequest, map[string]string{"orderBy": "update_time desc"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just verifying, is this sorting going to be relevant for all (or most) use cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more of an optimization: since there can be only one image with a particular label, it's most likely going to be one that was updated recently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense.

I think I was also asking about this case (from your doc change): If a digest or tag is not provided, then the last updated version of the image will be fetched.. Is the most recently updated going to typically be what users want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roaks3 Yes, I think that's a reasonable expectation that users have. It's also the behavior of other API calls, for example when calling the image of a family, the latest image is returned.

if err != nil {
return err
}
out:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we refactor this into a function (or multiple functions) instead of using a break label? I think it would be more clear, and we barely make use of this construct in the provider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can refactor the for loop into a function which returns once an image is found, instead of having the break out, if that's more inline with the other constructions in the provider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roaks3 I'm pushed a commit that refactors and replaces the break label construction into a function.

return dockerImages
}

func convertResponseToStruct(res map[string]interface{}) DockerImage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On first read, I'm wondering if this function is needed. If you didn't have it, and tweak the d.Set(...) calls, that might be slightly more straight-forward / consistent with other resources.

Copy link
Contributor Author

@dibunker dibunker Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it, because on a previous iteration I was getting compiler errors when iterating through the list of tags from the REST API response -- unmarshaling the response into a struct made the compiler happy, so that's what I went with.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could do a d.Set() call instead of the dockerImage.tags = stringTags line you have, but keep the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow what you're suggesting? Isn't d.Set() specific to the *schema.ResourceData type? Here I'm setting fields in the DockerImage struct.

@modular-magician modular-magician added the awaiting-approval Pull requests that need reviewer's approval to run presubmit tests label Dec 7, 2023
@dibunker
Copy link
Contributor Author

@roaks3 just pinging in case this fell off your radar. Do the latest changes I submitted address all your concerns?

@roaks3
Copy link
Contributor

roaks3 commented Mar 29, 2024

/gcbrun

Copy link
Contributor

@roaks3 roaks3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dibunker thank you for the ping, this did fall off of my plate and I apologize for the delay.

I was going to kick off the tests, but when I went to look, I think they'll need to be adjusted to work in our various test environments.

Beyond that, I do want to review the code again, but ran out of time this week. I will plan to do that early next week, and then we should have this close to complete.

// https://console.cloud.google.com/artifacts/docker/go-containerregistry/us/gcr.io
// Currently, gcr.io does not provide a imageSizeBytes or buildTime field in the JSON response
const testAccDataSourceArtifactRegistryDockerImageConfig = `
provider "google" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test config might cause issues in our test environment:

  1. You should remove the provider block
  2. When run in our test environments, we won't have these projects already created. You'll want to use the primary test project (which means you shouldn't need to specify it), but then you'll need to seed test data there. Maybe you can create what you need with another TF resource (it looks like other tests have done so)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I can remove the provider block. I just had it to make sure that the data block for testTag and testDigest were using the project defined within a provider block.

If that's not necessary, I can remove the provider block and define the project fields for the testTag and testDigest data sources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I just checked and it does look like the block won't break things, but we have no other tests that define a provider-level default for project. I would stay consistent with those and specify project on each resource/data-source that needs it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roaks3 Alright, removed the provider block and updated the tests to include the project field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it looks like my second point from above got lost in the mix. We don't have a cloudrun project already created in our test environment, so I'd expect the config to fail. You probably want to pass envvar.GetTestProjectFromEnv() into your test, which is the id of our primary test project, but then you'd need to add config to your test to set it up so these data sources can retrieve data.

Does that make sense? What you have here now looks like it is written against resources that you created ahead of time, and we just need to replicate that.

@github-actions github-actions bot requested a review from roaks3 April 2, 2024 15:39
Copy link

@dibunker, this PR is waiting for action from you. Please address any comments or change requests, or re-request review from a core reviewer if no action is required.

Image showing the re-request review button

If no action is taken, this PR will be closed in 14 days.

This notification can be disabled with the disable-automatic-closure label.

@roaks3
Copy link
Contributor

roaks3 commented Jun 4, 2024

/gcbrun

@roaks3
Copy link
Contributor

roaks3 commented Jun 4, 2024

I'm going to try running tests just to make sure we get the error I'm expecting, but I think this is basically blocked on tests running due to #9521 (comment) (test project resource needs to be included in test config)

Copy link
Contributor

@roaks3 roaks3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I can't actually run the checks (the CI pipeline has undergone a number of changes over the past month or two).

@dibunker sorry for the inconvenience, but this will need to be rebased to move forward.

@dibunker
Copy link
Contributor Author

dibunker commented Jun 4, 2024

No worries @roaks3 I should be able to look into it later this week.

@dibunker
Copy link
Contributor Author

dibunker commented Jun 4, 2024

@roaks3 Were you saying that the cloudrun project wouldn't be acceptable for your test environment? There was the brief discussion originally in my issue, preciously because Terraform is not able to create a docker image to setup as a test fixture.

@roaks3
Copy link
Contributor

roaks3 commented Jun 6, 2024

If cloudrun is a public project that can be referenced in this context, then that would be fine, I'm just not clear if there is a public project with this name. After rebase I can run the tests to confirm, but my thought was that we would need this infrastructure to be created for the test. And if it can't be created, we may need to reference existing resources in another way.

dibunker added 7 commits June 13, 2024 08:17
- Filter the list response by image name
- Added the `DockerImage` struct.
Test for a data source using a tag, and one for using a digest
- Created a yellow box for the Oauth scopes required
- Edited the attribute reference examples
dibunker added 12 commits June 13, 2024 08:17
Refactored the code a bit, too
The `buildTime` and `imageSizeBytes` are available in this repo.
Replace the break out logic with a function
Removing these in favor of using a fully-qualified repository ID.  This is in-line with the API usage of `repository` within the `upstream_policies` block.
- Parse `image_name` to detect a digest, tag, or none.
- Remove `digest` and `tag`
- Change `image` to `image_name` and updated description
- Splits into `project`, `location`, and `repository_id`
@dibunker dibunker force-pushed the add-artifact-docker-image-data-source branch from 9387906 to d9c4979 Compare June 13, 2024 15:24
@github-actions github-actions bot requested a review from roaks3 June 13, 2024 15:24
@dibunker
Copy link
Contributor Author

@roaks3 Okay, rebased the branch and the acceptance test worked on my local environment. The cloudrun artifact registry seems to be public -- you can access all the docker information using the API calls, at least.

@roaks3
Copy link
Contributor

roaks3 commented Jun 13, 2024

Awesome! I'll run tests now and see how it does.

@modular-magician modular-magician added service/artifactregistry and removed awaiting-approval Pull requests that need reviewer's approval to run presubmit tests labels Jun 13, 2024
@modular-magician
Copy link
Collaborator

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

google provider: Diff ( 4 files changed, 550 insertions(+))
google-beta provider: Diff ( 4 files changed, 550 insertions(+))

@modular-magician
Copy link
Collaborator

Tests analytics

Total tests: 24
Passed tests: 24
Skipped tests: 0
Affected tests: 0

Click here to see the affected service packages
  • artifactregistry

$\textcolor{green}{\textsf{All tests passed!}}$
View the build log

Copy link
Contributor

@roaks3 roaks3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look good, and code looks good after scanning through again. Thanks for your patience with this one, will get it merged later today.

@roaks3 roaks3 merged commit 9156b0c into GoogleCloudPlatform:main Jun 14, 2024
11 checks passed
BBBmau pushed a commit to BBBmau/magic-modules that referenced this pull request Jun 24, 2024
@dibunker dibunker deleted the add-artifact-docker-image-data-source branch June 26, 2024 21:28
pcostell pushed a commit to pcostell/magic-modules that referenced this pull request Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Artifact registry docker image data source
4 participants