Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support specifying GCP account credentials as a config option. #4855

Merged
merged 7 commits into from
Apr 11, 2024

Conversation

teo-tsirpanis
Copy link
Member

@teo-tsirpanis teo-tsirpanis commented Apr 4, 2024

SC-44515


TYPE: CONFIG
DESC: Add vfs.gcs.service_account_credential config option that specifies a Google Cloud service account credential JSON string.


TYPE: CONFIG
DESC: Add vfs.gcs.external_account_credential config option that specifies a Google Cloud Workload Identity Federation credential JSON string.

Copy link

@teo-tsirpanis
Copy link
Member Author

I don't know why the tests are failing; they succeed on my local Windows machine in both Release and Debug mode.

@ihnorton ihnorton requested a review from thetorpedodog April 4, 2024 14:08
Copy link

@thetorpedodog thetorpedodog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should be able to get this tested sometime today, so will get back to you with information about whether everything works end-to-end once I have done so.

In the meantime, I have some minor thoughts about naming, to be consistent with the way Google names and documents things. None of these are requirements, just ideas to consider.

  • vfs.gcs.service_account_credentials might be better named vfs.gcs.[service_?]account_key, since Google Cloud describes these files as "account keys". (As for the inclusion or exclusion of service_, I think you might be able to provide a user account JSON string here and have it work, but I'm not positive. Even then, it's probably OK to include the service_ since that’s the intended use case, but it could be omitted for to keep the config name shorter.)
  • vfs.gcs.external_account_credentials: maybe workload_identity_config[uration], federated_identity_config[uration], or identity_pool_config[uration] since those better match to the public naming and documentation of workload identity federation. It's not really a credential in and of itself; it's more like a configuration that tells you how to fetch a credential. (That said it is sometimes described as an "external account" in code. Naming is hard ¯\_(°_o)_/¯.)

Out of everything I think I'm most confident in that we should use _key for the service account key configuration field. Beyond that, it's just vibes.

@teo-tsirpanis
Copy link
Member Author

I took the names from the Google Cloud functions MakeServiceAccountCredentials and MakeExternalAccountCredentials. For comparison Python calls the function from_service_account_info; I couldn't find an equivalent for external credentials. @KiterLuc what do you think?

you might be able to provide a user account JSON string here and have it work

I checked it and unfortunately this logic exists only for Application Default Credentials (the file in GOOGLE_APPLICATION_CREDENTIALS).

@thetorpedodog
Copy link

I took the names from the Google Cloud functions MakeServiceAccountCredentials and MakeExternalAccountCredentials. For comparison Python calls the function from_service_account_info; I couldn't find an equivalent for external credentials. @KiterLuc what do you think?

The way I look at it, it’s taking some input (the key, the federation info) to make the credentials; i.e., the credentials are the output, not the input. Also see the specific doc comment on MakeServiceAccountCredentials where the input is referred to specifically as a key.

@thetorpedodog
Copy link

I can confirm that everything seems to work here! I built this version of TileDB, pointed the Python library at it in a venv, and used this code:

#!/usr/bin/env python

import argparse
import shutil

import tiledb


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", help="file to write to", required=True)
    parser.add_argument(
        "--impersonate", help="impersonate these accounts to open the file"
    )
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--federated-config",
        help="file with a federated workflow configuration",
        type=argparse.FileType(),
    )
    group.add_argument(
        "--service-account-key",
        help="file with a service account key",
        type=argparse.FileType(),
    )
    parser.add_argument("input_file", help="file to read from")

    parsed = parser.parse_args()

    tdb_cfg = {}
    if fedfile := parsed.federated_config:
        with fedfile:
            tdb_cfg["vfs.gcs.external_account_credentials"] = fedfile.read()
    if keyfile := parsed.service_account_key:
        with keyfile:
            tdb_cfg["vfs.gcs.service_account_credentials"] = keyfile.read()

    if parsed.impersonate:
        tdb_cfg["vfs.gcs.impersonate_service_account"] = parsed.impersonate

    vfs = tiledb.VFS(tdb_cfg)

    with vfs.open(parsed.input_file, "rb") as infile:
        with vfs.open(parsed.output, "wb") as outfile:
            shutil.copyfileobj(infile, outfile)


if __name__ == "__main__":
    main()

Cases I tested it with:

$ ./open-with-tiledb.py \
>   --output key-output.png \
>   --service-account-key ./gcp-account-key-impersonator.json \
>   --impersonate [email protected] \
>   gcs://some-bucket/some-file.png
$ ./open-with-tiledb.py \
>   --output federated-output.png \
>   --federated-config ./impersonate-from-url.json \
>   gcs://some-bucket/some-file.png

Weirdly, it seems like using application-default credentials didn’t work right (i.e., specifying neither service-account-key nor federated-config), though running gsutil cp gs://some-bucket/some-file.png gsutil-output.png did work. Could this be a me problem?

Here is the relevant output:

tiledb.cc.TileDBError: [TileDB::GCS] Error: Get object size failed on: some-file.png (Retry policy exhausted, with a last message of Could not create a OAuth2 access token to authenticate the request. The request was not sent, as such an access token is required to complete the request successfully. Learn more about Google Cloud authentication at https://cloud.google.com/docs/authentication. The underlying error message was: PerformWork() - CURL error [6]=Couldn't resolve host name)

@teo-tsirpanis
Copy link
Member Author

I will rename them tomorrow to vfs.gcs.service_account_key and vfs.gcs.workload_identity_configuration, if there are no other objections.

@thetorpedodog
Copy link

I will rename them tomorrow to vfs.gcs.service_account_key and vfs.gcs.workload_identity_configuration, if there are no other objections.

I am happy with both of those names but feel free to use another if you come up with something you think is better. We have time to decide before this is finally merged.

@thetorpedodog
Copy link

Weirdly, it seems like using application-default credentials didn’t work right (i.e., specifying neither service-account-key nor federated-config), though running gsutil cp gs://some-bucket/some-file.png gsutil-output.png did work. Could this be a me problem?

It was a me problem. There is a difference between being logged in to your project within the Google Cloud CLI and actually having application-default credentials.

You are logged into Google Cloud services within the Google Cloud command-line utilities when you run gcloud init. After doing so, you need to run gcloud auth application-default login for that account to be made available as application-default credentials.

After doing so, my ADCs were created and all three test cases worked perfectly.

@teo-tsirpanis teo-tsirpanis marked this pull request as ready for review April 8, 2024 16:16
@teo-tsirpanis
Copy link
Member Author

Test failures were fixed.

@teo-tsirpanis teo-tsirpanis marked this pull request as draft April 8, 2024 16:17
Fixes tests in CI. We set the `CLOUD_STORAGE_ENDPOINT` environment variable, which prevented the credentials config options from having effect.
Because setting the credentials config options is a more explicit action, it should take precedence over the environment variable.
@teo-tsirpanis teo-tsirpanis marked this pull request as ready for review April 10, 2024 14:15
@KiterLuc KiterLuc requested a review from robertbindar April 10, 2024 14:33
Copy link

@thetorpedodog thetorpedodog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! There are a couple comments and strings that should probably be updated to use the new terminology but functionally everything is in order.

* neither is specified, Application Default Credentials will be used. <br>
* **Default**: ""
* - `vfs.gcs.workload_identity_configuration` <br>
* Set the JSON string with Workload Identity Federation credentials.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: credentials to configuration here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

Comment on lines 198 to 199
"Both GCS service account credentials and external account "
"credentials were specified; picking the former");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another string to update with the new terminology (maybe the exact config key name?). I also might phrase this as something like "service account key set; ignoring workload identity configuration" to make it a little clearer (at least in my opinion; you are of course free to leave it as-is).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@KiterLuc KiterLuc removed the request for review from robertbindar April 10, 2024 14:44
@KiterLuc KiterLuc closed this Apr 10, 2024
@KiterLuc KiterLuc reopened this Apr 10, 2024
@KiterLuc KiterLuc closed this Apr 11, 2024
@KiterLuc KiterLuc reopened this Apr 11, 2024
Copy link
Contributor

@robertbindar robertbindar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd usually prefer fixing the precedence of the 2 configs+env in a test, it means it's intentional and we support it that way, I'm also ok with merging this as is.

@KiterLuc KiterLuc merged commit 0ae11e2 into dev Apr 11, 2024
57 checks passed
@KiterLuc KiterLuc deleted the teo/gcs-credentials branch April 11, 2024 12:26
teo-tsirpanis added a commit that referenced this pull request Apr 11, 2024
[SC-44515](https://app.shortcut.com/tiledb-inc/story/44515/support-specifying-gcp-account-credentials-as-a-config-option)

---
TYPE: CONFIG
DESC: Add `vfs.gcs.service_account_credential` config option that
specifies a Google Cloud service account credential JSON string.

---
TYPE: CONFIG
DESC: Add `vfs.gcs.external_account_credential` config option that
specifies a Google Cloud Workload Identity Federation credential JSON
string.
KiterLuc pushed a commit that referenced this pull request Apr 11, 2024
… a config option. (#4871)

Backport 0ae11e2 from #4855.

---
TYPE: CONFIG
DESC: Add `vfs.gcs.impersonate_service_account` option that specifies a
service account to impersonate, or a comma-separated list for delegated
impersonation.

---
TYPE: IMPROVEMENT
DESC: Stop using deprecated Google Cloud SDK APIs.
dudoslav pushed a commit that referenced this pull request Apr 17, 2024
[SC-44515](https://app.shortcut.com/tiledb-inc/story/44515/support-specifying-gcp-account-credentials-as-a-config-option)

---
TYPE: CONFIG
DESC: Add `vfs.gcs.service_account_credential` config option that
specifies a Google Cloud service account credential JSON string.

---
TYPE: CONFIG
DESC: Add `vfs.gcs.external_account_credential` config option that
specifies a Google Cloud Workload Identity Federation credential JSON
string.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants