
Pin tensorflow_version on TPU Node Full Test so that the test passes #9786

Merged


@zeleena (Contributor) commented Jan 9, 2024

Pin tensorflow_version on TPU Node Full Test so that the test passes. Fixes hashicorp/terraform-provider-google#16703

mmv1/products/tpu/Node.yaml already uses tpu_node_full.tf.erb for documentation and tpu_node_full_test.tf.erb for tests. This change only updates tpu_node_full_test.tf.erb, so it shouldn't affect the public documentation.

Release Note Template for Downstream PRs (will be copied)


@modular-magician (Collaborator)

Hello! I am a robot. It looks like you are a: Community Contributor Googler Core Contributor. Tests will run automatically.

@zli82016, a repository maintainer, has been assigned to review your changes. If you have not received review feedback within 2 business days, please leave a comment on this PR asking them to take a look.

You can help make sure that review is quick by doing a self-review and by running impacted tests locally.

@modular-magician (Collaborator)

Hi there, I'm the Modular magician. I've detected the following information about your changes:

Diff report

Your PR generated some diffs in downstreams - here they are.

Terraform GA: Diff (1 file changed, 1 insertion(+), 4 deletions(-))
Terraform Beta: Diff (1 file changed, 1 insertion(+), 4 deletions(-))
TF OiCS: Diff (1 file changed, 1 insertion(+), 4 deletions(-))

@modular-magician (Collaborator)

Tests analytics

Total tests: 4
Passed tests: 4
Skipped tests: 0
Affected tests: 0

Affected service packages:
  • tpu

All tests passed in REPLAYING mode.

@zli82016 (Member) commented Jan 9, 2024

Hello @zeleena, did you run the test TestAccTPUNode_tpuNodeFullTestExample locally, and did it pass?

@zeleena (Contributor, Author) commented Jan 10, 2024

Hi @zli82016. I ran the test locally. On its own, it fails with:

bootstrap_test_utils.go:428: Error bootstrapping shared test global address "tf-bootstrap-addr-vpc-network-1": googleapi: Error 400: Invalid value for field 'resource.network': 'projects//global/networks/tf-bootstrap-net-vpc-network-1'. The project '' was not found., invalid

That error seems unrelated to this fix. It appears to come from the test_vars_overrides on network_name and the function acctest.BootstrapSharedServiceNetworkingConnection(t, "vpc-network-1").

To get past it, I had to modify the networking configuration to mirror the basic example. The test may still hit capacity errors, but those are retryable, and when capacity is available the test passes:

$ make testacc TEST=./google/services/tpu TESTARGS='-run=TestAccTPUNode_tpuNodeFullTestExample'
TF_ACC=1 TF_SCHEMA_PANIC_ON_ERROR=1 go test ./google/services/tpu -v -run=TestAccTPUNode_tpuNodeFullTestExample -timeout 240m -ldflags="-X=github.com/hashicorp/terraform-provider-google/version.ProviderVersion=acc"
=== RUN   TestAccTPUNode_tpuNodeFullTestExample
=== PAUSE TestAccTPUNode_tpuNodeFullTestExample
=== CONT  TestAccTPUNode_tpuNodeFullTestExample
--- PASS: TestAccTPUNode_tpuNodeFullTestExample (269.03s)
PASS
ok  	github.com/hashicorp/terraform-provider-google/google/services/tpu	269.162s

This fix won't address the capacity errors, but it does fix the bug caused by the random set of TensorFlow versions.

@zli82016 (Member)

Re: BootstrapSharedServiceNetworkingConnection

Thanks for testing it locally.

The bootstrapped network error may be caused by the GOOGLE_PROJECT environment variable not being set locally (note the empty project in 'projects//global/networks/...'). It is not related to this PR.

@zli82016 (Member) left a comment

LGTM

@zli82016 merged commit 927f3c9 into GoogleCloudPlatform:main on Jan 10, 2024
12 checks passed
@zeleena deleted the tpu-tensorflow-versions-list branch on January 10, 2024, 19:15
Development

Successfully merging this pull request may close these issues.

Failing test(s): TestAccTPUNode_tpuNodeFullTestExample
3 participants