Add NetworkConfig and NetworkConfigs to TPU v2 QueuedResource #12482

Open
wants to merge 4 commits into
base: main
Changes from 3 commits
111 changes: 111 additions & 0 deletions mmv1/products/tpuv2/QueuedResource.yaml
@@ -54,6 +54,28 @@ examples:
    test_env_vars:
      project: 'PROJECT_NAME'
    skip_vcr: true
  - name: 'tpu_v2_queued_resource_full'
    primary_resource_id: 'qr'
    min_version: 'beta'
    vars:
      qr_name: 'test-qr'
      tpu_name: 'test-tpu'
      network_name: 'tpu-net'
      subnet_name: 'tpu-subnet'
    test_env_vars:
      project: 'PROJECT_NAME'
    skip_vcr: true
  - name: 'tpu_v2_queued_resource_network_configs'
    primary_resource_id: 'qr'
    min_version: 'beta'
    vars:
      qr_name: 'test-qr'
      tpu_name: 'test-tpu'
      network_name: 'tpu-net'
      subnet_name: 'tpu-subnet'
    test_env_vars:
      project: 'PROJECT_NAME'
    skip_vcr: true
parameters:
  - name: 'zone'
    type: String
@@ -107,3 +129,92 @@ properties:
- name: 'description'
  description: |
    Text description of the TPU.
- name: 'networkConfig'
  type: NestedObject
  description: |
    Network configurations for the TPU node.
  immutable: true
  default_from_api: true
Member

default_from_api has a couple of drawbacks:

  • Users won't get accurate information about the end state of the resource (even when it could be known).
  • Users can't set the field to its empty/zero value.

I just want to double-check: are there really no other options for this field (and the others marked with default_from_api)?

Contributor Author

Thanks for highlighting this. The network and subnetwork fields get a default value from the API when they are not set. Both are subfields of networkConfig, whose subfields are a mix: some have API defaults and some do not. I therefore set default_from_api on networkConfig itself so that subfields like network and subnetwork can pick up those defaults. The same pattern is used by a similar resource, the TPU v2 VM, where networkConfig has default_from_api: true and has been working as expected.

I double-checked queueCount: it does not get a default value from the TPU API, so I removed default_from_api from the queueCount subfield here and below.
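
As a rough illustration of the behaviour discussed above (a sketch, not a config taken from this PR): the block below leaves network and subnetwork unset and relies on the API-side defaults. The resource label `example`, the names `example-qr` and `example-tpu`, and the project `my-project` are hypothetical placeholders.

```hcl
# Hypothetical sketch: network and subnetwork are deliberately omitted.
resource "google_tpu_v2_queued_resource" "example" {
  provider = google-beta

  name = "example-qr"
  zone = "us-central1-c"

  tpu {
    node_spec {
      parent  = "projects/my-project/locations/us-central1-c"
      node_id = "example-tpu"

      node {
        runtime_version  = "tpu-vm-tf-2.13.0"
        accelerator_type = "v2-8"

        network_config {
          enable_external_ips = true
          can_ip_forward      = true
          # network and subnetwork are not set here. With default_from_api,
          # the provider accepts whatever the API returns for them rather
          # than reporting a diff on the next plan.
        }
      }
    }
  }
}
```

With this shape, the plan would show network and subnetwork as "(known after apply)", which is the first drawback raised in the review: the end state is not visible until the API has responded.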

  conflicts:
    - network_configs
  properties:
    - name: 'network'
      type: String
      description: |
        The name of the network for the TPU node. It must be a preexisting Google Compute Engine
        network. If none is provided, "default" will be used.
      immutable: true
      default_from_api: true
    - name: 'subnetwork'
      type: String
      description: |
        The name of the subnetwork for the TPU node. It must be a preexisting Google Compute
        Engine subnetwork. If none is provided, "default" will be used.
      immutable: true
      default_from_api: true
    - name: 'enableExternalIps'
      type: Boolean
      description: |
        Indicates that external IP addresses would be associated with the TPU workers. If set to
        false, the specified subnetwork or network should have Private Google Access enabled.
      immutable: true
      send_empty_value: true
    - name: 'canIpForward'
      type: Boolean
      description: |
        Allows the TPU node to send and receive packets with non-matching destination or source
        IPs. This is required if you plan to use the TPU workers to forward routes.
      immutable: true
      send_empty_value: true
    - name: 'queueCount'
      type: Integer
      description: |
        Specifies networking queue count for TPU VM instance's network interface.
      required: false
      immutable: true
- name: 'networkConfigs'
  type: Array
  description: |
    Repeated network configurations for the TPU node. This field is used to specify multiple
    network configs for the TPU node.
  min_version: 'beta'
  immutable: true
  conflicts:
    - network_config
  item_type:
    type: NestedObject
    properties:
      - name: 'network'
        type: String
        description: |
          The name of the network for the TPU node. It must be a preexisting Google Compute Engine
          network. If none is provided, "default" will be used.
        immutable: true
        default_from_api: true
      - name: 'subnetwork'
        type: String
        description: |
          The name of the subnetwork for the TPU node. It must be a preexisting Google Compute
          Engine subnetwork. If none is provided, "default" will be used.
        immutable: true
        default_from_api: true
      - name: 'enableExternalIps'
        type: Boolean
        description: |
          Indicates that external IP addresses would be associated with the TPU workers. If set to
          false, the specified subnetwork or network should have Private Google Access enabled.
        immutable: true
        send_empty_value: true
      - name: 'canIpForward'
        type: Boolean
        description: |
          Allows the TPU node to send and receive packets with non-matching destination or source
          IPs. This is required if you plan to use the TPU workers to forward routes.
        immutable: true
        send_empty_value: true
      - name: 'queueCount'
        type: Integer
        description: |
          Specifies networking queue count for TPU VM instance's network interface.
        required: false
        immutable: true
@@ -0,0 +1,44 @@
resource "google_tpu_v2_queued_resource" "{{$.PrimaryResourceId}}" {
  provider = google-beta

  name    = "{{index $.Vars "qr_name"}}"
  zone    = "us-central1-c"
  project = "{{index $.TestEnvVars "project"}}"

  tpu {
    node_spec {
      parent  = "projects/{{index $.TestEnvVars "project"}}/locations/us-central1-c"
      node_id = "{{index $.Vars "tpu_name"}}"
      node {
        runtime_version  = "tpu-vm-tf-2.13.0"
        accelerator_type = "v2-8"
        description      = "Text description of the TPU."

        network_config {
          can_ip_forward      = true
          enable_external_ips = true
          network             = google_compute_network.network.id
          subnetwork          = google_compute_subnetwork.subnet.id
          queue_count         = 32
        }
      }
    }
  }
}

resource "google_compute_subnetwork" "subnet" {
  provider = google-beta

  name          = "{{index $.Vars "subnet_name"}}"
  ip_cidr_range = "10.0.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.network.id
}

resource "google_compute_network" "network" {
  provider = google-beta

  name                    = "{{index $.Vars "network_name"}}"
  auto_create_subnetworks = false
}

@@ -0,0 +1,67 @@
resource "google_tpu_v2_queued_resource" "{{$.PrimaryResourceId}}" {
  provider = google-beta

  name    = "{{index $.Vars "qr_name"}}"
  zone    = "us-central1-c"
  project = "{{index $.TestEnvVars "project"}}"

  tpu {
    node_spec {
      parent  = "projects/{{index $.TestEnvVars "project"}}/locations/us-central1-c"
      node_id = "{{index $.Vars "tpu_name"}}"
      node {
        runtime_version  = "tpu-vm-tf-2.13.0"
        accelerator_type = "v2-8"

        # Nested objects are written as repeated blocks, one per network config,
        # rather than as a list assigned with "=".
        network_configs {
          can_ip_forward      = true
          enable_external_ips = true
          network             = google_compute_network.network_0.id
          subnetwork          = google_compute_subnetwork.subnet_0.id
          queue_count         = 32
        }

        network_configs {
          can_ip_forward      = true
          enable_external_ips = true
          network             = google_compute_network.network_1.id
          subnetwork          = google_compute_subnetwork.subnet_1.id
          queue_count         = 32
        }
      }
    }
  }
}

resource "google_compute_subnetwork" "subnet_0" {
  provider = google-beta

  name          = "{{index $.Vars "subnet_name"}}-0"
  ip_cidr_range = "10.0.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.network_0.id
}

resource "google_compute_network" "network_0" {
  provider = google-beta

  name                    = "{{index $.Vars "network_name"}}-0"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "subnet_1" {
  provider = google-beta

  name          = "{{index $.Vars "subnet_name"}}-1"
  ip_cidr_range = "10.1.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.network_1.id
}

resource "google_compute_network" "network_1" {
  provider = google-beta

  name                    = "{{index $.Vars "network_name"}}-1"
  auto_create_subnetworks = false
}