Added support for RoCE - zVM HCP (IBM#276)
- Added support for RoCE - zVM HCP
- Updated deletion procedure for HCP
- Updated parm files for HCP zVM

---------

Signed-off-by: veera-damisetti <[email protected]>
Signed-off-by: DAMISETTI-VEERABHADRARAO <[email protected]>
Signed-off-by: Amadeuds Podvratnik <[email protected]>
Signed-off-by: Klaus Smolin <[email protected]>
Signed-off-by: K Shiva Sai <[email protected]>
Signed-off-by: Sumit Kumar Solanki <[email protected]>
Signed-off-by: root <[email protected]>
Signed-off-by: Mohammed Zeeshan Ahmed <[email protected]>
Co-authored-by: Amadeuds Podvratnik <[email protected]>
Co-authored-by: Klaus Smolin <[email protected]>
Co-authored-by: k-shiva-sai <[email protected]>
Co-authored-by: K Shiva Sai <[email protected]>
Co-authored-by: Sumit Solanki <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Mohammed Ahmed <[email protected]>
Co-authored-by: Sumit Kumar Solanki <[email protected]>
9 people authored May 29, 2024
1 parent 822f97d commit 318e0c8
Showing 13 changed files with 118 additions and 29 deletions.
1 change: 1 addition & 0 deletions docs/run-the-playbooks-for-hypershift.md
@@ -9,6 +9,7 @@
* If using dynamic IP for agents, make sure you have entries in DHCP Server for macaddresses you are using in installation to map to IPv4 addresses and along with this DHCP server should make your IPs to use nameserver which you have configured.
## Note:
* As of now we are supporting only macvtap for hypershift Agent based installation for KVM compute nodes.
* Supported network modes for zVM : vswitch, OSA, RoCE

## Step-1: Setup Ansible Vault for Management Cluster Credentials
### Overview
2 changes: 1 addition & 1 deletion docs/set-variables-group-vars.md
@@ -253,7 +253,7 @@
**hypershift.agents_parms.ram** | RAM for agents | 16384
**hypershift.agents_parms.vcpus** | vCPUs for agents | 4
**hypershift.agents_parms.nameserver** | Nameserver to be used for agents | 192.168.10.1
**hypershift.agents_parms.zvm_parameters.network_mode** | Network mode for zvm nodes <br /> Supported modes: vswitch,osa | vswitch
**hypershift.agents_parms.zvm_parameters.network_mode** | Network mode for zvm nodes <br /> Supported modes: vswitch,osa, RoCE | vswitch
**hypershift.agents_parms.zvm_parameters.disk_type** | Disk type for zvm nodes <br /> Supported disk types: fcp, dasd | dasd
**hypershift.agents_parms.zvm_parameters.vcpus** | CPUs for each zvm node | 4
**hypershift.agents_parms.zvm_parameters.memory** | RAM for each zvm node | 16384
2 changes: 0 additions & 2 deletions inventories/default/group_vars/all.yaml.template
@@ -308,12 +308,10 @@ hypershift:
# AgentServiceConfig Parameters

asc:
url_for_ocp_release_file:
db_volume_size: "10Gi"
fs_volume_size: "10Gi"
ocp_version:
iso_url:
root_fs_url:
mce_namespace: multicluster-engine # This is the Recommended Namespace for Multicluster Engine operator

agents_parms:
2 changes: 1 addition & 1 deletion playbooks/create_agents_and_wait_for_install_complete.yaml
@@ -20,7 +20,7 @@
- name: Scale Nodepool & Configure Haproxy on bastion for hosted workers
hosts: bastion_hypershift
roles:
- scale_nodepool_and_wait_for_workers_hypershift
- scale_nodepool_and_wait_for_compute_hypershift
- add_hc_workers_to_haproxy_hypershift

- name: Wait for all Console operators to come up
8 changes: 1 addition & 7 deletions roles/boot_zvm_nodes_hypershift/tasks/main.yaml
@@ -43,12 +43,6 @@
shell: oc get agents -n {{ hypershift.hcp.clusters_namespace }}-{{ hypershift.hcp.hosted_cluster_name }} --no-headers -o custom-columns=NAME:.metadata.name,APPROVED:.spec.approved | awk '$2 == "false"'
register: agent_name

- name: Approve agents
- name: Approve agents
shell: oc -n {{ hypershift.hcp.clusters_namespace }}-{{ hypershift.hcp.hosted_cluster_name }} patch agent {{ agent_name.stdout.split(' ')[0] }} -p '{"spec":{"approved":true,"hostname":"compute-{{ item }}.{{ hypershift.hcp.hosted_cluster_name }}.{{ hypershift.hcp.basedomain }}"}}' --type merge
when: "{{ hypershift.mce.version != '2.4' }}"

- name: Approve agents and patch installer args
shell: oc -n {{ hypershift.hcp.clusters_namespace }}-{{ hypershift.hcp.hosted_cluster_name }} patch agent {{ agent_name.stdout.split(' ')[0] }} -p '{"spec":{"approved":true,"hostname":"compute-{{item}}.{{ hypershift.hcp.hosted_cluster_name }}.{{ hypershift.hcp.basedomain }}","installerArgs":"[\"--append-karg\",\"rd.neednet=1\", \"--append-karg\", \"ip={{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.ip }}::{{ hypershift.agents_parms.zvm_parameters.gateway }}:{{ hypershift.agents_parms.zvm_parameters.subnetmask }}:compute-{{ item }}.{{ hypershift.hcp.hosted_cluster_name }}.{{ hypershift.hcp.basedomain }}:{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.ifname }}:none\", \"--append-karg\", \"nameserver={{ hypershift.agents_parms.zvm_parameters.nameserver }}\", \"--append-karg\",\"rd.znet={{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.nettype }},{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.subchannels }},{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.options }}\",\"--append-karg\", {% if hypershift.agents_parms.zvm_parameters.disk_type | lower != 'fcp' %}\"rd.dasd=0.0.{{ hypershift.agents_parms.zvm_parameters.nodes[item].dasd.disk_id }}\"{% else %}\"rd.zfcp={{ hypershift.agents_parms.zvm_parameters.nodes[item].lun[0].paths[0].fcp}},{{ hypershift.agents_parms.zvm_parameters.nodes[item].lun[0].paths[0].wwpn }},{{ hypershift.agents_parms.zvm_parameters.nodes[item].lun[0].id }}\"{% endif %}]"}}' --type merge
when: "{{ hypershift.mce.version == '2.4' }}"


5 changes: 4 additions & 1 deletion roles/boot_zvm_nodes_hypershift/templates/boot_nodes.py
@@ -13,7 +13,7 @@
parser.add_argument("--kernel", type=str, help="kernel URI", required=True, default='')
parser.add_argument("--cmdline", type=str, help="kernel cmdline", required=True, default='')
parser.add_argument("--initrd", type=str, help="Initrd URI", required=True, default='')
parser.add_argument("--network", type=str, help="Network mode for zvm nodes Supported modes: OSA, vswitch ", required=True)
parser.add_argument("--network", type=str, help="Network mode for zvm nodes Supported modes: OSA, vswitch, RoCE ", required=True)

args = parser.parse_args()

@@ -25,6 +25,9 @@
if args.network.lower() == 'osa':
interfaces=[{ "type": "osa", "id": "{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.subchannels.split(',') | map('regex_replace', '0.0.', '') | join(',') }}"}]

elif args.network.lower() == 'roce':
interfaces=[{ "type": "pci", "id": "{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.ifname }}"}]

guest_parameters = {
"boot_method": "network",
"storage_volumes" : [],
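The network-mode branch added in this hunk can be read as a small mapping from mode to interface definition. The following is a minimal standalone sketch of that mapping, not the templated script itself: the function name `build_interfaces` and its parameters are hypothetical, a literal string replace stands in for the template's `regex_replace`, and the empty-list fallback for other modes (including vswitch) is an assumption, since that branch is not visible in the excerpt.

```python
def build_interfaces(network_mode, subchannels, ifname):
    """Return the guest interface list for a zVM network mode (illustrative sketch)."""
    mode = network_mode.lower()
    if mode == "osa":
        # Mirror the template: drop the "0.0." prefix from each subchannel ID.
        ids = ",".join(s.replace("0.0.", "") for s in subchannels.split(","))
        return [{"type": "osa", "id": ids}]
    if mode == "roce":
        # RoCE cards are attached as PCI devices, identified by the interface name.
        return [{"type": "pci", "id": ifname}]
    # Assumption: any other mode (e.g. vswitch) needs no explicit interface entry.
    return []


if __name__ == "__main__":
    print(build_interfaces("RoCE", "", "eth0"))
    print(build_interfaces("OSA", "0.0.1000,0.0.1001,0.0.1002", ""))
```

Comparing on the lower-cased mode, as the original does, means `RoCE`, `roce`, and `ROCE` are all accepted from the variables file.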
30 changes: 27 additions & 3 deletions roles/create_agentserviceconfig_hypershift/tasks/main.yaml
@@ -1,7 +1,4 @@
---
- name: Get OCP Release Version
shell: curl -s {{ hypershift.asc.url_for_ocp_release_file }} | awk '/machine-os / { print $2 }'
register: ocp_release_version

- name: Create Config map mirror-config ( For updating AgentServiceConfig with the brew mirror information )
template:
@@ -11,6 +8,33 @@
- name: Deploy Config map - mirror config
shell: oc apply -f /root/ansible_workdir/mirror-config.yaml

- name: Downloading ISO for fetching RHCOS version
get_url:
url: "{{ hypershift.asc.iso_url }}"
dest: /root/ansible_workdir/s390x.iso

- name: Mounting ISO
mount:
path: "/mnt"
src: "/root/ansible_workdir/s390x.iso"
fstype: "iso9660"
opts: "loop"
state: "mounted"

- name: Getting RHCOS Version
shell: cat /mnt/coreos/kargs.json | jq -r '.default' | cut -d'-' -f2- | cut -d ' ' -f 1
register: ocp_release_version

- name: unmount ISO
mount:
path: "/mnt"
state: "unmounted"

- name: Delete ISO file
file:
path: "/root/ansible_workdir/s390x.iso"
state: absent

- name: Create agenterviceconfig.yaml
template:
src: agent_service_config.yaml.j2
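The "Getting RHCOS Version" task above pipes `kargs.json` through `jq` and two `cut` calls. As a rough illustration of what that pipeline extracts, here is a Python sketch of the same steps. The function name and the sample JSON are hypothetical; the real contents of `/mnt/coreos/kargs.json` depend on the ISO.

```python
import json


def rhcos_version_from_kargs(kargs_json):
    """Mimic: jq -r '.default' | cut -d'-' -f2- | cut -d ' ' -f 1"""
    default = json.loads(kargs_json)["default"]
    # cut -d'-' -f2-: keep everything after the first '-'.
    # cut passes a line through unchanged when the delimiter is absent.
    after_dash = default.split("-", 1)[1] if "-" in default else default
    # cut -d' ' -f1: keep the text before the first space.
    return after_dash.split(" ", 1)[0]


if __name__ == "__main__":
    # Hypothetical sample; not taken from a real ISO.
    sample = json.dumps(
        {"default": "coreos.liveiso=rhcos-415.92.202402130021-0 ignition.firstboot"}
    )
    print(rhcos_version_from_kargs(sample))
```

Because the first `cut` splits on the first hyphen in the whole string, the result relies on no earlier kernel argument containing a `-`.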
@@ -21,5 +21,4 @@ spec:
- openshiftVersion: "{{ hypershift.asc.ocp_version }}"
version: "{{ ocp_release_version.stdout_lines[0] }}"
url: "{{ hypershift.asc.iso_url }}"
rootFSUrl: "{{ hypershift.asc.root_fs_url }}"
cpuArchitecture: "{{ hypershift.hcp.arch }}"
10 changes: 8 additions & 2 deletions roles/create_hcp_InfraEnv_hypershift/tasks/main.yaml
@@ -50,6 +50,11 @@
dest: /usr/local/bin/
remote_src: true

- name: Get ICSP for Hosted Control Plane
template:
src: icsp.yaml.j2
dest: /root/ansible_workdir/icsp.yaml

- name: Create a Hosted Cluster
command: >
hcp create cluster agent
@@ -61,9 +66,10 @@
--api-server-address=api.{{ hypershift.hcp.hosted_cluster_name }}.{{ hypershift.hcp.basedomain }}
--ssh-key ~/.ssh/{{ env.ansible_key_name }}.pub
{% if hypershift.hcp.high_availabiliy == false %}
--control-plane-availability-policy "SingleReplica"
--control-plane-availability-policy "SingleReplica"
{% endif %}
--infra-availability-policy "SingleReplica"
--infra-availability-policy "SingleReplica"
--image-content-sources /root/ansible_workdir/icsp.yaml
--release-image=quay.io/openshift-release-dev/ocp-release:{{ hypershift.hcp.ocp_release }}
{% set release_image = lookup('env', 'HCP_RELEASE_IMAGE') %}
{% if release_image is defined and release_image != '' %}
9 changes: 9 additions & 0 deletions roles/create_hcp_InfraEnv_hypershift/templates/icsp.yaml.j2
@@ -0,0 +1,9 @@
- mirrors:
- brew.registry.redhat.io
source: registry.redhat.io
- mirrors:
- brew.registry.redhat.io
source: registry.stage.redhat.io
- mirrors:
- brew.registry.redhat.io
source: registry-proxy.engineering.redhat.com
73 changes: 64 additions & 9 deletions roles/delete_resources_bastion_hypershift/tasks/main.yaml
@@ -6,15 +6,41 @@
- name: Scale in Nodepool
command: oc -n {{ hypershift.hcp.clusters_namespace }} scale nodepool {{ hypershift.hcp.hosted_cluster_name }} --replicas 0

- name: Wait for Worker Nodes to Detach
k8s_info:
api_version: v1
kind: Node
kubeconfig: "/root/ansible_workdir/hcp-kubeconfig"
register: nodes
until: nodes.resources | length == 0
retries: 30
delay: 10
- block:
- name: Wait for Worker Nodes to Detach
k8s_info:
api_version: v1
kind: Node
kubeconfig: "/root/ansible_workdir/hcp-kubeconfig"
register: nodes
until: nodes.resources | length == 0
retries: 30
delay: 10
rescue:
- name: Getting basedomain
shell: oc get hc {{ hypershift.hcp.hosted_cluster_name }} -n {{ hypershift.hcp.clusters_namespace }} -o json | jq -r '.spec.dns.baseDomain'
register: base_domain

- name: Deleting the compute nodes manually
command: oc delete no compute-{{item}}.{{ hypershift.hcp.hosted_cluster_name }}.{{ base_domain.stdout }} --kubeconfig /root/ansible_workdir/hcp-kubeconfig
loop: "{{ range(hypershift.agents_parms.agents_count|int) | list }}"

- name: Get machine names
command: oc get machine.cluster.x-k8s.io -n {{ hypershift.hcp.clusters_namespace }}-{{ hypershift.hcp.hosted_cluster_name }} --no-headers
register: machines_info

- name: Create List for machines
set_fact:
machines: []

- name: Get the List of machines
set_fact:
machines: "{{ machines + [machines_info.stdout.split('\n')[item].split(' ')[0]] }}"
loop: "{{ range(hypershift.agents_parms.agents_count|int) | list }}"

- name: Patch the machines to remove finalizers
shell: oc patch machine.cluster.x-k8s.io "{{ machines[item] }}" -n "{{ hypershift.hcp.clusters_namespace }}-{{ hypershift.hcp.hosted_cluster_name }}" -p '{"metadata":{"finalizers":null}}' --type=merge
loop: "{{ range(hypershift.agents_parms.agents_count|int) | list }}"

- name: Wait for Agentmachines to delete
k8s_info:
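The rescue path above collects machine names by splitting the `oc get machine.cluster.x-k8s.io --no-headers` output on newlines and taking the first column of each row. A small sketch of that parsing step, with a hypothetical function name and made-up sample output, might look like this. Unlike the indexed lookup in the playbook, it tolerates fewer rows than the expected agent count.

```python
def machine_names(oc_output, count):
    """Return the first column of up to `count` rows of `oc get ... --no-headers` output."""
    names = []
    for line in oc_output.splitlines()[:count]:
        fields = line.split()  # split on any run of whitespace
        if fields:
            names.append(fields[0])
    return names


if __name__ == "__main__":
    # Hypothetical output; column values are illustrative only.
    sample = "hcp-abc12   hcp   Deleting\nhcp-def34   hcp   Deleting\n"
    print(machine_names(sample, 2))
```

Each returned name would then be passed to the finalizer-removal patch shown in the hunk.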
@@ -74,6 +100,35 @@
name: "{{ hypershift.hcp.clusters_namespace }}"
state: absent

- name: Wait for managed cluster resource to be deleted
shell: oc get managedcluster "{{ hypershift.hcp.hosted_cluster_name }}"
register: managedcluster
until: managedcluster.rc != 0
retries: 50
delay: 25
when: hypershift.mce.delete == true
ignore_errors: yes

- fail:
msg: "Managed cluster resource for HCP is not getting deleted"
when: managedcluster.rc == 0 and managedcluster.attempts >= 40

- name: Disable local-cluster component in MCE
command: oc patch mce {{ hypershift.mce.instance_name }} -p '{"spec":{"overrides":{"components":[{"name":"local-cluster","enabled":false}]}}}' --type merge

- name: Wait for local-cluster components to be deleted
shell: oc get ns local-cluster
register: localcluster
until: localcluster.rc != 0
retries: 40
delay: 20
when: hypershift.mce.delete == true
ignore_errors: yes

- fail:
msg: "local-cluster namespace is still present"
when: localcluster.rc == 0 and localcluster.attempts >= 40

- name: Delete AgentServiceConfig
k8s:
api_version: agent-install.openshift.io/v1beta1
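The two wait tasks in this hunk follow the same pattern: run an `oc get` command repeatedly and stop once it returns a non-zero exit code, meaning the resource is gone. A minimal sketch of that `until` / `retries` / `delay` loop follows. The function name and the injectable `run` and `sleep` parameters are hypothetical, added only so the loop can be exercised without a cluster.

```python
import subprocess
import time


def wait_until_gone(cmd, retries, delay, run=subprocess.run, sleep=time.sleep):
    """Poll `cmd` until it exits non-zero; return (gone, attempts_used)."""
    for attempt in range(1, retries + 1):
        rc = run(cmd, capture_output=True).returncode
        if rc != 0:
            # The lookup failed, so the resource no longer exists.
            return True, attempt
        if attempt < retries:
            sleep(delay)
    return False, retries


if __name__ == "__main__":
    # Example with the values from the local-cluster wait task (requires `oc` on PATH):
    # gone, attempts = wait_until_gone(["oc", "get", "ns", "local-cluster"], 40, 20)
    pass
```

A `False` result corresponds to the case the subsequent `fail` task reports.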
@@ -27,7 +27,7 @@
when: hypershift.compute_node_type | lower != 'zvm'

- name: Patch Agents
shell: oc -n {{ hypershift.hcp.clusters_namespace }}-{{ hypershift.hcp.hosted_cluster_name }} patch agent {{ agents[item] }} -p '{"spec":{"installation_disk_id":"/dev/vda","approved":true,"hostname":"worker-{{item}}.{{ hypershift.hcp.hosted_cluster_name }}.{{ hypershift.hcp.basedomain }}"}}' --type merge
shell: oc -n {{ hypershift.hcp.clusters_namespace }}-{{ hypershift.hcp.hosted_cluster_name }} patch agent {{ agents[item] }} -p '{"spec":{"installation_disk_id":"/dev/vda","approved":true,"hostname":"compute-{{item}}.{{ hypershift.hcp.hosted_cluster_name }}.{{ hypershift.hcp.basedomain }}"}}' --type merge
loop: "{{ range(hypershift.agents_parms.agents_count|int) | list }}"
when: hypershift.compute_node_type | lower != 'zvm'

Original file line number Diff line number Diff line change
@@ -1 +1 @@
rd.neednet=1 console=ttysclp0 coreos.live.rootfs_url=http://{{ hypershift.bastion_hypershift }}:8080/rootfs.img ip={{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.ip }}::{{ hypershift.agents_parms.zvm_parameters.gateway }}:{{ hypershift.agents_parms.zvm_parameters.subnetmask }}::{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.ifname }}:none nameserver={{ hypershift.agents_parms.zvm_parameters.nameserver }} zfcp.allow_lun_scan=0 rd.znet={{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.nettype }},{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.subchannels }},{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.options }} {% if hypershift.agents_parms.zvm_parameters.disk_type | lower != 'fcp' %}rd.dasd=0.0.{{ hypershift.agents_parms.zvm_parameters.nodes[item].dasd.disk_id }}{% else %}rd.zfcp={{ hypershift.agents_parms.zvm_parameters.nodes[item].lun[0].paths[0].fcp}},{{ hypershift.agents_parms.zvm_parameters.nodes[item].lun[0].paths[0].wwpn }},{{ hypershift.agents_parms.zvm_parameters.nodes[item].lun[0].id }} {% endif %} random.trust_cpu=on rd.luks.options=discard ignition.firstboot ignition.platform.id=metal console=tty1 console=ttyS1,115200n8 coreos.inst.persistent-kargs="console=tty1 console=ttyS1,115200n8"
rd.neednet=1 ai.ip_cfg_override=1 console=ttysclp0 coreos.live.rootfs_url=http://{{ hypershift.bastion_hypershift }}:8080/rootfs.img ip={{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.ip }}::{{ hypershift.agents_parms.zvm_parameters.gateway }}:{{ hypershift.agents_parms.zvm_parameters.subnetmask }}{% if hypershift.agents_parms.zvm_parameters.network_mode | lower != 'roce' %}::{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.ifname }}:none{% endif %} nameserver={{ hypershift.agents_parms.zvm_parameters.nameserver }} zfcp.allow_lun_scan=0 {% if hypershift.agents_parms.zvm_parameters.network_mode | lower != 'roce' %}rd.znet={{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.nettype }},{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.subchannels }},{{ hypershift.agents_parms.zvm_parameters.nodes[item].interface.options }}{% endif %} {% if hypershift.agents_parms.zvm_parameters.disk_type | lower != 'fcp' %}rd.dasd=0.0.{{ hypershift.agents_parms.zvm_parameters.nodes[item].dasd.disk_id }}{% else %}rd.zfcp=0.0.{{ hypershift.agents_parms.zvm_parameters.nodes[item].lun[0].paths[0].fcp}},{{ hypershift.agents_parms.zvm_parameters.nodes[item].lun[0].paths[0].wwpn }},{{ hypershift.agents_parms.zvm_parameters.nodes[item].lun[0].id }} {% endif %} random.trust_cpu=on rd.luks.options=discard ignition.firstboot ignition.platform.id=metal console=tty1 console=ttyS1,115200n8 coreos.inst.persistent-kargs="console=tty1 console=ttyS1,115200n8"
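The updated parm line changes two network-related pieces when the mode is RoCE: the `ip=` argument stops after the subnet mask, and `rd.znet=` is omitted. The sketch below rebuilds only those network arguments to make the conditional easier to follow; it leaves out the console, rootfs, nameserver, and disk arguments. The function name and sample values are hypothetical.

```python
def network_kargs(mode, ip, gateway, netmask, ifname, nettype, subchannels, options):
    """Build the network-related kernel arguments of the parm line (partial sketch)."""
    is_roce = mode.lower() == "roce"
    ip_arg = f"ip={ip}::{gateway}:{netmask}"
    if not is_roce:
        # Non-RoCE modes pin the address to a named interface with no autoconf.
        ip_arg += f"::{ifname}:none"
    parts = ["rd.neednet=1", "ai.ip_cfg_override=1", ip_arg]
    if not is_roce:
        # rd.znet only applies to channel-attached devices (OSA, vswitch).
        parts.append(f"rd.znet={nettype},{subchannels},{options}")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical values for illustration.
    print(network_kargs("roce", "10.0.0.5", "10.0.0.1", "255.255.255.0",
                        "eth0", "qeth", "0.0.1000,0.0.1001,0.0.1002", "layer2=1"))
```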
