Commit: Merge branch 'master' into khushboo-rancher-patch-1

Showing 50 changed files with 1,146 additions and 270 deletions.
3 changes: 3 additions & 0 deletions
docs/content/manual/Test-cases-to-reproduce-attach-detach-issues/_index.md

---
title: Test cases to reproduce issues related to attach detach
---
79 changes: 79 additions & 0 deletions
...-reproduce-attach-detach-issues/attachment-detachment-issues-reproducibility.md

---
title: Test cases to reproduce attachment-detachment issues
---
**Prerequisite:** Have an environment with only 2 worker nodes, or taint 1 out of 3 worker nodes with `NoExecute` & `NoSchedule`.
This will serve as a constrained fallback and a limited source of recovery in the event of failure.
#### 1. Kill the engines and instance manager repeatedly
**Given** 1 RWO and 1 RWX volume are attached to a pod.
And both volumes have 2 replicas.
And random data is continuously being written to the volume using the command `dd if=/dev/urandom of=file1 count=100 bs=1M conv=fsync status=progress oflag=direct,sync`.

**When** one replica's rebuilding is triggered by crashing its instance manager (IM).
And immediately afterwards, the IM associated with another replica is crashed.
And after crashing the IMs, detaching the volume is attempted either by deleting the pod or via the Longhorn UI.

**Then** the volume should not get stuck in an attaching-detaching loop.

**When** the volume is detached and manually attached again.
And the engine running on the node where the volume is attached is killed.

**Then** the volume should recover once the engine is back online.
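The IM crash and detach attempt above can be scripted. A minimal dry-run sketch: the pod names below are placeholders, not real cluster objects, and the commands are only collected and echoed; substitute real names (from `kubectl -n longhorn-system get pods`) and pipe them to `sh` to actually execute.

```shell
# Dry-run sketch: build the commands that would crash the instance
# managers (IMs) backing each replica and then attempt a detach by
# deleting the workload pod. Pod names are placeholders.
NS="longhorn-system"
IM1="instance-manager-r-aaaaaaaa"   # IM backing replica 1 (placeholder)
IM2="instance-manager-r-bbbbbbbb"   # IM backing replica 2 (placeholder)
CMDS="kubectl -n $NS delete pod $IM1 --grace-period=0 --force
kubectl -n $NS delete pod $IM2 --grace-period=0 --force
kubectl delete pod test-pod"
echo "$CMDS"
```

Deleting the IM pod is a common way to crash every engine/replica process it hosts at once; the forced, zero-grace deletion makes the crash abrupt rather than a clean shutdown.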

#### 2. Illegal values in Volume/Snap.meta
**Given** 1 RWO and 1 RWX volume are attached to a pod.
And both volumes have 2 replicas.

**When** some random values are set in the volume/snapshot meta file.
And replica rebuilding is triggered while the IM associated with another replica is also crashed.

**Then** the volume should not get stuck in an attaching-detaching loop.
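The meta-file tampering can be rehearsed locally first. A sketch against a scratch file, assuming an illustrative `volume.meta` layout (the JSON schema here is a guess, not the authoritative Longhorn format; check a real replica data directory, e.g. under `/var/lib/longhorn/replicas/`, before running this against a node):

```shell
# Simulate "illegal values in volume.meta" against a scratch copy.
WORKDIR=$(mktemp -d)
cat > "$WORKDIR/volume.meta" <<'EOF'
{"Size":21474836480,"Head":"volume-head-001.img","Dirty":true,"Rebuilding":false,"Parent":""}
EOF
# Inject an illegal (non-numeric) Size value to simulate corruption.
sed -i 's/"Size":21474836480/"Size":"garbage"/' "$WORKDIR/volume.meta"
grep -q '"Size":"garbage"' "$WORKDIR/volume.meta" && echo "volume.meta corrupted"
```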

#### 3. Deletion of Volume/Snap.meta
**Given** 1 RWO and 1 RWX volume are attached to a pod.
And both volumes have 2 replicas.

**When** the volume & snapshot meta files are deleted one by one.
And replica rebuilding is triggered while the IM associated with another replica is also crashed.

**Then** the volume should not get stuck in an attaching-detaching loop.

#### 4. Failed replica tries to rebuild from other just crashed replica - https://github.com/longhorn/longhorn/issues/4212
**Given** 1 RWO and 1 RWX volume are attached to a pod.
And both volumes have 2 replicas.
And random data is continuously being written to the volume using the command `dd if=/dev/urandom of=file1 count=100 bs=1M conv=fsync status=progress oflag=direct,sync`.

**When** one replica's rebuilding is triggered by crashing its IM.
And immediately afterwards, the IM associated with another replica is crashed.

**Then** the volume should not get stuck in an attaching-detaching loop.

#### 5. Volume attachment modification/deletion

**Given** a Deployment and a StatefulSet are created with the same name and attached to Longhorn volumes.
And some data is written and its md5sum is computed.

**When** the StatefulSet and Deployment are deleted without deleting the volumes.
And a new StatefulSet and Deployment with the same names are created with new PVCs.
And before the newly deployed workloads can attach to the volumes, the attached node is rebooted.

**Then** after the node reboot completes, the volumes should reflect the right status.
And the newly created Deployment and StatefulSet should get attached to the volumes.

**When** the volume attachments of the above workloads are deleted.
And the above workloads are deleted and recreated immediately.

**Then** no multi-attach or other errors should be observed.
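The md5sum step above is a plain before/after integrity check. A local sketch, with illustrative paths (in the real test the file lives on the Longhorn-backed mount inside the pod, and the second checksum is taken after the workloads are recreated and the volume reattaches):

```shell
# Write data, record its checksum, and compare it again after the
# workload churn. Here both checksums are taken on a scratch file.
WORKDIR=$(mktemp -d)
dd if=/dev/urandom of="$WORKDIR/file1" count=1 bs=1M status=none
MD5_BEFORE=$(md5sum "$WORKDIR/file1" | awk '{print $1}')
# ... delete and recreate the StatefulSet/Deployment here ...
MD5_AFTER=$(md5sum "$WORKDIR/file1" | awk '{print $1}')
[ "$MD5_BEFORE" = "$MD5_AFTER" ] && echo "data intact"
```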

#### 6. Use monitoring/WordPress/DB workloads
**Given** monitoring, WordPress, and other DB-backed workloads are deployed in the system.
And all the volumes have 2 replicas.
And random data is continuously being written to the volume using the command `dd if=/dev/urandom of=file1 count=100 bs=1M conv=fsync status=progress oflag=direct,sync`.

**When** one replica's rebuilding is triggered by crashing its IM.
And immediately afterwards, the IM associated with another replica is crashed.

**Then** the volume should not get stuck in an attaching-detaching loop.

47 changes: 47 additions & 0 deletions
...tent/manual/release-specific/v1.6.0/test-rebuild-in-meta-blocks-engine-start.md

---
title: Test `Rebuild` in volume.meta blocks engine start
---

## Related issue
https://github.com/longhorn/longhorn/issues/6626

## Test with patched image

**Given** a patched longhorn-engine image with the following code change.
```diff
diff --git a/pkg/sync/sync.go b/pkg/sync/sync.go
index b48ddd46..c4523f11 100644
--- a/pkg/sync/sync.go
+++ b/pkg/sync/sync.go
@@ -534,9 +534,9 @@ func (t *Task) reloadAndVerify(address, instanceName string, repClient *replicaC
 		return err
 	}

-	if err := repClient.SetRebuilding(false); err != nil {
-		return err
-	}
+	// if err := repClient.SetRebuilding(false); err != nil {
+	// 	return err
+	// }
 	return nil
 }
```
**And** a patched longhorn-instance-manager image with the longhorn-engine vendor updated.
**And** Longhorn is installed with the patched images.
**And** the `data-locality` setting is set to `disabled`.
**And** the `auto-salvage` setting is set to `true`.
**And** a new StorageClass is created with `NumberOfReplica` set to `1`.
**And** a StatefulSet is created with `Replica` set to `1`.
**And** the node of the StatefulSet Pod and the node of its volume Replica are different. This is necessary to trigger the rebuilding in response to the data-locality setting update later.
**And** the volume has 1 running Replica.
**And** data exists in the volume.

**When** the `data-locality` setting is set to `best-effort`.
**And** the replica rebuilding is completed.
**And** the `Rebuilding` flag in the replica's `volume.meta` file is `true`.
**And** the instance manager Pod of the Replica is deleted.

**Then** the Replica should be running.
**And** the StatefulSet Pod should restart.
**And** the `Rebuilding` flag in the replica's `volume.meta` file should be `false`.
**And** the data should remain intact.
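Checking the `Rebuilding` flag comes down to a grep on the replica's `volume.meta`. A sketch against a scratch file (the JSON layout is an assumption for illustration; the real file sits in the replica data directory on the node, so point `META` there when running the test):

```shell
# Extract the Rebuilding flag from a volume.meta file.
# Here META is a scratch copy with an assumed layout.
META=$(mktemp)
echo '{"Size":1073741824,"Head":"volume-head-001.img","Dirty":true,"Rebuilding":true,"Parent":""}' > "$META"
REBUILDING=$(grep -o '"Rebuilding":[a-z]*' "$META" | cut -d: -f2)
echo "Rebuilding=$REBUILDING"
```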
216 changes: 216 additions & 0 deletions
docs/content/manual/release-specific/v1.6.0/test-storage-network.md

---
title: Setup and test storage network when Multus version is above v4.0.0
---

## Related issue
https://github.com/longhorn/longhorn/issues/6953

## Test storage network

### Create AWS instances
**Given** Create a VPC.
- VPC only
- IPv4 CIDR 10.0.0.0/16

*And* Create an internet gateway.
- Attach it to the VPC

*And* Add the internet gateway to the VPC `Main route table`, `Routes`.
- Destination 0.0.0.0/0

*And* Create 2 subnets in the VPC.
- Subnet-1: 10.0.1.0/24
- Subnet-2: 10.0.2.0/24

*And* Launch 3 EC2 instances.
- Use the created VPC
- Use subnet-1 for network interface 1
- Use subnet-2 for network interface 2
- Disable `Auto-assign public IP`
- Add a security group inbound rule to allow `All traffic` from `Anywhere-IPv4`
- Stop `Source/destination check`

*And* Create 3 elastic IPs.

*And* Associate one of the elastic IPs with network interface 1 of one of the EC2 instances.
- Repeat for the other 2 EC2 instances with the remaining elastic IPs.
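For reference, the console steps above roughly correspond to these AWS CLI calls. This is a hedged dry-run sketch: the `aws` stub only echoes each command, and every `*-PLACEHOLDER` ID stands in for a value you would take from the previous command's output; remove the stub and substitute real IDs to execute.

```shell
# Dry-run: echo the AWS CLI equivalents of the console steps above.
aws() { echo "aws $*"; }   # remove this stub to run for real

aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-PLACEHOLDER --vpc-id vpc-PLACEHOLDER
aws ec2 create-subnet --vpc-id vpc-PLACEHOLDER --cidr-block 10.0.1.0/24
aws ec2 create-subnet --vpc-id vpc-PLACEHOLDER --cidr-block 10.0.2.0/24
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --allocation-id eipalloc-PLACEHOLDER --network-interface-id eni-PLACEHOLDER
```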
### Setup instances

**Given** a K3s Kubernetes cluster installed on the EC2 instances.

*And* Deploy the Multus DaemonSet on the control-plane node.
- Download the YAML.
```
curl -O https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/v4.0.2/deployments/multus-daemonset.yml
```
- Edit the YAML.
```
diff --git a/deployments/multus-daemonset.yml b/deployments/multus-daemonset.yml
index ab626a66..a7228942 100644
--- a/deployments/multus-daemonset.yml
+++ b/deployments/multus-daemonset.yml
@@ -145,7 +145,7 @@ data:
           ]
         }
       ],
-      "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig"
+      "kubeconfig": "/var/lib/rancher/k3s/agent/etc/cni/net.d/multus.d/multus.kubeconfig"
     }
 ---
 apiVersion: apps/v1
@@ -179,12 +179,13 @@ spec:
       serviceAccountName: multus
       containers:
       - name: kube-multus
-        image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot
+        image: ghcr.io/k8snetworkplumbingwg/multus-cni:v4.0.2
         command: ["/thin_entrypoint"]
         args:
         - "--multus-conf-file=auto"
         - "--multus-autoconfig-dir=/host/etc/cni/net.d"
         - "--cni-conf-dir=/host/etc/cni/net.d"
+        - "--multus-kubeconfig-file-host=/var/lib/rancher/k3s/agent/etc/cni/net.d/multus.d/multus.kubeconfig"
         resources:
           requests:
             cpu: "100m"
@@ -222,10 +223,10 @@ spec:
       volumes:
         - name: cni
           hostPath:
-            path: /etc/cni/net.d
+            path: /var/lib/rancher/k3s/agent/etc/cni/net.d
         - name: cnibin
           hostPath:
-            path: /opt/cni/bin
+            path: /var/lib/rancher/k3s/data/current/bin
         - name: multus-cfg
           configMap:
             name: multus-cni-config
```
- Apply the YAML to the K8s cluster.
```
kubectl apply -f multus-daemonset.yml.new
```

*And* Download `ipvlan` and put it in the K3s binaries path on all cluster nodes.
```
curl -OL https://github.com/containernetworking/plugins/releases/download/v1.3.0/cni-plugins-linux-amd64-v1.3.0.tgz
tar -zxvf cni-plugins-linux-amd64-v1.3.0.tgz
cp ipvlan /var/lib/rancher/k3s/data/current/bin/
```

*And* Set up flannel on all cluster nodes.
```
# Update N1, N2, N3 with the nodes' eth1 IPs
N1="10.0.2.95"
N2="10.0.2.139"
N3="10.0.2.158"
NODES=(${N1} ${N2} ${N3})
STORAGE_NETWORK_PREFIX="192.168"
ETH1_IP=`ip a | grep eth1 | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | awk '{print $2}'`
count=1
for n in "${NODES[@]}"; do
  [[ ${ETH1_IP} != $n ]] && ((count=count+1)) && continue
  NET=$count
  break
done
cat << EOF > /run/flannel/multus-subnet-${STORAGE_NETWORK_PREFIX}.0.0.env
FLANNEL_NETWORK=${STORAGE_NETWORK_PREFIX}.0.0/16
FLANNEL_SUBNET=${STORAGE_NETWORK_PREFIX}.${NET}.0/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=true
EOF
```
*And* Set up routes on all cluster nodes.
```
# Update N1, N2, N3 with the nodes' eth1 IPs
N1="10.0.2.95"
N2="10.0.2.139"
N3="10.0.2.158"
STORAGE_NETWORK_PREFIX="192.168"
ACTION="add"
ETH1_IP=`ip a | grep eth1 | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | awk '{print $2}'`
[[ ${ETH1_IP} != ${N1} ]] && ip r ${ACTION} ${STORAGE_NETWORK_PREFIX}.1.0/24 via ${N1} dev eth1
[[ ${ETH1_IP} != ${N2} ]] && ip r ${ACTION} ${STORAGE_NETWORK_PREFIX}.2.0/24 via ${N2} dev eth1
[[ ${ETH1_IP} != ${N3} ]] && ip r ${ACTION} ${STORAGE_NETWORK_PREFIX}.3.0/24 via ${N3} dev eth1
```

*And* Deploy the `NetworkAttachmentDefinition`.
```
cat << EOF > nad-192-168-0-0.yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: demo-192-168-0-0
  namespace: kube-system
  #namespace: longhorn-system
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "flannel",
    "subnetFile": "/run/flannel/multus-subnet-192.168.0.0.env",
    "dataDir": "/var/lib/cni/multus-subnet-192.168.0.0",
    "delegate": {
      "type": "ipvlan",
      "master": "eth1",
      "mode": "l3",
      "capabilities": {
        "ips": true
      }
    },
    "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig"
    }
  }'
EOF
kubectl apply -f nad-192-168-0-0.yaml
```

### Test storage network
**Given** Longhorn is deployed.

**When** Update the storage network setting value to `kube-system/demo-192-168-0-0`.

**Then** Instance manager pods should restart.

*And* The storage network should appear in the instance manager pods' `k8s.v1.cni.cncf.io/network-status` annotation.
- There should be 2 networks in the `k8s.v1.cni.cncf.io/network-status` annotation
- `kube-system/demo-192-168-0-0` should exist in the `k8s.v1.cni.cncf.io/network-status` annotation
- `kube-system/demo-192-168-0-0` should use the `lhnet1` interface
- `kube-system/demo-192-168-0-0` should be in the `192.168.0.0/16` subnet

*And* Should be able to create/attach/detach/delete volumes successfully.
- Example:
```
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "cbr0",
                    "interface": "eth0",
                    "ips": [
                        "10.42.2.35"
                    ],
                    "mac": "26:a7:d3:0d:af:68",
                    "default": true,
                    "dns": {}
                },{
                    "name": "kube-system/demo-192-168-0-0",
                    "interface": "lhnet1",
                    "ips": [
                        "192.168.2.230"
                    ],
                    "mac": "02:d3:d9:0b:2e:50",
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks: [{"namespace": "kube-system", "name": "demo-192-168-0-0", "interface": "lhnet1"}]
```
- Should see engine/replica `storageIP` in the `192.168.0.0` subnet.
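A quick sketch of the subnet check: the `STATUS` JSON inlines the example annotation from above; on a live cluster you would populate it from the instance manager pod instead, e.g. `kubectl -n longhorn-system get pod <im-pod> -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/network-status}'`.

```shell
# Verify the storage-network interface got an IP in 192.168.0.0/16.
# STATUS mirrors the example annotation shown in the test steps.
STATUS='[{"name":"cbr0","interface":"eth0","ips":["10.42.2.35"]},
{"name":"kube-system/demo-192-168-0-0","interface":"lhnet1","ips":["192.168.2.230"]}]'
STORAGE_IP=$(echo "$STATUS" | grep -o '192\.168\.[0-9]*\.[0-9]*')
echo "storage network IP: ${STORAGE_IP:-missing}"
```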