Reusing Access Points #1026

mskanth972 · 2023-06-06T04:21:14Z

Is this a bug fix or adding new feature?

When performing dynamic provisioning to mount Amazon Elastic File System (EFS) on an Elastic Kubernetes Service (EKS) cluster, an access point (folder) is automatically created within EFS. This access point is assigned a unique pathname based on the base path name /dynamic_provisioning/pvc-id, where the "pvc-id" represents a randomly generated Persistent Volume Claim (PVC) ID given by Kubernetes. The access point serves as the location where the actual data is stored.

However, a challenge arises when the EKS cluster goes down or is deleted accidentally, or when the user wants to connect the same access point to a newly created EKS cluster. In such cases, dynamic provisioning on the new cluster creates a new access point with a different path, because the PVC ID is different. The previous access point, which contains valuable data, then becomes inaccessible from the pods because the paths no longer match.

To overcome this challenge, a solution is needed that preserves data and keeps it accessible across EKS clusters. The objective is to establish a mechanism that allows pods to access and manipulate data from the previous access point when the user wants to.

To address the challenge of accessing previous access points in the EFS CSI driver across EKS cluster changes, we propose the following approach.
First, we will leverage the client token feature provided by EFS, which serves as an idempotent identifier for access points. Additionally, we will introduce a new parameter in the storage class of the EFS CSI driver to enable this functionality.

To make the access point reusable, the user can set the reuseAccessPoint: "true" parameter in the StorageClass. When a new access point is created, we will assign the given PVC name as the client token. This ties the PVC name to the access point, so the two can be matched later. The next time the user wants to reconnect to the existing access point from a different cluster, the same PVC name has to be used.

Now, let's consider the scenario where the user creates a new cluster. To access the same old access point, they simply provide the original PVC name when creating the claim. Because the parameter is set to true in the StorageClass, the EFS CSI driver checks the client tokens of the access points on the specified EFS file system. If a client token matches the provided PVC name, an access point associated with that PVC name already exists, and the driver treats it as the same old access point. Instead of creating a new access point, it reuses the existing one, so users keep access to their old access points and the data within them.
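
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: the `AccessPoint` struct and the `findByClientToken` and `provision` helpers are hypothetical names introduced here, and the real driver retrieves access points from the EFS API rather than from an in-memory slice. The sample values in `main` are taken from the logs in this description.

```go
package main

import "fmt"

// AccessPoint is a simplified, hypothetical stand-in for the
// information the driver keeps about an EFS access point.
type AccessPoint struct {
	ID          string
	ClientToken string
	RootDir     string
}

// findByClientToken returns the access point whose client token
// matches the given token, or nil if there is none.
func findByClientToken(aps []AccessPoint, token string) *AccessPoint {
	for i := range aps {
		if aps[i].ClientToken == token {
			return &aps[i]
		}
	}
	return nil
}

// provision reuses an existing access point when reuse is enabled and a
// matching client token is found; otherwise it describes a new one.
// The boolean result reports whether an existing access point was reused.
func provision(existing []AccessPoint, token, rootDir string, reuse bool) (AccessPoint, bool) {
	if reuse {
		if ap := findByClientToken(existing, token); ap != nil {
			return *ap, true
		}
	}
	// "fsap-new" is a placeholder ID; the real ID is assigned by EFS.
	return AccessPoint{ID: "fsap-new", ClientToken: token, RootDir: rootDir}, false
}

func main() {
	existing := []AccessPoint{{
		ID:          "fsap-06209626d3d178877",
		ClientToken: "clienttokentest7",
		RootDir:     "/dynamic_provisioning/pvc-373df757-4a50-48ed-a584-c6fcdb8fbde1",
	}}
	ap, reused := provision(existing, "clienttokentest7",
		"/dynamic_provisioning/pvc-b3751fe7-6bb2-45de-a0fa-4b001c81d519", true)
	fmt.Println(reused, ap.ID, ap.RootDir)
}
```

Note that a reused access point keeps its original root directory, which is why the "Existing AccessPoint found" log line further down shows a different pvc-... path than the one requested.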

Note: The client token of an access point is limited to 64 characters. A collision could occur between two PVCs whose names are longer than 64 characters and share the same first 64 characters, so a hashing function was implemented to hash the PVC name to a value that fits within the limit.
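
As a rough sketch of the hashing idea in the note above: a SHA-256 digest encoded as hex is always exactly 64 characters, regardless of the input length. The choice of SHA-256 is an assumption made for this illustration (the 64-character hexadecimal token that appears in a later log in this thread is consistent with it, but the description does not name the hash), and `clientTokenFor` is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// clientTokenFor is a hypothetical helper that maps a PVC name of any
// length to a fixed 64-character token. SHA-256 is assumed here; the
// hash actually used by the driver may differ.
func clientTokenFor(pvcName string) string {
	sum := sha256.Sum256([]byte(pvcName))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Two names longer than 64 characters that share their first 64 characters.
	prefix := strings.Repeat("a", 64)
	a := clientTokenFor(prefix + "-one")
	b := clientTokenFor(prefix + "-two")
	fmt.Println(len(a), len(b), a != b)
}
```

Hashing the whole name, instead of truncating it to 64 characters, is what keeps two long names with a common prefix from mapping to the same token. A side effect is that the token stored in EFS no longer reads as the PVC name.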

What is this PR about? / Why do we need it?

What testing is done?

[mskanth@MSD]$ cat pod.yaml 
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clienttoketest7
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: efs-app
spec:
  containers:
    - name: app
      image: centos
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo $(date -u) >> /data/out; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: clienttoketest7

Sample storage class

[mskanth@MSD]$ cat storageclass.yaml 
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
mountOptions: 
  - iam
reclaimPolicy: Retain
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-12341234
  directoryPerms: "700"
  gidRangeStart: "1000" # optional
  gidRangeEnd: "2000" # optional
  basePath: "/dynamic_provisioning" # optional
  reuseAccessPoint: "true"

Logs

I0622 04:23:56.514543 1 controller.go:63] CreateVolume: called with args {Name:pvc-b3751fe7-6bb2-45de-a0fa-4b001c81d519 CapacityRange:required_bytes:5368709120 VolumeCapabilities:[mount:<mount_flags:"iam" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > ] Parameters:map[basePath:/dynamic_provisioning csi.storage.k8s.io/pv/name:pvc-b3751fe7-6bb2-45de-a0fa-4b001c81d519 csi.storage.k8s.io/pvc/name:clienttokentest7 csi.storage.k8s.io/pvc/namespace:default directoryPerms:700 fileSystemId:fs-09309933ee18d2ada gidRangeEnd:2000 gidRangeStart:1000 provisioningMode:efs-ap reuseAccessPoint:true] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0622 04:23:56.514644 1 cloud.go:292] Calling DescribeFileSystems with input: {
 FileSystemId: "fs-12341234"
}
I0622 04:23:56.659117 1 gid_allocator.go:52] Recieved getNextGid for fsId: fs-12341234, min: 1000, max: 2000
I0622 04:23:56.659141 1 cloud.go:258] AccessPointOptions to find AP : &{CapacityGiB:5368709120 FileSystemId:fs-12341234 Uid:1002 Gid:1002 DirectoryPerms:700 DirectoryPath:/dynamic_provisioning/pvc-b3751fe7-6bb2-45de-a0fa-4b001c81d519 Tags:map[efs.csi.aws.com/cluster:true]}
I0622 04:23:56.659156 1 cloud.go:259] ClientToken to find AP : clienttokentest7
I0622 04:23:56.746413 1 cloud.go:277] ClientToken found : efs-claim1
I0622 04:23:56.746418 1 cloud.go:277] ClientToken found : efs-claim
I0622 04:23:56.746617 1 cloud.go:277] ClientToken found : pvc-6655d232-ec2c-4573-9b44-d88d3676de16
I0622 04:23:56.746620 1 cloud.go:277] ClientToken found : pvc-72800a83-8944-4af8-bb2e-ec941722fa97
I0622 04:23:56.746622 1 cloud.go:277] ClientToken found : efs-apresue1
I0622 04:23:56.746625 1 cloud.go:277] ClientToken found : efs-apreuse2
I0622 04:23:56.746657 1 cloud.go:277] ClientToken found : clienttokentest6
I0622 04:23:56.746660 1 cloud.go:277] ClientToken found : clienttokentest7
I0622 04:23:56.746666 1 cloud.go:171] Existing AccessPoint found : &{AccessPointId:fsap-06209626d3d178877 FileSystemId:fs-09309933ee18d2ada AccessPointRootDir:/dynamic_provisioning/pvc-373df757-4a50-48ed-a584-c6fcdb8fbde1 CapacityGiB:0}

k8s-ci-robot · 2023-06-06T04:21:26Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mskanth972

Needs approval from an approver in each of these files:

~~OWNERS~~ [mskanth972]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pkg/driver/controller.go

pkg/cloud/cloud.go

mskanth972 · 2023-06-22T04:42:18Z

/test pull-aws-efs-csi-driver-e2e

mskanth972 · 2023-06-28T03:26:15Z

/test pull-aws-efs-csi-driver-unit

Ashley-wenyizha · 2023-09-19T15:09:56Z

docs/README.md

@@ -39,6 +39,8 @@ The following CSI interfaces are implemented:
 | subPathPattern        |        | `/${.PV.name}`  | true     | The template used to construct the subPath under which each of the access points created under Dynamic Provisioning. Can be made up of fixed strings and limited variables, is akin to the 'subPathPattern' variable on the [nfs-subdir-external-provisioner](https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner) chart. Supports `.PVC.name`,`.PVC.namespace` and `.PV.name` |
 | ensureUniqueDirectory |        | true            | true     | **NOTE: Only set this to false if you're sure this is the behaviour you want**.<br/> Used when dynamic provisioning is enabled, if set to true, appends the a UID to the pattern specified in `subPathPattern` to ensure that access points will not accidentally point at the same directory.                                                                                                |
 | az                    |        | ""              | true     | Used for cross-account mount. `az` under storage class parameter is optional. If specified, mount target associated with the az will be used for cross-account mount. If not specified, a random mount target will be picked for cross account mount                                                                                                                                          |
+| ensureUniqueDirectory |        | true            | true     |


Why did we remove the explanation part of this param?

aws-efs-csi-driver/docs/README.md

Line 38 in 158d6f7

| ensureUniqueDirectory | | true | true | **NOTE: Only set this to false if you're sure this is the behaviour you want**.<br/> Used when dynamic provisioning is enabled, if set to true, appends the a UID to the pattern specified in `subPathPattern` to ensure that access points will not accidentally point at the same directory. |

I need to rebase this PR. I will do that and push the changes.

Ashley-wenyizha · 2023-09-19T17:22:25Z

/lgtm

andrewhharmon · 2023-10-04T16:18:51Z

I'm struggling to implement this. I want to access the same EFS data across 2 clusters. The description above talks about using usePvcName=true, but the example storage class uses reuseAccessPoint: "true". I'm also not sure if I need to set ensureUniqueDirectory to false. Is there an example somewhere that shows how to do this?

mskanth972 · 2023-10-04T16:28:43Z

Sorry about the example above. We implemented this with a different parameter name while testing and later renamed it to reuseAccessPoint. Whether to set the ensureUniqueDirectory parameter is entirely up to the user, as it works either way. I will update the example mentioned above.

andrewhharmon · 2023-10-04T16:47:03Z

OK, so before I used this setting, I saw the client token in EFS with a value like pvc-xxx-xxx--xxx; now I see a random string. I was thinking it would be the name of the PVC. Is that not correct? I also don't see logging like yours that shows it trying to look up the token. I'm setting the PVC name like so:

kind: PersistentVolumeClaim
metadata:
  name: efs-claim

andrewhharmon · 2023-10-04T16:53:54Z

Sorry, I am seeing some logging similar to the above.

efs-plugin W1004 16:39:29.723010       1 gid_allocator.go:93] Requested GID range (50000:7000000) exceeds EFS Access Point limit (1000) per Filesystem. Driver will not allocate GIDs outside of this limit.
efs-plugin I1004 16:39:29.723067       1 controller.go:296] Using PV name for access point directory.
efs-plugin I1004 16:39:29.723640       1 controller.go:303] Using /pvc-ac121517-a2bf-4b94-a739-b4545c281f99 as the access point directory.
efs-plugin I1004 16:39:29.723666       1 cloud.go:266] ClientToken to find AP : d3580977b85988d6cd4a7d1a8b4745f839b633cd685f264e261e1f6953af77ae
efs-plugin I1004 16:39:29.764853       1 cloud.go:178] Existing AccessPoint found : &{AccessPointId:fsap-06d6d1615e941f8d3 FileSystemId:fs-057e96c51cbf0a652 AccessPointRootDir:/pvc-8699b6a6-f028-4926-b758-bdb7ab5f1e59 CapacityGiB:0 PosixUser:<nil>}
Stream closed EOF for kube-system/efs-csi-controller-655f7cf88c-vvjd7 (efs-plugin)

So maybe I do have it working as expected. Also, I see your comment on the hashing function for the PVC name, so that must explain why I'm seeing a random string for the client token instead of my PVC name.

mskanth972 · 2023-10-04T17:19:32Z

@andrewhharmon, yes, you are correct. What you are seeing is the hashed PVC name.

andrewhharmon · 2023-10-05T13:24:44Z

Follow-up question for you. If we set the reclaim policy to Retain and delete the PVC, the PV stays, as does the access point. But we could end up with lots of unbound PVs. If we delete one of those PVs (bound or unbound), it also deletes the access point, even though other PVs may still be using that access point. So we'd have a PV pointing to an access point that doesn't exist? Is there any advice on the best way to handle this?

bittracer · 2024-06-10T02:28:09Z

@mskanth972 @Ashley-wenyizha I am running into an issue similar to the one @andrewhharmon pointed out here. Are there any suggestions on handling this situation?

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 6, 2023

k8s-ci-robot requested review from justinsb and wongma7 June 6, 2023 04:21

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 6, 2023

mskanth972 force-pushed the APreuse branch from ad1cd83 to 7b0df0e Compare June 14, 2023 04:25

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 14, 2023

mskanth972 force-pushed the APreuse branch from 7b0df0e to 209f81e Compare June 15, 2023 03:32

mskanth972 changed the title ~~APreuse~~ Reusing Access Points Jun 19, 2023