-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update clean lost node without requiring a instanceId #503
Update clean lost node without requiring a instanceId #503
Conversation
commit 0f341b8 Author: guruprasad <[email protected]> Date: Thu Jan 2 19:51:25 2025 +0530 english grammer fix commit 3071eae Author: guruprasad <[email protected]> Date: Thu Jan 2 15:15:30 2025 +0530 print java version in the end of packer build for better logs commit fd95fb5 Author: guruprasad <[email protected]> Date: Thu Jan 2 14:53:02 2025 +0530 english rewording better readability commit 89cefb9 Author: guruprasad <[email protected]> Date: Thu Jan 2 14:47:05 2025 +0530 keep empty line - diff reduction commit 7eb7d1a Author: guruprasad <[email protected]> Date: Thu Jan 2 14:40:52 2025 +0530 spotless apply commit 22514b3 Author: guruprasad <[email protected]> Date: Thu Jan 2 14:38:26 2025 +0530 fix the google creds to use correct constructor; java 21 image for agent, and clarifying contributor notes commit a6c4140 Merge: 635271d c70e47e Author: guruprasad <[email protected]> Date: Thu Jan 2 12:41:12 2025 +0530 Merge branch 'integration-test-updates' of github.com:gbhat618/google-compute-engine-plugin into integration-test-updates commit 635271d Author: guruprasad <[email protected]> Date: Thu Jan 2 12:40:58 2025 +0530 use java21 for agent image and simplify assertion jenkinsrule utilities commit 35bd816 Author: guruprasad <[email protected]> Date: Thu Jan 2 12:14:56 2025 +0530 Use BuildWatcher instead of TailLog for simple commit c70e47e Author: Guruprasad Bhat <[email protected]> Date: Thu Jan 2 11:12:59 2025 +0530 Update CONTRIBUTING.md Co-authored-by: Jesse Glick <[email protected]> commit 12658e1 Author: guruprasad <[email protected]> Date: Mon Dec 23 18:31:27 2024 +0530 update jenkins test timeout commit 79b8ffd Author: guruprasad <[email protected]> Date: Mon Dec 23 18:04:27 2024 +0530 minor updates to integration tests and failsafe timeout commit 35a9f74 Author: guruprasad <[email protected]> Date: Mon Dec 23 16:47:58 2024 +0530 remove the timeout config from documentation command commit 37ad3df Author: guruprasad <[email protected]> Date: Mon Dec 23 16:47:10 2024 +0530 pom update as per jenkins documentation commit 4a49441 Author: guruprasad <[email protected]> Date: Mon Dec 23 16:45:15 2024 +0530 update some its for better test and assertions commit 05b4e95 Author: guruprasad <[email protected]> Date: Fri Dec 20 16:32:55 2024 +0530 update integration tests
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Fixed
Show fixed
Hide fixed
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Fixed
Show fixed
Hide fixed
src/main/java/com/google/jenkins/plugins/computeengine/client/ComputeClientV2.java
Show resolved
Hide resolved
src/test/java/com/google/jenkins/plugins/computeengine/integration/CleanLostNodesWorkIT.java
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
PR requires #495 |
src/test/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWorkTest.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
src/test/java/com/google/jenkins/plugins/computeengine/integration/CleanLostNodesWorkIT.java
Outdated
Show resolved
Hide resolved
src/test/java/com/google/jenkins/plugins/computeengine/integration/CleanLostNodesWorkIT.java
Outdated
Show resolved
Hide resolved
src/test/java/com/google/jenkins/plugins/computeengine/integration/CleanLostNodesWorkIT.java
Outdated
Show resolved
Hide resolved
src/test/java/com/google/jenkins/plugins/computeengine/integration/ComputeClientV2IT.java
Outdated
Show resolved
Hide resolved
src/test/java/com/google/jenkins/plugins/computeengine/integration/RealJenkinsLogUtil.java
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineCloud.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Show resolved
Hide resolved
* (<a href="https://github.com/GoogleCloudPlatform/gcp-plugin-core-java">gcp-plugin-core-java</a>, last updated in December 2019). | ||
* Consideration for the gradual evolution of this class is suggested, including the re-implementation of methods | ||
* currently utilized from the Graphite library, to ensure dependency solely on the Google API Java Client Services | ||
* library (<a href="https://github.com/googleapis/google-api-java-client-services">google-api-java-client-services</a>). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC google-api-java-client-services
corresponds to com.google.api.services
packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that is correct. There is the Compute, and similarly the Instance classes are there, which are the same that graphite is importing as well.
So just now I tried to check a bit on the history of these classes,
- It seems that this plugin was originally having the ComputeClient code,
- and it was moved to graphite via: Consolidate platform-agnostic client code from the Jenkins plugins into one place within this library GoogleCloudPlatform/gcp-plugin-core-java#1 and Update to use ComputeClient from gcp-plugin-core. #140.
Looking at the graphite library and our usages,
- graphite library has total 6 client implementation (one of which is
ComputeClient
) - our usage is only these 6 classes
✗ grep -h -r "import com.google.cloud.graphite.platforms.plugin.client" ./src | sort -u import com.google.cloud.graphite.platforms.plugin.client.ClientFactory; import com.google.cloud.graphite.platforms.plugin.client.ComputeClient.OperationException; import com.google.cloud.graphite.platforms.plugin.client.ComputeClient; import com.google.cloud.graphite.platforms.plugin.client.model.GuestAttribute; import com.google.cloud.graphite.platforms.plugin.client.model.InstanceResourceData; import com.google.cloud.graphite.platforms.plugin.client.util.ClientUtil;
So currently I am thinking like,
- We should bring back the
ComputeClient
code from graphite into this repo back (as a separate PR); theComputeClientV2
should be merged into theComputeClient
. - Another thought is, to fork the graphite into
jenkinsci
, but that means, we need to manage (possibly delete) the 5 unused client classes (and associated wrappers); which then leaves the repo with only 6 classes, so may not be a good idea to maintain a whole repo for just that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are making any major changes to the plugin (beyond what is necessary for this PR and specifically #503 (comment)), it should be to switch to whatever Java client library Google currently recommends, rather than maintaining our own stuff for no good reason.
src/main/java/com/google/jenkins/plugins/computeengine/client/ComputeClientV2.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/client/ComputeClientV2.java
Outdated
Show resolved
Hide resolved
src/test/java/com/google/jenkins/plugins/computeengine/integration/CleanLostNodesWorkIT.java
Outdated
Show resolved
Hide resolved
src/test/java/com/google/jenkins/plugins/computeengine/integration/RealJenkinsLogUtil.java
Outdated
Show resolved
Hide resolved
src/test/java/com/google/jenkins/plugins/computeengine/integration/CleanLostNodesWorkIT.java
Show resolved
Hide resolved
src/test/java/com/google/jenkins/plugins/computeengine/integration/CleanLostNodesWorkIT.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
…rement of case conversions
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
…stNodesWork.java Co-authored-by: Jesse Glick <[email protected]>
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java
Outdated
Show resolved
Hide resolved
…stNodesWork.java Co-authored-by: Robert Sandell <[email protected]>
…stNodesWork.java Co-authored-by: Robert Sandell <[email protected]>
…stNodesWork.java Co-authored-by: Robert Sandell <[email protected]>
…stNodesWork.java Co-authored-by: Robert Sandell <[email protected]>
Problem in existing implementation
The current CleanLostNodesWork relies on the
instanceId
, a unique id for each GCP cloud that is created in the Jenkins controller.There is an issue with this unique_id based logic,
The logic to mark a node as lost is by checking the existence of the remote instance with the current controller.
This leads to issues in two scenarios,
Proposed solution
This PR proposes to use a different logic for finding a lost node, wherein, the controller will keep updating a label called
jenkins_node_last_in_use
with an epoch timestamp value. Any controller can delete a VM if the VM is not local to the controller and itsjenkins_node_last_in_use
is greater than the periodic task's 3 execution duration.This logic overcomes both of the limitations explained above and is more robust to ensure lost VMs of any controller are deleted as long as there is at least one Jenkins controller that uses the same GCP project.
idea given by #408 (comment)
And solves: #157 where cloning the Jenkins controller into staging and testing environments led to VM deletion race between controllers.
Testing done
Added new integration test using
RealJenkinsRule
with2
controllers, simulating the lost node scenario and a long running pipeline scenariointegration test logs
Submitter checklist