Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update clean lost node without requiring a instanceId #503

Merged
merged 36 commits into from
Jan 21, 2025
Merged
Show file tree
Hide file tree
Changes from 31 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
a91653c
clean lost node working
gbhat618 Jan 2, 2025
c3201ac
developing test
gbhat618 Jan 3, 2025
65bd743
revert the ITUtil changes
gbhat618 Jan 3, 2025
40a1222
Squashed commit of the following:
gbhat618 Jan 3, 2025
b7de0f2
clean nodes in working - needs some more cleanup
gbhat618 Jan 3, 2025
e96b05a
tests and code in working codition
gbhat618 Jan 4, 2025
ca8e589
update the compute client v2 integration tests
gbhat618 Jan 4, 2025
7f2d305
improve comments and small code updates
gbhat618 Jan 4, 2025
caae499
Revert "Squashed commit of the following:"
gbhat618 Jan 4, 2025
97c74c7
minor improvements comments and code
gbhat618 Jan 4, 2025
fe03a64
Update src/test/java/com/google/jenkins/plugins/computeengine/integra…
gbhat618 Jan 7, 2025
dcda478
Update src/test/java/com/google/jenkins/plugins/computeengine/integra…
gbhat618 Jan 7, 2025
92bd182
Update src/test/java/com/google/jenkins/plugins/computeengine/integra…
gbhat618 Jan 7, 2025
e47588d
Update src/test/java/com/google/jenkins/plugins/computeengine/integra…
gbhat618 Jan 7, 2025
4e322d7
Update src/main/java/com/google/jenkins/plugins/computeengine/CleanLo…
gbhat618 Jan 7, 2025
9c840b5
Update src/main/java/com/google/jenkins/plugins/computeengine/CleanLo…
gbhat618 Jan 7, 2025
f17d80e
Update src/test/java/com/google/jenkins/plugins/computeengine/integra…
gbhat618 Jan 7, 2025
6315666
Update src/test/java/com/google/jenkins/plugins/computeengine/integra…
gbhat618 Jan 7, 2025
4cd89ca
Squashed commit of the following:
gbhat618 Jan 7, 2025
5484655
improve code and fix ComputeClientV2 intialization
gbhat618 Jan 7, 2025
38bb7f6
Revert "Squashed commit of the following:"
gbhat618 Jan 7, 2025
fabd527
Merge branch 'develop' of github.com:jenkinsci/google-compute-engine-…
gbhat618 Jan 7, 2025
c243b28
spotless
gbhat618 Jan 7, 2025
1071692
more accurate log message
gbhat618 Jan 7, 2025
0a15aa7
update log recorder with static list to capture more log records than…
gbhat618 Jan 7, 2025
3a8030b
update DateFormatter formatting requirement docs
gbhat618 Jan 7, 2025
012cdc7
add Locale.ROOT in lowercase and uppercase coversions
gbhat618 Jan 7, 2025
a87a569
improve the comment with better clarity
gbhat618 Jan 7, 2025
458b0a8
use the format with hard coded zone info for readability, so no requi…
gbhat618 Jan 7, 2025
deb6d70
remove the non-required javadoc in timestamp string
gbhat618 Jan 7, 2025
28cc50d
Update src/main/java/com/google/jenkins/plugins/computeengine/CleanLo…
gbhat618 Jan 7, 2025
da07d6f
Update src/main/java/com/google/jenkins/plugins/computeengine/CleanLo…
gbhat618 Jan 16, 2025
ee35bbb
Update src/main/java/com/google/jenkins/plugins/computeengine/CleanLo…
gbhat618 Jan 16, 2025
ee6ad39
Update src/main/java/com/google/jenkins/plugins/computeengine/CleanLo…
gbhat618 Jan 16, 2025
c8ec0dd
Update src/main/java/com/google/jenkins/plugins/computeengine/CleanLo…
gbhat618 Jan 16, 2025
fec2061
spotless-apply after applying review suggestions
gbhat618 Jan 16, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -16,17 +16,23 @@

package com.google.jenkins.plugins.computeengine;

import static com.google.jenkins.plugins.computeengine.ComputeEngineCloud.CLOUD_ID_LABEL_KEY;
gbhat618 marked this conversation as resolved.
Show resolved Hide resolved
import static java.util.Collections.emptyList;

import com.google.api.services.compute.model.Instance;
import com.google.common.annotations.VisibleForTesting;
import com.google.common.collect.ImmutableMap;
import com.google.jenkins.plugins.computeengine.client.ComputeClientV2;
import hudson.Extension;
import hudson.model.PeriodicWork;
import hudson.model.Slave;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
Expand All @@ -39,11 +45,27 @@
@Symbol("cleanLostNodesWork")
public class CleanLostNodesWork extends PeriodicWork {
protected final Logger logger = Logger.getLogger(getClass().getName());
public static final String NODE_IN_USE_LABEL_KEY = "jenkins_node_last_refresh";
public static final long RECURRENCE_PERIOD = Long.parseLong(
System.getProperty(CleanLostNodesWork.class.getName() + ".recurrencePeriod", String.valueOf(HOUR)));

@VisibleForTesting
public static final int LOST_MULTIPLIER = 3;
gbhat618 marked this conversation as resolved.
Show resolved Hide resolved
/**
* The formatter for the label timestamp value as per google label format,
* "The value can only contain lowercase letters, numeric characters, underscores and dashes.
* The value can be at most 63 characters long. International characters are allowed".
*/
private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy_MM_dd't'HH_mm_ss_SSS'z'");

/** {@inheritDoc} */
@Override
public long getRecurrencePeriod() {
return HOUR;
return RECURRENCE_PERIOD;
}

public static String getLastRefreshLabelVal() {
return formatter.format(OffsetDateTime.now(ZoneOffset.UTC));
}

/** {@inheritDoc} */
Expand All @@ -55,58 +77,106 @@

private void cleanCloud(ComputeEngineCloud cloud) {
logger.log(Level.FINEST, "Cleaning cloud " + cloud.getCloudName());
List<Instance> remoteInstances = findRemoteInstances(cloud);
ComputeClientV2 clientV2;
try {
clientV2 = cloud.getClientV2();
} catch (GeneralSecurityException | IOException ex) {
logger.log(Level.WARNING, "Error getting clientV2 for cloud " + cloud.getCloudName(), ex);
return;
}
List<Instance> remoteInstances = findRunningRemoteInstances(clientV2);
Set<String> localInstances = findLocalInstances(cloud);
if (!(localInstances.isEmpty() || remoteInstances.isEmpty())) {
updateLocalInstancesLabel(clientV2, localInstances, remoteInstances);
}
remoteInstances.stream()
.filter(remote -> isOrphaned(remote, localInstances))
.forEach(remote -> terminateInstance(remote, cloud));
}

private boolean isOrphaned(Instance remote, Set<String> localInstances) {
String instanceName = remote.getName();
logger.log(Level.FINEST, "Checking instance " + instanceName);
return !localInstances.contains(instanceName);
/* It is necessary to check if the remote instance is present in localInstances.
The `remote` instance has an old timestamp because it hasn't been fetched again
after the `updateLocalInstancesLabel` call, to avoid extra network calls.
*/
if (localInstances.contains(remote.getName())) {
return false;
}
gbhat618 marked this conversation as resolved.
Show resolved Hide resolved
String nodeLastRefresh = remote.getLabels().get(NODE_IN_USE_LABEL_KEY);
if (nodeLastRefresh == null) {
return false;
}
OffsetDateTime lastRefresh =
LocalDateTime.parse(nodeLastRefresh, formatter).atOffset(ZoneOffset.UTC);
boolean isOrphan = lastRefresh
.plus(RECURRENCE_PERIOD * LOST_MULTIPLIER, ChronoUnit.MILLIS)
.isBefore(OffsetDateTime.now(ZoneOffset.UTC));
logger.log(
Level.FINEST,
"Instance " + remote.getName() + " last_refresh label value: " + nodeLastRefresh + ", isOrphan: "
+ isOrphan);
gbhat618 marked this conversation as resolved.
Show resolved Hide resolved
return isOrphan;
}

private void terminateInstance(Instance remote, ComputeEngineCloud cloud) {
String instanceName = remote.getName();
logger.log(Level.INFO, "Remote instance " + instanceName + " not found locally, removing it");
logger.log(Level.INFO, "Removing orphaned instance: " + instanceName);
try {
cloud.getClient().terminateInstanceAsync(cloud.getProjectId(), remote.getZone(), instanceName);
} catch (IOException ex) {
logger.log(Level.WARNING, "Error terminating remote instance " + instanceName, ex);
}
}

private List<ComputeEngineCloud> getClouds() {
return Jenkins.get().clouds.stream()
.filter(cloud -> cloud instanceof ComputeEngineCloud)
.map(cloud -> (ComputeEngineCloud) cloud)
.collect(Collectors.toList());
}

private Set<String> findLocalInstances(ComputeEngineCloud cloud) {
return Jenkins.get().getNodes().stream()
var localInstances = Jenkins.get().getNodes().stream()
.filter(node -> node instanceof ComputeEngineInstance)
.map(node -> (ComputeEngineInstance) node)
.filter(node -> node.getCloud().equals(cloud))
.map(Slave::getNodeName)
.collect(Collectors.toSet());
logger.log(Level.FINEST, "Found " + localInstances.size() + " local instances");
gbhat618 marked this conversation as resolved.
Show resolved Hide resolved
return localInstances;
}

private List<Instance> findRemoteInstances(ComputeEngineCloud cloud) {
Map<String, String> filterLabel = ImmutableMap.of(CLOUD_ID_LABEL_KEY, cloud.getInstanceId());
private List<Instance> findRunningRemoteInstances(ComputeClientV2 clientV2) {
try {
return cloud.getClient().listInstancesWithLabel(cloud.getProjectId(), filterLabel).stream()
.filter(instance -> shouldTerminateStatus(instance.getStatus()))
.collect(Collectors.toList());
var remoteInstances = clientV2.retrieveInstanceByLabelKeyAndStatus(NODE_IN_USE_LABEL_KEY, "RUNNING");
logger.log(Level.FINEST, "Found " + remoteInstances.size() + " running remote instances");
gbhat618 marked this conversation as resolved.
Show resolved Hide resolved
return remoteInstances;
} catch (IOException ex) {
logger.log(Level.WARNING, "Error finding remote instances", ex);
return emptyList();
}
}

private boolean shouldTerminateStatus(String status) {
return !status.equals("STOPPING");
/**
* Updates the label of the local instances to indicate they are still in use. The method makes N network calls
* for N local instances, couldn't find any bulk update apis.
*/
private void updateLocalInstancesLabel(
ComputeClientV2 clientV2, Set<String> localInstances, List<Instance> remoteInstances) {
var remoteInstancesByName =
remoteInstances.stream().collect(Collectors.toMap(Instance::getName, instance -> instance));
var labelToUpdate = ImmutableMap.of(NODE_IN_USE_LABEL_KEY, getLastRefreshLabelVal());
for (String instanceName : localInstances) {
var remoteInstance = remoteInstancesByName.get(instanceName);
if (remoteInstance == null) {
continue;
}
try {
clientV2.updateInstanceLabels(remoteInstance, labelToUpdate);
logger.log(Level.FINEST, "Updated label for instance " + instanceName);
gbhat618 marked this conversation as resolved.
Show resolved Hide resolved
} catch (IOException e) {
logger.log(Level.WARNING, "Error updating label for instance " + instanceName, e);
}
}

Check warning on line 180 in src/main/java/com/google/jenkins/plugins/computeengine/CleanLostNodesWork.java

View check run for this annotation

ci.jenkins.io / Code Coverage

Not covered lines

Lines 82-180 are not covered by tests
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
import com.google.common.base.Strings;
import com.google.common.collect.ImmutableMap;
import com.google.jenkins.plugins.computeengine.client.ClientUtil;
import com.google.jenkins.plugins.computeengine.client.ComputeClientV2;
import com.google.jenkins.plugins.credentials.oauth.GoogleOAuth2Credentials;
import edu.umd.cs.findbugs.annotations.NonNull;
import hudson.Extension;
Expand All @@ -48,6 +49,7 @@
import hudson.util.ListBoxModel;
import java.io.IOException;
import java.io.PrintStream;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
Expand Down Expand Up @@ -90,6 +92,7 @@
private List<InstanceConfiguration> configurations;

private transient volatile ComputeClient client;
private transient volatile ComputeClientV2 clientV2;
private boolean noDelayProvisioning;

@DataBoundConstructor
Expand Down Expand Up @@ -161,6 +164,8 @@

// Apply a label that identifies the name of this instance configuration
configuration.appendLabel(CONFIG_LABEL_KEY, configuration.getNamePrefix());
configuration.appendLabel(
CleanLostNodesWork.NODE_IN_USE_LABEL_KEY, CleanLostNodesWork.getLastRefreshLabelVal());
}
}
setInstanceId(instanceId);
Expand Down Expand Up @@ -209,6 +214,17 @@
return client;
}

public ComputeClientV2 getClientV2() throws IOException, GeneralSecurityException {
if (clientV2 == null) {
synchronized (this) {
if (clientV2 == null) {
clientV2 = ClientUtil.createComputeClientV2(projectId, credentialsId);
}
}
}
return clientV2;

Check warning on line 225 in src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineCloud.java

View check run for this annotation

ci.jenkins.io / Code Coverage

Not covered lines

Lines 218-225 are not covered by tests
}

/**
* Set configurations for this cloud.
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,10 @@
import com.cloudbees.plugins.credentials.CredentialsProvider;
import com.cloudbees.plugins.credentials.domains.DomainRequirement;
import com.google.api.client.auth.oauth2.Credential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.compute.Compute;
import com.google.cloud.graphite.platforms.plugin.client.ClientFactory;
import com.google.common.base.Preconditions;
import com.google.common.base.Strings;
Expand All @@ -19,6 +22,7 @@
import java.security.GeneralSecurityException;
import java.util.List;
import java.util.Optional;
import jenkins.model.Jenkins;

/** Utilities for using the gcp-plugin-core clients. */
public class ClientUtil {
Expand Down Expand Up @@ -88,4 +92,17 @@
private static Credential getGoogleCredential(GoogleRobotCredentials credentials) throws GeneralSecurityException {
return credentials.getGoogleCredential(new ComputeEngineScopeRequirement());
}

public static ComputeClientV2 createComputeClientV2(String projectId, String credentialsId)
throws GeneralSecurityException, IOException {
Credential httpRequestInitializer = ClientUtil.getGoogleCredential(
ClientUtil.getRobotCredentials(Jenkins.get(), ImmutableList.of(), credentialsId));
Compute compute = new Compute.Builder(
GoogleNetHttpTransport.newTrustedTransport(), new JacksonFactory(), httpRequestInitializer)
.setGoogleClientRequestInitializer(request ->
request.setRequestHeaders(request.getRequestHeaders().setUserAgent(APPLICATION_NAME)))
.setApplicationName(ClientUtil.APPLICATION_NAME)
.build();
return new ComputeClientV2(projectId, compute);

Check warning on line 106 in src/main/java/com/google/jenkins/plugins/computeengine/client/ClientUtil.java

View check run for this annotation

ci.jenkins.io / Code Coverage

Not covered lines

Lines 98-106 are not covered by tests
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
package com.google.jenkins.plugins.computeengine.client;
gbhat618 marked this conversation as resolved.
Show resolved Hide resolved

import com.google.api.services.compute.Compute;
import com.google.api.services.compute.model.Instance;
import com.google.api.services.compute.model.InstancesScopedList;
import com.google.api.services.compute.model.InstancesSetLabelsRequest;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
import lombok.Getter;

/**
* Extends {@link com.google.cloud.graphite.platforms.plugin.client.ComputeClient} with additional functionalities.
* <p>This class serves as a venue for implementing features not available in the archived Graphite Java library
* (<a href="https://github.com/GoogleCloudPlatform/gcp-plugin-core-java">gcp-plugin-core-java</a>, last updated in December 2019).
* Consideration for the gradual evolution of this class is suggested, including the re-implementation of methods
* currently utilized from the Graphite library, to ensure dependency solely on the Google API Java Client Services
* library (<a href="https://github.com/googleapis/google-api-java-client-services">google-api-java-client-services</a>).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC google-api-java-client-services corresponds to com.google.api.services packages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that is correct. There is the Compute, and similarly the Instance classes are there, which are the same that graphite is importing as well.

So just now I tried to check a bit on the history of these classes,

Looking at the graphite library and our usages,

  • graphite library has total 6 client implementation (one of which is ComputeClient)
  • our usage is only these 6 classes
    ✗ grep -h -r "import com.google.cloud.graphite.platforms.plugin.client" ./src | sort -u
    import com.google.cloud.graphite.platforms.plugin.client.ClientFactory;
    import com.google.cloud.graphite.platforms.plugin.client.ComputeClient.OperationException;
    import com.google.cloud.graphite.platforms.plugin.client.ComputeClient;
    import com.google.cloud.graphite.platforms.plugin.client.model.GuestAttribute;
    import com.google.cloud.graphite.platforms.plugin.client.model.InstanceResourceData;
    import com.google.cloud.graphite.platforms.plugin.client.util.ClientUtil;

So currently I am thinking like,

  • We should bring back the ComputeClient code from graphite into this repo back (as a separate PR); the ComputeClientV2 should be merged into the ComputeClient.
  • Another thought is, to fork the graphite into jenkinsci, but that means, we need to manage (possibly delete) the 5 unused client classes (and associated wrappers); which then leaves the repo with only 6 classes, so may not be a good idea to maintain a whole repo for just that.

Copy link
Member

@jglick jglick Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are making any major changes to the plugin (beyond what is necessary for this PR and specifically #503 (comment)), it should be to switch to whatever Java client library Google currently recommends, rather than maintaining our own stuff for no good reason.

* This approach aims to eventually eliminate the reliance on the Graphite library.
*/
public class ComputeClientV2 {

private final String projectId;

@Getter
private final Compute compute;

public ComputeClientV2(String projectId, Compute compute) {
this.projectId = projectId;
this.compute = compute;
}

/**
* Updates the labels of a specified {@code instance} by merging or replacing them with {@code newLabels}.
* <p>This method adds any new labels found in {@code newLabels} to the instance's existing labels and updates
* the values of any existing labels if they are also present in {@code newLabels}. Labels existing on the instance
* that are not in {@code newLabels} remain unchanged. This operation can only result in the addition of new
* labels or the modification of existing ones.
*
* @param instance the instance whose labels are to be updated; must not be {@code null}
* @param newLabels the new labels to be merged with or replace the existing labels of the instance; must not be {@code null}
* @throws IOException if an I/O error occurs during the label update process.
*/
public void updateInstanceLabels(Instance instance, Map<String, String> newLabels) throws IOException {
var allLabels = instance.getLabels();
allLabels.putAll(newLabels);
var labelsRequest = new InstancesSetLabelsRequest()
.setLabels(allLabels)
.setLabelFingerprint(instance.getLabelFingerprint());
String zoneLink = instance.getZone();
String zone = zoneLink.substring(zoneLink.lastIndexOf("/") + 1);
compute.instances()
.setLabels(projectId, zone, instance.getName(), labelsRequest)
.execute();
}

/**
* Fetches instances by label key existence and status.
* <p>Applies Google Compute Engine aggregated list syntax for filtering:
* <a href="https://cloud.google.com/compute/docs/reference/rest/v1/instances/aggregatedList">aggregatedList API</a>.
*
* @param key the non-empty label key to filter by.
* @param status the instance status (RUNNING, STOPPING, etc.) as defined in:
* <a href="https://cloud.google.com/compute/docs/instances/instance-lifecycle#instance-states">Instance States</a>.
* @return List of {@link Instance} matching criteria, or empty list if none.
* @throws IOException for communication issues with Compute Engine API.
*/
public List<Instance> retrieveInstanceByLabelKeyAndStatus(String key, String status) throws IOException {
String filter = "labels." + key + ":*" + " AND status=" + status;
var response =
compute.instances().aggregatedList(projectId).setFilter(filter).execute();
var items = response.getItems();
if (items == null) {
return List.of();
}
return items.values().stream()
.map(InstancesScopedList::getInstances)
.filter(Objects::nonNull)
.flatMap(List::stream)
.collect(Collectors.toList());

Check warning on line 82 in src/main/java/com/google/jenkins/plugins/computeengine/client/ComputeClientV2.java

View check run for this annotation

ci.jenkins.io / Code Coverage

Not covered lines

Lines 27-82 are not covered by tests
}
}
Loading
Loading