Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: mainly refactor engine codes #260

Merged
merged 2 commits into from
Dec 10, 2024
Merged

Conversation

derekbit
Copy link
Member

@derekbit derekbit commented Dec 8, 2024

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9919

Signed-off-by: Derek Su [email protected]

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

@derekbit derekbit self-assigned this Dec 8, 2024
Copy link

coderabbitai bot commented Dec 8, 2024

Walkthrough

The pull request introduces modifications primarily to the EngineCreate method across various files, specifically removing the upgradeRequired parameter. This change affects the SPDKClient, Server, and related test files. Additionally, enhancements to the Engine struct in pkg/spdk/engine.go include new methods for managing engine state and improved error handling. The replica.go file sees updates to error messages and logging for consistency. Overall, the changes streamline method signatures and improve code clarity without altering existing functionalities.

Changes

File Change Summary
pkg/client/client.go Updated EngineCreate method signature by removing upgradeRequired parameter.
pkg/spdk/engine.go Added new methods: isNewEngine, checkInitiatorAndTargetCreationRequirements, releasePorts, releaseTargetPort. Refactored Create, handleFrontend, SwitchOverTarget, and DeleteTarget methods for improved logic and error handling. Renamed dmDeviceBusy to dmDeviceIsBusy.
pkg/spdk/replica.go Updated error messages and logging statements for consistency.
pkg/spdk/server.go Updated EngineCreate method to remove req.UpgradeRequired parameter from e.Create method call.
pkg/spdk_test.go Modified test cases to reflect changes in EngineCreate function calls, removing the last boolean parameter.

Assessment against linked issues

Objective Addressed Explanation
Volume should auto salvage then reattach and become healthy (#9919) It is unclear if the changes directly address the bug.

Possibly related PRs

  • feat(auto-salvage): v2 support #222: The changes in this PR involve the EngineCreate method in pkg/client/client.go, which now includes a salvageRequested parameter. This is directly related to the main PR's modification of the EngineCreate method, where the upgradeRequired parameter was removed, indicating a shared focus on simplifying method signatures.
  • fix: decrease ctrlr_loss_timeout_sec for base bdev  #247: The modifications to the Engine struct and its methods in this PR, particularly the Create method, are relevant as they also involve changes to method signatures and parameters, similar to the main PR's focus on the EngineCreate method.

Suggested reviewers

  • shuo-wu
  • c3y1huang
  • DamiaSan

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
pkg/spdk/engine.go (1)

2170-2216: Simplify error handling logic in SwitchOverTarget

The error handling in the deferred function within SwitchOverTarget is complex due to nested if statements.

To improve readability and maintainability, consider refactoring the nested if statements. Here's a suggested change:

defer func() {
	if err != nil {
		e.log.WithError(err).Warnf("Failed to switch over engine to target address %s", newTargetAddress)

		disconnected, errCheck := e.isTargetDisconnected()
		if errCheck != nil {
			e.log.WithError(errCheck).Warnf("Failed to check if target %s is disconnected", newTargetAddress)
			return
		}

		if disconnected && currentTargetAddress != "" {
			if errConnect := e.connectTarget(currentTargetAddress); errConnect != nil {
				e.log.WithError(errConnect).Warnf("Failed to connect target back to %s", currentTargetAddress)
				return
			}

			e.log.Infof("Connected target back to %s", currentTargetAddress)

			if errReload := e.reloadDevice(); errReload != nil {
				e.log.WithError(errReload).Warnf("Failed to reload device mapper")
			} else {
				e.log.Infof("Reloaded device mapper for connecting target back to %s", currentTargetAddress)
			}
		}
	} else {
		e.ErrorMsg = ""
		e.log.Infof("Switched over target to %s", newTargetAddress)
	}

	e.UpdateCh <- nil
}()
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 880499b and 9f1146a.

⛔ Files ignored due to path filters (9)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/initiator.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvme.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvmecli.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/spdk/client/advanced.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/types/types.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/util/device.go is excluded by !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (5)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (29 hunks)
  • pkg/spdk/replica.go (4 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pkg/spdk/replica.go
🔇 Additional comments (4)
pkg/spdk/engine.go (1)

486-491: Verify port allocation when both initiator and target are created

When both initiatorCreationRequired and targetCreationRequired are true, the same port is assigned to both e.Port and e.TargetPort. This could lead to conflicts if the initiator and target cannot share the same port.

Please ensure that using the same port for both the initiator and target is acceptable in your network configuration. If not, consider allocating separate ports for each to prevent potential issues.

pkg/client/client.go (1)

477-477: Simplification of EngineCreate method signature

The removal of the upgradeRequired parameter from the EngineCreate method reduces complexity and clarifies the method's purpose.

pkg/spdk/server.go (1)

881-881: Consistency in EngineCreate method update

The modifications to the EngineCreate method in the Server align with the updated method signature in the client. This ensures consistency across the codebase.

pkg/spdk_test.go (1)

271-271: Update test cases to match new EngineCreate signature

The test cases have been correctly updated to reflect the removal of the upgradeRequired parameter in the EngineCreate method. The tests remain valid and continue to verify the intended functionality.

Also applies to: 354-354, 530-530, 1228-1228, 1243-1243, 1354-1354

pkg/spdk/engine.go Outdated Show resolved Hide resolved
Copy link

codecov bot commented Dec 8, 2024

Codecov Report

Attention: Patch coverage is 0.35461% with 281 lines in your changes missing coverage. Please review.

Project coverage is 8.31%. Comparing base (2be9d5e) to head (fbe02ea).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pkg/spdk/engine.go 0.00% 276 Missing ⚠️
pkg/spdk/replica.go 0.00% 4 Missing ⚠️
pkg/spdk/server.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##            main    #260      +/-   ##
========================================
- Coverage   8.38%   8.31%   -0.07%     
========================================
  Files         21      21              
  Lines       7159    7239      +80     
========================================
+ Hits         600     602       +2     
- Misses      6478    6557      +79     
+ Partials      81      80       -1     
Flag Coverage Δ
unittests 8.31% <0.35%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@derekbit derekbit changed the title chore(vendor): update dependencies refactor: mainly refactor engine codes Dec 8, 2024
Copy link

mergify bot commented Dec 8, 2024

This pull request is now in conflict. Could you fix it @derekbit? 🙏

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
pkg/spdk/engine.go (2)

121-147: Consider improving error message clarity

The error message in the checkInitiatorAndTargetCreationRequirements method could be more descriptive.

-		err = fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", e.Name, initiatorIP, targetIP)
+		err = fmt.Errorf("pod IP %v does not match either initiator address %v or target address %v for engine %s creation", podIP, initiatorIP, targetIP, e.Name)

2170-2209: Improve error recovery in SwitchOverTarget

The error recovery logic in SwitchOverTarget could be more robust by adding retries for target reconnection.

Consider adding a retry mechanism for target reconnection:

if currentTargetAddress != "" {
+	maxRetries := 3
+	for i := 0; i < maxRetries; i++ {
		if errConnect := e.connectTarget(currentTargetAddress); errConnect != nil {
			e.log.WithError(errConnect).Warnf("Failed to connect target back to %s (attempt %d/%d)", currentTargetAddress, i+1, maxRetries)
+			if i < maxRetries-1 {
+				time.Sleep(time.Second * time.Duration(i+1))
+				continue
+			}
		} else {
			e.log.Infof("Connected target back to %s", currentTargetAddress)
			if errReload := e.reloadDevice(); errReload != nil {
				e.log.WithError(errReload).Warnf("Failed to reload device mapper")
			} else {
				e.log.Infof("Reloaded device mapper for connecting target back to %s", currentTargetAddress)
			}
+			break
		}
+	}
}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9f1146a and fc7178e.

⛔ Files ignored due to path filters (9)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/initiator.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvme.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvmecli.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/spdk/client/advanced.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/types/types.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/util/device.go is excluded by !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (5)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (29 hunks)
  • pkg/spdk/replica.go (4 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/spdk/server.go
  • pkg/spdk/replica.go
🔇 Additional comments (8)
pkg/client/client.go (1)

477-477: LGTM: Simplified EngineCreate method signature

The removal of the upgradeRequired parameter streamlines the method signature while maintaining core functionality. This change aligns with the refactoring objectives.

pkg/spdk_test.go (4)

271-271: LGTM: Updated test call site for EngineCreate

The test has been correctly updated to match the new method signature.


354-354: LGTM: Updated test call site for EngineCreate in re-attach scenario

The test has been correctly updated to match the new method signature in the volume re-attach scenario.


1228-1228: LGTM: Updated test call sites for EngineCreate in snapshot revert

Both calls to EngineCreate in the snapshot revert scenario have been correctly updated to match the new method signature.

Also applies to: 1243-1243


1354-1354: LGTM: Updated test call site for EngineCreate in target-only scenario

The test has been correctly updated to match the new method signature in the target-only test scenario.

pkg/spdk/engine.go (3)

117-119: LGTM: Added clear helper method for engine state check

The isNewEngine method provides a clear way to determine if an engine instance is new based on IP addresses.


2439-2450: LGTM: Clean port release implementation

The releaseTargetPort method provides a clean way to release target ports and update state.


596-610: ⚠️ Potential issue

Potential double port release issue

The releasePorts function could attempt to release the same port twice if e.Port equals e.TargetPort.

Apply this fix:

func (e *Engine) releasePorts(superiorPortAllocator *commonbitmap.Bitmap) (err error) {
-	ports := map[int32]struct{}{
-		e.Port:       {},
-		e.TargetPort: {},
-	}
+	ports := make(map[int32]struct{})
+	if e.Port != 0 {
+		ports[e.Port] = struct{}{}
+	}
+	if e.TargetPort != 0 && e.TargetPort != e.Port {
+		ports[e.TargetPort] = struct{}{}
+	}

	if errRelease := releasePortIfExists(superiorPortAllocator, ports, &e.Port); errRelease != nil {
		err = multierr.Append(err, errRelease)
	}
	if errRelease := releasePortIfExists(superiorPortAllocator, ports, &e.TargetPort); errRelease != nil {
		err = multierr.Append(err, errRelease)
	}

	return err
}

Likely invalid or redundant comment.

pkg/spdk/engine.go Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
pkg/spdk/engine.go (2)

143-143: Improve error message clarity

The error message could be more descriptive by including the pod IP for easier debugging.

-		err = fmt.Errorf("invalid initiator and target addresses for engine %s creation with initiator address %v and target address %v", e.Name, initiatorIP, targetIP)
+		err = fmt.Errorf("invalid initiator and target addresses for engine %s creation: pod IP %v does not match initiator address %v or target address %v", e.Name, podIP, initiatorIP, targetIP)

232-235: Improve error handling in target creation path

The error handling here could be more specific. When BdevRaidGet fails with an error other than "no such device", we should log the specific error before returning.

		_, err := spdkClient.BdevRaidGet(e.Name, 0)
		if err != nil && !jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err) {
+			e.log.WithError(err).Errorf("Failed to get raid bdev %v during engine creation", e.Name)
			return nil, errors.Wrapf(err, "failed to get raid bdev %v during engine creation", e.Name)
		}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fc7178e and b24d45a.

📒 Files selected for processing (5)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (29 hunks)
  • pkg/spdk/replica.go (4 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/client/client.go
  • pkg/spdk/replica.go
  • pkg/spdk/server.go
  • pkg/spdk_test.go
🔇 Additional comments (2)
pkg/spdk/engine.go (2)

597-600: Prevent double release of ports

As noted in a previous review, there's a risk of attempting to release the same port twice if e.Port and e.TargetPort are equal.


2404-2435: 🛠️ Refactor suggestion

Improve error handling in DeleteTarget

Consider collecting all errors during cleanup instead of returning on the first error. This ensures we attempt to clean up all resources even if some operations fail.

func (e *Engine) DeleteTarget(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap) (err error) {
	e.log.Infof("Deleting target with target port %d", e.TargetPort)

+	var errs []error
	
	if err := spdkClient.StopExposeBdev(e.Nqn); err != nil {
-		return errors.Wrapf(err, "failed to stop expose bdev while deleting target instance for engine %s", e.Name)
+		errs = append(errs, errors.Wrapf(err, "failed to stop expose bdev"))
	}

	if err := e.releaseTargetPort(superiorPortAllocator); err != nil {
-		return errors.Wrapf(err, "failed to release target port while deleting target instance for engine %s", e.Name)
+		errs = append(errs, errors.Wrapf(err, "failed to release target port"))
	}

	e.log.Infof("Deleting raid bdev %s while deleting target instance", e.Name)
	if _, err := spdkClient.BdevRaidDelete(e.Name); err != nil && !jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err) {
-		return errors.Wrapf(err, "failed to delete raid bdev after engine %s while deleting target instance", e.Name)
+		errs = append(errs, errors.Wrapf(err, "failed to delete raid bdev"))
	}

	for replicaName, replicaStatus := range e.ReplicaStatusMap {
		e.log.Infof("Disconnecting replica %s while deleting target instance", replicaName)
		if err := disconnectNVMfBdev(spdkClient, replicaStatus.BdevName); err != nil {
			e.log.WithError(err).Warnf("Engine failed to disconnect replica %s while deleting target instance", replicaName)
			replicaStatus.Mode = types.ModeERR
+			errs = append(errs, errors.Wrapf(err, "failed to disconnect replica %s", replicaName))
		}
		replicaStatus.BdevName = ""
	}

+	if len(errs) > 0 {
+		return fmt.Errorf("failed to delete target instance for engine %s: %v", e.Name, errs)
+	}
	return nil
}

Likely invalid or redundant comment.

pkg/spdk/engine.go Show resolved Hide resolved
Longhorn 9919

Signed-off-by: Derek Su <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
pkg/spdk/engine.go (1)

2441-2452: Good separation of target port release logic

The releaseTargetPort method cleanly encapsulates port release logic, but consider adding a check to prevent double release of ports.

 func (e *Engine) releaseTargetPort(superiorPortAllocator *commonbitmap.Bitmap) error {
-    releaseTargetPortRequired := e.TargetPort != 0
+    if e.TargetPort == 0 {
+        return nil
+    }
 
-    // Release the target port
-    if releaseTargetPortRequired {
-        if err := superiorPortAllocator.ReleaseRange(e.TargetPort, e.TargetPort); err != nil {
-            return errors.Wrapf(err, "failed to release target port %d", e.TargetPort)
-        }
+    if err := superiorPortAllocator.ReleaseRange(e.TargetPort, e.TargetPort); err != nil {
+        return errors.Wrapf(err, "failed to release target port %d", e.TargetPort)
     }
+
     e.TargetPort = 0
 
     return nil
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b24d45a and 4a48c43.

⛔ Files ignored due to path filters (9)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/initiator.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvme.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/nvme/nvmecli.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/spdk/client/advanced.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/types/types.go is excluded by !vendor/**
  • vendor/github.com/longhorn/go-spdk-helper/pkg/util/device.go is excluded by !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (5)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (29 hunks)
  • pkg/spdk/replica.go (4 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/client/client.go
  • pkg/spdk/server.go
  • pkg/spdk/replica.go
🔇 Additional comments (7)
pkg/spdk_test.go (1)

271-271: LGTM: Consistent removal of upgradeRequired parameter

The changes consistently remove the upgradeRequired parameter from all EngineCreate calls across test functions, aligning with the refactoring in the engine implementation.

Also applies to: 354-354, 530-530, 1228-1228, 1243-1243

pkg/spdk/engine.go (6)

58-59: Improved field naming clarity

The rename from dmDeviceBusy to dmDeviceIsBusy makes the boolean field's purpose more explicit.


117-119: LGTM: Clear and concise helper method

The isNewEngine() method provides a clear way to check if an engine is newly created by examining both IP fields.


121-147: Well-structured requirements check method

The checkInitiatorAndTargetCreationRequirements method effectively:

  • Encapsulates complex logic for determining creation requirements
  • Provides clear logging for each scenario
  • Returns explicit boolean flags for required actions

Line range hint 2229-2238: Improved error handling for target switchover

Good addition of suspension check before target switchover to prevent potential issues. This helps ensure the engine is in a safe state before proceeding.


2404-2435: 🛠️ Refactor suggestion

Consider improving error handling in DeleteTarget

The error handling could be more granular to ensure proper cleanup even when some operations fail.

 func (e *Engine) DeleteTarget(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap) (err error) {
     e.log.Infof("Deleting target with target port %d", e.TargetPort)
 
+    var errs []error
+
     err = spdkClient.StopExposeBdev(e.Nqn)
     if err != nil {
-        return errors.Wrapf(err, "failed to stop expose bdev while deleting target instance for engine %s", e.Name)
+        errs = append(errs, errors.Wrapf(err, "failed to stop expose bdev"))
     }
 
     err = e.releaseTargetPort(superiorPortAllocator)
     if err != nil {
-        return errors.Wrapf(err, "failed to release target port while deleting target instance for engine %s", e.Name)
+        errs = append(errs, errors.Wrapf(err, "failed to release target port"))
     }
 
     e.log.Infof("Deleting raid bdev %s while deleting target instance", e.Name)
     _, err = spdkClient.BdevRaidDelete(e.Name)
     if err != nil && !jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err) {
-        return errors.Wrapf(err, "failed to delete raid bdev after engine %s while deleting target instance", e.Name)
+        errs = append(errs, errors.Wrapf(err, "failed to delete raid bdev"))
     }
 
     for replicaName, replicaStatus := range e.ReplicaStatusMap {
         e.log.Infof("Disconnecting replica %s while deleting target instance", replicaName)
         err = disconnectNVMfBdev(spdkClient, replicaStatus.BdevName)
         if err != nil {
             e.log.WithError(err).Warnf("Engine failed to disconnect replica %s while deleting target instance", replicaName)
             replicaStatus.Mode = types.ModeERR
+            errs = append(errs, errors.Wrapf(err, "failed to disconnect replica %s", replicaName))
         }
         replicaStatus.BdevName = ""
     }
+
+    if len(errs) > 0 {
+        return fmt.Errorf("failed to delete target instance for engine %s: %v", e.Name, errs)
+    }
     return nil
 }

Likely invalid or redundant comment.


2238-2244: Enhanced NVMe device info loading

The code now properly handles the case where NVMe device is not found, distinguishing between expected and unexpected errors.

✅ Verification successful

Let me analyze the error handling pattern further by checking the implementation of LoadNVMeDeviceInfo and the error constant.


Let me verify the consistency of error handling by checking the implementation in the vendor package.


Based on the verification results, I can now provide the final response:

Error handling for NVMe device not found is consistently implemented

The error handling pattern is correctly and consistently implemented across the codebase:

  • The error constant ErrorMessageCannotFindValidNvmeDevice is properly defined in the helper types package
  • The error check using ErrorIsValidNvmeDeviceNotFound is used consistently in multiple locations in engine.go
  • The pattern distinguishes between expected device-not-found scenarios and other unexpected errors
  • Similar error handling is also present in the vendor package's initiator implementation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for similar error handling patterns in other files
rg "ErrorIsValidNvmeDeviceNotFound" -A 2

Length of output: 1571


Script:

#!/bin/bash
# Check LoadNVMeDeviceInfo implementation and error constant
ast-grep --pattern 'func LoadNVMeDeviceInfo($$$) $$ {
  $$$
}'

# Check for the error constant definition
rg "ErrorMessageCannotFindValidNvmeDevice.*="

Length of output: 248


Script:

#!/bin/bash
# Check the implementation in vendor package
rg -A 10 "func LoadNVMeDeviceInfo" vendor/

# Check for any other usage of this error message
rg -B 2 "cannot find a valid NVMe device"

Length of output: 355

pkg/spdk/engine.go Outdated Show resolved Hide resolved
pkg/spdk/engine.go Outdated Show resolved Hide resolved
pkg/spdk/engine.go Show resolved Hide resolved
pkg/spdk/engine.go Show resolved Hide resolved
pkg/spdk/engine.go Outdated Show resolved Hide resolved
Longhorn 9919

Signed-off-by: Derek Su <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
pkg/spdk/engine.go (5)

117-152: Add godoc comments for the new methods

While the methods are well-structured and have good internal logging, they would benefit from godoc comments explaining their purpose, parameters, and return values.

Add documentation like this:

+// isNewEngine returns true if both IP and TargetIP are empty, indicating a new engine instance
 func (e *Engine) isNewEngine() bool {
     return e.IP == "" && e.TargetIP == ""
 }

+// checkInitiatorAndTargetCreationRequirements determines whether initiator and target instances 
+// need to be created based on the pod IP and provided addresses.
+// Returns (initiatorRequired, targetRequired, error)
 func (e *Engine) checkInitiatorAndTargetCreationRequirements(podIP, initiatorIP, targetIP string) (bool, bool, error) {

448-477: Consider making retry parameters configurable

The retry logic for loading NVMe device info is a good addition, but the magic numbers for retries and intervals should be configurable parameters.

Consider making these parameters configurable:

+const (
+    // DefaultMaxNumRetries is the default number of retries for loading NVMe device info
+    DefaultMaxNumRetries = 3
+    // DefaultRetryInterval is the default interval between retries
+    DefaultRetryInterval = time.Second * 2
+)

 func (e *Engine) handleFrontend(...) {
-    for r := 0; r < maxNumRetries; r++ {
+    retries := DefaultMaxNumRetries
+    interval := DefaultRetryInterval
+    if e.config != nil {
+        retries = e.config.MaxNumRetries
+        interval = e.config.RetryInterval
+    }
+    for r := 0; r < retries; r++ {
         ...
-        time.Sleep(retryInterval)
+        time.Sleep(interval)
     }
 }

Line range hint 2180-2275: Consider breaking down complex target switchover logic

The SwitchOverTarget method handles multiple responsibilities. Consider extracting some logic into helper methods for better maintainability.

Suggested refactoring:

+func (e *Engine) validateSwitchOverPrerequisites(newTargetAddress string) (string, error) {
+    if newTargetAddress == "" {
+        return "", fmt.Errorf("invalid empty target address for engine %s target switchover", e.Name)
+    }
+    
+    podIP, err := commonnet.GetIPForPod()
+    if err != nil {
+        return "", errors.Wrapf(err, "failed to get IP for pod")
+    }
+    
+    return podIP, nil
+}

+func (e *Engine) updateTargetAddresses(newTargetIP string, newTargetPort int32, podIP string) {
+    e.IP = newTargetIP
+    e.Port = newTargetPort
+    
+    if newTargetIP == podIP {
+        e.TargetPort = newTargetPort
+    } else {
+        e.TargetPort = 0
+    }
+}

 func (e *Engine) SwitchOverTarget(spdkClient *spdkclient.Client, newTargetAddress string) (err error) {
-    if newTargetAddress == "" {
-        return fmt.Errorf("invalid empty target address for engine %s target switchover", e.Name)
-    }
-    
-    podIP, err := commonnet.GetIPForPod()
+    podIP, err := e.validateSwitchOverPrerequisites(newTargetAddress)
     if err != nil {
-        return errors.Wrapf(err, "failed to get IP for pod")
+        return err
     }
     ...
-    e.IP = newTargetIP
-    e.Port = newTargetPort
-    if newTargetIP == podIP {
-        e.TargetPort = newTargetPort
-    } else {
-        e.TargetPort = 0
-    }
+    e.updateTargetAddresses(newTargetIP, newTargetPort, podIP)
     ...
 }

2117-2119: Enhance error handling documentation

The comments explain why the engine is not marked as error, but could be more detailed about the retry mechanism and potential failure scenarios.

Add more detailed comments:

-// Engine is still alive and running and is not really in error state.
-// longhorn-manager will retry to suspend the engine.
+// Engine is still operational despite the suspend/resume failure.
+// This is a transient error that can be retried by longhorn-manager.
+// Common causes:
+// - Temporary network issues
+// - Resource contention
+// - Race conditions with other operations
+// The engine will remain in its current state and longhorn-manager
+// will automatically retry the operation with exponential backoff.

Also applies to: 2150-2152


2449-2460: Add validation in releaseTargetPort method

The method could benefit from additional validation to ensure the port is within valid range before attempting release.

Suggested improvement:

 func (e *Engine) releaseTargetPort(superiorPortAllocator *commonbitmap.Bitmap) error {
+    if superiorPortAllocator == nil {
+        return fmt.Errorf("invalid nil port allocator")
+    }
+
     releaseTargetPortRequired := e.TargetPort != 0
 
     if releaseTargetPortRequired {
+        // Validate port is within acceptable range (1-65535)
+        if e.TargetPort < 1 || e.TargetPort > 65535 {
+            return fmt.Errorf("invalid target port %d: must be between 1 and 65535", e.TargetPort)
+        }
+
         if err := superiorPortAllocator.ReleaseRange(e.TargetPort, e.TargetPort); err != nil {
             return errors.Wrapf(err, "failed to release target port %d", e.TargetPort)
         }
     }
     e.TargetPort = 0
 
     return nil
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4a48c43 and fbe02ea.

📒 Files selected for processing (5)
  • pkg/client/client.go (1 hunks)
  • pkg/spdk/engine.go (29 hunks)
  • pkg/spdk/replica.go (4 hunks)
  • pkg/spdk/server.go (1 hunks)
  • pkg/spdk_test.go (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/client/client.go
  • pkg/spdk/server.go
  • pkg/spdk/replica.go
🔇 Additional comments (9)
pkg/spdk_test.go (6)

271-271: LGTM: Parameter removal is consistent with the API changes

The removal of the upgradeRequired parameter from EngineCreate aligns with the refactoring mentioned in the AI summary.


354-354: LGTM: Parameter removal is consistent with the API changes

The removal of the upgradeRequired parameter from EngineCreate maintains consistency with the previous change.


530-530: LGTM: Parameter removal is consistent with the API changes

The removal of the upgradeRequired parameter from EngineCreate in the snapshot operations test maintains consistency.


1228-1228: LGTM: Parameter removal is consistent in helper function

The removal of the upgradeRequired parameter from both EngineCreate calls in the revertSnapshot helper function maintains consistency with the API changes.

Also applies to: 1243-1243


Line range hint 1-1354: Consider adding test cases for upgrade scenarios

While the removal of the upgradeRequired parameter simplifies the API, we should ensure that upgrade scenarios are still properly tested through other means.

Let's check if there are any existing upgrade-related tests:

Consider adding dedicated test cases for version upgrade scenarios if they don't exist elsewhere in the codebase.


1354-1354: LGTM: Parameter removal is consistent in target-only test

The removal of the upgradeRequired parameter from EngineCreate in the target-only test case maintains consistency with the API changes.

Let's verify that we haven't missed any instances of EngineCreate calls:

✅ Verification successful

Let me search specifically for EngineCreate calls in the codebase to verify the parameter changes.


Let me verify the EngineCreateRequest structure to confirm the parameter changes.


Let me try searching for the EngineCreateRequest structure using ripgrep instead.


Based on the search results, I can now generate the final response since I have found all the necessary information. The EngineCreateRequest structure shows that upgradeRequired is still a valid field in the protobuf definition, but it's being removed from the test calls. Let me verify all test calls to ensure consistency.


LGTM: All test cases consistently use the updated EngineCreate signature

All test cases in pkg/spdk_test.go have been updated to use the new signature of EngineCreate with false for the salvageRequested parameter (previously upgradeRequired). The changes are consistent across all test cases, maintaining the API compatibility.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining instances of EngineCreate with the old signature
rg "EngineCreate.*,.*,.*,.*,.*,.*,.*,.*,.*true|false\)"

Length of output: 20253


Script:

#!/bin/bash
# Search for EngineCreate calls with surrounding context
rg "EngineCreate\(" -A 2 -B 2

Length of output: 7257


Script:

#!/bin/bash
# Search for EngineCreateRequest structure definition
ast-grep --pattern 'message EngineCreateRequest {
  $$$
}'

Length of output: 60


Script:

#!/bin/bash
# Search for EngineCreateRequest structure definition
rg "EngineCreateRequest" -A 15 -B 2

Length of output: 41171


Script:

#!/bin/bash
# Search for all EngineCreate test calls
rg "EngineCreate\(" pkg/spdk_test.go -A 2 -B 2

Length of output: 1746

pkg/spdk/engine.go (3)

13-13: LGTM: Good addition of multierr package

The addition of go.uber.org/multierr is appropriate for combining multiple errors in the error handling code, particularly in the new port release logic.


45-47: LGTM: Improved field documentation and naming

Good improvements:

  • Added clear documentation for Port and TargetPort fields
  • Renamed dmDeviceBusy to dmDeviceIsBusy for better clarity
  • Added initiator field for better encapsulation of NVMe initiator functionality

Also applies to: 58-59


2412-2443: 🛠️ Refactor suggestion

Improve error handling to collect all errors

The current implementation returns on the first error. Consider collecting all errors to provide a complete picture of what failed during target deletion.

Suggested improvement:

 func (e *Engine) DeleteTarget(spdkClient *spdkclient.Client, superiorPortAllocator *commonbitmap.Bitmap) (err error) {
     e.log.Infof("Deleting target with target port %d", e.TargetPort)
+    var errs []error
 
     if err := spdkClient.StopExposeBdev(e.Nqn); err != nil {
-        return errors.Wrapf(err, "failed to stop expose bdev while deleting target instance for engine %s", e.Name)
+        errs = append(errs, errors.Wrapf(err, "failed to stop expose bdev"))
     }
 
     if err := e.releaseTargetPort(superiorPortAllocator); err != nil {
-        return errors.Wrapf(err, "failed to release target port while deleting target instance for engine %s", e.Name)
+        errs = append(errs, errors.Wrapf(err, "failed to release target port"))
     }
 
     if _, err := spdkClient.BdevRaidDelete(e.Name); err != nil && !jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err) {
-        return errors.Wrapf(err, "failed to delete raid bdev after engine %s while deleting target instance", e.Name)
+        errs = append(errs, errors.Wrapf(err, "failed to delete raid bdev"))
     }
 
     for replicaName, replicaStatus := range e.ReplicaStatusMap {
         if err := disconnectNVMfBdev(spdkClient, replicaStatus.BdevName); err != nil {
             replicaStatus.Mode = types.ModeERR
+            errs = append(errs, errors.Wrapf(err, "failed to disconnect replica %s", replicaName))
         }
         replicaStatus.BdevName = ""
     }
 
+    if len(errs) > 0 {
+        return fmt.Errorf("failed to delete target instance for engine %s: %v", e.Name, errs)
+    }
     return nil
 }

Likely invalid or redundant comment.

Copy link
Contributor

@c3y1huang c3y1huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest running regression and negative test cases to ensure this PR doesn't introduce regressions for v1.7.x and v1.8.0 features.

@innobead
Copy link
Member

innobead commented Dec 9, 2024

Suggest running regression and negative test cases to ensure this PR doesn't introduce regressions for v1.7.x and v1.8.0 features.

For refactoring, do not suggest backporting this to 1.7.x but changing to master/1.8 is good enough.

@derekbit
Copy link
Member Author

derekbit commented Dec 9, 2024

Suggest running regression and negative test cases to ensure this PR doesn't introduce regressions for v1.7.x and v1.8.0 features.

We won't backport it to v1.7.x. Currently, there are no regression test cases, but I've tested the attachment, detachment and salvage.

@c3y1huang
Copy link
Contributor

c3y1huang commented Dec 9, 2024

For refactoring, do not suggest backporting this to 1.7.x but changing to master/1.8 is good enough.

We won't backport it to v1.7.x. Currently, there are no regression test cases, but I've tested the attachment, detachment and salvage.

Got it. We should still run regression and negative tests to ensure this change doesn't impact the other features in v1.8.0.

@derekbit
Copy link
Member Author

derekbit commented Dec 9, 2024

For refactoring, do not suggest backporting this to 1.7.x but changing to master/1.8 is good enough.

We won't backport it to v1.7.x. Currently, there are no regression test cases, but I've tested the attachment, detachment and salvage.

Got it. We should still run regression and negative tests to ensure this change doesn't impact the other features in v1.8.0.

@c3y1huang
Passed the tests of robot negative (cluster_restart, kubelet_restart, node_reboot) and v2 regression testing.

@derekbit derekbit requested a review from c3y1huang December 9, 2024 13:05
@derekbit derekbit merged commit c936355 into longhorn:main Dec 10, 2024
8 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants