-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: always ensure endpoint creation #176
Conversation
Warning Rate limit exceeded@derekbit has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 22 minutes and 40 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (6)
WalkthroughThe pull request introduces various modifications across multiple files in the NVMe package and related components. Key changes include updates to method signatures, improved error handling, and logging enhancements. New methods for device management are added in the Changes
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #176 +/- ##
==========================================
- Coverage 22.89% 22.88% -0.02%
==========================================
Files 34 34
Lines 4813 4842 +29
==========================================
+ Hits 1102 1108 +6
- Misses 3537 3557 +20
- Partials 174 177 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (12)
pkg/nvme/initiator.go (2)
344-354
: Consider adding timeout configuration.The
waitAndLoadNVMeDeviceInfoWithoutLock
uses hardcoded retry values. Consider making these configurable for different environments.+const ( + defaultMaxWaitDeviceRetries = 60 + defaultWaitDeviceInterval = time.Second +) + +func (i *Initiator) waitAndLoadNVMeDeviceInfoWithoutLock( + transportAddress, + transportServiceID string, + maxRetries int, + retryInterval time.Duration, +) (err error) { - for r := 0; r < maxNumWaitDeviceRetries; r++ { + if maxRetries <= 0 { + maxRetries = defaultMaxWaitDeviceRetries + } + if retryInterval <= 0 { + retryInterval = defaultWaitDeviceInterval + } + for r := 0; r < maxRetries; r++ {
544-553
: Consider using os.PathError for more specific error handling.The
isEndpointExist
function could benefit from more specific error handling usingos.PathError
.func (i *Initiator) isEndpointExist() (bool, error) { _, err := os.Stat(i.Endpoint) - if err != nil { - if os.IsNotExist(err) { - return false, nil - } - return false, err + if err != nil { + if pathErr, ok := err.(*os.PathError); ok { + if os.IsNotExist(pathErr) { + return false, nil + } + return false, fmt.Errorf("path error checking endpoint: %v", pathErr) + } + return false, fmt.Errorf("unexpected error checking endpoint: %v", err) } return true, nil }pkg/spdk/client/advanced.go (4)
Line range hint
11-39
: Potential Naming Conflict with Default NamesWhen
name
is empty, the function usesfilepath.Base(devicePath)
as thename
for both the AIO bdev and the LVS. If multiple devices have the same filename (e.g.,disk.img
), this could lead to name collisions in SPDK, causing unexpected behavior.Consider generating unique names by incorporating additional identifiers like a UUID or the full device path to prevent potential conflicts.
Line range hint
15-17
: Handle Existing Bdev Creation Errors GracefullyIf
BdevAioCreate
fails because the bdev already exists, the function returns an error. This could happen if the device was previously added and not properly cleaned up.Before attempting to create the bdev, check if it already exists using
BdevGetBdevs
or handle the specific error indicating an existing bdev to avoid unnecessary failures.
Line range hint
43-54
: Ensure Comprehensive Cleanup on Partial DeletionIn
DeleteDevice
, ifBdevLvolDeleteLvstore
succeeds butBdevAioDelete
fails, the AIO bdev remains, leaving residual resources.Implement additional error handling or a rollback mechanism to ensure that both the LVS and the AIO bdev are deleted successfully, or provide clear logging and recovery steps if partial deletion occurs.
Line range hint
56-71
: Handle Nil Transport List AppropriatelyThe condition
if nvmfTransportList != nil && len(nvmfTransportList) == 0
may skip transport creation ifnvmfTransportList
isnil
. This could happen ifNvmfGetTransports
returnsnil
without an error, causing subsequent operations to fail due to missing transport.Modify the condition to create the transport when
nvmfTransportList
isnil
or empty.Apply this diff:
if err != nil { return err } -if nvmfTransportList != nil && len(nvmfTransportList) == 0 { +if nvmfTransportList == nil || len(nvmfTransportList) == 0 { if _, err := c.NvmfCreateTransport(spdktypes.NvmeTransportTypeTCP); err != nil && !jsonrpc.IsJSONRPCRespErrorTransportTypeAlreadyExists(err) { return err } }pkg/types/types.go (1)
79-85
: Consistent Case Handling in Error ChecksIn
ErrorIsDeviceOrResourceBusy
, the error message is converted to lowercase, butErrorMessageDeviceOrResourceBusy
is not, which might lead to case sensitivity issues.Convert both strings to lowercase for a reliable comparison.
Apply this diff:
func ErrorIsDeviceOrResourceBusy(err error) bool { - return strings.Contains(strings.ToLower(err.Error()), ErrorMessageDeviceOrResourceBusy) + return strings.Contains(strings.ToLower(err.Error()), strings.ToLower(ErrorMessageDeviceOrResourceBusy)) } func ErrorIsValidNvmeDeviceNotFound(err error) bool { - return strings.Contains(err.Error(), ErrorMessageCannotFindValidNvmeDevice) + return strings.Contains(err.Error(), ErrorMessageCannotFindValidNvmeDevice) }Ensure similar consistency in
ErrorIsValidNvmeDeviceNotFound
.pkg/nvme/nvme.go (3)
73-78
: Enhance Error Context When Listing NVMe DevicesIf
listRecognizedNvmeDevices
fails, the returned error lacks context about the operation being performed.Wrap the error to provide additional context for easier troubleshooting.
Apply this diff:
nvmeDevices, err := listRecognizedNvmeDevices(executor) if err != nil { - return nil, err + return nil, errors.Wrap(err, "failed to list recognized NVMe devices") }
80-88
: Provide Detailed Error Messages for Subsystem IssuesThe error messages when no subsystems are found or multiple subsystems are detected lack specific details, which can hinder debugging.
Include the device path and any relevant identifiers in the error messages for better clarity.
Apply this diff:
if len(subsystems) == 0 { - return nil, fmt.Errorf("no subsystem found for NVMe device %s", d.DevicePath) + return nil, fmt.Errorf("no NVMe subsystem found for device %s", d.DevicePath) } if len(subsystems) > 1 { - return nil, fmt.Errorf("multiple subsystems found for NVMe device %s", d.DevicePath) + return nil, fmt.Errorf("multiple NVMe subsystems found for device %s", d.DevicePath) }
166-167
: Enhance Error Messaging for Device Not FoundThe error message uses
types.ErrorMessageCannotFindValidNvmeDevice
but may not include dynamic details like NQN and address, making it less informative.Include the NQN and address in the error message to aid in diagnosing the issue.
Apply this diff:
return nil, fmt.Errorf("%s with subsystem NQN %s and address %s:%s", types.ErrorMessageCannotFindValidNvmeDevice, nqn, ip, port)pkg/util/device.go (1)
Line range hint
56-62
: EnsureremoveAsync
Always Sends to ChannelIn the
remove
function, ifremoveAsync
encounters an error before sending to thedone
channel, it might cause a deadlock.Guarantee that
removeAsync
sends an error ornil
to the channel in all code paths.Verify that
removeAsync
does not exit without sending a value todone
.pkg/nvme/nvmecli.go (1)
Line range hint
1-378
: Consider documenting the connection to the main issueWhile these changes improve code clarity, their connection to fixing the volume detaching/attaching loop issue (longhorn/longhorn#9919) isn't immediately clear. Consider:
- Adding comments explaining how these changes contribute to the fix
- Ensuring that the core fixes for the issue are included in other files
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (6)
pkg/nvme/initiator.go
(15 hunks)pkg/nvme/nvme.go
(4 hunks)pkg/nvme/nvmecli.go
(3 hunks)pkg/spdk/client/advanced.go
(4 hunks)pkg/types/types.go
(3 hunks)pkg/util/device.go
(1 hunks)
🔇 Additional comments (5)
pkg/nvme/initiator.go (2)
254-256
: Good addition of parameter validation.
The validation of transport parameters before proceeding with the initiator start is a good defensive programming practice.
694-704
: Good practice: Reloading device numbers after device creation.
The addition of device number reloading after creating the linear device mapper is a good practice to ensure consistency.
pkg/nvme/nvmecli.go (3)
157-157
: LGTM: Simplified function signature
Good improvement. The function only processes JSON data and doesn't need the executor parameter.
172-172
: LGTM: Consistent with V1 implementation
Good change. The function matches the V1 implementation in processing only JSON data without needing the executor.
200-200
: LGTM: Improved function naming
The rename from listControllers
to listRecognizedNvmeDevices
better reflects the function's purpose. However, let's verify that all callers have been updated.
✅ Verification successful
Function rename is properly implemented across the codebase
The verification confirms that:
- No references to the old function name
listControllers
exist in the codebase - The new function name
listRecognizedNvmeDevices
is used consistently in both its definition (pkg/nvme/nvmecli.go
) and its only caller (pkg/nvme/nvme.go
)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining references to the old function name
rg "listControllers" --type go
# Search for references to the new function name to verify updates
rg "listRecognizedNvmeDevices" --type go
Length of output: 248
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
pkg/nvme/initiator.go (4)
398-405
: Consider enhancing error messages for device busy scenarios.While the error handling logic is correct, consider adding more context to help diagnose issues:
if types.ErrorIsDeviceOrResourceBusy(err) { if returnErrorForBusyDevice { - return true, err + return true, errors.Wrapf(err, "device %s is busy and cannot be removed", i.Name) } dmDeviceIsBusy = true } else { - return false, err + return false, errors.Wrapf(err, "failed to remove device %s", i.Name) }🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 398-405: pkg/nvme/initiator.go#L398-L405
Added lines #L398 - L405 were not covered by tests
326-336
: Consider adding validation for endpoint creation status.While the endpoint creation logic is improved, consider adding validation after makeEndpoint:
i.logger.Infof("Creating endpoint %v", i.Endpoint) exist, err := i.isEndpointExist() if err != nil { return dmDeviceIsBusy, errors.Wrapf(err, "failed to check if endpoint %v exists for NVMe initiator %s", i.Endpoint, i.Name) } if exist { i.logger.Infof("Skipping endpoint %v creation for NVMe initiator", i.Endpoint) } else { if err := i.makeEndpoint(); err != nil { return dmDeviceIsBusy, err } + // Validate endpoint creation + if exist, err := i.isEndpointExist(); err != nil || !exist { + return dmDeviceIsBusy, errors.Wrapf(err, "endpoint %v creation validation failed", i.Endpoint) + } }🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 329-330: pkg/nvme/initiator.go#L329-L330
Added lines #L329 - L330 were not covered by tests
[warning] 332-332: pkg/nvme/initiator.go#L332
Added line #L332 was not covered by tests
[warning] 335-335: pkg/nvme/initiator.go#L335
Added line #L335 was not covered by tests
Line range hint
247-341
: Critical: Improve test coverage for core initiator operations.The Start method and its dependencies have significant test coverage gaps. Given this is a critical path for volume operations, consider adding tests for:
- Transport parameter validation
- Device busy scenarios
- Endpoint creation/validation
- Error handling paths
This is particularly important as this code fixes a volume stuck issue.
Would you like me to help create a test plan or generate test cases for these scenarios?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 276-277: pkg/nvme/initiator.go#L276-L277
Added lines #L276 - L277 were not covered by tests
[warning] 283-283: pkg/nvme/initiator.go#L283
Added line #L283 was not covered by tests
[warning] 290-290: pkg/nvme/initiator.go#L290
Added line #L290 was not covered by tests
[warning] 297-297: pkg/nvme/initiator.go#L297
Added line #L297 was not covered by tests
[warning] 302-302: pkg/nvme/initiator.go#L302
Added line #L302 was not covered by tests
[warning] 308-308: pkg/nvme/initiator.go#L308
Added line #L308 was not covered by tests
[warning] 310-310: pkg/nvme/initiator.go#L310
Added line #L310 was not covered by tests
[warning] 312-313: pkg/nvme/initiator.go#L312-L313
Added lines #L312 - L313 were not covered by tests
[warning] 322-322: pkg/nvme/initiator.go#L322
Added line #L322 was not covered by tests
[warning] 329-330: pkg/nvme/initiator.go#L329-L330
Added lines #L329 - L330 were not covered by tests
[warning] 332-332: pkg/nvme/initiator.go#L332
Added line #L332 was not covered by tests
[warning] 335-335: pkg/nvme/initiator.go#L335
Added line #L335 was not covered by tests
[warning] 351-351: pkg/nvme/initiator.go#L351
Added line #L351 was not covered by tests
[warning] 361-363: pkg/nvme/initiator.go#L361-L363
Added lines #L361 - L363 were not covered by tests
[warning] 368-370: pkg/nvme/initiator.go#L368-L370
Added lines #L368 - L370 were not covered by tests
Line range hint
1-709
: Consider adding retry limits and deadlock prevention.While the changes improve endpoint creation reliability, consider adding:
- Maximum retry limits for the entire operation to prevent infinite loops
- Deadlock detection for device busy scenarios
- Metrics/monitoring for tracking device states and operation durations
This would help prevent and diagnose similar issues in the future.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 276-277: pkg/nvme/initiator.go#L276-L277
Added lines #L276 - L277 were not covered by tests
[warning] 283-283: pkg/nvme/initiator.go#L283
Added line #L283 was not covered by tests
[warning] 290-290: pkg/nvme/initiator.go#L290
Added line #L290 was not covered by tests
[warning] 297-297: pkg/nvme/initiator.go#L297
Added line #L297 was not covered by tests
[warning] 302-302: pkg/nvme/initiator.go#L302
Added line #L302 was not covered by tests
[warning] 308-308: pkg/nvme/initiator.go#L308
Added line #L308 was not covered by tests
[warning] 310-310: pkg/nvme/initiator.go#L310
Added line #L310 was not covered by tests
[warning] 312-313: pkg/nvme/initiator.go#L312-L313
Added lines #L312 - L313 were not covered by tests
[warning] 322-322: pkg/nvme/initiator.go#L322
Added line #L322 was not covered by tests
[warning] 329-330: pkg/nvme/initiator.go#L329-L330
Added lines #L329 - L330 were not covered by tests
[warning] 332-332: pkg/nvme/initiator.go#L332
Added line #L332 was not covered by tests
[warning] 335-335: pkg/nvme/initiator.go#L335
Added line #L335 was not covered by tests
[warning] 351-351: pkg/nvme/initiator.go#L351
Added line #L351 was not covered by tests
[warning] 361-363: pkg/nvme/initiator.go#L361-L363
Added lines #L361 - L363 were not covered by tests
[warning] 368-370: pkg/nvme/initiator.go#L368-L370
Added lines #L368 - L370 were not covered by tests
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
pkg/nvme/initiator.go
(15 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pkg/nvme/initiator.go
[warning] 168-168: pkg/nvme/initiator.go#L168
Added line #L168 was not covered by tests
[warning] 255-256: pkg/nvme/initiator.go#L255-L256
Added lines #L255 - L256 were not covered by tests
[warning] 276-277: pkg/nvme/initiator.go#L276-L277
Added lines #L276 - L277 were not covered by tests
[warning] 283-283: pkg/nvme/initiator.go#L283
Added line #L283 was not covered by tests
[warning] 290-290: pkg/nvme/initiator.go#L290
Added line #L290 was not covered by tests
[warning] 297-297: pkg/nvme/initiator.go#L297
Added line #L297 was not covered by tests
[warning] 302-302: pkg/nvme/initiator.go#L302
Added line #L302 was not covered by tests
[warning] 308-308: pkg/nvme/initiator.go#L308
Added line #L308 was not covered by tests
[warning] 310-310: pkg/nvme/initiator.go#L310
Added line #L310 was not covered by tests
[warning] 312-313: pkg/nvme/initiator.go#L312-L313
Added lines #L312 - L313 were not covered by tests
[warning] 322-322: pkg/nvme/initiator.go#L322
Added line #L322 was not covered by tests
[warning] 329-330: pkg/nvme/initiator.go#L329-L330
Added lines #L329 - L330 were not covered by tests
[warning] 332-332: pkg/nvme/initiator.go#L332
Added line #L332 was not covered by tests
[warning] 335-335: pkg/nvme/initiator.go#L335
Added line #L335 was not covered by tests
[warning] 351-351: pkg/nvme/initiator.go#L351
Added line #L351 was not covered by tests
[warning] 361-363: pkg/nvme/initiator.go#L361-L363
Added lines #L361 - L363 were not covered by tests
[warning] 368-370: pkg/nvme/initiator.go#L368-L370
Added lines #L368 - L370 were not covered by tests
[warning] 398-405: pkg/nvme/initiator.go#L398-L405
Added lines #L398 - L405 were not covered by tests
[warning] 417-417: pkg/nvme/initiator.go#L417
Added line #L417 was not covered by tests
[warning] 529-529: pkg/nvme/initiator.go#L529
Added line #L529 was not covered by tests
[warning] 550-550: pkg/nvme/initiator.go#L550
Added line #L550 was not covered by tests
[warning] 552-552: pkg/nvme/initiator.go#L552
Added line #L552 was not covered by tests
[warning] 565-566: pkg/nvme/initiator.go#L565-L566
Added lines #L565 - L566 were not covered by tests
[warning] 687-687: pkg/nvme/initiator.go#L687
Added line #L687 was not covered by tests
[warning] 689-692: pkg/nvme/initiator.go#L689-L692
Added lines #L689 - L692 were not covered by tests
[warning] 695-699: pkg/nvme/initiator.go#L695-L699
Added lines #L695 - L699 were not covered by tests
[warning] 701-705: pkg/nvme/initiator.go#L701-L705
Added lines #L701 - L705 were not covered by tests
🔇 Additional comments (1)
pkg/nvme/initiator.go (1)
254-256
: LGTM: Good input validation.
The added validation for transport parameters helps prevent invalid initiator configurations.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 255-256: pkg/nvme/initiator.go#L255-L256
Added lines #L255 - L256 were not covered by tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
pkg/nvme/initiator.go (2)
30-36
: Consider adding documentation for retry constants.The retry parameters look reasonable, but adding documentation would help explain their purpose and chosen values.
const ( + // maxConnectTargetRetries is the maximum number of attempts to connect to a target maxConnectTargetRetries = 15 + // retryConnectTargetInterval is the duration to wait between connection attempts retryConnectTargetInterval = 1 * time.Second + // maxWaitDeviceRetries is the maximum number of attempts to wait for device creation maxWaitDeviceRetries = 60 + // waitDeviceInterval is the duration to wait between device checks waitDeviceInterval = 1 * time.Second )
697-712
: Enhance error handling in reloadLinearDmDevice.Consider wrapping errors with additional context for better debugging.
err = util.DmsetupReload(i.Name, table, i.executor) if err != nil { - return err + return errors.Wrapf(err, "failed to reload dm device %s with table %s", i.Name, table) } // Reload the device numbers dmDevPath := getDmDevicePath(i.Name) major, minor, err := util.GetDeviceNumbers(dmDevPath, i.executor) if err != nil { - return err + return errors.Wrapf(err, "failed to get device numbers for %s", dmDevPath) }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
pkg/nvme/initiator.go
(22 hunks)
🧰 Additional context used
📓 Learnings (1)
pkg/nvme/initiator.go (1)
Learnt from: derekbit
PR: longhorn/go-spdk-helper#176
File: pkg/nvme/initiator.go:344-354
Timestamp: 2024-12-08T07:56:35.135Z
Learning: Retry parameters such as `maxNumWaitDeviceRetries` and `waitDeviceInterval` are defined in another Go file and are used in `pkg/nvme/initiator.go`. Before suggesting to extract magic numbers, check if they are already defined and used elsewhere in the codebase.
🔇 Additional comments (2)
pkg/nvme/initiator.go (2)
346-355
: Add test coverage for the new waitAndLoadNVMeDeviceInfoWithoutLock method.
This method is critical for handling device info loading with retries, but it lacks test coverage according to the codecov report.
Would you like me to help generate unit tests for this method? The tests should cover:
- Successful device info loading
- Retry behavior with temporary failures
- Permanent failure cases
716-718
: Verify path handling security.
The getDmDevicePath function handles device paths. While it uses filepath.Join which is good, we should verify that the name parameter is properly sanitized to prevent path traversal attacks.
Consider adding input validation for device names:
func validateDeviceName(name string) error {
if strings.Contains(name, "/") || strings.Contains(name, "..") {
return fmt.Errorf("invalid device name %s: must not contain path separators", name)
}
return nil
}
✅ Verification successful
The previous search didn't yield results, so let's try a different approach to understand:
- How getDmDevicePath is used in the codebase
- If there's any validation before calling it
- How device names are handled in the nvme package
This will help us determine if there are any security concerns with the path handling.
From the previous results, we see that getDmDevicePath is only used internally with i.Name. We need to understand how i.Name is validated when the Initiator is created and used. This will help us determine if there are sufficient safeguards against path traversal.
From previous results, we see the Name field is used extensively with dmsetup operations. We need to find:
- How the Name is validated during Initiator creation
- Any validation in dmsetup operations
- Any validation in the underlying util package
This will help us determine if there are sufficient safeguards against path traversal before the name is used in getDmDevicePath.
Device name validation is sufficient for security
The code has adequate safeguards against path traversal attacks:
- The Name field is validated during Initiator creation, rejecting empty values
- The name comes from a controlled source (Initiator struct) and is not directly user-supplied
- The device mapper (dmsetup) commands use the name as a direct argument, not as a path component
- filepath.Join in getDmDevicePath is used correctly to create the final path under /dev/mapper
The current implementation is secure because:
- Device mapper names are handled as identifiers, not paths
- The fixed base path (/dev/mapper) cannot be escaped due to filepath.Join's proper path handling
- All device operations are performed through the system's device mapper interface
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for path validation in the codebase
rg -A 3 "func.*validateDeviceName|func.*sanitizeDeviceName" --type go
# Check for any existing path traversal tests
rg -l "TestPathTraversal|test.*path.*traversal" --type go
Length of output: 131
Script:
#!/bin/bash
# Let's check how getDmDevicePath is used in the codebase
rg -A 5 "getDmDevicePath" --type go
# Check if there's any validation before calling getDmDevicePath
ast-grep --pattern 'getDmDevicePath($_)'
# Look for any device name validation patterns in the code
rg -A 3 "name.*=|name :=" --type go pkg/nvme/
Length of output: 1899
Script:
#!/bin/bash
# Let's check the Initiator struct and how Name is set/validated
ast-grep --pattern 'type Initiator struct {
$$$
}'
# Check for any validation of the Name field
rg -B 5 -A 5 "i\.Name" pkg/nvme/initiator.go
# Look for where the Initiator struct is created/initialized
ast-grep --pattern 'NewInitiator($$$)'
Length of output: 10512
Script:
#!/bin/bash
# Let's find where the Initiator is created and how Name is validated
rg -B 5 -A 10 "func NewInitiator" pkg/nvme/initiator.go
# Check for any name validation in dmsetup operations
rg -A 3 "DmsetupCreate|DmsetupRemove|DmsetupReload" pkg/nvme/
# Look for any name validation in the util package
rg -A 3 "func.*Dmsetup" --type go
Length of output: 2787
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (7)
pkg/nvme/initiator.go (3)
256-258
: Improve error message clarity.The error message could be more specific about which parameter is empty.
- return false, fmt.Errorf("invalid transportAddress %s and transportServiceID %s for starting initiator %s", transportAddress, transportServiceID, i.Name) + return false, fmt.Errorf("empty transportAddress or transportServiceID (got: address=%q, service_id=%q) for starting initiator %s", transportAddress, transportServiceID, i.Name)
357-378
: Enhance error logging in connectTarget.Consider adding more context to error logs to aid debugging.
- i.logger.WithError(err).Warn("Failed to discover target") + i.logger.WithError(err).Warnf("Failed to discover target (attempt %d/%d) for address %s:%s", r+1, maxRetries, transportAddress, transportServiceID) - i.logger.WithError(err).Warn("Failed to connect target") + i.logger.WithError(err).Warnf("Failed to connect target (attempt %d/%d) for address %s:%s", r+1, maxRetries, transportAddress, transportServiceID)
416-418
: Consider unwrapping error before checking type.The error from
DisconnectTarget
is wrapped, which might affect error type checking in the caller.- err = DisconnectTarget(i.SubsystemNQN, i.executor) - if err != nil { - return dmDeviceIsBusy, errors.Wrapf(err, "failed to disconnect target for NVMe-oF initiator %s", i.Name) + if err := DisconnectTarget(i.SubsystemNQN, i.executor); err != nil { + return dmDeviceIsBusy, fmt.Errorf("failed to disconnect target for NVMe-oF initiator %s: %w", i.Name, err)pkg/spdk/client/advanced.go (4)
Line range hint
11-42
: Add input validation for devicePath and clusterSize.Consider validating input parameters to ensure robustness.
func (c *Client) AddDevice(devicePath, name string, clusterSize uint32) (bdevAioName, lvsName, lvsUUID string, err error) { + if devicePath == "" { + return "", "", "", fmt.Errorf("devicePath cannot be empty") + } + if clusterSize == 0 { + return "", "", "", fmt.Errorf("clusterSize must be greater than 0") + } + // Use the file name as aio name and lvs name if name is not specified.
Line range hint
43-55
: Add input validation and improve error handling.Consider validating input parameters and handling errors more gracefully.
func (c *Client) DeleteDevice(bdevAioName, lvsName string) (err error) { + if bdevAioName == "" || lvsName == "" { + return fmt.Errorf("bdevAioName and lvsName cannot be empty") + } + if _, err := c.BdevLvolDeleteLvstore(lvsName, ""); err != nil { + // If LVS doesn't exist, continue with block device deletion + if !jsonrpc.IsJSONRPCRespErrorLvolstoreNotFound(err) { return err + } }
Line range hint
56-82
: Add input validation and improve error handling.Consider validating input parameters and handling cleanup on partial failures.
func (c *Client) StartExposeBdev(nqn, bdevName, nguid, ip, port string) error { + if nqn == "" || bdevName == "" || ip == "" || port == "" { + return fmt.Errorf("nqn, bdevName, ip, and port cannot be empty") + } + nvmfTransportList, err := c.NvmfGetTransports("", "") if err != nil { return err } + + var cleanup func() + defer func() { + if err != nil && cleanup != nil { + cleanup() + } + }() + if nvmfTransportList != nil && len(nvmfTransportList) == 0 { if _, err := c.NvmfCreateTransport(spdktypes.NvmeTransportTypeTCP); err != nil && !jsonrpc.IsJSONRPCRespErrorTransportTypeAlreadyExists(err) { return err } } + if _, err := c.NvmfCreateSubsystem(nqn); err != nil { return err } + cleanup = func() { c.NvmfDeleteSubsystem(nqn, "") }
Line range hint
83-117
: Improve error handling for cleanup operations.Consider continuing cleanup even if some operations fail.
func (c *Client) StopExposeBdev(nqn string) error { + var errs []error + subsystemList, err := c.NvmfGetSubsystems("", "") if err != nil { return err } // ... existing code ... for _, l := range listenerList { if _, err := c.NvmfSubsystemRemoveListener(nqn, l.Address.Traddr, l.Address.Trsvcid, l.Address.Trtype, l.Address.Adrfam); err != nil { - return err + errs = append(errs, fmt.Errorf("failed to remove listener: %w", err)) + continue } } // ... similar changes for namespace removal ... + if len(errs) > 0 { + return fmt.Errorf("cleanup errors: %v", errs) + }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (6)
pkg/nvme/initiator.go
(22 hunks)pkg/nvme/nvme.go
(4 hunks)pkg/nvme/nvmecli.go
(3 hunks)pkg/spdk/client/advanced.go
(4 hunks)pkg/types/types.go
(3 hunks)pkg/util/device.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/util/device.go
- pkg/types/types.go
- pkg/nvme/nvmecli.go
🧰 Additional context used
📓 Learnings (1)
pkg/nvme/initiator.go (2)
Learnt from: derekbit
PR: longhorn/go-spdk-helper#176
File: pkg/nvme/initiator.go:328-339
Timestamp: 2024-12-08T15:56:31.575Z
Learning: In the NVMe-oF initiator code (`pkg/nvme/initiator.go`), additional locks within methods like `Start` are unnecessary because locking is handled in the caller side.
Learnt from: derekbit
PR: longhorn/go-spdk-helper#176
File: pkg/nvme/initiator.go:344-354
Timestamp: 2024-12-08T07:56:35.135Z
Learning: Retry parameters such as `maxNumWaitDeviceRetries` and `waitDeviceInterval` are defined in another Go file and are used in `pkg/nvme/initiator.go`. Before suggesting to extract magic numbers, check if they are already defined and used elsewhere in the codebase.
🔇 Additional comments (8)
pkg/nvme/initiator.go (6)
6-6
: LGTM! Improved constant naming and imports.
The constant renames make their purpose clearer, and the addition of path/filepath
ensures cross-platform path handling.
Also applies to: 30-36
57-60
: LGTM! Improved error message clarity.
The error message now correctly specifies "NVMe-oF initiator" for better context.
Line range hint 248-343
: LGTM! Improved robustness of Start function.
The changes enhance reliability by:
- Validating transport parameters
- Adding structured logging
- Ensuring clean state before starting
- Adding retry mechanism for target connection
346-355
: LGTM! Good encapsulation of retry logic.
The function effectively encapsulates the retry logic for loading NVMe device info.
429-456
: LGTM! Well-implemented getter methods.
The getter methods follow Go conventions and provide clear access to internal state.
716-717
: LGTM! Improved path handling.
Using filepath.Join
ensures correct path handling across different platforms.
pkg/nvme/nvme.go (2)
73-87
: LGTM! Improved error handling and messages.
The changes enhance error handling with more descriptive messages and better subsystem validation.
166-166
: LGTM! Improved error message consistency.
Using types.ErrorMessageCannotFindValidNvmeDevice
ensures consistent error messaging across the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (6)
pkg/nvme/initiator.go
(21 hunks)pkg/nvme/nvme.go
(4 hunks)pkg/nvme/nvmecli.go
(3 hunks)pkg/spdk/client/advanced.go
(4 hunks)pkg/types/types.go
(3 hunks)pkg/util/device.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- pkg/util/device.go
- pkg/nvme/nvme.go
- pkg/nvme/nvmecli.go
- pkg/spdk/client/advanced.go
- pkg/types/types.go
🧰 Additional context used
📓 Learnings (1)
pkg/nvme/initiator.go (2)
Learnt from: derekbit
PR: longhorn/go-spdk-helper#176
File: pkg/nvme/initiator.go:328-339
Timestamp: 2024-12-08T15:56:31.575Z
Learning: In the NVMe-oF initiator code (`pkg/nvme/initiator.go`), additional locks within methods like `Start` are unnecessary because locking is handled in the caller side.
Learnt from: derekbit
PR: longhorn/go-spdk-helper#176
File: pkg/nvme/initiator.go:344-354
Timestamp: 2024-12-08T07:56:35.135Z
Learning: Retry parameters such as `maxNumWaitDeviceRetries` and `waitDeviceInterval` are defined in another Go file and are used in `pkg/nvme/initiator.go`. Before suggesting to extract magic numbers, check if they are already defined and used elsewhere in the codebase.
🔇 Additional comments (5)
pkg/nvme/initiator.go (5)
30-36
: LGTM! Improved constant naming for clarity.
The renamed constants better reflect their purpose and usage in the code.
256-258
: LGTM! Added parameter validation.
Early validation of required parameters improves error handling and prevents downstream issues.
274-283
: LGTM! Enhanced device initialization checks.
The code now properly handles cases where the initiator is already launched, with appropriate checks for parameter matching and endpoint loading.
702-712
: LGTM! Added device numbers reload.
Properly reloading device numbers after device mapper operations ensures consistency and prevents potential device mapping issues.
716-717
: LGTM! Improved path handling.
Using filepath.Join
for path construction is the correct approach for cross-platform compatibility and path manipulation safety.
- Refactor initiator and nvme codes - Improve log messages - Update API docs Longhorn 9919 Signed-off-by: Derek Su <[email protected]>
A linear dm device can be reused if it is somehow not deleted before. The situation is usually caused by "Device or Resource Busy" error. While creating an engine and reusing the existing dm device, it still needs to load the device numbers of the dm device. Longhorn 9919 Signed-off-by: Derek Su <[email protected]>
Longhorn 9919 Signed-off-by: Derek Su <[email protected]>
Longhorn 9919 Signed-off-by: Derek Su <[email protected]>
Which issue(s) this PR fixes:
Issue longhorn/longhorn#9919
Signed-off-by: Derek Su [email protected]
What this PR does / why we need it:
Special notes for your reviewer:
Additional documentation or context