-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(auto-salvage): v2 support #222
Conversation
4acf3c9
to
0d205b4
Compare
This pull request is now in conflict. Could you fix it @c3y1huang? 🙏 |
b411222
to
80dd23b
Compare
c653fcf
to
743f3f2
Compare
743f3f2
to
1ad077c
Compare
This pull request is now in conflict. Could you fix it @c3y1huang? 🙏 |
1ad077c
to
ff408d4
Compare
WalkthroughThe pull request introduces several modifications to the Changes
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
ff408d4
to
e9dafb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (4)
pkg/spdk/replica.go (2)
Line range hint
1803-1804
: ValidatesnapshotName
parameter to prevent invalid or malicious inputEnsure that the
snapshotName
parameter is validated to prevent invalid or harmful snapshot names. Invalid names could cause issues during logical volume operations. Consider adding validation to enforce allowed characters and length constraints.
Line range hint
1965-1967
: Verify backup URL manipulation when inspecting the last restored backupIn the
canDoIncrementalRestore
function, replacingrequestedBackupName
withrestore.LastRestored
inbackupURL
assumes thatrequestedBackupName
appears only once and exclusively represents the backup name. IfrequestedBackupName
occurs multiple times or is a substring elsewhere in the URL, this could lead to incorrect backup URLs. Consider parsing the URL and explicitly replacing only the backup name component to ensure accuracy.pkg/spdk/engine.go (2)
Line range hint
758-766
: Provide additional context when marking replicas as errorIn
checkAndUpdateInfoFromReplicaNoLock
, ifReplicaGet
returns an error, the replica mode is set toERR
. To aid in debugging, consider enhancing the log message with additional context about the error. This will help in identifying the root cause during troubleshooting.Apply this diff to improve the logging:
if err != nil { - e.log.WithError(err).Warnf("Failed to get replica %s with address %s, mark the mode from %v to ERR", replicaName, address, e.ReplicaModeMap[replicaName]) + e.log.WithError(err).Warnf("Failed to get replica %s with address %s due to %v; marking the mode from %v to ERR", replicaName, address, err, e.ReplicaModeMap[replicaName]) e.ReplicaModeMap[replicaName] = types.ModeERR return }
Line range hint
1329-1335
: Ensure proper error handling when ancestor snapshots are not foundIn
getRebuildingSnapshotList
, ifancestorSnapshotName
orlatestSnapshotName
is empty, an error is returned. Verify that this condition is correctly handled and consider providing a more descriptive error message to clarify why the ancestor or latest snapshot could not be found.Apply this diff to enhance the error message:
if ancestorSnapshotName == "" || latestSnapshotName == "" { - return []*api.Lvol{}, fmt.Errorf("cannot find the ancestor snapshot %s or latest snapshot %s from RW replica %s snapshot map during engine replica add", ancestorSnapshotName, latestSnapshotName, srcReplicaName) + return []*api.Lvol{}, fmt.Errorf("cannot find the ancestor snapshot or latest snapshot from RW replica %s snapshot map during engine replica add; ancestorSnapshotName: '%s', latestSnapshotName: '%s'", srcReplicaName, ancestorSnapshotName, latestSnapshotName) }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
⛔ Files ignored due to path filters (8)
go.mod
is excluded by!go.mod
go.sum
is excluded by!**/*.sum
,!go.sum
vendor/github.com/longhorn/go-common-libs/utils/misc.go
is excluded by!vendor/**
vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go
is excluded by!**/*.pb.go
,!**/generated/**
,!vendor/**
vendor/golang.org/x/exp/LICENSE
is excluded by!vendor/**
vendor/golang.org/x/exp/PATENTS
is excluded by!vendor/**
vendor/golang.org/x/exp/constraints/constraints.go
is excluded by!vendor/**
vendor/modules.txt
is excluded by!vendor/**
📒 Files selected for processing (5)
- pkg/client/client.go (6 hunks)
- pkg/spdk/engine.go (14 hunks)
- pkg/spdk/replica.go (2 hunks)
- pkg/spdk/server.go (3 hunks)
- pkg/spdk_test.go (21 hunks)
🔇 Additional comments (11)
pkg/client/client.go (3)
Line range hint
95-106
: LGTM: ReplicaGet parameter addition is well-implemented.The addition of the
runtimeRequested
parameter is clean and maintains consistent error handling patterns.
Line range hint
518-529
: LGTM: EngineGet parameter addition is consistent.The addition of the
runtimeRequested
parameter follows the same pattern as ReplicaGet, maintaining code consistency.
Line range hint
475-494
: LGTM: EngineCreate changes support auto-salvage functionality.The addition of the
salvageRequested
parameter aligns with the PR objectives for v2 auto-salvage support.Let's verify the integration with v2 volumes:
✅ Verification successful
Let me gather more information about the salvage functionality integration by searching for related code patterns.
Based on the search results, I can now generate the final response as we have sufficient evidence about the salvage functionality integration.
LGTM: Salvage functionality is properly integrated across the codebase
The
salvageRequested
parameter is correctly integrated with comprehensive salvage handling logic:
- Propagates through client → SPDK server → engine layers
- Implements robust salvage candidate filtering based on volume head sizes
- Includes proper logging and error handling for salvage operations
The implementation aligns with the PR objectives and shows proper integration with both v1 and v2 volumes, as evidenced by the compatibility checks in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify v2 volume handling in the codebase # Test: Search for v2 volume related code to ensure proper integration rg -g '!{vendor,.*}' -A 5 'v2.*volume|volume.*v2'Length of output: 561
Script:
#!/bin/bash # Search for salvage related code patterns rg -g '!{vendor,.*}' -A 5 'salvage|SalvageRequested' # Also check for engine creation patterns to understand the integration ast-grep --pattern 'func $_ EngineCreate($$$) { $$$ }'Length of output: 6101
pkg/spdk_test.go (3)
271-271
: LGTM: Engine creation calls consistently updatedThe engine creation calls have been consistently updated across all test cases to include the new parameters, with appropriate test coverage for different scenarios including normal operation, rebuilding, and upgrade paths.
Also applies to: 354-354, 530-530, 1228-1228, 1243-1243, 1354-1354, 1428-1428, 1432-1432
327-327
: LGTM: Get operations consistently updatedThe engine and replica get operations have been consistently updated to include the new
runtimeRequested
parameter.Also applies to: 335-335, 383-383, 413-413, 862-862, 889-889, 907-907, 956-956, 987-987, 1005-1005, 1061-1061, 1177-1177, 1181-1181, 1215-1215, 1290-1290, 1370-1370
Line range hint
1-1450
: LGTM: Test structure remains robustThe test infrastructure maintains good practices with proper setup, teardown, and resource cleanup. The changes are well-integrated into the existing test structure.
pkg/spdk/server.go (2)
918-918
:EngineCreate
method updated correctly withreq.SalvageRequested
The
EngineCreate
method now passesreq.SalvageRequested
toe.Create
, aligning with the updatedCreate
method signature.
1053-1054
:EngineGet
method includes runtime information retrievalThe
EngineGet
method now callse.checkAndUpdateInfoFromReplicaNoLock(req.RuntimeRequested)
, ensuring that runtime information is updated as requested.pkg/spdk/replica.go (1)
615-618
: Verify state transition from Stopped to PendingThe condition modifies the replica state from
InstanceStateStopped
toInstanceStatePending
whenheadSvcLvol.UUID
is empty. Ensure that this state transition aligns with the intended state machine and does not introduce unintended side effects.pkg/spdk/engine.go (2)
758-758
: Review the usage ofruntimeRequested
parameter inReplicaGet
The method
replicaServiceCli.ReplicaGet
is called with theruntimeRequested
parameter. Ensure that the value ofruntimeRequested
accurately reflects the desired behavior in this context. If runtime information is needed, consider passingtrue
; otherwise, confirm thatfalse
is appropriate.
105-112
: Verify all usages of theCreate
method include the newsalvageRequested
parameterThe
Create
method signature now includes the new boolean parametersalvageRequested
. Please ensure that all invocations of this method throughout the codebase have been updated to include this parameter to prevent compilation errors or unintended behavior.✅ Verification successful
Let me try a different approach to find the usages of the
Create
method.
Let me try a more specific search to find the actual calls to the Engine's Create method.
Based on the search results, I can now provide a final response:
All usages of
EngineCreate
have been verified to include thesalvageRequested
parameterThe codebase search shows that all calls to
EngineCreate
in the test files (pkg/spdk_test.go
) and the client implementation (pkg/client/client.go
) have been properly updated to include thesalvageRequested
parameter. Every call includes all 10 parameters, with thesalvageRequested
parameter consistently set tofalse
in the test cases.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find all calls to the `Create` method that may be missing the `salvageRequested` parameter. # Expected result: No calls to `Create` with only 7 arguments. # Command: ast-grep --lang go $'Create($_, $_, $_, $_, $_, $_, $_)'Length of output: 163
Script:
#!/bin/bash # Search for calls to Create method with context rg "Create\(" -A 3 --type goLength of output: 65214
Script:
#!/bin/bash # Search for calls to EngineCreate with context rg "EngineCreate\(" -A 5 --type goLength of output: 10056
e9dafb6
to
24912d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
pkg/spdk/server.go (2)
409-410
: Consider optimizing memory allocation.The
rRuntime
replica instance is allocated unconditionally but only used if a matching lvol is found. Consider moving the allocation inside theif req.RuntimeRequested
block after finding a matching lvol.- rRuntime := &Replica{} spdkClient := s.spdkClient if req.RuntimeRequested { + var rRuntime *Replica // ... lvol filtering logic ... for lvolName, bdevLvol := range bdevLvolMap { if lvolName != req.Name { continue } + rRuntime = NewReplica(...) // ... rest of the code ... } r = rRuntime }Also applies to: 456-457
1053-1054
: Add documentation for runtime behavior.The
RuntimeRequested
parameter affects the behavior ofcheckAndUpdateInfoFromReplicaNoLock
, but this behavior is not documented. Consider adding a comment explaining when runtime information should be requested and its implications.+ // If RuntimeRequested is true, the engine's state will be updated with current + // runtime information from its replicas. This may have performance implications + // as it requires additional API calls to gather the runtime state. e.checkAndUpdateInfoFromReplicaNoLock(req.RuntimeRequested)pkg/spdk/engine.go (1)
724-724
: Add documentation for runtimeRequested parameterThe
runtimeRequested
parameter has been added to multiple methods but lacks documentation explaining:
- Its purpose and when it should be true/false
- The impact on behavior when enabled/disabled
- Any performance implications
Consider adding documentation like this:
+// checkAndUpdateInfoFromReplicaNoLock updates the engine's info from its replicas. +// If runtimeRequested is true, it will fetch additional runtime information which +// may impact performance. This should only be set to true when the additional +// information is required for operations like salvage. func (e *Engine) checkAndUpdateInfoFromReplicaNoLock(runtimeRequested bool) {Also applies to: 729-729, 758-758, 1089-1089, 1300-1300
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (5)
- pkg/client/client.go (6 hunks)
- pkg/spdk/engine.go (14 hunks)
- pkg/spdk/replica.go (2 hunks)
- pkg/spdk/server.go (3 hunks)
- pkg/spdk_test.go (21 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/client/client.go
🔇 Additional comments (10)
pkg/spdk/server.go (1)
918-918
: LGTM: Added salvage support to engine creation.The addition of
SalvageRequested
parameter aligns with the PR objectives for implementing v2 auto-salvage support.pkg/spdk_test.go (3)
327-335
: Consistent usage of runtimeRequested parameterAll
ReplicaGet
andEngineGet
calls have been updated with the newruntimeRequested
parameter set tofalse
. This is consistent with the existing test coverage pattern.A previous review already suggested adding test coverage for
runtimeRequested=true
scenarios.Also applies to: 383-413, 862-907, 956-1005, 1061-1181, 1215-1290
Line range hint
1354-1370
: LGTM: Proper separation of initiator and target testingThe test case properly verifies the behavior when the engine is created with different IP addresses for initiator and target:
- Verifies empty initiator fields when initiator is not created
- Verifies target fields are properly set
- Verifies cleanup behavior
1428-1432
: LGTM: Comprehensive upgrade testingThe test case properly verifies the engine creation with upgrade requirements:
- Tests target address format
- Verifies IP and port assignments
- Confirms endpoint creation
pkg/spdk/replica.go (4)
615-618
: LGTM: State transition logic for auto-salvage.The added condition correctly handles state transition from
Stopped
toPending
when the head lvol UUID is empty, which is necessary for re-initializing replicas during auto-salvage.
645-647
: LGTM: Improved error handling for device checks.The enhanced error handling correctly distinguishes between expected "no such device" errors and other errors, which is crucial for reliable auto-salvage operations.
649-677
: LGTM: Enhanced lvol bdev handling for auto-salvage.The added logic properly handles existing lvol bdevs by:
- Checking for existence before creation
- Constructing the replica object from existing state if found
- Maintaining proper error handling throughout
This change is crucial for supporting auto-salvage v2 by preventing unnecessary recreation of existing lvol bdevs.
679-683
: LGTM: Added validation for lvol bdev existence.The added validation ensures that the lvol bdev exists after creation or discovery, preventing potential issues with missing lvol bdevs during auto-salvage operations.
pkg/spdk/engine.go (2)
105-112
: LGTM: Parameter addition aligns with auto-salvage featureThe addition of the
salvageRequested
boolean parameter and its logging is well-integrated into the Create method signature.
194-207
: LGTM: Proper error handling for salvage operationsThe implementation correctly:
- Logs the salvage request
- Updates replica modes through
updateReplicaModeToFilterSalvageCandidates
- Skips replicas in ERR mode during engine creation
1961459
to
78975ca
Compare
This pull request is now in conflict. Could you fix it @c3y1huang? 🙏 |
pkg/spdk/engine.go
Outdated
// It iterates through all replicas and: | ||
// - Skips the replicas with mode ERR. | ||
// - Gets the volume head size of each replica. | ||
// - Selects replicas with the largest volume head size as salvage candidates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@c3y1huang We can merge it after resolving the conflict. Thank you. |
longhorn/longhorn-8430 Signed-off-by: Chin-Ya Huang <[email protected]>
longhorn/longhorn-8430 Signed-off-by: Chin-Ya Huang <[email protected]>
The logic in this PR needs to be updated to incorporate the changes introduced in #208. |
968294a
to
f8f592f
Compare
@coderabbitai review |
✅ Actions performedReview triggered.
|
longhorn/longhorn-8430 Signed-off-by: Chin-Ya Huang <[email protected]>
longhorn/longhorn-8430 Signed-off-by: Chin-Ya Huang <[email protected]>
longhorn/longhorn-8430 Signed-off-by: Chin-Ya Huang <[email protected]>
longhorn/longhorn-9244 Signed-off-by: Chin-Ya Huang <[email protected]>
f8f592f
to
bb18df9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
pkg/spdk_test.go (3)
530-530
: Consider testing salvage interaction with snapshots.While this test thoroughly covers snapshot operations and rebuilding, it would be valuable to add test cases that verify how salvage operations interact with snapshot chains and rebuilding processes.
1354-1354
: Add salvage testing for target-only configuration.Consider adding test cases that verify salvage functionality when operating in target-only mode, as this represents a distinct operational scenario that should be validated.
Line range hint
271-1432
: Enhance overall salvage feature test coverage.While the changes consistently implement the new salvage parameters across all test cases, the test suite would benefit from a dedicated test function that comprehensively verifies the auto-salvage feature, including:
- Success and failure scenarios
- Edge cases and error conditions
- Integration with existing functionality (snapshots, rebuilding, upgrades)
- Performance impact of salvage operations
pkg/spdk/engine.go (1)
286-358
: Consider performance optimizations in filterSalvageCandidatesThe implementation is thorough and well-documented. However, there are opportunities for performance optimization:
- The map iteration at line 296-298 could be avoided by operating directly on the input map
- The sorting of volume head sizes could be replaced with a single pass to find the maximum
Consider this optimization:
func (e *Engine) filterSalvageCandidates(replicaAddressMap map[string]string) (map[string]string, error) { - // Initialize filteredCandidates to hold a copy of replicaAddressMap. - filteredCandidates := map[string]string{} - for key, value := range replicaAddressMap { - filteredCandidates[key] = value - } + // Work directly on the input map + filteredCandidates := replicaAddressMap volumeHeadSizeToReplicaNames := map[uint64][]string{} + var largestVolumeHeadSize uint64 // Collect volume head size for each replica. for replicaName, replicaAddress := range replicaAddressMap { func() { // ... existing connection and replica retrieval code ... // Track maximum size while collecting + if replica.Head.ActualSize > largestVolumeHeadSize { + largestVolumeHeadSize = replica.Head.ActualSize + } volumeHeadSizeToReplicaNames[replica.Head.ActualSize] = append( volumeHeadSizeToReplicaNames[replica.Head.ActualSize], replicaName ) }() } - // Sort the volume head sizes to find the largest. - volumeHeadSizeSorted, err := commonutils.SortKeys(volumeHeadSizeToReplicaNames) - if err != nil { - return nil, errors.Wrap(err, "failed to sort keys of salvage candidate by volume head size") - } - - if len(volumeHeadSizeSorted) == 0 { + if len(volumeHeadSizeToReplicaNames) == 0 { return nil, errors.New("failed to find any salvage candidate with volume head size") } - // Determine salvage candidates with the largest volume head size. - largestVolumeHeadSize := volumeHeadSizeSorted[len(volumeHeadSizeSorted)-1] // ... rest of the function
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (5)
pkg/client/client.go
(2 hunks)pkg/spdk/engine.go
(4 hunks)pkg/spdk/replica.go
(3 hunks)pkg/spdk/server.go
(1 hunks)pkg/spdk_test.go
(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/client/client.go
- pkg/spdk/replica.go
- pkg/spdk/server.go
🔇 Additional comments (4)
pkg/spdk_test.go (1)
271-271
: 🛠️ Refactor suggestion
Add test coverage for salvage functionality.
The EngineCreate
call includes new parameters for salvage support, but the test only covers the salvageRequested=false
case. Consider adding test scenarios that verify the behavior when salvageRequested=true
.
pkg/spdk/engine.go (3)
107-114
: LGTM: Added salvage support parameter
The addition of the salvageRequested
parameter to the Create method aligns with the auto-salvage feature requirements. The parameter is properly logged for debugging purposes.
196-203
: LGTM: Well-structured salvage request handling
The salvage request handling is implemented with proper logging and error handling. The code follows a clear flow by delegating the filtering logic to a dedicated method.
1900-1905
: LGTM: Code formatting improvement
The indentation changes improve code readability without affecting functionality.
Which issue(s) this PR fixes:
Issue longhorn/longhorn#8430
What this PR does / why we need it:
Skip creating lvol bdev for a new replica instance if it exists.
Special notes for your reviewer:
None
Additional documentation or context
None