Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<bug>[sblk]: fix vg being re created #407

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

MatheMatrix
Copy link
Member

@MatheMatrix MatheMatrix commented Dec 3, 2023

fix vg being re created

http://jira.zstack.io/browse/ZSTAC-57025

Resolves/Related: ZSTAC-57025

Change-Id: 180d7b4804474940baf57fb02c047eb3

Summary by CodeRabbit

  • 新功能

    • 应用程序新增搜索功能,用户可通过顶部搜索栏快速查找信息。
  • 错误修复

    • 修复了可能导致数据丢失的问题,现在在非首次创建卷组时会抛出异常以警示用户。
  • 测试

    • 增加了多个测试用例以验证共享块插件的连接、锁定检查以及多队列数据卷的处理。
  • 重构

    • 更新了部分函数签名,以支持新的参数,改进了代码的逻辑流程。
  • 样式

    • 移除了一些不再需要的初始化函数调用,简化了代码初始化过程。
  • 文档

    • 对外部文档进行了更新,以反映新的功能和变更。
  • 杂务

    • 引入了新的调试模块,增强了运行时的跟踪和调试能力。

fix vg being re created

http://jira.zstack.io/browse/ZSTAC-57025

Resolves/Related: ZSTAC-57025

Change-Id: 180d7b4804474940baf57fb02c047eb3
Copy link

coderabbitai bot commented Dec 3, 2023

Walkthrough

本次代码更新主要涉及KVM代理的共享块插件和测试套件。新增了参数is_first_create_vgisFirst,用于处理卷组创建和共享块连接的逻辑。此外,还引入了新的测试用例,优化了初始化流程,并增强了调试功能。这些变更可能会影响相关功能的控制流和逻辑。

Changes

文件路径 变更摘要
kvmagent/.../shared_block_plugin.py 新增参数is_first_create_vg,调整了卷组创建逻辑。
kvmagent/.../shared_block_plugin_teststub.py connect方法新增参数isFirst,改变了控制流逻辑。
kvmagent/.../test_shareblock_connect.py 引入os.path模块,增加了新的命令和断言。
kvmagent/.../sharedblock_utils.py shareblock_connect函数新增参数isFirst,改变了函数签名和逻辑。
kvmagent/.../test_shareblock_data_volume_with_iothreadpin.py 移除了init_kvmagent()函数调用。
kvmagent/.../test_shareblock_data_volume_with_multi_queues.py 移除了init_kvmagent()函数调用,影响初始化序列。
kvmagent/.../test_sharedblock_check_lock.py 新增测试方法和模块导入,初始化存储设备工具。
kvmagent/.../test_virtio_scsi_shareblock_data_volume_with_iothreadpin.py 移除了init_kvmagent()函数调用。
kvmagent/.../pytest_utils.py 引入debug模块,新增install_runtime_tracedumper()调用。
zstacklib/.../list_ops.py 新增list_split函数,用于将列表分割成子列表。
zstacklib/.../lvm.py 新增vgck函数,修改了lvm_vgck函数的控制流和逻辑。

🐰✨
代码跃动,逻辑新,
共享块改,测试添。
小兔编诗,庆更新,
代码世界,又一春。 🌱🎉


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 1813a3b and 88c9e6c.
Files selected for processing (4)
  • kvmagent/kvmagent/plugins/shared_block_plugin.py (3 hunks)
  • kvmagent/kvmagent/test/shareblock_testsuite/shared_block_plugin_teststub.py (1 hunks)
  • kvmagent/kvmagent/test/shareblock_testsuite/test_shareblock_connect.py (2 hunks)
  • kvmagent/kvmagent/test/utils/sharedblock_utils.py (1 hunks)
Additional comments: 5
kvmagent/kvmagent/plugins/shared_block_plugin.py (1)
  • 643-646: 在connect方法中,正确地传递了cmd.isFirst参数给create_vg_if_not_found方法,这与PR目标一致。
kvmagent/kvmagent/test/shareblock_testsuite/shared_block_plugin_teststub.py (1)
  • 19-28: 确保所有调用 connect 方法的代码都已更新,以包含新的 isFirst 参数,并且逻辑正确地处理了这个参数。
kvmagent/kvmagent/test/shareblock_testsuite/test_shareblock_connect.py (2)
  • 1-1: 代码中新增了os.path模块的导入,这可能意味着文件路径操作的增加。

  • 51-69: 测试函数test_sharedblock_connect中添加了新的逻辑和断言,包括并发执行connect方法和对卷组状态的额外检查。特别是在第66行,isFirst参数被用于控制卷组创建的行为,这与PR目标一致,即防止不必要地重新创建卷组。

kvmagent/kvmagent/test/utils/sharedblock_utils.py (1)
  • 23-33: 确保代码库中所有对shareblock_connect函数的调用都已更新,以包含新的isFirst参数。这是为了防止卷组不必要地被重新创建,从而避免数据丢失。

Comment on lines 444 to 453

return diskPaths

def create_vg_if_not_found(self, vgUuid, disks, hostUuid, allDisks, forceWipe=False):
def create_vg_if_not_found(self, vgUuid, disks, hostUuid, allDisks, forceWipe=False, is_first_create_vg=False):
# type: (str, set([CheckDisk]), str, set([CheckDisk]), bool) -> bool
@linux.retry(times=5, sleep_time=random.uniform(0.1, 3))
def find_vg(vgUuid, raise_exception = True):
def find_vg(vgUuid, raise_exception=True):
cmd = shell.ShellCmd("timeout 5 vgscan --ignorelockingfailure; vgs --nolocking -t %s -otags | grep %s" % (vgUuid, INIT_TAG))
cmd(is_exception=False)
if cmd.return_code != 0 and raise_exception:
Copy link

@coderabbitai coderabbitai bot Dec 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [447-463]

create_vg_if_not_found方法中,当is_first_create_vgFalse时,抛出异常以防止数据丢失,这与PR目标一致。但是,在add_disk方法中调用create_vg_if_not_found时,没有传递is_first_create_vg参数,这可能会导致在不应该创建vg时创建vg,从而可能导致数据丢失。建议在add_disk方法中添加对is_first_create_vg参数的处理。

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我在 create_vg_if_not_found 的 is_first_create_vg 参数设置为默认 false,是不是这样就比较安全,就不需要在 add_disk 里加了

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatheMatrix 您的建议将is_first_create_vg参数的默认值设置为False是一个安全的做法,这样可以确保在默认情况下不会无意中重新创建卷组,从而避免数据丢失。如果您的代码逻辑中确实需要在某些情况下创建新的卷组,您可以显式地将该参数设置为True。这样,就不需要在add_disk方法中添加额外的处理了。🐰

add ping check for sblk vg lock

http://jira.zstack.io/browse/SUG-411

Resolves/Related: SUG-411

Change-Id: 3c246404f4944e4db991250bf103b464
@MatheMatrix
Copy link
Member Author

@coderabbitai review

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 11

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 88c9e6c and 8164f6e.
Files selected for processing (9)
  • kvmagent/kvmagent/plugins/shared_block_plugin.py (8 hunks)
  • kvmagent/kvmagent/test/shareblock_testsuite/test_shareblock_data_volume_with_iothreadpin.py (1 hunks)
  • kvmagent/kvmagent/test/shareblock_testsuite/test_shareblock_data_volume_with_multi_queues.py (1 hunks)
  • kvmagent/kvmagent/test/shareblock_testsuite/test_sharedblock_check_lock.py (1 hunks)
  • kvmagent/kvmagent/test/shareblock_testsuite/test_virtio_scsi_shareblock_data_volume_with_iothreadpin.py (1 hunks)
  • kvmagent/kvmagent/test/utils/pytest_utils.py (1 hunks)
  • kvmagent/kvmagent/test/utils/sharedblock_utils.py (2 hunks)
  • zstacklib/zstacklib/utils/list_ops.py (1 hunks)
  • zstacklib/zstacklib/utils/lvm.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • kvmagent/kvmagent/plugins/shared_block_plugin.py
Additional comments: 3
kvmagent/kvmagent/test/shareblock_testsuite/test_shareblock_data_volume_with_iothreadpin.py (1)
  • 11-16:
kvmagent/kvmagent/test/shareblock_testsuite/test_virtio_scsi_shareblock_data_volume_with_iothreadpin.py (1)
  • 11-16: 根据PR目标和生成的摘要,init_kvmagent()函数调用已从test_virtio_scsi_shareblock_data_volume_with_iothreadpin.py文件中移除。这表明初始化KVM代理或相关功能的过程可能已经改变。请确保这一更改不会影响测试的执行和预期结果。
kvmagent/kvmagent/test/utils/pytest_utils.py (1)
  • 3-4: 新增了对zstacklib.utils模块中debug的依赖,并调用了install_runtime_tracedumper()函数。这可能会改变代码的运行时调试和跟踪收集行为。

Comment on lines +33 to +43
def list_split(list, n):
if len(list) == 0:
return list

if n <= 0:
raise Exception("num must be positive")

result = [[] for _ in range(min(n, len(list)))]
for i, v in enumerate(list):
result[i % n].append(v)
return result
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using list as a variable name since it shadows the built-in list type in Python. Consider using a different name like input_list or items. Also, when the input list is empty, the function should return an empty list of lists to maintain consistency with its intended purpose. Lastly, the exception message should refer to the parameter n instead of "num" for clarity.

- def list_split(list, n):
+ def list_split(input_list, n):
-     if len(list) == 0:
+     if len(input_list) == 0:
-         return list
+         return [[]]
-         raise Exception("num must be positive")
+         raise Exception("n must be positive")
-     result = [[] for _ in range(min(n, len(list)))]
+     result = [[] for _ in range(min(n, len(input_list)))]
-     for i, v in enumerate(list):
+     for i, v in enumerate(input_list):

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def list_split(list, n):
if len(list) == 0:
return list
if n <= 0:
raise Exception("num must be positive")
result = [[] for _ in range(min(n, len(list)))]
for i, v in enumerate(list):
result[i % n].append(v)
return result
def list_split(input_list, n):
if len(input_list) == 0:
return [[]]
if n <= 0:
raise Exception("n must be positive")
result = [[] for _ in range(min(n, len(input_list)))]
for i, v in enumerate(input_list):
result[i % n].append(v)
return result

@@ -20,15 +20,16 @@ def sharedblock_ping(vgUuid):
}))

@misc.return_jsonobject()
def shareblock_connect(sharedBlockUuids=None, allSharedBlockUuids=None, vgUuid=None,hostId=None,hostUuid=None, forceWipe=True):
def shareblock_connect(sharedBlockUuids=None, allSharedBlockUuids=None, vgUuid=None,hostId=None,hostUuid=None, forceWipe=True, isFirst=True):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shareblock_connect 函数中添加了一个新参数 isFirst。请确认所有调用此函数的代码都已更新,以包含这个新参数。由于 isFirst 的默认值设置为 True,这可能会影响函数的现有行为,需要验证这一点是否符合预期。

Comment on lines +21 to +25
hostUuid = "8b12f74e6a834c5fa90304b8ea54b1dd"
hostId = 24
vgUuid = "36b02490bb944233b0b01990a450ba83"
vg2Uuid = "ee09b7986bbc4a1f85f439b168c3aee7"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在测试用例中使用硬编码的UUID和主机ID可能不是最佳实践,因为在不同的环境或资源下运行测试时可能会导致问题。建议使用动态生成或配置文件中的值来替换这些硬编码值。

)
self.assertEqual(True, rsp.success, rsp.error)
self.assertEqual(1, len(rsp.failedVgs), rsp.failedVgs.__dict__)
self.assertEqual(rsp.failedVgs.hasattr(vgUuid), True, rsp.failedVgs.__dict__)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasattr函数的使用似乎是不正确的。在这里,它被用来检查rsp.failedVgs是否包含特定的UUID,但hasattr是用来检查对象是否有特定名称的属性。如果failedVgs是一个包含UUID的列表,应该使用vgUuid in rsp.failedVgs来进行检查。

- self.assertEqual(rsp.failedVgs.hasattr(vgUuid), True, rsp.failedVgs.__dict__)
+ self.assertIn(vgUuid, rsp.failedVgs, str(rsp.failedVgs))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
self.assertEqual(rsp.failedVgs.hasattr(vgUuid), True, rsp.failedVgs.__dict__)
self.assertIn(vgUuid, rsp.failedVgs, str(rsp.failedVgs))

vgUuids=[vgUuid, vg2Uuid],
)
self.assertEqual(True, rsp.success, rsp.error)
self.assertEqual(rsp.failedVgs.hasattr(vgUuid), True, str(rsp.failedVgs))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上,这里的hasattr使用也是不正确的。如果failedVgs是一个字典或者有属性的对象,应该使用getattr(rsp.failedVgs, vgUuid, None)来代替hasattr,或者如果failedVgs是一个列表,应该使用vgUuid in rsp.failedVgs

- self.assertEqual(rsp.failedVgs.hasattr(vgUuid), True, str(rsp.failedVgs))
+ self.assertIn(vgUuid, rsp.failedVgs, str(rsp.failedVgs))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
self.assertEqual(rsp.failedVgs.hasattr(vgUuid), True, str(rsp.failedVgs))
self.assertIn(vgUuid, rsp.failedVgs, str(rsp.failedVgs))

@@ -1,10 +1,11 @@
import os
import coverage
from zstacklib.test.utils import env
from zstacklib.utils import debug

Out_flag = True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

全局变量Out_flag用于控制测试环境的退出状态,但这种做法可能不是最佳实践,因为它可能会绕过一些清理和日志记录步骤。

Comment on lines 1 to 11
import os
import coverage
from zstacklib.test.utils import env
from zstacklib.utils import debug

Out_flag = True


debug.install_runtime_tracedumper()
class PytestExtension(object):

cov = None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This review was outside the patches, and no patch overlapping with it was found. Original lines [26-30]

teardown_class中直接使用os._exit(0)os._exit(1)退出进程可能不是最佳实践,因为这会立即退出进程,而不会调用清理处理程序、执行日志记录等。建议使用更标准的测试框架方法来处理测试的退出。


Note: This review was outside the patches, and no patch overlapping with it was found. Original lines [32-32]

函数名ztest_decorater可能是一个拼写错误,正确的应该是ztest_decorator

Comment on lines 1814 to 1825

return True, ""

@bash.in_bash
def vgck(vgUuid, timeout):
return bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (timeout, vgUuid))

def lvm_vgck(vgUuid, timeout):
health, o, e = bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (360 if timeout < 360 else timeout, vgUuid))
health, o, e = vgck(vgUuid, 360 if timeout < 360 else timeout)
check_stuck_vglk()

if health != 0:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lvm_vgck function is using a hardcoded timeout value of 360 seconds for the vgck command. Consider making the timeout value configurable or ensuring that it is suitable for all expected use cases.

- health, o, e = vgck(vgUuid, 360 if timeout < 360 else timeout)
+ # Consider replacing the hardcoded value with a configurable parameter or constant
+ health, o, e = vgck(vgUuid, timeout)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
return True, ""
@bash.in_bash
def vgck(vgUuid, timeout):
return bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (timeout, vgUuid))
def lvm_vgck(vgUuid, timeout):
health, o, e = bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (360 if timeout < 360 else timeout, vgUuid))
health, o, e = vgck(vgUuid, 360 if timeout < 360 else timeout)
check_stuck_vglk()
if health != 0:
return True, ""
@bash.in_bash
def vgck(vgUuid, timeout):
return bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (timeout, vgUuid))
def lvm_vgck(vgUuid, timeout):
# Consider replacing the hardcoded value with a configurable parameter or constant
health, o, e = vgck(vgUuid, timeout)
check_stuck_vglk()
if health != 0:

Comment on lines +1818 to +1819
def vgck(vgUuid, timeout):
return bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (timeout, vgUuid))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vgck function is not handling the case where the timeout command may not be available on the system. Ensure that the system has the timeout utility installed or handle the absence gracefully.

- return bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (timeout, vgUuid))
+ # Ensure that the timeout command is available or handle its absence
+ if not shell.command_exists('timeout'):
+     raise Exception('The "timeout" command is not available on this system.')
+ return bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (timeout, vgUuid))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def vgck(vgUuid, timeout):
return bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (timeout, vgUuid))
def vgck(vgUuid, timeout):
# 确保系统已安装 timeout 命令或优雅地处理其缺失
if not shell.command_exists('timeout'):
raise Exception('The "timeout" command is not available on this system.')
return bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (timeout, vgUuid))

Comment on lines 1821 to 1825
def lvm_vgck(vgUuid, timeout):
health, o, e = bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (360 if timeout < 360 else timeout, vgUuid))
health, o, e = vgck(vgUuid, 360 if timeout < 360 else timeout)
check_stuck_vglk()

if health != 0:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lvm_vgck function is calling check_stuck_vglk without capturing its return value or any exceptions it might raise. Ensure that the result of check_stuck_vglk is handled appropriately.

- check_stuck_vglk()
+ # Handle the result of check_stuck_vglk
+ try:
+     check_stuck_vglk()
+ except Exception as e:
+     logger.error('An error occurred while checking for stuck vglk: %s' % str(e))
+     return False, 'An error occurred while checking for stuck vglk'

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def lvm_vgck(vgUuid, timeout):
health, o, e = bash.bash_roe('timeout -s SIGKILL %s vgck %s 2>&1' % (360 if timeout < 360 else timeout, vgUuid))
health, o, e = vgck(vgUuid, 360 if timeout < 360 else timeout)
check_stuck_vglk()
if health != 0:
def lvm_vgck(vgUuid, timeout):
health, o, e = vgck(vgUuid, 360 if timeout < 360 else timeout)
# Handle the result of check_stuck_vglk
try:
check_stuck_vglk()
except Exception as e:
logger.error('An error occurred while checking for stuck vglk: %s' % str(e))
return False, 'An error occurred while checking for stuck vglk'
if health != 0:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant