During restoration, the storaged or metad service fails to start with the following error: [db/db_impl/db_impl_open.cc:2112] DB::Open() failed: Corruption: Corruption: IO error: No such file or directory: While open a file for random read: /xxx/000024.ldb: No such file or directory in file /xxx/MANIFEST-000020 #5976

Open
cccxgit opened this issue Nov 17, 2024 · 0 comments

cccxgit commented Nov 17, 2024

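  • nebula version: 3.6
  • Deployment: distributed, on k8s
  • Installation: compiled from source
  • In production: Y
  • Hardware
    • Mechanical (HDD) disks
    • 6U15G
  • Problem description
    I built a k8s-based distributed nebula cluster in production, with 15 graph spaces created and 500 million vertex and edge records imported. With the services running normally, I ran create snapshot to back up the data. When restoring the metad and storaged services from that backup, the storaged or metad service occasionally fails to start, with the following error: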
2024/11/11-21:38:38.656812 140578633020992 [WARN] [db/db_impl/db_impl_open.cc:2112] DB::Open() failed: Corruption: Corruption: IO error: No such file or directory: While open a file for random read: /xxx/000024.ldb: No such file or directory in file /xxx/MANIFEST-000020
2024/11/11-21:38:38.656835 140578633020992 [db/db_impl/db_impl.cc:477] Shutdown: canceling all background work
2024/11/11-21:38:38.656893 140578633020992 [db/db_impl/db_impl.cc:677] Shutdown complete
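  • Other information:
    (1) Across repeated restore tests, storaged fails to start more often than metad.
    (2) This cluster does not use bragent for backup and restore; it uses an in-house solution. The restore procedure is: 1. Download the compressed snapshot backup archives from the remote storage machine into the storaged and metad containers. 2. Stop the storaged and metad services with nebula.service stop, then extract the snapshot files into the data directory configured for storaged or metad (storaged has multiple graph spaces and therefore multiple snapshot archives, which are extracted in parallel by multiple threads). 3. Once extraction finishes, start the services with nebula.service start (services are started per node: when both storaged and metad on a node have finished extracting, they are started together).
    (3) Machine performance differs between nodes, so the services start at different times, with a gap of possibly 10 minutes.
    (4) The snapshot files downloaded from the remote storage machine have been verified as intact (md5 check), and no extraction failed.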
  • Related issues found:
    (1) The rocksdb GitHub community has several issues describing similar problems, all still open:
https://github.com/facebook/rocksdb/issues/10258
https://github.com/facebook/rocksdb/issues/10357

(2) At the bottom of the facebook/rocksdb#10357 thread there appears to be a solution. Please help analyze it. Thanks.
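
For reference, the restore flow in item (2) of "Other information" can be sketched roughly as below. This is an illustration only, not the in-house tool itself: the tar archive format, the function names (`md5_of`, `verify_extracted`, `extract_and_verify`, `restore_all`) and the worker count are all assumptions. The sketch checks the md5 of each archive, extracts the archives in parallel threads, and then confirms that every file recorded in an archive exists on disk with the recorded size. A check like this, run before nebula.service start, is one way to rule out a file such as 000024.ldb being absent after extraction.

```python
import hashlib
import tarfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_extracted(archive: Path, dest: Path) -> list[str]:
    """List archive members that are missing or wrong-sized under dest."""
    problems = []
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            target = dest / member.name
            if not target.is_file() or target.stat().st_size != member.size:
                problems.append(member.name)
    return problems


def extract_and_verify(archive: Path, dest: Path) -> list[str]:
    """Extract one snapshot archive (assumed to be tar), then verify it."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    return verify_extracted(archive, dest)


def restore_all(jobs: dict[Path, Path],
                expected_md5: dict[Path, str],
                workers: int = 4) -> bool:
    """Check md5 of every archive, extract in parallel, verify the results.

    jobs maps archive path -> data directory (hypothetical layout).
    Returns True only if every archive extracted completely.
    """
    for archive in jobs:
        if md5_of(archive) != expected_md5[archive]:
            raise ValueError(f"checksum mismatch: {archive}")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda job: extract_and_verify(*job),
                                jobs.items()))
    # The service should only be started when no archive reports problems.
    return all(not problems for problems in results)
```

Leaving the `ThreadPoolExecutor` block waits for every extraction to finish, so the verification result covers all graph spaces on the node before any service is started.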
