During restoration, the storaged or metad service fails to start. The following error message is displayed: [db/db_impl/db_impl_open.cc:2112] DB::Open() failed: Corruption: Corruption: IO error: No such file or directory: While open a file for random read: /xxx/000024.ldb: No such file or directory in file /xxx/MANIFEST-000020
#5976
Open
cccxgit opened this issue
Nov 17, 2024
· 0 comments
2024/11/11-21:38:38.656812 140578633020992 [WARN] [db/db_impl/db_impl_open.cc:2112] DB::Open() failed: Corruption: Corruption: IO error: No such file or directory: While open a file for random read: /xxx/000024.ldb: No such file or directory in file /xxx/MANIFEST-000020
2024/11/11-21:38:38.656835 140578633020992 [db/db_impl/db_impl.cc:477] Shutdown: canceling all background work
2024/11/11-21:38:38.656893 140578633020992 [db/db_impl/db_impl.cc:677] Shutdown complete
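The log says the MANIFEST references a table file that does not exist in the data directory. To triage many failed restores, the two paths can be pulled out of the log line. This is a minimal sketch that assumes the message keeps the exact wording shown above; `parse_missing_file` is a hypothetical helper name, not part of Nebula or RocksDB.

```python
import re

# Matches the RocksDB open failure quoted above: a file listed in the
# MANIFEST could not be opened because it is missing on disk.
# Assumption: the message wording is exactly as in the log above.
PATTERN = re.compile(
    r"While open a file for random read: (?P<missing>\S+): "
    r"No such file or directory in file (?P<manifest>\S+)"
)


def parse_missing_file(log_line):
    """Return (missing_file, manifest_file), or None if the line does not match."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    return match.group("missing"), match.group("manifest")
```

Running it over the storaged or metad log after a failed start shows which file each MANIFEST expected, which can then be compared against the contents of the snapshot archive.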
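In our production environment we run a nebula-based distributed cluster on k8s, with 15 graph spaces created and 500 million vertices and edges imported. While the services are running normally, we execute
create snapshot
to back up the data. When restoring the metad and storaged services from this backup, storaged or metad occasionally fails to start, with the error message shown in the log above.
(1) Across repeated restore attempts, storaged fails to start more often than metad.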
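(2) This cluster does not use bragent for backup and restore; we built our own solution. The restore procedure is:
1. Download the compressed snapshot backup files from the remote storage machine into the storaged and metad containers.
2. Stop the storaged and metad services with nebula.service stop, then extract the snapshot files into the data directory configured for storaged or metad. (storaged has multiple graph spaces and therefore multiple snapshot archives, which are extracted in parallel by multiple threads.)
3. Once extraction has finished, start the services with nebula.service start. (Services are started per node: when both storaged and metad on a node have finished extracting, they are started together.)
(3) Hardware performance differs between nodes, so the services start at different times, with a gap of possibly 10 minutes.
(4) We have verified that the snapshot files downloaded from the remote storage machine are intact (md5 checksum verification), and none of the extractions failed.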
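The md5 check in (4) covers the downloaded archive, not the files on disk after the parallel extraction. One extra check that could run between steps 2 and 3 is to confirm that every file in each archive is present in the data directory with the expected size. The following is a minimal sketch, assuming the snapshot archives are tar files (the compression format is not stated above) and that they were extracted with their member paths unchanged; `find_missing_members` is a hypothetical helper name.

```python
import os
import tarfile


def find_missing_members(archive_path, dest_dir):
    """Return names of regular files in the archive that are absent from
    dest_dir or whose size on disk differs from the size in the archive."""
    problems = []
    # Assumption: the archive is a tar file that tarfile can open
    # (plain, gzip, bz2 or xz are detected automatically).
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other special entries
            target = os.path.join(dest_dir, member.name)
            if not os.path.isfile(target) or os.path.getsize(target) != member.size:
                problems.append(member.name)
    return problems
```

An empty list for every archive means the extracted files match the archives. A non-empty list before nebula.service start would point at the extraction step rather than at RocksDB itself. If the lists are empty and the error still occurs, the missing file was probably never in the snapshot.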
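(1) The rocksdb GitHub community has several issues describing similar problems, all of them still open.
(2) Among them, the bottom of the facebook/rocksdb#10357 thread seems to contain a solution. Please help analyze this. Thank you.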