-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Unify cache instance between DLA and cloud native datacache. #52793
Conversation
} | ||
LOG(INFO) << process_name << " start step " << start_step++ << ": staros worker init successfully"; | ||
#endif | ||
|
||
// set up thrift client before providing any service to the external | ||
// because these services may use thrift client, for example, stream | ||
// load will send thrift rpc to FE after http server is started |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Potential improper handling of std::filesystem::remove_all()
failure due to not checking ec
immediately after calling std::filesystem::remove_all()
. This can lead to a failed attempt at renaming if the directory wasn't removed successfully, which might cause unexpected behavior.
You can modify the code like this:
#ifdef USE_STAROS
std::filesystem::path starlet_cache_path(root_path.path + "/starlet_cache");
if (std::filesystem::exists(starlet_cache_path)) {
if (DiskInfo::disk_id(starlet_cache_path.c_str()) != DiskInfo::disk_id(datacache_path.c_str())) {
LOG(ERROR) << "The datacache directory and the old starlet_cache directory cannot be located on different disks. "
<< "Please manually mount the datacache to the same disk as starlet_cache and then restart again";
return Status::InternalError("The datacache directory is different with old starlet_cache directory");
}
std::error_code ec;
std::filesystem::remove_all(datacache_path, ec);
if (ec) {
LOG(ERROR) << "Fail to remove existing datacache directory: " << ec.message();
return Status::InternalError("Fail to remove existing datacache directory");
}
std::filesystem::rename(starlet_cache_path, datacache_path, ec);
if (ec) {
LOG(ERROR) << "Fail to rename old starlet_cache directory to datacache, reason: " << ec.message();
return Status::InternalError("Fail to handle the old starlet_cache data");
}
}
#endif
// to optimize the io performance and reduce disk waste. | ||
// Set the parameter to `0` will turn off this optimization. | ||
CONF_Int64(datacache_inline_item_count_limit, "130172"); | ||
|
||
|
||
// The following configurations will be deprecated, and we use the `datacache` prefix instead. | ||
// But it is temporarily necessary to keep them for a period of time to be compatible with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Changing CONF_mBool(datacache_auto_adjust_enable, "false")
to CONF_mBool(datacache_auto_adjust_enable, "true")
without ensuring system stability.
You can modify the code like this:
- CONF_mBool(datacache_auto_adjust_enable, "true");
+ // Ensure system can handle auto adjustment before enabling it
+ bool isSystemStable = checkSystemStability(); // Pseudo function for illustration
+ CONF_mBool(datacache_auto_adjust_enable, isSystemStable ? "true" : "false");
Enabling automatic cache adjustments without verifying system conditions might risk destabilizing systems that are not prepared for dynamic changes in resource allocation.
cc58977
to
a487637
Compare
a487637
to
db2dad1
Compare
std::filesystem::rename(starlet_cache_path, datacache_path, ec); | ||
} | ||
if (ec) { | ||
LOG(ERROR) << "Fail to rename old starlet_cache directory to datacache, reason: " << ec.message(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print the specific directory for src
and dst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print the specific directory for
src
anddst
done
if (DiskInfo::disk_id(starlet_cache_path.c_str()) != DiskInfo::disk_id(datacache_path.c_str())) { | ||
LOG(ERROR) << "The datacache directory and the old starlet_cache directory cannot be located on different disks. " | ||
<< "Please manually mount the datacache to the same disk as starlet_cache and then restart again"; | ||
return Status::InternalError("The datacache directory is different with old starlet_cache directory"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be do not return error and set datacache_unified_instance
to false
, so that user still can start be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be do not return error and set
datacache_unified_instance
tofalse
, so that user still can start be?
This configuration item is mainly designed to solve the problem of unified instance exceptions, and users configure to start individual instances for fault tolerance. So this situation is usually more appropriate for users to know and make their own choices. Automatically switching to independent mode may cause unexpected problems due to resource usage and other factors.
starlet_cache_percent, -1); | ||
disk_size = std::max(disk_size, starlet_cache_size); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print a log here so that we know how much percent we are actually using
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print a log here so that we know how much percent we are actually using
When initializing blockcahe, the actual disk size will be printed.
19d3c2b
to
ae7944a
Compare
ae7944a
to
520cf7b
Compare
520cf7b
to
b15ee27
Compare
Signed-off-by: Gavin <[email protected]>
Signed-off-by: Gavin <[email protected]>
…tem count. Signed-off-by: GavinMar <[email protected]>
Signed-off-by: GavinMar <[email protected]>
b15ee27
to
4cad7bc
Compare
…timize cache pollution issues. Signed-off-by: Gavin <[email protected]>
4cad7bc
to
4774095
Compare
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 78 / 91 (85.71%) file detail
|
StarRocks#52793) Signed-off-by: Gavin <[email protected]> Signed-off-by: GavinMar <[email protected]>
Why I'm doing:
The current DLA and cloud native datacache are both built on the underlying starcache library, but they are two different instances. This leads to:
What I'm doing:
Fixes #issue
#52940
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: