Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the UUID generator. #4082

Closed
wants to merge 6 commits into from

Conversation

teo-tsirpanis
Copy link
Member

SC-28607

This PR:

  • "Unstatuses" uuid::generate_uuid, making it return a string and simplifying its callsites.
  • Unifies the UUID generator across platforms. Now we generate the UUID using the system's cryptographic random number generator and format it ourselves.
  • Refactors the UUID generator code, separating its generation from its formatting to a string.
  • Removes the mutex around the UUID generator. RAND_Bytes is thread-safe since OpenSSL 1.1.

TODO

  • I received offline the feedback to use fmt instead of snprintf.
  • I also intended to move the cryptographic RNG to a separate module, but that turned out to be more complicated than I thought. We can use Crypto::get_random_bytes after Improve cryptography tests. #3979.

TYPE: IMPROVEMENT
DESC: Refactor the UUID generator and use a single implementation across platforms.

@shortcut-integration
Copy link

This pull request has been linked to Shortcut Story #28607: Use a common UUID generation code across platforms..

Copy link
Contributor

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to simplify the UUID generation by just formatting random bytes. There's no need to move randomness through a UUID struct for legacy reasons that don't apply to us.

If we want to be pedantic and call these UUIDv4, we can set the 6 bits or whatever though technically this makes things more likely to collide (by an imperceptibly minuscule amount) so no one does to my knowledge. Not setting those bits we can just call them UUIDs without a specific version.

namespace tiledb {
namespace sm {
namespace uuid {
struct Uuid {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole UUID struct seems fairly overboard for generating a UUID. I'd simplify it to just two functions for generating bytes and formatting them as hex. Since we've put this in the uuid namespace, calling code would just need to use auto uuid = uuid::generate(); or similar. Might be you'll want to move the rand_bytes function to some other place but I don't think we have any custom rand modules yet so unless you know of one I wouldn't bother creating it.

Suggested change
struct Uuid {
#ifdef _WIN32
void rand_bytes(std::string& bytes, size_t num_bytes) {
bytes.resize(num_bytes);
int rc = BCryptGenRandom(
BCRYPT_RNG_ALG_HANDLE,
reinterpret_cast<uint8_t*>(bytes.data()),
num_bytes,
0);
if (!NT_SUCCESS(rc)) {
throw std::runtime_error("Cannot generate GUID: BCryptGenRandom failed.");
}
}
#else
void rand_bytes(std::string& bytes, size_t num_bytes) {
bytes.resize(num_bytes);
int rc = RAND_bytes(reinterpret_cast<uint8_t*>(bytes.data()), num_bytes);
if (rc < 1) {
char err_msg[256];
ERR_error_string_n(ERR_get_error(), err_msg, sizeof(err_msg));
throw std::runtime_error(
"Cannot generate random bytes with OpenSSL: " + std::string(err_msg));
}
}
#endif
std::string generate() {
std::string bytes(16);
rand_bytes(bytes, 16);
std::string uuid(32);
for(size_t i = 0; i < bytes.size(); i++) {
snprintf(uuid.data() + (i * 2), 2, "%02x", (uint8_t) bytes[i]);
}
return uuid;
}

Copy link
Contributor

@davisp davisp May 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I removed the hyphenation parameter completely. We don't use it anywhere except tests so why bother keeping it around? If we need it in the future its not like it wouldn't be trivial to add. Also, I've never seen hyphenated versions outside of very legacy uses as those hyphens are dead weight in terms of storage/transmission costs and since random UUIDs aren't parsed there's not reason to pretend there's structure to the value.

@@ -38,11 +38,9 @@ namespace tiledb::storage_format {

std::string generate_uri(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is purely a comment that you can feel free to ignore, but given that we're already planning on rerouting all the places that generate URIs through this function it might be beneficial to do that as part of this PR since we're already modifying the places that would be calling into it. From my spot checking you'll need to add a version that doesn't append the version on but that's pretty trivial.

I didn't look hard enough, but the other thing is a version that takes a std::pair<uint64_t, uint64_t> and proxies through to the existing one might be helpful but might be not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've been asked before to unify these calls, will do as part of this PR.

@KiterLuc
Copy link
Contributor

KiterLuc commented Nov 9, 2023

Closing this. It will be done in https://app.shortcut.com/tiledb-inc/story/27545/implement-single-process-fragment-isolation-strategy by changing the fragment format to avoid the use of UUIDs.

@KiterLuc KiterLuc closed this Nov 9, 2023
@teo-tsirpanis teo-tsirpanis deleted the uuid-refactor branch January 31, 2024 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants