Use C code from pgsql-hackers #15

x4m · 2023-09-17T05:58:55Z

This code have several advantages:

Implement counter per standart draft. This counter protects from collisions better than random numbers, when UUIDs are generated at high speed.
Buffer randomness. This greatly improves speed of generation. Try select uuid_generate_v7() from generate_series(1,1E6);
Avoid pghton(), because it does several unnecesary manipulations.
Avoid initializing timestamp part with randomness. This helps to save some entropy. Make the world greener :)

The code is based on PG patch https://commitfest.postgresql.org/43/4388/ I have my repo https://github.com/x4m/pg_uuid_next , but I do not have energy to support it like you do. So I decided to bring some code from there to your implementation.

Have a nice day :)

This code have several advantages: 1. Implement counter per standart draft. This counter protects from collisions better than random numbers, when UUIDs are generated at high speed. 2. Buffer randomness. This greatly improves speed of generation. Try select uuid_generate_v7() from generate_series(1,1E6); 3. Avoid pghton(), because it does several unnecesary manipulations. 4. Avoid initializing timestamp part with randomness. This helps to save some entropy. Make the world greener :) The code is based on PG patch https://commitfest.postgresql.org/43/4388/ I have my repo https://github.com/x4m/pg_uuid_next , but I do not have energy to support it like you do. So I decided to bring some code from there to your implementation. Have a nice day :)

fboulnois · 2023-09-17T20:31:46Z

@x4m , thanks for your PR! I also read your thread in the pgsql-hackers list.

A few thoughts:

As I outline in my comments on the RFC, I don't really like the alternate pseudorandom constructions for version 7 UUIDs. From a philosophical perspective, one of the goals of this extension is to keep code as simple as possible so it is easy to reason through, and I think the alternate constructions add too much surface area to get things wrong. From a technical perspective, the counter uses up bits in the random data but I am not sure the tradeoff is worthwhile. With 72 bits of randomness you can generate billions of UUIDs every ms without collisions.
This is smart! However, I wonder if the cache should live in the pg_strong_random layer instead? This is what other high quality cryptography libraries seem to do. That way, this would benefit every Postgres instance and application in the world. Talk about making the world greener!
pg_hton64 is ultimately an alias for bswap64, which will get compiled on supported architectures as a single instruction. Are there benchmarks that show that manual bit swapping is faster?
The truth is that I didn't want to fill the UUID at an arbitrary location to avoid buffer overflow type issues, but this is fair to point out.

All in all, I'm unlikely to add 1 or 3. If I benchmark the cache in 2 and it provides a large performance improvement, I may implement something similar. 4 seems reasonable if I can verify and test there aren't any off-by-one errors that would lead to buffer overflows.

Cheers.

x4m · 2023-09-18T11:18:36Z

The idea behind a counter is about keeping serially generated UUIDs sorted. Randomly initialized counter still counts against global collisions, better prevents local collisions and keep data sorted for better ingestion speed.
Indeed, 128 bits is a lot, enought to solve most of uniquiness problems. But using some of these bits for counter helps to solve unqieness problem even better, also proving better sortability.
Postgres uses OpenSSL, though I did not benchmark the feature with this lib. When OpenSSL is absent simply "f = open("/dev/urandom", O_RDONLY, 0);" consumes all the time...

fboulnois · 2023-09-19T16:02:12Z

I'm still undecided on implementing something similar to 2. After several benchmarks the performance gain is about 10% on a Postgres install with OpenSSL. This is pretty good, but I still feel like the functionality probably shouldn't live in this extension. I'm going to continue considering it, however.

sergeyprokhorenko · 2024-04-15T19:08:03Z

@fboulnois
Could you please publish Andrey Borodin's (@x4m) extension. Many people have been wating for his patch, but unfortunately the patch was recently suspended because of feature freeze for 17 version of PostgreSQL, just before the final approval of the new RFC by its authors.

fboulnois · 2024-04-15T19:17:24Z

@fboulnois Could you please publish Andrey Borodin's (@x4m) extension. Many people have been wating for his patch, but unfortunately the patch was recently suspended because of feature freeze for 17 version of PostgreSQL, just before the final approval of the new RFC by its authors.

At this point, the only substantive difference between x4m's patch and my extension is the alternate pseudorandom construction, which I've already discussed in #15 (comment) . Cheers!

x4m · 2024-04-16T05:43:28Z

I think @sergeyprokhorenko meant exactly this: let's add a counter or microseconds.
The thing is if you do
select i, gen_uuid_v7() id from generate_series(1,1000) i
the order of i will be different from order of id.
This is not super critical, but actually makes life better in many scenarios.

sergeyprokhorenko · 2024-04-16T07:24:49Z

I meant to add a counter 18 bits long, initialized every millisecond with a random number. The most significant bit of the counter is initialized to zero to prevent the counter from overflowing if the most significant bits of the random number are ones.
All advanced implementations use a counter:

for JavaScript https://www.npmjs.com/package/uuidv7
for Rust https://crates.io/crates/uuid7
for Go (Golang) https://pkg.go.dev/github.com/gofrs/uuid#NewV7
for Python https://github.com/oittaa/uuid6-python
for C# https://github.com/stevesimmons/uuid7-csharp
for Java https://github.com/f4b6a3/uuid-creator/wiki/1.7.-UUIDv7

This promotes sortability, additional monotonicity and better database performance. It can also make it easier to find bugs by knowing the latest version of the record.
The name of the function with counter can be the common uuidv7().

fboulnois · 2024-06-09T17:47:22Z

Different projects use different counter lengths. Why use 18? I see other projects use 14, 16, 24, 40, 42, and 48 bits for the counter size.

x4m · 2024-06-09T18:08:20Z

Just a sane number: 125MHz without overflow, tradeoff between max frequency and predictability. Detailed explanation was in hackers AFAIR.

sergeyprokhorenko · 2024-06-10T18:25:22Z

Different projects use different counter lengths. Why use 18? I see other projects use 14, 16, 24, 40, 42, and 48 bits for the counter size.

Too short a counter makes it possible for the counter to overflow under peak loads. This can be mitigated by using the timestamp as the counter in such a situation, but is intuitively undesirable. But a short counter theoretically allows for faster UUID detection if the UUID is compared over the timestamp + counter segment (this is not the case now) rather than over the entire UUID length.

With 18 bits, counter overflow is guaranteed to be impossible if the high-order bit of the counter is initialized to zero every millisecond, and if the remaining counter bits are initialized to a random number.

Too long a counter reduces the random segment of the UUID, and therefore worsen unguessability of the UUID by brute-force guessing. But a long counter theoretically allows for fewer resources to be spent on random number generation.

RFC 9562 recommends that the counter SHOULD be at least 12 bits but no longer than 42 bits.

sergeyprokhorenko · 2024-06-10T18:59:50Z

UUIDv7 generation function in ClickHouse DBMS

https://clickhouse.com/docs/en/sql-reference/functions/uuid-functions#generateUUIDv7

ClickHouse/ClickHouse#62852

For the sake of monotonicity, when several microservices access the UUIDv7 generator in parallel, a single UUIDv7 generator is used per server.

sergeyprokhorenko · 2024-06-11T08:00:24Z

Just a sane number: 125MHz without overflow, tradeoff between max frequency and predictability. Detailed explanation was in hackers AFAIR.

See https://www.postgresql.org/message-id/84D20D0F-B5FF-41CD-9F48-E282CE9FEC1D%40yandex-team.ru

and https://www.postgresql.org/message-id/F91948DD-500A-4A22-ABB9-5F4C59C28851%40yandex-team.ru

The standard does not specify how big counter should be. PFA patch with 16 bit counter. Maybe it worth doing 18bit counter - it will save us one byte of PRNG data. Currently we only take 2 bits out of the whole random byte.

trim21 · 2024-10-24T03:01:01Z

I'm new to pgsql native extenions.

Should sequence_counter and previous_timestamp be protected by mutex?
or pg guarantee uuid_generate_v7 won't be called in parallel?

pg_uuidv7/sql/pg_uuidv7--1.6.sql

Lines 7 to 8 in 65efc24

    
           AS 'MODULE_PATHNAME', 'uuid_generate_v7' 
        
           VOLATILE STRICT LANGUAGE C PARALLEL SAFE;

it's PARALLEL SAFE here, I guess we need some kind of mutex?

fboulnois mentioned this pull request Sep 19, 2023

Support Postgres 16 #18

Merged

trim21 mentioned this pull request Oct 24, 2024

clock_gettime(CLOCK_REALTIME, &ts) is not monotonic #41

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use C code from pgsql-hackers #15

Use C code from pgsql-hackers #15

x4m commented Sep 17, 2023

fboulnois commented Sep 17, 2023

x4m commented Sep 18, 2023

fboulnois commented Sep 19, 2023

sergeyprokhorenko commented Apr 15, 2024

fboulnois commented Apr 15, 2024

x4m commented Apr 16, 2024 •

edited

Loading

sergeyprokhorenko commented Apr 16, 2024 •

edited

Loading

fboulnois commented Jun 9, 2024

x4m commented Jun 9, 2024

sergeyprokhorenko commented Jun 10, 2024 •

edited

Loading

sergeyprokhorenko commented Jun 10, 2024 •

edited

Loading

sergeyprokhorenko commented Jun 11, 2024 •

edited

Loading

trim21 commented Oct 24, 2024 •

edited

Loading

Use C code from pgsql-hackers #15

Are you sure you want to change the base?

Use C code from pgsql-hackers #15

Conversation

x4m commented Sep 17, 2023

fboulnois commented Sep 17, 2023

x4m commented Sep 18, 2023

fboulnois commented Sep 19, 2023

sergeyprokhorenko commented Apr 15, 2024

fboulnois commented Apr 15, 2024

x4m commented Apr 16, 2024 • edited Loading

sergeyprokhorenko commented Apr 16, 2024 • edited Loading

fboulnois commented Jun 9, 2024

x4m commented Jun 9, 2024

sergeyprokhorenko commented Jun 10, 2024 • edited Loading

sergeyprokhorenko commented Jun 10, 2024 • edited Loading

sergeyprokhorenko commented Jun 11, 2024 • edited Loading

trim21 commented Oct 24, 2024 • edited Loading

x4m commented Apr 16, 2024 •

edited

Loading

sergeyprokhorenko commented Apr 16, 2024 •

edited

Loading

sergeyprokhorenko commented Jun 10, 2024 •

edited

Loading

sergeyprokhorenko commented Jun 10, 2024 •

edited

Loading

sergeyprokhorenko commented Jun 11, 2024 •

edited

Loading

trim21 commented Oct 24, 2024 •

edited

Loading