
Row-level TTL PR 1: new API #370

Merged · 5 commits · Oct 24, 2024
Conversation

ableegoldman (Contributor) commented on Oct 24, 2024:

First PR for row-level TTL. Includes only the API changes and supporting code.

TODO (in order)

  1. Cassandra Fact tables (insert/get)
  2. TopologyTestDriver
  3. Mongo KV tables (insert/get/range/all)
  4. Cassandra KVTables (insert/get/range/all)

Next PR: #371

ableegoldman requested a review from agavra on Oct 24, 2024 at 07:04.
@@ -56,7 +56,9 @@ public final class ResponsiveStores {
    * @return a supplier for a key-value store with the given options
    *         that uses Responsive's storage for its backend
    */
-  public static KeyValueBytesStoreSupplier keyValueStore(final ResponsiveKeyValueParams params) {
+  public static ResponsiveKeyValueBytesStoreSupplier keyValueStore(
ableegoldman (Contributor, Author) commented:

@agavra FYI, all the changes in ResponsiveStores are technically unrelated -- they're just holdovers from an alternative API approach I abandoned -- but I think it's the right change to make anyway, so I kept it in. Hope you don't mind.
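
For context on why the narrower return type is "the right change anyway": a concrete return type lets callers reach Responsive-specific options without a cast. A minimal sketch of the idea, with hypothetical names (withTtlProvider, ttlProvider, and params are assumptions for illustration, not quoted from this PR):

  // Assumes `params` is a ResponsiveKeyValueParams.
  final ResponsiveKeyValueBytesStoreSupplier supplier =
      ResponsiveStores.keyValueStore(params);

  // A Responsive-only option is now reachable without casting;
  // withTtlProvider is a hypothetical example of such a method.
  supplier.withTtlProvider(ttlProvider);

With the old KeyValueBytesStoreSupplier return type, the second call would have required a cast down to the concrete supplier class.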

return new TtlDuration(Duration.ZERO, Ttl.INFINITE);
}

// TODO(sophie): store ttl as long to avoid Duration conversions on the hot path
ableegoldman (Contributor, Author) commented:

@agavra I'm actually a bit concerned about this. Even if we store it as millis internally, we'll still have a Duration on the hot path since Duration is the return type. What if we just bite the bullet and have the user return the TTL in millis rather than as a Duration?

I know it's not as nice API-wise, but to be honest I think most Streams users are pretty used to reasoning about time in milliseconds. So I'm personally leaning towards replacing Duration with long millis entirely. Thoughts?

agavra (Contributor) replied:

Do we have evidence that this is actually significant on a per-retrieved-doc basis?

A few bitwise CPU operations seem to be dwarfed by the potential network call plus the re-serialization, in my opinion. Even if the lookup hits the in-memory cache, we're talking about loads from main memory into L1 cache.

The cost of creating a Duration is ~15ns (thanks ChatGPT for the estimate; looks legit to me, and half of that cost is just object allocation) and the cost of calling toMillis is ~1ns (assuming there's no nanosecond component). So we're talking ~16ns of per-key overhead for using Duration, which is roughly 50 cycles on a 3.2 GHz CPU.

ableegoldman (Contributor, Author) replied:

> Do we have evidence that this is actually significant on a per-retrieved-doc basis?

No hard/recent evidence -- I suppose I'm over-indexing on some older benchmarks I only half-remember from back at Confluent. It's probably not worth worrying about right now.
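
For the record, there is a middle ground between the two positions above: keep Duration in the user-facing API but cache the millis at construction time, so the per-key hot path reads a primitive field instead of converting. A minimal sketch, assuming the TtlDuration(Duration, Ttl) constructor shown in the diff (the field and accessor names are illustrative):

  import java.time.Duration;
  import java.util.Objects;

  // Sketch: precompute millis once, off the hot path.
  public final class TtlDuration {
    public enum Ttl { FINITE, INFINITE }

    private final Duration duration;
    private final Ttl ttlType;
    private final long durationMs; // computed once, at construction

    public TtlDuration(final Duration duration, final Ttl ttlType) {
      this.duration = Objects.requireNonNull(duration);
      this.ttlType = ttlType;
      this.durationMs = duration.toMillis();
    }

    public boolean isFinite() {
      return ttlType == Ttl.FINITE;
    }

    // Hot path: a primitive field read, no Duration conversion or allocation.
    public long toMillis() {
      return durationMs;
    }

    // User-facing API keeps the friendlier Duration type.
    public Duration duration() {
      return duration;
    }
  }

This would keep the Duration-based API while making the per-key conversion cost moot on reads.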

agavra (Contributor) left a review:

LGTM! Just nits inline.

