
Row-level ttl for KV stores #369

Open

ableegoldman wants to merge 3 commits into main from FEATURE-row-level-ttl

Conversation

@ableegoldman (Contributor) commented Oct 16, 2024:

TODO (this PR, WIP now):
- RowLevelTtlIntegrationTest
- MongoKeyValueTable unit tests
- AbstractCassandraKVTable unit test (extract ttl unit tests from CassandraFactTableTest)
- range/all support

Follow-up work:
- migration mode ttl
- in-memory kv store ttl?
- TopologyTestDriver ttl

@ableegoldman ableegoldman requested a review from agavra October 16, 2024 20:56
@agavra (Contributor) previously approved these changes Oct 18, 2024 and left a comment:


It's looking good! Thanks for putting this together so fast. No major comments.

Comment on lines 163 to 164
// TODO(sophie): figure out how to account for row-level ttl in migration mode
startTimeMs = System.currentTimeMillis() - params.ttlProvider().defaultTtl().toMillis();
@agavra (Contributor):

let's create a follow-up ticket, but I think what we can do is just have them pass in the "start time" to the tool, and then apply individual TTLs as necessary
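The follow-up idea above could look roughly like the following sketch: the tool takes a user-supplied start time, and each row's liveness is checked against its own TTL (falling back to the default). All names here (`MigrationTtlFilter`, `isLive`) are hypothetical stand-ins, not the actual tool API.

```java
import java.time.Duration;
import java.util.Optional;

public class MigrationTtlFilter {
  private final long startTimeMs;
  private final Duration defaultTtl;

  public MigrationTtlFilter(final long startTimeMs, final Duration defaultTtl) {
    this.startTimeMs = startTimeMs;
    this.defaultTtl = defaultTtl;
  }

  // rowTtl is the row-level override, if any; falls back to the default ttl.
  public boolean isLive(final long writeTimeMs, final Optional<Duration> rowTtl) {
    final Duration ttl = rowTtl.orElse(defaultTtl);
    return writeTimeMs + ttl.toMillis() > startTimeMs;
  }

  public static void main(final String[] args) {
    final MigrationTtlFilter filter =
        new MigrationTtlFilter(1_000_000L, Duration.ofMillis(500));
    // Written 400ms before the start time with the 500ms default: still live.
    System.out.println(filter.isLive(999_600L, Optional.empty())); // true
    // Same write time but a 100ms row-level override: already expired.
    System.out.println(filter.isLive(999_600L, Optional.of(Duration.ofMillis(100)))); // false
  }
}
```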


package dev.responsive.kafka.internal.db;

public class AbstractCassandraKVTableIntegrationTest {
@agavra (Contributor):

used?

@ableegoldman (Contributor, Author):

nope, this is a TODO (going to combine some of the common KV vs fact table tests, especially the ttl unit tests)

@agavra (Contributor):

if possible, I'd prefer we used static helper methods rather than abstract classes for tests 🤷

@ableegoldman (Contributor, Author):

It's not about the static helper methods, it's about the tests themselves. We want to run largely the same set of tests for both of these (same insert semantics, same ttl semantics, etc.), so I was going to extract all the common tests into this class.

If you'd prefer the @Parametrize approach I can take a look at that, but since not all the tests are going to be the same for the kv vs fact tables, I thought the abstract test format would be cleaner. Does that make sense?

(That said, ideally we can share the same tests for the Mongo table as well, in which case another approach might be easier in the end 🤷‍♀️ )
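The abstract-test-class pattern being discussed can be sketched as follows. `KVTable` and its methods are hypothetical stand-ins for the real table APIs, and the checks use plain exceptions instead of JUnit so the sketch is self-contained:

```java
import java.util.HashMap;
import java.util.Map;

class InMemoryKVTableTest extends AbstractKVTableTest {
  @Override
  KVTable createTable() {
    // A toy in-memory implementation standing in for a real table.
    final Map<String, String> map = new HashMap<>();
    return new KVTable() {
      public void put(final String key, final String value) { map.put(key, value); }
      public String get(final String key) { return map.get(key); }
    };
  }

  public static void main(final String[] args) {
    new InMemoryKVTableTest().shouldReadBackInsertedValue();
    System.out.println("shared test passed");
  }
}

interface KVTable {
  void put(String key, String value);
  String get(String key);
}

abstract class AbstractKVTableTest {
  // Each concrete suite (KV, fact, Mongo, ...) supplies its own table.
  abstract KVTable createTable();

  // A test shared by every concrete table suite.
  void shouldReadBackInsertedValue() {
    final KVTable table = createTable();
    table.put("k", "v");
    if (!"v".equals(table.get("k"))) {
      throw new AssertionError("expected to read back inserted value");
    }
  }
}
```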


maybeTtlSpec = ((DelegatingTableSpec) maybeTtlSpec).delegate();
if (!spec.ttlResolver().hasConstantTtl()) {
throw new UnsupportedOperationException("The ResponsiveTopologyTestDriver does not yet "
@agavra (Contributor):

let's create a ticket for this, I think we have customers that use this

.bind()
.setByteBuffer(DATA_KEY.bind(), ByteBuffer.wrap(key.get()))
.setInstant(TIMESTAMP.bind(), Instant.ofEpochMilli(minValidTs));
public byte[] get(final int kafkaPartition, final Bytes key, long streamTimeMs) {
@agavra (Contributor):

the conditional complexity of this method is pretty high -- it would be good to split it up into individual methods, perhaps following this form:

if (!ttlResolver) { simpleGet() }
else if (!ttlResolver.needsValueToComputeTtl()) { getWithMinTtl() }
else { getWithPostFilterForTtl() }
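A minimal sketch of that decomposition, assuming simplified stand-ins: `TtlResolver` and the three helper methods below are not the real table internals, and the stub bodies just return markers so the dispatch can be exercised.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class GetDispatch {
  // Simplified stand-in for the table's ttl metadata.
  record TtlResolver(boolean needsValueToComputeTtl) {}

  private final Optional<TtlResolver> ttlResolver;

  GetDispatch(final Optional<TtlResolver> ttlResolver) {
    this.ttlResolver = ttlResolver;
  }

  public byte[] get(final int kafkaPartition, final byte[] key, final long streamTimeMs) {
    if (ttlResolver.isEmpty()) {
      // No ttl at all: plain lookup.
      return simpleGet(kafkaPartition, key);
    } else if (!ttlResolver.get().needsValueToComputeTtl()) {
      // Ttl known up front: push the minValidTs bound into the query.
      return getWithMinTtl(kafkaPartition, key, streamTimeMs);
    } else {
      // Ttl depends on the value: fetch first, then filter client-side.
      return getWithPostFilterForTtl(kafkaPartition, key, streamTimeMs);
    }
  }

  private byte[] simpleGet(final int p, final byte[] k) {
    return "simple".getBytes(StandardCharsets.UTF_8);
  }

  private byte[] getWithMinTtl(final int p, final byte[] k, final long ts) {
    return "minTtl".getBytes(StandardCharsets.UTF_8);
  }

  private byte[] getWithPostFilterForTtl(final int p, final byte[] k, final long ts) {
    return "postFilter".getBytes(StandardCharsets.UTF_8);
  }

  public static void main(final String[] args) {
    final byte[] key = new byte[0];
    System.out.println(new String(
        new GetDispatch(Optional.empty()).get(0, key, 0L), StandardCharsets.UTF_8)); // simple
    System.out.println(new String(
        new GetDispatch(Optional.of(new TtlResolver(false))).get(0, key, 0L), StandardCharsets.UTF_8)); // minTtl
    System.out.println(new String(
        new GetDispatch(Optional.of(new TtlResolver(true))).get(0, key, 0L), StandardCharsets.UTF_8)); // postFilter
  }
}
```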


final ResultSet result = client.execute(range);
resultsPerPartition.add(Iterators.kv(result.iterator(), CassandraKeyValueTable::rows));
}
// TODO(sophie): filter by minValidTs if based on value
@agavra (Contributor):

we should make sure to address these TODOs before cutting a release since this will result in wrong data. I know it's already a big PR, so just make sure we're tracking it somewhere

Comment on lines +120 to +114
// if the default ttl is infinite we still have to define the ttl index for
// the table since the ttlProvider may apply row-level overrides. To approximate
// an "infinite" default retention, we just set the default ttl to the maximum value
@agavra (Contributor):

I think we should change the approach here to be more like the Tombstones: we would create the index on a new field expire_ts (instead of the insertion ts), which could just be null/empty for documents that don't have a TTL. And instead of having expireAfter = ttl + 12h we should have expireAfter = 12h (by the way, it looks like this change dropped the 12h, which was there as a grace to make sure that stream time advances past the physical time). This would dramatically reduce the size of the index in situations where many rows do not have a TTL.

From https://www.mongodb.com/docs/manual/core/index-ttl/#expiration-of-data:

If the indexed field in a document doesn't contain one or more date values, the document will not expire.

If a document does not contain the indexed field, the document will not expire.
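The value side of that expire_ts scheme can be sketched as below: a dedicated expiry timestamp that is simply absent for rows without a TTL, with a TTL index on that field using a fixed expireAfter grace. The field and method names here are illustrative, not the actual KVDoc schema.

```java
import java.time.Duration;
import java.util.Date;
import java.util.Optional;

public class ExpireTsSketch {
  // Fixed grace so stream time can advance past wall-clock time before the
  // document is physically deleted (the 12h mentioned above).
  static final Duration GRACE = Duration.ofHours(12);

  // Value to store in the indexed expire_ts field; empty means "never
  // expires", since Mongo skips documents missing the indexed date field.
  static Optional<Date> expireTs(final long insertTsMs, final Optional<Duration> ttl) {
    return ttl.map(t -> new Date(insertTsMs + t.toMillis()));
  }

  public static void main(final String[] args) {
    // No ttl: no expire_ts, so the document never expires.
    System.out.println(expireTs(1_000L, Optional.empty())); // Optional.empty
    // 500ms ttl: expire_ts = insert ts + ttl; the index's fixed expireAfter
    // grace handles the rest server-side.
    System.out.println(expireTs(1_000L, Optional.of(Duration.ofMillis(500))).get().getTime()); // 1500
  }
}
```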

return v == null ? null : v.getValue();
public byte[] get(final int kafkaPartition, final Bytes key, final long streamTimeMs) {

// Need to post-filter if value is needed to compute ttl
@agavra (Contributor):

I actually think that with the approach suggested above, we can do it all server-side because expire_ts is already a dedicated column, so we don't need to recompute it. We could actually consider doing the same thing in Scylla by inserting an extra column for expire_ts (we just wouldn't create an index on it).

) {
// TODO(sophie): filter by minValidTs if based on key or default only
@agavra (Contributor):

Same comment about TODOs

Comment on lines 277 to 281
// Just return Optional.empty if the row override is actually equal to the default
// to help simplify handling logic for tables
if (rowTtlOverride.isPresent() && rowTtlOverride.get().equals(defaultTtl)) {
return Optional.empty();
}
@agavra (Contributor):

I'd prefer we removed this optimization unless it proves necessary -- does it actually simplify things? I imagine we shouldn't be comparing the result of this to the default anywhere in the code.

It works with today's implementation, but if we ever have partial updates it's risky (as in, if I insert a row with a non-default TTL and then update it to the default TTL, we'd have to make sure that every scenario overrides the old TTL).
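The partial-update hazard described above can be demonstrated with a toy per-row TTL store (all names hypothetical): if the override is collapsed to `Optional.empty` when it equals the default, a partial update that "resets" a row to the default TTL never rewrites the stale non-default TTL.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class PartialUpdateHazard {
  static final Duration DEFAULT_TTL = Duration.ofMinutes(10);

  // Toy per-row ttl "column".
  final Map<String, Duration> storedTtl = new HashMap<>();

  // Partial update: only writes the ttl column when an override is present.
  void upsert(final String key, final Optional<Duration> rowTtlOverride) {
    rowTtlOverride.ifPresent(ttl -> storedTtl.put(key, ttl));
  }

  // The collapsing optimization under discussion: drop overrides equal to
  // the default.
  static Optional<Duration> collapsed(final Optional<Duration> override) {
    return override.filter(ttl -> !ttl.equals(DEFAULT_TTL));
  }

  public static void main(final String[] args) {
    final PartialUpdateHazard store = new PartialUpdateHazard();
    // Insert with a 1-minute override, then "reset" to the default TTL.
    store.upsert("k", collapsed(Optional.of(Duration.ofMinutes(1))));
    store.upsert("k", collapsed(Optional.of(DEFAULT_TTL)));
    // The stale 1-minute TTL survives: the reset was collapsed away.
    System.out.println(store.storedTtl.get("k")); // PT1M
  }
}
```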

Filters.gte(KVDoc.TIMESTAMP, minValidTs)
)).first();
return v == null ? null : v.getValue();
public byte[] get(final int kafkaPartition, final Bytes key, final long streamTimeMs) {
@agavra (Contributor):

Let's make sure to add tests for MongoDB as well. I'd actually prefer if we just threw an exception for this PR if there's anything but a static default TTL so we can have all the mongo changes be part of a future PR.

}

@Test
public void test() throws Exception {
@agavra (Contributor):

is this test complete? (also re: TODO below)

@ableegoldman (Contributor, Author) commented Oct 24, 2024:

haha no, this is the test I've been working on. I just happened to push the skeleton, it seems


// Then
assertThat(valAt0, is(val));
assertThat(valAt1, nullValue());
long lookupTime = Duration.ofMinutes(11).toMillis();
@agavra (Contributor):

no need to change it for this PR, but in the future you can use TimeUnit.MINUTES.toMillis(11) etc. for all of these. IMO it's a little easier to read and has no object allocations 😅

@ableegoldman ableegoldman force-pushed the FEATURE-row-level-ttl branch from ef015eb to 993c8eb Compare October 25, 2024 02:23