
Row-level TTL PR 1: new API #370

Merged · 5 commits · Oct 24, 2024
Conversation

ableegoldman (Contributor) commented on Oct 24, 2024:

First PR for row-level TTL. Includes only the API changes and supporting code.

TODO (in order)

  1. Cassandra Fact tables (insert/get)
  2. TopologyTestDriver
  3. Mongo KV tables (insert/get/range/all)
  4. Cassandra KVTables (insert/get/range/all)

Next PR: #371

ableegoldman requested a review from agavra on Oct 24, 2024 at 07:04.
@@ -56,7 +56,9 @@ public final class ResponsiveStores {
    * @return a supplier for a key-value store with the given options
    *         that uses Responsive's storage for its backend
    */
-  public static KeyValueBytesStoreSupplier keyValueStore(final ResponsiveKeyValueParams params) {
+  public static ResponsiveKeyValueBytesStoreSupplier keyValueStore(
ableegoldman (Contributor, Author) commented:

@agavra FYI, all the changes in ResponsiveStores are technically unrelated -- they're just holdovers from an alternative API approach I abandoned -- but I think it's the right change to make anyway, so I kept it in. Hope you don't mind.
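
For context on why the narrower return type is "the right change anyway": a concrete return type lets callers reach Responsive-specific options without a cast. A minimal sketch of the idea, with hypothetical names (withTtlProvider, ttlProvider, and params are assumptions for illustration, not quoted from this PR):

  // Assumes `params` is a ResponsiveKeyValueParams.
  final ResponsiveKeyValueBytesStoreSupplier supplier =
      ResponsiveStores.keyValueStore(params);

  // A Responsive-only option is now reachable without casting;
  // withTtlProvider is a hypothetical example of such a method.
  supplier.withTtlProvider(ttlProvider);

With the old KeyValueBytesStoreSupplier return type, the second call would have required a cast down to the concrete supplier class.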

return new TtlDuration(Duration.ZERO, Ttl.INFINITE);
}

// TODO(sophie): store ttl as long to avoid Duration conversions on the hot path
ableegoldman (Contributor, Author) commented:

@agavra I'm actually a bit concerned about this. Even if we store it as millis internally, we'll still have a Duration on the hot path since Duration is the return type. What if we just bite the bullet and have the user return the TTL in millis rather than as a Duration?

I know it's not as nice API-wise, but to be honest I think most Streams users are pretty used to reasoning about time in milliseconds. So I'm personally leaning towards replacing Duration with long millis entirely. Thoughts?

agavra (Contributor) replied:

Do we have evidence that this is actually significant on a per-retrieved-doc basis?

A few bitwise CPU operations seem to be dwarfed by the potential network call plus the re-serialization, in my opinion. Even if the lookup hits the in-memory cache, we're talking about loads from main memory into L1 cache.

The cost of creating a Duration is ~15ns (thanks ChatGPT for the estimate; looks legit to me, and half of that cost is just object allocation) and the cost of calling toMillis is ~1ns (assuming there's no nanosecond component). So we're talking ~16ns of per-key overhead for using Duration, which is roughly 50 cycles on a 3.2 GHz CPU.

ableegoldman (Contributor, Author) replied:

> Do we have evidence that this is actually significant on a per-retrieved-doc basis?

No hard/recent evidence -- I suppose I'm over-indexing on some older benchmarks I only half-remember from back at Confluent. It's probably not worth worrying about right now.
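
For the record, there is a middle ground between the two positions above: keep Duration in the user-facing API but cache the millis at construction time, so the per-key hot path reads a primitive field instead of converting. A minimal sketch, assuming the TtlDuration(Duration, Ttl) constructor shown in the diff (the field and accessor names are illustrative):

  import java.time.Duration;
  import java.util.Objects;

  // Sketch: precompute millis once, off the hot path.
  public final class TtlDuration {
    public enum Ttl { FINITE, INFINITE }

    private final Duration duration;
    private final Ttl ttlType;
    private final long durationMs; // computed once, at construction

    public TtlDuration(final Duration duration, final Ttl ttlType) {
      this.duration = Objects.requireNonNull(duration);
      this.ttlType = ttlType;
      this.durationMs = duration.toMillis();
    }

    public boolean isFinite() {
      return ttlType == Ttl.FINITE;
    }

    // Hot path: a primitive field read, no Duration conversion or allocation.
    public long toMillis() {
      return durationMs;
    }

    // User-facing API keeps the friendlier Duration type.
    public Duration duration() {
      return duration;
    }
  }

This would keep the Duration-based API while making the per-key conversion cost moot on reads.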

agavra (Contributor) left a review:

LGTM! Just nits inline.

