CNDB-11680: Add source sstable/memtable id to vector traces #1411

Open: wants to merge 4 commits into base: main

@michaeljmarshall michaeljmarshall commented Nov 8, 2024

What is the issue

Fixes: https://github.com/riptano/cndb/issues/11680

When we resume the search for a vector query, it'd be helpful to know which sstable we're inspecting.

What does this PR fix and why was it fixed

Add the sstable/memtable to the log.

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

@michaeljmarshall michaeljmarshall changed the title Add source sstable/memtable id to vector traces CNDB-11680: Add source sstable/memtable id to vector traces Nov 8, 2024
@jkni jkni self-requested a review November 8, 2024 23:05
@michaeljmarshall
Member Author

VectorDistributedTest.rangeRestrictedTest passes locally

@jkni jkni left a comment

I think this is really useful, but as a nit, I don't love the output for memtables.

For an sstable, we see the SSTable ID, which is ULID-based in the common case, e.g.
DiskANN resume for 1/5 visited 0 nodes, reranked 0 to return 0 results from 3gl2_1sri_5vjqo2nd1q22mk2235.

For the memtables, we rely on the memtable toString implementation, which can contain lots of other information:
ANN search for 1/5 visited 1 nodes, reranked 0 to return 1 results from Memtable-tbl@1033435943(52B serialized bytes, 1 ops, 6.230KiB (0%) on-heap, 101B (0%) off-heap)

Maybe for memtable, we could just use the hashCode, as in the toString implementation.

Here are two samples of the trace at this point in commit history:

"ANN resume for 2/9 visited 0 nodes, reranked 0 to return 0 results from TrieMemtable@acb5508"
"DiskANN search for 4/9 visited 4 nodes, reranked 0 to return 4 results from 3gl7_1sv4_58m4g2ch3lawuilnd5"
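
For illustration, the compact memtable identifier in the first sample could be produced along these lines. This is a minimal sketch, not the code from this PR: the class `TraceSourceId`, the method `memtableSourceId`, and the stand-in `StubMemtable` are all hypothetical names, and it assumes the identifier is the simple class name followed by the hex-encoded `hashCode`.

```java
public class TraceSourceId {

    // Hypothetical stand-in for a real memtable class; the fixed hashCode
    // only makes the example output deterministic.
    static final class StubMemtable {
        @Override
        public int hashCode() {
            return 0xacb5508;
        }
    }

    // Assumed format: "<SimpleClassName>@<hex hashCode>". This avoids the
    // size and op-count details that a verbose toString() would add.
    static String memtableSourceId(Object memtable) {
        return memtable.getClass().getSimpleName()
               + "@"
               + Integer.toHexString(memtable.hashCode());
    }

    public static void main(String[] args) {
        System.out.println(memtableSourceId(new StubMemtable()));
    }
}
```

The point of the short form is that the trace line stays a fixed, small size regardless of how much state the memtable's own `toString` reports.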

sonarcloud bot commented Nov 14, 2024

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1411 rejected by Butler


6 new test failure(s) in 4 builds
See build details here


Found 6 new test failures

| Test | Explanation | Branch history | Upstream history |
|---|---|---|---|
| ...oParseClientMessageTest.badHeader[version=5/v5] | regression | 🔴🔵🔵🔵 | 🔵🔵🔵🔵🔵🔵🔵 |
| o.a.c.i.s.d.v.VectorCompressionTest.testAda002 | regression | 🔴🔴🔴🔵 | 🔵🔵🔵🔵🔵🔵🔵 |
| o.a.c.i.s.d.v.VectorCompressionTest.testBert | regression | 🔴🔴🔴🔵 | 🔵🔵🔵🔵🔵🔵🔵 |
| o.a.c.i.s.d.v.VectorCompressionTest.testNV_QA_4 | regression | 🔴🔴🔴🔵 | 🔵🔵🔵🔵🔵🔵🔵 |
| ...i.s.d.v.VectorCompressionTest.testOpenAiV3Large | regression | 🔴🔴🔴🔵 | 🔵🔵🔵🔵🔵🔵🔵 |
| o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... | flaky | 🔴🔵🔴🔵 | 🔵🔵🔵🔵🔵🔵🔵 |

Found 54 known test failures
