Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Exclude dropped chunks from the list of all chunks in the catalog #207

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

vidosits
Copy link
Contributor

@vidosits vidosits commented Feb 7, 2024

Fixes #206

Unit tests have to be written, but internal/systemcatalog/systemcatalog.go is missing its tests altogether and I'm kinda lost where to start so as to not to just test basic stuff, but @noctarius the gentleman he is has offered to add unit tests

@vidosits
Copy link
Contributor Author

vidosits commented Feb 8, 2024

hey, @noctarius could you check if this is the kind of test you were thinking of?

@vidosits vidosits force-pushed the fix/ignore-dropped-hypertable-chunks branch from ca01ead to e7eaca5 Compare February 8, 2024 11:26
@noctarius
Copy link
Owner

Looks like a real good initial test. Have you made sure the test fails without your fix? I'm not sure that this case fixes your specific issue. Happy to test it though :-)

@noctarius
Copy link
Owner

It looks like it's working without the additional check. I wonder when dropped is set to true, over the chunk being dropped immediately 🤔

Let me investigate the Timescale sourcecode. What version do you use? Maybe it's a recent change?!

@vidosits
Copy link
Contributor Author

vidosits commented Feb 8, 2024

you're right of course, this test is a bait in it's current form, it passes without the committed "fix" either way. I think the reason is a combination of the following:

The test itself does not touch this code path. So even if the test failed then the fix wouldn't fix it. The reason is because we're not collecting chunks using func (sc *systemCatalog) GetAllChunks(), but rather with an SQL query on the timescale internal schema.

During this test we're not actually testing the internals, we're just basically testing if we successfully dropped the chunks. It's a bad test.

The whole reason I needed this fix is because of the following scenario:

  • We have a retention policy on one of our tables
  • The table is huge and gets a lot of queries per minute
  • I think (not quite sure) but timescale can't get a lock on the chunk because of a continuous aggregate or some other query
  • I'm actually unsure about the above because the chunk does get dropped, but then metadata stays in the internal chunks table, but it correctly gets marked as "dropped" see table from Event streamer tries to read from dropped chunks #206

Also the test is wrong because of the above: when I'm dropping chunks with SELECT drop_chunks() in the test they all get dropped and I think there's no corresponding row left in the mock db indicating that the chunk was there but with dropped = true column set

To summarize:

  • thanks for taking a look at the test, it's very obviously wrong in multiple places
  • we'd have to somehow simulate (manually setting dropped=True with SQL?) some chunks being marked as dropped in the test
  • we'd have to adjust the test so that it actually touches the function (func (sc *systemCatalog) GetAllChunks()) modified by the fix commit

@vidosits
Copy link
Contributor Author

vidosits commented Feb 8, 2024

It looks like it's working without the additional check. I wonder when dropped is set to true, over the chunk being dropped immediately 🤔

Let me investigate the Timescale sourcecode. What version do you use? Maybe it's a recent change?!

We're using timescale/timescaledb-ha:pg15-ts2.10 image with the v0.33.1 timescaledb-single helm chart from charts.timescale.com

@noctarius
Copy link
Owner

Great! I won't have a chance to look at this again until the end of the weekend since we're on the road, but thanks for your effort and help :-)

@vidosits
Copy link
Contributor Author

vidosits commented Feb 8, 2024

No problem, thanks for taking the time and looking at my silly problem

@noctarius
Copy link
Owner

Not silly at all. It's a valid issue and should be fixed. I guess, I'll probably fall back, forcing the dropped field to true 😅

It may be possible, that it is set first, in it's own transaction, and then the actual drop happens, to prevent the data being "available" for large chunks when actually dropping may take a second or two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Event streamer tries to read from dropped chunks
2 participants