Skip to content
This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

Add ChunksIterator method to Series interface. #665

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

bwplotka
Copy link
Contributor

@bwplotka bwplotka commented Jul 20, 2019

This is alternative to the #659 I feel like iterators is the way to go here as we want to iterate & stream chunks instead of buffering all.

Both PRs are for same use case: Allowing Querier to return raw chunks if asked. This will help massively on prometheus/prometheus#5703

This iterators implementation is easy here but will be more important & tricky for head in Prometheus.

fixes https://github.com/prometheus/tsdb/issues/328

Signed-off-by: Bartek Plotka [email protected]

querier.go Show resolved Hide resolved
querier.go Outdated Show resolved Hide resolved
querier.go Show resolved Hide resolved
querier.go Outdated Show resolved Hide resolved
@bwplotka bwplotka force-pushed the add-chunks-to-queriers-iterator branch from 291ab8e to 64eeed9 Compare July 29, 2019 17:26
@bwplotka
Copy link
Contributor Author

Addressed comments @codesome PTAL

bwplotka added 3 commits July 31, 2019 12:23
Signed-off-by: Bartek Plotka <[email protected]>
…o far.

TODO: tests & splitting into 2.

Signed-off-by: Bartek Plotka <[email protected]>
@bwplotka
Copy link
Contributor Author

bwplotka commented Aug 1, 2019

Added bit yolo implementation of verticalMergeChunkIterator - need to finish edge cases and add tests, but let me know your thoughts @brian-brazil @codesome

Copy link
Contributor

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks fine to me.

querier.go Show resolved Hide resolved
querier.go Outdated
}

func (it *noSeekSeriesIterator) Seek(t int64) bool {
it.err = errors.New("not implemented: Seek method invoked for noSeekSeriesIterator")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather prefer having an inefficient implementation as @brian-brazil said instead of an error here.

querier.go Show resolved Hide resolved
@bwplotka
Copy link
Contributor Author

bwplotka commented Aug 2, 2019

@brian-brazil very true. But we need to split chunks from time to time as mentioned here: https://github.com/prometheus/tsdb/issues/670 so I would reencode in place here.

Changed the code to reuse more stuff around, let me know how that looks.

chunkenc/chunk.go Outdated Show resolved Hide resolved
querier.go Outdated Show resolved Hide resolved
CHANGELOG.md Outdated
## 0.10.0

- [FEATURE] Added `DBReadOnly` to allow opening a database in read only mode.
- `DBReadOnly.Blocks()` exposes a slice of `BlockReader`s.
- `BlockReader` interface - removed MinTime/MaxTime methods and now exposes the full block meta via `Meta()`.
- [FEATURE] `chunckenc.Chunk.Iterator` method now takes a `chunckenc.Iterator` interface as an argument for reuse.
- [CHANGE] `Series` interface allows return chunk iterator that allows iterating over encoded chunks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong paragraph?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failed rebase

return false
}

for it.Next() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when the current time is the requested time calling Seek should not advance the iterator.

Suggested change
for it.Next() {
if atT, _ := it.At(); t >= atT {
return true
}
for it.Next() {

the same for all places.

I am basing this behavior basd on @gouthamve's comment in https://github.com/prometheus/tsdb/issues/328#issuecomment-387739347

Should also add this to ensure the behavior in the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I will add decisions there to Seek interface comment and add tests for this.

querier.go Outdated Show resolved Hide resolved
return newVerticalMergeSeriesIterator(s.series...)
}

// verticalMergeSeriesIterator implements a series iterater over a list
func (s *verticalChainedSeries) ChunkIterator() ChunkIterator {
Copy link
Contributor

@krasi-georgiev krasi-georgiev Aug 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to also add a test for this.
It will help visualize what output is expected with a given input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test that covers this?

}
return &verticalMergeChunkIterator{
a: s[0].ChunkIterator(),
b: newVerticalMergeChunkIterator(s[1:]...),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this really hard to understand. I know it is used in the verticalMergeSeriesIterator as well, but will be really happy if we simplify this at some point.
Not a blocker for this PR thought. I just thought I would mention it 😈

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recursive approach is sometimes hard to follow indeed. Let's postpone cleaning this though, as we do this everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did run a debug session and the final struct is really bizarre, sort of deeply nested.

But yeah a conversation for another PR.

@bwplotka
Copy link
Contributor Author

bwplotka commented Aug 8, 2019

Thanks @krasi-georgiev for review. Let me add few unit tests to cover all iterators I added (:

Signed-off-by: Bartek Plotka <[email protected]>
// Err returns the current error.
Err() error
func (s *chunkSeries) ChunkIterator() ChunkIterator {
return &chunkIterator{chunks: s.chunks}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newChunkSeriesIterator accepts mint and maxt. Should we do the same here? @brian-brazil @krasi-georgiev ? I think we should.

However not sure what mint maxt means here in detail. Should we be strict and allow mint and maxt in terms of samples inside chunks as well?

By the looks of this: https://github.com/prometheus/tsdb/blob/78e9e360581153d73b1c69d31fd6e2d89e060a82/querier.go#L796 we just check if they overlap with required min or maxt here.

Copy link
Contributor Author

@bwplotka bwplotka Aug 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, we pass to newChunkSeriesIterator with mint maxt so I am pretty sure we need to be strict here as well in ChunkIterator. E.g when we query block [0, 20)with querier restricted to[10, 40]we need to return chunk[10,19]even if the block has[0, 19]on thisChunkIterator` level, I assume.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newChunkSeriesIterator accepts mint and maxt. Should we do the same here? @brian-brazil @krasi-georgiev ? I think we should.

I think it is fine to start with with the current interface if it covers the streaming use case. Don't see a problem to refactor when/if needed.

* Simplified; merged seek and non seek cases together. Added explicit min/max only for chunk series iterator, where it is relevant.
* Adjusted all seek implementation to match edge case requirement (double seek, failed seek + next).

Signed-off-by: Bartek Plotka <[email protected]>
@bwplotka bwplotka force-pushed the add-chunks-to-queriers-iterator branch from d60075a to b84c439 Compare August 8, 2019 18:27
@@ -60,6 +60,19 @@ type Series interface {
ChunkIterator() ChunkIterator
}

// ChunkIterator iterates over the chunk of a time series.
type ChunkIterator interface {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to be near Series interface.

// Do binary search between current position and end.
it.idx = sort.Search(len(it.list)-it.idx, func(i int) bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem here was that sort.Search on not found gives the index to put the value in. This makes sense but seems like we forgot that we operate on shorter len(it.list)-it.idx list (:

@bwplotka
Copy link
Contributor Author

bwplotka commented Aug 9, 2019

With new changes this PR also fixes https://github.com/prometheus/tsdb/issues/328

Copy link
Contributor

@krasi-georgiev krasi-georgiev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cover all changes from #329 so I can close it in favor of this one?

CHANGELOG.md Show resolved Hide resolved
chunkenc/chunk.go Show resolved Hide resolved
chunkenc/chunk.go Show resolved Hide resolved
chunkenc/chunk_test.go Show resolved Hide resolved
querier.go Show resolved Hide resolved
querier.go Show resolved Hide resolved
querier_test.go Show resolved Hide resolved
// Err returns the current error.
Err() error
func (s *chunkSeries) ChunkIterator() ChunkIterator {
return &chunkIterator{chunks: s.chunks}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newChunkSeriesIterator accepts mint and maxt. Should we do the same here? @brian-brazil @krasi-georgiev ? I think we should.

I think it is fine to start with with the current interface if it covers the streaming use case. Don't see a problem to refactor when/if needed.

return newVerticalMergeSeriesIterator(s.series...)
}

// verticalMergeSeriesIterator implements a series iterater over a list
func (s *verticalChainedSeries) ChunkIterator() ChunkIterator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test that covers this?

}
return &verticalMergeChunkIterator{
a: s[0].ChunkIterator(),
b: newVerticalMergeChunkIterator(s[1:]...),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did run a debug session and the final struct is really bizarre, sort of deeply nested.

But yeah a conversation for another PR.

@bwplotka
Copy link
Contributor Author

bwplotka commented Aug 9, 2019

@krasi-georgiev

cc @free if you are happy with that. I am touching this place as well.

// Seek advances the iterator forward to the given timestamp.
// It advances to the chunk with min time at t or first chunk with min time after t.
Seek(t int64) bool
// At returns the meta.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"current value". This makes it sounds like it's dropping memes :)

chunkenc/chunk.go Show resolved Hide resolved
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tsdb: SeriesIterator.Seek() is vaguely defined
5 participants