Mov quantile #761

novertia · 2024-07-17T13:58:00Z

This PR adds the moving quantile algorithm based on the articles mentioned in the issue. It follows the quantile approximation strategy when the rolling window does not yet have enough data. The PR adds both the code logic in mov_quantile.h and the required unit tests in test_mov_quantile.cpp.
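For readers unfamiliar with the term, here is a minimal naive baseline of what a moving quantile returns. It is not the heap-based algorithm from this PR: the class name NaiveMovQuantile, its methods, and the choice K = floor(n * Q) for a partially filled window of n values are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical reference class, NOT the MovQuantile added by this PR.
// It re-sorts the window on every query: O(L log L) per get().
class NaiveMovQuantile {
public:
    NaiveMovQuantile(std::size_t win_len, double quantile)
        : win_len_(win_len), quantile_(quantile) {
        assert(win_len > 0);
        assert(quantile >= 0.0 && quantile < 1.0);
    }

    // Append a value, dropping the oldest one once the window is full.
    void add(double value) {
        if (window_.size() == win_len_) {
            window_.pop_front();
        }
        window_.push_back(value);
    }

    // Return the K-th smallest value of the current window contents.
    // Assumption: with n < L values, K is derived from n instead of L.
    // Precondition: at least one value was added.
    double get() const {
        assert(!window_.empty());
        std::vector<double> sorted(window_.begin(), window_.end());
        std::sort(sorted.begin(), sorted.end());
        const std::size_t n = sorted.size();
        const double k = static_cast<double>(n) * quantile_;
        // Clamp in case rounding pushes the product up to n.
        return sorted[std::min(static_cast<std::size_t>(k), n - 1)];
    }

private:
    std::size_t win_len_;
    double quantile_;
    std::deque<double> window_;
};
```

A baseline like this is mainly useful for cross-checking a faster implementation on random input.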

novertia · 2024-07-19T04:54:52Z

@gavv
To calculate the index of the heap to return (the root index of the partition heap), I currently take a percentile as size_t, check that it is in the range 0-100, and then compute the index as (percentile * window_length) / 100. I am not sure whether this approach is okay, or whether I should instead take a percentile in the range 0-1 and compute and cast accordingly.

gavv · 2024-07-19T22:14:25Z

Thanks for the PR! Either @adrianopol or I will review it in the upcoming days.

Regarding 0..100 vs 0..1: I guess we should either name it MovPercentile and use 0..100 or name it MovQuantile and use 0..1. In both cases, we need to use float or double for this number.

novertia · 2024-07-20T09:50:15Z

If we use float or double for the percentile or quantile in the index calculation, at some point we will have to cast either the window length from size_t to double or the percentile to size_t, since the index and window_length are size_t. Is this approach okay considering the precision loss in these casts?

Could we take k instead of a percentile or quantile, where k is the kth smallest element the user wants from the window, like in the document? I am not sure whether this approach matches the requirement, though.

gavv · 2024-07-20T10:55:32Z

Just to be sure I understand the question correctly. You're asking what pair of parameters to use, among these three alternatives, right?

window_size (L) + K in range [0; L)
window_size (L) + percentile (P) in range [0; 100) (then K = L * P / 100)
window_size (L) + quantile (Q) in range [0; 1) (then K = L * Q)

I think in code that will use MovPercentile/MovQuantile, the original parameters that we have would be window_size and percentile/quantile. E.g. we may need to compute 95th percentile of inter-arrival jitter for last 300 packets. We'll likely have numbers 300 and 95 (or 0.95) in config. We could compute K from those numbers, though it would be convenient if MovPercentile/MovQuantile would compute it for us.
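A minimal sketch of the third alternative (K = L * Q), assuming Q is a double in [0; 1). The helper name quantile_to_index is hypothetical and is not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: maps window length L and quantile Q in [0; 1)
// to the index K of the K-th smallest element, K in [0; L).
inline std::size_t quantile_to_index(std::size_t win_len, double quantile) {
    assert(win_len > 0);
    assert(quantile >= 0.0 && quantile < 1.0);
    const double k = static_cast<double>(win_len) * quantile; // K = L * Q
    // The cast truncates toward zero, which is floor for non-negative k.
    // The clamp guards against the product rounding up to L.
    return std::min(static_cast<std::size_t>(k), win_len - 1);
}
```

For example, a 300-packet window with Q = 0.5 gives K = 150, so the caller only needs to keep the window size and the quantile in its config.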

As for rounding errors, I believe that if Q is 32-bit float in range [0; 1), then we can compute K precisely for any L in range [0; 2^23), because floats in range [0; 1) can be mapped one-to-one to integers in range [0; 2^23). If that's true, it looks more than enough and we don't have to bother about rounding errors.

I suggest sticking with the MovQuantile approach with parameters L + Q, just to be sure we use floats or doubles in range [0; 1), since that range has highest precision (24 bits for 32-bit float, 53 bits for 64-bit float). If you think floats are not enough, let's use doubles.
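As a quick way to check the precision argument, the significand widths can be queried from std::numeric_limits, and an integer can be tested for an exact round trip through float. This is only a sketch: the names kFloatBits, kDoubleBits, and round_trips_through_float are hypothetical, and the bit counts in the comments assume an IEEE 754 platform.

```cpp
#include <cstdint>
#include <limits>

// Significand precision in bits, including the implicit leading bit.
constexpr int kFloatBits = std::numeric_limits<float>::digits;   // 24 on IEEE 754
constexpr int kDoubleBits = std::numeric_limits<double>::digits; // 53 on IEEE 754

// Hypothetical helper: true if n converts to float and back unchanged.
// The round trip goes through uint64_t because float(UINT32_MAX) rounds
// up to 2^32, which does not fit in uint32_t.
inline bool round_trips_through_float(std::uint32_t n) {
    const float f = static_cast<float>(n);
    return static_cast<std::uint64_t>(f) == static_cast<std::uint64_t>(n);
}
```

Every window length up to 2^24 passes this check, while 2^24 + 1 is the first value that does not.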

novertia · 2024-07-20T11:00:45Z

Sure, understood. I will make the changes.

gavv · 2024-08-01T11:11:47Z

Awesome, thanks a lot for PR and sorry for delay!

All suggestions I have are cosmetic, so I just committed them by myself: 4e6b413 (copyright year, make methods private, use const, and a few renames).

I also added a simple stress test with random window size and values: 0bc19da, seems to work fine.

One small question about this snippet:

        if (win_filled_) {
            ...
            min_heap_index_ = win_len_ - 1;
            max_heap_index_ = 0;

Are there situations when min_heap_index_ and max_heap_index_ are not already set to these values in this branch?

novertia · 2024-08-01T16:51:28Z

At the very start, when there are not enough values to fill the window, the values will be different. Both of them start from the heap root; max_heap_index_ decrements and min_heap_index_ increments. Once the window is filled, they take on the mentioned values. This method still gives us a quantile even if we don't have all the values in the window.

Thanks for merging the PR

gavv · 2024-08-01T17:41:12Z

Yeah, I mean, by the time we enter the if (win_filled_) { branch, min_heap_index_ and max_heap_index_ always already have values win_len_ - 1 and 0, right? And these assignments are redundant.

Or am I missing something, and it's possible that win_filled_ is true, but those two variables have different values?

novertia · 2024-08-01T17:47:24Z

That is correct, and the assignments are redundant. Sorry, I missed that.

novertia · 2024-08-01T17:49:12Z

When win_filled_ is true, max_heap_index_ will be 0 and min_heap_index_ will be win_len_ - 1; there can't be another case.

gavv · 2024-08-01T18:02:54Z

Got it, thanks. No worries, I just wanted to ensure I'm not missing something important.

github-actions bot added the work in progress Pull request is still in progress and changing label Jul 17, 2024

gavv added the contribution A pull-request by someone else except maintainers label Jul 17, 2024

novertia marked this pull request as ready for review July 19, 2024 04:49

github-actions bot added ready for review Pull request can be reviewed and removed work in progress Pull request is still in progress and changing labels Jul 19, 2024

novertia added a commit to novertia/roc-toolkit that referenced this pull request Jul 20, 2024

roc-streaminggh-761 used double for quantile

9ba8167

adrianopol added the review in progress Pull request is being reviewed label Jul 25, 2024

adrianopol self-requested a review July 25, 2024 20:06

gavv removed the ready for review Pull request can be reviewed label Jul 25, 2024

gavv added this to the next milestone Aug 1, 2024

roc-streaminggh-751: Mov quantile

2d7bc90

gavv force-pushed the mov_quantile branch from 34f8d33 to 2d7bc90 Compare August 1, 2024 11:03

gavv merged commit 6f18841 into roc-streaming:develop Aug 1, 2024
1 check passed

gavv requested review from gavv and removed request for adrianopol August 1, 2024 11:06

gavv removed the review in progress Pull request is being reviewed label Aug 1, 2024

github-actions bot added the work in progress Pull request is still in progress and changing label Aug 1, 2024

gavv removed the work in progress Pull request is still in progress and changing label Aug 1, 2024
