Rework iteration to avoid overflow #68
…porky enough to stand on its own and data_access is a port of a Java test file anyway.
- `LinearIterator` no longer aborts one step too early in the final bucket.
- `PickyIterator` now returns metadata when it picks. This allows `IterationValue` to provide better data about the current iteration progress without introducing a separate stage in the `PickyIterator` lifecycle to query about what was just picked.
- Count since last iteration is now reset every iteration, making it less prone to overflow.
- End-of-histogram is detected by comparing with max nonzero index, not total count, which avoids overflow.

- Supply count at current index to `pick()` since we already have that available
- Quantile iterator won't get stuck asymptotically chasing quantile 1.0_f64
- More tests
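To make the two overflow-related points concrete, here is a minimal sketch, not the crate's actual iterator. `ChunkedIter`, its fields, and the fixed `step` are hypothetical names invented for illustration. It resets the running count every time it yields, and it detects the end of the data by comparing against the highest nonzero index instead of summing counts toward a total.

```rust
/// Hypothetical, simplified stand-in for iterating over a histogram's bucket counts.
/// Yields `(last_index_in_chunk, count_since_last_yield)` every `step` indexes.
struct ChunkedIter<'a> {
    counts: &'a [u64],
    step: usize,
    next_index: usize,
    /// Highest index with a nonzero count, or `None` if everything is zero.
    max_nonzero_index: Option<usize>,
    count_since_last_yield: u64,
}

impl<'a> ChunkedIter<'a> {
    fn new(counts: &'a [u64], step: usize) -> Self {
        assert!(step > 0, "step must be positive");
        ChunkedIter {
            counts,
            step,
            next_index: 0,
            max_nonzero_index: counts.iter().rposition(|&c| c != 0),
            count_since_last_yield: 0,
        }
    }
}

impl<'a> Iterator for ChunkedIter<'a> {
    type Item = (usize, u64);

    fn next(&mut self) -> Option<(usize, u64)> {
        // End-of-data check uses an index comparison, never a running total.
        let last = self.max_nonzero_index?;
        while self.next_index <= last {
            let i = self.next_index;
            self.count_since_last_yield =
                self.count_since_last_yield.saturating_add(self.counts[i]);
            self.next_index += 1;
            if self.next_index % self.step == 0 || i == last {
                let yielded = (i, self.count_since_last_yield);
                // Reset on every yield so the counter only spans one chunk.
                self.count_since_last_yield = 0;
                return Some(yielded);
            }
        }
        None
    }
}
```

With counts `[u64::MAX, 0, 5, 1, 0, 0]` and a step of 2, a running total would overflow `u64` at index 2, while the per-chunk counts stay in range and iteration stops after index 3.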
self.value
/// The value iterated to. Some iterators provide a specific value inside the bucket, while
/// others just use the highest value in the bucket.
pub fn value_iterated_to(&self) -> u64 {
Maybe this rename-for-clarity is not worth the compatibility concern?
I think it's fine.

// make sure we don't add this index again
self.fresh = false;
}
}

// figure out if picker thinks we should yield this value
if self.picker.pick(self.current_index, self.total_count_to_index) {
let val = self.current();
I felt it was easier to reason at-a-glance about when the iterator's fields were used with this function inlined, since it matters that you read things like `count_since_last_iteration` before resetting a few lines below.

/// An iterator that will yield at quantile steps through the histogram's value range.
pub struct Iter<'a, T: 'a + Counter> {
hist: &'a Histogram<T>,
ticks_per_half_distance: u32,
quantile_to_iterate_to: f64,
quantile_just_picked: f64
This is what returning `Option<PickMetadata>` lets us avoid.
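As an illustration of that design, the sketch below shows a picker that returns its metadata at the moment it picks, so it needs no extra field to hold the data for a later query. Only the `pick()` argument order and the `Option<PickMetadata>` return type come from this PR. The trait name `Picker`, the fields of `PickMetadata`, and the `FirstPastQuantile` picker are assumptions made up for the example.

```rust
/// Hypothetical metadata returned when a picker picks; the field set is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PickMetadata {
    quantile_iterated_to: Option<f64>,
    value_iterated_to: Option<u64>,
}

/// Hypothetical picker trait: `Some(metadata)` means "yield here", `None` means "skip".
trait Picker {
    fn pick(
        &mut self,
        index: usize,
        total_count_to_index: u64,
        count_at_index: u64,
    ) -> Option<PickMetadata>;
}

/// Picks exactly once: at the first index whose cumulative quantile reaches `target`.
struct FirstPastQuantile {
    total_count: u64,
    target: f64,
    done: bool,
}

impl Picker for FirstPastQuantile {
    fn pick(
        &mut self,
        _index: usize,
        total_count_to_index: u64,
        _count_at_index: u64,
    ) -> Option<PickMetadata> {
        if self.done || self.total_count == 0 {
            return None;
        }
        let quantile = total_count_to_index as f64 / self.total_count as f64;
        if quantile >= self.target {
            self.done = true;
            // The quantile travels with the pick itself; nothing is stored for later.
            Some(PickMetadata {
                quantile_iterated_to: Some(quantile),
                value_iterated_to: None,
            })
        } else {
            None
        }
    }
}
```

The caller can copy whatever fields are present into the value it yields. A picker with nothing extra to report can return a `PickMetadata` whose fields are all `None`.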
return None;
}

// Because there are effectively two quantiles in play (the quantile of the value for the
comments per LoC might be getting a little out of hand in this struct, but I like to beat historically confusing logic into submission with overwhelming documentation
I have no objection to this comment. Seems well-placed and useful.
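For readers unfamiliar with why quantile stepping is historically confusing, here is a rough sketch of a half-distance step rule. It is modeled on my understanding of HdrHistogram-style quantile iteration and is not code from this PR. The function name `next_quantile` is hypothetical. Each step covers `1 / ticks_per_half_distance` of the current half-distance toward 1.0, so steps keep shrinking as the quantile approaches 1.0.

```rust
/// Next quantile to report after `q`, or `None` once `q` has reached 1.0.
/// Hypothetical helper; quantiles are expressed in the range [0.0, 1.0].
fn next_quantile(q: f64, ticks_per_half_distance: u32) -> Option<f64> {
    let remaining = 1.0 - q;
    // Written this way so that NaN also ends iteration.
    if !(remaining > 0.0) {
        return None;
    }
    // Count how many times the distance to 1.0 has already been halved,
    // i.e. floor(log2(1 / remaining)), without calling a float log.
    let mut halvings: i32 = 0;
    let mut r = remaining;
    while r * 2.0 <= 1.0 {
        r *= 2.0;
        halvings += 1;
    }
    // Ticks across the whole [0, 1] range at the current resolution.
    let ticks = f64::from(ticks_per_half_distance) * 2f64.powi(halvings + 1);
    Some((q + 1.0 / ticks).min(1.0))
}
```

With one tick per half distance the sequence starts 0.5, 0.75, 0.875. Because the increment keeps shrinking, in `f64` it can eventually become too small to move the quantile at all, so a caller presumably needs an independent stop condition, such as having reached the last nonzero index, and cannot rely on the quantile ever arriving at 1.0.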
@@ -542,365 +539,3 @@ fn total_count_exceeds_bucket_type() {

assert_eq!(400, h.count());
}

#[test]
quantile tests moved to their own file
src/iterators/mod.rs
Outdated

// TODO count starting at 0 each time we emit a value to be less prone to overflow
self.prev_total_count = self.total_count_to_index;
if let Some(metadata) = self.picker.pick(self.current_index, self.total_count_to_index, self.count_at_index) {
I think `rustfmt` might complain about this line.
Good point. I haven't been `rustfmt`-ing but I'm happy to adopt it. I don't even really care if its formatting is suboptimal for some aesthetic; consistency is good. I'll format the iterator files.
@algermissen here's how the quantile iteration example looks now:
The |
This is great! I also think the long comment is useful (not to mention the many extra tests). See my one comment about |
This release has a couple of backwards-incompatible changes:
- the old `len()` is now `distinct_values()`
- the new `len()` is the old `count()` (which is deprecated)
- `IterationValue::value` became `value_iterated_to`

Some other API changes:
- iterator values gained `quantile_iterated_to()`
- `Histogram` gained `is_empty()`

Behind the scenes:
- #67 and #68 landed a number of fixes to iterators such that the produced values are more correct and sensible.
- errors were moved into their own module.
@marshallpierce Thanks a lot, the iteration logic definitely looks fine now. The double line at the end is gone for my cases. One nit: the CLI output adds a column, which breaks the 'hgrm' format used by the original hdrhistogram, so the CLI output cannot be loaded by the original plot HTML pages. No big deal, just wanted to give that feedback.
@algermissen yeah, I'm a little conflicted on that. Anti-extra-column:
Pro-extra-column:
Overall I'm weakly in favor of keeping the output the way it is (human oriented) but I could be persuaded.
@marshallpierce I agree with you. I am only building a proof-of-concept throw-away tool, so I am looking for the least amount of work. Otherwise I'd never use the text format. As I am fine with copy/pasting your loop and removing the column by hand, there's no need for me to persuade you to drop it :-) Keep it as it is, it's better. My 2ct.
Incorporating a gaggle of smaller fixes I came upon while working on this:
- Have `pick()` return a metadata struct with optional fields so that the timeline is clear and pickers don't need mostly useless fields to hang on to data until it can be requested 1 function call later.