Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hist] Improve calculation of TAxis::FindBin(x) when x is at the bin edges #14105

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

lmoneta
Copy link
Member

@lmoneta lmoneta commented Nov 24, 2023

When x is at the bin edge values numerical error can cause the computation of the bin to be the one before (or after). Correct for this case assuming the bin edge are computed as in TH1::getBinLowEdge and TH1::GetBinUpEdge. It is clear that the bin edge values are represented as floating point, so depending on how they are computed they can be sometime different. However, it is better to have teh consistency to return the correct value when computed as internally in TAxis.

This should fix the problem reported in https://root-forum.cern.ch/t/bug-in-taxis-findbin/57210
Fixes #14091

A test is also added for FindBin

@lmoneta lmoneta requested review from couet and guitargeek November 24, 2023 10:44
@lmoneta lmoneta self-assigned this Nov 24, 2023
@lmoneta lmoneta requested a review from bellenot as a code owner November 24, 2023 10:44
@phsft-bot
Copy link
Collaborator

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

@lmoneta lmoneta force-pushed the hist_improve_findbin branch from 125c995 to 44cf282 Compare November 24, 2023 11:05
@phsft-bot
Copy link
Collaborator

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

Copy link

github-actions bot commented Nov 24, 2023

Test Results

    17 files      17 suites   4d 7h 27m 53s ⏱️
 2 695 tests  2 679 ✅ 0 💤 16 ❌
43 937 runs  43 905 ✅ 0 💤 32 ❌

For more details on these failures, see this check.

Results for commit 8f0a7cc.

♻️ This comment has been updated with latest results.

@dpiparo
Copy link
Member

dpiparo commented Apr 21, 2024

Given that this was noticed again in the Forum (https://root-forum.cern.ch/t/numbering-convention-for-inner-histogram-bins/59077/3), can we perhaps rebase, merge and port to 6.32?

@guitargeek guitargeek self-assigned this May 20, 2024
@lmoneta lmoneta force-pushed the hist_improve_findbin branch from 44cf282 to 7f33a48 Compare December 9, 2024 18:31
Comment on lines 403 to 421
Int_t TAxis::DoFindFixBin(Double_t x) const
{
int bin = 0;
if (!fXbins.fN) { //*-* fix bins
bin = 1 + int (fNbins * (x-fXmin)/(fXmax-fXmin) );
// apply correction when x is at the bin edge value
// (computed as in GetBinLowEdge) a numerical error in the
// above division can cause a migration in a neighbour bin.
double binwidth = (fXmax - fXmin) / Double_t(fNbins);
double upEdge = fXmin + (bin) * binwidth;
double lowEdge = fXmin + (bin - 1) * binwidth;
if (upEdge <= x) bin += 1;
if (lowEdge > x) bin -= 1;
} else { //*-* variable bin sizes
//for (bin =1; x >= fXbins.fArray[bin]; bin++);
bin = 1 + TMath::BinarySearch(fXbins.fN,fXbins.fArray,x);
}
return bin;
}
Copy link
Contributor

@guitargeek guitargeek Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Int_t TAxis::DoFindFixBin(Double_t x) const
{
int bin = 0;
if (!fXbins.fN) { //*-* fix bins
bin = 1 + int (fNbins * (x-fXmin)/(fXmax-fXmin) );
// apply correction when x is at the bin edge value
// (computed as in GetBinLowEdge) a numerical error in the
// above division can cause a migration in a neighbour bin.
double binwidth = (fXmax - fXmin) / Double_t(fNbins);
double upEdge = fXmin + (bin) * binwidth;
double lowEdge = fXmin + (bin - 1) * binwidth;
if (upEdge <= x) bin += 1;
if (lowEdge > x) bin -= 1;
} else { //*-* variable bin sizes
//for (bin =1; x >= fXbins.fArray[bin]; bin++);
bin = 1 + TMath::BinarySearch(fXbins.fN,fXbins.fArray,x);
}
return bin;
}
Int_t TAxis::DoFindFixBin(Double_t x) const
{
if (!fXbins.fN) { //*-* fix bins
// shift x and fXmax by fXmin:
x = x - fXmin;
const double b = fXmax - fXmin;
const double s = fNbins * x; // scaled version of x
double m = std::floor(s / b);
// iterative correction in case of floating point errors due to floating point division
while (m * b >= s) m = m - 1;
while ((m + 1) * b < s) m = m + 1;
return 1 + Int_t(m);
} else { //*-* variable bin sizes
//for (bin =1; x >= fXbins.fArray[bin]; bin++);
return 1 + TMath::BinarySearch(fXbins.fN,fXbins.fArray,x);
}
}

I have some improvement suggestions:

  1. Better do the correction iteratively to make sure that even when we jumped around by more than one bin because of numeric effects, things are still OK
  2. Avoid floating point division in the correction by rewriting the equations for the comparisons
  3. My gut feeling is to better do the conversion to integer after the corrections and to use std::floor before. Like this, we reduce the number of int-with-float multiplications to a minimum

Copy link
Contributor

@guitargeek guitargeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Lorenzo, thank you very much for this PR! After thinking about this for a while, I have made some suggestions to improve the algorithm inline. In particular, I think we should avoid floating point division in the correction step, for performance and precision reasons.

@lmoneta
Copy link
Member Author

lmoneta commented Jan 7, 2025

Hi Jonas,
Thank you for the suggestions, I agree with your changes and I will change the code as you suggest

…n edges

When x is at the bin edge values numerical error can cause the computation of the bin to be the one before (or after).
Correct for this case assuming the bin edge are computed as in `TH1::getBinLowEdge` and `TH1::GetBinUpEdge`. It is clear that the bin edge values are represented as floating point, so depending on how they are computed they can be sometime different.
However, it is better to have teh consistency to return the correct value when computed as internally in TAxis.

This should fix teh problem reported in https://root-forum.cern.ch/t/bug-in-taxis-findbin/57210 and root-project#14091
@lmoneta lmoneta force-pushed the hist_improve_findbin branch from 7f33a48 to 3959abc Compare January 7, 2025 17:11
Copy link
Contributor

@guitargeek guitargeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the PR and addressing the review comments!

////////////////////////////////////////////////////////////////////////////////
Int_t TAxis::DoFindFixBin(Double_t x) const
{
int bin = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable

Comment on lines 413 to 415
while (m * b >= s)
m = m - 1;
while ((m + 1) * b < s)
m = m + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while (m * b >= s)
m = m - 1;
while ((m + 1) * b < s)
m = m + 1;
while (m * b > s) // if left bin boundary is greater then x, decrement
m = m - 1;
while ((m + 1) * b <= s) //if right bin boundary is smaller or equal, increment
m = m + 1;

Sorry, I did the typical edge case mistake! I understand now that if the value is on the lower boundary, it should be considered in the bin.

Implement suggestions by Jonas R. to make FindFixBin more robust to numerical errors
@lmoneta lmoneta force-pushed the hist_improve_findbin branch from 3959abc to 8f0a7cc Compare January 8, 2025 08:20
@guitargeek
Copy link
Contributor

I see that the test is still failing. I'm suspecting it's now because of the implementation of GetBinLowEdge() and GetBinUpEdge():
https://github.com/root-project/root/blob/master/hist/hist/src/TAxis.cxx#L513

I see that it first does the division to get the bin width, and then the multiplication. The multiplication is with bin - 1, which is either zero or >= 1. That means the numerical error will be amplified I think. Probably it's better to first do multiplication and then division. I'll make some studies to confirm this.

@guitargeek
Copy link
Contributor

guitargeek commented Jan 8, 2025

Here is a little study to demonstrate this effect:

int fNbins = 90;
double fXmin = 0;
double fXmax = 10;

int mybin = 8;

double xdm = fXmin + (mybin - 1) * ((fXmax - fXmin) / fNbins);
double xmd = (fXmin + (mybin - 1) * (fXmax - fXmin)) / fNbins;

std::cout << std::setprecision(20);

std::cout << "div then mult : " << xdm << std::endl;
std::cout << "mult then div : " << xmd << std::endl;

The output is:

div then mult : 0.77777777777777767909
mult then div : 0.77777777777777779011

So right now, there is unnecessary precision loss in the bin edge functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in TAxis::FindBin (Double_t x) ?
6 participants