[hist] Improve calculation of `TAxis::FindBin(x)` when x is at the bin edges #14105

lmoneta · 2023-11-24T10:44:33Z

When x is at the bin edge values numerical error can cause the computation of the bin to be the one before (or after). Correct for this case assuming the bin edge are computed as in TH1::getBinLowEdge and TH1::GetBinUpEdge. It is clear that the bin edge values are represented as floating point, so depending on how they are computed they can be sometime different. However, it is better to have teh consistency to return the correct value when computed as internally in TAxis.

This should fix the problem reported in https://root-forum.cern.ch/t/bug-in-taxis-findbin/57210
Fixes #14091

A test is also added for FindBin

phsft-bot · 2023-11-24T10:44:41Z

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

phsft-bot · 2023-11-24T11:05:37Z

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

github-actions · 2023-11-24T13:45:09Z

Test Results

17 files 17 suites 4d 7h 27m 53s ⏱️
2 695 tests 2 679 ✅ 0 💤 16 ❌
43 937 runs 43 905 ✅ 0 💤 32 ❌

For more details on these failures, see this check.

Results for commit 8f0a7cc.

♻️ This comment has been updated with latest results.

dpiparo · 2024-04-21T14:42:00Z

Given that this was noticed again in the Forum (https://root-forum.cern.ch/t/numbering-convention-for-inner-histogram-bins/59077/3), can we perhaps rebase, merge and port to 6.32?

guitargeek · 2025-01-07T14:06:21Z

hist/hist/src/TAxis.cxx

+Int_t TAxis::DoFindFixBin(Double_t x) const
+{
+   int bin = 0;
+   if (!fXbins.fN) {        //*-* fix bins
+      bin = 1 + int (fNbins * (x-fXmin)/(fXmax-fXmin) );
+      // apply correction when x  is at the bin edge value
+      // (computed as in GetBinLowEdge) a numerical error in the
+      // above division can cause a migration in a neighbour bin.
+      double binwidth = (fXmax - fXmin) / Double_t(fNbins);
+      double upEdge = fXmin + (bin) * binwidth;
+      double lowEdge = fXmin + (bin - 1) * binwidth;
+      if (upEdge <= x) bin += 1;
+      if (lowEdge > x) bin -= 1;
+   } else {                  //*-* variable bin sizes
+      //for (bin =1; x >= fXbins.fArray[bin]; bin++);
+      bin = 1 + TMath::BinarySearch(fXbins.fN,fXbins.fArray,x);
+   }
+   return bin;
+}


Suggested change

Int_t TAxis::DoFindFixBin(Double_t x) const

{

int bin = 0;

if (!fXbins.fN) { //*-* fix bins

bin = 1 + int (fNbins * (x-fXmin)/(fXmax-fXmin) );

// apply correction when x is at the bin edge value

// (computed as in GetBinLowEdge) a numerical error in the

// above division can cause a migration in a neighbour bin.

double binwidth = (fXmax - fXmin) / Double_t(fNbins);

double upEdge = fXmin + (bin) * binwidth;

double lowEdge = fXmin + (bin - 1) * binwidth;

if (upEdge <= x) bin += 1;

if (lowEdge > x) bin -= 1;

} else { //*-* variable bin sizes

//for (bin =1; x >= fXbins.fArray[bin]; bin++);

bin = 1 + TMath::BinarySearch(fXbins.fN,fXbins.fArray,x);

}

return bin;

}

Int_t TAxis::DoFindFixBin(Double_t x) const

{

if (!fXbins.fN) { //*-* fix bins

// shift x and fXmax by fXmin:

x = x - fXmin;

const double b = fXmax - fXmin;

const double s = fNbins * x; // scaled version of x

double m = std::floor(s / b);

// iterative correction in case of floating point errors due to floating point division

while (m * b >= s) m = m - 1;

while ((m + 1) * b < s) m = m + 1;

return 1 + Int_t(m);

} else { //*-* variable bin sizes

//for (bin =1; x >= fXbins.fArray[bin]; bin++);

return 1 + TMath::BinarySearch(fXbins.fN,fXbins.fArray,x);

}

}

I have some improvement suggestions:

Better do the correction iteratively to make sure that even when we jumped around by more than one bin because of numeric effects, things are still OK

Avoid floating point division in the correction by rewriting the equations for the comparisons

My gut feeling is to better do the conversion to integer after the corrections and to use std::floor before. Like this, we reduce the number of int-with-float multiplications to a minimum

guitargeek

Hi Lorenzo, thank you very much for this PR! After thinking about this for a while, I have made some suggestions to improve the algorithm inline. In particular, I think we should avoid floating point division in the correction step, for performance and precision reasons.

lmoneta · 2025-01-07T17:02:29Z

Hi Jonas,
Thank you for the suggestions, I agree with your changes and I will change the code as you suggest

…n edges When x is at the bin edge values numerical error can cause the computation of the bin to be the one before (or after). Correct for this case assuming the bin edge are computed as in `TH1::getBinLowEdge` and `TH1::GetBinUpEdge`. It is clear that the bin edge values are represented as floating point, so depending on how they are computed they can be sometime different. However, it is better to have teh consistency to return the correct value when computed as internally in TAxis. This should fix teh problem reported in https://root-forum.cern.ch/t/bug-in-taxis-findbin/57210 and root-project#14091

guitargeek

Thank you very much for the PR and addressing the review comments!

guitargeek · 2025-01-07T17:47:42Z

hist/hist/src/TAxis.cxx

+////////////////////////////////////////////////////////////////////////////////
+Int_t TAxis::DoFindFixBin(Double_t x) const
+{
+   int bin = 0;


Unused variable

guitargeek · 2025-01-07T17:50:19Z

hist/hist/src/TAxis.cxx

+      while (m * b >= s)
+         m = m - 1;
+      while ((m + 1) * b < s)
+         m = m + 1;


Suggested change

while (m * b >= s)

m = m - 1;

while ((m + 1) * b < s)

m = m + 1;

while (m * b > s) // if left bin boundary is greater then x, decrement

m = m - 1;

while ((m + 1) * b <= s) //if right bin boundary is smaller or equal, increment

m = m + 1;

Sorry, I did the typical edge case mistake! I understand now that if the value is on the lower boundary, it should be considered in the bin.

Implement suggestions by Jonas R. to make FindFixBin more robust to numerical errors

guitargeek · 2025-01-08T09:51:16Z

I see that the test is still failing. I'm suspecting it's now because of the implementation of GetBinLowEdge() and GetBinUpEdge():
https://github.com/root-project/root/blob/master/hist/hist/src/TAxis.cxx#L513

I see that it first does the division to get the bin width, and then the multiplication. The multiplication is with bin - 1, which is either zero or >= 1. That means the numerical error will be amplified I think. Probably it's better to first do multiplication and then division. I'll make some studies to confirm this.

guitargeek · 2025-01-08T09:56:52Z

Here is a little study to demonstrate this effect:

int fNbins = 90;
double fXmin = 0;
double fXmax = 10;

int mybin = 8;

double xdm = fXmin + (mybin - 1) * ((fXmax - fXmin) / fNbins);
double xmd = (fXmin + (mybin - 1) * (fXmax - fXmin)) / fNbins;

std::cout << std::setprecision(20);

std::cout << "div then mult : " << xdm << std::endl;
std::cout << "mult then div : " << xmd << std::endl;

The output is:

div then mult : 0.77777777777777767909
mult then div : 0.77777777777777779011

So right now, there is unnecessary precision loss in the bin edge functions.

lmoneta requested review from couet and guitargeek November 24, 2023 10:44

lmoneta self-assigned this Nov 24, 2023

lmoneta requested a review from bellenot as a code owner November 24, 2023 10:44

couet approved these changes Nov 24, 2023

View reviewed changes

lmoneta force-pushed the hist_improve_findbin branch from 125c995 to 44cf282 Compare November 24, 2023 11:05

ferdymercury added the in:Math Libraries label Apr 16, 2024

ferdymercury approved these changes Apr 16, 2024

View reviewed changes

guitargeek self-assigned this May 20, 2024

lmoneta force-pushed the hist_improve_findbin branch from 44cf282 to 7f33a48 Compare December 9, 2024 18:31

guitargeek reviewed Jan 7, 2025

View reviewed changes

guitargeek requested changes Jan 7, 2025

View reviewed changes

lmoneta force-pushed the hist_improve_findbin branch from 7f33a48 to 3959abc Compare January 7, 2025 17:11

guitargeek approved these changes Jan 7, 2025

View reviewed changes

guitargeek reviewed Jan 7, 2025

View reviewed changes

[hist] Apply suggestions by Jonas R.

8f0a7cc

Implement suggestions by Jonas R. to make FindFixBin more robust to numerical errors

lmoneta force-pushed the hist_improve_findbin branch from 3959abc to 8f0a7cc Compare January 8, 2025 08:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[hist] Improve calculation of `TAxis::FindBin(x)` when x is at the bin edges #14105

[hist] Improve calculation of `TAxis::FindBin(x)` when x is at the bin edges #14105

lmoneta commented Nov 24, 2023 •

edited by ferdymercury

Loading

phsft-bot commented Nov 24, 2023

phsft-bot commented Nov 24, 2023

github-actions bot commented Nov 24, 2023 •

edited

Loading

dpiparo commented Apr 21, 2024

guitargeek Jan 7, 2025 •

edited

Loading

guitargeek left a comment •

edited

Loading

lmoneta commented Jan 7, 2025

guitargeek left a comment

guitargeek Jan 7, 2025

guitargeek Jan 7, 2025

guitargeek commented Jan 8, 2025

guitargeek commented Jan 8, 2025 •

edited

Loading

[hist] Improve calculation of TAxis::FindBin(x) when x is at the bin edges #14105

Are you sure you want to change the base?

[hist] Improve calculation of TAxis::FindBin(x) when x is at the bin edges #14105

Conversation

lmoneta commented Nov 24, 2023 • edited by ferdymercury Loading

phsft-bot commented Nov 24, 2023

phsft-bot commented Nov 24, 2023

github-actions bot commented Nov 24, 2023 • edited Loading

Test Results

dpiparo commented Apr 21, 2024

guitargeek Jan 7, 2025 • edited Loading

Choose a reason for hiding this comment

guitargeek left a comment • edited Loading

Choose a reason for hiding this comment

lmoneta commented Jan 7, 2025

guitargeek left a comment

Choose a reason for hiding this comment

guitargeek Jan 7, 2025

Choose a reason for hiding this comment

guitargeek Jan 7, 2025

Choose a reason for hiding this comment

guitargeek commented Jan 8, 2025

guitargeek commented Jan 8, 2025 • edited Loading

[hist] Improve calculation of `TAxis::FindBin(x)` when x is at the bin edges #14105

[hist] Improve calculation of `TAxis::FindBin(x)` when x is at the bin edges #14105

lmoneta commented Nov 24, 2023 •

edited by ferdymercury

Loading

github-actions bot commented Nov 24, 2023 •

edited

Loading

guitargeek Jan 7, 2025 •

edited

Loading

guitargeek left a comment •

edited

Loading

guitargeek commented Jan 8, 2025 •

edited

Loading