Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Floating point exceptions by default in Debug mode (GCC/Clang) #10578

Merged
merged 25 commits into from
Aug 16, 2024

Conversation

jmarrec
Copy link
Contributor

@jmarrec jmarrec commented Jun 20, 2024

Pull request overview

Realize there was still a TODO... Let's see if CI complains or not now.

Pull Request Author

Add to this list or remove from it as applicable. This is a simple templated set of guidelines.

  • Title of PR should be user-synopsis style (clearly understandable in a standalone changelog context)
  • Label the PR with at least one of: Defect, Refactoring, NewFeature, Performance, and/or DoNoPublish
  • Pull requests that impact EnergyPlus code must also include unit tests to cover enhancement or defect repair
  • Author should provide a "walkthrough" of relevant code changes using a GitHub code review comment process
  • If any diffs are expected, author must demonstrate they are justified using plots and descriptions
  • If changes fix a defect, the fix should be demonstrated in plots and descriptions
  • If any defect files are updated to a more recent version, upload new versions here or on DevSupport
  • If IDD requires transition, transition source, rules, ExpandObjects, and IDFs must be updated, and add IDDChange label
  • If structural output changes, add to output rules file and add OutputChange label
  • If adding/removing any LaTeX docs or figures, update that document's CMakeLists file dependencies

Reviewer

This will not be exhaustively relevant to every PR.

  • Perform a Code Review on GitHub
  • If branch is behind develop, merge develop and build locally to check for side effects of the merge
  • If defect, verify by running develop branch and reproducing defect, then running PR and reproducing fix
  • If feature, test running new feature, try creative ways to break it
  • CI status: all green or justified
  • Check that performance is not impacted (CI Linux results include performance check)
  • Run Unit Test(s) locally
  • Check any new function arguments for performance impacts
  • Verify IDF naming conventions and styles, memos and notes and defaults
  • If new idf included, locally check the err file and other outputs

@jmarrec jmarrec added DoNotPublish Includes changes that shouldn't be reported in the changelog Developer Issue Related to cmake, packaging, installers, or developer tooling (CI, etc) labels Jun 20, 2024
@jmarrec jmarrec requested a review from Myoldmopar June 20, 2024 00:05
@jmarrec jmarrec self-assigned this Jun 20, 2024
@jmarrec
Copy link
Contributor Author

jmarrec commented Jul 8, 2024

@Myoldmopar Shouldn't these errors be investigated and fixed (either the code needs fixing or the test initialization itself)?

@Myoldmopar
Copy link
Member

@Myoldmopar Shouldn't these errors be investigated and fixed (either the code needs fixing or the test initialization itself)?

Yes, are you asking me to do this investigation and fix things? If so, I can add it to my list and get to it soon. If not, are you wanting to?

And as an aside, we are in the unfortunate position of including a third party library that uses NaN intentionally as a signal value. I'm not sure what will come of this, but I wanted to offer a small heads up here.

@jmarrec jmarrec added the Defect Includes code to repair a defect in EnergyPlus label Aug 12, 2024
@Myoldmopar
Copy link
Member

After @jmarrec enabled floating point exception catching on Mac, and found a whole bunch of failing tests, he got to work fixing them up. There are unfortunately 5 that are still remaining, and because of their nature (sizing, coils, ...), it would be great to get insight from @rraustad . I'm inclined to just disable the FPE on this branch long enough to merge it in, then open a new branch where they fail again, and resolve them there.

Thoughts anyone?

@rraustad
Copy link
Contributor

So what is the method to find these FP exceptions? Run unit tests in a debug build? That's very painful.

@jmarrec
Copy link
Contributor Author

jmarrec commented Aug 13, 2024

I can't believe how long this took me, but I think I got it working for windows too, in 64018cf

Turns out the gtest was not enabling floating point exceptions for msvc (ever).

I added a FORCE_DEBUG_ARITHM_MSVC cmake configuration variable that should allow turning it on for MSVC release builds too, and by default it'll enable it in Debug builds.

I'm testing it on MSVC now, will consider adding it to this branch if it behaves properly.

@jmarrec
Copy link
Contributor Author

jmarrec commented Aug 13, 2024

Alright, confirmed it appears to work fine on MSVC, so I added it to this branch.

$ devenv /debugexe .\Products\energyplus_tests.exe '--gtest_filter=*DXCoils_Test2*'

image

In release, after calling cmake -DFORCE_DEBUG_ARITHM_MSVC:BOOL=ON .

$ ctest -R DXCoils_Test2 --output-on-failure
Test project C:/src/EnergyPlus/build-release
    Start 1095: EnergyPlusFixture.DXCoils_Test2
1/1 Test #1095: EnergyPlusFixture.DXCoils_Test2 ...***Exception: Numerical  0.31 sec
Note: Google Test filter = EnergyPlusFixture.DXCoils_Test2
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from EnergyPlusFixture
[ RUN      ] EnergyPlusFixture.DXCoils_Test2


0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.56 sec

The following tests FAILED:
        1095 - EnergyPlusFixture.DXCoils_Test2 (NUMERICAL)
Errors while running CTest

@Myoldmopar
Copy link
Member

@jmarrec this is great, thanks for another major effort.

@rraustad could you confirm that these tests fail locally for you now? I'd love to know whether to just wait here until they are finally fixed and merge this all at once, or get this merged in as nearly-done and finish on a separate branch.

Thanks all!

@rraustad
Copy link
Contributor

@Myoldmopar working as expected. I could push the fix for DXCoils_Test2 which also fixes TestMultiSpeedCoilsAutoSizingOutput. The MultispeedDX* failures may take a little time to figure out the Nan.

image

@jmarrec
Copy link
Contributor Author

jmarrec commented Aug 13, 2024

@rraustad feel free to push indeed! thanks

@Myoldmopar
Copy link
Member

Down to 9! Looks like they are 3 polygon and 6 ITE, hopefully just two more one-liners! :)

@rraustad
Copy link
Contributor

ITE* unit test, old favorite:

image

@rraustad
Copy link
Contributor

rraustad commented Aug 14, 2024

I have no idea why fixing the crash causes the SurfSunlitFrac values to change. 1) the non-zero values are for the mirrored shading surface now, and 2) the south was has a 0 sunlit fraction ??

@@ -3947,11 +3947,11 @@ void CLIPLINE(Real64 &x1, Real64 &x2, Real64 &y1, Real64 &y2, Real64 maxX, Real6
else if (c2 == 3) {
visible = true;
x1 = minX;
y1 = maxY + e / dx;
y1 = maxY + General::SafeDivide(e, dx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be a better way to correct this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -2212,7 +2212,7 @@ TEST_F(EnergyPlusFixture, AirflowNetwork_TestPressureStat)

// Check indoor pressure and mass flow rate
EXPECT_NEAR(PressureSet, state->afn->AirflowNetworkNodeSimu(3).PZ, 0.0001);
EXPECT_NEAR(0.00255337, state->afn->ReliefMassFlowRate, 0.0001);
EXPECT_NEAR(0.06551, state->afn->ReliefMassFlowRate, 0.0001);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume since StdRhoAir is now in the fixture that something has changed BEFORE this unit test sets StdRhoAir.

@rraustad
Copy link
Contributor

rraustad commented Aug 14, 2024

Looks like I forgot to add the FPE fix for PolygonClippingDirect

state->dataBSDFWindow->SUNCOSTS(4, 9)(1) = 0.2; <-- will be Nan in FigureSolarBeanAtTimestep if not initialized
state->dataBSDFWindow->SUNCOSTS(4, 9)(2) = 0.2;
state->dataBSDFWindow->SUNCOSTS(4, 9)(3) = 0.2;
FigureSolarBeamAtTimestep(*state, state->dataGlobal->HourOfDay, state->dataGlobal->TimeStep);

and looking at the reason for the CI failures again for SolarShadingTest_PolygonOverlap2 and 3.

The south wall has coordinates:

    "    -5,-5,3,  !- X,Y,Z ==> Vertex 1 {m}",
    "    -5,-5,0,  !- X,Y,Z ==> Vertex 2 {m}",
    "    5,-5,0,  !- X,Y,Z ==> Vertex 3 {m}",
    "    5,-5,3;  !- X,Y,Z ==> Vertex 4 {m}",

and the south shade nearest the wall has coordinates:

    "    -10,-8,20,  !- X,Y,Z ==> Vertex 1 {m}",
    "    -10,-8,0,  !- X,Y,Z ==> Vertex 2 {m}",
    "    10,-8,0,  !- X,Y,Z ==> Vertex 3 {m}",
    "    10,-8,20;  !- X,Y,Z ==> Vertex 4 {m}",

with the outer shade having coordinates:

    "    -10,-9,20,  !- X,Y,Z ==> Vertex 1 {m}",
    "    -10,-9,0,  !- X,Y,Z ==> Vertex 2 {m}",
    "    10,-9,0,  !- X,Y,Z ==> Vertex 3 {m}",
    "    10,-9,20;  !- X,Y,Z ==> Vertex 4 {m}",

So the outer shade is the same size as the inner shade which are larger than the wall. OK fine. These surfaces face North and location is Chicago, so the sun is behind these surfaces? And then seeing the SurfSunlitFrac = 1 for the mirrored shading surfaces would make sense? I still can't figure out why the wall and window SurfSunlitFrac = 0 when the shade transmittance = 1.

@rraustad
Copy link
Contributor

@Myoldmopar I have done as much as I can here.

@jmarrec
Copy link
Contributor Author

jmarrec commented Aug 14, 2024

@rraustad thank you very much!

@jmarrec
Copy link
Contributor Author

jmarrec commented Aug 14, 2024

The SUNCOSTS being (0.2, 0.2, 0.2) is unrealistic.

Gotten from running 1ZoneUncontrolled.idf with chicago weather on Jan 1 at 12:

    state->dataBSDFWindow->SUNCOSTS(state->dataGlobal->TimeStep, state->dataGlobal->HourOfDay)(1) = 0.20531446332266728;
    state->dataBSDFWindow->SUNCOSTS(state->dataGlobal->TimeStep, state->dataGlobal->HourOfDay)(2) = -0.84761109808931534;
    state->dataBSDFWindow->SUNCOSTS(state->dataGlobal->TimeStep, state->dataGlobal->HourOfDay)(3) = 0.48928662105799514;

THe South Wall has a OutNormVec being OutNormVec = (x = 0, y = -1, z = 0), so when using 0.2 for the SUNCOSTS y, you end up with a negative number.

Gotten from running 1ZoneUncontrolled.idf with chicago weather on Jan 1 at 12
@jmarrec
Copy link
Contributor Author

jmarrec commented Aug 14, 2024

Alright, I think everything should be green now.

There will be a conflict with #10657 in SolarShading.cc but that's easy to resolve (just use the #10657 version of CLIPLINE)

@rraustad
Copy link
Contributor

Thanks @jmarrec that did the trick. The SurfSunlitFrac values are back to the original values. However, I am confused as to how these unit tests passed when the SUNCOS data was not provided.

@jmarrec
Copy link
Contributor Author

jmarrec commented Aug 14, 2024

@rraustad We shouldn't be too concerned about this (we shouldn't play wiht NaN to begin with), but to satisfy the curiosity:

In

// Recover the sun direction from the array stored in previous loop
state.dataSolarShading->SUNCOS = state.dataBSDFWindow->SUNCOSTS(iTimeStep, iHour);
state.dataSolarShading->SurfSunCosTheta = 0.0;
if (state.dataSolarShading->SUNCOS(3) < DataEnvironment::SunIsUpValue) return;
for (int SurfNum = 1; SurfNum <= state.dataSurface->TotSurfaces; ++SurfNum) {
state.dataSolarShading->SurfSunCosTheta(SurfNum) = state.dataSolarShading->SUNCOS(1) * state.dataSurface->Surface(SurfNum).OutNormVec(1) +
state.dataSolarShading->SUNCOS(2) * state.dataSurface->Surface(SurfNum).OutNormVec(2) +
state.dataSolarShading->SUNCOS(3) * state.dataSurface->Surface(SurfNum).OutNormVec(3);

IEEE 754 says that operator < returns false

As specified, the predicates associated with the <, ≤, =, ≥, > mathematical symbols (or equivalent notation in programming languages) return false on an unordered relation

https://en.wikipedia.org/wiki/NaN

A functionally equivalent test code is this: https://godbolt.org/z/sq5nxj6Gz

state.dataSolarShading->SurfSunCosTheta is NaN as well.

cf

// TypeTraits: Type Traits double Specialization
template<>
struct TypeTraits< double >
{
typedef double traits_type;
typedef std::size_t Size;
// Initial Value
static
traits_type
initial_value()
{
return traits_type(); // Use default constructor
}
// Debug Value
static
traits_type
debug_value()
{
return ( std::numeric_limits< traits_type >::has_signaling_NaN ? std::numeric_limits< traits_type >::signaling_NaN() : std::numeric_limits< traits_type >::max() );
}
// Initial Array Value
static
traits_type
initial_array_value()
{
#ifdef OBJEXXFCL_ARRAY_INIT_DEBUG
return debug_value();
#else
return initial_value();
#endif
}

Copy link
Contributor

@rraustad rraustad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look forward to using this in the future.

@Myoldmopar
Copy link
Member

Alright, I pulled in develop, choosing the other PR's changes in SolarShading. Everything passes nicely, but I will let CI confirm this once more. Thanks @jmarrec and @rraustad !

@Myoldmopar
Copy link
Member

This is really exciting, thanks for all the effort @rraustad and @jmarrec ! Tested it locally with develop one last time, and it's good. Super happy on CI. Merging.

@Myoldmopar Myoldmopar merged commit 2b5104f into develop Aug 16, 2024
15 checks passed
@Myoldmopar Myoldmopar deleted the enable_fe_default_debug branch August 16, 2024 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Defect Includes code to repair a defect in EnergyPlus Developer Issue Related to cmake, packaging, installers, or developer tooling (CI, etc) DoNotPublish Includes changes that shouldn't be reported in the changelog
Projects
None yet
9 participants