Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port potential Databox enhancements into Spiner from Singe -- transformations #95

Open
wants to merge 49 commits into
base: main
Choose a base branch
from

Conversation

BrendanKKrueger
Copy link
Collaborator

@BrendanKKrueger BrendanKKrueger commented Aug 26, 2024

PR Summary

Proposing changes to Spiner based on a wrapper that was written in Singe, in order to provide:

  • extrapolations beyond the edge of the data table;
  • transformations (e.g., log-log interpolation)

PR Checklist

  • Adapt example code from Singe to fit within Spiner
  • The Singe example hard-codes the transformation to be log-lin with both linear and logarithmic accessors for both axes; make this general and provide linear, logarithmic, NQT-log as example transformations.
  • Because Singe doesn't own Databox, this is a wrapper around Databox. Determine if these features should continue to be an extended wrapper around Databox or features of Databox itself.
  • Bring tests over from Singe or write new tests.
  • Code is formatted. (You can use the format_spiner make target.)

@BrendanKKrueger BrendanKKrueger added the enhancement New feature or request label Aug 26, 2024
@BrendanKKrueger BrendanKKrueger self-assigned this Aug 26, 2024
@BrendanKKrueger BrendanKKrueger marked this pull request as draft August 26, 2024 22:30
@BrendanKKrueger
Copy link
Collaborator Author

@dholladay00 and @Yurlungur, these are some features from Singe that Jonah and I discussed transferring over to Spiner to make them more widely accessible. My main question at the moment is whether you want these features to (a) be added directly to Databox, (b) be added to a wrapper around or extension to Databox, or (c) kept in Singe.

@Yurlungur
Copy link
Collaborator

Broadly I think I'm supportive of these features moving into spiner... but looking at the header file you added for this MR I wonder how much work it would be because I notice that your version pretty explicitly talks about, e.g., density and temperature, which I don't think we want to do with spiner. It should be more generic.

@BrendanKKrueger
Copy link
Collaborator Author

Oh, absolutely it should be (and will be) more generic. The first commit was just copying over the Singe file to show a starting point, but there's plenty in that file that's specific to Singe and that will need to be cleaned up to be more appropriate for Spiner.

@BrendanKKrueger
Copy link
Collaborator Author

BrendanKKrueger commented Sep 13, 2024

A few implementation question that came up as I started adapting RegularGrid1D to have transformations:

  • Do we ever expect a transformation to have runtime state? The current implementation requires that transformation state (if there is any) be available at compile time, because it simplifies the interface.
    • Allowing runtime state would also require that either (a) the transformation have a default constructor, or (b) users would be forced to update all their code to provide a transformation (even when there is a default constructor).
    • The current implementation assumes no runtime state, which means in the future there would be bigger changes if a transformation ever needed runtime state.
    • Edit 2024-10-23: We decided that we don't want to allow transformations to have runtime state.
  • Do we consider x (the untransformed variable) or u (the transformed variable) to be the "ground truth"? Transformations may not always be perfectly symmetric, which creates the possibility of small gaps appearing. And because those gaps are at critical points (the endpoints of the domain), you could hit them more often than you might think.
    • Edit 2024-10-23: See discussions below on handling this.
  • Should umin, umax, and du be accessible by the user?
    • Edit 2024-10-23: I made (nearly?) everything about u hidden from the user in RegularGrid1D, so that it's treated as an internal detail that the user shouldn't be expected to know about.
  • The quantity dx is now poorly defined, because now du is fixed and dx is derived from the transformation, and it may vary across the domain. Do we need the dx function? The best we can do is probably to take an index or an x value and return the dx value for that specific interval. It probably makes more sense to remove it entirely, but that changes the interface that's available to the user.

@Yurlungur
Copy link
Collaborator

Instead, I've opted for a "safer" option: write new methods get_data_value and set_data_value (I'm not attached to these names), which are classic accessors to the dependent variable values, which allow the DataBox to handle the transformations correctly.

This makes complete sense to me. I'm in favor.

Regarding operator()

I feel pretty strongly that operator needs to stay as databox sometimes plays the role of a multiD array, not an interpolator. And that's what some of the internal metadata. That said, I think I'm okay with your compromise solution, so long as we have a public accessor for the underlying pointer, which I think we do.

@Yurlungur
Copy link
Collaborator

Related to the operator() / accessor discussion: Should be set_data_value (or whatever we name it) be const? I made it const because there's already operator() const that returns mutable reference. This implies that when a DataBox is const, the metadata of the DataBox is const, but the dependent variable data is not constant.

I think that's right

@BrendanKKrueger
Copy link
Collaborator Author

The head of main isn't formatted correctly. I ran the clang-format, but now I'm rolling back changes to lines I didn't mess with. I'll leave it to y'all if you want to run clang-format completely or not.

@Yurlungur
Copy link
Collaborator

The head of main isn't formatted correctly. I ran the clang-format, but now I'm rolling back changes to lines I didn't mess with. I'll leave it to y'all if you want to run clang-format completely or not.

Let's just submit a separate MR to main that formats the code in one sweep.

@BrendanKKrueger BrendanKKrueger marked this pull request as ready for review September 30, 2024 18:03
@BrendanKKrueger BrendanKKrueger changed the title Port potential Databox enhancements into Spiner from Singe Port potential Databox enhancements into Spiner from Singe -- transformations Oct 1, 2024
Copy link
Collaborator

@Yurlungur Yurlungur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! Some comments below but overall I'm very happy with how this came out. As an aside, should I add the NQT logs and friends now in this PR? Or save them for a later implementation?

spiner/transformations.hpp Show resolved Hide resolved
spiner/transformations.hpp Outdated Show resolved Hide resolved
spiner/transformations.hpp Show resolved Hide resolved
spiner/transformations.hpp Outdated Show resolved Hide resolved
spiner/transformations.hpp Show resolved Hide resolved
spiner/databox.hpp Outdated Show resolved Hide resolved
@BrendanKKrueger
Copy link
Collaborator Author

should I add the NQT logs and friends now in this PR? Or save them for a later implementation?

Whatever makes you happy. I just didn't feel like getting into adding the NQT stuff as a dependency, because dependency management is often a tricky question and usually best left to a core team member. I'm fine if you add commits to this MR or if you add another MR that depends on this one.

@Yurlungur
Copy link
Collaborator

should I add the NQT logs and friends now in this PR? Or save them for a later implementation?

Whatever makes you happy. I just didn't feel like getting into adding the NQT stuff as a dependency, because dependency management is often a tricky question and usually best left to a core team member. I'm fine if you add commits to this MR or if you add another MR that depends on this one.

I'll add them in a later MR... Very busy at the moment so that will keep things moving along.

@BrendanKKrueger
Copy link
Collaborator Author

Earlier I stated

Do we consider x (the untransformed variable) or u (the transformed variable) to be the "ground truth"? Transformations may not always be perfectly symmetric, which creates the possibility of small gaps appearing. And because those gaps are at critical points (the endpoints of the domain), you could hit them more often than you might think.

Thinking about this more, I think I should clarify my thinking and get your thoughts. For additional clarity, let's define some notation:

  • x_in is the value of x that the user inputs into a given function
  • u_in is the value of x_in after transformation: u_in = Transform::forward(x_in)
  • x_tr is the value of x_in after a round-trip transformation: x_tr = Transform::reverse(u_in)
  • Analytically, x_in and x_tr are equal, but with finite-precision numerics they may not be equal.

Option 1: x is the "ground truth"

  • The bounds on u should be shifted slightly away from u_in so that when the user sets x_in to the lower bound or upper bound that the user passed into the constructor, we are guaranteed that those values will fall within the bounds of u_in regardless of how theoretically accurate u_in is and whether or not x_tr does something weird.
  • In this case, the Databox should save the x_in values for the lower and upper bounds, and the methods that query the bounds should return these saved values even if x_tr for the lower and upper bounds is different.

Option 2: u is the "ground truth"

  • The bounds on u are exactly the bounding values of u_in. This means that the bounds on x are the x_tr bounds rather than the x_in bounds, which may slightly shift the range of x. That means that the bounds on x that the user passed to the constructor as x_in values may or may not actually lie within the range of the Databox.
  • In this case we're already storing the u values, and calling the methods that query the bounds will just call Transform::reverse on the bounding u values.

The current implementation is option 2 (u is the "ground truth"), but thinking about it more, I think it should be option 1 (x is the "ground truth"). If you agree with that, then I'll have to put in some logic to shift the bounds in u to ensure that the bounds in x_in are within the bounds of x_tr. It's the end of the day, so I may try to address that tomorrow.

@BrendanKKrueger
Copy link
Collaborator Author

I thought about this more last night, and realized that my comment from the end of the day yesterday was incorrect. I now think the following is correct:

  • Note: f(x) = forward transform; r(u) = reverse transform.
  • Since ulo is derived from xlo, then f(xlo) = ulo and f(anything less than xlo) will be outside the u bounds. Similarly on the high end. So if x is the ground truth, we don't need to do any corrections to the transformations.
  • If u is the ground truth, then we do need to muck about with things.
    • f(r(ulo)) is not guaranteed to be equal to ulo. If you call the method to query the lower bound, that will return r(ulo). If you then interpolate to that point, you would be interpolating to the point u = f(r(ulo)), which may or may not be within the u bounds.
    • r(f(xlo)) is not guaranteed to be equal to xlo. If you call the constructor with xlo then ulo will be set as f(xlo). If you then call the method to query the lower bound, you get r(f(xlo)) back, which may or may not match xlo.
    • There are two critical points (the lower and upper bounds), and two equations to resolve (f(r(u)) = u, r(f(x)) = x), so you would apply a linear offset to the forward and reverse transformations in order to get everything to work. But that could mess up any careful calibration of the transform itself (for example, the stuff I did with the logarithmic transform to ensure that r(f(0)) = 0).

I think that tells us:

  1. We need to treat x as the "ground truth" representation, which makes sense since the user works in x-space and u-space should be treated like an internal detail not inherently visible to the user.
  2. I need to check the code and ensure that the xlo and xhi values from the constructor are saved and used for the lower-bound and upper-bound query methods.

PS: The GitHub interface is driving me nuts, because I can't seem to make conversation threads, which means that every conversation gets interleaved with every other conversation.

@BrendanKKrueger
Copy link
Collaborator Author

@Yurlungur, I updated how the bounds are handled, and added some additional testing. I found that we end up with an issue in RegularGrid1D::x(const int i), because that's derived from u so you have to make some sort of decision. I added a "TODO" comment with a few possible ways to handle this and very brief comments on each. I'd be interested in your thoughts on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants