Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Including .hpp files and namespaces #712

Closed
rok-cesnovar opened this issue Oct 5, 2020 · 15 comments
Closed

Including .hpp files and namespaces #712

rok-cesnovar opened this issue Oct 5, 2020 · 15 comments
Labels
cpp-codegen feature New feature or request

Comments

@rok-cesnovar
Copy link
Member

rok-cesnovar commented Oct 5, 2020

I would like to start the discussion of simplifying including C++ code inside a Stan model.
This is not something for this release obviously but I think is worth doing.

Current state

In Cmdstan, if you specify an external hpp file via the USER_HEADER, the code in the included hpp file needs to be in the namespace of the model you are including it into. To me, that is very annoying as that makes the external .hpp file really not seamlessly reusable without some sort of script trickery. It also makes it difficult to share the external C++ code.

Rstan circumvents this by post-processing the generated C++ code and including that hpp file inside the namespace.

I am not sure how Pystan does this if at all.

Proposal

This is more of a question than a proposal. Would it be ok if we did something close to what Rstan does, but in stanc.

Meaning that for #include "code.stan" nothing changes, but #include "code.hpp" would get enclosed in the model namespace. Though we might also want to include C++ code not inside the namespace (for C++ includes for example).

So we have a few options imo:

  • #include "code.hpp" is include as is after the C++ is generated, but #include "code.stanhpp" is included inside the namespace
  • annotatation to mark an include you wish to place inside the namespace. Something like:
@InModelCppNamespace
#include "code.hpp"
@spinkney
Copy link
Contributor

There is a strong need for this capability for users who do not or cannot use rstan. For example, I have a specific use case on utilizing the quadratic solver in Stan models. As all the C++ quadratic solvers are under gpl or other non-bsd friendly licenses this will almost never be part of the Stan code base. Users will always have to include external c++ to include this type of functionality.

There are plenty of times when the licensing issue of R means that corporate use cases must use cmdstan and forego R interfacing altogether. This is separate from the previous point, as even self-written c++ is hard to interface with cmdstan. I only know it can be done as @rok-cesnovar has done it for the quadratic solver.

The last point is on interfacing c++ code with stan versions newer than rstan. Even if the user is not bound to cmdstan and can use rstan there are reasons not to (until rstan reaches release parity with stan versions). Given the historical lags between rstan and cmdstan some users will find that the newer version of stan has the capabilities they need vs what rstan currently offers. If the user also wants to include c++ they're somewhat out-of-luck.

Anyway, I hope I've provided reason enough to make this a worthwhile functionality to introduce.

@smonto2
Copy link

smonto2 commented Oct 16, 2020

I was just running one of my Stan files in R and got a prompt saying: "In this version of rstan, we compile your model as usual, but
also test our new compiler on your syntactically correct model. In this case, the new compiler did not work like we hoped." It seems to be a problem precisely with one of the include files I'm using. Here's the idea:

  • I'm working from an RStudio project which essentially lets all files be self-contained within the folder structure
  • Stan file is at Project_Folder/Code/Stan and so is the #include file called "Functions.stan"
  • The file running the Stan model is at Project_Folder/Code/Estimation
  • Therefore, I use #include "Code/Stan/Functions.stan" in my model and it works great, but it is apparently not recognized by the new compiler.

Just thought I'd share in case it's useful, as I'm sure some other R and RStudio users like me might run into this problem at some point in the future once the new compiler rolls out.

@rok-cesnovar
Copy link
Member Author

@smonto2 thanks. This isnt directly related to this issue. Its actually not a problem as this is a false positive. The next version of rstan will handle includes a bit differently which is why this popped up. But its nothing to worry about.

@seantalts
Copy link
Member

+1 some way in Stan of including C++. Not sure what the best syntax would be, #include "code.hpp" where we just rely on the hpp extension seems fine to me. I would sort of prefer some kind of module-style import mechansim; we're at a junction where we don't necessarily get much out of using the decades-old-mostly-derided string pasting mechanism behind C and C++ includes. So we could for example allow something like

import code

model {
  code::function(...);
}

import could Do The Right Thing depending on whether "code.stan" or "code.hpp" exists. Or we could import "code.hpp" but still use code::function().

@rok-cesnovar
Copy link
Member Author

Yeah, I also like the import code. This is somethin @bbbales2 was talking about in connection with a "StanCRAN" idea - to make repo of Stan code users could reuse (It could also be C++ code).

In the case of C++ external code, import could handle including the Stan function header

functions {
    real foo(real a, real b);
}

and also the underlying C++ code. So I see this as being complementary to a #include <code.hpp> approach.

@bbbales2
Copy link
Member

+1 some way in Stan of including C++

I'm more for streamlining development of Math forks and using them with cmdstanX rather than injecting C++ into Stan directly from an interface. Math has all the infrastructure for developing extensions to Math (especially tests and whatnot), and so I think it makes sense for any C++ development to happen there.

@hyunjimoon & I are planning on writing a case study on how to use a fork of Math with cmdstanr via Rok's script here: stan-dev/cmdstanr#300 (comment) in that theme

I would sort of prefer some kind of module-style import mechansim

Yeah if a design doc for something like this appears I will definitely contribute to it.

@rok-cesnovar
Copy link
Member Author

I'm more for streamlining development of Math forks

This is nice, but the problem is that once you have a Math fork you need to keep it up-to-date. And that is a hassle. Especially if the payoff is the ability to include one function. Installing git clones is also much slower.

To me, the biggest use-case for me is a non-BSD3 code that will never make it in the official Math repo but is useful. And all the wild PDE stuff Ivan is working on for example.

@bbbales2
Copy link
Member

This is nice, but the problem is that once you have a Math fork you need to keep it up-to-date. And that is a hassle.

This is true for code in a fork or code maintained somehow else. If Stan changes and breaks it, you gotta debug and fix your code.

Developing in a fork of the Math library also means all your includes and setup and build is ezpz.

The other aspect of maintenance are tests. With a fork, you can just put your tests in the regular place and whammo, you (or anyone else coming down the line) can run them.

I am very skeptical that developing C++ away from Math is a good idea. If there is another way to develop C++ outside of Math, I think we should change Math to do whatever that other thing is doing, and then it will again be the best place to develop C++ for Stan :D.

Anyway, I'm guilty of encouraging C++ outside of Math for sure, I just wanna go the other direction on this now lol.

And all the wild PDE stuff Ivan is working on for example.

I looked it up and he's doing forks apparently: https://discourse.mc-stan.org/t/using-petsc-library-together-with-stan/17550

@rok-cesnovar
Copy link
Member Author

This is true for code in a fork or code maintained somehow else. If Stan changes and breaks it, you gotta debug and fix your code.

Its not a problem only if Stan breaks it. You also need to keep the Math fork in sync in order to support new features with the same fork.
Lets say I have my own function that I have written C++ for and did that in Math for Stan 2.24. If I also want use binary vectorized functions in the same model, I need to sync the fork to the latest version. Granted that you need to do this every three months, which is not a really huge effort, but still, only for that one function I want to share with coworkers or the community.

Not to mention that if you have it written in a Math fork in the stan::math namespace, that also needs to be added to stanc. If you want to do it properly, you need to have a math fork and a stanc3 binary plus cmdstan and stan fork to properly sync everything.

If I had a few more functions then its probably debatable whether its easier to have this as opposed to shipping a bunch of files around. Bot for 1 or 2 functions I think its much easier to have a single file.

I am very skeptical that developing C++ away from Math is a good idea.

This issue is meant for the use case of single (or a few) C++ functions you want to share around, not to do substantial development away from Math. There are some users that do this with rstan, which is the only interface that realistically supports it, there are support questions for this every few weeks.

I looked it up and he's doing forks apparently

For non-rstan development that is the only realistic way right now. Which is why I opened this issue. There is the USER_HEADER thing in cmdstan but you need to make a duplicate and change the namespace everytime you want to use it in another model. Ivan's case also requires some makefile logic so that is a bit of a different case there.

@bbbales2
Copy link
Member

bbbales2 commented Oct 20, 2020

Meaning that for #include "code.stan" nothing changes, but #include "code.hpp" would get enclosed in the model namespace. Though we might also want to include C++ code not inside the namespace (for C++ includes for example).

Is the solution to ask users to write their functions in the stan::math namespace? That is imported via a using namespace stan::math statement in the model block. That's how we get stan::math in there anyway.

--- soapbox ---

I don't think there's a good solution until we define libraries/namespaces for Stan code in Stan and then think about a C++ FFI from there (with something like rstantools). I still think Math forks are better even for a couple files, but I agree they aren't perfect that good (edit: maybe even 'that good' is too aggressive -- it's kinda no fun any way you head at this problem).

--- /soapbox ---

@bbbales2
Copy link
Member

Question. @hyunjimoon and I were working on some custom Math functions without touching stanc3. So we have a custom Math branch over here with a custom function in it: hyunjimoon/math#1

The stan signature is:

real[] interp1(real[] x, real[] y, real[] x_test);

So we compile out stan model with --allow-undefined=TRUE.

We end up getting ambiguity errors:

/home/bbales2/.cmdstanr/cmdstan-hyunji/stan/lib/stan_math/stan/math/prim/fun/interp1.hpp:8:55: note: candidate function [with T1 = double, T2 = double, T3 =
      stan::math::var_value<double>]
        inline std::vector<return_type_t<T1, T2, T3>> interp1(const std::vector<T1>& xData, const std::vector<T2>& yData, const std::vector<T3>& xTest, std::ostream...
                                                      ^
interp1.hpp:72:1: note: candidate function [with T0__ = double, T1__ = double, T2__ = stan::math::var_value<double>]
interp1(const std::vector<T0__>& x, const std::vector<T1__>& y,

The first is the function we defined and the second is the template generated by stanc3.

I guess I don't have anything other to say that it is a difficulty, but it seems related to the conversation here.

I'm tempted to say why don't we remove the automatically generated function definition? That would make this work and it would also make debugging this stuff much easier (things would fail at compile and not link), but that might break something else and it also seems hacky. Ugh.

@SteveBronder
Copy link
Contributor

@bbbales2 if yinz are looking at interpolation stuff you may want to check out @pgree 's interpolation PR for stan math

stan-dev/math#1814

@bbbales2
Copy link
Member

Oh yeah we were just using it as a convenient example to work with. If that gets in we can just change it to something else.

@yizhang-yiz
Copy link

I took a stab at adding namespace token and corresponding parser (branch https://github.com/stan-dev/stanc3/tree/namespaced_fn), in order to support the following.

data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
transformed parameters {
  real a = 1;
  real b = foo::bar::baz(a);
}
model {
  theta ~ beta(b,1);
  y ~ bernoulli(theta);
}

This is intended to allow external cpp function (including high-order) so I turned off semantic check for this particular rule. In the end the hpp file just contains a c++ type namespace function call.

@WardBrian
Copy link
Member

A few of the suggestions here (like not emitting a prototype for external functions) have been implemented, and I think we'd probably not end up doing the original suggestion of directly #include-ing C++ files (see #1278 for an alternative, either thing would need a design doc), so I'm going to close this older issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp-codegen feature New feature or request
Projects
None yet
Development

No branches or pull requests

8 participants