-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing support for precompiled header #151
Comments
To test compilation time, I'd put copies of the model in different places to make sure CmdStan isn't just reusing the BridgeStan compiled model. Or try to clean out the compiled versions in between. But if that's the difference, I'd expect CmdStanModel to be a split second. In CmdStan, we only have to translate the Stan program to a C++ class. The sampler and command-line tools are compiled in separate translation units and then linked. The linking does need to happen, though, so that takes a bit of time. Do you know if BridgeStan and CmdStanPy compile at the same optimization level? I hope CmdStanPy isn't default to |
I wonder if it’s slower to compile position independent code? It’s also possible some of the precompiled header logic in CmdStan didn’t make it into our makefile, though that stuff is very brittle in my experience |
I don't think the difference is due to caching, just to make sure I gave it slightly different models, and that didn't change the timings. I'm also seeing the pretty consistently in by benchmarks with posteriordb.
Ah, that does make sense. Still doesn't explain the difference I think though...
I don't see a big difference in time for a gradient evaluation between bridgestan and cmdstan, so I don't think that's it either. A small difference might be lost in all the other things that differ, but I'm pretty sure something like I guess it could also be because we also compile the hessian? Or something in a makefile isn't right and it recompiles something it shouldn't... |
Hessians will add overhead. But it shouldn't be that much because the Hessian code itself is fast and there's nothing else to compile in forward mode other than the functions being used in the code. But then that's also true for the reverse mode functions, so maybe it is just the Hessians. The only way I know how to test that would be modifying the code, but if it really is just the Hessians slowing things down, then we probably want to be able to conditionally compile w/o Hessians. |
If you compile with autodiff hessians you’re also including all of stan/math/fwd, which definitely adds a lot of files for the preprocessor to handle |
CmdStanPy doesn't change the default optimization level - it just runs the CmdStan makefile. |
Seems the hessian code is behind a compile time flag right now, and that was off in the code above. To make sure I just deleted all references to the hessian in |
I can take a closer look tomorrow, with that ruled out my questions would be: |
I was just running into missing symbol issues with models involving sundials, and that made be check the variables in the makefile. Things there don't seem right to me, and that might be the source of this issue as well. It looks like some setup code never ended up being executed. For instance the sundials libs aren't precompiled (no sundials shared libs anywhere in the The output of
|
I'm not able to re-create this with a warmed-up (meaning I've built at least one model previously) CmdStan output
$ /usr/bin/time make examples/bernoulli/bernoulli
--- Translating Stan model to C++ code ---
bin/stanc --o=examples/bernoulli/bernoulli.hpp examples/bernoulli/bernoulli.stan
--- Compiling, linking C++ code ---
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes -I stan/lib/stan_math/lib/tbb_2020.3/include -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -DBOOST_DISABLE_ASSERTS -c -Wno-ignored-attributes -x c++ -o examples/bernoulli/bernoulli.o examples/bernoulli/bernoulli.hpp
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes -I stan/lib/stan_math/lib/tbb_2020.3/include -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -DBOOST_DISABLE_ASSERTS -Wl,-L,"/home/brian/Dev/cpp/cmdstan/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/brian/Dev/cpp/cmdstan/stan/lib/stan_math/lib/tbb" examples/bernoulli/bernoulli.o src/cmdstan/main.o -Wl,-L,"/home/brian/Dev/cpp/cmdstan/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/brian/Dev/cpp/cmdstan/stan/lib/stan_math/lib/tbb" stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a stan/lib/stan_math/lib/tbb/libtbb.so.2 -o examples/bernoulli/bernoulli
rm -f examples/bernoulli/bernoulli.o
18.67user 0.68system 0:19.37elapsed 99%CPU (0avgtext+0avgdata 1322664maxresident)k
0inputs+7608outputs (0major+440066minor)pagefaults 0swaps BridgeStan output
$ /usr/bin/time make test_models/bernoulli/bernoulli_model.so
--- Translating Stan model to C++ code ---
./bin/stanc --o=test_models/bernoulli/bernoulli.hpp test_models/bernoulli/bernoulli.stan
--- Compiling C++ code ---
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes -I ./stan/lib/stan_math/lib/tbb_2020.3/include -O3 -I ./stan/src -I ./stan/lib/rapidjson_1.1.0/ -I ./stan/lib/stan_math/ -I ./stan/lib/stan_math/lib/eigen_3.4.0 -I ./stan/lib/stan_math/lib/boost_1.78.0 -I ./stan/lib/stan_math/lib/sundials_6.1.1/include -I ./stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -fPIC -DBOOST_DISABLE_ASSERTS -c -x c++ -o test_models/bernoulli/bernoulli.o test_models/bernoulli/bernoulli.hpp
--- Linking C++ code ---
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes -I ./stan/lib/stan_math/lib/tbb_2020.3/include -O3 -I ./stan/src -I ./stan/lib/rapidjson_1.1.0/ -I ./stan/lib/stan_math/ -I ./stan/lib/stan_math/lib/eigen_3.4.0 -I ./stan/lib/stan_math/lib/boost_1.78.0 -I ./stan/lib/stan_math/lib/sundials_6.1.1/include -I ./stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -fPIC -DBOOST_DISABLE_ASSERTS -Wl,-L,"/home/brian/Dev/cpp/bridgestan/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/brian/Dev/cpp/bridgestan/stan/lib/stan_math/lib/tbb" -shared -lm -o test_models/bernoulli/bernoulli_model.so test_models/bernoulli/bernoulli.o ./src/bridgestan.o -Wl,-L,"/home/brian/Dev/cpp/bridgestan/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/home/brian/Dev/cpp/bridgestan/stan/lib/stan_math/lib/tbb" ./stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a ./stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a ./stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a ./stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a ./stan/lib/stan_math/lib/tbb/libtbb.so.2
rm -f test_models/bernoulli/bernoulli.o
18.10user 0.76system 0:18.87elapsed 99%CPU (0avgtext+0avgdata 1339500maxresident)k
0inputs+5304outputs (0major+439114minor)pagefaults 0swaps Both roughly 18s on my machine. If I set |
It seems the difference I'm observing is because there is a I'm not sure why the cache file was created in one case but not the other. I've been changing the g++ version quite a bit, maybe that had something to do with it (and if the cache file is not there I get deprecation warnings from boost with g++ 12 and 13). Anyway, I don't think this really is a bridgestan issue after all, so I'm closing this |
Yeah, that is the “precompiled header” file. I’m not sure why it would be one place but not the other, since the math library is the same in both |
Looking at it again, I think the code that creates the precompiled header is only in the cmdstan repo, not in the stan or stan_math repo, so it just isn't available in bridgestan. Given that it seems to give a pretty good speedup (at least for me, no idea why @WardBrian didn't see the same thing). The code in the makefile is here. I guess it could be included in the bridegestan makefile as well? Or maybe the stan repo would be a better place for it, so that both projects can share the makefile rules? |
I think I have the precompiled header disabled in my cmdstan installation. It's a decent speedup, but if we do support it I would make the opposite choice of CmdStan and disable it by default. It's a pretty constant source of cmdstanpy bug reports due to things like xcode updating and the old file becoming invalid |
Ah, I see your point... The file size of the precompiled header made me wonder what it could be that makes it so big, so I had a quick look at what headers get included when building the model. Wasn't all that that conclusive, and the prepocessor output is only(?) 18MB. I guess the most interesting part is that various parts of boost make up almost half of the 300_000 lines after the prepocessor runs. Not sure it's useful to anyone, but just in case, here's the link: https://gist.github.com/aseyboldt/6841525395e37e8ba2fceeaa5021b547 |
I believe @rok-cesnovar knows more about pre-compiled headers/helped set them up for CmdStan. The doc for them always has lots of scary warnings, like
(emphasis mine)
This seems to me like it would be a huge problem with |
I'd like to see how much precompiled headers save in the way of time. They are a huge pain in the context of CmdStanPy because the PCH is constantly breaking and having to be regenerated, but the suggested fix (which takes a couple minutes to run) gets lost in a huge scrolling error message we've never been able to tamp down. |
For me the precompiled header takes compiling a model from 17s to 7s |
The precompiled headers are huge because they include the entire math library plus all the header-only dependencies (boost, eigen, ...). The precomp. headers are much smaller with clang (see stan-dev/cmdstan#872), at least they were a few years back. clang is particularly annoying with precompiled headers because it breaks for any minor or patch version bump. And its seems that macs just update clang automatically.
CmdStan handles this by having separate precompiled headers for threads. So when you compile the first model with threads it builds separate precompiled headers with _threads suffix.
Similar numbers for me. It has a huge effect. If it didn't I think we would have never made it the default in CmdStan. |
Could we do something similar to the How does the compiler know to use |
hmmmm, that seems like a great idea at first sight!
This makefile trick: stan-dev/cmdstan#882 |
If I understand this correctly, the reason that the precompiled header helps so much is because without it the compiler has to parse ~300_000 lines of code in ~3000 different files each time a model is compiled. This would also explain why reducing the optimization level helps so little with compile times. The optimization step in general is just shorter than the parsing step? Then maybe a cleaner solution would be to reduce the number of includes that aren't actually needed in a model. The special functions code from boost for instance is already 26_000 lines of code, but I'd guess only a very small subset of models actually uses more than one or two of those. I guess that is easier said than done though, as that would require stanc to pick and choose includes depending on which functions are used in the model, instead of pulling everything in each time? |
Not "including the world" is a project we've been talking about in the math library since I joined the project (and probably before). A fair number of things in math end up including really broad headers, which causes this problem and makes conditional testing really difficult (if you ask a test file "did any of your included headers change" the answer is "yes" with probability approaching 1, given the current state of things). It's worthwhile to do, but it's a big undertaking and is one of those things that only starts showing real benefits when you've moved a very large fraction of things over to the different style |
Ah, those projects are always great :D But even if the math library itself does this perfectly, as long as stanc always includes everything anyway (I think that's what's going on at least...) this wouldn't help. If so, then maybe fixing the includes created by stanc first would make it easier to actually see at least some incremental improvement? I get your point about this being a larger project though :-) |
Yep, the Stan repo and stanc3 would both also need changes to take advantage of it. For now I'm curious if @rok-cesnovar and I can make the PCH experience sufficiently "nice". I didn't realize we could stuff arbitrary information into the filenames for these (like we do for |
If I compile models using cmdstan, it seems to consistently finish about twice as fast as if I compile a model with bridgestan:
Then
I would have expected pretty much no difference, or even a bit faster comiles for bridgestan, because it doesn't have to also compile the sampler and the command line tool?
Update This seems to be because cmdstan precompiles the header
The text was updated successfully, but these errors were encountered: