Add implementation details docs page, note on propto=True for log_density #192

Merged
merged 3 commits into from
Dec 7, 2023
2 changes: 1 addition & 1 deletion docs/internals.rst
@@ -15,4 +15,4 @@ going on "behind the scenes".
internals/development
internals/testing
internals/documentation

internals/details
36 changes: 36 additions & 0 deletions docs/internals/details.rst
@@ -0,0 +1,36 @@
Implementation Details
======================


.. _log_density_propto:

``log_density`` with ``propto=true``
------------------------------------

The log density function provided by a Stan model has
Collaborator

I would phrase this as having the ability to drop constants. Then I'd give simple recommendations:

  1. If you're running MCMC that needs gradients and only density up to proportion, then use propto = true. Setting propto=true will be at least as fast.
  2. To evaluate the log density on double values to match, use propto=true. Setting propto=true may be slower or faster, depending on the cost of calculating normalizing constants (propto=false) and the cost of autodiff (required to get the right answer if propto=true).

I don't think we need to say much more than that.

Why the double back ticks?

I couldn't understand what lines 34/35 were doing.

I'd just give simple recommendations:

  • doing gradient-based calculations with autodiff: use propto because it's faster
  • doing log density evals without autodiff: depending

Collaborator Author

The double backticks are how reStructuredText (which Sphinx uses) formats code. They're equivalent to single backticks in Markdown. Lines 34/35 are also an RST detail to get a link that is also code formatted. The result is that "|reduce_sum|" above gets rendered as reduce_sum

I agree phrasing it in terms of a suggestion for each case is clearer. I left the explanation in, but under a sub-heading for the curious.

the ability to be calculated up to an additive constant.
This is indicated by the ``propto`` ("``prop``\ortional ``to``")
argument to the function.
Usually, this is done for efficiency reasons, as the constant
terms may require computation that is not necessary for calculating
gradients or most sampling algorithms.
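As a small illustration of what "up to an additive constant" means, the sketch below uses a hand-written normal log density in plain Python. It is not BridgeStan or Stan code, and the function names are hypothetical. With the scale held fixed, the dropped terms do not depend on the parameter ``mu``, so the two versions differ by the same offset at every value of ``mu``.

```python
import math


def normal_lpdf_full(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y, including every constant term."""
    return (
        -0.5 * math.log(2 * math.pi)
        - math.log(sigma)
        - 0.5 * ((y - mu) / sigma) ** 2
    )


def normal_lpdf_propto(y, mu, sigma):
    """Same density with the terms that are constant in mu dropped.

    Assumes sigma is fixed data, so log(sigma) counts as a constant here.
    """
    return -0.5 * ((y - mu) / sigma) ** 2


y, sigma = 1.3, 2.0
# The difference between the two versions, evaluated at several values of mu.
offsets = [
    normal_lpdf_full(y, mu, sigma) - normal_lpdf_propto(y, mu, sigma)
    for mu in (-1.0, 0.0, 2.5)
]
```

Because the offset is the same everywhere, gradients with respect to ``mu`` and ratios of densities are unaffected by dropping it.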

However, in the case of the ``log_density`` function (which does not calculate
Owner

I think it would be good to restate the "only needs the log density up to a proportion" bit again in this paragraph.

Collaborator Author

@WardBrian WardBrian Dec 7, 2023

Why? We're recommending setting it to False in this paragraph, which is safe for all usages

derivatives), this argument may make the calculation **slower**. This is because
the implementation of this argument relies on the presence of autodiff types
(``var``\s, in the terminology of Stan's math library) to determine what is or
is not constant with respect to the parameters. Without this requirement (and indeed,
whenever the argument is set to ``false``), the log density can be
computed using only primitive types (``double``\s).
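The type-based mechanism described above can be mimicked with a toy sketch in Python. Everything here is hypothetical and far simpler than Stan's math library: ``Var`` only tags a value as a parameter, and ``normal_lpdf`` drops a term when ``propto`` is set and none of the term's inputs is a ``Var``. The point is that with ``propto=True`` and only plain floats, every term looks constant and is dropped, which is why autodiff types are needed to get a meaningful answer.

```python
import math


class Var:
    """Hypothetical stand-in for an autodiff variable: it only marks a parameter."""

    def __init__(self, value):
        self.value = float(value)


def value_of(x):
    """Return the plain float stored in a Var, or the float itself."""
    return x.value if isinstance(x, Var) else float(x)


def is_constant(*args):
    """True when none of the arguments is tagged as a parameter."""
    return not any(isinstance(a, Var) for a in args)


def normal_lpdf(y, mu, sigma, propto):
    """Toy normal log density that drops constant terms based on argument types."""
    y_v, mu_v, sigma_v = value_of(y), value_of(mu), value_of(sigma)
    total = 0.0
    if not propto:
        total += -0.5 * math.log(2 * math.pi)  # never depends on parameters
    if not (propto and is_constant(sigma)):
        total -= math.log(sigma_v)
    if not (propto and is_constant(y, mu, sigma)):
        total -= 0.5 * ((y_v - mu_v) / sigma_v) ** 2
    return total


# Only floats: with propto=True every term is treated as constant and dropped.
dropped_everything = normal_lpdf(0.5, 0.0, 1.0, propto=True)
# Tagging mu as a parameter keeps the quadratic term.
with_var = normal_lpdf(0.5, Var(0.0), 1.0, propto=True)
# propto=False keeps all terms and needs no Var at all.
full = normal_lpdf(0.5, 0.0, 1.0, propto=False)
```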

The consequence of this is that, if the ``propto`` argument is set to ``true``,
the ``log_density`` function will at a minimum need to perform more allocations
than if it were set to ``false``. There may be an even higher cost, due to functions
such as |reduce_sum|_ or Stan's ODE integrators changing their behavior when applied
to autodiff types and performing additional work which is thrown away when gradients
are not needed.
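One way to act on this is sketched below: request ``propto=True`` only when gradients are computed anyway, and use ``propto=False`` for plain density evaluations. The sketch assumes a model object exposing ``log_density`` and ``log_density_gradient`` methods that accept ``propto`` and ``jacobian`` keyword arguments, which is how BridgeStan's Python interface is understood to look. ``FakeModel`` and ``evaluate`` are hypothetical names used only so the example runs without a compiled Stan model.

```python
class FakeModel:
    """Hypothetical stand-in for a model; records the propto value of each call."""

    def __init__(self):
        self.calls = []

    def log_density(self, theta, *, propto=True, jacobian=True):
        self.calls.append(("log_density", propto))
        return -0.5 * sum(t * t for t in theta)

    def log_density_gradient(self, theta, *, propto=True, jacobian=True):
        self.calls.append(("log_density_gradient", propto))
        return -0.5 * sum(t * t for t in theta), [-t for t in theta]


def evaluate(model, theta, need_gradient):
    """Pick propto according to whether autodiff is needed anyway."""
    if need_gradient:
        # Gradients already require autodiff types, so dropping constants is free.
        return model.log_density_gradient(theta, propto=True, jacobian=True)
    # No gradients: propto=False lets the density be computed on plain doubles.
    return model.log_density(theta, propto=False, jacobian=True), None


model = FakeModel()
lp, grad = evaluate(model, [1.0, 2.0], need_gradient=True)
lp_only, no_grad = evaluate(model, [1.0, 2.0], need_gradient=False)
```

Whether ``propto=False`` is actually faster for a given model depends on how expensive its normalizing constants are, so timing both settings is the safer guide.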


.. |reduce_sum| replace:: ``reduce_sum``
.. _reduce_sum: https://mc-stan.org/docs/stan-users-guide/reduce-sum.html