Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Stan 2.33] Pathfinder algorithm #684

Closed
WardBrian opened this issue Aug 24, 2023 · 28 comments · Fixed by #686
Closed

[Stan 2.33] Pathfinder algorithm #684

WardBrian opened this issue Aug 24, 2023 · 28 comments · Fixed by #686

Comments

@WardBrian
Copy link
Member

WardBrian commented Aug 24, 2023

Summary:

Expose Pathfinder: stan-dev/stan#3123 stan-dev/cmdstan#1155

This issue is to sketch out a design and for coordinating with the CmdStanR team (@jgabry)

Description:

Pathfinder is useful for standalone VI and for initializing MCMC, so we should cater to both usages.

New method on CmdStanModel: pathfinder.

Signature:

    def pathfinder(
        self,
        data: Union[Mapping[str, Any], str, os.PathLike, None] = None,
        *,
        init_alpha: Optional[float] = None,
        tol_obj: Optional[float] = None,
        tol_rel_obj: Optional[float] = None,
        tol_grad: Optional[float] = None,
        tol_rel_grad: Optional[float] = None,
        tol_param: Optional[float] = None,
        history_size: Optional[int] = None,
        draws: Optional[int] = None, # controls size of actual output
        num_single_draws: Optional[int] = None, # corresponds to the num_draws argument on cmdline
        num_paths: Optional[int] = None,
        max_lbfgs_iters: Optional[int] = None,
        num_elbo_draws: Optional[int] = None,
       # arguments standard to all methods
        seed: Optional[int] = None,
        inits: Union[Dict[str, float], float, str, os.PathLike, None] = None,
        output_dir: OptionalPath = None,
        sig_figs: Optional[int] = None,
        save_profile: bool = False,
        show_console: bool = False,
        refresh: Optional[int] = None,
        time_fmt: str = "%Y%m%d%H%M%S",
        timeout: Optional[float] = None,
    ) -> CmdStanPathfinder:

This is probably more-or-less what you would expect going off of the CmdStan implementation. One notable thing is - at the moment - I am not providing a way to save individual pathfinder runs (the save_single_paths arg). If things from the diagnostic files produced by this argument end up being important to initializing MCMC, we can add support then, but I don't think it is valuable in the first implementation

New Class CmdStanPathfinder

This will look broadly similar to CmdStanLaplace, with the main methods being those that let you access the draws: stan_variable, draws_pd, etc

For supporting the initialization of MCMC, I think it is important to also have a method like

create_inits(self, seed = None, num_chains = 4)

This will return a list of dictionaries of 4 random draws from the pathfinder outputs in the format required for the inits argument to other methods. We can also consider adding CmdStanPathfinder as an option for the inits argument directly, in which case this method would just be called.

@jgabry
Copy link
Member

jgabry commented Aug 24, 2023

It's a bit unfortunate that laplace has argument draws, variational has output_samples and pathfinder has num_draws. Even if CmdStan chose not to make these argument names consistent, should we do that in CmdStanPy/R?

@jgabry
Copy link
Member

jgabry commented Aug 24, 2023

I would suggest just draws for all of them, but most important is just that they’re consistent

@WardBrian
Copy link
Member Author

sampling uses “iter_sampling”, which is the worst of the bunch I think

@jgabry
Copy link
Member

jgabry commented Aug 24, 2023

Yeah, I think we should consider picking one name for the number of draws and deprecating all the other ones.

@WardBrian WardBrian mentioned this issue Aug 25, 2023
2 tasks
@WardBrian WardBrian linked a pull request Aug 25, 2023 that will close this issue
2 tasks
@mitzimorris
Copy link
Member

the problem with draws vs iter_sampling vs. iter_warmup et al, is that a draw is from the posterior and the assumption is that during warmup, a chain has not yet converged therefore the iteration is not necessarily a draw.
a sample is a collection of draws from the posterior.

the question is do we want argument names to be semantically correct?

@wds15
Copy link

wds15 commented Aug 25, 2023

how about draws to refer to the posterior (the "real" thing people want) and warmup_draws for the warmup phase... which is something people have to do to get to what they want...

@mitzimorris
Copy link
Member

mitzimorris commented Aug 25, 2023

It's a bit unfortunate that laplace has argument draws, variational has output_samples and pathfinder has num_draws.

Pathfinder has num_psis_draws because multi-path Pathfinder takes all of the draws from the set of individual Pathfinders, does some kind of PSIS reweighting, and the chooses num_psis_draws from that set of draws.

for CmdStanR and CmdStanPy I agree that we should wrap Pathfinder and not expose save_single_paths,
therefore, the returned sample size should be argument draws.

w/r/t unfortunateness of iter_warmup and iter_sampling - indeed ugly. but that was the reasoning we went through back in the day.

@WardBrian
Copy link
Member Author

What should the draws argument in Python/R correspond to on the command line? num_draws? num_psis_draws? Both (unless/until we expose single paths)?

@mitzimorris
Copy link
Member

in the case where num_paths=1, it corresponds to the draws argument to single-path pathfinder.
otherwise, it correspond to the num_psis_draws argument to multi-path pathfinder.

@jgabry
Copy link
Member

jgabry commented Aug 25, 2023

how about draws to refer to the posterior (the "real" thing people want) and warmup_draws for the warmup phase... which is something people have to do to get to what they want...

I like iter_sampling -> draws, but maybe for warmup it would make more sense to have warmup_length or warmup_iter or something like that instead of warmup_draws because what are we really drawing from during warmup?

@WardBrian
Copy link
Member Author

I guess my question is if it ever makes sense to request a different number of single-pathfinder draws even when the thing you really care about is the PSIS draws. What if num_psis_draws > num_draws * num_paths?

@jgabry
Copy link
Member

jgabry commented Aug 25, 2023

What if num_psis_draws > num_draws * num_paths?

How does the CmdStan implementation handle that?

@WardBrian
Copy link
Member Author

It doesn’t do any checks, so I suppose the question falls to the services implementation by @SteveBronder

my guess is you probably end up with repeated draws in the PSIS output?

@WardBrian
Copy link
Member Author

I think we can do something like

        draws: Optional[int] = None, # controls size of actual output
        num_single_draws: Optional[int] = None, # corresponds to the num_draws argument on cmdline

when num_paths == 1, these are synonyms and it is an error to set both of them (to different values)

@mitzimorris
Copy link
Member

I like the create_inits suggestion.

there has been a long discussion on Stan developer slack about whether or not Pathfinder can also provide inits for the metric and stepsize. cf: https://mc-stan.slack.com/archives/C7V03NJHL/p1692880182694339
tldr; sometimes useful, open research question.

@WardBrian
Copy link
Member Author

whether or not Pathfinder can also provide inits for the metric and stepsize

I think this will eventually be important, but I don't want to think about the API/implementation of this until more people have had the ability to play around

@mitzimorris
Copy link
Member

I just put a comment in the CmdStanR Pathfinder discussion: stan-dev/cmdstanr#840 (comment)

@jgabry
Copy link
Member

jgabry commented Aug 29, 2023

Thanks!

@avehtari
Copy link

avehtari commented Sep 7, 2023

I would like to have an option eval_lp: bool = TRUE, as evaluating lp for many draws from the approximation can take much longer than any other part of Pathfinder, and there are cases we would trust Parhfinder sufficiently to be fine skipping the Pareto-k diagnostic

@mitzimorris
Copy link
Member

I would like to have an option eval_lp: bool = TRUE, as evaluating lp for many draws from the approximation can take much longer than any other part of Pathfinder, and there are cases we would trust Parhfinder sufficiently to be fine skipping the Pareto-k diagnostic

I think we'd need to add this option to the stan::services::pathfinder method, no?

@avehtari
Copy link

avehtari commented Sep 7, 2023

Yes, it needs to be first added to Stan services. There's an open issue for that stan-dev/stan#3215

@SteveBronder
Copy link

@avehtari just so I understand, don't we need the lp evaluations to then do PSIS? If yes then would that mean we turn off psis when eval_lp: false and just return the samples?

@avehtari
Copy link

avehtari commented Sep 8, 2023

Correct

@mitzimorris
Copy link
Member

when eval_lp: false, do we only run 1 single path pathfinder?

@avehtari
Copy link

avehtari commented Sep 8, 2023

It makes more sense for single path, but with multi path it can still be used to figure out if there are well-separated modes (but lp is needed to see if some of them are minor modes)

@mitzimorris
Copy link
Member

if eval_lp: false and multi-chain, are all draws from all single-path pathfinder returned? or a random sample of the draws from the single-path pathfinder? if the latter, do we want chain id?

@avehtari
Copy link

avehtari commented Sep 8, 2023

You have great questions! chain id would help to diagnose well separated modes, but if that is difficult to add then just returning the number of draws we would return with lp true would be fine.

@bob-carpenter
Copy link
Contributor

When we have multi-path, we're always generating those log densities. It's an independent question of whether to include them in the output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants