Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seed behavior for generate #1963

Open
h-mayorquin opened this issue Sep 6, 2023 · 1 comment
Open

Seed behavior for generate #1963

h-mayorquin opened this issue Sep 6, 2023 · 1 comment
Assignees
Labels
core Changes to core module

Comments

@h-mayorquin
Copy link
Collaborator

h-mayorquin commented Sep 6, 2023

I did not have time to participate in the review of #1948.

We added the following auxiliary function to generate:

def _ensure_seed(seed):
# when seed is None:
# we want to set one to push it in the Recordind._kwargs to reconstruct the same signal
# this is a better approach than having seed=42 or seed=my_dog_birthday because we ensure to have
# a new signal for all call with seed=None but the dump/load will still work
if seed is None:
seed = np.random.default_rng(seed=None).integers(0, 2**63)
return seed

Which generates a different seed at runtime when no default seed is provided.

I think I understand the rationality, it tries to achieve two things.

  1. The results of the generator should be the same after saving it and loading (@samuelgarcia, thanks for adding tests for this!).
  2. Every call to a generator should generate new traces if the seed is not specified. Let's call this variability by default.

I don't agree completly with 2. When less experienced users utilize let's stay generate_ground_truth_recording with default parameters I want the function to have deterministic behavior. This is better for debugging and error finding as when I run the function on my system it wiill be as similar as possible as their run. With the current behavior given by ensure_seed I will not be able to reproduce their exact results unless I have their specific seed that was generated at runtime. This can be extracted but it will be difficult for new users and it will create one more round of questions and requests when discussing an issue. A more severe version of this problem is presented if users find an error when they run generate_ground_truth_recording (because the specific seed was problematic), but then they run the function again and the error dissappear. This undermines their trust in their algorithm (they just know something is wrong but can't reproduce) and makes it hard for them to report the problem to us.

I think that @samuelgarcia made this decision from a power user point of views as this is more convenient and it makes sense semantically: NoiseGeneratorRecording should generate different noise everytime you call it (just like np.random.rand). But I think error reporting and debugging is an imporant use case of these functions which I think is better served by having a specific seed by default. This is better for less experienced users. Power users that want to actually generate variability are in a better position to know what a seed is, change it themselves and roll a function like ensure_seed by themselves if needed.

Two other minor advantages of having a seed by default instead of generating at runtime:

  1. It is better for provenance, kwargs saves exactly what the user inputed instead of a value generated on the fly.
  2. Less code in our library and easier to document the behavior.
@h-mayorquin h-mayorquin added the core Changes to core module label Sep 6, 2023
@h-mayorquin
Copy link
Collaborator Author

I haved changed my view somehow and I think that most of the functions that interact with the user should actually have the behavior that @samuelgarcia implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Changes to core module
Projects
None yet
Development

No branches or pull requests

2 participants