Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failing randomly #99

Open
Bachibouzouk opened this issue Nov 8, 2023 · 8 comments
Open

Test failing randomly #99

Bachibouzouk opened this issue Nov 8, 2023 · 8 comments
Assignees

Comments

@Bachibouzouk
Copy link
Collaborator

This test passes and fails randomly, which can be confusing

__________ TestRandSwitchOnWindow.test_coincidence_normality_on_peak ___________

self = <tests.test_switch_on.TestRandSwitchOnWindow object at 0x7f739cb05ff0>

    def test_coincidence_normality_on_peak(self):
        # Create an instance of the Appliance class with the desired parameters
        appliance = Appliance(user=None, number=10, fixed='no')
    
        # Generate a sample of 'coincidence' values
        sample_size = 30
        coincidence_sample = []
        for _ in range(sample_size):
            coincidence = appliance.calc_coincident_switch_on(inside_peak_window=True)
            coincidence_sample.append(coincidence)
    
        # Perform the Shapiro-Wilk test for normality
        _, p_value = stats.shapiro(coincidence_sample)
    
        # Assert that the p-value is greater than a chosen significance level
>       assert p_value > 0.05, "The 'coincidence' values are not normally distributed."
E       AssertionError: The 'coincidence' values are not normally distributed.
E       assert 0.021894052624702454 > 0.05

tests/test_switch_on.py:40: AssertionError
@Bachibouzouk Bachibouzouk self-assigned this Nov 8, 2023
@Bachibouzouk
Copy link
Collaborator Author

Bachibouzouk commented Nov 8, 2023

I did look at the distribution for various appliance numbers with the following code

from ramp.core.core import Appliance
import matplotlib.pyplot as plt

for N in [5,10,20,30]:

    appliance = Appliance(user=None, number=N, fixed='no')

    # Generate a sample of 'coincidence' values
    sample_size = 10000
    coincidence_sample = []
    for _ in range(sample_size):
        coincidence = appliance.calc_coincident_switch_on(inside_peak_window=True)
        coincidence_sample.append(coincidence)

    plt.hist(coincidence_sample, bins=[i+1 for i in range(N)], label=f"{N}")

plt.xlabel("Coincidence number")
plt.ylabel("Number of occurences")
plt.title(f"Sample size={sample_size}")
plt.legend(title="Appliance number")
plt.show()

And here is the result graph

coincidences_vs_number

It seems to me that when the value of appliance number is too small the distribution is not normal at all and when it is larger, than there are two larger tails for coincidence values of 1 and appliance.number (in red a gaussian with the same parameters as provided in method calc_coincident_switch_on)

coincidence_number10

coincidence_number50

In this test the method calc_coincident_switch_on has been provided arguments such that the coincidence should be calculated the following way

coincidence = min(self.number, max(1, math.ceil(random.gauss(mu=(self.number * mu_peak + 0.5), sigma=(s_peak * self.number * mu_peak)))))

The call of min and max explains the tails at 1 and appliance.number in the distribution. What I could not get my head around was the fact the distribution was not gaussian for small values of appliance.number. I removed the math.ceil in the expression above and computed again, I now get gaussian distibutions also for small values of appliance.number

coincidences_vs_number

Was this behavior intended @FLomb ? If not, thank to @ClaudiaLSS test we could spot it :) !

Bachibouzouk added a commit that referenced this issue Nov 30, 2023
The reason of the failure must be investigated in a dedicated
issue #99
@Bachibouzouk
Copy link
Collaborator Author

TestRandSwitchOnWindow::test_coincidence_normality_on_peak seems to also fail randomly

Bachibouzouk added a commit that referenced this issue Dec 4, 2023
The reason of the failure must be investigated in a dedicated
issue #99
@Bachibouzouk
Copy link
Collaborator Author

Bachibouzouk commented Mar 26, 2024

https://doi.org/10.1145/1146238.1146263

Might be a good reference for stochastic testing, their repo is located https://github.com/UDST/urbansim

@Bachibouzouk
Copy link
Collaborator Author

@FLomb - a comment on #99 (comment) would help me to address this failing test :)

@FLomb
Copy link
Contributor

FLomb commented Apr 5, 2024

Hi,

This issue is related to something we dealt with quite some time ago (in PR #7). Unfortunately, most of the discussion with the user who proposed that PR happened via email. So I took the time to look back at those very lengthy email discussions to get my head around this issue. Here are my thoughts.

As you correctly noted, the calls of min and max explain the tails at 1 and appliance.number. Those are not ideal, and it would be nice to find a solution for them, but they are not unexpected.

The fact that the distribution is skewed to the right, especially when simulating few appliances, is not due to the math.ceil that needs to remain in place to ensure we only get integer numbers. Instead, it is due to the hard-coded +0.5 bit. This was initially introduced as part of the above-referenced past PR as a way to skew on-peak distributions to the right on purpose. Nonetheless, looking back at this with some perspective, and having made additional tests today, I think that was a mistake. In fact, the mu_peak parameter is what is supposed to allow users to skew the distribution more towards the right or the left (by making it larger or smaller than the default 0.5). The parameter works perfectly for this aim, and the +0.5 is unnecessary. If we agree that on-peak switch-on events shouldn't be perfectly Gaussian by default, but slightly more in favour of a higher number of appliances switched-on coincidentally, we should just modify the default value for this parameter, not add a +0.5 to the code.

Regarding the test, all of the above means the following:

  • The test should know that the tails are there and not complain (until we improve on that front)
  • The test should also be ready to accommodate users who decide they do not want a normal distribution after all, i.e. users that want to use mu_peak to skew the distribution towards the right or the left, for instance, to simulate extreme scenarios for system robustness.

In addition, regardless of what we do with this test, I propose we scrap the +0.5 parameter.

@FLomb
Copy link
Contributor

FLomb commented Apr 11, 2024

HI @Bachibouzouk, thanks for the update! As a side note, was it on purpose that you edited my comment with your updates instead of replying to it? I find it less straightforward to follow the discussion this way.

Anyhow, I tried replicating your original code, but I couldn't, due to an AttributeError in the appliance. So I've kept working with the code I had made for my own testing, which bypasses the appliance and just tests the coincidence functions you have pasted above. What I get is different from what you do; in my case, removing the +0.5 factor leds to perfectly Gaussian results (see below)
image

Adding the -0.5 that you suggest, instead, visibly skews the distribution towards the left (see below).
image

@Bachibouzouk
Copy link
Collaborator Author

Bachibouzouk commented Apr 11, 2024

As a side note, was it on purpose that you edited my comment with your updates instead of replying to it?

No this was a manipulation mistake, sorry for that! Here is my message (now not in the right order of the timeline)

@FLomb - I think actually the 0.5 factor is needed but not at the same place

The following picture is the distribution where I removed the +0.5 factor
coincidence_number10_correction0

One can see that the distribution is shifted to the right compared to gaussian distribution. If I add a correction shift of -0.5 within the ceil function, then we have a better match between the distribution and the gaussian distribution

coincidence_number10_correction-0 5

So I suggest we change the current code:

            coincidence = min(
                self.number,
                max(
                    1,
                    math.ceil(
                        random.gauss(
                            mu=(self.number * mu_peak + 0.5),
                            sigma=(s_peak * self.number * mu_peak),
                        )
                    ),
                ),
            )

to

            coincidence = min(
                self.number,
                max(
                    1,
                    math.ceil(
                        random.gauss(
                            mu=(self.number * mu_peak),
                            sigma=(s_peak * self.number * mu_peak),
                        ) - 0.5
                    ),
                ),
            )

@Bachibouzouk
Copy link
Collaborator Author

I tried replicating your original code, but I couldn't, due to an AttributeError in the appliance

This is peculiar, which branch were you on? Maybe we can have a bilateral call on this?

Bachibouzouk added a commit that referenced this issue Apr 29, 2024
Bachibouzouk added a commit that referenced this issue Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants