Hillas image test #34

Open · wants to merge 143 commits into master

Conversation

ellijobst
Collaborator

Adding tests and scripts for image comparison and Hillas parameter comparison.

ellijobst requested a review from aleberti · March 14, 2022, 10:24
@ellijobst
Collaborator Author

Hi Julian,

I ran the image comparison over these files in the "/remote/ceph/group/magic/MAGIC-LST/Data/MAGIC/CrabNebula/MCP/image_comparison" directory:

"20220306_M1_05101249.001_Y_CrabNebula-W0.40+035.root",
"20220306_M1_05101249.002_Y_CrabNebula-W0.40+035.root",
"20220306_M2_05101249.001_Y_CrabNebula-W0.40+035.root",
"20220306_M2_05101249.002_Y_CrabNebula-W0.40+035.root",

"20220306_M1_05101249.001_I_CrabNebula-W0.40+035.h5",
"20220306_M1_05101249.002_I_CrabNebula-W0.40+035.h5",
"20220306_M2_05101249.001_I_CrabNebula-W0.40+035.h5",
"20220306_M2_05101249.002_I_CrabNebula-W0.40+035.h5",

for 500 events per file. So in total a bit less than 2000 events were compared.
I got these results:

FAILED test_image_comparison.py::test_image_comparison[dataset_calibrated4-dataset_images4] -
assert [1465, 1501, 1551] == []
FAILED test_image_comparison.py::test_image_comparison[dataset_calibrated5-dataset_images5] -
assert [14575] == []
FAILED test_image_comparison.py::test_image_comparison[dataset_calibrated6-dataset_images6] -
assert [1304, 1814] == []
FAILED test_image_comparison.py::test_image_comparison[dataset_calibrated7-dataset_images7] -
assert [14186, 14321, 14332, 14501] == []

So ~10/2000 events failed the test. I checked the images, and for most of them only 1 or 2 pixels were different.

Maybe we can introduce a threshold, so that the test does not fail if only a certain percentage of events have an error (similar to the way it is done for the Hillas/stereo parameter comparison). What percentage would you suggest?
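
For illustration, a minimal sketch of what such a tolerance could look like. The function name assert_images_match and the way the failing event IDs are collected are assumptions for this sketch, not the actual test_image_comparison code:

# Hypothetical sketch: fail only if the fraction of differing events
# exceeds a tolerance, instead of requiring an empty list of event IDs.
def assert_images_match(failed_event_ids, n_events_compared, tolerance):
    """Raise if the fraction of events with image differences exceeds
    the tolerance (given as a fraction, e.g. 0.0003 for 0.03%)."""
    error_fraction = len(failed_event_ids) / n_events_compared
    assert error_fraction <= tolerance, (
        f"{error_fraction:.4%} of events differ (allowed {tolerance:.4%}); "
        f"failing event IDs: {failed_event_ids}"
    )

# e.g. the last M2 subrun above: 4 differing events out of 500 compared
# (0.8%) would fail a 0.03% tolerance but pass a 1% one.
assert_images_match([14186, 14321, 14332, 14501], 500, tolerance=0.01)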

@aleberti
Collaborator

So, if we compare images using time slices instead of time in ns, comparing 50000 events I get 8 events with differences (mostly 1 or 3 pixels different), i.e. 0.016%. Is this acceptable @jsitarek? Also, I made the image comparison faster: 50000 events are now compared in less than 10 minutes.

@jsitarek
Collaborator


Thanks for the test, it looks fine. Let's set the automatic test threshold at the level of 0.03%.

@ellijobst
Collaborator Author

Hi Julian,

In the last few weeks, Alessio and I ran some test_image_comparison tests on these files:
"20210314_M1_05095172.001_Y_CrabNebula-W0.40+035.root", "20210314_M1_05095172.002_Y_CrabNebula-W0.40+035.root", "20210314_M2_05095172.001_Y_CrabNebula-W0.40+035.root", "20210314_M2_05095172.002_Y_CrabNebula-W0.40+035.root"

We ran four different kinds of tests:

For the first test, the data was converted to ns in ctapipe_io_magic, and the cleaning thresholds in ns were applied. This comparison led to the following results:

image charge errors:
0.57%, 0.62%, 0.14% and 0.16% for the four files
image time errors:
6.71%, 10.89%, 0.87% and 1.12%

All tests failed!

For the second test we did the same, but time slices were used throughout the entire process, so all thresholds as well as all files were in time slices. All tests passed, with error percentages of the order of 1e-05.

For the third test, we converted to ns in ctapipe_io_magic, converted back to time slices in image_comparison.py, then did the cleaning with time-slice thresholds and finally, after the cleaning, converted back to ns for the comparison.

This gave us:
image charge errors:
0.16%, 0.16%, 0.029% and 0.0368%
image time errors:
0.022%, 0.059%, 0.007% and 0.0%

So the test passed only for the third file, since its errors are below our threshold of 0.03%.
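
Just to make the bookkeeping explicit, here is a quick check of the numbers above against the 0.03% threshold (plain Python, values copied from this comment):

charge_errors = [0.16, 0.16, 0.029, 0.0368]  # image charge errors per file, in %
time_errors = [0.022, 0.059, 0.007, 0.0]     # image time errors per file, in %
threshold = 0.03                             # in %

for i, (q, t) in enumerate(zip(charge_errors, time_errors), start=1):
    status = "PASS" if q <= threshold and t <= threshold else "FAIL"
    print(f"file {i}: charge {q}%, time {t}% -> {status}")
# -> only file 3 stays below the threshold in both quantities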

Just to be sure, for the fourth test we repeated the second one, but converted to ns after the cleaning. This gave the same good results as before, just as expected.

So it is probably best to just do the cleaning and everything else in time slices and convert to ns afterwards (test 2).

What do you think?
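
For illustration, a rough sketch of the order of operations proposed here. clean_image is a stand-in for the actual cleaning call, and the 1.64 GSample/s readout speed is an assumed conversion factor; in practice the factor should be taken from the code:

SAMPLING_SPEED_GHZ = 1.64  # assumed MAGIC readout sampling speed (GSample/s)

def process_event(charge, peak_time_slices, clean_image):
    """Clean with thresholds in time slices; convert times to ns only afterwards.

    charge, peak_time_slices: per-pixel numpy arrays from the calibration.
    clean_image: callable returning a boolean mask of surviving pixels,
                 applied with charge/time thresholds expressed in time slices.
    """
    # 1) image cleaning, with thresholds defined in time slices
    clean_mask = clean_image(charge, peak_time_slices)

    # 2) only after the cleaning, convert the arrival times to ns
    peak_time_ns = peak_time_slices / SAMPLING_SPEED_GHZ

    return charge[clean_mask], peak_time_ns[clean_mask]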
