\documentclass[11pt,a4paper]{ivoa}
\input tthdefs
\input gitmeta
\title{Virtual Observatory and High Energy Astrophysics}
% see draft note here:
% see ivoatexDoc for what group names to use here; use \ivoagroup[IG] for
% interest groups.
\ivoagroup{DM}
\author{
Mathieu Servillat and the HE group
}
% 1st ASOV meeting
% Ada Nebot
% Bruno Khélifi
% Catherine Boisson
% François Bonnarel
% Laurent Michel
% Mathieu Servillat
% Mireille Louys
% Pierre Cristofari
% 2nd ASOV meeting (including above authors)
% Fabian Schussler
% Ian Evans
% Janet Evans
% Jutta Schnabel
% Karl Kosack
% Mark Cresitello-Dittmar
% Matthias Fuessling
% Régis Terrier
% Note contributors on github (including issues)
% Mathieu Servillat
% François Bonnarel
% Bruno Khélifi
% Laurent Michel
% Mark Cresitello-Dittmar
% Karl Kosack
% Matthias Fuessling
% Ian Evans
% Tess Jaffe
% IVOA HE group
\editor{Mathieu Servillat}
% \previousversion[????URL????]{????Concise Document Label????}
\previousversion{This is the first public release}
\usepackage{longtable}
%\usepackage{booktabs} % For prettier tables
\usepackage{lscape}
%\usepackage{minted}
\setlength {\marginparwidth }{2cm}
\usepackage{todonotes}
\begin{document}
\begin{abstract}
This note explores the connections between the Virtual Observatory (VO) and High Energy (HE) astrophysics. Observations of the Universe at high energies are based on techniques that are radically different from those used in the optical or radio domains. We describe the operations and purpose of several HE observatories, then detail the specificities of HE data and its processing, and derive typical HE use cases relevant for the VO. A HE group has been federated over the years and this note reports on several topics that could constitute an initial roadmap for a HE interest group within the IVOA.
\end{abstract}
\section*{Acknowledgments}
We acknowledge support from the ESCAPE project funded by the EU Horizon 2020 research and innovation program (Grant Agreement n.824064).
Additional funding was provided by the INSU (Action Sp\'ecifique Observatoire Virtuel, ASOV), the Action F\'ed\'eratrice
CTA and the Action Pluriannuelle Incitatrice Astrophysique des processus de Hautes \'Energies at the Observatoire de
Paris, and the Paris Astronomical Data Centre (PADC).
\section*{Conformance-related definitions}
The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
``OPTIONAL'' (in upper or lower case) used in this document are to be
interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.
The \emph{Virtual Observatory (VO)} is a
general term for a collection of federated resources that can be used
to conduct astronomical research, education, and outreach.
The \href{https://www.ivoa.net}{International
Virtual Observatory Alliance (IVOA)} is a global
collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.
\section{Introduction}
% We should introduce the purpose of the note in distribution and access of event list data products. Science cases should be focused to highlight that.
High Energy (HE) astronomy typically includes X-ray astronomy, gamma-ray astronomy,
% of the GeV range (HE), the TeV range (very high energy, VHE) up to the ultra high energy (UHE) above 100 TeV, the VHE
neutrino astronomy, and studies of cosmic rays. This domain is now sufficiently developed to provide high level data such as catalogs, images (including full-sky surveys for some missions), and source properties in the form of spectra and time series.
Some high level HE observations have been included in the VO, via data access endpoints provided by observatories or by agencies and indexed in the VO Registry.
%Some high-energy (HE) data is already available via the VO. Images, time series, and spectra may be described with Obscore and access.
However, after browsing this data, users may want to download lower level data and reapply data reduction steps relevant to their science objectives. A common scenario is to download HE ``event'' lists, i.e. lists of events recorded by a HE detector that are expected to correspond to detections of particles (e.g. a HE photon or a neutrino), together with the corresponding calibration files, including Instrument Response Functions (IRFs). The findability and accessibility of these data via the VO are the focus of this note.
We report typical use cases for data access and analysis of data from current HE observatories. From those use
cases, we note that some existing IVOA recommendations are of interest to the domain. They should be further explored and tested
by HE observatories. We then discuss how standards could evolve to better integrate specific aspects of HE data, and whether
new standards should be developed.
\subsection{Objectives of the document}
The main objective of the document is to analyse how HE data can be better integrated into the VO.
We first identify and expose the specificities of HE data as provided by several HE observatories. Then we intend to illustrate how HE data is or can be handled using current IVOA standards. Finally, we explore several topics that could lead to HE-specific recommendations.
A related objective is to provide a context and a list of topics to be further discussed within the IVOA by a dedicated HE Interest Group (HEIG).
\subsection{Scope of the document}
This document mainly focuses on HE data discovery through the VO, with the identification of common use cases in the HE astrophysics domain, which provides insight into the specific metadata to be exposed through the VO for HE data.
Some existing IVOA recommendations are discussed in this document within the HE context; they will be studied
in depth in the HEIG.
% \subsection{Role within the VO Architecture}
% \begin{figure}
% \centering
% % As of ivoatex 1.2, the architecture diagram is generated by ivoatex in
% % SVG; copy ivoatex/archdiag-full.xml to role_diagram.xml and throw out
% % all lines not relevant to your standard.
% % Notes don't generally need this. If you don't copy role_diagram.xml,
% % you must remove role_diagram.pdf from SOURCES in the Makefile.
% \includegraphics[width=0.9\textwidth]{role_diagram.pdf}
% \caption{Architecture diagram for this document}
% \label{fig:archdiag}
% \end{figure}
% Fig.~\ref{fig:archdiag} shows the role this document plays within the
% IVOA architecture \citep{2010ivoa.rept.1123A}.
\section{High Energy observatories and experiments}
%XMM use case scenario
%Données attachées ? data link?
Various observatories, whether ground-based, space-based or deep-sea-based, distribute high-energy data with
different levels of involvement in the VO. We list here the observatories currently represented in the VO HE group.
There are also other observatories that are connected to the VO in some way, and may join the group discussions at IVOA.
\subsection{Gamma-ray programs}
\subsubsection{H.E.S.S.}
\label{sec:hess}
The High Energy Stereoscopic System (H.E.S.S.) experiment is an array of Imaging Atmospheric Cherenkov Telescopes (IACTs)
located in Namibia that investigates cosmic very-high-energy (VHE) gamma rays in the energy range from tens of GeV to
100~TeV. It comprises four telescopes officially inaugurated in 2004, and a much larger fifth telescope
operational since 2012, extending the energy coverage towards lower energies and further improving sensitivity.
The H.E.S.S. collaboration operates the telescopes as a private experiment and publishes mainly high level data,
i.e. images, time series and spectra, in scientific publications after dedicated analyses. Using complex algorithms,
private software processes the raw data by applying calibration, reconstructing event properties from their Cherenkov
images and purifying the event list by removing as many as possible of the events induced by atmospheric cosmic rays (CRs). Even
after this purification, events are largely generated by CRs and statistical analyses are required to derive
the astrophysical source properties. Models of the background due to the remaining CRs
(generally generated from real observations) are used together with the gamma-ray IRFs (PSF, Energy Dispersion, Collection Area),
which are generated by extensive Monte Carlo simulations. These four IRFs (background, PSF, Edisp, CollArea) are computed
for each observation of $\sim$30~min and are valid for the field of view. They depend on true energies, positions in the
field of view and sometimes on event classification types. Astrophysical quantities are now derived from
the event lists using open libraries, in particular the reference library Gammapy \citep{gammapy:2023}.
%% Need to describe the IRFs like for Chandra?
In September 2018, the H.E.S.S. collaboration released, for the first and so far only time, a small subset of its
archival data using the GADF format (see section~\ref{sec:GADF}) serialised into the Flexible Image Transport System (FITS) format,
an open file format widely used in astronomy. The release consists of Cherenkov event-lists and IRFs for observations of
various well-known gamma-ray sources \citep{hess-zenodo.1421098}.
This test data collection has been registered in the VO via a TAP service hosted at the Observatoire de Paris, with a
tentative ObsCore description of each dataset (see section \ref{sec:vorecs_obscore}). In the future, the H.E.S.S. legacy archive may be published in a similar way and made accessible through the VO.
\subsubsection{CTAO}
\label{sec:ctao}
The Cherenkov Telescope Array Observatory (CTAO) is the next generation ground-based IACT instrument for gamma-ray astronomy
at very high energies. With tens of telescopes located in the northern (La Palma, Canary Islands)
and southern (Chile) hemispheres, CTAO will be the first open ground-based VHE gamma-ray observatory and the world’s
largest and most sensitive instrument to study high-energy phenomena in the Universe. Built on the technology of current
generation ground-based gamma-ray detectors (e.g. H.E.S.S., MAGIC and VERITAS), CTAO will be between five and ten times
more sensitive and have unprecedented accuracy in its detection of VHE gamma rays.
CTAO will distribute data as an open observatory, for the first time in this domain, with calls for proposals and
publicly released data after a proprietary period. CTAO will ensure that the provided data will be FAIR: Findable,
Accessible, Interoperable and Reusable, by following the FAIR Principles for data management \citep{Wilkinson2016}.
In particular, because of the complex data processing and reconstruction steps, the provision of provenance metadata
for CTAO data has been a driver for the development of a provenance standard in astronomy.
CTAO will also ensure VO compatibility of the distributed data and access systems. CTAO participated in the ESCAPE
European Project, and is now part of the ESCAPE Open Collaboration to face common challenges for Research Infrastructures
in the context of cloud computing, including data analysis and distribution.
A focus of CTAO is to distribute in this context its Data Level 3 (DL3) datasets, which correspond to lists of Cherenkov
events detected by the telescopes along with the proper IRFs. CTAO is planning internal and public Science Data
Challenges, which represent opportunities to build ``VO inside'' solutions.
%% Need to describe the IRFs like for Chandra?
The CTAO observatory is complementary to other gamma-ray instruments observing the sky up to ultra high energies (i.e. PeV).
By directly detecting from the ground the secondary charged particles of extensive air showers initiated by gamma rays, Water
Cherenkov Detectors (WCD) survey the whole observable sky above the TeV/tens of TeV energy range. The HAWC and LHAASO
detectors are running in the northern hemisphere and the future SWGO observatory will be installed in the southern
hemisphere. Such instruments have similar high-level data structures and it has already been demonstrated that joint
analyses with Gammapy of data from IACTs and WCDs using the GADF format are very powerful \citep{2022A&A...667A..36A}.
\subsection{X-ray programs}
\subsubsection{Chandra}\label{sec:chandra}
Part of NASA's fleet of ``Great Observatories'', the Chandra X-ray Observatory (CXO) was launched in 1999 to observe
the soft X-ray universe in the 0.1 to 10 keV energy band. Chandra is a guest observer, pointed-observation mission and
obtains roughly 800 observations per year using the Advanced CCD Imaging Spectrometer (ACIS) and High Resolution Camera
(HRC) instruments. Chandra provides high angular resolution with a sub-arcsecond on-axis point spread function (PSF),
a field of view up to several hundred square arcminutes, and a low instrumental background. The Chandra PSF varies with
X-ray energy and significantly with off-axis angle, increasing to R50 $\sim$25 arcsec at the edge of the field of view.
A pair of transmission gratings can be inserted into the X-ray beam to provide dispersed spectra with $E/\Delta E \sim 1000$
for bright sources. The Chandra spacecraft normally dithers in a Lissajous pattern on the sky while taking data, and
this motion must be removed from the time-resolved X-ray event lists when constructing X-ray images, using the motion
of optical guide stars tracked by the Aspect camera.
% Are the analysis step description made below in necessary? for the homogenity between instruments
The Chandra X-ray Center (CXC) processes the spacecraft data through a set of Standard Data Processing Level 0 through
Level 2 pipelines. These pipelines perform numerous steps including decommutating the telemetry data,
applying instrument calibrations (e.g., detector geometric, time-dependent gain, and CCD charge transfer efficiency
(CTI) corrections, bad and hot pixel flagging), computing and applying the time-resolved Aspect solution to de-dither
the motion of the telescope, identifying good time intervals (GTIs), and finally filtering out bad times and X-ray events
with bad status. All data products are archived in the Chandra Data Archive (CDA) in FITS format following HEASARC
OGIP standards; see also \S~\ref{sec:ogip}. The CDA manages the proprietary data period (currently 6 months, after
which the data become public) and provides dedicated interactive and IVOA-compliant interfaces to locate and download
datasets.
The CXC also provides the Chandra Source Catalog, which in the latest release (2.1) includes data for $\sim$407K unique
X-ray sources on the sky and more than 2.1 million individual detections and photometric upper limits. For each X-ray
source and detection, the catalog provides a detailed set of more than 100 tabulated positional, spatial, photometric,
spectral, and temporal properties. An extensive selection of individual observation, stacked-observation, detection
region, and master source FITS data products (e.g., RMFs, ARFs, PSFs, spectra, light curves, aperture photometry MPDFs)
are also provided that are directly usable for further detailed scientific analysis.
% According to https://heasarc.gsfc.nasa.gov/docs/heasarc/caldb/docs/memos/cal_gen_92_002/cal_gen_92_002.html#tth_sEc2.1,
% RMF, ARF and PSF does not depend on spectral models
Finally, the CXC distributes the CIAO data analysis package to allow users to recalibrate and analyse their data. A key
aspect of CIAO is to provide users the ability to create instrument responses (RMFs, ARFs, PSFs, etc.) for their
observations. The Sherpa modeling and fitting package supports N-dimensional model fitting and optimisation in Python,
and supports advanced Bayesian Markov chain Monte Carlo analyses.
\subsubsection{XMM-Newton}
The European Space Agency's (ESA) X-ray Multi-Mirror Mission (XMM-Newton)\footnote{https://www.cosmos.esa.int/web/xmm-newton}
was launched by an Ariane 504 on December 10, 1999. XMM-Newton is ESA's second cornerstone of the Horizon 2000 Science Programme.
It carries three high-throughput X-ray telescopes with an unprecedented effective area, two reflection grating spectrometers and an optical monitor.
The large collecting area and ability to make long uninterrupted exposures provide highly sensitive observations.
The XMM-Newton mission is helping scientists to solve a number of cosmic mysteries, ranging from the enigmatic black holes
to the origins of the Universe itself. Observing time on XMM-Newton is made available to the scientific community,
which applies for observing periods on a competitive basis.
One of the mission's ground segment modules, the Survey Science Centre (SSC)\footnote{http://xmmssc.irap.omp.eu/}, is in charge of maximising the scientific return of
this space observatory by exhaustively analyzing
the content of the instruments' fields of view. During the development phase (1996-1999), the SSC,
in collaboration with the Science Operations Centre (SOC, at ESAC), designed and produced the Science Analysis System (SAS) software.
Since then, it has contributed to its maintenance and development. This software is publicly available.
The general pipeline has been operated at ESAC (Villafranca, Spain) since 2012, except for the part concerning cross-correlation
with astronomical archives, which runs in Strasbourg.
The information thus produced is intended for the guest observer and, after a proprietary period of one year,
for the international community.
In parallel, the SSC regularly compiles an exhaustive catalog of all X-ray sources detected by EPIC cameras.
The SSC validates these catalogs, enriches them with multi-wavelength data and exploits them in several scientific programs.
The XMM catalog is published through various web applications: XSA\footnote{https://www.cosmos.esa.int/web/xmm-newton/xsa},
XCatDB\footnote{https://xcatdb.unistra.fr/4xmm}, IRAP\footnote{http://xmm-catalog.irap.omp.eu/} and
HEASARC\footnote{http://heasarc.gsfc.nasa.gov/db-perl/W3Browse/w3browse.pl}.
It is also published in the VO, mainly as TAP services.
Note that the TAP service operated in Strasbourg (\url{https://xcatdb.unistra.fr/xtapdb}, to be deployed in October 2024) returns responses where data is mapped on the MANGO model with MIVOT (see section \ref{sec:vorecs}).
%\todo[inline]{To be validated by ADA.}
\subsubsection{SVOM}
SVOM (Space-based multi-band astronomical Variable Objects Monitor) \footnote{https://www.svom.eu/en/home/}
is a Sino-French mission dedicated
to the study of the transient high-energy sky, and in particular to the detection, localisation and
study of Gamma Ray Bursts (GRBs).
Gamma-ray bursts are sudden, intense flashes of X-ray and gamma-ray light.
They are associated with the cataclysmic formation of black holes, either by the merger of two compact stars
(neutron star or black hole) or by the sudden explosion of a massive star, twenty to one hundred times the mass of our Sun.
The birth of a black hole is accompanied by the ejection of jets of matter that reach speeds close to the speed of light.
These jets of matter then decelerate in the circumstellar medium, sweeping away everything in their path.
Gamma-ray bursts can be observed at the very edge of the universe, acting as lighthouses that illuminate
the dark ages of its creation. Although they have been studied extensively over the past fifteen years,
gamma-ray bursts are still poorly understood phenomena. To better understand them, China and France have decided
to join forces with the SVOM satellite, which is specifically dedicated to the study of gamma-ray bursts.
The special feature of the SVOM mission is that it combines ground-based and space-based observations,
providing a spectral bandwidth from the visible to the high-energy range. By guaranteeing multi-wavelength
observations of about one hundred bursts of all types per year, the SVOM mission will make a unique contribution
to two of the most fruitful areas of research in recent decades: the use of bursts in cosmology and the understanding
of the phenomenon. Looking further ahead, the SVOM mission will work in close synergy with a new generation of
instruments dedicated to the search for neutrinos and gravitational waves of cosmic origin, in order to confirm
the astrophysical origin of the signals detected by these future instruments.
SVOM was successfully launched on June 22, 2024 from the Xichang launchpad.
\subsection{KM3NeT and neutrino detection}
The KM3NeT neutrino detectors are arrays of water-based Cherenkov detectors currently under construction in the deep
Mediterranean Sea. With its two sites off the French and Italian coasts, the KM3NeT collaboration aims at single particle
neutrino detection for neutrino physics with the more densely instrumented ORCA detector in the GeV to TeV range, and
VHE astrophysics with the ARCA detector in the TeV range and above.
Using the Earth as a shield against atmospheric particle interference by searching for upgoing particle tracks in the detectors,
the measurement of astrophysical neutrinos can be performed almost continuously for a wide field of view that covers the
full visible sky. For these particle events, extensive Monte Carlo simulations are performed to evaluate the
statistical significance with respect to the various theoretical assumptions for galactic or cosmic neutrino signals, and extensive filtering of the events, dominated by the atmospheric particle background, by about $1:10^{6}$ is required.
During the construction phase, the KM3NeT collaboration develops its interfaces for open science and builds on the data
gathered by its predecessor ANTARES, from which neutrino event lists have already been published on the KM3NeT VO server
as a TAP service. However, for full reproducibility of searches for point-like astronomical sources as well as wider scientific use of dedicated neutrino selections,
information derived from simulations, such as background estimates, PSF and detector acceptance, is required and should be linked
to the actual event list and interpolated for a given observation.
With multiple detectors targeting high-energy neutrinos like IceCube, ANTARES, KM3NeT, Baikal and future projects, the
chance to detect a significant number of cosmic and galactic neutrinos increases, requiring an integrated approach to
link event lists with instrument responses and to correctly interpret observation time and flux expectations. Observations
generally encompass large, continuously taken data sets covering a large area of the sky over multiple years, with very low statistical
expectations for actual neutrino observation. For proper use of neutrino data, it must therefore be made easy to correctly interpret the observation time interval, and to re-weight and limit any probabilistic
measures to a dedicated study.
\section{Common practices in the High Energy community}
\label{sec:vhespec}
\subsection{Data specificities}
\subsubsection{Event-counting}
Observations of the Universe at high energies are based on techniques that are radically different from those used in the optical or radio domains. HE observatories are generally designed to detect particles, e.g. individual photons, cosmic rays, or neutrinos, with the ability to estimate several characteristics of those particles. This technique is generally named \textbf{event counting}, where an event has some probability of being due to the interaction of an astronomical particle with the detectors.
The data corresponding to an \textbf{event} is first an instrumental signal, which is then calibrated and processed to estimate event characteristics such as a time of arrival, coordinates on the sky, and the energy proxy associated with the event. Several other intermediate and qualifying characteristics can be associated with a detected event.
When observing during an interval of time, the data collected is a list of the detected events, named an \textbf{event list} in the HE domain, and event-list in this document.
%HE projects already have data formats in use to transport the results of observations together with the necessary instrument response files.
%Such response files depend on the way raw event lists are combined together; they are essential for the calibration steps that will help to produce calibrated event-lists in position, time and energy.
\subsubsection{Data levels}\label{sec:datalevels}
After detection of events, data processing steps are applied to generate data products. We typically distinguish at least 3 main data levels.
\begin{itemize}
\item[1] An event-list with calibrated temporal and spatial characteristics, e.g. sky coordinates for a given epoch, event arrival time with time reference, and a proxy for particle energy.
\item[2] Binned and/or filtered event-list suitable for preparation of science images, spectra or light-curves. For some instruments, this level also includes the corresponding instrument responses associated with the event-list, calculated but not yet applied (e.g. exposure maps, sensitivity maps, spectral responses).
\item[3] Calibrated maps, or spectral energy distributions for a source, or light-curves in physical units, or adjusted source models.
\end{itemize}
An additional higher data level may correspond to catalogs, e.g. a source catalog pointing to several data products (e.g. a collection of high level products), each one corresponding to a source, or a catalog of source models generated with a uniform analysis.
However, the definitions of these data levels can vary significantly from facility to facility. For example, in the VHE Cherenkov astronomy domain (e.g. CTAO), the data levels listed above are labelled DL3\footnote{Lower-level
data (DL0--DL2), which are specific to the instrumentation used (IACT, WCD), are reconstructed and filtered to
produce the event-lists called DL3.} to DL5. For Chandra X-ray data, the first two levels correspond to L1 and L2 data products (excluding the responses), while transmission-grating data products are designated L1.5 and source catalog and associated data products are all designated L3.
\subsubsection{Background signal}
Observations in HE may contain a high background component, which may be due to instrumental noise, to unresolved astrophysical sources, to emission from extended regions or to other terrestrial sources producing particles similar to the signal. The characterisation and estimation of this background may be particularly important in order to apply corrections during the analysis of a source signal.
In the VHE domain with the IACT, WCD and neutrino techniques, the main source of background at the DL3 level is created by cosmic-ray induced events. Unresolved astrophysical sources and emission from extended regions are treated as models of gamma-ray or neutrino emission.
In the X-ray domain, contributions to background can include an instrumental component, the local radiation environment (i.e. space weather) which can change dynamically, and may include the cosmological background due to unresolved astrophysical sources, depending on the spatial resolution of the instrument.
\subsubsection{Time intervals}
Depending on the stability of the instruments and observing conditions, a HE observation can be decomposed into several intervals of time that will be further analysed.
For example, Stable Time Intervals (STI) are defined in Cherenkov astronomy to characterise periods of time during which the instrument response is stable. In the X-ray domain, Good Time Intervals (GTI) are computed to exclude time periods where data are missing or invalid, and may be used to reject periods impacted by high radiation, e.g. due to space weather. In contrast, for neutrino physics, relevant observation periods can cover up to several years due to the low statistics of the expected signal and a continuous observational coverage of the full field of view.
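As a simple illustration of how such intervals enter an analysis, the exposure associated with a set of $n$ selected, non-overlapping intervals can be sketched as follows. The symbols $t_{\mathrm{start},i}$, $t_{\mathrm{stop},i}$ and $f_{\mathrm{dead}}$ are introduced here for illustration only and are not taken from a specific data format.
\begin{equation}
T_{\mathrm{on}} = \sum_{i=1}^{n} \left( t_{\mathrm{stop},i} - t_{\mathrm{start},i} \right), \qquad
T_{\mathrm{live}} = \left( 1 - f_{\mathrm{dead}} \right) T_{\mathrm{on}}
\end{equation}
Here $T_{\mathrm{on}}$ is the total duration of the selected intervals and $T_{\mathrm{live}}$ the time during which the detector could actually record events, assuming an average dead-time fraction $f_{\mathrm{dead}}$ over those intervals. For example, two intervals of 900~s and 600~s with $f_{\mathrm{dead}} = 0.05$ would give $T_{\mathrm{on}} = 1500$~s and $T_{\mathrm{live}} = 1425$~s.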
\subsubsection{Instrument Response Functions}
Though an event-list can contain calibrated physical values, the data typically still has to be corrected for the
photometric, spectral, spatial, and/or temporal responses of the instruments used to yield scientifically interpretable
information. The IRFs provide mappings between the physical properties of the source and the observables, and so enable
estimation of the former (such as the real flux of particles arriving at the instrument, the spectral distribution of
the particle flux, and the temporal variability and morphology of the source).
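One common way to write this mapping, used for instance for gamma-ray data in the GADF documentation, is sketched below. The notation is an assumption of this illustration: $p$ and $E$ denote the reconstructed position and energy of an event, $p_{\mathrm{true}}$ and $E_{\mathrm{true}}$ the true ones, $\Phi$ the source flux and $t_{\mathrm{obs}}$ the observation time. Other facilities may factorise their responses differently.
\begin{equation}
N(p, E)\,\mathrm{d}p\,\mathrm{d}E = t_{\mathrm{obs}} \int_{E_{\mathrm{true}}} \int_{p_{\mathrm{true}}} R(p, E \mid p_{\mathrm{true}}, E_{\mathrm{true}})\, \Phi(p_{\mathrm{true}}, E_{\mathrm{true}})\, \mathrm{d}p_{\mathrm{true}}\, \mathrm{d}E_{\mathrm{true}}
\end{equation}
\begin{equation}
R(p, E \mid p_{\mathrm{true}}, E_{\mathrm{true}}) = A_{\mathrm{eff}}(p_{\mathrm{true}}, E_{\mathrm{true}}) \times \mathrm{PSF}(p \mid p_{\mathrm{true}}, E_{\mathrm{true}}) \times E_{\mathrm{disp}}(E \mid p_{\mathrm{true}}, E_{\mathrm{true}})
\end{equation}
Here $N$ is the expected number of detected events, $A_{\mathrm{eff}}$ the effective (collection) area, $\mathrm{PSF}$ the point spread function and $E_{\mathrm{disp}}$ the energy dispersion. The factorisation assumes that the spatial and energy responses can be treated independently. Background events, which are not produced by the source flux, would add a separate term to $N$.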
The instrumental responses typically vary with the true energy of the event and with its arrival direction in the
detector. A further complication of ground-based detectors like IACTs and WCDs is that the instrumental responses also vary with:
\begin{itemize}
\item The horizontal coordinates of the atmosphere, i.e. the response to a photon at low elevation is different from that at zenith due to a larger air column density, and different azimuths are affected by different magnetic field strengths and directions that modify the air-shower properties.
\item The atmosphere density, which can have an effect on the response that changes throughout a year, depending on the site of observation.
\item The brightness of the sky (for IACTs), i.e. the response is worse when the moon is up, or when there is a strong night-sky-background level from e.g. the Milky Way or Zodiacal light.
\end{itemize}
Since these are not aligned with a sky coordinate system, field-rotation during an observation must also be taken into account.
Therefore the treatment of the temporal variation of IRFs is important, and is often taken into account in analysis by averaging over some short time period, such as the duration of the observation, or intervals within.
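As an illustration of such averaging (a sketch only, assuming that the response varies smoothly within the interval; the symbols $\bar{A}$, $t_1$, $t_2$ and $T$ are introduced here for illustration), let $A(E, \hat{p}, t)$ denote the instantaneous effective area at true energy $E$ and direction $\hat{p}$. The effective area applied to the events of a time interval $[t_1, t_2]$ of duration $T = t_2 - t_1$ may then be approximated by its time average:
\begin{equation}
\bar{A}(E, \hat{p}) = \frac{1}{T} \int_{t_1}^{t_2} A(E, \hat{p}, t)\, dt
\end{equation}
The shorter the interval, the closer $\bar{A}$ stays to the instantaneous response, at the cost of a larger number of IRFs to produce and distribute.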
\subsubsection{Granularity of data products}
The event-list dataset is generally stored as a table, with one row per candidate detection (event) and several columns
for the observed and/or estimated physical parameters (e.g. arrival time, position on detector or in the sky, energy or
pulse height, and additional properties such as errors or flags that are project-dependent).
The list of columns in the event-list is typically defined by the data format in use,
such as OGIP or GADF, introduced further below (section~\ref{sec:data_formats}). These data formats generally describe the event-list data together
with the IRFs (Effective Area, Energy Dispersion, Point Spread Function, Background) and other relevant information, such
as Stable and/or Good Time Intervals and dead time.
Such time intervals may be used to define the granularity of the data products, e.g. it may be practical to list all events that will be analysed with the same IRFs over a given stable time interval. In H.E.S.S., such an event-list corresponds to a run of 30~min of data acquisition.
Where feasible, the efficient granularity for distributing HE data products seems to be the full combination of data (event-list) and associated IRFs, packed or linked together, with further calibration files, so that the package becomes self-described.
%It seems appropriate to distribute the metadata in the VO ecosystem together with a link to the data file in community format for finer analysis.
%In order to allow for multi-wavelength data discovery of HE data products and compare observations across different regimes,
\subsection{Statistical challenges}
In order to produce advanced astrophysics data products such as light curves or spectra, assumptions
about the noise, the source morphology and its expected energy distribution must be introduced. This is one of the main
drivers for enabling a full and well described access to event-list data, as HE scientific analyses generally start at this data level.
\subsubsection{Low count statistics}
Low count statistics are common for sources detected in HE astrophysics observations. For detectors with low intrinsic backgrounds, limiting source detection thresholds may be in the range 3--5 counts, {\em i.e.\/}, in the Poisson regime. Even for observations with more counts, many detectors have sufficient spatial and spectral channels (and observations are typically time-resolved) so that the number of counts per spatial pixel/spectral channel/temporal bin will often be very low, and so appropriate extreme Poisson statistical methods must be used to analyze the data ({\em e.g.\/}, using the C-statistic when analyzing low-count Poisson data that may include bins with no counts). This implies that measurements may require representations that are more robust than a mean value with Gaussian distributed errors.
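For reference, one common form of the C-statistic mentioned above (the variant found in several X-ray fitting packages; the symbols $n_i$ and $m_i$ are introduced here for illustration) for observed counts $n_i$ and model-predicted counts $m_i$ in bin $i$ is
\begin{equation}
C = 2 \sum_i \left[ m_i - n_i + n_i \ln\left(\frac{n_i}{m_i}\right) \right]
\end{equation}
where the logarithmic term is taken to be zero for bins with $n_i = 0$. This statistic is derived from the Poisson likelihood, remains well defined for empty bins, and approaches the usual $\chi^2$ statistic when all $n_i$ are large.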
\subsubsection{Event selection}
%When processing an event-list, it is important to perform an optimal selection of the events according to the science
%analysis use case, i.e. the source targeted or the science objectives. The selection can be performed on the event
%characteristics, e.g. time, energy or more specific indicators (patterns, shape, IRFs properties, ...).
When analyzing an event-list, optimal selection of the events according to the science analysis use case is essential. While appropriately selecting data from an observation ({\em e.g.\/}, selecting a region surrounding the target source) is a common practice, for HE observations spatial, spectral, and temporal selection is typically necessary because of the large ranges covered by these dimensional axes. For example, a {\em Chandra\/} X-ray Observatory dataset spans two orders of magnitude in energy (spectral range), compared to roughly a factor of 2 for an optical spectrum. Selections may be performed on the event characteristics such as time, energy, or more specific indicators ({\em e.g.\/}, patterns, shape, IRF properties).
\subsubsection{Event binning}
Binning together events in any of the spatial/spectral/temporal axes is commonly used when analyzing HE astrophysics data to increase the number of counts per bin (at the expense of reduced resolution along the given axis). For example, binning spatially can increase the S/N of faint extended emission. For the spectral and temporal axes, binning to achieve a minimum number of counts per bin may be used to facilitate data modeling while still preserving the highest possible resolution in regions with more counts. As a result, spectra and light curves with variable bin widths are commonly encountered when dealing with HE datasets.
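As a simple illustration of this trade-off (a sketch assuming pure Poisson noise and uniform emission; the symbols $s$, $b$ and $k$ are introduced here for illustration), if each pixel contains on average $s$ source counts and $b$ background counts, combining $k$ pixels into one bin gives
\begin{equation}
\mathrm{S/N}(k) = \frac{k s}{\sqrt{k (s + b)}} = \sqrt{k}\, \frac{s}{\sqrt{s + b}}
\end{equation}
so that the signal-to-noise ratio grows as $\sqrt{k}$, while the resolution along the binned axes is reduced accordingly.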
\subsubsection{The unfolding problem}
%Due to the small number of particles
%detected in many types of HE observations (i.e. within a Poisson regime) and the fact that the IRFs may not be directly invertible,
%techniques such as forward-folding fitting \citep{mattox:1996} are needed to estimate the physical properties of the
%source from the observables.
Because particles detected by HE astrophysics experiments are ionizing, they typically interact with the materials of the telescope and detector ({\em e.g.\/}, by exciting K-shell electrons) so the relationship between the observables and the source's physical properties of interest is typically complex. Recovering the physical properties from the observables is sometimes termed ``the unfolding problem.''
For example, for instruments that detect photons, the observed source spectrum can be related to the physical source spectrum very generally as follows:
\begin{equation}\label{eqn:phaspec}
M(E', \hat{p}', t) = \int_{E'} dE\, d\hat{p}\, R(E'; E, \hat{p}, t) A(E, \hat{p}', t) P(\hat{p}'; E, \hat{p}, t) S(E, \hat{p}, t)
\end{equation}
where $M(E', \hat{p}', t)$ is the expected observed channel distribution of detected source counts, $R(E'; E, \hat{p}, t)$ is the redistribution matrix that defines the probability that a photon with actual energy $E$, location $\hat{p}$, and arrival time $t$ will be observed with apparent energy $E'$ and location $\hat{p}'$, $A(E, \hat{p}', t)$ is the instrumental effective area (sensitivity), $P(\hat{p}'; E, \hat{p}, t)$ is the photon spatial dispersion transfer function ({\em i.e.\/}, the instrumental point spread function), and $S(E, \hat{p}, t)$ is the physical model that describes the physical energy spectrum, spatial morphology, and temporal variability of the source. Missions that follow the OGIP standards (see section~\ref{sec:ogip}) generally record the redistribution matrix using the redistribution matrix file (RMF) format and the instrumental effective area using the auxiliary response file (ARF) format. Other experiments combine the RMF and ARF into a single instrument response function (IRF).
Low count statistics imply that the mapping from $S$ to $M$ is typically not invertible ({\em i.e.\/}, one cannot simply derive $S$ given $M$)\null. Methods such as forward-folding fitting \citep{mattox:1996} ({\em i.e.\/}, proposing a model for $S$, folding the model through equation~(\ref{eqn:phaspec}) to derive $M$, and optimizing the model parameters to minimize the deviations between $M$ and the actual observed data) are needed to estimate the physical properties of the source from the observables. A further complication is that the integrated responses may themselves be functions of the unknown $S$.
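In practice, forward folding is usually carried out on binned data. As a minimal sketch (assuming a point source at a fixed position, so that the spatial terms can be dropped, and a response that is constant over an exposure time $T$; the indices $i$, $j$ and the symbols $m_i$, $T$, $\Delta E_j$ and $\theta$ are introduced here for illustration), the predicted counts in detector channel $i$ for model parameters $\theta$ become
\begin{equation}
m_i(\theta) = T \sum_j R_{ij}\, A_j\, S(E_j; \theta)\, \Delta E_j
\end{equation}
where $j$ runs over true-energy bins of central energy $E_j$ and width $\Delta E_j$, $R_{ij}$ is the binned redistribution matrix and $A_j$ the binned effective area. The parameters $\theta$ are then adjusted to optimise a fit statistic suited to Poisson data, such as the C-statistic, computed from $m_i(\theta)$ and the observed counts.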
\subsection{Data formats}
\label{sec:data_formats}
\subsubsection{{OGIP}}\label{sec:ogip}
NASA's HEASARC FITS Working Group was part of the Office of Guest Investigator Programs, or OGIP, and in the 1990s created the multi-mission standards for the format of FITS data files in NASA high-energy astrophysics. Those so-called OGIP recommendations\footnote{\url{https://heasarc.gsfc.nasa.gov/docs/heasarc/ofwg/ofwg_recomm.html}} include standards on keyword usage in metadata, on the storage of spatial, temporal, and spectral (energy) information, and on the representation of response functions. These standards predate the IVOA but include such VO concepts as data models, vocabularies, provenance, as well as the corresponding FITS serialisation specification.
The purpose of these standards was to allow all mission data archived by the HEASARC to be stored in the same data format
and be readable by the same software tools. Section~\ref{sec:chandra} above, for example, describes the Chandra mission products,
and many other projects follow these standards as well. Because of the OGIP standards, the same software tools can be used on all of the HE
mission data that follow them. There are now more than thirty different mission datasets archived by NASA following
these standards, and different software tools that can analyse any of them.
Now that the IVOA is defining data models for spectra and time series, we should be careful to include the existing OGIP
standards as special cases of the more general standards being developed for all of astronomy. Standards describing
source morphology should also be introduced.
\subsubsection{GADF and VODF}
\label{sec:GADF}
Data formats for gamma-ray astronomy\footnote{\url{https://gamma-astro-data-formats.readthedocs.io/}} (GADF) is a community-driven initiative for the definition
of a common and open high-level data format for gamma-ray instruments \citep{2017AIPC.1792g0006D} starting at the
reconstructed event level. GADF is based partially on the OGIP standards and is specialised for Very High Energy data.
It was originally developed in 2011 for CTAO during its prototyping phase, and was further tested on data from the
H.E.S.S. telescope array. This format is now used as a standard for VHE gamma-ray data. The project was made open-source
in 2016, and became the base format for the \emph{Gammapy} software.
The Very-high-energy Open Data Format\footnote{\url{https://vodf.readthedocs.io/}} (VODF) will build upon and be the successor to GADF. It is
intended to address some of the shortcomings of the GADF format, to provide a properly documented and consistent data
model, to cover use cases of both VHE gamma-ray and neutrino astronomy, and to provide more support for validation and
versioning. VODF will provide a standard set of file formats for data starting at the reconstructed event level (DL3, i.e.
first item in the section \ref{sec:datalevels}) as well as higher-level products (i.e. sky images, light curves, and spectra)
and source catalogues (see section \ref{sec:datalevels}), as well as N-dimensional binned data cubes. With these
standards, common science tools can be used to analyse data from multiple high-energy instruments, including
facilitating the ability to do combined likelihood fits of models across a wide energy range directly from events or
binned products. VODF aims to follow or be compatible with existing IVOA standards as much as possible.
\subsection{Tools for data extraction and visualisation}
\label{sec:tools}
%HE data is particularly complex and diverse at lower levels. It is common to find specific tools to process the data for a given facility, e.g. CIAO for Chandra, SAS fro XMM-Newton, of Gammapy for gamma-ray data, with a particular focus on Cherenkov data as foreseen for CTA.
%
%Those tools can generally handle data from several other observatories, that have some level of commonalities.
%
%Several other HE software are build to handle the existing data format standards, hence enabling multi-instrument studies, e.g. XSpec, Sherpa, or Gammapy.
%
%
%\todo[inline]{To be completed (e.g. ???)}
% mireille : to be discussed
%??? naïve question : what would be the benefit to convert science ready event table data to VOTable?
%Would TOPcat, Aladin, etc. allow more preview steps , xmatch, multi-wavelength analysis ?
High energy data are typically multi-dimensional ({\em e.g.\/}, 2 spatial dimensions, time, energy, possibly polarisation) and may be complex and diverse at lower levels. Therefore one may commonly find specific tools to process the data for a given facility, {\em e.g.\/}, CIAO for Chandra, SAS for XMM-Newton, or Gammapy for gamma-ray data, with a particular focus on Cherenkov data as foreseen by CTA.
However, many tools in a high energy astrophysics data analysis package may perform common tasks in a mission-independent way and can work well with similar data from other facilities. For example, one commonly needs to be able to filter and project the multi-dimensional data to select specific data subsets with manageable sizes and eliminate extraneous data. Some tool sets include built-in generic filtering and binning capabilities so that a general purpose region filtering and binning syntax is available to the end user.
A high energy astrophysics data analysis package typically includes tools that apply or re-apply instrumental calibrations to the data, and as described above these may be observatory-specific. More general algorithms ({\em e.g.\/}, source detection) and utility tools ({\em e.g.\/}, extract an observed spectrum from a region surrounding a source) are applied to calibrated data to extract data subsets that can then be fed into modeling tools ({\em e.g.\/}, Xspec, Sherpa, or Gammapy) together with the appropriate instrumental responses (IRFs, or RMFs and ARFs) to derive physical quantities. Since instrumental responses are often designed to be compliant with widely adopted standards, the tools that apply these responses in many cases will interoperate with other datasets that use the same standards.
Most data analysis packages provide a visualisation capability for viewing images, interacting with astronomy databases, overlaying data, or interacting via SAMP to tie several application functions together ({\em e.g.\/}, TopCat, Aladin, ds9, ESASky, Firefly) to simultaneously support both analysis and visualisation of the data at hand. In addition, many packages offer a scripting interface ({\em e.g.\/}, Python, Jupyter notebooks) that enables customised job creation to perform turn-key analysis or process bulk data in batch mode.
To allow users to work with pre-existing tools, packages often support file I/O in several formats, including FITS images and binary tables (for event files), VO formats, and several ASCII representations ({\em e.g.\/}, space-, comma-, or tab-separated columns).
We note that current high energy astrophysics data and analysis systems are not uniform: there are a number of nuances in some of the data formats and analysis threads for specific instruments and projects.
\section{Use Cases}
Given the specificities of the HE observatories (see section \ref{}) and the HE data (see section \ref{}), we list in this section some use cases that are typical of the search and handling of HE data.
\subsection{UC1: re-analyse event-list data for a source in a catalog}
After the selection of a source of interest, or a group of sources, one may access different high level HE data products such as
images, spectra and light-curves. To further study the HE data, users generally download the corresponding event-lists and calibration files to perform a new analysis of the data, with their specific science case in mind.
Users will thus access those event-lists and retrieve or regenerate the related calibration files. They will also install and run dedicated tools to reprocess this low-level data.
%\todo[inline]{To be completed (e.g. Paula, Laurent)}
One characteristic of HE data is that, contrary to common practice in the optical domain for example, their optimal
use requires providing users with a view of the processing that generated the data. This implies providing ancillary data and
products with different calibration levels, and possibly linking together products issued by the same processing.
%(LM)
\subsection{UC2: observation preparation}
When planning for new HE/VHE observations, one needs to search for any existing event-list data already available in the
targeted sky regions, and assess if this data is enough to fulfill the science goals.
For this use case, one first needs to obtain the stacked exposure maps of past observations. For VHE data, this quantity is
energy-dependent and can be derived from the pointing position and the position- and energy-dependent effective area
associated with each observation.
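As a sketch of this stacking (assuming that each observation $k$ has a live time $T_k$ and an effective area $A_k$ expressed in sky coordinates for its pointing position; the symbols $\mathcal{E}$, $A_k$ and $T_k$ are introduced here for illustration), the stacked exposure at true energy $E$ and sky position $\hat{p}$ can be written
\begin{equation}
\mathcal{E}(E, \hat{p}) = \sum_k A_k(E, \hat{p})\, T_k
\end{equation}
which has units of area $\times$ time. The number of counts expected from a source of given flux then follows by multiplying its spectrum by $\mathcal{E}$ and integrating over energy, which allows one to assess whether the existing data are sufficient for the science goals.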
\subsection{UC3: transient or variable sources}
\todo[inline]{To be completed (e.g. Ada)}
\subsection{UC4: Multi-wavelength and multi-messenger science}
Though there are scientific results based on HE data only, the multi-wavelength and multi-messenger approach is
particularly developed in the HE domain. An astrophysical source of HE radiation generally radiates
energy in several domains across the electromagnetic spectrum and may be a source of other particles, in particular
neutrinos. It is not rare to observe a HE source in radio and to look for counterparts in the infrared, optical or UV
domain, and in either the X-ray or the VHE/UHE band. Spectroscopy and spatially-resolved spectroscopy are also widely used to
identify HE sources.
The HE domain is thus confronted with different kinds of data types and data archives, which leads to interesting use
cases for the development of the VO.
One use case is associated with independent analyses of the multi-wavelength and multi-messenger data. HE data are
analysed to produce DL5/L3 quantities from DL3/L1 data stored in the VO. The multi-wavelength and multi-messenger
DL5/L3 data stored in the VO are then retrieved and combined to build astrophysical interpretations.
The other growing use case is associated with joint statistical analyses of multi-instrument data at different levels
(DL3/L1 and DL5/L3) using suitable open science analysis tools.
For both use cases, any type of data should be findable and retrievable through the VO, and the data should be in a
standardised open format (OGIP, GADF, VODF).
Such a use case is already in practice with small data sets shared by VHE experiments. Groups of astronomers
working on the Gammapy library have successfully analysed DL3 data taken on the Crab nebula by different facilities
(MAGIC, H.E.S.S., FACT, VERITAS, Fermi-LAT and HAWC) \citep{2019A&A...625A..10N, 2022A&A...667A..36A}.
A full statistical joint analysis was performed to derive an emission model of the Crab pulsar wind nebula over more
than five decades in energy. Such analyses can now be found in the literature. One can also find joint analyses using X-ray and VHE data \citep{giunti2022}. A proof of concept of a joint analysis of VHE gamma-ray and VHE neutrino data,
using simulated data, has also been published \citep{unbehaun2024}.
\subsection{UC5: Extended source searches}
Beyond the multi-messenger approach towards a specific source type, an extension of this approach can be seen in the analysis
of long-term and wide-angle observations of extended sky regions in the multi-messenger domain. For these analyses, extensive filtering
and statistical analyses of the datasets are required. This approach is especially dominant in low count-rate experiments such as neutrino telescopes,
where former analyses included the mapping of neutrino emission in the Galactic plane to gamma-ray emission \citep{doi:10.1126/science.adc9818}
or the search for neutrino emission from the Fermi bubbles with ANTARES data \citep{ANTARES2014}.
%
%\subsection{Examples of multi-wavelength analysis}
%
%\subsubsection{Multiple Imaging Atmospheric Cherenkov Telescopes extraction example}
%
%In order to exploit high energy data across a large interval of energy values, and from various IACTs, there is a need
%to harmonise metadata description. Datasets can then be mixed together to create a fused event-list dataset, to expand
%the analysis along the spectral energy axis and study the spectral behaviour of an astronomical object.
%
%This was proposed in \citep{2019A&A...625A..10N} by a group of HE astronomers of various HE facilities.
%%This work used event-list data products as an input from different facilities (MAGIC, H.E.S.S., FACT, VERITAS, etc...). data for the Crab Nebula computed from the Maximum likelihood functions of each event depending on the IRFs properties.
%In this work, the authors implemented a prototypical data format (GADF) for a small set of MAGIC, VERITAS, FACT, and
%H.E.S.S. Crab nebula observations, and they analysed them with the open-source Gammapy software package. By combining
%data from Fermi-LAT, and from four of the currently operating imaging atmospheric Cherenkov telescopes, they produced a
%joint maximum likelihood fit of the Crab nebula spectrum.
%
%Such a work has been more recently extended with the HAWC data \citep{2022A&A...667A..36A}, and included neutrino data
%in a common CTA and KM3NeT source search \citep{unbehaun2024}.
\section{IVOA standards of interest for HE}
\subsection{IVOA Recommendations}
\label{sec:vorecs}
\subsubsection{ObsCore and TAP}
\label{sec:vorecs_obscore}
Event-list datasets can be described in ObsCore using a dataproduct\_type set to ``event'', and distributed via a TAP service. However, this is not widely used in current services: we observe only a few services with event-list datasets declared in the VO Registry, mainly the H.E.S.S. public data release (see section~\ref{sec:hess}).
As services based on the Table Access Protocol \citep{2019ivoa.spec.0927D} and ObsCore are well developed within the VO, it would be a straightforward option to discover HE event-list datasets, as well as multi-wavelength and multi-messenger associated data.
Extensions of ObsCore are proposed for some astronomy domains (radio, time), an approach which is also relevant for the HE domain. The ObsCore description of HE datasets is further discussed in section \ref{sec:obscore_he}.
%Here is the evaluation of the ObsCore metadata for distributing high energy data set, some features being re-usable as such, and some other features requested for addition or re-interpretation.
\subsubsection{DataLink}
%\todo[inline]{To be completed (e.g. François)} proposed below by FB (2024-01-31)
The DataLink specification \citep{2023ivoa.spec.1215B} defines a \{links\} endpoint providing the possibility to link several
access items to each row of the main response table. These links are described and stored in a second
table. In the case of an ObsCore response each dataset can be linked this way (via the access\_url
FIELD content) to previews, documentation pages, calibration data as well as to the dataset itself.
Some dynamical links to web services may also be provided. In that case the service input parameters are
described using the ``service descriptor'' feature defined in the same DataLink specification.
\subsubsection{HiPS}
Several HE observatories are well suited for sky surveys, and the Hierarchical Progressive Survey (HiPS) standard is well suited for sky survey exploration. We note that the Fermi facility provides a useful sky survey in the GeV domain.
\subsubsection{MOCs}
Cross-correlation of data with other observations is an important use case in the HE domain. Using the Multi-Order Coverage map (MOC) standard, such operations become more efficient. Distribution of MOCs associated with HE data should thus be encouraged, especially ST-MOCs (space + time coverage),
which make the study of transient phenomena easier.
% (LM)
\subsubsection{MIVOT}
Model Instances in VOTables (MIVOT \cite{2023ivoa.spec.0620M}) defines a syntax to map VOTable data to any model serialised in VO-DML.
The annotation operates as a bridge between the data and the model.
It associates the column/param metadata from the VOTable to the data model elements (class, attributes, types, etc.) [...].
The data model elements are grouped in an independent annotation block complying with the MIVOT XML syntax.
This annotation block is added as an extra resource element at the top of the VOTable result resource.
The MIVOT syntax allows a data structure to be described as a hierarchy of classes.
It can also represent relations and composition between them, and build up data model objects by aggregating instances from different tables of the VOTable.
In the case of HE data, this annotation pattern, used together with the MANGO model, will help to make machine-readable quantities that are currently not considered in the VO,
such as the hardness ratio, the energy bands, the flags associated with measurements or extended sources.
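For instance, a hardness ratio is commonly defined (conventions vary between missions and catalogues; the symbols $C_\mathrm{hard}$ and $C_\mathrm{soft}$ are introduced here for illustration) from the counts or count rates measured in a hard and a soft energy band as
\begin{equation}
\mathrm{HR} = \frac{C_\mathrm{hard} - C_\mathrm{soft}}{C_\mathrm{hard} + C_\mathrm{soft}}
\end{equation}
so that a machine-readable description needs to carry, in addition to the value and its uncertainty, the definition of the two energy bands involved.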
\subsubsection{Provenance}
Provenance information for VHE data products is crucial, especially given the complexity of the data preparation and analysis workflow in the VHE domain. Such complexity comes from the specificities of the VHE data as exposed in section \ref{sec:vhespec}.
The development of the IVOA Provenance Data Model \citep{2020ivoa.spec.0411S} has been conducted with those use cases in mind. The Provenance Data Model proposes to structure this information as activities and entities (as in the W3C PROV recommendation), and adds the concepts of descriptions and configuration of each step, so that the complexity of provenance of VHE data can be exposed.
\subsubsection{VOEvent}
Source variability and observations of transients are common in the HE domain, and as such, the handling of alerts is generally included in the requirements of HE observatories. Alerts are both sent and received by HE observatories. The IVOA recommendation VOEvent \citep{2017ivoa.spec.0320S} is thus of interest to the HE domain.
\subsubsection{Measurements}
The Measurements model \citep{2022ivoa.spec.1004R} describes measured or determined astronomical data and their associated errors.
This model is highly compatible with the primary measured properties of High Energy data (Time, Spatial Coordinates, Energy).
However, since HE data is typically very sparse, derived properties are often expressed as probability distributions, which are not
well represented by the IVOA models. This is one area where input from the HE community can help to improve the IVOA models to better
represent HE data.
\subsubsection{Photometry}
Flux density measurements are commonly performed in the HE domain, e.g. from images with various photometry techniques. The Photometry Data Model (PhotDM, \citealt{2022ivoa.spec.1101S}) could be of interest to obtain such measurements in HE as well as at other wavelengths, in order to compute the Spectral Energy Distribution of a given source. PhotDM was developed with particular attention to optical photometry, but may be adapted to HE needs.
\subsubsection{Object visibility and scheduled observations}
HE observatories have similar needs on the topic of observation preparation and scheduling. As such, standards like ObsLocTAP \citep{2021ivoa.spec.0724S} and ObjVisSAP\footnote{\url{https://www.ivoa.net/documents/ObjVisSAP/}} are relevant and may be of interest in the HE domain.
\subsection{Data Models in working drafts}
The HE domain and practices could serve as use cases for the development of data models, such as Dataset DM, Cube DM or MANGO DM.
\subsubsection{Dataset}
The Dataset Metadata model\footnote{\url{https://www.ivoa.net/documents/DatasetDM}} provides a specification of high-level metadata to describe astronomical datasets and data products.
One feature of this model is that it describes a Dataset as consisting of one or more associated data products. This feature is not
well fleshed out in the model. The HE use cases provide examples where it may be necessary to associate multiple data products
(e.g. an Event list and its associated IRFs) as a single entity to form a useful dataset.
\subsubsection{Cube}
The Cube model\footnote{\url{https://www.ivoa.net/documents/CubeDM}} describes multi-dimensional sparse data cubes and images. This submodel is specifically designed to
represent Event list data and provides the framework for specialising to represent data products such as Spectra and Time Series
as slices of a multi-dimensional cube. The image modeling provides the structure necessary to represent important HE image products.
\subsubsection{MANGO}
MANGO is a draft model\footnote{\url{https://github.com/ivoa-std/MANGO}} that has been developed to reveal
and describe complex quantities that are usually distributed in query response tables.
The use cases on which MANGO is built were collected in 2019 from different scientific fields, including HE.
The model focuses on epoch propagation, state description and photometry.
Some features of MANGO are useful for the HE domain:
% \begin{itemize}[noitemsep,topsep=0pt,parsep=0pt,partopsep=0pt] << these require the enumitem package?
\begin{itemize}
\item Hardness ratio support
\item Energy band description
\item Machine-readable description of state values
\item Ability to group quantities (e.g., position with detection likelihood)
\item MANGO instance association (e.g., source with detections)
\end{itemize}
\section{Topics for discussions in an Interest Group}
\subsection{Definition of a HE event in the VO}
\label{sec:event-bundlle-or-list}
\subsubsection{Current definition in the VO}
The IVOA standards include the concept of event-list, for example in ObsCore v1.1 \citep{2017ivoa.spec.0509L}, where
event is a dataproduct\_type with the following definition:
\begin{quote}
\textbf{event}: an event-counting (e.g. X-ray or other high energy) dataset of some sort. Typically this is
instrumental data, i.e., ``event data''. An event dataset is often a complex object containing multiple files or
other substructures. An event dataset may contain data with spatial, spectral, and time information for each
measured event, although the spectral resolution (energy) is sometimes limited. Event data may be used to produce
higher level data products such as images or spectra.
\end{quote}
More recently, a new definition was proposed in the product-type vocabulary\footnote{\url{https://www.ivoa.net/rdf/product-type}} (draft):
\begin{quote}
\textbf{event-list}: a collection of observed events, such as incoming high-energy particles. A row in an event
list is typically characterised by a spatial position, a time and an energy.
\end{quote}
Such a definition remains vague and general, and could be more specific, including a definition for a HE event, and the
event-list data type.
\subsubsection{Proposed definition to be discussed}
A first point to be discussed would be to converge on a proper definition of HE specific data products:
\begin{itemize}
\item Propose definitions for a product-type \textbf{event-list}: A collection of observed events, such as incoming
high-energy particles, where an event is generally characterised by a spatial position, a time and a spectral value
(e.g. an energy, a channel, a pulse height).
\item Propose definitions for a product-type \textbf{event-bundle}: An event-bundle dataset is a complex object
containing an event-list and multiple files or other substructures that are products necessary to analyse the
event-list. Data in an event-bundle may thus be used to produce higher level data products such as images or spectra.
\end{itemize}
An ObsCore erratum could then propose to replace the term event with event-list and event-bundle.
The precise content of an event-bundle remains to be better defined, and may vary significantly from one facility to another.
For example, Chandra primary products distributed via the Chandra Data Archive include around half a dozen different
types of products necessary to analyse Chandra data (for example, L2 event-list, Aspect solution,
bad pixel map, spacecraft ephemeris, V\&V Report).
% {\bf the following is not clear for BKH: It is also possible to retrieve secondary products, containing more products that are needed to recalibrate the data with updated calibrations}.
For VHE gamma rays and neutrinos, the DL3 event lists must be associated with their IRF files. The
links between the event-list and these IRFs should be well defined in the event-bundle.
\subsection{ObsCore description of an event-list}
\label{sec:obscore_he}
%%%% texte by Mireille to be checked and merged : start %%
%\include{ObscoreReviewforVOHEcontext_Mireille Louys}
%I have some items to add in the various categories well defined by Mathieu
%%%%%%%%%%texte by Mireille to be merged : end %%
%\subsubsection{Mandatory fields}
ObsCore \citep{2017ivoa.spec.0509L} can provide a metadata profile for a data product of type event-list (event) and a qualified access to the distributed file using the Access class from ObsCore (URL, format, file size).
\subsubsection{Usage of the mandatory terms in ObsCore}
In the ObsCore representation, the event-list data product is described in terms of curation, coverage and access. However, several properties are simply set to NULL following the recommendation: Resolutions, Polarisation States, Observable Axis Description, Axes lengths (set to -1).
We also note that some properties are energy dependent, such as the Spatial Coverage, Spatial Extent, PSF.
%\todo[inline]{TODO: show a table with all reused terms , and provide an example}
The mandatory ObsCore terms could, for example, be filled as follows:
\begin{itemize}
\item dataproduct\_subtype = DL3, maybe specific data format (VODF)
\item calib\_level = between 1 and 2
\item obs\_collection could contain many details: obs\_type (calib, science), obs\_mode (subarray
configuration), pointing\_mode, tracking\_type, event\_type, event\_cuts, analysis\_type...
\item s\_ra, s\_dec = maybe telescope pointing coordinates
\item target\_name : several targets may be in the field of view
\item s\_fov, s\_region, s\_resolution, em\_resolution... all these values are energy dependent, so one should specify that each value holds at a given energy, or within a given energy range.
\item em\_min, em\_max : add fields expressed in energy (e.g. eV, keV or TeV)
\item t\_exptime : ontime, livetime, stable time intervals... maybe a T-MOC would help
\item facility\_name, instrument\_name : minimalist, would be e.g. CTAO and a subarray.
\end{itemize}
\subsubsection{Metadata re-interpretation for the HE context}
\paragraph{observation\_id}
In the current definition of ObsCore, the data product collects data from one or several observations. The same applies in the HE context.
\paragraph{access\_ref, access\_format}
The initial role of this metadata was to hold the access\_url allowing data access.
Depending on whether the event bundle is packaged in one compact format (OGIP, GADF, tar ball, ...)
or as separate files available independently at different URLs, a DataLink pointer can be used to access the various parts (IRFs, background maps, etc.).
In such a case the value for access\_format should be \texttt{application/x-votable+xml;content=datalink}. The format of each data file is then given by the DataLink field content\_type.
See Section~\ref{sec:datalink}.
\paragraph{o\_ucd}
For the event-list table, we can consider that all measures stored in the column values have been observed.
The nature of the items along the time, position and energy axes is identified in ObsCore with UCDs such as `time', `pos.eq.*', `em.*',
and they are counted as t\_xel, s\_xel1, s\_xel2, em\_xel, which correspond to the number of rows (event candidates) observed.
The signal observed is the result of event counting and would be a PHA (pulse height amplitude at detector level), a number of counts for photons or particles, a flux, etc., depending on the data calibration level considered.
ObsCore uses o\_ucd to characterise the nature of the measure.
Various UCDs are used for that: o\_ucd=phys.count, phot.count, phot.flux, etc. There is currently no UCD defined for a raw measure like PulseHeightAmplitude, but if needed one can be requested for addition to the UCDList vocabulary. See VEP-UCD-15\_pulseheight.txt proposed at \url{https://voparis-gitlab.obspm.fr/vespa/ivoa-standards/semantics/vep-ucd/-/blob/master/}.
Note that these parameters vary from datasets of calib\_level 1 (raw) to more advanced data products (calib\_level 2 or 3), which are filtered and rebinned from the original raw event-list.
\subsubsection{Proposed additions}
\paragraph{ev\_number}
The event list contains a number of rows, each representing a detection candidate, and this number has no metadata keyword yet in ObsCore.
We propose `ev\_number' to record it.
Indeed, the t\_xel, s\_xel1, s\_xel2 and em\_xel elements do not apply to an event list in raw counts, as it has not been binned yet.
\paragraph{Adding MIME-type to access\_format table}
As seen in Section~\ref{sec:data_formats}, current HE experiments and observatories use their own community-defined data formats for data dissemination.
These formats encapsulate the event-list table together with ancillary data dedicated to calibration and to observing configurations and parameters.
Even if the encapsulation is not standardised between the various projects, it is useful for a client application to rely on the access\_format property in order to send the data to an appropriate visualisation tool.
These formats could therefore be included in the MIME-type table of ObsCore section 4.7, with suggested new terms such as:
\begin{itemize}
\item application/x-fits-ogip ...
\item application/x-gadf ...
\item application/x-vodf ...
\end{itemize}
\todo[inline]{to be completed with proper definition}
\paragraph{energy\_min, energy\_max}
Selecting datasets by energy range is not user-friendly when the spectral axis is expressed as a wavelength in meters, since these units and quantities are not familiar to the HE community.
Moreover, the numerical representation of the spectral range in em\_min leads to quantities with many digits and exponents such as $10^{-18}$, which are not easily compared with values in common use.
The fields energy\_min and energy\_max would express the same range in energy units (e.g. eV, keV or TeV).
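As an illustration of the orders of magnitude involved, the standard relation between photon energy $E$ and wavelength $\lambda$ is
\[
\lambda = \frac{hc}{E}, \qquad hc \simeq 1.2398 \times 10^{-6}\,\mathrm{eV\,m}.
\]
For an energy range of 1~TeV to 100~TeV (values chosen here only as an example, not taken from a specific dataset), this gives
\[
\mathrm{em\_max} = \frac{1.2398 \times 10^{-6}\,\mathrm{eV\,m}}{10^{12}\,\mathrm{eV}} \simeq 1.24 \times 10^{-18}\,\mathrm{m},
\qquad
\mathrm{em\_min} = \frac{1.2398 \times 10^{-6}\,\mathrm{eV\,m}}{10^{14}\,\mathrm{eV}} \simeq 1.24 \times 10^{-20}\,\mathrm{m}.
\]
Note that the ordering is inverted: the highest energy corresponds to em\_min and the lowest energy to em\_max.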
\todo[inline]{cf. example HESS data shown in Aladin}
\paragraph{t\_gti}
Search criteria on time coverage require the list of stable/good time intervals in order to pick appropriate datasets.
t\_min and t\_max give the global time span, but t\_gti could contain the list of GTIs as a T-MOC description following the Multi-Order Coverage (MOC) IVOA standard \citep{2022ivoa.spec.0727F}.
This element could then be compared across data collections, so that datasets are selected via simple intersection or union operations in the T-MOC representation.
On the data provider's side, the T-MOC element can be computed from the Stable/Good Time Interval table in OGIP or GADF to produce the ObsCore t\_gti field.
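The selection logic can be sketched as follows, using a notation introduced here only for illustration. Write the good time intervals of a dataset $D$ as
\[
G_D = \bigcup_{i=1}^{n} [a_i, b_i], \qquad \mathrm{t\_min} \le a_1 < b_1 < a_2 < \dots < b_n \le \mathrm{t\_max}.
\]
A time window $W$ requested by a user selects $D$ when $G_D \cap W \neq \emptyset$, and two datasets $D_1$ and $D_2$ share good time when $G_{D_1} \cap G_{D_2} \neq \emptyset$.
The coarser condition $[\mathrm{t\_min}, \mathrm{t\_max}] \cap W \neq \emptyset$ is necessary but not sufficient, because $W$ may fall entirely within a gap between two GTIs.
A T-MOC encodes $G_D$ on a discretised time axis, so that these tests reduce to intersections of sets of cells.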
\subsubsection{Access and Description of IRFs}
Each IRF file can have an Access object from ObsCore DM to describe a link to the IRF part of the data file.
This can be reflected in an extension of ObsTAP TAP\_SCHEMA.
In the TAP service we could add an IRF Table, with the following columns:
\begin{itemize}
\item event-list datapublisher\_id
\item irf\_type, category of response: EffectiveArea, PSF, etc.
\item irf\_description, one line explanation for the role of the file
\item Access.url, URL to point to the IRF
\item Access.format, format of IRF
\item Access.size, size of IRF file
\end{itemize}
\subsection{Event-list Context Data Model}
\label{sec:EventListContext}
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{figures/EventListContext}
\caption{Event-list Context Data Model. Notes: STIs and GTIs are slightly different concepts, and multiplicities should be adapted; energy is too specific for an event (intensity?); more products may be attached to a STI/GTI or to an IRF.}
\label{fig:EventListContext}
\end{figure}
The event-list concept may include, or may be surrounded by other connected concepts. Indeed, an event-list dataset alone cannot be scientifically analysed without the knowledge of some contextual data and metadata, starting with the good/stable time intervals, and the corresponding IRFs.
The aim of an Event-list Context Data Model is to name and identify the relations between the event-list and its contextual information. A first attempt is presented in Figure~\ref{fig:EventListContext}.
Such a model could help to define specific HE data attributes that could be relevant for an ObsCore description of HE datasets, and thus be included in a proposed extension.
\subsection{Use of Datalink for HE products}
\label{sec:datalink}
There are two options to provide access to a full event-bundle package.
In the first option, the ``event-bundle'' dataset (Section~\ref{sec:event-bundlle-or-list}) exposed in the discovery service contains all the relevant information, e.g. several frames in the FITS file, one corresponding to the event-list itself, and the others providing good/stable time intervals, or any IRF file. This is the approach taken in the current GADF data format (see Section~\ref{sec:GADF}). In this option, the content of the event-list package should be properly defined in its description: what information is included, and where it is located in the dataset structure. The Event-list Context Data Model (see Section~\ref{sec:EventListContext}) would be useful to provide that information.
In the second option, we would provide links to the relevant information from the base ``event-list'' (Section~\ref{sec:event-bundlle-or-list}) exposed in the discovery service. This could be done using DataLink, with a list of links to each piece of contextual information such as the IRFs. The Event-list Context Data Model (see Section~\ref{sec:EventListContext}) would provide the concepts and vocabulary to characterise the IRFs and other information relevant to the analysis of an event-list. These specific concepts and terms describing the various flavours of IRFs and GTIs would be given in the semantics and content\_qualifier fields of the DataLink response to qualify the links. The different links can point to different
dereferenceable URLs or alternatively to different fragments of the same dereferenceable URL, as stated by the DataLink specification.
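
A purely illustrative sketch of the links that a DataLink response could carry for one event-list in this second option is given below. All rows are assumed to share the same ID, i.e. the identifier of the event-list in the discovery service. The URLs and the content\_qualifier values are hypothetical placeholders introduced for this example. The semantics terms \#this, \#calibration and \#auxiliary are existing terms of the DataLink core vocabulary; whether dedicated terms are needed for IRFs and GTIs remains to be discussed. The two IRF rows show the case of different fragments of the same URL.

\begin{center}
\small
\begin{tabular}{llll}
\hline
access\_url & semantics & content\_type & content\_qualifier \\
\hline
\texttt{https://example.org/obs1/events.fits} & \#this & application/fits & event-list \\
\texttt{https://example.org/obs1/irf.fits\#AEFF} & \#calibration & application/fits & EffectiveArea \\
\texttt{https://example.org/obs1/irf.fits\#PSF} & \#calibration & application/fits & PSF \\
\texttt{https://example.org/obs1/gti.fits} & \#auxiliary & application/fits & GTI \\
\hline
\end{tabular}
\end{center}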
%\todo[inline]{To be completed: show an example ?}
\bibliography{VOHE-Note, ivoatex/docrepo, ivoatex/ivoabib}
%\bibliographystyle{}
\appendix
\section{Changes from Previous Versions}
No previous versions yet.
% these would be subsections "Changes from v. WD-..."
% Use itemize environments.
% NOTE: IVOA recommendations must be cited from docrepo rather than ivoabib
% (REC entries there are for legacy documents only)
\end{document}