\documentclass[11pt,a4paper]{ivoa}
\input tthdefs
\definecolor{termcolor}{rgb}{0.6,0.1,0.1}
\iftth
\def\vocterm#1{\textit{\color{termcolor}#1}}
\else
\def\vocterm{\startvocterm\realvocterm}
\def\realvocterm#1{\textit{\color{termcolor}#1}\endvocterm}
\begingroup
\gdef\breakablecolon{:\hskip0pt}
\catcode`\:=\active
\gdef\startvocterm{\begingroup
\catcode`\:=\active\let:=\breakablecolon}
\gdef\endvocterm{\endgroup}
\endgroup
\fi
\lstloadlanguages{sh,SQL}
\lstset{flexiblecolumns=true,basicstyle=\ttfamily,showstringspaces=False}
\title{Bibliographic Interfaces in the Virtual Observatory}
% see ivoatexDoc for what group names to use here; use \ivoagroup[IG] for
% interest groups.
\ivoagroup{Data Curation and Preservation}
\author[http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/MarkusDemleitner]{Demleitner, M.}
\author{Accomazzi, A.}
\editor{Demleitner, M.}
% \previousversion[????URL????]{????Concise Document Label????}
\previousversion[https://ivoa.net/documents/Notes/BibVO/20240214]{Version 1.0}
\previousversion{This is the first public release}
\begin{document}
\begin{abstract}
This note documents existing approaches to interfacing the Virtual
Observatory to bibliographic services, in particular the Astrophysics
Data Service ADS, and suggests amendments to these practices
where necessary.
\end{abstract}
\section*{Conformance-related definitions}
The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
``OPTIONAL'' (in upper or lower case) used in this document are to be
interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.
The \emph{Virtual Observatory (VO)} is a
general term for a collection of federated resources that can be used
to conduct astronomical research, education, and outreach.
The \href{https://www.ivoa.net}{International
Virtual Observatory Alliance (IVOA)} is a global
collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.
Items written \vorent{like this} in the following refer to elements from
the VOResource data model \citep{2018ivoa.spec.0625P} and its
extensions.
Items written \vocterm{like this} in the following refer to terms in
IVOA vocabularies, which can be inspected below
\url{https://ivoa.net/rdf}.
\section{Introduction}
In addition to ``blind'' data discovery within the VO, users frequently
want to know about VO resources in their relationship to the wider
world, in particular that of scholarly publications. Use cases we
envision are, specifically,
\begin{itemize}
\item A user of a bibliographic service has discovered a publication of
interest; they should also be directed to a VO resource that supplements
that publication.
\item A user of a bibliographic service has discovered a publication
that has used VO datasets; they should be notified of the availability
of these datasets in standards-compliant data services.
\item A user of a VO resource wants to properly give credit; we
argue that resources published as a supplement to a scholarly
publication need a different treatment here from resources not directly
derived from one.
\end{itemize}
\subsection{Using IVOA Identifiers}
\label{sect:use-ivoid}
Several of the mechanisms described here use or yield IVOA Identifiers
\citep{2016ivoa.spec.0523D}, hereafter ivoids. Technically, these are
URIs with a scheme of \textit{ivo} resolvable in the VO Registry.
For general bibliographic services, these URIs will probably not be
immediately useful, as non-VO software generally cannot resolve them.
One could in principle use OAI-PMH endpoints
associated with the full registries listed
on the IVOA's Registry of
Registries\footnote{\url{https://rofr.ivoa.net}} to obtain VOResource
metadata and then perhaps translate that to DataCite metadata using XSLT
stylesheets as shown in \url{https://github.com/ivoa/vor-doi}.
However, it is much easier to resolve ivoids using one of
several custom services that provide simplified shortcuts.
For instance, prepending \verb|https://dc.g-vo.org/I| to an ivoid will
return XML-serialised VOResource metadata.
Prepending \verb|https://dc.g-vo.org/LP|
will yield an HTML-for\-matted landing page with basic metadata and access
options generated from the VOResource.
\subsection{Running RegTAP Queries Without VO Tooling}
\label{sect:regtap}
Below, we give several ADQL queries that are intended to be executed
against RegTAP services. While it is preferable to run these queries
through full-fledged TAP \citep{2019ivoa.spec.0927D} clients such as
pyVO \citep{2014ascl.soft02004G} or stilts \citep{2006ASPC..351..666T},
in particular non-VO actors like bibliographic services can avoid extra
dependencies by operating these services through plain HTTP tools.
Using the reg.g-vo.org RegTAP endpoint\footnote{A full list
of full searchable VO registries can be obtained from the Registry of
Registries at \url{https://rofr.ivoa.net}; the script should work with
any of these endpoints.} as an example, a shell script to run the query
from sect.~\ref{sect:res-art} could be:
\lstinputlisting[language=sh]{harvest.sh}
This results in easily consumable (if metadata-poor) CSV data.
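Where a shell is inconvenient, the same kind of request can be sketched
in Python using only the standard library. This is a minimal sketch, not
a reference client: it assumes that the synchronous endpoint is
\verb|https://reg.g-vo.org/tap/sync| and that the service accepts
\verb|FORMAT=csv|; the function names are made up for illustration.
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
import csv
import io
import urllib.parse
import urllib.request

# Assumption: synchronous TAP endpoint of the example registry.
TAP_SYNC_URL = "https://reg.g-vo.org/tap/sync"


def build_tap_request(query, maxrec=100000, url=TAP_SYNC_URL):
    """Return a POST request running an ADQL query synchronously."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": "csv",
        "MAXREC": str(maxrec),
        "QUERY": query,
    }
    data = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(url, data=data)


def parse_rows(text):
    """Split a CSV body into rows of strings.

    A header line, if the service emits one, is the first row.
    """
    return [row for row in csv.reader(io.StringIO(text)) if row]


def run_query(query, maxrec=100000):
    """Execute the query over HTTP; this needs network access."""
    request = build_tap_request(query, maxrec)
    with urllib.request.urlopen(request) as response:
        return parse_rows(response.read().decode("utf-8"))
\end{lstlisting}
Passing MAXREC explicitly guards against a service's default match limit
silently truncating the result.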
\section{Linking VO Resources to Articles}
The VOResource metadata schema \citep{2018ivoa.spec.0625P} contains the
\vorent{content/source} field, defined as
\begin{quotation}
\noindent A bibliographic reference from which the present resource is
derived or extracted.
\noindent This is intended to point to an article in the published
literature. An ADS Bibcode is recommended as a value when available.
\end{quotation}
\noindent in VOResource 1.2. VOResource 1.3 will presumably add DOIs as a
recommended identifier type.
It is generally understood that, absent resource-specific regulations
communicated in \vorent{rights}, users of a VO service should cite the
referenced article when a resource has played a significant role in some
published research. Facilitating this citation is one major reason for
having machine-readable identifiers in that field. Care must be taken
to properly declare the \vorent{source/@format} as either \verb|bibcode|
or \verb|doi|. Otherwise, the query below will miss useful metadata.
Conversely, bibliographic services can use this information to enrich
bibliographic records with links to data sources (``D links'' in ADS
nomenclature). To retrieve pairs of ivoid and bibcode, the following
RegTAP \citep{2019ivoa.spec.1011D} query can be used:
\begin{lstlisting}[language=SQL]
select ivoid, source_value, source_format
from rr.resource
where source_format in ('doi', 'bibcode')
\end{lstlisting}
See sect.~\ref{sect:regtap} for hints on how and where to run this
query.
Note that default match limits of the TAP services may already
truncate this response, so TAP clients should take care to pass a
suitably large MAXREC.
See sect.~\ref{sect:use-ivoid} for how to deal with the ivoids this
returns in a non-VO context.
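A bibliographic service will typically want the result keyed by
publication rather than by resource. The following Python sketch
regroups the rows accordingly. It assumes that the rows arrive as tuples
in the column order of the query above; the function name is
hypothetical.
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
from collections import defaultdict


def group_by_source(rows):
    """Map (source_format, source_value) to a sorted list of ivoids.

    rows is an iterable of (ivoid, source_value, source_format).
    """
    links = defaultdict(set)
    for ivoid, source_value, source_format in rows:
        links[(source_format, source_value)].add(ivoid)
    return {key: sorted(ivoids) for key, ivoids in links.items()}
\end{lstlisting}
Each key then identifies one bibliographic record, and the associated
list holds the resources a data link on that record could point to.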
\section{Linking Datasets To Articles}
\label{sect:res-art}
The second scenario is that resources within a data centre are used in or
originate from some publication and the data centre operators want to
notify bibliographic services of that fact. Again, this might yield the
equivalent of ADS ``D links''. An example for this would be an SSAP
\citep{2012ivoa.spec.0210T} service that has a non-constant
\textsl{meta.bib.bibcode} column.
Bibliographic services \emph{could} try to harvest such information from
the services themselves. However, even where there is a standard
location as in SSAP, the protocols are usually not well suited for
harvesting, and it would certainly be unattractive for bibliographic
services to have to harvest dozens or hundreds of VO services with
something like weekly cadence.
A second scenario when linking datasets to articles is observatory
bibliographies, i.e., metadata collections tracking the use of certain
data products in the literature. So far, no VO standard exists for
formatting and exchanging this kind of metadata.
\subsection{The biblink-harvest endpoint}
Therefore, we propose a new endpoint; a data centre could have one or a
few of these for its entire data holdings. This endpoint is a GET-able
resource returning an \textit{application/json}-encoded sequence of link
records. Each link record MUST have the following keys:
\begin{itemize}
\item \textit{bib-ref} -- a machine-readable identifier of a bibliographic
source; currently, either a bibcode or a DOI.
\item \textit{relationship} -- a relationship identifier; see below.
\item \textit{dataset-ref} -- a reference to the dataset or VO resource cited.
This SHOULD be a URI understood by common web browsers. See
sect.~\ref{sect:use-ivoid} for options providers have to reference VO
resources here. Avoid linking directly to the dataset itself. If at
all possible, link to a landing page-type rendering of dataset metadata
in biblinks.
\end{itemize}
Optionally, a key \textit{bib-format} can be given. If it is missing,
harvesters may assume bibcodes in \textit{bib-ref}. Otherwise, it should
contain one of the values defined for \vorent{source/@format} in
VOResource.
Optionally, a key \textit{anchor-text} can be given. This contains a
descriptive, short text that the bibliographic service should use as
the anchor text of the link element.
Optionally, a key \textit{cardinality} can be given. This is useful when the
data centre does not actually report individual links but rather a link
to a centre-side page keeping per-publication links. The cardinality
then gives the (approximate) number of links on the page pointed to.
All other keys are reserved except those beginning with an underscore
character. These can be used to prototype extensions or -- not
encouraged -- to communicate extra metadata between specific data
centres and specific bibliographic services.
Since we expect responses from such endpoints to be always of manageable
size -- a few megabytes --, we do not define a harvesting mode for this
type of endpoint for now.
Here is a possible response from a biblink-harvest endpoint:
\begin{lstlisting}[basicstyle=\footnotesize]
[{"bib-ref": "2020A&A...637A...4R",
  "dataset-ref": "https://dc.g-vo.org/LP/org.gavo.dc/toss/q/line_tap",
  "relationship": "IsSupplementedBy"},
 {"bib-ref": "2012A&A...546A..55R",
  "dataset-ref": "https://dc.g-vo.org/LP/org.gavo.dc/toss/q/line_tap",
  "relationship": "IsSupplementedBy"}]
\end{lstlisting}
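On the harvester side, the key rules given above can be checked
mechanically. The following Python sketch is one possible reading of
those rules and not a normative validator. It assumes that the response
body is valid JSON with double-quoted strings, and its function names
are invented for illustration.
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
import json

REQUIRED_KEYS = {"bib-ref", "relationship", "dataset-ref"}
OPTIONAL_KEYS = {"bib-format", "anchor-text", "cardinality"}


def check_link_record(record):
    """Return a list of problems in one link record (empty if fine)."""
    problems = []
    keys = set(record)
    for key in sorted(REQUIRED_KEYS - keys):
        problems.append("missing required key: " + key)
    for key in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS):
        # Keys starting with an underscore are open for extensions.
        if not key.startswith("_"):
            problems.append("reserved key: " + key)
    return problems


def parse_biblinks(text):
    """Parse a response body into a list of checked link records."""
    records = json.loads(text)
    for record in records:
        problems = check_link_record(record)
        if problems:
            raise ValueError("; ".join(problems))
        # Without bib-format, harvesters may assume a bibcode.
        record.setdefault("bib-format", "bibcode")
    return records
\end{lstlisting}
Rejecting the whole response on the first faulty record is a design
choice of this sketch; a production harvester might prefer to skip and
log individual records instead.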
\subsection{Relationship Types}
The \textit{relationship} field should be an identifier from the IVOA
relationship\_type\footnote{\url{http://ivoa.net/rdf/relationship_type}}
vocabulary. Typical terms data centres would use are
\begin{itemize}
\item \vocterm{Cites} is used when \textit{bib-ref} derives results from
\textit{dataset-ref}.
\item \vocterm{IsSupplementedBy} is used when data derived in
\textit{bib-ref} is published in \textit{dataset-ref}.
\end{itemize}
As a guideline, the link record can be interpreted as an RDF triple
(\textit{bib-ref}, \textit{relationship}, \textit{dataset-ref}).
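The triple reading can be made concrete with a small Python sketch. It
assumes that a term URI is formed by appending a fragment with the term
name to the vocabulary URL from the footnote above; the function name is
hypothetical.
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
# Assumption: term URIs have the form <vocabulary URL>#<term>.
VOCAB_URL = "http://ivoa.net/rdf/relationship_type"


def link_to_triple(record):
    """Return (subject, predicate, object) for one link record."""
    return (
        record["bib-ref"],
        VOCAB_URL + "#" + record["relationship"],
        record["dataset-ref"],
    )
\end{lstlisting}
The publication is always the subject, so a record with
\vocterm{Cites} reads as ``the publication cites the dataset''.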
\subsection{Discovering biblink-harvest endpoints}
Data centres register their biblink-harvest endpoints as VODataService
\citep{2021ivoa.spec.1102D} DataService-s. These must have a capability
with a \verb|standardID| of
$$\hbox{\nolinkurl{ivo://ivoa.net/std/bibvo#biblink-harvest-1.0}}.$$
To discover the endpoints of such services, bibliographic services would
execute a RegTAP query like
\begin{lstlisting}[language=SQL,basicstyle=\footnotesize\ttfamily]
select ivoid, access_url
from rr.capability
natural join rr.interface
where
standard_id like 'ivo://ivoa.net/std/bibvo#biblink-harvest-1.%'
\end{lstlisting}
See sect.~\ref{sect:regtap} for hints on how and where to run this
query.
\section{Making VO Resources Citable}
While we believe that in the case of VO resources simply supplementing a
published article, an extra metadata record for the VO resource in
external metadata services is generally not very useful -- scientists
can \emph{discover} the data in the VO Registry, and for \emph{credit} it
is still more useful if they cite the supplemented journal publication
--, there are numerous VO resources that do not have an (adequate)
citable article; these include resources publishing data from multiple
independent journal publications, observatory archives, infrastructure
services of the data centres, and so on.
In these cases, it is desirable to export metadata records to
VO-external services. Since we do not want to establish ivoids as
permanent identifiers interpretable outside of the VO, VO publishers
need to obtain DOIs for resources they want to be eligible for
inclusion into external metadata directories\footnote{If they have no
other means of obtaining a DOI, they can use GAVO's VOiDOI service at
\url{https://dc.g-vo.org/voidoi/q/ui/custom}}.
For records that have a DOI, VO publishers then create a date element
with a role of \vocterm{ExportRequested} in their record's \vorent{curation}
element, somewhat like this:
\begin{lstlisting}[language=XML]
<curation>
[...]
<date role="ExportRequested">2023-10-27</date>
</curation>
\end{lstlisting}
To make incremental harvests a bit less shaky, publishers should use a
date about two days in the future here.
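Publishers generating their registry records by script could compute
that date automatically. The following Python sketch shows one way to do
this with the standard library. The function name and the
\verb|lead_days| parameter are hypothetical, and namespace handling for
a complete VOResource record is left out.
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
import datetime
import xml.etree.ElementTree as ET


def export_requested_element(today=None, lead_days=2):
    """Serialise a date element dated lead_days after today."""
    if today is None:
        today = datetime.date.today()
    element = ET.Element("date", role="ExportRequested")
    element.text = (
        today + datetime.timedelta(days=lead_days)).isoformat()
    return ET.tostring(element, encoding="unicode")
\end{lstlisting}
Called with 2023-10-25 as \verb|today|, this returns the date element
shown in the example record above.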
External metadata directories can obtain the ivoids and DOIs of all
records marked in this way with a RegTAP query
(cf.~sect.~\ref{sect:regtap}) like
\begin{lstlisting}[language=SQL]
select
ivoid, alt_identifier, date_value
from
rr.res_date
natural join rr.alt_identifier
where
value_role='exportrequested'
and alt_identifier like 'doi:%'
\end{lstlisting}
It would be conceivable to incrementally harvest the VO Registry for
such records by memorising the last date such a query was run and then
adding a constraint \verb|date_value>'<iso-of-last-date>'|. However,
because of harvesting delays, the date given in the Registry is not
necessarily the date the \vocterm{ExportRequested} date became visible
in any given searchable registry. Harvesters should account for a day
or two of delay and plan for regular full re-harvests.
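A harvester could build such an incremental query as in the following
Python sketch. It selects records dated later than the last run minus a
safety margin. The two-day default margin and the function name are
assumptions of the sketch, not requirements of this note.
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
import datetime

BASE_QUERY = """select ivoid, alt_identifier, date_value
from rr.res_date
natural join rr.alt_identifier
where value_role='exportrequested'
and alt_identifier like 'doi:%'"""


def incremental_query(last_run, slack_days=2):
    """Return ADQL for records newer than last_run minus the slack.

    last_run is a datetime.date; the slack absorbs harvesting delays.
    """
    cutoff = last_run - datetime.timedelta(days=slack_days)
    return BASE_QUERY + "\nand date_value>'" + cutoff.isoformat() + "'"
\end{lstlisting}
Because of the overlapping window, some records will be returned more
than once; harvesters should therefore deduplicate on the ivoid.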
See sect.~\ref{sect:use-ivoid} for information on how to obtain full
metadata records for resources discovered in this way. Alternatively,
use the DOI to retrieve the associated DataCite metadata from
\url{https://api.datacite.org/}.
\appendix
\section{Changes from Previous Versions}
%\subsection{Changes from Version 1.0}
No previous versions yet.
% these would be subsections "Changes from v. WD-..."
% Use itemize environments.
% NOTE: IVOA recommendations must be cited from docrepo rather than ivoabib
% (REC entries there are for legacy documents only)
\bibliography{ivoatex/ivoabib,ivoatex/docrepo}
\end{document}