-
Notifications
You must be signed in to change notification settings - Fork 1
/
eprint2rdm.1.html
129 lines (127 loc) · 5.04 KB
/
eprint2rdm.1.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
<!DOCTYPE html>
<html>
<head>
<title>Institutional Repository Data Management</title>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="https://caltechlibrary.github.io/css/site.css">
</head>
<body>
<header>
<a href="http://library.caltech.edu" title="link to Caltech Library Homepage"><img src="https://caltechlibrary.github.io/assets/liblogo.gif" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="index.html">README</a></li>
<li><a href="LICENSE">LICENSE</a></li>
<li><a href="INSTALL.html">INSTALL</a></li>
<li><a href="user_manual.html">User Manual</a></li>
<li><a href="search.html">Search Docs</a></li>
<li><a href="about.html">About</a></li>
<li><a href="https://github.com/caltechlibrary/irdmtools">GitHub</a></li>
</ul>
</nav>
<section>
<h1 id="name">NAME</h1>
<p>eprint2rdm</p>
<h1 id="synopsis">SYNOPSIS</h1>
<p>eprint2rdm <a href="#options">OPTIONS</a> [EPRINT_HOST] EPRINT_ID</p>
<h1 id="description">DESCRIPTION</h1>
<p>eprint2rdm is a Caltech Library oriented command line application
that takes an EPrint hostname and EPrint ID and returns a JSON document
suitable to import into Invenio RDM. It relies on access to EPrint’s
REST API. It uses EPRINT_USER, EPRINT_PASSWORD and EPRINT_HOST
environment variables to access the API. Using the “-all-ids” options
you can get a list of keys available from the EPrints REST API.</p>
<p>eprint2rdm can harvest a set of eprint ids into a dataset collection
using the “-id-list” and “-harvest” options. You map also provide
customized resource type and person role mapping for the content you
harvest. This will allow you to be substantially closer to the final
record form needed to crosswalk EPrints data into Invenio RDM.</p>
<h1 id="options">OPTIONS</h1>
<dl>
<dt>-help</dt>
<dd>
display help
</dd>
<dt>-license</dt>
<dd>
display license
</dd>
<dt>-version</dt>
<dd>
display version
</dd>
<dt>-all-ids</dt>
<dd>
return a list of EPrint ids, one per line.
</dd>
<dt>-harvest DATASET_NAME</dt>
<dd>
Harvest content to a dataset collection rather than standard out
</dd>
<dt>-id-list ID_FILE_LIST</dt>
<dd>
(used with harvest) Retrieve records based on the ids in a file, one
line per id.
</dd>
<dt>-resource-map FILENAME</dt>
<dd>
use this comma delimited resource map from EPrints to RDM resource
types. The resource map file is a comma delimited file without a header
row. The First column is the EPrint resource type string, the second is
the RDM resource type string.
</dd>
<dt>-contributor-map FILENAME</dt>
<dd>
use this comma delimited contributor type map from EPrints to RDM
contributor types. The contributor map file is a comma delimited file
without a header row. The first column is the value stored in the
EPrints table “eprint_contributor_type” and the second value is the
string used in the RDM instance.
</dd>
</dl>
<h1 id="example">EXAMPLE</h1>
<p>Example generating a JSON document for from the EPrints repository
hosted as “eprints.example.edu” for EPrint ID 118621. Access to the
EPrint REST API is configured in the environment. The result is saved in
“article.json”. EPRINT_USER, EPRINT_PASSWORD and EPRINT_HOST
(e.g. eprints.example.edu) via the shell environment.</p>
<pre><code>EPRINT_USER="__USERNAME_GOES_HERE__"
EPRINT_PASSWORD="__PASSWORD_GOES_HERE__"
EPRINT_HOST="eprints.example.edu"
eprint2rdm 118621 >article.json</code></pre>
<p>Generate a list of EPrint ids from a repository</p>
<pre><code>eprint2rdm -all-ids >eprintids.txt</code></pre>
<p>Generate a JSON document from the EPrints repository hosted as
“eprints.example.edu” for EPrint ID 118621 using a resource map file to
map the EPrints resource type to an Invenio RDM resource type and a
contributor type map for the contributors type between EPrints and
RDM.</p>
<pre><code>eprint2rdm -resource-map resource_types.csv \
-contributor-map contributor_types.csv \
eprints.example.edu 118621 \
>article.json</code></pre>
<p>Putting it together in the to harvest an EPrints repository saving
the results in a dataset collection for analysis or migration.</p>
<ol type="1">
<li>create a dataset collection</li>
<li>get the EPrint ids to harvest applying a resource type map,
“resource_types.csv” and “contributor_types.csv” for contributor type
mapping</li>
<li>Harvest the eprint records and save in our dataset collection</li>
</ol>
<pre><code>dataset init eprints.ds
eprint2rdm -all-ids >eprintids.txt
eprint2rdm -id-list eprintids.txt -harvest eprints.ds</code></pre>
<p>At this point you would be ready to improve the records in eprints.ds
before migrating them into Invenio RDM.</p>
</section>
<footer>
<span>© 2023 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span>
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
<span><a href="mailto:[email protected]">Email Us</a></span>
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
</footer>
</body>
</html>