- books
- journals
- articles
- maps
- manuscripts
- sheets of music
- ...
- bibliographic descriptions
- holdings information
- references
- patron data
- ...
... catalogued in library specific formats (MARC, MAB2, PICA, ...)
... provided via library specific APIs (OAI, SRU, Z39.50, ...)
... used in diverse systems (OPACs, discovery systems, institutional repositories, link resolvers, ...)
... for a library specific metadata toolkit
... is an open collaboration of the three university libraries of Bielefeld, Ghent and Lund
... joined by developers of other institutions
... provides an open source set of programming components to build up digital libraries and research services
... supports "Extract, Transform, Load" (ETL) processes
- Items are the basic unit of data processing in Catmandu. Items may be read, stored, and accessed in many forms.
- Importers are Catmandu packages that read items into an application. They can also import from remote sources, for instance via Atom and OAI-PMH endpoints.
- Fixes transform items, massaging the data into any format you like.
- Stores are databases and search engines that store and index your data.
- Exporters are Catmandu packages that export items from an application.
- Iterables: every stream of data, whether it comes from Importers, Fixes or Stores, is an iterator. Because iterators handle one item at a time, the memory consumption of your program stays low: you can process gigabytes or terabytes of input data without running out of memory.
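The iterator idea can be illustrated in plain Perl, without Catmandu itself. This is only a minimal sketch: the helper names `make_generator` and `map_generator` are hypothetical and not part of the Catmandu API, and an in-memory file handle stands in for a large input file. Each call to the closure yields the next item, so only one item is in memory at a time.

```perl
use strict;
use warnings;

# Build a generator: a closure that returns the next item on each call
# and undef once the input is exhausted (hypothetical helper).
sub make_generator {
    my ($fh) = @_;
    return sub {
        defined( my $line = <$fh> ) or return;
        chomp $line;
        return { line => $line };
    };
}

# Wrap a generator so that every item passes through a transformation
# (hypothetical helper, loosely comparable to applying a fix).
sub map_generator {
    my ( $gen, $code ) = @_;
    return sub {
        my $item = $gen->();
        return defined($item) ? $code->($item) : undef;
    };
}

# An in-memory file handle stands in for a large input file.
my $data = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my $items = map_generator(
    make_generator($fh),
    sub { my ($item) = @_; $item->{line} = uc $item->{line}; return $item }
);

my $count = 0;
my $last;
while ( defined( my $item = $items->() ) ) {
    $count++;
    $last = $item->{line};    # the previous item can be garbage collected
    print "$item->{line}\n";
}
```

The loop never builds a list of all items, which is what keeps memory use flat regardless of input size.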
AlephX BibTeX MAB2 MARC PICA
Atom CSV JSON RDF XLS XML YAML
getJSON
OAI
SRU
Z39.50
CHI
DBI
Elasticsearch
MongoDB
Solr
catmandu <command> [-DIL] [long options...]
-D --debug
-L --load_path
-I --lib_path
Available commands:
commands: list the application's commands
help: display a command's help screen
config: export the Catmandu config
convert: convert objects
count: count the number of objects in a store
data: store, index, search, import, export or convert objects
delete: delete objects from a store
export: export objects from a store
import: import objects into a store
info: list installed Catmandu modules
move: move objects to another store
repl: interactive shell for Catmandu
$ catmandu info
$ catmandu help <command>
or
$ catmandu exporter_info
$ catmandu fix_info
$ catmandu importer_info
$ catmandu store_info
$ catmandu help <command>
catmandu convert [-?hLv] [long options...]
examples:
cat books.json | catmandu convert JSON to CSV --fields id,title
options:
-? -h --help this usage screen
-L --load_path
-v --verbose
$ cat ./shared/journals_mab2.dat | catmandu convert MAB2 to JSON
$ catmandu convert MAB2 to JSON < ./shared/journals_mab2.dat
$ catmandu convert MAB2 --type XML to JSON < ./shared/journals_mab2.xml
{
"_id" : "246797-5",
"record" : [
...
[
"331",
" ",
"_",
"UNIX-Magazin"
],
...
[
"406",
"a",
"j",
"1988",
"k",
"1992"
],
...
]
}
$ catmandu convert MARC to JSON < ./shared/camel.mrc
$ catmandu convert MARC --type RAW to JSON < ./shared/camel.mrc
$ catmandu convert MARC --type XML to JSON < ./shared/camel.xml
$ catmandu convert PICA to YAML < ./shared/pica.xml
$ catmandu convert PICA to JSON < ./shared/pica.xml
$ catmandu convert CSV to YAML < ./shared/eu_elections_2014.csv
$ catmandu convert CSV to CSV --fields Wahlbezirk,DKP,NPD < ./shared/eu_elections_2014.csv
$ catmandu convert YAML to JSON < ./shared/journals.yml
$ catmandu convert MAB2 --fix ./shared/mab2rdf.fix to CSV --file mab2.csv --fields dc_identifier,dc_title,dc_language < ./shared/journals_mab2.dat
$ catmandu convert MAB2 --fix ./shared/mab2rdf.fix to XLS --file mab2.xls --fields dc_identifier,dc_title,dc_language < ./shared/journals_mab2.dat
$ cat ./shared/test.tt
[%- FOREACH f IN record %]
[% _id %] [% f.shift %][% f.shift %][% f.shift %][% f.join(":") %]
[%- END %]
$ catmandu convert MARC to Template --template ./shared/test.tt < ./shared/camel.mrc
$ cat ./shared/marc.tt
[% _id %] [% dc.creator.0 %]: [% dc.title %]
$ catmandu convert MARC --fix ./shared/marc.fix to Template --template ./shared/marc.tt < ./shared/camel.mrc
see https://gbv.github.io/aREF/aREF.html and https://metacpan.org/pod/RDF::aREF
catmandu convert RDF --file ./shared/zdb_resources.rdf to YAML
catmandu convert MAB2 --type RAW --fix ./shared/mab2rdf.fix to RDF --type ttl < ./shared/mab2.dat
catmandu convert MAB2 --type RAW --fix ./shared/mab2rdf.fix to RDF --type xml < ./shared/mab2.dat
catmandu import [-?hLv] [long options...]
examples:
catmandu import YAML --file books.yml to MongoDB
--database_name items --bag book
options:
-? -h --help this usage screen
-L --load_path
-v --verbose
... by default all Importers expect UTF-8 encoded data
$ catmandu import MARC --type RAW --fix ./shared/marc.fix to MongoDB --database_name marc --bag marc < ./shared/camel.mrc
$ catmandu import MAB2 --fix ./shared/mab2rdf.fix to MongoDB --database_name mab --bag mab < ./shared/journals_mab2.dat
$ mongo
> use marc
> db.marc.find()
$ catmandu import MARC --type RAW --fix ./shared/marc.fix to Elasticsearch --index_name marc --bag marc < ./shared/camel.mrc
$ catmandu import MAB2 --fix ./shared/mab2rdf.fix to Elasticsearch --index_name mab --bag mab < ./shared/journals_mab2.dat
$ curl 'http://localhost:9200/mab/_search?q=*'
catmandu export [-?hLqv] [long options...]
examples:
catmandu export MongoDB --database_name items --bag book to YAML
options:
-? -h --help this usage screen
-L --load_path
-v --verbose
-q --query
--limit
$ catmandu export MongoDB --database_name mab --bag mab to JSON
$ catmandu export Elasticsearch --index_name marc --bag marc to JSON
$ catmandu export Elasticsearch --index_name mab --bag mab --query '_id:"http://example.org/1142708-5"'
catmandu count [-?hLq] [long options...]
examples:
catmandu count Elasticsearch --index_name shop --bag products
--query 'brand:Acme'
options:
-? -h --help this usage screen
-L --load_path
-q --query
$ catmandu count MongoDB --database_name mab --bag mab
$ catmandu count MongoDB --database_name marc --bag marc --query '{"dc.creator": "Wall, Larry."}'
$ catmandu count Elasticsearch --index_name mab --bag mab
$ catmandu count Elasticsearch --index_name mab --bag mab --query 'dc_title:"magazin"'
$ catmandu count Elasticsearch --index_name marc --bag marc --query 'dc.creator:"wall"'
catmandu delete [-?hLq] [long options...]
examples:
catmandu delete Elasticsearch --index_name items
--bag book -q 'title:"Programming Perl"'
options:
-? -h --help this usage screen
-L --load_path
-q --query
$ catmandu delete MongoDB --database_name mab --bag mab
$ catmandu delete Elasticsearch --index_name mab --bag mab
$ catmandu delete MongoDB --database_name marc --bag marc --query '{"dc.creator": "Wall, Larry."}'
$ catmandu delete Elasticsearch --index_name mab --bag mab --query '_id:"http://example.org/1142708-5"'
catmandu move [-?hLqv] [long options...]
examples:
catmandu move MongoDB --database_name items --bag book
to Elasticsearch --index_name items --bag book
options:
-? -h --help this usage screen
-L --load_path
-v --verbose
-q --query
--limit
$ catmandu move MongoDB --database_name marc --bag marc to Elasticsearch --index_name moved
$ catmandu move MongoDB --database_name marc --bag marc --query '{"dc.creator": "Wall, Larry."}' to Elasticsearch --index_name moved
$ catmandu move Elasticsearch --index_name mab --bag mab --query '_id:"http://example.org/1142708-5"' to Elasticsearch --index_name selected --bag selected
catmandu data [-?hLqv] [long options...]
-? -h --help this usage screen
-L --load_path
--from-store
--from-importer
--from-bag
--count
--into-exporter
--into-store
--into-bag
--start
--limit
--total
-q --cql-query
--query
--fix fix expression(s) or fix file(s)
--replace
-v --verbose
$ catmandu data --from-store MongoDB --from-database_name marc --from-bag marc --query '{"dc.creator": "Wall, Larry."}'
$ catmandu data --from-store Elasticsearch --from-index_name marc --query 'dc.creator:"Wall, Larry."'
$ catmandu data --from-store Elasticsearch --from-index_name mab --from-bag mab --cql-query 'publisher exact Heise'
$ catmandu data --from-store Elasticsearch --from-index_name mab --from-bag mab --cql-query 'issued > 2009' --into-exporter YAML
$ catmandu data --from-store Elasticsearch --from-index_name mab --from-bag mab --cql-query 'issued > 2009' --into-exporter CSV --fix 'retain_field("_id")'
$ catmandu convert OAI --url http://pub.uni-bielefeld.de/oai to JSON
$ catmandu convert SRU --base http://sru.gbv.de/gvk --recordSchema picaxml --parser picaxml --query "pica.iss=0939-4362" to JSON
$ catmandu convert getJSON --from http://example.org/alice.json to YAML
$ catmandu convert getJSON --dry 1 --url http://{domain}/robots.txt < domains
$ cat catmandu.yml
---
store:
mdb:
package: MongoDB
options:
database_name: mydb
els:
package: Elasticsearch
options:
index_name: mydb
$ catmandu import JSON to mdb < records.json
$ catmandu import MARC to els < records.mrc
$ catmandu export mdb to JSON
$ catmandu export els to JSON
- convert data
- store data
- query data
- get data
- edit config
... easy data manipulation by non-programmers
... a small domain-specific language (DSL) implemented in Perl
$append - Add a new item at the end of an array
$prepend - Add a new item at the start of an array
$first - Syntactic sugar for index '0' (the head of the array)
$last - Syntactic sugar for index '-1' (the tail of the array)
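For readers who think in Perl, the four path wildcards roughly correspond to ordinary array operations. The sketch below uses plain Perl arrays only. It is an analogy, not how Catmandu implements paths, and the sample names are made up for illustration.

```perl
use strict;
use warnings;

# Sample data, invented for this illustration.
my @authors = ( 'Wall, Larry', 'Christiansen, Tom' );

push @authors, 'Orwant, Jon';            # $append:  new item at the end
unshift @authors, 'Schwartz, Randal';    # $prepend: new item at the start

my $first = $authors[0];     # $first: index 0, the head of the array
my $last  = $authors[-1];    # $last:  index -1, the tail of the array

print "first: $first\n";
print "last:  $last\n";
```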
marc_map('008_/35-38','language');
marc_map('100','authors.$append');
marc_map('245[10]a','title');
marc_map('260b','publisher');
marc_map('650a','subject', -join => '; ');
remove_field('record');
mab_map('001','identifier');
mab_map('002[a]','date');
mab_map('037[b]','language');
mab_map('050[ ]','format');
mab_map('052[ ]_/0-0','type');
mab_map('331[ ]','title');
mab_map('406jk','coverage.$append', -join => ' - ');
mab_map('700[bc]','subject.$append');
remove_field('record');
pica_map('001A0','date');
pica_map('010@a','language');
pica_map('009Qa','primaryTopicOf.$append');
pica_map('027A[01]a','varyingFormOfTitle');
remove_field('record');
add_field('name','Smith');
# { name => 'Smith' }
set_field('name','Doe');
# { name => 'Doe'}
copy_field('name','title');
# { name => 'Doe', title => 'Doe' }
remove_field('title');
# { name => 'Doe' }
move_field('name','dc.creator');
# { dc => { creator => 'Doe' } }
retain_field('dc.creator');
# delete every field except the named field
# { subjects => 'Perl,R,JavaScript' }
split_field('subjects',',');
sort_field('subjects');
# { subjects => ['JavaScript', 'Perl', 'R'] }
join_field('subjects','; ');
# { subjects => 'JavaScript; Perl; R' }
# { name => 'Doe'}
upcase('name');
# { name => 'DOE' }
downcase('name');
# { name => 'doe' }
capitalize('name');
# { name => 'Doe' }
append('name',', John');
# { name => 'Doe, John' }
prepend('name','Dr. ');
# { name => 'Dr. Doe, John' }
# { name => ' Doe, '}
trim('name');
# { name => 'Doe,' }
trim('name','nonword');
# { name => 'Doe' }
substring('name', 0, 1);
# { name => 'D' }
# {format => 'MARC21'}
replace_all('format', '\d', '');
# {format => 'MARC'}
# {id => ['123-4', '567-X']}
replace_all('id.*', '-[0-9xX]$', '');
# {id => ['123', '567']}
# { numbers => [1, 2, 3] }
copy_field('numbers','count');
count('count');
copy_field('numbers','sum');
sum('sum');
# { numbers => [1, 2, 3], count => 3, sum => 6 }
$ cat dict.csv
004,Informatik
310,Statistik
510,Mathematik
# { ddc => '004' }
lookup('ddc', 'dict.csv', -default=>'Allgemeines');
lookup('ddc', 'dict.csv', -delete=>'1');
# { ddc => 'Informatik' }
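The behaviour of a dictionary lookup with the `-default` and `-delete` options can be approximated in plain Perl. This is a sketch under assumptions: `lookup_field` is a hypothetical helper, not the Catmandu implementation, the dictionary is hard-coded instead of being read from dict.csv, and giving `delete` precedence over `default` is a choice made here for illustration.

```perl
use strict;
use warnings;

# Dictionary as it might be loaded from dict.csv (key,value per line).
my %dict = (
    '004' => 'Informatik',
    '310' => 'Statistik',
    '510' => 'Mathematik',
);

# Hypothetical helper: replace a field value by its dictionary entry.
# On a miss, either delete the field or fall back to a default value.
sub lookup_field {
    my ( $item, $field, $dict, %opts ) = @_;
    return $item unless exists $item->{$field};
    my $key = $item->{$field};
    if ( exists $dict->{$key} ) {
        $item->{$field} = $dict->{$key};
    }
    elsif ( $opts{delete} ) {
        delete $item->{$field};
    }
    elsif ( exists $opts{default} ) {
        $item->{$field} = $opts{default};
    }
    return $item;
}

my $hit     = lookup_field( { ddc => '004' }, 'ddc', \%dict );
my $default = lookup_field( { ddc => '999' }, 'ddc', \%dict, default => 'Allgemeines' );
my $deleted = lookup_field( { ddc => '999' }, 'ddc', \%dict, delete => 1 );

print "hit:     $hit->{ddc}\n";
print "default: $default->{ddc}\n";
print "deleted: ", ( exists $deleted->{ddc} ? 'kept' : 'removed' ), "\n";
```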
lookup_in_store('ddc', 'MongoDB', -database_name => 'lookups');
if_exists('ddc');
lookup('ddc', 'dict.csv', -delete=>'1');
end();
unless_exists('ddc');
add_field('ddc', '000');
end();
if_any_match('ddc', '004');
set_field('subject', 'Informatik');
end();
unless_any_match('subject', '[a-zA-Z]+');
lookup('subject', 'dict.csv', -delete=>'1');
end();
add_field('dc.title','code4lib');
add_field('dc.subject.$append', 'Computer');
add_field('dc.subject.$append', 'Informatik');
add_field('dc.subject.$append', 'Bibliothek');
add_field('dc.identifier.$append.zdbid','2415107-5');
add_field('dc.identifier.$append.ocn','502377032');
add_field('dc.identifier.$append.issn','1940-5758');
remove_field('dc.identifier.$first');
remove_field('dc.subject.1');
remove_field('dc.subject.*');
# Collapse deep nested hash to a flat hash
collapse();
# Expand flat hash to deep nested hash
expand();
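What collapsing a nested structure means can be sketched in plain Perl. The function `collapse_hash` below is a hypothetical stand-in, not the Catmandu implementation. It assumes that nested keys and array indices are joined with dots, as in the paths used elsewhere in these slides, and it shows only the collapse direction.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten nested hashes and arrays into one hash
# whose keys are dotted paths.
sub collapse_hash {
    my ( $data, $prefix, $flat ) = @_;
    $flat //= {};
    if ( ref $data eq 'HASH' ) {
        for my $key ( sort keys %$data ) {
            my $path = defined($prefix) ? "$prefix.$key" : $key;
            collapse_hash( $data->{$key}, $path, $flat );
        }
    }
    elsif ( ref $data eq 'ARRAY' ) {
        for my $i ( 0 .. $#$data ) {
            my $path = defined($prefix) ? "$prefix.$i" : $i;
            collapse_hash( $data->[$i], $path, $flat );
        }
    }
    else {
        $flat->{$prefix} = $data;    # leaf value
    }
    return $flat;
}

my $flat = collapse_hash(
    {
        dc => {
            title   => 'code4lib',
            subject => [ 'Computer', 'Informatik' ],
        },
    }
);

print "$_ = $flat->{$_}\n" for sort keys %$flat;
```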
# Clone the Perl hash and work on the clone
clone();
# Use an external program that can read JSON
# from stdin and write JSON to stdout
cmd("java MyClass");
... provide processing hooks around Fix functions
$ echo "{}" | catmandu convert --fix 'meow()'
{ "meow": "Prrr" }
$ echo "{}" | catmandu convert --fix 'do bark() meow() end'
woof! woof!
{ "meow": "Prrr" }
- import & fix data
- export & fix data
...
- export data to RDF
├── Catmandu
│ ├── Cmd
│ │ └── foo.pm
│ ├── Exporter
│ │ ├── Foo.pm
│ ├── Fix
│ │ ├── foo_map.pm
│ ├── Importer
│ │ ├── Foo.pm
│ ├── Store
│ │ ├── Foo
│ │ │ ├── Bag.pm
│ │ │ └── Searcher.pm
│ │ ├── Foo.pm
package Catmandu::Importer::Hello;
use Catmandu::Sane;
use Moo;
with 'Catmandu::Importer';
# Return a closure that yields one item per input line
sub generator {
    my ($self) = @_;
    return sub {
        # Stop at end of input
        defined( my $line = $self->readline ) or return;
        chomp $line;
        # Use the first comma-separated column as the name
        my ($name) = split( ',', $line );
        return $name
            ? { "hello" => $name }
            : { "hello" => 'World' };
    };
}
1;
package Catmandu::Fix::hello_world;
use Moo;
sub fix {
my ($self,$data) = @_;
$data->{hello} = 'World';
return $data;
}
1;
package Catmandu::Cmd::hello_world;
use Catmandu::Sane;
use parent 'Catmandu::Cmd';
# Command line options accepted by this command
sub command_opt_spec {
    (
        [ "greeting|g=s", "provide a greeting text" ],
    );
}
# Text shown by "catmandu help hello_world"
sub description {
    <<EOS;
examples:
catmandu hello_world --greeting "Hoi"
options:
EOS
}
# Entry point: print the greeting, defaulting to 'Hello'
sub command {
    my ($self, $opts, $args) = @_;
    my $greeting = $opts->greeting // 'Hello';
    print "$greeting, World!\n";
}
1;
catmandu -I ./lib convert Hello < ./shared/names.csv
catmandu -D -I ./lib convert Hello --fix "hello_world()" < ./shared/names.csv
catmandu -I ./lib hello_world --greeting Moin
http://metacpan.org/release/Catmandu
http://github.com/LibreCat/Catmandu
Comic by Randall Munroe, CC BY-NC 2.5