WARNING: WORK IN PROGRESS - THIS IS ONLY A TEMPLATE FOR THE DOCUMENTATION.
RELEASE DOCS ARE ON THE PROJECT WEBSITE
This release allows to search Ckan, write into it and convert CKAN metadata into DCAT format. If you are upgrading from previous version, see Release notes.
With Maven: If you use Maven as build system, put this in the dependencies
section of your pom.xml
:
<dependency>
<groupId>eu.trentorise.opendata</groupId>
<artifactId>jackan</artifactId>
<version>#{version}</version>
</dependency>
Without Maven: you can download Jackan jar and its dependencies from here, then copy the jars to your project classpath.
In case updates are available, version numbers follow semantic versioning rules.
Code can be found in TestApp1.java
import eu.trentorise.opendata.jackan.CkanClient;
public class TestApp1 {
public static void main(String[] args) {
CkanClient cc = new CkanClient("http://dati.trentino.it");
System.out.println(cc.getDatasetList());
}
}
Code can be found in TestApp2.java
import eu.trentorise.opendata.jackan.CkanClient;
import eu.trentorise.opendata.jackan.model.CkanDataset;
import eu.trentorise.opendata.jackan.model.CkanResource;
import java.util.List;
public class TestApp2 {
public static void main(String[] args) {
CkanClient cc = new CkanClient("http://dati.trentino.it");
List<String> ds = cc.getDatasetList(10, 0);
for (String s : ds) {
System.out.println();
System.out.println("DATASET: " + s);
CkanDataset d = cc.getDataset(s);
System.out.println(" RESOURCES:");
for (CkanResource r : d.getResources()) {
System.out.println(" " + r.getName());
System.out.println(" FORMAT: " + r.getFormat());
System.out.println(" URL: " + r.getUrl());
}
}
}
}
Should give something like this:
DATASET: abitazioni
RESOURCES:
abitazioni
FORMAT: JSON
URL: http://www.statweb.provincia.tn.it/INDICATORISTRUTTURALISubPro/exp.aspx?idind=133&info=d&fmt=json
abitazioni
FORMAT: CSV
URL: http://dati.trentino.it/storage/f/2013-06-16T113651/_lcmGkp.csv
numero-di-abitazioni
FORMAT: JSON
URL: http://www.statweb.provincia.tn.it/INDICATORISTRUTTURALISubPro/exp.aspx?ntab=Sub_Numero_Abitazioni&info=d&fmt=json
numero-di-abitazioni
FORMAT: CSV
URL: http://dati.trentino.it/storage/f/2013-06-16T113652/_yWBmJG.csv
DATASET: abitazioni-occupate
RESOURCES:
abitazioni-occupate
FORMAT: JSON
URL: http://www.statweb.provincia.tn.it/INDICATORISTRUTTURALISubPro/exp.aspx?idind=134&info=d&fmt=json
abitazioni-occupate
FORMAT: CSV
URL: http://dati.trentino.it/storage/f/2013-06-16T113653/_iaMMc2.csv
numero-di-abitazioni-occupate
FORMAT: JSON
URL: http://www.statweb.provincia.tn.it/INDICATORISTRUTTURALISubPro/exp.aspx?ntab=Sub_Numero_Abitazioni_Occupate&info=d&fmt=json
numero-di-abitazioni-occupate
FORMAT: CSV
URL: http://dati.trentino.it/storage/f/2013-06-16T113654/__lLACk.csv
...
Code can be found in TestApp3.java
import eu.trentorise.opendata.jackan.CkanClient;
import eu.trentorise.opendata.jackan.CkanQuery;
import eu.trentorise.opendata.jackan.model.CkanDataset;
import java.util.List;
public class TestApp3 {
public static void main(String[] args) {
CkanClient cc = new CkanClient("http://dati.trentino.it");
CkanQuery query = CkanQuery.filter().byGroupNames("turismo").byTagNames("ristoranti");
List<CkanDataset> filteredDatasets = cc.searchDatasets(query, 10, 0).getResults();
for (CkanDataset d : filteredDatasets) {
System.out.println();
System.out.println("DATASET: " + d.getName());
}
}
}
Should give something like this:
DATASET: osterie-tipiche-trentine
DATASET: poi-trento
DATASET: punti-di-interesse-valsugana
DATASET: poi-altopiano-di-pine-e-valle-di-cembra
DATASET: punti-di-ristoro-vivifiemme-2013
First, a brief recap of operations for writing offered by ckan:
- create: in ckan creation often acts more as upsert, that is, if object with existing id/name already exists it is updated
- update: update completely replaces stuff on server, and if you don't send a list or set it to null it gets emptied on the server. This can problematic for example when updating datasets containg a list of resources.
- patch: added in Ckan 2.3 for less destructive updates. Jackan does not implement
patch
operations so far and instead offers so-calledpatch-update
operations that emulatepatch
but only by callingupdate
(so they work also in ckan < 2.3) - delete: marks objects as non-visible in the website and api. To really delete things
purge
operations would need to be implemented. - purge: this one really deletes stuff
Currently Jackan supports:
create | update | patch | patch update | delete | purge | |
---|---|---|---|---|---|---|
Resource | X* | X* | X* | X | ||
Dataset | X | X | X | X | ||
Group | X | |||||
Organization | X | |||||
User | X | |||||
Tag | X | |||||
Vocabulary | X |
*Resource create
and update
also allow uploading/modifying files. To upload files you will need a recent version of Ckan (we tested it and worked with 2.5.2 in demo.ckan.org, but couldn't make it work with version 2.2a)
Sometimes Ckan forgets to properly validate input. For example, at least with Ckan 2.2a we have been able to create resources with empty id :-/ To prevent writing such garbage we extended default CkanClient
with CheckedCkanClient
, which is more picky about possibly inconsistent input. If you also care about data integrity you might want to use the Checked client or extend it with your own validation rules when writing into Ckan. To try how different clients behave against the extensive Jackan test suite when running tests we set the client client class to use as parameter jackan.test.ckan.client-class=eu.trentorise.opendata.jackan.CheckedCkanClient
in conf/jackan.test.properties
Maybe in the future we will implement also java.validation api support.
All writable classes have an ancestor with "Base"
appended to the Ckan object name, like CkanDatasetBase
.
When writing Jackan sends to Ckan only the non-null fields of such base classes (except for patch-update, which is more sophisticated). Notice CKAN instances might have custom data schemas that force presence of custom properties among 'regular' ones. In this case, they go to java others
hashmap and when serialized are put into the main json body (Note that to further complicate things there is also an extras
field).
Many test cases for writing can be found in WriteCkan*IT.java files. Here we just report a couple of them.
// here we use CheckedCkanClient for extra safety
CkanClient myClient = new CheckedCkanClient("http://put-your-catalog.org", "put your ckan api key token");
CkanDatasetBase dataset = new CkanDatasetBase();
dataset.setName("my-cool-dataset-" + new Random().nextLong());
// notice Jackan will only send field 'name' as it is non-null
CkanDataset createdDataset = myClient.createDataset(dataset);
checkNotEmpty(createdDataset.getId(), "Invalid dataset id!");
assertEquals(dataset.getName(), createdDataset.getName());
System.out.println("Dataset is available online at " + CkanClient.makeDatasetURL(myClient.getCatalogURL(), dataset.getName()));
Shows Jackan-specific patch-update functionality, in this case for changing tags assigned to a dataset (and also shows that new free tags can be created at dataset creation)
// here we use CheckedCkanClient for extra safety
CkanClient myClient = new CheckedCkanClient("http://put-your-catalog.org", "put your ckan api key token");
// we create a dataset with one tag 'cool'
CkanDatasetBase dataset = new CkanDatasetBase("my-dataset-" + new Random().nextLong());
List<CkanTag> tags_1 = new ArrayList();
tags_1.add(new CkanTag("cool"));
dataset.setTags(tags_1);
CkanDataset createdDataset = myClient.createDataset(dataset);
// now we assign a new array with one tag ["amazing"]
List<CkanTag> tags_2 = new ArrayList();
tags_2.add(new CkanTag("amazing"));
createdDataset.setTags(tags_2);
// let's patch-update, jackan will take care of merging tags to prevent erasure of 'cool'
CkanDataset updatedDataset = myClient.patchUpdateDataset(createdDataset);
assert 2 == updatedDataset.getTags().size(); // 'amazing' has been added to ['cool']
System.out.println("Merged tags = "
+ updatedDataset.getTags().get(0).getName()
+ ", " + updatedDataset.getTags().get(1).getName());
System.out.println("Updated dataset is available online at " + CkanClient.makeDatasetURL(myClient.getCatalogURL(), dataset.getName()));
For ser/deserializing JSON there are two kinds of configurations, a default one for reading from ckan and one for writing (that is, POSTing). Most probably you are interested in the default one.
Jackson library annotations are used to automatically convert to/from JSON using Jackson's ObjectMapper
object. Notice that although field names of Java objects are camelcase (like authorEmail
), serialized fields follows CKAN API stlye and use underscores (like author_email
).
Here is an example of serialization/deserialization:
// your Jackson ObjectMapper
ObjectMapper objectMapper = new ObjectMapper();
CkanClient.configureObjectMapper(objectMapper);
CkanDataset dataset = new CkanDataset();
dataset.setName("hello");
String json = objectMapper.writeValueAsString(dataset);
System.out.println("json = " + json);
CkanDataset reconstructed = objectMapper.readValue(json, CkanDataset.class);
assert "hello".equals(reconstructed.getName());
For more fine-grained control you can just register JackanModule
into your Jackson object mapper:
ObjectMapper objectMapper = new ObjectMapper();
objectMapper.registerModule(new JackanModule());
This more advanced usage is for the case you want to do your own POST operations (create/update/delete/purge) to ckan (or maybe extend Jackan :-) ...
Notice for this you might need a different object mapper for each class you intend to post, so to be able to configure each mapper in a fine-grained way. You can find an example for datasets in method CkanClient.configureObjectMapperForPosting
:
ObjectMapper mapperForDatasetPosting = new ObjectMapper();
CkanClient.configureObjectMapperForPosting(mapperForDatasetPosting, CkanDatasetBase.class);
CkanDataset dataset = new CkanDataset("random-name-" + new Random().nextLong());
// this would be the POST body.
String json = mapperForDatasetPosting.writeValueAsString(dataset);
CKAN uses timestamps in a format like 1970-01-01T01:00:00.000010
. In the client we store them as java.sql.Timestamp
so to be able to preserve the microseconds. To parse/format Ckan timestamps, use
CkanClient.formatTimestamp(new Timestamp(123));
CkanClient.parseTimestamp("1970-01-01T01:00:00.000010");
DCAT is an emerging W3C standard for representing catalog metadata. For this reason, when we use Jackan we usually convert Ckan objects to their DCAT representation, which gives us a consistent well defined view of open data catalogs.
There has long been a plugin for ckan to serve metadata as rdf in DCAT format, but according to maintainer (July 2015):
Historically you have been able to access an RDF representation of a CKAN
dataset metadata by navigating to /dataset/{id}.rdf or /dataset/{id}.n3.
These were rendered using templates, and were outdated, incomplete and
broken [1].
Situation on ckan side is getting much better with the new version of the plugin in progress, but we cannot expect all CKAN instances around the world to adopt it now. So currently we provide a class to convert from CKAN objects to their DCAT equivalent called DcatFactory. It will convert a CkanDataset
to a DcatDataset
and a CkanResource
to a DcatDistribution
according to this mapping.
Examples code:
DcatFactory dcatFactory = new DcatFactory();
CkanDataset ckanDataset = new CkanDataset("my-dataset");
DcatDataset dcatDataset
= dcatFactory.makeDataset(
ckanDataset,
"http://dati.trentino.it",
Locale.ITALIAN); // default locale of metadata
CkanResource ckanResource = new CkanResource(
"http://my-department.org/expenses.csv",
"my-dataset");
DcatDistribution dcatDistribution
= dcatFactory.makeDistribution(
ckanResource,
"http://dati.trentino.it",
"my-dataset", // owner dataset id
"cc-zero", // license id
Locale.ITALIAN); // default locale of metadata
To extract more stuff during conversion, you can use GreedyDcatFactory or extend DcatFactory and override the extract* and/or postProcess* methods.
Jackan uses native Java logging system (JUL). If you also use JUL in your application and want to see Jackan logs, you can take inspiration from jackan test logging properties. If you have an application which uses SLF4J logging system, you can route logging with JUL to SLF4J bridge, just remember to programmatically install it first.