The DataIO library opens the given files and lets you read them using an Iterator, a Stream or an RxJava Observable. The library does not load the full file into memory, except for XML and ODS files, which are loaded in full.
The easiest way to use DataIO is to include it as a Maven dependency:
<dependency>
    <groupId>be.ugent.idlab.knows</groupId>
    <artifactId>dataio</artifactId>
    <version>1.3.1</version>
</dependency>
Check the Maven Central repository for the latest version.
Alternatively, to build and install locally, run the following command:
mvn install
To run the tests, run the following command:
mvn test
Interface for opening a file or other source (remote, database, SPARQL or WoT) and obtaining the corresponding input stream.
getInputStream: opens the file and returns an input stream.
getDataTypes: returns a map of datatypes. References to values are mapped to their datatypes, if available.
getContentType: gives the content type of the access object.
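To make the three methods above concrete, here is a minimal sketch of what such an access object could look like. The names `AccessSketch`, `InMemoryAccess` and `readAll` are hypothetical, and the signatures are assumptions based only on the descriptions above; the actual DataIO interface may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for the access interface described above.
// The real DataIO signatures may differ.
interface AccessSketch {
    InputStream getInputStream() throws IOException;

    // Maps references to datatype identifiers, if available.
    Map<String, String> getDataTypes();

    String getContentType();
}

// Illustrative in-memory implementation (not part of DataIO).
final class InMemoryAccess implements AccessSketch {
    private final String content;
    private final String contentType;

    InMemoryAccess(String content, String contentType) {
        this.content = content;
        this.contentType = contentType;
    }

    @Override
    public InputStream getInputStream() {
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public Map<String, String> getDataTypes() {
        return Map.of(); // no datatype information for plain text
    }

    @Override
    public String getContentType() {
        return contentType;
    }

    // Helper that reads the whole stream as UTF-8 text.
    static String readAll(AccessSketch access) {
        try (InputStream in = access.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading everything at once, as `readAll` does, is only for demonstration; a consumer that wants to keep memory use low would read from the stream incrementally.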
Interface that extends Iterator and overrides remove and forEachRemaining, as these functions are trivial for each implementation.
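The idea of fixing `remove` and `forEachRemaining` once at the interface level can be sketched with default methods. This is an illustration under assumptions, not the library's code: `SourceIteratorSketch` and `ListSourceIterator` are hypothetical names, and the choice to reject `remove` is a guess at what "trivial" means here.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical iterator interface: implementations only need
// hasNext() and next().
interface SourceIteratorSketch<T> extends Iterator<T> {

    // Assumption: sources are read-only, so removal is rejected.
    @Override
    default void remove() {
        throw new UnsupportedOperationException("sources are read-only");
    }

    // Drains the iterator, passing each remaining element to the action.
    @Override
    default void forEachRemaining(Consumer<? super T> action) {
        while (hasNext()) {
            action.accept(next());
        }
    }
}

// Illustrative implementation backed by a list.
final class ListSourceIterator<T> implements SourceIteratorSketch<T> {
    private final Iterator<T> delegate;

    ListSourceIterator(List<T> items) {
        this.delegate = items.iterator();
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        return delegate.next();
    }
}
```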
As JSONPath is not yet standardized, compatibility issues may arise. We follow the implementation of JsonSurfer, with the following additions:
- A combination of the child operators will be reduced to a single child operator. In practice, this means that $.['child'] becomes $['child'].
- In test cases, you may find a construction like $.[*]. This construction will also be reduced to $[*].
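The reduction described in the list above can be approximated with a single string rewrite. The sketch below is an assumption about the behaviour, not DataIO's implementation, and `JsonPathNormalizer` is a hypothetical name. It deliberately leaves the recursive descent operator (`..`) alone and does not attempt to parse quoted keys.

```java
final class JsonPathNormalizer {
    private JsonPathNormalizer() {}

    // Collapses a dot child operator that is directly followed by a
    // bracket operator (".[") into the bracket operator alone ("[").
    // The negative lookbehind keeps recursive descent ("..[") intact.
    // Naive: a ".[" inside a quoted key would also be rewritten.
    static String normalize(String jsonPath) {
        return jsonPath.replaceAll("(?<!\\.)\\.\\[", "[");
    }

    public static void main(String[] args) {
        System.out.println(normalize("$.['child']"));
        System.out.println(normalize("$.[*]"));
    }
}
```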
Interface and implementations for streaming the records from sources.
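One common way to expose an iterator as a lazy `java.util.stream.Stream` uses only the JDK, as sketched below. Whether DataIO builds its streams this way is an assumption, and `IteratorStreams` is a hypothetical helper name.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class IteratorStreams {
    private IteratorStreams() {}

    // Wraps an iterator in a sequential, ordered stream. Elements are
    // pulled from the iterator only as the stream consumes them.
    static <T> Stream<T> stream(Iterator<T> iterator) {
        Spliterator<T> spliterator =
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false);
    }
}
```

Because the stream pulls from the iterator on demand, this approach fits sources that are not loaded into memory in full.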
Implementation of the Flow interface, using RxJava under the hood. Implementations of RxJava's Observable for the different records are provided.
open(args): opens the corresponding files (using an Access object) and initializes the iterator and any other values needed to create records (e.g. CSVSourceIterator initializes a header value).
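The header example can be illustrated with a small line-based iterator that reads the header once when it is set up and then produces one row at a time. This is a simplified sketch, not the CSVSourceIterator itself: `CsvLineIteratorSketch` is a hypothetical class, the constructor stands in for `open`, and splitting on commas ignores quoting and escaping.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

final class CsvLineIteratorSketch implements Iterator<Map<String, String>> {
    private final BufferedReader reader;
    private final String[] header;
    private String nextLine;

    // Plays the role of open(...): reads the header line and
    // pre-fetches the first data row.
    CsvLineIteratorSketch(Reader source) {
        this.reader = new BufferedReader(source);
        String headerLine = readLine();
        this.header = headerLine == null ? new String[0] : headerLine.split(",", -1);
        this.nextLine = readLine();
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    // Maps the current row's cells to the header names, then advances.
    @Override
    public Map<String, String> next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String[] cells = nextLine.split(",", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < header.length && i < cells.length; i++) {
            row.put(header[i], cells[i]);
        }
        nextLine = readLine();
        return row;
    }
}
```

Only the header and one pending line are held in memory at any time, which matches the goal of not loading the full file.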
Interface that generalizes access to the data.
get(String value): returns a list of objects associated with the given string value.
getDataType(String value): returns the IRI of the datatype of a reference in the record.
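A map-backed sketch shows how the two methods above might behave. `RecordSketch` and `MapRecord` are hypothetical names, and returning an empty list or `null` for unknown references is an assumption, not documented DataIO behaviour.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the record interface described above.
interface RecordSketch {
    List<Object> get(String value);

    String getDataType(String value);
}

// Illustrative implementation backed by two maps.
final class MapRecord implements RecordSketch {
    private final Map<String, List<Object>> values;
    private final Map<String, String> dataTypes;

    MapRecord(Map<String, List<Object>> values, Map<String, String> dataTypes) {
        this.values = values;
        this.dataTypes = dataTypes;
    }

    // Assumption: an unknown reference yields an empty list.
    @Override
    public List<Object> get(String value) {
        return values.getOrDefault(value, List.of());
    }

    // Assumption: null means no datatype is known for the reference.
    @Override
    public String getDataType(String value) {
        return dataTypes.get(value);
    }
}
```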
_PATH is a magic property that can be used to obtain a reverse path through the document to reach a specific object. Index notation can be used to grab a specific element.
Suppose a file people.json:
{
  "people": [
    {
      "firstName": "John",
      "lastName": "Doe",
      "phoneNumbers": [
        "0123-4567-8888",
        "0123-4567-8910"
      ]
    }
  ]
}
And suppose a JSON path $.people.[*], then a specific path for the first "people" object would be $.people.[0].
This object's _PATH property would resolve to [0,people], and _PATH[1] would resolve to people.
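The reverse path in this example can be reproduced by reversing the list of segments that lead from the document root to the object. The sketch below only illustrates that ordering; `ReversePathSketch` is a hypothetical helper and says nothing about how DataIO computes _PATH internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ReversePathSketch {
    private ReversePathSketch() {}

    // Takes the segments from the root down to the object, e.g.
    // ["people", 0] for $.people.[0], and returns them in reverse.
    static List<Object> reversePath(List<Object> segmentsFromRoot) {
        List<Object> reversed = new ArrayList<>(segmentsFromRoot);
        Collections.reverse(reversed);
        return reversed;
    }

    public static void main(String[] args) {
        List<Object> path = reversePath(List.<Object>of("people", 0));
        System.out.println(path);        // the full reverse path
        System.out.println(path.get(1)); // index notation, as in _PATH[1]
    }
}
```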