theme | class | headingDivider |
---|---|---|
default | invert | 2 |
Original source is found at https://github.com/jbee/json-streama/blob/main/PRESENTATION.md
Here: about processing JSON input, for example a data import
Typical basic case:
{
"header": {
"id": "xyz",
"name": "collection"
},
"entries": [{...}, {...}, ...]
}
Mapping
JSON => Objects => process(Objects)
Streaming (or stream processing)
JSON => process(JSON*)
*: an access "wrapper" API that reads the input as it is processed
One could also think of it as:
JSON => new Wrapper(JSON) => process(Wrapper)
Pros (mapping):
- object graph as API
- familiar
- order does not matter
- navigate graph freely
- DTOs (high re-usability)
- straightforward (not very error-prone)
- easy(er) to maintain 😍
- well-supported in common libraries (e.g. jackson)
- usually easy to extend and customise the mapping
Cons (mapping):
- object graph
  - larger JSON => larger graph => more memory => OutOfMemoryError 🥴
- slow(er) 😴
- needs guard against too large input
Pros (streaming):
- no intermediate representation
- can handle almost any size input (GB or even beyond) 😎
- fast(er)
- no guard against large input required
Cons (streaming):
- usually library specific or hand-crafted API
- usually requires custom coded processing
- one-off solution (low re-usability)
- not straightforward (error-prone)
- hard(er) to maintain 🤯
- in streams order is significant by nature
- streams by nature are transient
What we are used to:
- for JSON order of object members does not matter
Reality:
- In streams order of object members matters
- (or the code gets very complicated, and even then order only stops mattering for primitive members)
What we are used to in OO:
- Data can be accessed at any time and does not change unless we change it ourselves
Reality:
- Stream values are only present at that point in the stream.
- Remembering previous values must be coded for each case.
For example: https://github.com/dhis2/dhis2-core/pull/9574/files
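The transient nature described above can be seen with the JDK's own `Stream`, independent of any JSON library. This is a minimal sketch: the class and method names (`TransientStreamDemo`, `secondReadFails`, `remember`) are made up for illustration. A second pass over the same stream fails, and keeping values around means copying them explicitly.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TransientStreamDemo {

  /** Returns true if reading the same stream a second time fails. */
  static boolean secondReadFails(Stream<String> entries) {
    entries.forEach(entry -> { /* first pass consumes every value */ });
    try {
      entries.forEach(entry -> { /* second pass: the values are gone */ });
      return false;
    } catch (IllegalStateException alreadyConsumed) {
      return true;
    }
  }

  /** Remembering values has to be coded explicitly, here by copying them into a list. */
  static List<String> remember(Stream<String> entries) {
    return entries.collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(secondReadFails(Stream.of("a", "b"))); // prints "true"
    System.out.println(remember(Stream.of("a", "b")));        // prints "[a, b]"
  }
}
```

Copying into a list brings back the memory cost that streaming was meant to avoid, so in practice only the few values that are needed later get remembered.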
Jackson's "Wrapper" is JsonParser
Properties:
- token based (object-start, object-end, ...)
- very low level
- quite error-prone
- quite hard to understand and maintain
- one-off solution per input JSON format
- order very important => difficult to code
- internal state tracking nightmare 😱
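Sketch using Jackson's token API:
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

try (JsonParser parser = new JsonFactory().createParser(inputStream)) {
  JsonToken token;
  while ((token = parser.nextToken()) != null) { // null signals end of input
    String value = null;
    switch (token) {
      case START_OBJECT: // what state are we in again?
        break;
      case VALUE_STRING: // what state are we in again?
        value = parser.getText();
        break;
      default: // all other tokens are ignored in this sketch
        break;
    }
    // what to do with the value now?
    // and... what state are we in again?
  }
}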
You have to wonder...
- what is the state in the parser???
- what is the state of the processing code variables???
... to have the best of both worlds?
- fast
- low-footprint (memory, CPU)
- can handle large input
- easy to understand
- easy to use and extend
- re-usable "objects", composable
- "OO think" and Java types
- order is (mostly) irrelevant (in streams it matters by nature)
- values are (more) stable (again, in streams they can't be by nature)
- no internal and variable state tracking in our brains
- no JSON level concerns in our code => we want to deal with objects
Fake it 'til you make it 😎
Let's just pretend we have objects... use interfaces to model an object graph API:
interface Payload {
Header header();
Stream<Entry> entries();
}
interface Header {
String id();
String name();
}
interface Entry { /* ... */ }
{
"header": {
"id": "xyz",
"name": "collection"
},
"entries": [{...}, {...}, ...]
}
Usage:
Payload root = JsonStream.ofRoot(Payload.class, inputStream);
Header header = root.header();
root.entries().forEach(entry -> {
//...
});
- 1: the interfaces are "auto"-implemented using Java Proxy
- ½: order and reading are given/driven by usage (method calls on the proxy API)
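To make the Proxy idea concrete, here is a minimal sketch using only `java.lang.reflect.Proxy`. It is not the library's implementation: the helper `proxyOf` and the backing `Map` are assumptions for illustration, standing in for values that would really be read from the JSON input on demand.

```java
import java.lang.reflect.Proxy;
import java.util.Map;

class ProxySketch {

  interface Header {
    String id();
    String name();
  }

  /**
   * Creates an implementation of the given interface at runtime.
   * Each "getter" call is answered by looking up the method name in the map.
   * Note: Object methods such as toString() are not handled in this sketch.
   */
  static <T> T proxyOf(Class<T> type, Map<String, ?> values) {
    Object proxy = Proxy.newProxyInstance(
        type.getClassLoader(),
        new Class<?>[] {type},
        (self, method, args) -> values.get(method.getName()));
    return type.cast(proxy);
  }

  public static void main(String[] args) {
    Header header = proxyOf(Header.class, Map.of("id", "xyz", "name", "collection"));
    System.out.println(header.id() + " " + header.name()); // prints "xyz collection"
  }
}
```

Because every access goes through the invocation handler, the handler decides when and how much input to read, which is what lets the calling code drive the parsing.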
- can primitive fields be accessed in a random order? Yes
- can there be multiple stream processed collections? Yes
- can stream processing be nested? Yes
- can the JSON contain members that are not "mapped"? Yes
- can stream processing be applied to JSON arrays and object "maps"? Yes
- can the root be an array or object "map"? Yes
- can stream processed entries use other Java types than Stream? Yes, supported are:
  - Stream (Stream<Entry> entries())
  - Iterator (Iterator<Entry> entries())
  - Consumer (void entries(Consumer<Entry> forEachEntry))
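The three access styles listed above can all sit on top of one lazy source. The following sketch uses only JDK classes and assumes an `Iterator` as that source. `EntryAccessStyles`, `asStream` and `forEach` are hypothetical names, not the library's internals.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class EntryAccessStyles {

  /** Stream style: wraps the iterator without reading ahead or buffering items. */
  static <T> Stream<T> asStream(Iterator<T> source) {
    Spliterator<T> spliterator =
        Spliterators.spliteratorUnknownSize(source, Spliterator.ORDERED);
    return StreamSupport.stream(spliterator, false); // sequential stream
  }

  /** Consumer style: pushes each remaining item to the callback as it is read. */
  static <T> void forEach(Iterator<T> source, Consumer<T> forEachEntry) {
    source.forEachRemaining(forEachEntry);
  }
}
```

All three styles consume the same underlying items exactly once, so whichever one is used, the entries are gone afterwards.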
Map-of-objects style:
[{ "1": {"name": "Track 1"}, "2": {"name": "Track 2"}}]
can be mapped directly to Stream<Track>:
interface Track {
@JsonProperty(key = true)
int no();
String name();
}
Still, this array-of-objects style input also works:
[{"no": 1, "name": "Track 1"}, {"no": 2, "name": "Track 2"}]
Also, a property named key does not require the annotation.
For example, tracks of a music album:
{"tracks": [{"name": "A"}, {"name": "B"}]}
Make properties mandatory using required:
interface Track {
@JsonProperty(required = true)
String name();
}
Restricting size using minOccur and maxOccur:
interface Album {
@JsonProperty(minOccur = 1, maxOccur = 25)
Stream<Track> tracks();
}
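Occurrence bounds like these can be checked while streaming, without collecting the items first. The sketch below shows one way this might be done; it is an assumption for illustration, not the library's code. `OccurrenceGuard` is a hypothetical name and the use of `IllegalStateException` is a guess.

```java
import java.util.Iterator;

class OccurrenceGuard<T> implements Iterator<T> {

  private final Iterator<T> source;
  private final int minOccur;
  private final int maxOccur;
  private int seen = 0;

  OccurrenceGuard(Iterator<T> source, int minOccur, int maxOccur) {
    this.source = source;
    this.minOccur = minOccur;
    this.maxOccur = maxOccur;
  }

  @Override
  public boolean hasNext() {
    if (source.hasNext()) {
      return true;
    }
    // the end of the input is the first point where "too few" can be known
    if (seen < minOccur) {
      throw new IllegalStateException("fewer than " + minOccur + " items");
    }
    return false;
  }

  @Override
  public T next() {
    // "too many" is detected as soon as the item beyond the limit is requested
    if (seen == maxOccur) {
      throw new IllegalStateException("more than " + maxOccur + " items");
    }
    T item = source.next();
    seen++;
    return item;
  }
}
```

With streaming, a violation can only surface while the items are being processed, so entries before the failing point have already been handled when the exception is thrown.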
If a value is not in the JSON input, return a default
Simply give the "getter" a parameter of the return type:
interface Track {
String name(String defaultValue);
}
String name = track.name("unknown");
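How a proxy might honour such a default parameter can be sketched with plain `java.lang.reflect.Proxy`. This is an illustration under assumptions, not the library's code: `trackOf` and the backing `Map` are hypothetical stand-ins for values read from the JSON input.

```java
import java.lang.reflect.Proxy;
import java.util.Map;

class DefaultValueSketch {

  interface Track {
    String name(String defaultValue);
  }

  /** Creates a Track whose getter falls back to its argument when the value is absent. */
  static Track trackOf(Map<String, String> values) {
    return (Track) Proxy.newProxyInstance(
        Track.class.getClassLoader(),
        new Class<?>[] {Track.class},
        (self, method, args) -> {
          Object value = values.get(method.getName());
          // args is null for methods without parameters
          boolean hasDefault = args != null && args.length == 1;
          return value == null && hasDefault ? args[0] : value;
        });
  }

  public static void main(String[] args) {
    System.out.println(trackOf(Map.of()).name("unknown"));          // prints "unknown"
    System.out.println(trackOf(Map.of("name", "A")).name("unknown")); // prints "A"
  }
}
```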
- is malformed JSON detected? Yes, => exception
- are accidental "out of order" usages of proxy objects detected? Yes, => exception
- are stream members out of order detected? Yes, => exception
- can the supported Java types be extended? Yes (WIP)
______ __
| _ \ / |
| | | |___ _ __ ___ ___ `| |
| | | / _ \ '_ ` _ \ / _ \ | |
| |/ / __/ | | | | | (_) | _| |_
|___/ \___|_| |_| |_|\___/ \___/
- fast
- low-footprint (memory, CPU)
- can handle large input
- easy to understand
- easy to use and extend
- re-usable "objects", composable
- "OO think" and Java types
- order is (mostly) irrelevant (in streams it matters by nature)
- values are (more) stable (again, in streams they can't be by nature)
- no internal and variable state tracking in our brains
- no JSON level concerns in our code => we want to deal with objects
Noice!
- Proxies: only one is created per member (independent of the number of items in a stream)
- Simple values: stored in a reused array per member and accessed by index
- Parser:
  - zero look-ahead
  - "self-suspending" PEG parser
______ _____
| _ \ / __ \
| | | |___ _ __ ___ ___ `' / /'
| | | / _ \ '_ ` _ \ / _ \ / /
| |/ / __/ | | | | | (_) | ./ /___
|___/ \___|_| |_| |_|\___/ \_____/
Work in progress:
- IntStream, LongStream, ...
- Stream<String>, Stream<Date>, ...
- List<String>, Map<String,Integer> ...
- manual skipping