Heelflip is a JSON aggregator library for Java. Aggregations are useful but expensive to compute. Instead of loading JSON files into a relational or NoSQL database just to calculate aggregations, you can use this library to do the task for you. Heelflip aggregates values either in-memory or using Redis.
Heelflip is available from the Maven Central Repository:
<dependency>
<groupId>com.github.greatjapa</groupId>
<artifactId>heelflip</artifactId>
<version>1.3</version>
</dependency>
Consider the following bookstore JSON sample:
{"name":"The Odyssey", "author":"Homer", "genre":"poem", "inStock":true, "price":12.50}
{"name":"The Godfather","author":"Mario Puzo", "genre":"novel", "inStock":true, "price":6.49 }
{"name":"Moby-Dick", "author":"Herman Melville","genre":"novel", "inStock":false,"price":3.07 }
{"name":"Emma", "author":"Austen", "genre":"novel", "inStock":true, "price":30.50}
We can read these entries as follows:
try(InputStream stream = new FileInputStream("bookstore.json")){
JsonAgg jsonAgg = new JsonAgg();
jsonAgg.loadNDJSON(stream);
...
}
Once you have a JsonAgg object, you can get global aggregations (min, max and sum) as follows:
IFieldAgg priceAgg = jsonAgg.getFieldAgg("price");
priceAgg.getMin(); // 3.07
priceAgg.getMax(); // 30.50
priceAgg.getSum(); // 52.56
You can also count values (count and cardinality):
IFieldAgg genreAgg = jsonAgg.getFieldAgg("genre");
genreAgg.count(); // 4
genreAgg.cardinality(); // 2
genreAgg.count("novel"); // 3
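To make the meaning of these global aggregations concrete, the sketch below recomputes them over the bookstore sample using only the Java standard library. It does not use Heelflip and says nothing about how Heelflip is implemented; the class name `GlobalAggSketch` and its helper methods are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of the Heelflip API.
class GlobalAggSketch {

    // min, max and sum of a numeric field
    static DoubleSummaryStatistics numericStats(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).summaryStatistics();
    }

    // number of entries whose value equals the given one
    static long count(List<String> values, String target) {
        return values.stream().filter(target::equals).count();
    }

    // number of distinct values
    static int cardinality(List<String> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        // "price" and "genre" values copied from the bookstore sample
        List<Double> prices = Arrays.asList(12.50, 6.49, 3.07, 30.50);
        List<String> genres = Arrays.asList("poem", "novel", "novel", "novel");

        DoubleSummaryStatistics stats = numericStats(prices);
        System.out.printf(Locale.ROOT, "min=%.2f max=%.2f sum=%.2f%n",
                stats.getMin(), stats.getMax(), stats.getSum());
        // prints: min=3.07 max=30.50 sum=52.56

        System.out.println(genres.size());           // 4
        System.out.println(cardinality(genres));     // 2
        System.out.println(count(genres, "novel"));  // 3
    }
}
```

The printed values match the comments in the Heelflip snippets above, rounded to two decimal places.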
We can also get group-by aggregations as follows:
IGroupByAgg groupByAgg = jsonAgg.getGroupBy("price", "inStock");
IFieldAgg priceBystockAgg = groupByAgg.groupBy("true");
priceBystockAgg.getMin(); // 6.49
priceBystockAgg.getMax(); // 30.50
priceBystockAgg.getSum(); // 49.49
or
IGroupByAgg groupByAgg = jsonAgg.getGroupBy("name", "inStock");
IFieldAgg namesInStockAgg = groupByAgg.groupBy("true");
namesInStockAgg.distinctValues();
//"The Odyssey"
//"The Godfather"
//"Emma"
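As a point of comparison, the two group-by examples above can be reproduced with standard Java streams. This is only a sketch of what the aggregations compute, not of how Heelflip computes them; the `Book` class, `GroupBySketch`, and its method names are hypothetical. Group keys are strings ("true"/"false") to mirror the `groupBy("true")` call above.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the Heelflip API.
class GroupBySketch {

    // Minimal stand-in for one bookstore entry
    static final class Book {
        final String name;
        final boolean inStock;
        final double price;

        Book(String name, boolean inStock, double price) {
            this.name = name;
            this.inStock = inStock;
            this.price = price;
        }
    }

    // "price" aggregated per "inStock" value
    static Map<String, DoubleSummaryStatistics> priceByInStock(List<Book> books) {
        return books.stream().collect(Collectors.groupingBy(
                b -> String.valueOf(b.inStock),
                Collectors.summarizingDouble(b -> b.price)));
    }

    // distinct "name" values per "inStock" value, in encounter order
    static Map<String, LinkedHashSet<String>> namesByInStock(List<Book> books) {
        return books.stream().collect(Collectors.groupingBy(
                b -> String.valueOf(b.inStock),
                Collectors.mapping(b -> b.name,
                        Collectors.toCollection(LinkedHashSet::new))));
    }

    static List<Book> sample() {
        return Arrays.asList(
                new Book("The Odyssey", true, 12.50),
                new Book("The Godfather", true, 6.49),
                new Book("Moby-Dick", false, 3.07),
                new Book("Emma", true, 30.50));
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics inStock = priceByInStock(sample()).get("true");
        System.out.printf(Locale.ROOT, "min=%.2f max=%.2f sum=%.2f%n",
                inStock.getMin(), inStock.getMax(), inStock.getSum());
        // prints: min=6.49 max=30.50 sum=49.49

        System.out.println(namesByInStock(sample()).get("true"));
        // prints: [The Odyssey, The Godfather, Emma]
    }
}
```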
You can also generate a report with all accumulated aggregations:
jsonAgg.dumpReport("report", true);
This code snippet creates a directory with the following structure:
report
│ __missingGroupBy.json
└───name
│ │ __name.json
│ │ name_groupBy_author.json
│ │ name_groupBy_genre.json
│ │ name_groupBy_inStock.json
│ │ name_groupBy_price.json
└───author
└───genre
└───inStock
└───price
Each JSON field has its own directory, for instance, name, author, etc. This directory has the following items:
- __<field_name>.json file with field global aggregations;
- All combinations of group by aggregations separated in different files in this format <field_name>_groupBy_<field_name>.json.
Finally, the root directory contains a file, __missingGroupBy.json, with a list of the missing group-by combinations.
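The naming scheme described above can be sketched as a small helper that lists the file names one would expect inside a field's directory. This is an illustration only: `ReportLayoutSketch` and `filesFor` are hypothetical names, the sketch assumes every field directory follows the pattern shown for name, and it ignores combinations that end up in __missingGroupBy.json.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of the Heelflip API.
class ReportLayoutSketch {

    // Candidate file names for one field directory, following the
    // __<field_name>.json and <field_name>_groupBy_<field_name>.json patterns.
    static List<String> filesFor(String field, List<String> allFields) {
        List<String> files = new ArrayList<>();
        files.add("__" + field + ".json");
        for (String other : allFields) {
            // Assumption: a field is not grouped by itself,
            // as in the directory tree shown above.
            if (!other.equals(field)) {
                files.add(field + "_groupBy_" + other + ".json");
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("name", "author", "genre", "inStock", "price");
        for (String file : filesFor("name", fields)) {
            System.out.println(file);
        }
    }
}
```

For the name field, this prints the same five file names that appear under the name directory in the tree above.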
Creating JsonAgg with the default constructor means that JSON files are aggregated in-memory. If you load a large number of JSON entries, the JVM may raise OutOfMemoryError. To avoid this kind of error, Heelflip can process aggregations in Redis instead of in-memory, as in the code below:
JedisPool pool = new JedisPool(new JedisPoolConfig(), "localhost");
try (Jedis jedis = pool.getResource()) {
JsonAgg jsonAgg = new JsonAgg(jedis);
...
}
JSON entries with arrays or nested objects are not a problem. Heelflip calculates aggregations over fields and their values. However, because the Heelflip API is based on field names, JSON fields need to be renamed when arrays or objects appear.
For instance, the following JSON entry:
{"name":"Steve", "age":30, "address":{"street":"8nd Street", "city":"New York"}}
will be read as:
{"name":"Steve", "age":30, "address.street":"8nd Street", "address.city":"New York"}
To retrieve the information about the field "city", we need to concatenate the field names as shown below:
IFieldAgg cityAgg = jsonAgg.getFieldAgg("address.city");
cityAgg.count();
Similarly, the following JSON entry with arrays:
{ "city" : "SPRINGFIELD", "loc" : [ -72.577769, 42.128848 ], "pop" : 22115}
will be read as:
{ "city" : "SPRINGFIELD", "loc_0" : -72.577769, "loc_1": 42.128848, "pop" : 22115}
NOTE: The examples above only illustrate how field names are renamed for objects and arrays in order to generate unique names at aggregation time. It is important to understand that Heelflip does not flatten the JSON.
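The naming rules shown in the two examples (a dot between nested object fields, an underscore plus the index for array elements) can be sketched in plain Java over maps and lists. This only illustrates how the unique names are derived; it is not Heelflip's code, and as the note says, Heelflip does not actually flatten the JSON. `FieldNameSketch` and `uniqueNames` are hypothetical names, and the handling of deeper nesting (for example arrays of objects) is an assumption extrapolated from the two examples.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the Heelflip API.
class FieldNameSketch {

    // Maps each leaf value of a parsed JSON entry to its unique field name.
    static Map<String, Object> uniqueNames(Map<String, Object> json) {
        Map<String, Object> out = new LinkedHashMap<>();
        collect("", json, out);
        return out;
    }

    private static void collect(String prefix, Object value, Map<String, Object> out) {
        if (value instanceof Map) {
            // Nested object: join parent and child names with "."
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                String key = String.valueOf(e.getKey());
                collect(prefix.isEmpty() ? key : prefix + "." + key, e.getValue(), out);
            }
        } else if (value instanceof List) {
            // Array: append "_" and the element index
            List<?> items = (List<?>) value;
            for (int i = 0; i < items.size(); i++) {
                collect(prefix + "_" + i, items.get(i), out);
            }
        } else {
            out.put(prefix, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "8nd Street");
        address.put("city", "New York");
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "Steve");
        person.put("age", 30);
        person.put("address", address);
        System.out.println(uniqueNames(person).keySet());
        // prints: [name, age, address.street, address.city]

        Map<String, Object> zip = new LinkedHashMap<>();
        zip.put("city", "SPRINGFIELD");
        zip.put("loc", Arrays.asList(-72.577769, 42.128848));
        zip.put("pop", 22115);
        System.out.println(uniqueNames(zip).keySet());
        // prints: [city, loc_0, loc_1, pop]
    }
}
```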