The identity benchmark is composed of two components: BenchmarkStitching and BenchmarkShortRead.
- Before loading the graph, create secondary indexes on the required properties. To do this, run ./scripts/run_create_indexes.sh
- Using graph-synth (see https://github.com/aerospike/graph-synth/tree/benchmark-schema-compat), load the graph with the identity schema. Note that you will need to configure the scale factor; this may take some trial and error to get right. For reference, a scale factor of 10000 creates a graph with 10k GoldenEntities.
Here is an example of how to run graph-synth for the identity schema:

```shell
mvn clean install -DskipTests
java -jar ./graph-synth/target/GraphSynth-1.1.0-SNAPSHOT.jar --input-uri=file:$(pwd)/conf/schema/benchmark2024.yaml --output-uri=ws://localhost:8182/g --scale-factor=1000 --clear
```
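Since the documented ratio is linear (a scale factor of 10000 yields ~10k GoldenEntities), a quick back-of-the-envelope calculation can pick a scale factor for a target graph size. This is only a sketch under that linear-scaling assumption; the target count below is illustrative:

```shell
# Assumption: GoldenEntities scale roughly 1:1 with --scale-factor
# (scale factor 10000 -> ~10k GoldenEntities, per the note above).
target_golden_entities=1000000

# With a 1:1 ratio, the scale factor equals the target entity count.
scale_factor=$target_golden_entities
echo "Use --scale-factor=${scale_factor} for ~${target_golden_entities} GoldenEntities"
```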
- Run the short read benchmark: use ./scripts/run_shortread.sh to run it and ./conf/shortread.properties to configure it.
- A TinkerPop-based graph database that the benchmark can connect to
- Java 17 installed
- Getting the stitching amount right may be a little tricky, since JMH runs for a fixed amount of time rather than a fixed number of iterations.
Objective: gain intuition on how large a scale factor to use and how long to run stitching to get proper results.
- Run graph-synth with a scale factor of 1 million.
- Adjust the stitching config for a 10 minute run (this is done in the conf/stitch.properties file) and run ./scripts/run_stitch.sh
- Run ./scripts/run_summary.sh to get the number of stitched vertices, then extrapolate how much longer the run needs to be for the number of stitched vertices to reach ~1/2 the number of GoldenEntities.
- Check Aerospike to see how much data we are using. From there, extrapolate how much larger the scale factor needs to be to reach 100 GB, 1 TB, and 10 TB.
- From here we can run ./scripts/run_shortread.sh as a smoke test to get initial results at a small scale.
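The extrapolation arithmetic in the steps above can be sketched as follows. All of the sample numbers (stitched count after 10 minutes, data used) are made up for illustration; substitute the values reported by ./scripts/run_summary.sh and by Aerospike:

```shell
# Hypothetical measurements for illustration only.
golden_entities=1000000      # from a scale factor of 1 million
stitched_after_10min=50000   # as reported by ./scripts/run_summary.sh
target_stitched=$((golden_entities / 2))

# Assuming stitching proceeds at a roughly constant rate,
# the required runtime scales linearly with the remaining work.
minutes_needed=$((10 * target_stitched / stitched_after_10min))
echo "Run stitching for ~${minutes_needed} minutes"

# Data-size extrapolation: if this load used, say, 12 GB,
# scale the factor linearly up to the target dataset sizes.
used_gb=12
for target_gb in 100 1000 10000; do
  factor=$((golden_entities * target_gb / used_gb))
  echo "For ${target_gb} GB, try --scale-factor=${factor}"
done
```

With these sample numbers, the stitching run would need roughly 100 minutes; the loop prints a rough scale factor per target size.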