[TH2-5204] External executor #104

Merged
merged 26 commits into from
Jul 3, 2024
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
74012fe
external executor
May 23, 2024
9af7fbc
version update
May 27, 2024
15819e4
README.md
Jun 4, 2024
ba6cf20
Merge branch 'refs/heads/master' into external-executor
Jun 4, 2024
70d4176
cradleVersion
Jun 4, 2024
a9dc35b
version update
Jun 4, 2024
6916ef3
fixes after review
Jun 5, 2024
6cb329e
chart
Jun 5, 2024
bc1353e
chart
Jun 5, 2024
6426684
JsonCreator
Jun 5, 2024
c134f80
Configuration refactored
Jun 5, 2024
191c1df
perftest.md
Jun 7, 2024
d2f07e4
perftest.md
Jun 11, 2024
efb5e5b
event id validation added
Jun 11, 2024
a9624c2
test parameters fix
Jun 12, 2024
6c2c1b5
after review
Jun 12, 2024
490bae3
MessageUtils.toJson
Jun 12, 2024
80cd598
[th2-5204] Corrected configuration check
Nikita-Smirnov-Exactpro Jun 19, 2024
2de0580
[TH2-5207] skip null values for 'labels' and 'messages' fields
Nikita-Smirnov-Exactpro Jun 21, 2024
7b95262
[TH2-5204] cached last page
Nikita-Smirnov-Exactpro Jun 24, 2024
eb9310a
[TH2-5204] use 5.4.1-improve-book-info-cache-9645047662-79f9524-SNAPS…
Nikita-Smirnov-Exactpro Jun 24, 2024
4cc3855
[TH2-5204] use 5.4.1-improve-book-info-cache-9661797806-1bd3f6a-SNAPS…
Nikita-Smirnov-Exactpro Jun 25, 2024
b589fe3
[TH2-5204] Corrected default configuration
Nikita-Smirnov-Exactpro Jun 27, 2024
bddb76b
[TH2-5204] updated cradle 5.4.1-TH2-5207-9758119785-0fae4a0-SNAPSHOT
Nikita-Smirnov-Exactpro Jul 2, 2024
0a05755
[TH2-5204] used cradle 5.4.1-dev
Nikita-Smirnov-Exactpro Jul 2, 2024
2b5bb36
[TH2-5204] updated README.md
Nikita-Smirnov-Exactpro Jul 3, 2024
74 changes: 54 additions & 20 deletions README.md
```diff
@@ -1,4 +1,4 @@
-# Overview (5.6.0)
+# Overview (5.7.0)
 
 Event store (estore) is an important th2 component responsible for storing events into Cradle. Please refer to [Cradle repository] (https://github.com/th2-net/cradleapi/blob/master/README.md) for more details. This component has a pin for listening events via MQ.
 
```
````diff
@@ -19,25 +19,39 @@ Infra schema can only contain one estore box description. It consists of one req
 
 General view of the component will look like this:
 ```yaml
-apiVersion: th2.exactpro.com/v1
+apiVersion: th2.exactpro.com/v2
 kind: Th2Estore
 metadata:
   name: estore
 spec:
-  image-name: ghcr.io/th2-net/th2-estore
-  image-version: <image version>
-  extended-settings:
-    service:
-      enabled: false
+  imageName: ghcr.io/th2-net/th2-estore
+  imageVersion: <image version>
+  customConfig:
+    maxTaskCount: 128
+    maxTaskDataSize: 536870912
+    maxRetryCount: 3
+  mqRouter:
+    prefetchCount: 100
+  extendedSettings:
     envVariables:
-      JAVA_TOOL_OPTIONS: "-XX:+ExitOnOutOfMemoryError -Ddatastax-java-driver.advanced.connection.init-query-timeout=\"5000 milliseconds\""
+      JAVA_TOOL_OPTIONS: >
+        -XX:+ExitOnOutOfMemoryError
+        -XX:+UseContainerSupport
+        -Dlog4j2.shutdownHookEnabled=false
+        -XX:MaxRAMPercentage=84.2
+        -XX:MaxMetaspaceSize=70M
+        -XX:CompressedClassSpaceSize=10M
+        -XX:ReservedCodeCacheSize=40M
+        -XX:MaxDirectMemorySize=50M
+        -Ddatastax-java-driver.advanced.connection.init-query-timeout="5000 milliseconds"
+        -Ddatastax-java-driver.basic.request.timeout="3 seconds"
     resources:
       limits:
-        memory: 500Mi
-        cpu: 200m
+        memory: 2000Mi
+        cpu: 2500m
       requests:
         memory: 100Mi
-        cpu: 20m
+        cpu: 500m
 ```
 
 # Configuration
````
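As a rough sanity check of the example resources in the new README (a sketch, not part of estore): with `-XX:+UseContainerSupport`, `-XX:MaxRAMPercentage=84.2` sizes the maximum heap as a percentage of the container memory limit, so the 2000Mi limit gives roughly 1684 MiB of heap, against a `maxTaskDataSize` of 512 MiB. The JVM may round the heap size for alignment and `Runtime.maxMemory()` can report slightly less; `HeapBudget` is a hypothetical name.

```java
/**
 * Hypothetical helper: approximates the heap size the JVM derives from a container
 * memory limit and -XX:MaxRAMPercentage. Not part of th2-estore.
 */
public final class HeapBudget {

    private static final long BYTES_PER_MIB = 1024L * 1024L;

    /** Approximate max heap in MiB: containerLimitMiB * MaxRAMPercentage / 100. */
    static long approxMaxHeapMiB(long containerLimitMiB, double maxRamPercentage) {
        return Math.round(containerLimitMiB * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        // limits.memory: 2000Mi together with -XX:MaxRAMPercentage=84.2
        long heapMiB = approxMaxHeapMiB(2000, 84.2);
        // customConfig.maxTaskDataSize: 536870912 bytes
        long taskDataMiB = 536_870_912L / BYTES_PER_MIB;
        System.out.println("approx. max heap: " + heapMiB + " MiB");  // 1684 MiB
        System.out.println("maxTaskDataSize:  " + taskDataMiB + " MiB"); // 512 MiB
    }
}
```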
````diff
@@ -46,19 +60,21 @@ Configuration is provided as `custom.json` file
 
 ```json
 {
-  "maxTaskCount" : 256,
-  "maxTaskDataSize" : 133169152,
-  "maxRetryCount" : 1000000,
-  "retryDelayBase" : 5000
+  "maxTaskCount": 256,
+  "maxTaskDataSize": 133169152,
+  "maxRetryCount": 1000000,
+  "retryDelayBase": 5000,
+  "processingThreads": 4
 }
 ```
 
 
-+ _maxTaskCount_ - maximum number of events that will be processed simultaneously
-+ _maxTaskDataSize_ - maximum total data size of events during parallel processing
-+ _maxRetryCount_ - maximum number of retries that will be done in case of event persistence failure
-+ _retryDelayBase_ - constant that will be used to calculate next retry time(ms):
++ _maxTaskCount_ - maximum number of events that will be processed simultaneously (default: 256)
++ _maxTaskDataSize_ - maximum total data size of events during parallel processing (default: half of available memory)
++ _maxRetryCount_ - maximum number of retries that will be done in case of event persistence failure (default: 1000000)
++ _retryDelayBase_ - constant that will be used to calculate next retry time(ms) (default: 5000):
 retryDelayBase * retryNumber
++ _processingThreads_ - number of task processing threads (default: number available logical cpu cores)
 
 If some of these parameters are not provided, estore will use default(undocumented) value.
 If _maxTaskCount_ or _maxTaskDataSize_ limits are reached during processing, estore will pause processing new events
````
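The retry formula `retryDelayBase * retryNumber` and the documented defaults can be illustrated with a small sketch. It assumes `retryNumber` starts at 1 for the first retry and reads "half of available memory" as half of `Runtime.getRuntime().maxMemory()`; both are assumptions, and `EstoreDefaultsSketch` and its methods are hypothetical names rather than estore's API.

```java
import java.time.Duration;

/** Hypothetical illustration of the documented estore defaults; not estore's own code. */
public final class EstoreDefaultsSketch {

    /** Documented default for retryDelayBase, in milliseconds. */
    static final long DEFAULT_RETRY_DELAY_BASE_MS = 5_000L;

    /** Delay before a retry: retryDelayBase * retryNumber (retryNumber assumed to start at 1). */
    static Duration retryDelay(long retryDelayBaseMs, int retryNumber) {
        // long arithmetic on purpose: 5000 * 1_000_000 would overflow an int
        return Duration.ofMillis(retryDelayBaseMs * retryNumber);
    }

    /** Assumption: "half of available memory" means half of the JVM's maximum heap. */
    static long defaultMaxTaskDataSize() {
        return Runtime.getRuntime().maxMemory() / 2;
    }

    /** Number of logical CPU cores visible to the JVM. */
    static int defaultProcessingThreads() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        for (int retry = 1; retry <= 3; retry++) {
            long delayMs = retryDelay(DEFAULT_RETRY_DELAY_BASE_MS, retry).toMillis();
            System.out.println("retry " + retry + " after " + delayMs + " ms");
        }
        System.out.println("default maxTaskDataSize: " + defaultMaxTaskDataSize() + " bytes");
        System.out.println("default processingThreads: " + defaultProcessingThreads());
    }
}
```

Because the delay grows linearly, the defaults imply long waits late in the sequence: under this reading, the 1,000,000th retry would wait 5000 × 1,000,000 ms, about 58 days.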
```diff
@@ -71,8 +87,26 @@ This is a list of supported features provided by libraries.
 _CradleMaxEventBatchSize_ - this option defines the maximum event batch size in bytes.
 Please see more details about this feature via [link](https://github.com/th2-net/th2-common-j#configuration-formats)
 
+# Performance
+
+The component provides a performance of 100K events per second if the events are packaged in batches of 20 or
+more events(event size: 1.4KB, event status: SUCCEED, no attached messages).
+
+Processing speed (K events/sec) vs batch size for estore (under load of 100K events/s):
+
+![performance chart](./perf_chart.png)
+
+Note: for smaller batches (less than 100 events) higher mqRouter.prefetchCount value should be used (e.g. 1000) to achieve these results.
+
+More details [here](perftest/perftest.md).
+
 # Changes
 
+## 5.7.0
+
+* Using separate executor instead of ForkJoinPool.commonPool() when storing events
+* Updated cradle api: `5.4.0-dev`
+
 ## 5.6.0
 
 * Migrated to th2 gradle plugin `0.0.8`
```
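Release 5.7.0 stores events on a separate executor instead of `ForkJoinPool.commonPool()`. The sketch below shows only the general pattern, under stated assumptions: a fixed pool sized like `processingThreads` is passed explicitly to `CompletableFuture.runAsync`, and a `Semaphore` stands in for the `maxTaskCount` limit so that submission blocks (pauses) while too many tasks are in flight. `BoundedStoreExecutor` and `submit` are hypothetical; estore's actual implementation may differ.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: dedicated store executor with a cap on in-flight tasks. */
public final class BoundedStoreExecutor implements AutoCloseable {

    private final ExecutorService executor;
    private final Semaphore inFlight;

    BoundedStoreExecutor(int processingThreads, int maxTaskCount) {
        this.executor = Executors.newFixedThreadPool(processingThreads);
        this.inFlight = new Semaphore(maxTaskCount);
    }

    /** Blocks while maxTaskCount tasks are in flight, then runs the task on the dedicated pool. */
    CompletableFuture<Void> submit(Runnable storeTask) throws InterruptedException {
        inFlight.acquire();
        try {
            return CompletableFuture.runAsync(storeTask, executor)
                    .whenComplete((ignored, error) -> inFlight.release());
        } catch (RuntimeException e) { // e.g. RejectedExecutionException after close()
            inFlight.release();
            throw e;
        }
    }

    @Override
    public void close() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(30, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger stored = new AtomicInteger();
        try (BoundedStoreExecutor store = new BoundedStoreExecutor(4, 128)) {
            CompletableFuture<?>[] futures = new CompletableFuture<?>[1000];
            for (int i = 0; i < futures.length; i++) {
                futures[i] = store.submit(stored::incrementAndGet); // stand-in for a Cradle write
            }
            CompletableFuture.allOf(futures).join();
        }
        System.out.println("stored " + stored.get() + " batches"); // stored 1000 batches
    }
}
```

Passing an explicit executor keeps potentially blocking storage work off the JVM-wide common pool, which other code may share. A byte-based limit such as `maxTaskDataSize` could presumably be modelled with a second semaphore acquired for each batch's size, although `Semaphore` permits are `int`-valued.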
```diff
@@ -183,4 +217,4 @@ Please see more details about this feature via [link](https://github.com/th2-net
 
 ## 3.1.0
 
-+ Use async methods for storing events to the Cradle
++ Use async methods for storing events to the Cradle
```
6 changes: 3 additions & 3 deletions build.gradle
```diff
@@ -10,7 +10,7 @@ sourceCompatibility = 11
 targetCompatibility = 11
 
 ext {
-    cradleVersion = '5.3.0-dev'
+    cradleVersion = '5.4.0-dev'
 }
 
 repositories {
@@ -43,7 +43,7 @@ dependencies {
     implementation("com.exactpro.th2:common-utils:2.2.3-dev") {
         because("executor service utils is used")
     }
-    implementation 'com.exactpro.th2:task-utils:0.1.1'
+    implementation 'com.exactpro.th2:task-utils:0.1.2'
     implementation "com.exactpro.th2:cradle-core:$cradleVersion"
     implementation "com.exactpro.th2:cradle-cassandra:$cradleVersion"
 
@@ -61,7 +61,7 @@ dependencies {
 
     testImplementation 'org.apache.commons:commons-lang3'
     testImplementation 'org.junit.jupiter:junit-jupiter:5.10.2'
-    testImplementation 'org.mockito:mockito-junit-jupiter:5.10.0'
+    testImplementation 'org.mockito:mockito-junit-jupiter:5.11.0'
 }
 
 application {
```
2 changes: 1 addition & 1 deletion gradle.properties
```diff
@@ -1,4 +1,4 @@
-release_version=5.6.0
+release_version=5.7.0
 description='th2 estore component'
 vcs_url=https://github.com/th2-net/th2-estore
 
```
Binary file added perf_chart.png
Binary file added perftest/cassandra_disk_usage.png
Binary file added perftest/cluster_schema.png
Binary file added perftest/estore_heap.png
Binary file added perftest/estore_persist.png
Binary file added perftest/estore_res.png
131 changes: 131 additions & 0 deletions perftest/perftest.md
# Store events via th2-estore (perftest cluster)

## Hardware configuration

### Server

+ **CPU**: Intel Xeon Gold 5218 x 2
+ **RAM**: 768 GB (32 GB RAM x 24)
+ **Disk**: SAMSUNG MZ7LM3T8HMLP SSD 3.5 TB x 5 (RAID 5 + hot spare)

### Virtual machines deployed on the de-qa18 server

+ **perftest-cassandra01** - Cassandra
  + **CPU**: 4 cores
  + **RAM**: 8 GB
+ **kos-perftest-kuber-node01** - cluster node
  + **CPU**: 11 cores
  + **RAM**: 30 GB
+ **kos-perftest-kuber-node02** - cluster node
  + **CPU**: 11 cores
  + **RAM**: 30 GB

## Software configuration (perftest cluster)

### RabbitMQ
+ **VM**: kos-perftest-kuber-node01
+ **Docker image**: docker.io/bitnami/rabbitmq:3.11.2-debian-11-r0

### Cassandra
+ **VM**: perftest-cassandra01
+ **Version**: 4.0.5
+ **Special Java args**:
  + `-Xlog:gc=info,heap*=debug,age*=debug,safepoint=info,promotion*=debug...`
  + `-Xms4G -Xmx4G`

## Cluster schema
![Cluster schema](cluster_schema.png)

## th2 components configuration (perftest cluster)

Schema configuration:
https://gitlab.exactpro.com/vivarium/th2/th2-internal-instances/th2-perf-schemas/-/commit/0b8dc314356aaa12816f0165408137ac24584781

### woodpecker
+ **Docker image**: ghcr.io/th2-net/th2-woodpecker-template:2.0.0-TH2-5204-estore-perf-9406747565-897a638
+ **Data format**: protobuf

### estore
```yaml
apiVersion: th2.exactpro.com/v2
kind: Th2Estore
metadata:
  name: estore
spec:
  imageName: ghcr.io/th2-net/th2-estore
  imageVersion: 5.7.0-dev
  cradleManager:
    prepareStorage: false
    timeout: 5000
    pageSize: 1000
    composingServiceThreads: 4
    counterPersistenceInterval: 15000
    maxUncompressedTestEventSize: 0
    maxUncompressedMessageBatchSize: 0
    storeIndividualMessageSessions: false
    compressionType: LZ4
    bookRefreshIntervalMillis: 60000
  customConfig:
    maxTaskCount: 128
    maxTaskDataSize: 536870912
    maxRetryCount: 3
  mqRouter:
    prefetchCount: 1000
  extendedSettings:
    envVariables:
      JAVA_TOOL_OPTIONS: >
        -XX:+ExitOnOutOfMemoryError
        -XX:+UseContainerSupport
        -Dlog4j2.shutdownHookEnabled=false
        -XX:MaxRAMPercentage=84.2
        -XX:MaxMetaspaceSize=70M
        -XX:CompressedClassSpaceSize=10M
        -XX:ReservedCodeCacheSize=40M
        -XX:MaxDirectMemorySize=50M
        -Ddatastax-java-driver.advanced.connection.init-query-timeout="5000 milliseconds"
        -Ddatastax-java-driver.basic.request.timeout="3 seconds"
        -Dcom.sun.management.jmxremote
        -Dcom.sun.management.jmxremote.port=1099
        -Dcom.sun.management.jmxremote.rmi.port=1099
        -Dcom.sun.management.jmxremote.ssl=false
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.local.only=false
        -Djava.rmi.server.hostname=127.0.0.1
    resources:
      limits:
        cpu: 2500m
        memory: 2000Mi
      requests:
        cpu: 1000m
        memory: 100Mi
```

## Test parameters
| woodpeckers (n) | batch size (events) | event size (KB) | attached messages (n) | load duration (min) | page duration (days) |
|-----------------|---------------------|-----------------|-----------------------|---------------------|----------------------|
| 2 | 700 | 1.37 | 1 | 40 | 1 |

| component | rate (event/sec) | rate (MB/sec) | Total data size (GB) | Total events (n) |
|------------|------------------|---------------|----------------------|------------------|
| woodpecker | 50,000 | 67 | 157.0 | 90,000,000 |
| total x2 | 100,000 | 134 | 314.1 | 180,000,000 |
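The rate and size columns appear to follow from the test parameters if 1 KB = 1024 bytes, 1 MB = 1024 KB, 1 GB = 1024 MB and the 40-minute load duration are used; this interpretation is an assumption. The sketch below recomputes them per woodpecker, together with the number of MQ batches per second implied by 700-event batches. `PerfTestArithmetic` is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical recomputation of the perftest table columns (per woodpecker). */
public final class PerfTestArithmetic {

    static final int EVENTS_PER_SECOND = 50_000; // rate (event/sec) per woodpecker
    static final double EVENT_SIZE_KB = 1.37;    // event size (KB), assumed 1 KB = 1024 bytes
    static final int BATCH_SIZE = 700;           // batch size (events)
    static final int LOAD_SECONDS = 40 * 60;     // load duration: 40 min

    /** Throughput in MB/s, assuming 1 MB = 1024 KB. */
    static double rateMbPerSecond() {
        return EVENTS_PER_SECOND * EVENT_SIZE_KB / 1024.0;
    }

    /** Data written over the whole load in GB (1 GB = 1024 MB). */
    static double totalGb(double mbPerSecond) {
        return mbPerSecond * LOAD_SECONDS / 1024.0;
    }

    /** MQ batches per second implied by the batch size. */
    static double batchesPerSecond() {
        return (double) EVENTS_PER_SECOND / BATCH_SIZE;
    }

    public static void main(String[] args) {
        double rate = rateMbPerSecond();
        long roundedRate = Math.round(rate);
        System.out.printf(Locale.ROOT, "rate: %.1f MB/s%n", rate);
        System.out.printf(Locale.ROOT, "data: %.1f GB (unrounded rate), %.1f GB (rate rounded to %d MB/s)%n",
                totalGb(rate), totalGb(roundedRate), roundedRate);
        System.out.printf(Locale.ROOT, "batches: %.1f per second%n", batchesPerSecond());
    }
}
```

This prints a rate of 66.9 MB/s, 156.8 GB from the unrounded rate versus 157.0 GB from the rounded 67 MB/s (the table figure appears to start from the rounded rate), and about 71.4 batches per second. Doubling the per-woodpecker figures gives the "total x2" row (about 134 MB/s and 314.1 GB).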

## th2-estore metrics
![th2-estore metrics](estore_persist.png)

## Cassandra metrics
![Cassandra metrics](cassandra_disk_usage.png)

## Computation resources

| | RabbitMQ | th2-estore | Cassandra |
|-------------|----------|------------|-----------|
| CPU (cores) | 0.6 | 2.3 | |
| RAM | 433 MB | 1.93 GB | 3.8 GB |

## Computation resources: th2-estore metrics
![th2-estore resources](estore_res.png)
![th2-estore heap usage](estore_heap.png)

## Computation resources: RabbitMQ metrics
![RabbitMQ resources](rabbitmq_res.png)
Binary file added perftest/rabbitmq_cpu.png
Binary file added perftest/rabbitmq_ram.png
Binary file added perftest/rabbitmq_res.png