This component sends events/messages to a crawler data processor for further processing via gRPC. It requests events/messages for specific time intervals using rpt-data-provider. Those intervals are processed periodically, and new ones are written to Cradle if necessary.
The crawler data processor must implement the crawler data processor gRPC service.
from: 2021-06-16T12:00:00.00Z - the lower boundary of the processing time interval. The Crawler processes data starting from this point in time. Required parameter.
to: 2021-06-17T14:00:00.00Z - the upper boundary of the processing time interval. The Crawler does not process data after this point in time. If it is not set, the Crawler works until it is stopped.
type: EVENTS - the type of data the Crawler processes. Allowed values are EVENTS, MESSAGES. The default value is EVENTS.
name: CrawlerName - the Crawler's name, which allows the data processor to identify it. Required parameter.
defaultLength: PT10M - the step that the Crawler will use to create intervals. It uses the Java Duration format. You can read more about it here. The default value is PT1H.
lastUpdateOffset: 10 - the timeout for re-checking previously processed intervals. Works only if the upper boundary (the to parameter) is set. The default value is 1.
lastUpdateOffsetUnit: HOURS - the time unit for the lastUpdateOffset parameter. Allowed values are described here in the Enum Constants block. The default value is HOURS.
delay: 10 - the delay in seconds between the moment the Crawler finishes processing the current interval and the moment it starts processing the next one. The default value is 10.
batchSize: 500 - the size of data chunks the Crawler requests from the data provider and feeds to the data processor. The default value is 300
toLag: 5 - the offset from real time. When the interval's upper bound is later than the current time minus toLag, the Crawler waits until the interval's end is earlier than the current time minus toLag. The default value is 1.
toLagOffsetUnit: MINUTES - the time unit for the toLag parameter. Allowed values are described here in the Enum Constants block. The default value is HOURS.
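The toLag rule can be illustrated with a short sketch. This is not the Crawler's own code: the function name `seconds_to_wait` is hypothetical, and treating an interval whose end is exactly equal to the current time minus toLag as ready is an assumption, since the description above does not specify the boundary case.

```python
from datetime import datetime, timedelta, timezone


def seconds_to_wait(interval_end: datetime, now: datetime, to_lag: timedelta) -> float:
    """Return how many seconds to wait before the interval may be processed.

    The interval is considered ready once its end is not later than
    `now - to_lag` (the equality case is an assumption of this sketch).
    """
    threshold = now - to_lag
    if interval_end <= threshold:
        return 0.0
    return (interval_end - threshold).total_seconds()


# toLag: 5 with toLagOffsetUnit: MINUTES
to_lag = timedelta(minutes=5)
now = datetime(2021, 6, 16, 13, 2, tzinfo=timezone.utc)

# The interval ends at 13:00, which is later than 12:57 (now - toLag): wait.
late_end = datetime(2021, 6, 16, 13, 0, tzinfo=timezone.utc)
print(seconds_to_wait(late_end, now, to_lag))

# The interval ends at 12:50, which is earlier than 12:57: no wait needed.
early_end = datetime(2021, 6, 16, 12, 50, tzinfo=timezone.utc)
print(seconds_to_wait(early_end, now, to_lag))
```

With these sample values, the first interval has to wait three minutes and the second can be processed immediately.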
Schema component description example (crawler.yml):
apiVersion: th2.exactpro.com/v1
kind: Th2Box
metadata:
  name: crawler
spec:
  image-name: ghcr.io/th2-net/th2-crawler
  image-version: <version>
  type: th2-conn
  custom-config:
    from: 2021-06-16T12:00:00.00Z
    to: 2021-06-16T20:00:00.00Z
    name: test-crawler
    type: EVENTS
    defaultLength: PT1H
    lastUpdateOffset: 2
    lastUpdateOffsetUnit: HOURS
    delay: 10
    batchSize: 300
    toLag: 5
    toLagOffsetUnit: MINUTES
  pins:
    - name: to_data_provider
      connection-type: grpc
    - name: to_data_processor
      connection-type: grpc
  extended-settings:
    service:
      enabled: true
    resources:
      limits:
        memory: 200Mi
        cpu: 200m
      requests:
        memory: 100Mi
        cpu: 50m
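The required parameters and default values described for the custom-config section can be applied roughly as follows. This is a minimal sketch, not the Crawler's actual configuration code: `resolve_config`, `REQUIRED`, `DEFAULTS`, and `ALLOWED_TYPES` are hypothetical names, and representing an unset `to` as `None` is an assumption.

```python
# Parameters documented as required.
REQUIRED = ("from", "name")

# Default values as documented for each optional parameter.
DEFAULTS = {
    "type": "EVENTS",
    "defaultLength": "PT1H",
    "lastUpdateOffset": 1,
    "lastUpdateOffsetUnit": "HOURS",
    "delay": 10,
    "batchSize": 300,
    "toLag": 1,
    "toLagOffsetUnit": "HOURS",
}

ALLOWED_TYPES = {"EVENTS", "MESSAGES"}


def resolve_config(custom_config: dict) -> dict:
    """Check required parameters and fill in the documented defaults."""
    missing = [key for key in REQUIRED if key not in custom_config]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    resolved = {**DEFAULTS, **custom_config}
    # Assumption: an absent "to" means the Crawler runs until it is stopped.
    resolved.setdefault("to", None)

    if resolved["type"] not in ALLOWED_TYPES:
        raise ValueError("type must be EVENTS or MESSAGES")
    return resolved


config = resolve_config({
    "from": "2021-06-16T12:00:00.00Z",
    "name": "test-crawler",
    "toLag": 5,
    "toLagOffsetUnit": "MINUTES",
})
print(config["batchSize"], config["defaultLength"], config["to"])
```

Values given explicitly override the defaults, so only `from` and `name` have to be present in every configuration.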
The Crawler requires the following links:
- gRPC link to the data provider working in the gRPC mode
- gRPC link to the crawler data processor
Links example:
apiVersion: th2.exactpro.com/v1
kind: Th2Link
metadata:
  name: crawler-links
spec:
  boxes-relation:
    router-grpc:
      - name: crawler-to-data-provider
        from:
          strategy: filter
          box: crawler
          pin: to_data_provider
        to:
          service-class: com.exactpro.th2.dataprovider.grpc.DataProviderService
          strategy: robin
          box: data-provider
          pin: server
      - name: crawler-to-data-service
        from:
          strategy: filter
          box: crawler
          pin: to_data_processor
        to:
          service-class: com.exactpro.th2.crawler.dataprocessor.grpc.DataProcessorService
          strategy: robin
          box: data-service
          pin: server
The Crawler takes the events/messages that belong to each interval, that is, those with startTimestamp >= the interval's "from" and < the interval's "to".
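The half-open interval rule can be illustrated by splitting a time range into steps and checking which interval a timestamp falls into. This is a sketch under assumptions: `split_intervals` and `belongs` are hypothetical helpers, and truncating the last interval at the upper boundary is a choice made for this example rather than documented Crawler behavior.

```python
from datetime import datetime, timedelta, timezone


def split_intervals(start: datetime, end: datetime, step: timedelta):
    """Split [start, end) into consecutive half-open intervals of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    intervals = []
    current = start
    while current < end:
        # Assumption: the last interval is cut off at the upper boundary.
        upper = min(current + step, end)
        intervals.append((current, upper))
        current = upper
    return intervals


def belongs(timestamp: datetime, interval) -> bool:
    """True if interval "from" <= timestamp < interval "to"."""
    lower, upper = interval
    return lower <= timestamp < upper


# from / to / defaultLength (PT1H) as in the crawler.yml example
start = datetime(2021, 6, 16, 12, 0, tzinfo=timezone.utc)
end = datetime(2021, 6, 16, 20, 0, tzinfo=timezone.utc)
intervals = split_intervals(start, end, timedelta(hours=1))

# An event starting exactly at 13:00 belongs to [13:00, 14:00), not [12:00, 13:00).
event_start = datetime(2021, 6, 16, 13, 0, tzinfo=timezone.utc)
matching = [i for i in intervals if belongs(event_start, i)]
print(len(intervals), matching)
```

Because the upper bound is exclusive, a timestamp that sits exactly on a boundary is assigned to one interval only, so neighboring intervals never process the same event or message twice.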