Indexing service

This service demonstrates how to connect to the GeoNetwork database.

Database configuration is loaded from the configuration service.

Start the service using:

mvn spring-boot:run

# Remove the index
curl '127.0.0.1:9900/index/tasks/index/records' \
    -X DELETE 

# Index all database records
curl '127.0.0.1:9901/index/tasks/index/all/all' \
   -X GET 

# Index one database record
curl '127.0.0.1:9900/index/tasks/index/simpleBatch/511' \
   -X GET

Metrics

gn_index_totalNumberOfRecordsToIndex 0.0
gn_index_totalNumberOfRecordsIndexed_records_total{gn="index",} 356.0

gn_index_task_771df89e_7e86_4cc5_895a_2956509c302c_numberOfRecordsIndexed_records_total{gn="index",} 89.0

Performances

Currently, the indexing process is doing the following:

Stream ids from Metadata table
Group ids by batch size
Organize records per schema (XSLT step is schema dependant)
Create one index document from DB properties
Process batch with XSLT to add XML properties
Send bulk request (synchronous) to index

The process is quite similar to what GeoNetwork is doing but is incomplete:

DB properties are partial (needs to collect owner, username, validity status, ...)
XSLT indexing is not complete
No geometry parsing

Configuration is:

gn.indexing.batch.size=160
gn.indexing.threadPool.size=20

Metrics are exposed during processing:

To be analyzed:

What about bulk asynchronous operation?
What happens if we configure an Elasticsearch cluster?

Events

2 options analyzed:

RabbitMQ
Kafka

Spring cloud bus event works with the 2.

The configuration server can push event when config change, to reload the configuration in each microservices using:

curl -X POST http://localhost:9999/actuator/bus-refresh

or webhooks.

The indexing service create a gn_indexing_tasks_stream topic.

Custom IndexEvent was tested. A REST endpoint allows to send a message on the topic.

# POST to /index/event/{bucket}/{uuid}
curl -X POST '127.0.0.1:9997/index/event/e0101/1'

The message is consumed by the EventConsumer bean and by the camel route (using kafka or rabbitmq components).

To install rabbitmq:

docker pull rabbitmq:3-management
docker run -d --hostname my-rabbit --name some-rabbit -p 15672:15672 -p 5672:5672 rabbitmq:3-management

To install kafka, see https://kafka.apache.org/quickstart and start Kafka:

tar -xzf kafka_2.13-2.6.0.tgz
cd kafka_2.13-2.6.0
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Indexing service

Metrics

Performances

Events

Files

README.md

Latest commit

History

README.md

File metadata and controls

Indexing service

Metrics

Performances

Events