
Merge pull request #659 from geonetwork/add-format-es-pipeline
Add an ElasticSearch pipeline to make formats human-readable
jahow authored Dec 6, 2023
2 parents f46b998 + 2804ad7 commit 0dd0958
Showing 11 changed files with 385 additions and 14 deletions.
52 changes: 52 additions & 0 deletions docs/guide/deploy.md
@@ -141,3 +141,55 @@ As such, **authenticated requests are not yet supported in GeoNetwork-UI in the
Lastly, even if authenticated requests were allowed by the CORS rules, the XSRF mechanism would still need to be disabled for the endpoints that GeoNetwork-UI relies on: XSRF protection works by making the client read the content of an HTTP cookie, which is forbidden in a cross-origin context.

:::

## Enabling improved search fields

ElasticSearch offers the possibility to preprocess the records of a catalog, and this can be leveraged to **improve the search experience in GeoNetwork-UI**. This is done by registering so-called [ingest pipelines](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/ingest.html).

GeoNetwork-UI provides several pipelines, for instance:

- Enable the Metadata Quality Score
- Show better, human-readable data formats
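To make the idea of an ingest pipeline more concrete, here is a minimal sketch of a pipeline body that rewrites raw format strings into readable labels using a `script` processor. It assumes a single-valued `format` field and uses a made-up mapping table; the pipelines actually registered by `tools/pipelines/register-es-pipelines.js` may look quite different.

```javascript
// Minimal sketch of an ingest pipeline body that turns raw format strings into
// readable labels. The `format` field (assumed single-valued) and the mapping
// below are hypothetical examples, not GeoNetwork-UI's actual definitions.
const FORMAT_LABELS = {
  'ESRI Shapefile': 'Shapefile',
  'application/geo+json': 'GeoJSON',
  'text/csv': 'CSV',
}

function buildFormatPipeline(labels) {
  return {
    description: 'Make data formats human-readable (sketch)',
    processors: [
      {
        script: {
          lang: 'painless',
          // Skip documents that have no `format` field at all.
          if: 'ctx.format != null',
          // Replace the raw value when a label is known, otherwise keep it.
          source:
            'def label = params.labels.get(ctx.format); if (label != null) { ctx.format = label; }',
          params: { labels },
        },
      },
    ],
  }
}

// Body that could be sent with `PUT _ingest/pipeline/<some-id>`.
console.log(JSON.stringify(buildFormatPipeline(FORMAT_LABELS), null, 2))
```

Passing the mapping through `params` rather than inlining it in the script source keeps the Painless code constant, so ElasticSearch only has to compile it once.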

The two options for registering the pipelines are explained below.

::: tip
Once the pipelines are registered, the GeoNetwork catalog should be fully reindexed.
:::

::: warning
**Please note that destroying and recreating the GeoNetwork index _will_ disable the pipelines!** These should simply be registered again afterward.
:::
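A quick way to tell whether a recreated index has lost its pipelines is to inspect its settings. The sketch below assumes (this is not confirmed by the text above) that the pipelines are attached through the standard `index.default_pipeline` index setting, and reads it from the response of `GET <index>/_settings`. The function names are hypothetical, and the network call needs Node 18+ for the built-in `fetch`.

```javascript
// Sketch: report which pipeline (if any) an index runs by default.
// Assumes pipelines are attached through `index.default_pipeline`.
function getDefaultPipeline(settingsResponse, indexName) {
  // The response is keyed by concrete index name; when querying an alias,
  // fall back to the first entry.
  const entry = settingsResponse[indexName] ?? Object.values(settingsResponse)[0]
  return entry?.settings?.index?.default_pipeline ?? null
}

async function checkPipelines(esHost, indexName = 'gn-records') {
  // Global fetch requires Node 18 or later.
  const res = await fetch(`${esHost}/${indexName}/_settings`)
  if (!res.ok) throw new Error(`ElasticSearch answered HTTP ${res.status}`)
  const pipeline = getDefaultPipeline(await res.json(), indexName)
  console.log(
    pipeline
      ? `${indexName} uses pipeline "${pipeline}"`
      : `${indexName} has no default pipeline: register the pipelines again`
  )
  return pipeline
}
```

For example, `checkPipelines('http://localhost:9200')` would print whether `gn-records` still has a default pipeline after the index was recreated.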

### Option A: Executing a Node script

This requires `node` to be installed on the device, as well as direct HTTP access to the ElasticSearch instance (i.e. not just access to the GeoNetwork API).

First clone the GeoNetwork-UI repository:

```shell
git clone git@github.com:geonetwork/geonetwork-ui.git
cd geonetwork-ui
```

Then run the following script with the appropriate options:

```shell
node tools/pipelines/register-es-pipelines.js register --host=http://localhost:9200
```

The `--host` option is used to point to the ElasticSearch instance. Additionally, the `--records-index` option can be used if the index containing the metadata records is not called `gn-records`.
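For illustration, here is a minimal sketch of how these two options could be read from the command line, accepting both `--host=value` and `--host value` (the form used by the Docker image's entrypoint). The function name and the error handling are hypothetical; the real script may parse its arguments differently.

```javascript
// Hypothetical sketch: read `register --host <url> [--records-index <name>]`
// from an argv array such as process.argv.slice(2).
function parseRegisterArgs(argv) {
  const [command, ...rest] = argv
  const options = { host: null, recordsIndex: 'gn-records' } // documented default index
  for (let i = 0; i < rest.length; i++) {
    const match = /^--([a-z-]+)(?:=(.*))?$/.exec(rest[i])
    if (!match) throw new Error(`Unexpected argument: ${rest[i]}`)
    const name = match[1]
    // Accept both `--name=value` and `--name value`.
    const value = match[2] !== undefined ? match[2] : rest[++i]
    if (value === undefined) throw new Error(`Missing value for --${name}`)
    if (name === 'host') options.host = value
    else if (name === 'records-index') options.recordsIndex = value
    else throw new Error(`Unknown option: --${name}`)
  }
  if (!options.host) throw new Error('--host is required in this sketch')
  return { command, ...options }
}

console.log(parseRegisterArgs(['register', '--host=http://localhost:9200']))
```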

### Option B: Running a docker image

A docker image called `geonetwork/geonetwork-ui-tools-pipelines` can be used to register pipelines automatically on startup.

To run it:

```shell
docker run --rm --env ES_HOST=http://localhost:9200 --network host geonetwork/geonetwork-ui-tools-pipelines
```

Here the `ES_HOST` environment variable is used to point to the ElasticSearch instance. Note that this host will be used _from inside the docker container_, so to access an instance on `localhost` the `--network host` option is also required.

The `RECORDS_INDEX` environment variable can be used to specify a different index name if the records index is not called `gn-records`.
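The way the container settings turn into CLI arguments can be sketched as follows. The fallback values mirror the `ENV` defaults declared in the image's Dockerfile (`http://elasticsearch:9200` and `gn-records`); the function names are hypothetical. Note that the default host only resolves when a host named `elasticsearch` is reachable from the container, for instance on a shared Docker network, which is why the example above passes `ES_HOST` explicitly.

```javascript
// Sketch: resolve the container configuration from environment variables.
// Fallbacks mirror the ENV lines of tools/pipelines/Dockerfile.
function resolveContainerConfig(env) {
  return {
    host: env.ES_HOST || 'http://elasticsearch:9200',
    recordsIndex: env.RECORDS_INDEX || 'gn-records',
  }
}

// Arguments handed to the registration script, in the same order as the
// image's ENTRYPOINT: register --host <host> --records-index <index>.
function toCliArgs({ host, recordsIndex }) {
  return ['register', '--host', host, '--records-index', recordsIndex]
}

console.log(toCliArgs(resolveContainerConfig(process.env)).join(' '))
```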
28 changes: 18 additions & 10 deletions package-lock.json


1 change: 0 additions & 1 deletion package.json
@@ -142,7 +142,6 @@
     "@typescript-eslint/eslint-plugin": "5.62.0",
     "@typescript-eslint/parser": "5.62.0",
     "autoprefixer": "^10.4.13",
-    "commander": "^6.2.1",
     "cypress": "^12.17.1",
     "cypress-browser-permissions": "^1.1.0",
     "cypress-real-events": "^1.9.1",
2 changes: 2 additions & 0 deletions support-services/docker-compose.yml
@@ -43,6 +43,8 @@ services:
       retries: 10
     volumes:
       - elasticsearch_data:/usr/share/elasticsearch/data
+    ports:
+      - '9200:9200'
 
   kibana:
     image: kibana:7.11.1
1 change: 1 addition & 0 deletions tools/.gitignore
@@ -0,0 +1 @@
node_modules
40 changes: 40 additions & 0 deletions tools/README.md
@@ -0,0 +1,40 @@
# GeoNetwork-UI Tools

This directory contains various tools used in the development and deployment phases of GeoNetwork-UI.

## [`docker` folder](./docker)

Contains a generic Dockerfile that can be used for any application provided by GeoNetwork-UI, as well as the corresponding entrypoint and NGINX configuration files.

## [`i18n` folder](./i18n)

Contains utilities related to translation files used in GeoNetwork-UI.

## [`pipelines` folder](./pipelines)

Contains utilities related to [registering pipelines on ElasticSearch](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/ingest.html). Pipelines are used to preprocess records during indexing, thus offering greater control over the values returned by ElasticSearch and improving the user experience when searching records.

A CLI is provided to let you register or clear GeoNetwork-UI-related pipelines on an ES instance. For example:

```shell
node pipelines/register-es-pipelines.js register --host=http://localhost:9200 --records-index=gn-records
```
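Under the hood, registering and clearing pipelines boils down to a few calls to the standard ElasticSearch ingest and index-settings APIs. The sketch below is hypothetical: it uses a single made-up pipeline id (`geonetwork-ui-sketch`) and assumes the pipeline is attached through `index.default_pipeline`; the actual script may register several pipelines or attach them differently. The `send` helper needs Node 18+ for `fetch`.

```javascript
// Hypothetical sketch of the requests a `register` / `clear` command could send.
// The endpoints are standard ElasticSearch 7.x APIs; the pipeline id and the
// use of `index.default_pipeline` are assumptions.
const PIPELINE_ID = 'geonetwork-ui-sketch'

function registerRequests(recordsIndex, pipelineBody) {
  return [
    // Create or overwrite the pipeline definition first...
    { method: 'PUT', path: `/_ingest/pipeline/${PIPELINE_ID}`, body: pipelineBody },
    // ...then make the records index run it on every indexed document.
    { method: 'PUT', path: `/${recordsIndex}/_settings`, body: { 'index.default_pipeline': PIPELINE_ID } },
  ]
}

function clearRequests(recordsIndex) {
  return [
    // Setting a value to null resets it, i.e. detaches the pipeline.
    { method: 'PUT', path: `/${recordsIndex}/_settings`, body: { 'index.default_pipeline': null } },
    { method: 'DELETE', path: `/_ingest/pipeline/${PIPELINE_ID}` },
  ]
}

async function send(host, { method, path, body }) {
  const res = await fetch(host + path, {
    method,
    headers: { 'Content-Type': 'application/json' },
    body: body === undefined ? undefined : JSON.stringify(body),
  })
  if (!res.ok) throw new Error(`${method} ${path} failed: ${res.status} ${await res.text()}`)
  return res.json()
}
```

For instance, inside an async function: `for (const req of registerRequests('gn-records', body)) await send('http://localhost:9200', req)`. Clearing detaches the pipeline from the index before deleting it, so the index never points at a pipeline that no longer exists.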

A docker image can also be built to register the pipelines automatically in a docker environment:

```shell
npm run pipelines:docker-build
# once image is built, use it like so:
docker run --rm --env ES_HOST=http://localhost:9200 --env RECORDS_INDEX=gn-records --network host geonetwork/geonetwork-ui-tools-pipelines
```

## [`webcomponent` folder](./webcomponent)

Contains utilities related to Web Components provided by the GeoNetwork-UI project.

## Other tools

- `generate-api.sh`: regenerates API clients automatically from existing OpenAPI YAML files
- `make-archive.sh`: bundles the directories in `dist` into an archive named appropriately for releases
- `print-docker-tag.sh`: outputs a docker tag based on the current git tag/branch
4 changes: 2 additions & 2 deletions tools/i18n/cli.js
@@ -1,5 +1,5 @@
-const { readFile, writeFile } = require('fs/promises')
-const { program } = require('commander')
+import { readFile, writeFile } from 'fs/promises'
+import { program } from 'commander'

 program
   .command('merge <sourcePath> <destPath>')
22 changes: 22 additions & 0 deletions tools/package-lock.json


11 changes: 10 additions & 1 deletion tools/package.json
@@ -1,3 +1,12 @@
 {
-  "type": "module"
+  "name": "geonetwork-ui-tools",
+  "description": "A series of tools used alongside GeoNetwork-UI",
+  "type": "module",
+  "devDependencies": {
+    "commander": "11.1.0"
+  },
+  "scripts": {
+    "pipelines:register": "node pipelines/register-es-pipelines.js register",
+    "pipelines:docker-build": "docker build . -f pipelines/Dockerfile -t $(./print-docker-tag.sh tools-pipelines)"
+  }
 }
15 changes: 15 additions & 0 deletions tools/pipelines/Dockerfile
@@ -0,0 +1,15 @@
FROM node:lts-alpine

RUN mkdir /app
WORKDIR /app
COPY ./pipelines/register-es-pipelines.js ./
COPY ./package.json ./
COPY ./package-lock.json ./

ENV ES_HOST="http://elasticsearch:9200"
ENV RECORDS_INDEX="gn-records"

RUN npm ci

#ENTRYPOINT ["node", "./register-es-pipelines.js", "register", "--host", "echo $ES_HOST", "--records-index", "echo $RECORDS_INDEX"]
ENTRYPOINT node ./register-es-pipelines.js register --host $ES_HOST --records-index $RECORDS_INDEX