If your Elasticsearch server is served over HTTPS and requires certificates, add the .crt files to the elasticsearch/certificates directory; they will be installed in the Docker image.
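
As an illustration of the step above, here is a minimal Python sketch that copies every .crt file from a folder of your choosing into elasticsearch/certificates before the image is built. The function name `stage_certificates` and the source folder are hypothetical; copying the files by hand works just as well.

```python
import shutil
from pathlib import Path


def stage_certificates(source_dir, target_dir="elasticsearch/certificates"):
    """Copy all .crt files from source_dir into target_dir.

    Hypothetical helper: the project only requires that the files end up
    in the elasticsearch/certificates directory.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for cert in sorted(Path(source_dir).glob("*.crt")):
        shutil.copy2(cert, target / cert.name)
        copied.append(cert.name)
    return copied
```

Calling `stage_certificates("/path/to/my/certs")` from the repository root returns the names of the files it copied, which makes it easy to confirm that nothing was missed.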
Each data source should be stored in its own Elasticsearch index. Create each index with the following mappings (the *_vector fields are optional and are only used by the embedding search):
{
  "properties": {
    "date": {
      "type": "date"
    },
    "id": {
      "type": "long"
    },
    "neighbourhoods": {
      "type": "nested",
      "properties": {
        "count": {
          "type": "long"
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "ratio": {
          "type": "float"
        }
      }
    },
    "parent_source_title_vector": {
      "type": "dense_vector",
      "dims": 256
    },
    "source_title_vector": {
      "type": "dense_vector",
      "dims": 256
    },
    "text_vector": {
      "type": "dense_vector",
      "dims": 256
    },
    "num_chars": {
      "type": "long"
    },
    "num_sentences": {
      "type": "long"
    },
    "parent_source_title": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "parent_source_url": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "sentiment_polarity": {
      "type": "float"
    },
    "sentiment_subjectivity": {
      "type": "float"
    },
    "source_title": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "source_url": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "text": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
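
One possible way to apply these mappings is to send them to the Elasticsearch create-index endpoint (`PUT /<index>`), wrapping the `properties` object above in a `mappings` key. The sketch below uses only the Python standard library and assumes a recent Elasticsearch version with typeless mappings and no authentication. The base URL, the index name `council_reports`, the function names, and the abbreviated `EXAMPLE_PROPERTIES` are all hypothetical; in practice you would pass the full `properties` object shown above.

```python
import json
import ssl
import urllib.request

# Abbreviated, illustrative subset of the "properties" object above.
EXAMPLE_PROPERTIES = {
    "date": {"type": "date"},
    "text": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "text_vector": {"type": "dense_vector", "dims": 256},
}


def build_create_index_request(base_url, index_name, properties):
    """Build (but do not send) the PUT request that creates an index."""
    body = {"mappings": {"properties": properties}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index(base_url, index_name, properties, ca_file=None):
    """Send the request; ca_file is an optional .crt used to verify HTTPS."""
    request = build_create_index_request(base_url, index_name, properties)
    context = ssl.create_default_context(cafile=ca_file) if ca_file else None
    with urllib.request.urlopen(request, context=context) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires a reachable server, so it is left commented out):
# create_index("https://localhost:9200", "council_reports", EXAMPLE_PROPERTIES)
```

Building the request separately from sending it makes it possible to inspect the URL and body before anything reaches the cluster.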
Then add your indexes/aliases to the default_index_aliases parameter in .configs.
Each of these indexes should be populated with documents that contain the following fields:
Field | Expected Data | Status |
---|---|---|
date | 2023-01-01 | Required |
text | This is the text in a document. | Required |
source_title | Council Report for January 2023 | Required |
sentiment | Float in [-1, 1] | Required |
neighbourhoods | ["Downtown", "Northwest"] | Required |
source_url | | Optional |
parent_source_title | Council Agenda 2023 | Optional |
parent_source_url | | Optional |
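
To make the table concrete, the following Python sketch builds one example document and checks it against the required fields before indexing. It follows the field names in the table only. The function name `validate_document`, the example sentiment value, and the specific checks are assumptions for illustration, not part of the project.

```python
from datetime import date

# Field names taken from the table above.
REQUIRED_FIELDS = ("date", "text", "source_title", "sentiment", "neighbourhoods")
OPTIONAL_FIELDS = ("source_url", "parent_source_title", "parent_source_url")


def validate_document(document):
    """Return a list of problems; an empty list means the document looks valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in document or document[field] in (None, "", []):
            problems.append(f"missing required field: {field}")

    # The table specifies a float in [-1, 1] for sentiment.
    sentiment = document.get("sentiment")
    if isinstance(sentiment, (int, float)) and not -1 <= sentiment <= 1:
        problems.append("sentiment must be in [-1, 1]")

    # The table shows dates in the form 2023-01-01.
    day = document.get("date")
    if isinstance(day, str) and day:
        try:
            date.fromisoformat(day)
        except ValueError:
            problems.append("date must look like 2023-01-01")
    return problems


example_document = {
    "date": "2023-01-01",
    "text": "This is the text in a document.",
    "source_title": "Council Report for January 2023",
    "sentiment": 0.25,  # illustrative value only
    "neighbourhoods": ["Downtown", "Northwest"],
    "parent_source_title": "Council Agenda 2023",
}

print(validate_document(example_document))
```

Running a check like this before indexing catches missing required fields early, rather than at search time.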