The Atlas Index Creation Worker queries the data in the Staging Database, generates the index updates, and submits them to the Atlas File Repository’s Elasticsearch index. The Worker exposes three endpoints:
Regenerate the entire ES index: http://localhost:5001/api/v1/index/file_cases
Update a single file: http://localhost:5001/api/v1/index/file_cases/file_id/
Regenerate a given release: http://localhost:5001/api/v1/index/file_cases/release_ver/
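As a rough illustration of how these three endpoints relate, the sketch below builds each URL from the base path listed above. It assumes the file ID or release version is appended as the final path segment; the helper name `worker_url` and the sample release value `1.0` are hypothetical, and the HTTP method the Worker expects is not stated here, so the sketch only builds URLs and sends nothing.

```python
from typing import Optional
from urllib.parse import quote

# Base URL taken from the endpoint list above.
BASE_URL = "http://localhost:5001/api/v1/index/file_cases"


def worker_url(file_id: Optional[str] = None, release_ver: Optional[str] = None) -> str:
    """Build the Worker URL for a full, single-file, or single-release update.

    Assumption: the identifier is appended as the last path segment
    after `file_id/` or `release_ver/`.
    """
    if file_id is not None and release_ver is not None:
        raise ValueError("pass either file_id or release_ver, not both")
    if file_id is not None:
        return f"{BASE_URL}/file_id/{quote(file_id, safe='')}"
    if release_ver is not None:
        return f"{BASE_URL}/release_ver/{quote(release_ver, safe='')}"
    return BASE_URL


print(worker_url())
print(worker_url(file_id="0043e10d-1f57-47be-a8c8-f97537efced8"))
print(worker_url(release_ver="1.0"))  # "1.0" is a made-up release value
```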
The queries below are a small subset of the Elasticsearch API. You may execute these RESTful queries as curl commands or through the Kibana Dev Tools panel. The following examples assume the Dev Tools panel is being used.
GET _cat/indices/*?v
GET file_cases/_count
GET file_cases/_mapping/?include_type_name=true
GET /file_cases/_mapping/field/file_size?pretty
GET file_cases/_doc/0043e10d-1f57-47be-a8c8-f97537efced8
PUT /file_cases?pretty
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "file_name": { "type": "keyword" },
      ...
    }
  }
}
POST _reindex
{
  "source": {
    "index": "file_cases"
  },
  "dest": {
    "index": "file_cases_v2"
  }
}
DELETE file_cases
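If you prefer scripting to the Dev Tools panel, the console lines above could be turned into plain HTTP requests along the following lines. This is a minimal sketch, assuming Elasticsearch is reachable on its default port at `http://localhost:9200`; that address and the helper name `console_request` are assumptions, not part of this setup's documented configuration. The request is only constructed, not sent.

```python
import json
from typing import Optional
from urllib.request import Request

# Assumption: Elasticsearch listens on its default port on localhost.
ES_URL = "http://localhost:9200"


def console_request(line: str, body: Optional[dict] = None) -> Request:
    """Turn a Dev Tools line such as 'GET file_cases/_count' into a Request."""
    method, path = line.split(maxsplit=1)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    url = f"{ES_URL}/{path.lstrip('/')}"
    return Request(url, data=data, headers=headers, method=method.upper())


req = console_request("GET file_cases/_count")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```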
Elasticsearch does not allow the mapping of an existing field to be changed in place, and this presents a challenge when the underlying model requires an update, for example, the transformation of a field from one type to another. The following is an example of updating a field, file_size, from type keyword to type long.
- Before starting this process, ensure you have up-to-date backups in case something goes wrong.
- Modify the existing index mapping:
"file_size": { "type": "keyword" } => "file_size": { "type": "long" }
- Create a new index using the updated mapping. The new index must have a unique name. For our example, we append _temp to the name so that file_cases becomes file_cases_temp.
PUT /file_cases_temp?pretty
< mapping_json_here >
- Transfer the data from the current index into the newly created index
POST _reindex
{
  "source": {
    "index": "file_cases"
  },
  "dest": {
    "index": "file_cases_temp"
  }
}
- Delete your current index:
DELETE file_cases
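- Reindex your newly created index back to the name of your initial index. Because _reindex does not copy mappings from the source index, first recreate file_cases with the updated mapping (PUT /file_cases with the same mapping JSON) unless you want Elasticsearch to infer the mapping dynamically.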
POST _reindex
{
  "source": {
    "index": "file_cases_temp"
  },
  "dest": {
    "index": "file_cases"
  }
}
- Delete your temporary index:
DELETE file_cases_temp
- You may need to update other services, such as Arranger. To do so, start by tunneling to the knowledge environment on port 8080:
ssh -L 8080:localhost:8080 atlas-ke
- Turn on the arranger-ui:
cd heavens-docker/atlas/knowledge-environment/ && docker-compose -f docker-compose.arranger_ui.yml up -d
- In your browser, navigate to localhost:8080.
- Choose the project you're working on and make any required edits. In our example, we need to choose endpoint -> File and then update the DOI field to use isArray.
- As with the index itself, we must create a temporary project and swap it with our current project. Arranger has some quirks, so it is best to give your temporary project a name like 123temp.
- Delete your current project.
- Save your temporary project under the name of your initial project, which creates a new project with that name.
- Delete your temporary project.
- Restart the knowledge-environment to see the changes:
docker-compose -f docker-compose.prod.yml down && docker-compose -f docker-compose.prod.yml up -d
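The Elasticsearch portion of the steps above (create a temporary index, copy the data over, delete the original, copy back, delete the temporary index) can be sketched as an ordered request plan. This is only an illustration: the function name `reindex_plan`, the `suffix` parameter, and the sample mapping body are hypothetical. The sketch also adds one request that the steps do not list: it recreates the original index with the explicit mapping before copying back, on the grounds that `_reindex` does not copy mappings from the source index. Nothing is sent to Elasticsearch; the plan is just printed for review.

```python
from typing import List, Optional, Tuple

# One Elasticsearch request: (HTTP method, path, optional JSON body).
Step = Tuple[str, str, Optional[dict]]


def reindex_plan(index: str, index_body: dict, suffix: str = "_temp") -> List[Step]:
    """List the requests needed to rebuild `index` with a changed mapping."""
    temp = index + suffix

    def copy(src: str, dst: str) -> Step:
        return ("POST", "_reindex", {"source": {"index": src}, "dest": {"index": dst}})

    return [
        ("PUT", temp, index_body),   # create the temporary index with the updated mapping
        copy(index, temp),           # copy documents into it
        ("DELETE", index, None),     # drop the original index
        ("PUT", index, index_body),  # addition: recreate it so the mapping is explicit
        copy(temp, index),           # copy documents back
        ("DELETE", temp, None),      # drop the temporary index
    ]


# Hypothetical mapping body, for illustration only.
body = {"mappings": {"properties": {"file_size": {"type": "long"}}}}
for method, path, payload in reindex_plan("file_cases", body):
    print(method, path, payload if payload is not None else "")
```

Printing the plan before running anything makes it easy to confirm that the index deleted in the last step is the temporary one, not the rebuilt original.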
The dry_run.py script pulls data from the knowledge-environment to check what data is available. It only makes a GET request to the knowledge-environment to get the number of files associated with a version number.
This script will not work properly inside of the container: in some columns, the content of the cell will be wrapped in bytearray(b"<cell contents>").
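If you do run into wrapped cells, one possible way to normalize them is sketched below. It assumes the wrapped values are Python bytes-like objects holding UTF-8 text; `clean_cell` is a hypothetical helper, not part of dry_run.py, and the sample row is made up.

```python
def clean_cell(value):
    """Return text for bytes-like cells and leave every other value unchanged."""
    if isinstance(value, (bytes, bytearray)):
        # Assumption: the cell holds UTF-8 encoded text.
        return value.decode("utf-8")
    return value


# Made-up row mixing a wrapped cell with ordinary values.
row = (bytearray(b"file_a.csv"), 42, None)
print([clean_cell(cell) for cell in row])  # ['file_a.csv', 42, None]
```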
- Tunnel to the knowledge-environment machine, forwarding port 3306.
- On your local machine, copy .env_example and rename the copy to .env.
- Edit .env and set the correct user and password.
- Change the host variable in .env to 127.0.0.1.
- Run the dry_run.py script:
  - If you want files for a specific version:
    python3 dry_run.py -v <version>
    Replace <version> with the version number of your choice.
  - If you want to get all files regardless of version:
    python3 dry_run.py