Add test scripts to the tools folder #74

Open
wants to merge 28 commits into base: main

Conversation

khaledk2
Collaborator

I have added the scripts which we used to test the Elasticsearch cluster to the tools folder, along with some instructions to guide the user.
I have modified the code to copy them automatically to the host machine (searchengine/searchengine/maintenance_scripts/).
They include a script for indexing or re-indexing the data (index_data.sh) and a script to check the progress of the indexing process (check_indexing_process.sh).
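A rough usage sketch, assuming the scripts have already been copied to the maintenance_scripts folder on the host (paths may differ per deployment):

# run from the folder the scripts are copied to on the host
cd /data/searchengine/searchengine/maintenance_scripts/
# index (or re-index) the data
bash index_data.sh
# check the progress of the indexing process
bash check_indexing_process.sh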

@khaledk2
Collaborator Author

I have added a get_search_terms_from_log method to manage.py.
This method analyses the log file and generates CSV files containing statistics for the search terms used for each of the resources. The user should provide the folder which contains the log file(s).

@khaledk2
Collaborator Author

khaledk2 commented Mar 20, 2023

The methods can be used on the idr-testing server.

The following command will analyse the log files saved in /data/searchengine/searchengine/logs/ and generate three CSV files (report_image.csv, report_project.csv, and screen_project.csv) which contain statistics about the search terms.

sudo docker run --rm -v /data/searchengine/searchengine/logs/:/data/searchengine/searchengine/logs/ khaledk2/searchengine:test get_search_terms_from_log -l /data/searchengine/searchengine/logs/

When run for the first time, it will copy the maintenance scripts into the /data/searchengine/searchengine/maintenance_scripts/ folder. There is a description of the scripts in this file:

60177ad#diff-3e5af57bb465c1a51df3b132974d8cfda29bc937fe791761c142a4309c96c038

@jburel
Member

jburel commented May 12, 2023

@khaledk2 typo in the file name. It should be queries.
I think what matters is more the examples of complex queries using and/or filters than the number of queries.

@khaledk2
Collaborator Author

khaledk2 commented May 12, 2023

@jburel I have fixed the typo and renamed the script to complex_queries.

@khaledk2
Collaborator Author

khaledk2 commented Oct 30, 2023

I have added two endpoints to access some stats:

Each returns an Excel file which contains three sheets (image, project and screen).

  • The first one returns some metadata about the most common key/value pairs for each resource
  • The second one returns the most searched terms for each resource (i.e. image, project, screen)

They have been deployed on the idr-testing server.

  • The following URL returns the metadata, which contains the attribute, the number of unique buckets and the number of images:

https://idr-testing.openmicroscopy.org/searchengine/api/stats/metadata

  • The following URL returns the most common search terms:

https://idr-testing.openmicroscopy.org/searchengine/api/stats/searchterms
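As a quick sketch of fetching these from the command line (curl and the output file names here are only illustrative):

# download the metadata stats as an Excel file
curl -o stats_metadata.xlsx https://idr-testing.openmicroscopy.org/searchengine/api/stats/metadata
# download the most common search terms as an Excel file
curl -o stats_searchterms.xlsx https://idr-testing.openmicroscopy.org/searchengine/api/stats/searchterms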


* The searchEngine functions can be tested using the ``check_searchengine_health.sh`` script. The script takes about 15 minutes to run. The script output is saved to a text file check_report.txt in the``/data/searchengine/searchengine/`` folder.

* It is possible to stop an elasticsearch cluster node using this script::

Suggested change
* It is possible to stop an elasticsearch cluster node using this script::
* It is possible to stop an elasticsearch cluster node using this script (replace n with an integer, e.g. 1,2,3)::

* It is possible to stop an elasticsearch cluster node using this script::

bash stop_node.sh n
where n is an integer, e.g. 1,2, 3.

Suggested change
where n is an integer, e.g. 1,2, 3.


* The ``check_cluster_health.sh`` script is used to check the cluster status at any time.

* The searchEngine functions can be tested using the ``check_searchengine_health.sh`` script. The script takes about 15 minutes to run. The script output is saved to a text file check_report.txt in the``/data/searchengine/searchengine/`` folder.

Suggested change
* The searchEngine functions can be tested using the ``check_searchengine_health.sh`` script. The script takes about 15 minutes to run. The script output is saved to a text file check_report.txt in the``/data/searchengine/searchengine/`` folder.
* The searchEngine functions can be tested using the ``check_searchengine_health.sh`` script. The script takes about 15 minutes to run. The script output is saved to a text file check_report.txt in the ``/data/searchengine/searchengine/`` folder.
The added space will hopefully fix the formatting issue

where n is an integer, e.g. 1,2, 3.
* backup_elasticsearch_data.sh script is used to backup the Elasticsearch data.

* It is possible to index or re-index the data using this bash ``scrpt index_data.sh``.

Suggested change
* It is possible to index or re-index the data using this bash ``scrpt index_data.sh``.
* It is possible to index or re-index the data using the ``index_data.sh`` script.


* It is possible to index or re-index the data using this bash ``scrpt index_data.sh``.

* It is possible to restore the Elasticsearch data from the backup (snapshot) using the following command::

Suggested change
* It is possible to restore the Elasticsearch data from the backup (snapshot) using the following command::
* Restore the Elasticsearch data from the backup (snapshot) using the following command::


bash restore_elasticsearch_data.sh

* It may take up to 15 minutes to restore the data.

This should not be a new bullet point. It is an explanation for the previous bullet point.


* It may take up to 15 minutes to restore the data.

* The ``check_indexing_process.sh`` script is used to check the indexing data progress.

Suggested change
* The ``check_indexing_process.sh`` script is used to check the indexing data progress.
* Check the progress of the data indexing using the ``check_indexing_process.sh`` script.

@pwalczysko
Member

pwalczysko commented Nov 20, 2023

Studied the two produced excel sheets. I found both of them very useful.

  1. searchterms excel sheet:

[Screenshot (2023-11-20): searchterms sheet with an example header row]

  • Insert a header (a separate top row) explaining that the hits are attempted searches with KVPs. See the example above.
  • Show a column diagram in addition to the pie chart
  • Show the first 5 values of the KVPs which were searched for (e.g. Cell Line: Hela, Cell Line: blah1, Cell Line: Blah2, Cell Line: Blah3, Cell Line: Blah4, with the numbers in a column graph representation, to make clear which Cell Lines people search for in the first place)
  • Try to remove nonsensical KVP terms which were inserted by a user searching for nonsense strings; if the string is not in IDR, remove it.
  • Use the term Container (with an explanation) rather than Project and Screen
  2. metadata excel sheet:
  • Explain that "Bucket" means unique term.
  • Do not leave the Publication Title etc. just for Project; we must have them for Screen too, and please also produce a summary for both Projects and Screens.

Member

@pwalczysko left a comment


Some text formatting suggestions have been made. Also, please improve the layout of the resulting Excel sheets as per #74 (comment)

@khaledk2
Collaborator Author

I have implemented the suggested modifications, and they have been deployed on the idr-testing server.
By default, the following endpoint displays the first 5 values of the KVPs which were searched for, along with the number of searches for each:
https://idr-testing.openmicroscopy.org/searchengine/api/stats/searchterms
The user can specify the number of returned KVP values by setting return_values in the URL, e.g. to return 4 values only:
https://idr-testing.openmicroscopy.org/searchengine/api/stats/searchterms?return_values=4
It is also possible to return all the searched values of the KVPs by setting return_values=all:
https://idr-testing.openmicroscopy.org/searchengine/api/stats/searchterms?return_values=all
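For example (a sketch; curl and the output file names are only illustrative):

# default: first 5 searched values per key
curl -o searchterms_top5.xlsx https://idr-testing.openmicroscopy.org/searchengine/api/stats/searchterms
# return 4 values only
curl -o searchterms_top4.xlsx "https://idr-testing.openmicroscopy.org/searchengine/api/stats/searchterms?return_values=4"
# return all searched values
curl -o searchterms_all.xlsx "https://idr-testing.openmicroscopy.org/searchengine/api/stats/searchterms?return_values=all"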

@pwalczysko
Member

Thank you @khaledk2. Imho, this is very helpful. Lgtm.

@khaledk2
Collaborator Author

The stats endpoints (search terms and metadata) have been secured, so a username and password are now required to access them.
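For example, assuming HTTP basic authentication (the authentication scheme and the username below are assumptions; adjust to the actual deployment):

# curl prompts for the password when only the username is given
curl -u your_username -o searchterms.xlsx https://idr-testing.openmicroscopy.org/searchengine/api/stats/searchterms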
