add dry_run flag #44

jburel · 2022-08-30T12:12:11Z

This PR re-activates the work started by @joshmoore in #8.

Set dry_run flag as part of vals to be consumed within the process pool
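
As a rough illustration of that commit message, the sketch below bundles a dry_run flag into each tuple handed to a process pool, so every worker receives the flag without relying on global state. The names `process_chunk` and `work`, and the shape of the rows, are hypothetical and are not taken from transform_data.py.

```python
from multiprocessing import Pool


def process_chunk(vals):
    """Handle one chunk; `vals` is a (chunk_id, rows, dry_run) tuple."""
    chunk_id, rows, dry_run = vals
    # Build Elasticsearch-style actions (shape assumed for illustration).
    actions = [
        {"_index": "image_keyvalue_pair_metadata", "_source": row}
        for row in rows
    ]
    if dry_run:
        # Dry run: report what would be indexed, but push nothing.
        return chunk_id, "dry_run", len(actions)
    # A real run would send `actions` to Elasticsearch here (omitted).
    return chunk_id, "indexed", len(actions)


if __name__ == "__main__":
    dry_run = True
    chunks = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
    # The flag travels with each work item instead of living in a global.
    work = [(i, rows, dry_run) for i, rows in enumerate(chunks, start=1)]
    with Pool(processes=2) as pool:
        for result in pool.map(process_chunk, work):
            print(result)
```

Packing the flag into the work item keeps the worker function picklable and makes each call self-describing, which is convenient when the pool starts fresh processes.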

khaledk2 · 2022-08-31T12:32:50Z

It may be a good idea to write the data to a JSON or CSV file rather than printing it. What do you think?

jburel · 2022-08-31T12:39:32Z

Writing to a file makes sense. Do we want a flag to select the output?
No flag: print to the console
output: specify the format, e.g. json or csv

khaledk2 · 2022-08-31T13:13:16Z

I think we do not need another flag, as we can make dry_run an integer that takes one of three values:
0 off (default, i.e. push the data to the Elasticsearch index),
1 print to the console,
2 save to a file.
What do you think?

jburel · 2022-08-31T13:15:32Z

You mean re-using the --dry_run flag, but as an int rather than a boolean?
That might be confusing. In that case, let's go for JSON, and if an error occurs when saving to JSON we print to the console instead.

validate the indexing if the dry_run is False

clean up the code for writing JSON file

khaledk2 · 2022-08-31T17:21:09Z

It now writes a JSON file when dry_run is True, and falls back to printing to the console if any error occurs.

I have tested it and it seems to be working fine.
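
The behaviour described here (write JSON, fall back to the console on error) could look roughly like the sketch below. This is not the code in transform_data.py; `save_dry_run` and its arguments are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile


def save_dry_run(actions, path):
    """Write `actions` to `path` as JSON; print them if that fails.

    Returns True when the file was written, False on console fallback.
    """
    try:
        # Serialize first so a failure does not leave a partial file behind.
        text = json.dumps(actions, indent=4)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        return True
    except (OSError, TypeError, ValueError) as error:
        # Fallback agreed in the discussion: show the data on the console.
        print(f"Could not write {path}: {error}")
        print(actions)
        return False


if __name__ == "__main__":
    sample = [{"_index": "image_keyvalue_pair_metadata", "_source": {"id": 1462}}]
    target = os.path.join(tempfile.gettempdir(), "data_1.json")
    print(save_dry_run(sample, target))
```

Returning a boolean lets the caller log which chunks ended up on the console instead of on disk.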

Use the JSON format which is used to insert the data into Elasticsearch.

khaledk2 · 2022-09-02T17:02:00Z

I have fixed the JSON format; it is now similar to the one used to insert the data into Elasticsearch.

@joshmoore I have deployed it in the pilot-idr0000-omeroreadwrite searchengine. You can use the following docker command to run it in dry-run mode:

sudo docker run -d --name searchengine_d --rm -v /data/searchengine/searchengine/:/etc/searchengine/ --network=searchengine-net khaledk2/searchengine:latest get_index_data_from_database -d

It will save the results to files named data_n.json, where n = 1, 2, ...

The files are saved to:
/data/searchengine/searchengine/

khaledk2 · 2022-09-02T19:50:01Z

The JSON is a list of dicts, each with a format like this:

    {
        "_index": "image_keyvalue_pair_metadata",
        "_source": {
            "doc_type": "image_keyvalue_pair_metadata",
            "id": 1462,
            "owner_id": 2,
            "experiment": null,
            "group_id": 3,
            "name": "X_110222_S1 [Well C-11; Field #1]",
            "description": null,
            "project_name": null,
            "project_id": null,
            "dataset_name": null,
            "dataset_id": null,
            "screen_id": 3,
            "screen_name": "idr0001-graml-sysgro/screenA",
            "plate_id": 53,
            "plate_name": "X_110222_S1",
            "well_id": 293,
            "wellsample_id": 1939,
            "key_values": [
                {
                    "name": "Gene Identifier",
                    "value": "SPAC25G10.06",
                    "index": 0
                },
                {
                    "name": "Organism",
                    "value": "Schizosaccharomyces pombe",
                    "index": 0
                },
                {
                    "name": "Strain",
                    "value": "rps2801",
                    "index": 0
                },
                {
                    "name": "Channels",
                    "value": "GFP:endogenous alpha tubulin 2;Cascade blue:growth media",
                    "index": 1
                },
                {
                    "name": "Gene Identifier URL",
                    "value": "http://www.pombase.org/spombe/result/SPAC25G10.06",
                    "index": 1
                },
                {
                    "name": "Gene Symbol",
                    "value": "rps2801",
                    "index": 2
                },
                {
                    "name": "Replicate Group",
                    "value": "1",
                    "index": 2
                }
            ]
        }
    }
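
To sanity-check a dry-run file before a real indexing run, something like the following could be used. It assumes the file holds a list of actions shaped like the record above; `summarize` is a hypothetical helper and not part of the search engine, and the inline sample is a trimmed copy of that record standing in for a data_n.json file.

```python
import json
from collections import Counter


def summarize(actions):
    """Count documents per index and key-value pairs per key name."""
    per_index = Counter(action["_index"] for action in actions)
    per_key = Counter(
        pair["name"]
        for action in actions
        # key_values may be missing or null, so default to an empty list.
        for pair in action["_source"].get("key_values") or []
    )
    return per_index, per_key


if __name__ == "__main__":
    # Inline stand-in for the contents of a data_n.json file.
    text = """
    [{"_index": "image_keyvalue_pair_metadata",
      "_source": {"id": 1462, "description": null,
                  "key_values": [
                      {"name": "Organism",
                       "value": "Schizosaccharomyces pombe", "index": 0},
                      {"name": "Strain", "value": "rps2801", "index": 0}]}}]
    """
    per_index, per_key = summarize(json.loads(text))
    print(dict(per_index))
    print(dict(per_key))
```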


add dry_run flag

614e902

jburel requested a review from khaledk2 August 30, 2022 12:12

khaledk2 added 3 commits August 31, 2022 13:00

Change dry_run flag option to d

d1734ee

Update transform_data.py

09252b3

Set dry_run flag as part of vals to be consumed within the process pool

Update transform_data.py

d4f089d

jburel mentioned this pull request Aug 31, 2022

Dry run #8

Closed

khaledk2 added 3 commits August 31, 2022 17:34

Update manage.py

48b92ae

validate the indexing if the dry_run is False

Update transform_data.py

6cf299c

Update transform_data.py

e995728

clean up the code for writing JSON file

Update transform_data.py

abf8766

Use the JSON format which is used to insert the data into Elasticsearch.

khaledk2 and others added 2 commits May 4, 2023 23:48

Merge branch 'main' into dry_run

b40959f

[pre-commit.ci] auto fixes from pre-commit.com hooks

7e9e69d

for more information, see https://pre-commit.ci
