zstandard compression for parameters and results #5995

Open
wants to merge 70 commits into
base: master

guzzijones
Contributor

@guzzijones guzzijones commented Jun 22, 2023

why

Mitigate (as much as possible) MongoDB 'document too large' exceptions. MongoDB limits the size of a single document to 16 MB.

done

  1. zstandard compression for parameters and results.
  2. Removed the embedded liveaction doc from the executions DB and replaced it with the liveaction id string. This could save up to half the time spent writing the liveaction to the database, since the liveaction is no longer written twice.
  3. Added traceback info to exception handling in the action execution engine and the workflow engine. This should make it easier to debug a workflow failure. In my experience, invalid orjson input or unterminated YAQL strings cause these exceptions.
  4. This will require a data migration for the executionDB model, because the liveaction parameter is now a string rather than an embedded document.
  5. Added a config option to turn off zstandard compression for parameters and results in the execution DB and the liveaction DB.
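Item 3 above can be illustrated with a minimal sketch. It is not the engine code from this PR: `run_step`, `handle_failure`, and the logger name are hypothetical, and only the standard `traceback` and `logging` modules are used. The idea is simply to capture the full stack alongside the error message so that the failing expression can be located.

```python
import logging
import traceback

LOG = logging.getLogger("example.engine")  # hypothetical logger name


def run_step(expression: str) -> str:
    """Hypothetical step that fails on an unterminated string literal."""
    if expression.count('"') % 2 != 0:
        raise ValueError("unterminated string in expression: %s" % expression)
    return expression


def handle_failure(expression: str) -> dict:
    """Run a step; on failure, return a result dict that carries the traceback."""
    try:
        return {"status": "succeeded", "result": run_step(expression)}
    except Exception as e:
        # format_exc() captures the whole stack, not just the message.
        tb = traceback.format_exc()
        LOG.error("Step failed: %s\n%s", e, tb)
        return {"status": "failed", "error": str(e), "traceback": tb}
```

Storing the traceback in the result, rather than only logging it, is one possible design; it keeps the debugging information next to the failed execution.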

@pull-request-size pull-request-size bot added the size/XXL label (PR that changes 1000+ lines. You should absolutely split your PR into several.) Jun 22, 2023
@guzzijones guzzijones changed the title from "initial zstandard compression for parameters and results" to "WIP: initial zstandard compression for parameters and results" Jun 22, 2023
@guzzijones guzzijones changed the title from "WIP: initial zstandard compression for parameters and results" to "WIP: zstandard compression for parameters and results" Jun 23, 2023
@guzzijones guzzijones self-assigned this Jul 3, 2023
@guzzijones
Contributor Author

@amanda11 @cognifloyd would you please take a look at this and give me some feedback? I would like to know whether the direction the code is taking is acceptable.

We are now seeing numerous instances of mongo documents being too large. I ran a check, and one of our larger objects would compress from 10 MB down to 0.5 MB using zstandard.
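A check like the one described above could be sketched as follows. This is an illustration under assumptions, not the script that was actually run: `size_report` is a hypothetical helper, the payload is synthetic, and the JSON byte length is only a proxy for the stored BSON document size. If the third-party `zstandard` package is not installed, the sketch falls back to `zlib` from the standard library purely so it still runs; zlib is a stand-in, not what this PR uses.

```python
import json
import zlib

try:
    import zstandard  # third-party package; optional for this sketch
except ImportError:
    zstandard = None

MONGO_MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MiB document limit


def compress(data: bytes) -> bytes:
    if zstandard is not None:
        return zstandard.ZstdCompressor().compress(data)
    return zlib.compress(data)  # stand-in only; this is NOT zstandard


def size_report(result: dict) -> dict:
    """Compare raw and compressed sizes of a result against the document limit."""
    raw = json.dumps(result).encode("utf-8")
    packed = compress(raw)
    return {
        "raw_bytes": len(raw),
        "compressed_bytes": len(packed),
        "raw_fits": len(raw) < MONGO_MAX_DOC_BYTES,
        "compressed_fits": len(packed) < MONGO_MAX_DOC_BYTES,
    }


# Synthetic, highly repetitive payload; real results will compress differently.
report = size_report({"stdout": "INFO task completed successfully\n" * 600_000})
print(report)
```

Repetitive output such as log lines tends to compress very well, which is why a result that exceeds the limit uncompressed can fit comfortably once compressed.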

@guzzijones
Contributor Author

guzzijones commented Jul 5, 2023

Benchmark results

TL;DR:
zstandard is much faster for large JSON with a single large field.
zstandard takes about the same time in all other tests, give or take a few ticks.

The cost from a speed perspective is minimal.

$ /st2/st2common/benchmarks/micro$ pytest test_mongo_field_types.py --benchmark-columns=mean,rounds

--------------------------------- benchmark 'live_action_read': 12 tests --------------------------------
Name (time in ms)                                                                  Mean            Rounds
---------------------------------------------------------------------------------------------------------
test_read_large_execution[old_json_dict_field-tiny_1]                            2.3494 (1.0)         387
test_read_large_execution[json_dict_field-tiny_1]                                2.3697 (1.01)        391  

test_read_large_execution[old_json_dict_field-json_61kb]                         3.2883 (1.40)        254
test_read_large_execution[json_dict_field-json_61kb]                             3.3988 (1.45)        269   

test_read_large_execution[old_json_dict_field-json_4mb_single_large_field]      17.2653 (7.35)         56  
test_read_large_execution[json_dict_field-json_4mb_single_large_field]          12.3281 (5.25)         78  


test_read_large_execution[old_json_dict_field-json_647kb]                       13.6826 (5.82)         66
test_read_large_execution[json_dict_field-json_647kb]                           15.9034 (6.77)         67  


test_read_large_execution[old_json_dict_field-json_4mb]                        114.0750 (48.55)        11
test_read_large_execution[json_dict_field-json_4mb]                            115.1644 (49.02)        11  

test_read_large_execution[old_json_dict_field-json_8mb]                        246.7735 (105.03)        5
test_read_large_execution[json_dict_field-json_8mb]                            248.3191 (105.69)        5
---------------------------------------------------------------------------------------------------------

--------------------------------- benchmark 'live_action_save': 12 tests --------------------------------
Name (time in ms)                                                                  Mean            Rounds
---------------------------------------------------------------------------------------------------------
test_save_large_execution[old_json_dict_field-tiny_1]                            6.5457 (1.0)          25
test_save_large_execution[json_dict_field-tiny_1]                                6.7954 (1.04)        134  

test_save_large_execution[old_json_dict_field-json_61kb]                        11.6969 (1.79)         78
test_save_large_execution[json_dict_field-json_61kb]                            13.1218 (2.00)         75  

test_save_large_execution[old_json_dict_field-json_4mb_single_large_field]      30.7534 (4.70)         28  
test_save_large_execution[json_dict_field-json_4mb_single_large_field]          18.3983 (2.81)         54


test_save_large_execution[old_json_dict_field-json_647kb]                       73.0526 (11.16)        14
test_save_large_execution[json_dict_field-json_647kb]                           81.4653 (12.45)        13  

test_save_large_execution[old_json_dict_field-json_4mb]                        381.4303 (58.27)         5
test_save_large_execution[json_dict_field-json_4mb]                            385.8807 (58.95)         5  

test_save_large_execution[old_json_dict_field-json_8mb]                        737.6227 (112.69)        5
test_save_large_execution[json_dict_field-json_8mb]                            756.7961 (115.62)        5  
---------------------------------------------------------------------------------------------------------

-------------------------- benchmark 'live_action_save_multiple_fields': 10 tests -------------------------
Name (time in ms)                                                                    Mean            Rounds
-----------------------------------------------------------------------------------------------------------
test_save_multiple_fields[old_json_dict_field-tiny_1]                              6.6645 (1.0)         113
test_save_multiple_fields[json_dict_field-tiny_1]                                  6.7131 (1.01)        123  

test_save_multiple_fields[old_json_dict_field-json_61kb]                          22.1928 (3.33)         47
test_save_multiple_fields[json_dict_field-json_61kb]                              24.7535 (3.71)         41  

test_save_multiple_fields[old_json_dict_field-json_4mb_single_large_field]       119.9479 (18.00)        10    
test_save_multiple_fields[json_dict_field-json_4mb_single_large_field]            37.2357 (5.59)         25  

test_save_multiple_fields[old_json_dict_field-json_647kb]                        198.9707 (29.86)         5
test_save_multiple_fields[json_dict_field-json_647kb]                            228.3664 (34.27)         5  

test_save_multiple_fields[old_json_dict_field-json_4mb]                        1,144.1683 (171.68)        5
test_save_multiple_fields[json_dict_field-json_4mb]                            1,139.4643 (170.98)        5
-----------------------------------------------------------------------------------------------------------
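To put numbers on the summary, the relative change can be computed from the mean times reported in the tables above. This is a small sketch: the figures are copied from the tables, `relative_change` is a hypothetical helper, and it assumes `json_dict_field` is the new zstandard-backed field and `old_json_dict_field` the previous one.

```python
# Mean times in ms, copied from the benchmark tables above:
# (benchmark, fixture) -> (old_json_dict_field, json_dict_field)
MEANS_MS = {
    ("read", "json_4mb_single_large_field"): (17.2653, 12.3281),
    ("save", "json_4mb_single_large_field"): (30.7534, 18.3983),
    ("save_multiple_fields", "json_4mb_single_large_field"): (119.9479, 37.2357),
    ("read", "json_647kb"): (13.6826, 15.9034),
    ("save", "json_647kb"): (73.0526, 81.4653),
    ("save_multiple_fields", "json_647kb"): (198.9707, 228.3664),
    ("read", "json_8mb"): (246.7735, 248.3191),
    ("save", "json_8mb"): (737.6227, 756.7961),
}


def relative_change(old_ms: float, new_ms: float) -> float:
    """Percent change of the new field relative to the old one (negative = faster)."""
    return (new_ms - old_ms) / old_ms * 100.0


for (bench, fixture), (old_ms, new_ms) in MEANS_MS.items():
    change = relative_change(old_ms, new_ms)
    print(f"{bench:22s} {fixture:30s} {change:+6.1f}%")
```

On these figures the single-large-field fixture is roughly 29% to 69% faster with the new field, the 647 KB fixture is roughly 11% to 16% slower, and the 8 MB fixture differs by a few percent at most.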

@guzzijones
Contributor Author

This will require a data migration for the executionDB model, because the liveaction parameter is now a string rather than an embedded document.

@guzzijones guzzijones modified the milestone: 3.9.0 Jul 5, 2023
@guzzijones
Contributor Author

guzzijones commented Jul 6, 2023

data migration tested

$ ./st2-migrate-liveaction-executiondb  --config-file ../../../../conf/st2.dev.conf
StackStorm v3.9 database field data migration script

Will migrate objects with creation date between 2023-06-06 13:01:20 UTC and 2023-07-06 13:01:20 UTC.

You are strongly recommended to create database backup before proceeding.

Depending on the number of the objects in the database, migration may take multiple hours or more. You are recommended to start the script in a screen session, tmux or similar.

To proceed with the migration, press enter and to cancel it, press CTRL+C.


Migrating affected database objects between 2023-06-06 13:01:20 and 2023-07-06 13:01:20

Migrating execution objects
Will migrate 10 ActionExecutionDB objects

[1/10] Migrating ActionExecutionDB with id 64a6ba4fdce54d96db28f4ae
ActionExecutionDB with id 64a6ba4fdce54d96db28f4ae has been migrated
[2/10] Migrating ActionExecutionDB with id 64a6ba40dce54d96db28f4a2
ActionExecutionDB with id 64a6ba40dce54d96db28f4a2 has been migrated
[3/10] Migrating ActionExecutionDB with id 64a6ba3ddce54d96db28f49c
ActionExecutionDB with id 64a6ba3ddce54d96db28f49c has been migrated
[4/10] Migrating ActionExecutionDB with id 64a6ba4edce54d96db28f4ab
ActionExecutionDB with id 64a6ba4edce54d96db28f4ab has been migrated
[5/10] Migrating ActionExecutionDB with id 64a6ba50dce54d96db28f4b7
ActionExecutionDB with id 64a6ba50dce54d96db28f4b7 has been migrated
[6/10] Migrating ActionExecutionDB with id 64a6ba3fdce54d96db28f49f
ActionExecutionDB with id 64a6ba3fdce54d96db28f49f has been migrated
[7/10] Migrating ActionExecutionDB with id 64a6ba4fdce54d96db28f4b1
ActionExecutionDB with id 64a6ba4fdce54d96db28f4b1 has been migrated
[8/10] Migrating ActionExecutionDB with id 64a6ba4edce54d96db28f4a8
ActionExecutionDB with id 64a6ba4edce54d96db28f4a8 has been migrated
[9/10] Migrating ActionExecutionDB with id 64a6ba41dce54d96db28f4a5
ActionExecutionDB with id 64a6ba41dce54d96db28f4a5 has been migrated
[10/10] Migrating ActionExecutionDB with id 64a6ba50dce54d96db28f4b4
ActionExecutionDB with id 64a6ba50dce54d96db28f4b4 has been migrated
SUCCESS: All database objects migrated successfully (duration: 0 seconds).
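The per-object change the script applies can be pictured with a small in-memory sketch. This is not the migration script itself: `migrate_execution_doc` is a hypothetical helper operating on plain dicts, and the key names (`liveaction`, `liveaction_id`, `id`) are assumptions based on the field names mentioned in this thread.

```python
import copy


def migrate_execution_doc(doc: dict) -> dict:
    """Return a copy of `doc` with the embedded liveaction replaced by its id string."""
    migrated = copy.deepcopy(doc)
    liveaction = migrated.pop("liveaction", None)
    if isinstance(liveaction, dict):
        # Old shape: embedded document. Keep only its id, as a string.
        migrated["liveaction_id"] = str(liveaction["id"])
    elif liveaction is not None:
        # Anything else is unexpected; put it back untouched.
        migrated["liveaction"] = liveaction
    return migrated


# Example ids below are made up for illustration.
old_doc = {
    "id": "execution-1",
    "liveaction": {"id": "liveaction-1", "parameters": {"cmd": "date"}},
}
new_doc = migrate_execution_doc(old_doc)
print(new_doc)
```

Running the helper on an already-migrated document leaves it unchanged, so repeating the migration over the same date window would be harmless in this sketch.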

@guzzijones
Contributor Author

I added a config option to turn off zstandard compression for parameters and results in the execution db and the liveaction db
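One way such a switch could behave is sketched below. Everything here is an assumption for illustration: the option name `compression_enabled`, the `DatabaseConfig` class, and the one-byte header are hypothetical and not taken from this PR. The sketch falls back to `zlib` when the third-party `zstandard` package is missing, only so that it stays runnable; zlib is a stand-in.

```python
import json
import zlib
from dataclasses import dataclass

try:
    import zstandard  # third-party package; optional for this sketch
except ImportError:
    zstandard = None

HEADER_PLAIN = b"p"       # hypothetical marker: payload stored as-is
HEADER_COMPRESSED = b"z"  # hypothetical marker: payload is compressed


@dataclass
class DatabaseConfig:
    compression_enabled: bool = True  # hypothetical option name


def _compress(data: bytes) -> bytes:
    if zstandard is not None:
        return zstandard.ZstdCompressor().compress(data)
    return zlib.compress(data)  # stand-in only; this is NOT zstandard


def _decompress(data: bytes) -> bytes:
    if zstandard is not None:
        return zstandard.ZstdDecompressor().decompress(data)
    return zlib.decompress(data)


def serialize(value: dict, cfg: DatabaseConfig) -> bytes:
    raw = json.dumps(value).encode("utf-8")
    if cfg.compression_enabled:
        return HEADER_COMPRESSED + _compress(raw)
    return HEADER_PLAIN + raw


def deserialize(stored: bytes) -> dict:
    # Reads look at the header, not the current config, so records written
    # before the option was toggled remain readable.
    header, payload = stored[:1], stored[1:]
    if header == HEADER_COMPRESSED:
        payload = _decompress(payload)
    return json.loads(payload.decode("utf-8"))
```

Tagging each stored value with how it was written means the option can be flipped at any time without a second data migration.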

@guzzijones guzzijones changed the title from "WIP: zstandard compression for parameters and results" to "zstandard compression for parameters and results" Jul 6, 2023
@guzzijones
Contributor Author

  1. add a field for compression of parameters somewhere.

@guzzijones
Contributor Author

I changed to liveaction_id.

Member

@cognifloyd cognifloyd left a comment


For the migration, I wonder if there's a raw query we can use that delegates the migration to mongo. Something like {"$set": {"liveaction_id": doc.liveaction.id}}

@guzzijones
Contributor Author

Since the liveaction field of the ActionExecutionDB was required before, I made liveaction_id required as well. This should help ensure the liveaction actually exists.

@guzzijones
Contributor Author

Also, what is up with the CircleCI failures? Is there anything we can do to fix them?

Resolved review threads:
st2common/st2common/fields.py (outdated)
st2common/st2common/fields.py (outdated)
st2common/st2common/models/api/execution.py (outdated)
st2common/tests/unit/test_db_execution.py
@guzzijones
Contributor Author

For the migration, I wonder if there's a raw query we can use that delegates the migration to mongo. Something like {"$set": {"liveaction_id": doc.liveaction.id}}

I changed the migration to use a raw query. I couldn't get it to work in a single update statement, so I had to write a for loop.
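A loop of that kind might look roughly like the sketch below. It is not the code from this PR: `build_updates` is a hypothetical helper, the key names are assumptions, and dropping the embedded copy with `$unset` is a guess at the intended end state. The helper only builds `(filter, update)` pairs, so it runs without a database; with pymongo, each pair could be passed to `Collection.update_one(filter, update)`.

```python
from typing import Iterable, Iterator, Tuple


def build_updates(docs: Iterable[dict]) -> Iterator[Tuple[dict, dict]]:
    """Yield one (filter, update) pair per execution doc that still embeds a liveaction."""
    for doc in docs:
        liveaction = doc.get("liveaction")
        if not isinstance(liveaction, dict):
            continue  # already migrated, or nothing to migrate
        yield (
            {"_id": doc["_id"]},
            {
                "$set": {"liveaction_id": str(liveaction["id"])},
                # Assumption: the embedded copy is removed once the id is stored.
                "$unset": {"liveaction": ""},
            },
        )


# Made-up documents standing in for raw query results.
docs = [
    {"_id": 1, "liveaction": {"id": "la-1", "parameters": {}}},
    {"_id": 2, "liveaction_id": "la-2"},
]
updates = list(build_updates(docs))
for flt, upd in updates:
    print(flt, upd)
    # With a real pymongo collection (hypothetical name `collection`):
    # collection.update_one(flt, upd)
```

The loop is needed here because a plain `$set` takes a literal value, so each document's own liveaction id has to be read on the client before it can be written back.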

@guzzijones
Contributor Author

I rebased the changes. Now that we have completed the mongo and Python 3.9/3.10 migration work, can we look at this again?
