Transformation is very slow. It takes around 2.03 s/statement (*):
I, [2024-04-30T12:57:31.688140 #1] INFO -- : [shardId-000000000002] [49651600875111508319672393859288357183009020135870562338] /company/15694555/persons-with-significant-control/individual/7AYbJehiTTD91IcUhbXdBvumeaY
I, [2024-04-30T12:57:33.697593 #1] INFO -- : [shardId-000000000002] [49651600875111508319672393859531351272751560599986503714] /company/12462557/persons-with-significant-control/individual/jjdgflZphXeM0pPxRkQkZs5kbHM
I, [2024-04-30T12:57:35.710237 #1] INFO -- : [shardId-000000000002] [49651600875111508319672393860093501778872363166224875554] /company/02496500/persons-with-significant-control/individual/3QzT_0qRrYgG6E4rTdpVpMEWENQ
I, [2024-04-30T12:57:38.723755 #1] INFO -- : [shardId-000000000002] [49651600875111508319672393860683457578844302203481489442] /company/NI696959/persons-with-significant-control/individual/qV-mSBMeZPPSXn2mGU_T_hcZjlA
I, [2024-04-30T12:57:40.731559 #1] INFO -- : [shardId-000000000002] [49651600875111508319672393861147685093576319806568661026] /company/NI047824/persons-with-significant-control/individual/h0i1_kzZJGYCUHwlUbezCFHf9NI
I, [2024-04-30T12:57:43.743871 #1] INFO -- : [shardId-000000000002] [49651600875111508319672393861385843480040401753985777698] /company/SC592338/persons-with-significant-control/individual/CPAtOXYPPh2bprJsArvAXScn288
I, [2024-04-30T12:57:45.757287 #1] INFO -- : [shardId-000000000002] [49651600875111508319672393861905681582474692299109433378] /company/08491828/persons-with-significant-control/individual/2p9zF0twy2ehnKfDkJOWeWtCbBc
I, [2024-04-30T12:57:48.775943 #1] INFO -- : [shardId-000000000002] [49651600875111508319672393862146257820578003573595439138] /company/15694559/persons-with-significant-control/individual/tKng5vVsd8XjFteN4dFDy0ej68M
I, [2024-04-30T12:57:50.793828 #1] INFO -- : [shardId-000000000002] [49651600875111508319672393862406176871795148846157266978] /company/11325059/persons-with-significant-control/individual/VQyyTGV40yj-uBRvDEC4MGQhb_M
I, [2024-04-30T12:57:53.815453 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394004352201981846837448732049442] /company/12208333/persons-with-significant-control/individual/s_E1ELb-HXARDTizpSdoT88hz0g
I, [2024-04-30T12:57:55.836929 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394004890173971575347431476297762] /company/15694561/persons-with-significant-control/individual/AokX43_Re-u54us-uib6l800pSs
I, [2024-04-30T12:57:57.860308 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394005883910995298572613084774434] /company/15694565/persons-with-significant-control/individual/M51kbMDneeZlVlBLkKP6XjSf160
I, [2024-04-30T12:57:59.879818 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394007039644078850158172823355426] /company/15547479/persons-with-significant-control/individual/8Znvy7tV9M7VuyBRluVSs8Cbdis
I, [2024-04-30T12:58:00.744422 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394007555855403825604830422892578] /company/15512467/persons-with-significant-control/individual/q-O8qcELY7atL9GtbCzi4L2IhHQ
I, [2024-04-30T12:58:02.884764 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394007852042229631188978225905698] /company/15545523/persons-with-significant-control/individual/iXEklbEKlTUZswR1QqhBCYtcKYE
I, [2024-04-30T12:58:03.729477 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394008402103477555845252717215778] /company/15547510/persons-with-significant-control/individual/JoGuhDIatSMKhdRakaOH5P4Lmos
I, [2024-04-30T12:58:04.429314 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394009458704643899031151410413602] /company/15563627/persons-with-significant-control/individual/Yhb4Ev3VME8uaphRpnLByDgeknM
I, [2024-04-30T12:58:05.156642 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394010336384788939252000966574114] /company/15603805/persons-with-significant-control/individual/fTFm-H4_iJs4CVsEEoDEiLwVhFU
I, [2024-04-30T12:58:05.877892 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394010674884018431348169884303394] /company/15607311/persons-with-significant-control/individual/wvtX8efpM-Ehsj7vGcR5RwF-0IU
I, [2024-04-30T12:58:08.909962 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394147026000435306532951918903330] /company/14796746/persons-with-significant-control/individual/95cXqZ-14aTkGpC7jjU8MJ0xRJE
I, [2024-04-30T12:58:10.933029 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394147864994954119085599164989474] /company/15694572/persons-with-significant-control/individual/jT3EiIgt0BbS5p2Dq6G0ABOdeKI
I, [2024-04-30T12:58:12.954265 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394148082601601649718850612101154] /company/15694574/persons-with-significant-control/individual/zktlC6GFce-khAoOygiG-5d_K0I
I, [2024-04-30T12:58:14.976031 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394286094782094675404118692986914] /company/09568573/persons-with-significant-control/individual/Y5zBOJaAmDfJ35r9LUJCXB_L86k
I, [2024-04-30T12:58:16.993920 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394286435699175806729545960128546] /company/15135615/persons-with-significant-control/individual/6KP-hx2juhDTdhlodF7RM-jYLXk
I, [2024-04-30T12:58:19.011294 #1] INFO -- : [shardId-000000000002] [49651600875111508319672394286950701574962561574384959522] /company/15694579/persons-with-significant-control/individual/JcnmYPWoI3AYo4MdpJM1PaQQXXs
D, [2024-04-30T12:58:22.441569 #1] DEBUG -- : [shardId-000000000000] LAG: 86400s | N: 0
(*): Note that these measurements are from a development environment on a small Elasticsearch instance, not a production Elasticsearch cluster.
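For reference, the per-statement figure can be reproduced from timestamps like the ones above with a throwaway script along these lines (not part of the codebase; it just assumes the default Ruby Logger prefix format shown in the log):

```ruby
# Rough sketch: average the gaps between consecutive log timestamps.
# Assumes the default Ruby Logger prefix, e.g. "I, [2024-04-30T12:57:31.688140 #1] ...".
require 'time'

timestamps = ARGF.each_line.filter_map do |line|
  match = line.match(/\[(\d{4}-\d{2}-\d{2}T[\d:.]+)/)
  Time.parse(match[1]) if match
end

intervals = timestamps.each_cons(2).map { |earlier, later| later - earlier }
puts format('%.2f s/statement over %d intervals', intervals.sum / intervals.size, intervals.size)
```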
I think this has most likely always been the case, although it's possible that things are currently running slower than they used to (if so, I'm not sure why). A mixture of factors could contribute, including: the overall design, the amount of work done in Ruby per statement, the Elasticsearch round trip, and Elasticsearch write confirmations.
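One way to narrow this down would be to time the Ruby transformation and the Elasticsearch write separately, for example with a monotonic clock around each stage. A minimal sketch, assuming hypothetical `transform` and `index_record` stage names (the real pipeline's methods will differ):

```ruby
require 'logger'

LOGGER = Logger.new($stdout)

# Time a block with a monotonic clock and log the elapsed time.
def timed(label)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  LOGGER.info(format('%s took %.3fs', label, elapsed))
  result
end

records.each do |record|
  # `records`, `transform` and `index_record` are placeholders for the real stages.
  statement = timed('ruby transform')        { transform(record) }
  timed('elasticsearch write + confirm')     { index_record(statement) }
end
```

Even a few minutes of output from something like this should show whether the time is dominated by Ruby work or by the Elasticsearch round trip.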
Given our current design and performance characteristics, this means we couldn't cope with even 1 event/s from the upstream PSC data source (at ~2 s/statement we can only process roughly 0.5 statements/s), and that if the transformation stream falls behind, catching up could take an exceedingly long time.
It's worth spending some time looking for simple performance improvements here. Anything more complex would likely be too risky or time-consuming to change, but if some small improvements can be found here and there (or perhaps a regression discovered and fixed), even shaving 0.5 s/event off the processing time would compound to our benefit.
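As one concrete (hypothetical) example: if per-document writes with write confirmation turn out to dominate, buffering statements and flushing them via the Elasticsearch bulk API would amortise the round trip across many documents. This is a sketch of the idea only, not a description of the current code; the index name and the `statements` collection are assumptions:

```ruby
require 'elasticsearch'

client = Elasticsearch::Client.new(
  url: ENV.fetch('ELASTICSEARCH_URL', 'http://localhost:9200')
)

# Hypothetical: index in batches instead of one request per statement.
statements.each_slice(500) do |batch|
  body = batch.flat_map do |statement|
    [{ index: { _index: 'statements', _id: statement.fetch('id') } }, statement]
  end
  client.bulk(body: body) # leaves refresh at its default rather than waiting per write
end
```

Whether this is applicable depends on how the stream consumer checkpoints, so it's only worth pursuing if the measurements point at the write path.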
Although this prioritises optimisation of the PSC data transformation pipeline only, any benefits would likely also apply to the SK and DK pipelines.
This was already known about from bulk data imports, but work on #251 made this more obvious and easier to measure.