Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise Transformer PSC #262

Open
tiredpixel opened this issue Apr 30, 2024 · 0 comments
Open

Optimise Transformer PSC #262

tiredpixel opened this issue Apr 30, 2024 · 0 comments

Comments

@tiredpixel
Copy link
Contributor

tiredpixel commented Apr 30, 2024

Transformation is very slow. It takes around 2.03 s/statement (*):

I, [2024-04-30T12:57:31.688140 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393859288357183009020135870562338] /company/15694555/persons-with-significant-control/individual/7AYbJehiTTD91IcUhbXdBvumeaY
I, [2024-04-30T12:57:33.697593 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393859531351272751560599986503714] /company/12462557/persons-with-significant-control/individual/jjdgflZphXeM0pPxRkQkZs5kbHM
I, [2024-04-30T12:57:35.710237 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393860093501778872363166224875554] /company/02496500/persons-with-significant-control/individual/3QzT_0qRrYgG6E4rTdpVpMEWENQ
I, [2024-04-30T12:57:38.723755 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393860683457578844302203481489442] /company/NI696959/persons-with-significant-control/individual/qV-mSBMeZPPSXn2mGU_T_hcZjlA
I, [2024-04-30T12:57:40.731559 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393861147685093576319806568661026] /company/NI047824/persons-with-significant-control/individual/h0i1_kzZJGYCUHwlUbezCFHf9NI
I, [2024-04-30T12:57:43.743871 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393861385843480040401753985777698] /company/SC592338/persons-with-significant-control/individual/CPAtOXYPPh2bprJsArvAXScn288
I, [2024-04-30T12:57:45.757287 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393861905681582474692299109433378] /company/08491828/persons-with-significant-control/individual/2p9zF0twy2ehnKfDkJOWeWtCbBc
I, [2024-04-30T12:57:48.775943 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393862146257820578003573595439138] /company/15694559/persons-with-significant-control/individual/tKng5vVsd8XjFteN4dFDy0ej68M
I, [2024-04-30T12:57:50.793828 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393862406176871795148846157266978] /company/11325059/persons-with-significant-control/individual/VQyyTGV40yj-uBRvDEC4MGQhb_M
I, [2024-04-30T12:57:53.815453 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394004352201981846837448732049442] /company/12208333/persons-with-significant-control/individual/s_E1ELb-HXARDTizpSdoT88hz0g
I, [2024-04-30T12:57:55.836929 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394004890173971575347431476297762] /company/15694561/persons-with-significant-control/individual/AokX43_Re-u54us-uib6l800pSs
I, [2024-04-30T12:57:57.860308 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394005883910995298572613084774434] /company/15694565/persons-with-significant-control/individual/M51kbMDneeZlVlBLkKP6XjSf160
I, [2024-04-30T12:57:59.879818 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394007039644078850158172823355426] /company/15547479/persons-with-significant-control/individual/8Znvy7tV9M7VuyBRluVSs8Cbdis
I, [2024-04-30T12:58:00.744422 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394007555855403825604830422892578] /company/15512467/persons-with-significant-control/individual/q-O8qcELY7atL9GtbCzi4L2IhHQ
I, [2024-04-30T12:58:02.884764 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394007852042229631188978225905698] /company/15545523/persons-with-significant-control/individual/iXEklbEKlTUZswR1QqhBCYtcKYE
I, [2024-04-30T12:58:03.729477 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394008402103477555845252717215778] /company/15547510/persons-with-significant-control/individual/JoGuhDIatSMKhdRakaOH5P4Lmos
I, [2024-04-30T12:58:04.429314 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394009458704643899031151410413602] /company/15563627/persons-with-significant-control/individual/Yhb4Ev3VME8uaphRpnLByDgeknM
I, [2024-04-30T12:58:05.156642 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394010336384788939252000966574114] /company/15603805/persons-with-significant-control/individual/fTFm-H4_iJs4CVsEEoDEiLwVhFU
I, [2024-04-30T12:58:05.877892 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394010674884018431348169884303394] /company/15607311/persons-with-significant-control/individual/wvtX8efpM-Ehsj7vGcR5RwF-0IU
I, [2024-04-30T12:58:08.909962 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394147026000435306532951918903330] /company/14796746/persons-with-significant-control/individual/95cXqZ-14aTkGpC7jjU8MJ0xRJE
I, [2024-04-30T12:58:10.933029 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394147864994954119085599164989474] /company/15694572/persons-with-significant-control/individual/jT3EiIgt0BbS5p2Dq6G0ABOdeKI
I, [2024-04-30T12:58:12.954265 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394148082601601649718850612101154] /company/15694574/persons-with-significant-control/individual/zktlC6GFce-khAoOygiG-5d_K0I
I, [2024-04-30T12:58:14.976031 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394286094782094675404118692986914] /company/09568573/persons-with-significant-control/individual/Y5zBOJaAmDfJ35r9LUJCXB_L86k
I, [2024-04-30T12:58:16.993920 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394286435699175806729545960128546] /company/15135615/persons-with-significant-control/individual/6KP-hx2juhDTdhlodF7RM-jYLXk
I, [2024-04-30T12:58:19.011294 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394286950701574962561574384959522] /company/15694579/persons-with-significant-control/individual/JcnmYPWoI3AYo4MdpJM1PaQQXXs
D, [2024-04-30T12:58:22.441569 #1] DEBUG -- : [shardId-000000000000] LAG: 86400s | N: 0

(*): Note that these measurements are from a development environment on a small Elasticsearch instance, not a production Elasticsearch cluster.

I think this has most likely always been the case, although it's possible that things are currently running slower than they used to (I am not sure why, if so). There could be a mixture of reasons for this, including: design, extensive work within Ruby code, Elasticsearch roundtrip, Elasticsearch write-confirmations.

Given our current design and performance characteristics, this means we couldn't cope with even 1 event/s from upstream PSC datasource, and that if the transformation stream falls behind, catching up could take an exceedingly long time.

It's worth spending some time trying to see if there are any simple performance improvements which could be made, here. Anything more complex would likely be too risky or time-consuming to change, but if some small improvements could be found here or there (or perhaps some regression discovered and fixed), even a 0.5 s/event increase in speed would compound to our benefit.

Although this prioritises optimisation of the PSC data transformation pipeline only, any benefits would likely also be applicable to SK and DK pipelines.

This was already known about from bulk data imports, but work on #251 made this more obvious and easier to measure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant