XDR processing using BytesIO class #646
Comments
Hi @ReidMcLaughlinSecurrency, thanks for your feedback; I will improve it later.
What would be required to make this change? Would it break the API of the SDK, or only affect internals?
Hi @ReidMcLaughlinSecurrency, could you please share a demo so I can understand how you achieve such significant performance improvements?
If implemented correctly, none of these changes would affect the rest of the code base. Keeping the signatures the same while changing the underlying implementation does not appear to pose any issues, based on our assessment of this option and how it would affect the rest of the SDK.
I conducted a simple test on my MacBook Air (M1, 2020, 16GB RAM) using version 8.1.1 of the SDK. It takes about 0.18 ms to process a single row (of course, this may vary depending on the data being used; I chose a more complex row). Processing 50,000,000 rows would take approximately 2.5 hours. However, if users are on a version < 7.0.3, additional checks may slow down processing significantly; we already fixed this issue in version 7.0.3. Could you please provide your test data, host machine info, and the SDK version used during testing? Simple test code:
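A timing harness of this kind might look roughly like the following. This is an illustrative sketch (not the exact script referenced above), assuming the stellar_sdk 8.x API, with a locally built envelope standing in for a real BigQuery row:

```python
import timeit

from stellar_sdk import Account, Keypair, Network, TransactionBuilder, TransactionEnvelope

# Build a small envelope locally so the benchmark is self-contained;
# a real test should decode representative XDR rows exported from BigQuery.
kp = Keypair.random()
source = Account(kp.public_key, sequence=1)
tx = (
    TransactionBuilder(
        source_account=source,
        network_passphrase=Network.PUBLIC_NETWORK_PASSPHRASE,
        base_fee=100,
    )
    .append_manage_data_op(data_name="k", data_value=b"v")
    .set_timeout(30)
    .build()
)
tx.sign(kp)
xdr = tx.to_xdr()

runs = 10_000
total = timeit.timeit(
    lambda: TransactionEnvelope.from_xdr(xdr, Network.PUBLIC_NETWORK_PASSPHRASE),
    number=runs,
)
print(f"avg decode time per row: {total / runs * 1000:.4f} ms")
```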
Thank you.
Whenever we do performance testing, it is critical that we choose the right data to test with. Your test uses a "more complex" row, but I would argue the selected item is actually less complex than most of the data we process. It isn't a super simple sample, but the operations in this test data are mostly buy/sell offer operations. If you ran some tests with samples containing other, more complex operations, you would likely see the performance numbers I stated in the ticket.

So, let's discuss how tests are run to get an idea of real-world performance. We would parse the Stellar data for a day, maybe a week tops, statistically determine which operations were performed, and then try to find a representative example data sample. This is of course just a swag, but we need something that really tells us how fast or slow things are. We would run our tests on that sample, then run some tests on more complex transactions and a few on less complex ones to ensure that our choices are "good enough" to make a generalized statement about performance. After doing that, we would implement the operations used (not optimized, of course, but simply switching how things are decoded) and compare against the results and samples we took earlier.

The sample you provided is in fact one of the simplest in terms of converting XDR data; again, most of it is buy/sell offers, and those are relatively simpler data structures than other operations. Try your sampling against a few more items, such as the data for transaction hash = 36cdf221cd46dbf3c9e0675abff53b9d93647176d8bfeded9e202e222f82da36. This is not the most complex item to test, but I believe it will show more of the performance issues. Again, we cannot have a discussion about performance until we are testing the same things and testing something that is indicative of real-world data.
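One possible way to pull the XDR for that transaction into the same timing harness is sketched below (untested, against the public Horizon instance; `envelope_xdr` and `result_meta_xdr` are the standard Horizon response fields):

```python
from stellar_sdk import Server

# Fetch the XDR blobs for the transaction hash mentioned above from the
# public Horizon instance, so they can be fed into the same timing harness.
TX_HASH = "36cdf221cd46dbf3c9e0675abff53b9d93647176d8bfeded9e202e222f82da36"

server = Server("https://horizon.stellar.org")
record = server.transactions().transaction(TX_HASH).call()

envelope_xdr = record["envelope_xdr"]
result_meta_xdr = record["result_meta_xdr"]
print(len(envelope_xdr), len(result_meta_xdr))
```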
It looks like the xdrlib module is deprecated and is going to be removed soon (https://docs.python.org/3/library/xdrlib.html). Any update on how this might be handled moving forward?
Yes, please refer to #739.
Sorry, I missed this message. I tested the data you provided on my device (MacBook Air, M1, 2020, 16GB RAM), and it takes about 1 ms, which is far from the 100 ms mentioned in the issue.
Sorry, I hadn't seen the info posted here. Care to share your benchmark information? I find it hard to believe things take 1 ms when 18 ms was reported for a smaller item to decode than the one I mentioned.
It's fine; even 1 ms is high for simple decoding of data. Going a more performant route gets things down into the low microseconds, which is in line with estimates of how its performance should be, based on the message formats, lengths, etc.
Is your feature request related to a problem? Please describe.
As a team we are processing large sets of Stellar operations by decoding the XDR packages available from the public dataset that Stellar has made available via BigQuery. Using the current XDR unpacking routines, we are seeing long processing times for large datasets. Based on our tests, the current SDK takes 100 ms to perform extraction. We have tested with 50,000,000 rows (one year of data for one of the most-used tokens on Stellar) from the BigQuery table for our extractions. The current SDK decoding would take over 2.5 hours.
Describe the solution you'd like
We suggest that the SDK be updated to use the Python BytesIO class for XDR decoding. This would operate on the data entirely in memory and dramatically increase the throughput of the XDR processing task. It would also bring better compatibility with future Python versions.
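As a rough illustration of the idea (a hand-rolled sketch, not the SDK's actual internals or our prototype), an XDR reader built on BytesIO can look like this:

```python
import io
import struct


class BytesIOUnpacker:
    """Minimal XDR reader over an in-memory buffer (illustrative only)."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def unpack_uint(self) -> int:
        # XDR encodes unsigned ints as 4 bytes, big-endian.
        return struct.unpack(">I", self._buf.read(4))[0]

    def unpack_hyper(self) -> int:
        # Signed 64-bit integer (e.g. amounts, sequence numbers).
        return struct.unpack(">q", self._buf.read(8))[0]

    def unpack_opaque(self) -> bytes:
        # Variable-length opaque: 4-byte length, data, then padding to a 4-byte boundary.
        length = self.unpack_uint()
        data = self._buf.read(length)
        self._buf.read((4 - length % 4) % 4)  # discard padding
        return data


# Quick self-check: round-trip a length-prefixed payload.
payload = struct.pack(">I", 5) + b"hello" + b"\x00" * 3
reader = BytesIOUnpacker(payload)
assert reader.unpack_opaque() == b"hello"
```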
Additional context
With some prototype code we have tested, the processing time was reduced to around 15 minutes for the same dataset.