Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make loadAllElements fall back to an iterative implementation once a certain depth is reached #103

Merged
merged 1 commit into from
Sep 9, 2024

Conversation

popematt
Copy link
Contributor

@popematt popematt commented Aug 27, 2024

Issue #, if available:

None

Description of changes:

This ensures that we cannot get a StackOverflowException by loading data that has very deeply nested containers.

I started out with a purely iterative approach, but I discovered that it introduced a performance regression for some of my sample data, so instead, I made it use the existing recursive approach with a fallback to the iterative approach once the IonReader reaches a certain depth of nested containers. (In this case, I've selected 100 as an arbitrary-ish value that ought to be good enough to use recursion for most use cases.)

I also noticed that some of my test data had a slight performance improvement using the iterative approach, so I think that we should add an iterative vs recursive flag to the IonElementLoaderOptions class. However, it's not straightforward to do so because it has the potential to be a breaking change if done incorrectly, so I want to defer that for a future PR.

Performance

  • "IonValue" is a comparison with the latest release of ion-java.
  • "Baseline" is the current tip of master.
  • "MAX_DEPTH=100" is the implementation in this PR.
  • "MAX_DEPTH=0" is the implementation in this PR if the MAX_RECURSION_DEPTH constant is set to 0 (i.e. immediately fall back to the iterative implementation).

There is a slight performance regression, but I believe that it is acceptable in order to prevent StackOverflowErrors from occurring. A purely iterative approach is faster for some data, which is why I plan to make it configurable.

Composition of some arbitrary data (~5kb)

Condition Time (ms/op) gc.alloc.rate.norm (B/op)
IonValue 0.053 ± 0.001 81128.034 ± 0.004
Baseline 0.023 ± 0.001 54856.014 ± 0.002
MAX_DEPTH=100 0.025 ± 0.001 54832.016 ± 0.002
MAX_DEPTH=0 0.027 ± 0.001 55912.018 ± 0.002

Sample binary Ion application logs (~20MB)

Condition Time (ms/op) gc.alloc.rate.norm (B/op)
IonValue 635.450 ± 75.384 625408257.218 ± 49.211
Baseline 286.219 ± 28.832 449205604.132 ± 17.491
MAX_DEPTH=100 295.489 ± 32.007 449205608.339 ± 23.524
MAX_DEPTH=0 310.604 ± 22.694 454411169.864 ± 13.729

Sample product catalog data (~40kb)

Condition Time (ms/op) gc.alloc.rate.norm (B/op)
IonValue 0.616 ± 0.093 696584.433 ± 0.297
Baseline 0.310 ± 0.006 632136.199 ± 0.023
MAX_DEPTH=100 0.315 ± 0.004 632080.202 ± 0.026
MAX_DEPTH=0 0.284 ± 0.003 632968.182 ± 0.022

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link

codecov bot commented Aug 27, 2024

Codecov Report

Attention: Patch coverage is 43.75000% with 45 lines in your changes missing coverage. Please review.

Please upload report for BASE (master@b635d5c). Learn more about missing BASE report.

Files Patch % Lines
...com/amazon/ionelement/impl/IonElementLoaderImpl.kt 43.03% 39 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master     #103   +/-   ##
=========================================
  Coverage          ?   86.69%           
  Complexity        ?      531           
=========================================
  Files             ?       34           
  Lines             ?     1135           
  Branches          ?      161           
=========================================
  Hits              ?      984           
  Misses            ?       99           
  Partials          ?       52           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@popematt popematt force-pushed the recursive-iterative-combo branch from 9b0edb7 to 14785d3 Compare August 27, 2024 22:12
@popematt popematt marked this pull request as ready for review August 27, 2024 22:20
Comment on lines +45 to +46
*
* This will fail for IonDatagram if the IonDatagram does not contain exactly one user value.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This was a non-obvious edge case that I ran into while testing my changes, so I figured I'd add a note here.

@@ -31,7 +31,7 @@ internal class StructElementImpl(
) : AnyElementBase(), StructElement {

override val type: ElementType get() = ElementType.STRUCT
override val size = allFields.size
override val size: Int get() = allFields.size
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Both ListElementImpl and SexpElementImpl (by way of inheritance from SeqElementImpl) delegate to the underlying collection rather than storing the size in a field. It's necessary to delegate to the underlying collection so that we don't read the size from the collection before it is fully populated by the element loader.

@@ -45,6 +50,8 @@ class IonElementLoaderTests {
// Converting from IonValue to IonElement should result in an IonElement that is equivalent to the
// parsed IonElement
assertEquals(parsedIonValue.toIonElement(), parsedIonElement)

assertEquals(tc.expectedElement, parsedIonElement)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ I was surprised to discover that there was previously no assertion that the parsed element was equal to the expected element. 😮

@@ -57,15 +64,29 @@ class IonElementLoaderTests {

IonElementLoaderTestCase("1", ionInt(1)),

IonElementLoaderTestCase("existence::42", ionInt(1).withAnnotations("existence")),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ A couple of test cases were wrong, but otherwise these changes are just adding test cases for every Ion type.

@@ -128,26 +142,41 @@ internal class IonElementLoaderImpl(private val options: IonElementLoaderOptions
IonType.BLOB -> BlobElementImpl(ionReader.newBytes(), annotations, metas)
IonType.LIST -> {
ionReader.stepIn()
val list = ListElementImpl(loadAllElements(ionReader).toImmutableListUnsafe(), annotations, metas)
val listContent = ArrayList<AnyElement>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would improve performance to pool these using something like IonJava's RecyclingStack. We could get away with having one per depth level right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm intentionally not using a recycling stack because on L154, we wrap it in an immutable list, and then forget about the underlying mutable reference.

ListElementImpl(listContent.toImmutableListUnsafe(), annotations, metas)

However, it would be worth investigating whether there is a performance benefit to having a pool of ArrayLists that have a relatively larger capacity, and then copying to an exact-sized immutable list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants