Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.foldl over (not so) large list is causing stack overflow #169

Closed
DocX opened this issue Jun 18, 2024 · 9 comments
Closed

std.foldl over (not so) large list is causing stack overflow #169

DocX opened this issue Jun 18, 2024 · 9 comments

Comments

@DocX
Copy link

DocX commented Jun 18, 2024

What happens
std.foldl seems to be implemented internally with recursion, or otherwise causing stack overflow when iteration over large array.

Additionally, memory usage grows immensely with more foldl count

What is expected
std.foldl should not cause stack overflow even with infinite arrays

Verison

jrsonnet --version
jrsonnet 0.5.0-pre96

Context
We use "large" foldl to apply list of "mutations" onto list of kubernetes manifests.

The example below is simplified version of our use case. We run into stack overflow issues with ~50 mutations, but our code is more complex. So to reproduce here, the number is higher than our practical use case, but not that big still.

Or code works ok with go-jsonnet, but reaches stack overflow in jrsonnet. If we bump os-stack we get it work but memory consumption of our use case gets to >30GB.

How to reproduce

local createManifest = function(number) {
  kind: 'Deployment',
  metadata: {
    name: 'test-deployment-' + number,
  },
  spec: {
    template: {
      nodeSelectors: {
        'node-type': 'node',
        'other-selector': 'foo',
      },
    },
  },
};

local mutation1 = function(manifest) (
  if manifest.kind == 'Deployment' then
    manifest {
      metadata+: {
        labels+: {
          test: 'foo',
        },
      },
    }
  else
    manifest
);

local mutation2 = function(manifest) (
  local removeSelectors = [
    'beta.kubernetes.io/arch',
    'kubernetes.io/arch',
    'node.kubernetes.io/role',
    'node-type',
  ];

  if manifest.kind == 'Deployment' then
    manifest {
      spec+: {
        template+: {
          nodeSelectors: {
            [field]: manifest.spec.template.nodeSelectors[field]
            for field in std.objectFields(manifest.spec.template.nodeSelectors)
            if std.member(removeSelectors, field)
          },
        },
      },
    }
  else
    manifest
);

local mutations = std.flatMap(
  function(index) [
    mutation1,
    mutation2,
  ],
  std.range(0, std.parseInt(std.extVar('fold-count')) - 1)
);

local manifests = std.map(
  createManifest,
  std.range(0, std.parseInt(std.extVar('map-count')) - 1)
);

std.map(
  function(manifest) std.foldl(function(result_manifest, mutation) mutation(result_manifest), mutations, manifest),
  manifests
)

Run:

for FOLD_COUNT in 1 10 100 500; do
  echo ""
  echo "==== FOLD_COUNT: $FOLD_COUNT ===="
  /usr/bin/time -p -h -l jrsonnet results/jrsonnet-foldl-bug.jsonnet --ext-str fold-count=$FOLD_COUNT --ext-str map-count=100 > /dev/null
done

Result:

==== FOLD_COUNT: 1 ====
real 0.01
user 0.00
sys 0.00
             4055040  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1198  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  13  involuntary context switches
            40967561  instructions retired
            32092929  cycles elapsed
             2818048  peak memory footprint

==== FOLD_COUNT: 10 ====
real 0.05
user 0.03
sys 0.01
            27561984  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                6937  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                 102  involuntary context switches
           251001710  instructions retired
           161304203  cycles elapsed
            26324992  peak memory footprint

==== FOLD_COUNT: 100 ====
real 1.78
user 1.43
sys 0.34
          1413394432  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              345276  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                 291  involuntary context switches
          9893883867  instructions retired
          5985518894  cycles elapsed
          1412157440  peak memory footprint

==== FOLD_COUNT: 500 ====
stack overflow, try to reduce recursion, or set --max-stack to bigger value
    results/jrsonnet-foldl-bug.jsonnet:43:26-81: function <builtin_object_fields> call
    results/jrsonnet-foldl-bug.jsonnet:43:66-80: field <nodeSelectors> access
    argument <o> evaluation

    ... repeated many times ...

    results/jrsonnet-foldl-bug.jsonnet:43:26-81: function <builtin_object_fields> call
    results/jrsonnet-foldl-bug.jsonnet:43:66-80: field <nodeSelectors> access
    argument <o> evaluation
    results/jrsonnet-foldl-bug.jsonnet:43:26-81: function <builtin_object_fields> call
    field <nodeSelectors> manifestification
    field <template> manifestification
    field <spec> manifestification
    elem <0> manifestification
real 0.37
user 0.29
sys 0.07
           261378048  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               64020  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                  48  voluntary context switches
                 150  involuntary context switches
          1843870396  instructions retired
          1171073480  cycles elapsed
           260128768  peak memory footprint

with --max-stack 4096

==== FOLD_COUNT: 1 ====
real 0.01
user 0.00
sys 0.00
             2121728  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                 726  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  13  involuntary context switches
            17444472  instructions retired
            16588490  cycles elapsed
              884736  peak memory footprint

==== FOLD_COUNT: 10 ====
real 0.01
user 0.00
sys 0.00
             4493312  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1305  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  34  involuntary context switches
            38361929  instructions retired
            34536039  cycles elapsed
             3256320  peak memory footprint

==== FOLD_COUNT: 100 ====
real 0.21
user 0.16
sys 0.04
           143482880  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               35236  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                 250  involuntary context switches
          1001957115  instructions retired
           664777830  cycles elapsed
           142245888  peak memory footprint

==== FOLD_COUNT: 200 ====
real 0.73
user 0.57
sys 0.14
           542973952  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              132770  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                 224  involuntary context switches
          3673679293  instructions retired
          2352762219  cycles elapsed
           541736960  peak memory footprint

==== FOLD_COUNT: 300 ====
real 1.61
user 1.25
sys 0.32
          1201475584  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              293537  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                 494  involuntary context switches
          8038891961  instructions retired
          5283522033  cycles elapsed
          1200238592  peak memory footprint

==== FOLD_COUNT: 400 ====
real 2.90
user 2.27
sys 0.60
          2116505600  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              516933  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                1343  involuntary context switches
         14061168814  instructions retired
          9470288606  cycles elapsed
          2115268608  peak memory footprint

==== FOLD_COUNT: 500 ====
real 5.26
user 4.09
sys 1.08
          3291742208  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              803856  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                6634  involuntary context switches
         21968229954  instructions retired
         16527925347  cycles elapsed
          3290505216  peak memory footprint

For comparison here is go-jsonnet result:

==== FOLD_COUNT: 1 ====
real 0.01
user 0.00
sys 0.00
             5984256  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1656  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   1  signals received
                   0  voluntary context switches
                 105  involuntary context switches
            26255260  instructions retired
            27298520  cycles elapsed
             3678208  peak memory footprint

==== FOLD_COUNT: 10 ====
real 0.02
user 0.01
sys 0.00
            11087872  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                2903  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                  10  signals received
                   5  voluntary context switches
                 174  involuntary context switches
            92086600  instructions retired
            69245289  cycles elapsed
             7053312  peak memory footprint

==== FOLD_COUNT: 100 ====
real 0.49
user 0.66
sys 0.05
           140193792  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               34462  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                  83  signals received
                  16  voluntary context switches
                 797  involuntary context switches
          4181923667  instructions retired
          2352113122  cycles elapsed
           136097792  peak memory footprint

==== FOLD_COUNT: 200 ====
real 2.39
user 3.08
sys 0.19
           501780480  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              122925  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                 286  signals received
                  62  voluntary context switches
                2090  involuntary context switches
         18943427191  instructions retired
         11051043727  cycles elapsed
           497668096  peak memory footprint

==== FOLD_COUNT: 300 ====
real 7.48
user 8.97
sys 0.52
          1121615872  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              274565  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                 609  signals received
                 113  voluntary context switches
                9582  involuntary context switches
         49595121172  instructions retired
         31446608435  cycles elapsed
          1117507584  peak memory footprint

==== FOLD_COUNT: 400 ====
real 19.14
user 19.98
sys 1.39
          2028417024  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              496646  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                1376  signals received
                 194  voluntary context switches
               70547  involuntary context switches
         96566460731  instructions retired
         69863591500  cycles elapsed
          2024267776  peak memory footprint

==== FOLD_COUNT: 500 ====
real 29.76
user 32.34
sys 1.53
          3204464640  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              784387  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                1898  signals received
                 200  voluntary context switches
               34467  involuntary context switches
        168218433001  instructions retired
        113452304861  cycles elapsed
          3200331776  peak memory footprint
@DocX DocX changed the title std.foldl over large list is causing stack overflow std.foldl over (not so) large list is causing stack overflow Jun 18, 2024
@CertainLach
Copy link
Owner

Which version of jrsonnet do you use?

@DocX
Copy link
Author

DocX commented Jun 18, 2024

jrsonnet --version
jrsonnet 0.5.0-pre96

@CertainLach
Copy link
Owner

  1. std.foldl doesn't use recursion
    Your code, however, does, due to lazy evaluation
  2. os-stack does nothing here, only max-stack will help
    max-stack specifies how much recursion is allowed, os-stack - what OS stack to allocate for execution (in megabytes). For me, only max-stack is required here, and the code works with the default os stack size. It is ok that go-jsonnet and jrsonnet require a different max-stack value, as they use different metering techniques.

Nonetheless, I'll look what can be done here.

@CertainLach
Copy link
Owner

CertainLach commented Jun 18, 2024

Making this call tailstrict
for field in std.objectFields(manifest.spec.template.nodeSelectors) tailstrict
greately reduces stack size usage, due to field nodeSelectors being calculated prior to getting it fields, instead of going 500 overlays deep due to lazy-evaluated field get, however level of stacking object overlays makes this thing still highly inefficient

I wonder if there should be forced (i.e not-lazy) object comprehension syntax for patterns like this

@DocX
Copy link
Author

DocX commented Jun 19, 2024

Thank you for looking into it.

I tried to extract the core issue from our actual code. But maybe it is not actually showing the real issue. We have quite complex jsonnet that works in go-jsonnet but was getting stack overflow in jrsonnet. I was trying to pinpoint which jsonnet code is doing that, but concluded it was just something around how many of those "mutations" we have in the list.

os-stack does nothing here, only max-stack will help

For me we had to actually bump os-stack on jrsonnet to make our code work. So maybe the issue is somewhere else. I was not really able to understand where is it happening since it only gives:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow

Is there a way to debug this better?

@CertainLach
Copy link
Owner

CertainLach commented Jun 19, 2024

Oh, so it happens in the real code, not this one.
Yes, only os-stack will help here then.
Unfortunately, Rust/C++ are unable to recover from stack overflow, only golang is invulnerable to this due to dynamic stack size. Sometimes, when increasing --max-stack, you also need to increase --os-stack to make it fit. If you get fatal runtime error: stack overflow - then it's the time to increase os-stack

I am planning on finishing #121, which will solve many stack size issues.

EDIT: Turns out, there is a way to grow stack in rust safely: https://docs.rs/stacker/latest/stacker/
I'll try to add it in couple of places, it would be useful even with proper TCO, because my TCO implementation will have escape hatches for easier builtins implementation

@DocX
Copy link
Author

DocX commented Jun 19, 2024

Great. We will evaluate once next version is released. Thank you

@CertainLach
Copy link
Owner

I have implemented dynamic stack growth on master, so you should not experience need to use --os-stack anymore.
However, max-stack is still there, and its limit is not changed. I will not do anything with it, because its current limit is reasonable, and its usage is just different from go-jsonnet.

@CertainLach
Copy link
Owner

While this is not a permanent solution, this issue is resolved for now, with better solution in development.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

2 participants