Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove recursive interleave calls #59

Open
rlouf opened this issue Aug 10, 2022 · 13 comments
Open

Remove recursive interleave calls #59

rlouf opened this issue Aug 10, 2022 · 13 comments
Labels
bug Something isn't working enhancement New feature or request help wanted Extra attention is needed important performance stream processing Operations related to (goal) stream processing

Comments

@rlouf
Copy link
Contributor

rlouf commented Aug 10, 2022

The following code fails with a RecursionError:

from kanren import eq, lall, lany, run, vars

observations = [(0, 1), (1, 0)]

def generate(pixels):
    pairs = [(a, b) for a, b in zip(pixels[:-1], pixels[1:])]
    return lall(
        *[lany(*[eq(obs, pair) for obs in observations]) for pair in pairs]
    )

N = 1000
pixels = vars(N)
res = run(1, pixels, generate(pixels))

The error is raised in the interleave function from toolz.

@rlouf
Copy link
Contributor Author

rlouf commented Aug 10, 2022

It appears that toolz does not use trampoline evaluation, and has no intention to do so. We may need to implement a version of interleave that uses it to avoid this RecursionError.

@brandonwillard
Copy link
Member

It appears that toolz does not use trampoline evaluation, and has no intention to do so. We may need to implement a version of interleave that uses it to avoid this RecursionError.

Yes, that's the same type of thing I did in etuples and unification to get past recursion limits/errors.

@brandonwillard
Copy link
Member

The error is raised in the interleave function from toolz.

I'm not able to reproduce the error locally.

@rlouf
Copy link
Contributor Author

rlouf commented Aug 11, 2022

Ah. I tried in a fresh environment and the error still shows on my machine.

@brandonwillard
Copy link
Member

brandonwillard commented Aug 16, 2022

FYI: If I set N to something large, the issue appears.

Anyway, yes, we should make our own version of interleave that doesn't use recursion.

@brandonwillard brandonwillard changed the title RecursionError on simple example Use a non-recursive interleave Aug 16, 2022
@brandonwillard brandonwillard added bug Something isn't working enhancement New feature or request help wanted Extra attention is needed labels Aug 16, 2022
@brandonwillard brandonwillard changed the title Use a non-recursive interleave Remove recursive interleave calls Aug 16, 2022
@wrq
Copy link

wrq commented Jan 25, 2023

I may be way out my league here, but this could not be done with zip and enumerate? I've included a small snippet to illustrate what I mean.

def interleave(*iters):
  res = []
  for z in zip(*iters):
    for _,x in enumerate(z):
      res.append(x)
  return res

print(interleave([1,2,3], [4,5,6], [7,8,9]))
# returns: [1, 4, 7, 2, 5, 8, 3, 6, 9]

If this is a complete misunderstanding, just let me know; I do not mean to interrupt or obstruct.

Edit: a little more experimenting below -->

import random

def split(s: str) -> list[str]:
  return [x for x in s]

def listbench(lstlen, x):
  res = []
  for x in range(x):
    aux = [random.randrange(1,11) for _ in range(lstlen)]
    res.append(aux)
  return res

print(interleave(*listbench(10, 5000)))

This appears to work even with larger lists, I tested 50_000 and 500_000, both successful.

@brandonwillard
Copy link
Member

I may be way out my league here, but this could not be done with zip and enumerate? I've included a small snippet to illustrate what I mean.

def interleave(*iters):
  res = []
  for z in zip(*iters):
    for _,x in enumerate(z):
      res.append(x)
  return res

print(interleave([1,2,3], [4,5,6], [7,8,9]))
# returns: [1, 4, 7, 2, 5, 8, 3, 6, 9]

If this is a complete misunderstanding, just let me know; I do not mean to interrupt or obstruct.

Now that I'm coming back to this, I don't think the problem is in interleave itself, as much as our use of it in lall -> lconj -> lconj_seq -> bind. The generator produced by lconj_seq (well, the lconj_seq_goal it creates) is effectively composing N - 1 interleave calls. I think that's actually the thing we need to "flatten".

@wrq
Copy link

wrq commented Jan 25, 2023

I can see that I am in fact out of my league here then, cheers and good luck!

@brandonwillard
Copy link
Member

I can see that I am in fact out of my league here then, cheers and good luck!

There's a bit of context here that may make things seem foreign, but we're really just talking about generators calling generators in a particular way.

The first step toward solving this issue is to remove all the unnecessary elements and obtain a more direct illustration of the issue—if possible.

Here's a first pass at that:

from kanren import var, eq
from kanren.core import lconj_seq


q_lv = var()

goal_1 = lconj_seq((eq(q_lv, i) for i in range(10000)))

goal_1
# <function kanren.core.lconj_seq.<locals>.lconj_seq_goal(S)>

state_0 = {}
state_stream_1 = goal_1(state_0)

state_stream_1
# <generator object lconj_seq.<locals>.lconj_seq_goal at 0x7f6e14f580d0>

next(state_stream_1)
# RecursionError: maximum recursion depth exceeded while calling a Python object

Now, we can focus exclusively on the code in kanren.core.lconj_seq.lconj_seq_goal.

@brandonwillard brandonwillard added important stream processing Operations related to (goal) stream processing performance labels Jan 26, 2023
@brandonwillard
Copy link
Member

@wrq, here's a MWE demonstrating the underlying issue:

from functools import reduce
from itertools import chain
from toolz import interleave


def bind(z, g):
    # Both fail for the same reason.
    # return chain.from_iterable(map(g, z))
    return interleave(map(g, z))


def goal_constructor(i):

    def g(s):
        yield (i,) + (s,)

    return g


z = reduce(bind, (goal_constructor(i) for i in range(1, 10)), iter((0,)))

list(z)
# [(9, (8, (7, (6, (5, (4, (3, (2, (1, 0)))))))))]


z = reduce(bind, (goal_constructor(i) for i in range(1, 10000)), iter((0,)))

next(z)
# RecursionError: maximum recursion depth exceeded while calling a Python object

@wrq
Copy link

wrq commented Feb 5, 2023

Ok. First, I'm sorry that I haven't gotten to this sooner. I'm actually a part-time Walmart worker and a stay-at-home father, not really a professional programmer, so this has been on the backburner for me.

I was able to rig this up:

from functools import reduce
from itertools import chain, zip_longest
#from toolz import interleave

def interleave(*iters):
  res = []
  for z in zip_longest(*iters):
    for _,x in enumerate(z):
      res.append(x)
  return res

def bind(z, g):
    # Both fail for the same reason.
    # return chain.from_iterable(map(g, z))
    return interleave(map(g, z))

def goal_constructor(i):
    def g(s):
        yield (i,) + (s,)
    return g

z = reduce(bind, (goal_constructor(i) for i in range(1, 10)), iter((0,)))

print(z)        # debug
print(list(z)) # checking for data model weirdness

# [(9, (8, (7, (6, (5, (4, (3, (2, (1, 0)))))))))]

z = reduce(bind, (goal_constructor(i) for i in range(1, 10000)), iter((0,)))

print(next(next(iter(z))))

--- OUTPUT ---

[<generator object goal_constructor.<locals>.g at 0x7fa23d01cdd0>]
[<generator object goal_constructor.<locals>.g at 0x7fa23d01cdd0>]
(9999, <generator object goal_constructor.<locals>.g at 0x7fa23c954eb0>)

This uses my little interleave hack from before with run-of-the-mill itertools, and then I noticed that z needs to be an iterator and cannot be a regular list. This is obviously a cumbersome way of thinking about the objects, because of the original Scheme domain would allow it.

I suppose the new interleave could just "return iter(res)" ?

I'm a little embarrassed to say that I'm not even confident that this is a helpful response and that I'm missing something fundamental about all of this.

Edit: I just realized something. zip() will terminate on the shortest iterator, which is probably not the desired behavior? It should be zip_longest() up there. I've made the edit, but if that's what is desired, then it could just be zip().

@brandonwillard
Copy link
Member

Ok. First, I'm sorry that I haven't gotten to this sooner. I'm actually a part-time Walmart worker and a stay-at-home father, not really a professional programmer, so this has been on the backburner for me.

No problem; I just don't want you to get the impression that this stuff is over your head, because it's not. It's just a rather context-specific problem with some unstated constraints.

From a quick look at your approach, it seems like you have the right idea: i.e. delay evaluation further. This will require changes to the way that z is handled downstream, though.

That approach could be feasible, as long as it doesn't dramatically change the stream processing semantics and has a manageable refactoring footprint. In other words, all the user-facing functions involved (e.g. the goal constructors, bind, ldisj_seq, lconj_seq, etc.) need to provide roughly the same types of objects as they currently do.

Also, the interleave implementation will likely need to be a generator itself, because essentially every iterator in this context can be infinite.

@brandonwillard
Copy link
Member

brandonwillard commented Feb 17, 2023

I think we might need to "flatten" this example and see what can be done from there. That would involve combining the interleave, bind, and reduce into a single generator function that takes the goal_constructor iterator as an argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request help wanted Extra attention is needed important performance stream processing Operations related to (goal) stream processing
Projects
None yet
Development

No branches or pull requests

3 participants