Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SparkR-237][WIP] Work around fix with cleanClosure() in SparkR. #72

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hlin09
Copy link

@hlin09 hlin09 commented Mar 25, 2015

@piccolbo I have submitted a new patch in SparkR that fixes some of the major issues of cleanClosure. Some other issues might take longer to fix, so I submitted this PR as a quick work around with minimal changes to plyrmr. Currently in my local setup, it passes the examples but fails at the first test. If you have time, you could try this patch and maybe give me some pointers about the error messages. Thanks.

@hlin09 hlin09 changed the title [SparkR-237][WIP] Work around fix with cleanClosure() in SparkR. Examples pass. [SparkR-237][WIP] Work around fix with cleanClosure() in SparkR. Mar 25, 2015
@piccolbo
Copy link
Collaborator

The workaround is pretty heavy handed and at times puzzling. It boils down to

  • using fully qualified names for all imported functions (which was the problem reported in the first place)
  • modifying promises by setting the enclosing environment to .GlobalEnv, which is all but certain to be a bug in and of itself
  • renaming some variables. Which variables would require the same treatment?

I have multiple problems with this

  • we are fixing the symptoms not the cause. It shows that after the cleanClosure patch, environments are not reproduced properly in the workers. This all worked just before updating SparkR, I am not trying anything new in plyrmr.
  • we are trading correctness for efficiency.
  • we are introducing new bugs to make the symptoms go away (promise manipulation). These new bugs affect also other backends -- changes are outside pipespark
  • since several calls in plyrmr accept expression or closures as arguments, presumably end users of plyrmr would have to apply the same workarounds. We would have to try and generalize them, then explain them. That's not something we can even contemplate.

By the way, cleanClosure is probably also implicated in SparkR-238. Moreover, as I was reviewing your code, it was clear it was not able to support advanced R features like get, assign, eval/as.name, e.g. x = 1; collect(lapply(parallelize(sc, 1:10), function(y) get("x"))) These are not features commonly used by end users, but they are in base and nobody knows how many packages use them (plyrmr does use as.name) and how many will break if used on a worker.

So to summarize, I can't merge this. If you want just as an experiment, I can merge into a different branch but I can tell you right away these changes are not going into master now or later.

@hlin09
Copy link
Author

hlin09 commented Mar 25, 2015

Thanks @piccolbo for your feedback. I will not try to merge this PR, and instead it just gives you an idea of what cleanClosure can do for now. It is good to have your feedback. I guess the issues of "imported qualifiers" and "renaming variables" are solvable in cleanClosure. The "promise manipulation" is a harder bug. Admittedly, R has quite a lot advanced features and it is hard to mimic its mechanism in such small code of cleanClosure. Apparently, we are trading correctness for efficiency. We can talk to Shiva if cleanClosure is too aggressive to be practical, and we can fall back to previous approach. I am OK with that.

cleanClosure tries to add useful variables to closure while the old approach, getDependencies removes unrelated vars from closure. getDependencies also have correctness and efficiency problems. We can either make cleanClosure more complete or getDependencies more aggressive. It is hard to see which is easier now.

@piccolbo
Copy link
Collaborator

OK thanks, I am bringing the discussion back to the ticket.

On Wed, Mar 25, 2015 at 8:52 AM, hlin09 [email protected] wrote:

Thanks @piccolbo https://github.com/piccolbo for your feedback. I will
not try to merge this PR, and instead it just gives you an idea of what
cleanClosure can do for now. It is good to have your feedback. I guess
the issues of "imported qualifiers" and "renaming variables" are solvable
in cleanClosure. The "promise manipulation" is a harder bug. Admittedly,
R has quite a lot advanced features and it is hard to mimic its mechanism
in such small code of cleanClosure. Apparently, we are trading
correctness for efficiency. We can talk to Shiva if cleanClosure is too
aggressive to be practical, and we can fall back to previous approach. I am
OK with that.

cleanClosure tries to add useful variables to closure while the old
approach, getDependencies removes unrelated vars from closure.
getDependencies also have correctness and efficiency problems. We can
either make cleanClosure more complete or getDependencies more
aggressive. It is hard to see which is easier now.


Reply to this email directly or view it on GitHub
#72 (comment)
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants