Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Added Methods #320

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

DawDavis
Copy link

@DawDavis DawDavis commented Nov 3, 2020

OVERVIEW

This PR aims to bring a receiver-alike functionality to Tengo, allowing for getters, setters, mutators, and more. Overall, the change makes it much easier to use some OOP principles in Tengo, much as the 'this' keyword does in certain JS applications.

A method declaration is syntactically defined as follows:

//derived from func (r *receiver) method (argument interface{}) {...}
method := func(receiver)(argument) { ... }

This is done to alleviate two major concerns I had with this feature:

  • The strange 'this' keyword functionality in JS would not be very idiomatic.
  • Golang methods require a declaration, but Tengo specifically avoids declarations.

Specifics

These proposed Tengo receivers do not carry a guarantee about their underlying type: It could be any index-able Tengo type, or undefined, should the method be called in a non-parent context. For example:

f1 := func(r)() { return r }

a := f1() // a = undefined

m := { f2: f1 }

b := m.f2() // b = m

You may notice that unlike JS, where 'this' contains some scope information no matter the context, Tengo's receivers very often carry undefined as their value.

Another important question is how this is different from a self-capturing closure. e.g:

m := { get: func() { return m }}

a := m.get() // a = m

at first glance, this serves a similar purpose, however the given relationship breaks down if you attempt to copy m.

m := { get: func() { return m }}
o := copy(m)
a := o.get() // a = m, even though we copied it.

This is because closures capture at definition time, whereas receivers capture at call-time. The same operation, when done with a receiver, acts as a method call:

m := { get: func(r)() { return r }}
o := copy(m)
a := o.get() // a = o

Implementation

The implementation of receivers is handled by altering the way free-vars are stored when a method is indexed out of a map or array. At parsing, a look-ahead is performed on any selector or index expressions, in order to see if they directly feed into a call operation. If they do, they mark themselves as being a method call.

At compile time, a select or index expression is then compiled to the OpIndex opcode, with a new flag set afterwards, akin to the spread flag, that indicates whether the given indexing operation should include its context info in the indexed object.

Then finally, at run time, when OpIndex is called with a method flag, it behaves as per usual, except that it attempts to cast the retrieved object as a compiled function, and then if it is successful, copies the function, sets freevar[0] inside of the function as itself (the indexed map/array), and then continues normally.

This copy prevents the given map context from escaping this particular function call.


It is also useful to note that the freevars in compiled functions are now always +1 in length, as the 0th item is reserved for the receiver implementation. If no receiver exists, a placeholder "" freevar is created in order to reserve the space. This is done to streamline the function calls at runtime.

Caveats

This feature does not extend to non-compiled methods, mainly because that would require a huge API change that I do not feel comfortable suggesting. Additionally, I questioned the usefulness of being able to move context out into Go-space.

Accessing methods in any way that is not an "index" or "select" operation results in an undefined receiver. To my mind, this is fine, mainly because those are the places where parent-access is most frequently required.

Continued extensions of this feature

I considered adding a bind keyword to the specification, which would allow a receiver to be permanently bound to a specific map/array.

m := { f: func(r)() { return r }}
fn := bind(m.f)
a := fn() // a = m, even though it was called without context

And while I believe this may be a useful feature, I do think its use is extremely niche.

Benchmarks:

Master:
-------------------------------------
fibonacci(35)
-------------------------------------
Result:  9227465
Go:      60.857359ms
Parser:  46.585µs
Compile: 131.165µs
VM:      3.346390935s
-------------------------------------
fibonacci(35) (tail-call #1)
-------------------------------------
Result:  9227465
Go:      82.520452ms
Parser:  21.779µs
Compile: 87.366µs
VM:      3.444464444s
-------------------------------------
fibonacci(35) (tail-call #2)
-------------------------------------
Result:  9227465
Go:      335ns
Parser:  20.929µs
Compile: 77.084µs
VM:      16.204µs
This Branch:
-------------------------------------
fibonacci(35)
-------------------------------------
Result:  9227465
Go:      63.743294ms
Parser:  44.97µs
Compile: 137.976µs
VM:      3.265088942s
-------------------------------------
fibonacci(35) (tail-call #1)
-------------------------------------
Result:  9227465
Go:      82.495663ms
Parser:  20.835µs
Compile: 128.924µs
VM:      3.373017762s
-------------------------------------
fibonacci(35) (tail-call #2)
-------------------------------------
Result:  9227465
Go:      305ns
Parser:  19.629µs
Compile: 79.299µs
VM:      15.721µs

@DawDavis DawDavis changed the title Feature/add parent support Feature: Added Methods Nov 3, 2020
@geseq
Copy link
Collaborator

geseq commented Nov 3, 2020

Thanks for the PR. I'm gonna give this some thought.

At first glance I am comfortable with a feature like this, but I'm not entirely sure about the syntax for declaring the methods.

@d5 thoughts?

m := { f1: func(r)() { return r == undefined}}
out = m.f1()`,
nil, false)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add tests for array as well

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out - it completely slipped my mind while writing the other cases.

I added a few that I think test the broadest categories of array operations. If there are any other unit tests needed, please let me know.

@d5
Copy link
Owner

d5 commented Nov 3, 2020

Thanks for the PR. I really like the documentation on this change. I thought a lot about adding "methods" in the past, and, I've explored Python-like syntax as well. I will definitely think more about this suggested syntax.

I'm curious, what would be the problems if we introduce a new language construct such as "this" or even "context" (maybe we can find an interesting approach that is similar to Go's context)? Just a quick thought.

EDIT: I totally misunderstood the suggested syntax. Lol. Still I will think more about it. 😄

@DawDavis
Copy link
Author

DawDavis commented Nov 3, 2020

Thanks for getting to this so quickly, both of you.

I hemmed and hawed over what to do in regard to the a keyword (Obviously it would be "this" I think, just due to convention). Several of my prototype versions used a keyword like that, but it just never felt go-like. The context idea is neat - it could perhaps even be an array of context, with each context on top of the other?

(That would be more difficult to implement, but I did consider it. There was always the question in my mind about how useful that would end up being.)

A ctx object would be interesting though... especially if the most "root" object contained the context from golang.

I'm not certain either way. I landed on this syntax as a kind of call-back to how golang tends to do things.

EDIT: Oh and thanks for the compliment on the documentation changes - I spent a lot of time trying to make sure it was of the same calibre as the rest of Tengo's docs.

@DawDavis DawDavis requested a review from geseq November 3, 2020 20:10
Copy link
Collaborator

@geseq geseq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Just two comments. An array example in the documentation would be nice. That said, should we have receivers on arrays at all. I suppose it simplifies things a little on the OpIndex, but it seems counterintuitive.

@DawDavis
Copy link
Author

DawDavis commented Nov 3, 2020

@geseq Yeah, I considered omitting them from the specification, but because Tengo has no defined classes nor definitions, it made more sense to me to not artificially limit which collection scopes allowed receiver passing.

@DawDavis
Copy link
Author

DawDavis commented Nov 3, 2020

Syntax

I've been thinking about the method declaration syntax, and these are a few thoughts I've had on the subject. This is all just speculative brainstorming. I based my ideas off of this python snippet:

class Obj:
   def __init__(self, y):
      self.x = y

   def getX(self):
      return self.x
//...
o = Obj(23)
n = o.getX() //n = 23

Translated to Tengo-ish syntax:

newObj := func() {
   return {
      x: 23
      getX: func(self) {
         return self.x
      }
   }
}

n := newObj().getX() //n = 23

Pros:

  • The "self" keyword makes the receiver-nature of the method call obvious.

Cons:

  • Adding "self" to the function definition looks messy, and is a bit obtuse.

If we went with a standard keyword, self should be it IMO. It's more intuitive than the c++/java/JS/c# flavored this. However, we should not include it in the definition (too cluttery) - that said if we did both of those things, it then becomes a compile-time challenge as to how to figure out that a function is in fact a method. The obvious check, I suppose, would be to verify if the self variable is ever accessed while inside a function body, and then modify the FuncLitExpr in order to reflect that.

To avoid that, we could set up a new definition keyword to append to func or to replace func. This version may end up looking something like:

newObj := func() {
   return {
      x: 23
      getX: func method () {
         return self.x
      }
   }
}

n := newObj().getX() //n = 23

Here I am using method as a new keyword. I kinda hate this formatting: it's weird to actually see the word method in code... Maybe that's just a personal thing. And now that I think about it, it doesn't quite make sense to refer to this feature as a 'true' method, because the defined function is not actually bound to the parent map.

Perhaps changing func method to func& or func* would be slightly less jarring?

This format would likely open itself to having a self.parent hierarchy access syntax, which may be a nice-to-have.

However, this idea does make it less intuitive to capture a receiver via a closure.
e.g:

Current Branch
m := {
   grab: func(outer)() {
      return func(inner)() {
         return outer
      }
   }
}

f := m.grab()
f() // returns m
Possible Change with 'Self'
m := {
   grab: func*() {
      outer := self
      return func*() {
         return outer
      }
   }
}

f := m.grab()
f() // returns m

Personally, I think this is relatively niche, but I thought I'd mention it, due to how many hours have been lost figuring out what this means in a JavaScript context. If Tengo can prevent that sort of head-scratching at the syntax level, I think it might be worth it.

Overall though, If we don't go with the syntax proposed in the PR comment, I think that the above snippet is the second-best candidate.

@DawDavis DawDavis requested a review from geseq November 7, 2020 20:01
@d5
Copy link
Owner

d5 commented Nov 8, 2020

I've spent some time to try this, but, I'm still not quite convinced if we should make this change. This is a significant change and naturally increases the maintenance costs by a lot (more bugs and edge cases) and makes the syntax a bit more alienating to Go coders in my opinion. But my main concern is maintainability to be honest.

I'm actually more interested in Go's context-like approach: is immutable, can be safely passed around, and, (like @DawDavis mentioned earlier) can be potentially useful if we use it to pass from actual Go context value.

@DawDavis
Copy link
Author

DawDavis commented Nov 8, 2020

@d5 , I have had half a mind to create a module that includes ctx stuff for Tengo, due to the project I upstreamed this PR from. Is that a direction that you're willing to explore with Tengo moving forward?

@d5
Copy link
Owner

d5 commented Nov 8, 2020

I'm sure there will be more details we will have to figure out, but, I think Go-like "context" construct is better because it's more easily understood and very commonly used by most Go coders. One obvious question is should that be explicitly declared? Or should we make it some kind of intrinsic value (or keyword)?

@Bai-Yingjie
Copy link

How about this approach?

fmt := import("fmt")

newCgroup := func(name) {
    cg := {name:name, pids:[]}

    cg.addPid = func(pid) {
        fmt.println("add ", pid, " to cgroup ", cg.name)
        cg.pids = append(cg.pids, pid)
        fmt.println("all pids:", cg.pids)
    }

    return cg
}

cg1 := newCgroup("cg1")
cg2 := newCgroup("cg2")

cg1.addPid(123)
cg1.addPid(456)

cg2.addPid(111)
cg2.addPid(222)
cg2.addPid(333)

cg1.addPid(789)

//output:
//add 123 to cgroup cg1
//all pids:[123]
//add 456 to cgroup cg1
//all pids:[123, 456]
//add 111 to cgroup cg2
//all pids:[111]
//add 222 to cgroup cg2
//all pids:[111, 222]
//add 333 to cgroup cg2
//all pids:[111, 222, 333]
//add 789 to cgroup cg1
//all pids:[123, 456, 789]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants