Deviations from the R language #6

Open
3 of 9 tasks
dgkf opened this issue May 30, 2023 · 7 comments
Labels
type-design Discussion regarding design of enhancements or project at large

Comments

@dgkf
Owner

dgkf commented May 30, 2023

An ongoing catalog of intentional deviations from R, in varying stages of maturity

Confident

  • Lowercase equivalents for all uppercase keywords (NULL, TRUE, FALSE, Inf, NA)
  • fn as an alias for function (replacing R's \() lambda syntax)
  • Allow trailing commas in all function calls (eg list(a = 1,))

Undecided

  • [ as a primitive for creating a vector (ie, [1, 2, 3])
  • ( as a primitive for creating a list (ie (1, 2, 3) or (a = 1, b = 2, c = 3))
  • Introduce scalar values
  • Remove [[ as an indexing operator.
    With the above, indexing with a scalar value could return an element (x[1]), while indexing with a vector could return a vector (x[ [1, 2] ]; spaces added to emphasize that this also uses the [ operator, passing a vector [1, 2], rather than a separate [[ operator). This is more similar to pandas indexing in Python, but it doesn't play nicely without scalar values in the first place (otherwise 1 is equivalent to c(1) and we always index by vector anyway).
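
For reference, here is a minimal sketch of how GNU R behaves today, which is the behaviour the undecided items above would change. The variable names are purely illustrative and nothing here reflects this project's implementation.

```r
# In GNU R there are no scalars: 1 is already a length-one vector.
stopifnot(identical(1, c(1)))

x <- list(a = 1, b = "two", c = 3)

# `[` always returns a container of the same kind (here, a list) ...
sub <- x[1]

# ... so a separate `[[` operator is needed to pull out a single element.
elem <- x[[1]]

# Indexing with a longer vector also goes through `[`.
both <- x[c(1, 2)]
```

With true scalars, x[1] could play the role of x[[1]] and x[ [1, 2] ] the role of x[c(1, 2)], which is why the two proposals are coupled.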

Needs Feedback

  • A language construct for flagging when non-standard evaluation is enabled. This should exist at function declaration. Undecided on whether it should apply to a whole function or specific arguments.
  • A type system (some early exploration in an R-native implementation in dgkf/typewriter)
@dgkf dgkf pinned this issue May 30, 2023
@dgkf dgkf added the type-design Discussion regarding design of enhancements or project at large label May 31, 2023
@sebffischer
Collaborator

A very minor thing, but it would be great if list(a = 1,) worked just like list(a = 1) :D

@dgkf
Owner Author

dgkf commented Oct 11, 2023

A very minor thing, but it would be great if list(a = 1,) worked just like list(a = 1) :D

Done!


This was a stealth feature, even to me! I only discovered that I baked it in last time it was mentioned over on mastodon. However, I never went back and updated this issue to document it.

One point of contention here is whether to allow arbitrary numbers of empty arguments (list(,,,,,b = 3) or x[, 2]). I'm still not sure whether empty arguments are desirable as meaningful call arguments generally. The only place where I think this really has an impact is axis indexing. Personally, I found R's x[,2] syntax much more confusing at first than Python's df.loc[:, 2] and Julia's df[:, 2], which both at least have some argument, but that might just be a matter of which language I encountered first. Ultimately they're all expressing the same idea, and R's is definitely the most terse.
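
To make the contrast concrete, here is a small sketch of what GNU R does with empty arguments today (not this project's behaviour). The names m, col2 and trailing are just for illustration.

```r
m <- matrix(1:6, nrow = 2)

# An empty first argument to `[` means "all rows" in axis indexing.
col2 <- m[, 2]

# A trailing comma in an ordinary call parses, but GNU R's list()
# rejects the resulting empty argument when the call is evaluated.
trailing <- tryCatch(
  list(a = 1, ),
  error = function(e) conditionMessage(e)
)

print(col2)
print(trailing)
```

So in GNU R the empty argument is meaningful for indexing but an error for list(), which is the asymmetry being discussed.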

@sebffischer
Collaborator

sebffischer commented Oct 17, 2023

If I may add a little more to the wishlist, having sub-modules / sub-packages would also be something that I would appreciate quite a bit :)

@dgkf
Owner Author

dgkf commented Oct 17, 2023

If I may add a little more to the wishlist, having sub-modules / sub-packages would also be something that I would appreciate quite a bit :)

Yeah, I think this will be a nice-to-have. For a long time I've felt that the inability to nest packages is actually a powerful feature, because it forces minimal scoping of packages and splitting them into more focused sub-packages. However, what I've come to appreciate is how hard it is to develop packages that are inter-dependent (similar to the tidyverse). I think there might be a best of both worlds here, where a package can take a dependency on only a sub-package without taking a dependency on the whole set (Depends: tidyverse::dplyr (>= 1.0.0)).

In that sense, the tidyverse style meta-package becomes a more formalized construct, allowing for the release of a cohort of related packages without the overhead of necessitating all of them to be used together.

There's tons to consider in here that warrants its own issue when we get to that point. If you're passionate about the considerations involved here, I welcome the initiative to get the planning and design started.

@sebffischer
Collaborator

I would definitely like to work on this, but I would first try to get a better understanding of the code-base! I like the idea of making a "universe" a formalized construct. I am working on a project (mlr3) that is also a collection of dependent R packages and I know some of the troubles related to that, e.g.:

  • package A has a non-exported function f that I want to use in package B, but I don't want to export it in package A for regular users.
  • I want to make a change in package A that will break package B, therefore I either have to synchronize new releases of package A and B or have to do some additional releases that are compatible with different versions.
  • Where exactly should integration tests live?
  • Dependency management

Maybe one could use Rust's idea of a workspace for inspiration.

@sebffischer
Collaborator

To put another item on the agenda: I think the list() datatype in R is currently somewhat overloaded, as it is being used both as a standard (unnamed) list but also as a dictionary (when it has names). Unfortunately it does not ensure that names are unique, and stuff like list(a = 1, a = 2)$a is definitely a source of bugs and something that should not be possible imo.

It might be a good idea to define this a bit more rigorously and possibly split the list() datatype into a dict() and an (unnamed) list(). What do you think?
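
A quick sketch of the problem in GNU R, plus one way a stricter constructor could look. The dict() function below is hypothetical and written only for illustration; it is not an existing R function or part of this project.

```r
# GNU R accepts duplicate names; `$` and `[[` silently return the first match.
l <- list(a = 1, a = 2)
first_a <- l$a
dup_index <- anyDuplicated(names(l))  # position of the first duplicated name

# Hypothetical dict(): like list(), but names must be present and unique.
dict <- function(...) {
  items <- list(...)
  if (length(items) == 0L) {
    return(items)
  }
  nms <- names(items)
  if (is.null(nms) || any(nms == "") || anyDuplicated(nms) > 0L) {
    stop("dict() requires unique, non-empty names")
  }
  items
}

ok <- dict(a = 1, b = 2)
bad <- tryCatch(dict(a = 1, a = 2), error = function(e) conditionMessage(e))
```

Here the duplicate is caught at construction time instead of surfacing later as a surprising lookup result.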

@sebffischer
Collaborator

sebffischer commented Dec 26, 2023

Also, should partial argument matching still be allowed?
Some arguments against it:

  • It can lead to unexpected behavior (at least as implemented in R). Say I have a function with signature function(sep = " ", ...) and I call it with f(s = "a"): the s will partially match sep, which is definitely not something that I would expect.
  • Code might break when additional arguments are added to a function. Say my function has signature f(first, second) and I call it with f(fir = 1, sec = 2), but then the function f gets an additional argument: f(first, second, fir = NULL). If I keep calling the function with f(fir = 1, sec = 2), fir now matches the new argument exactly instead of partially matching first, so the call gives different results.
  • We don't have to partially match arguments to parameters each time a function is called (this might be negligible though)

An argument in favour:

  • it saves some typing

Still, I think the amount of typing that is saved (this should actually not really make a big difference when having proper autocompletion) is not worth the disadvantages.
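
The cases above can be sketched in GNU R as follows. All function names are made up for illustration. As far as I know, GNU R only applies partial matching to formals that come before ..., so the sketch shows both placements of sep.

```r
# Partial matching against a formal that precedes `...`.
sep_before_dots <- function(sep = " ", ...) sep
# Formals after `...` must be named in full, so `s` falls into `...`.
sep_after_dots <- function(..., sep = " ") sep

matched <- sep_before_dots(s = "a")
unmatched <- sep_after_dots(s = "a")

# Version 1: `fir` and `sec` partially match `first` and `second`.
f_v1 <- function(first, second) {
  list(first = first, second = second)
}
res_v1 <- f_v1(fir = 1, sec = 2)

# Version 2 adds a formal named exactly `fir`. Exact matches win,
# so the same call now leaves `first` unset.
f_v2 <- function(first, second, fir = NULL) {
  list(first_missing = missing(first), second = second, fir = fir)
}
res_v2 <- f_v2(fir = 1, sec = 2)
```

The call site is identical for both versions, but the value silently moves from first to fir, which is the breakage described in the second bullet.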
