-
Notifications
You must be signed in to change notification settings - Fork 978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Documentation for env
Parameter Usage*
#6360
base: master
Are you sure you want to change the base?
Conversation
LGTM |
I removed the DT definition since there are already other DT definitions that we can use for this example. |
Thanks for the PR! Actually, for addressing #6323 I had in mind extending the programming vignette, rather than Although as Toby points out, it looks like this is the first example using |
sure, i m updating the programming vignette now. |
Direct usage of I can imagine it could even potentially fail on trying to build big expression in j. The only reason to use function name as a symbol rather than string is when we want to inject function body into expression. I see no reasonable use case for that. Definitly worth to add some examples to manual as well. Now it was only redirecting to vignette and substitute2 manual. |
by the way Jan it is really super convenient how you can substitute names via substitute2 / env, thanks for implementing that. |
@Nj221102 please also add discussion/example to programming vignette, maybe under |
Yes that was the "new" think vs. substitute or tidy way of meta programming. Afair that was the only part that required to be coded in C. |
@@ -403,6 +403,7 @@ print(DT["b", v2:=84L, on="x"]) # subassign to new column by reference (NA | |||
|
|||
DT[, m:=mean(v), by=x][] # add new column by reference by group | |||
# NB: postfix [] is shortcut to print() | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please undo addition of empty lines
print(result) | ||
``` | ||
|
||
In this example, the string `"sum"` is mapped to the `f` function. The `env` parameter allows `data.table` to recognize this `f` as the `sum` function, so it computes the total of `Sepal.Length` accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is ok for a first example, but I believe the second example below could be better, by showing a more typical use case, such as
my_summarization <- function(DT, f, variable, by){
DT[, f(variable), by=by, env=list(f=f, variable=variable, by=by)]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Toby's example without it's usage is not enough. Try with by
of length 2+
@@ -373,6 +373,29 @@ print(j) | |||
DT[, j, env = list(j = j)] | |||
``` | |||
|
|||
### Injecting functions using strings in `env` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the term "inject" come from?
I think the more typical R term would be "substitute" (like the function of that name)
@@ -373,6 +373,29 @@ print(j) | |||
DT[, j, env = list(j = j)] | |||
``` | |||
|
|||
### Injecting functions using strings in `env` | |||
|
|||
In `data.table`, you can inject functions into your expressions by passing their names as strings in the `env` parameter. This method allows you to use function names directly as strings, and `data.table` will automatically interpret these strings as the corresponding functions. This approach simplifies the process and makes it easier to work with function names dynamically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a sentence that says you should use env=list(f="mean") and not env=list(f=mean)
@@ -177,7 +177,7 @@ data.table(\dots, keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFac | |||
See examples as well as \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{\code{vignette("datatable-secondary-indices-and-auto-indexing")}}. | |||
} | |||
|
|||
\item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). Use \code{verbose} to preview constructed expressions. For more details see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}. } | |||
\item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). For function names, you can use them as strings (e.g., \code{"sum"}) or pass function objects directly (e.g., \code{sum}). Use \code{verbose} to preview constructed expressions. For more details, see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}.} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please change to:
For substitution of functions, you should use their names as strings (for example, \code{"sum"}) instead of passing function objects directly (for example, \code{sum})
(according to my reading of Jan's comments, we should discourage use of function objects, and instead encourage use of their names).
On second thought, instead of adding a sentence talking about function names specifically, we could just change the docs to say "List or environment, where names are symbols to find, and values are strings (names of objects), passed to substitute2..."
what do you think @jangorecki ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this explanation is oververbose. This is related not only to functions. Whatever symbol will be passed there, it will be substituted with what it refers to. I think just adding an example is enough. This is exactly how base substitute works, no? So if we refer to that we are covers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where names are symbols to find
Unless someone really want to substitute body of a function instead its name... Then that would not apply.
@@ -177,7 +177,7 @@ data.table(\dots, keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFac | |||
See examples as well as \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{\code{vignette("datatable-secondary-indices-and-auto-indexing")}}. | |||
} | |||
|
|||
\item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). Use \code{verbose} to preview constructed expressions. For more details see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}. } | |||
\item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). For function names, you can use them as strings (e.g., \code{"sum"}) or pass function objects directly (e.g., \code{sum}). Use \code{verbose} to preview constructed expressions. For more details, see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}.} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where names are symbols to find
Unless someone really want to substitute body of a function instead its name... Then that would not apply.
Suppose you want to calculate the total of `Sepal.Length` in the `iris` dataset using the `sum` function. You can inject the `sum` function by passing its name as a string in the `env` parameter. | ||
|
||
```{r sum_example} | ||
result <- DT[, f(Sepal.Length), env = list(f = "sum")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to change vignette. Other examples in vignette already covers this use case. In the issue what was discussed was to extend manual, not vignette.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Michael commented #6360 (comment) that he wanted to add to vignette
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO other pieces of code in this vignette are sufficiently covering parametrizing function names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so do we need any changes to vignette or not, tbf while adding i also felt that eveything is already covered so i added a section for specific case mentioned in the issue.
Inconsistency is when someone will try to actually inject function body via passing function to env argument. Don't have reasonable use case for that but I don't think we can rule out such situation. For env it is actually sufficient to provide
"sum"
because it is automatically turned into name. So not even need to quote() or as.name(). What seems very reasonable is to add this use case to examples so it is documented to use"sum"
rather thansum
.
hi @MichaelChirico can you please help me out here, like what type of section are we thinking about adding in programming vignette?
closes #6323
Description:
This PR updates the documentation for the
env
parameter in thedata.table
package to include new functionality regarding function injection. Initially, the documentation update was focused solely on using function names as strings (e.g.,"sum"
), as requested in issue #6323. However, following the merge of PR #6026 which allows the direct use of function objects (e.g.,sum
). This PR updates the documentation to reflect both approaches.Changes:
Documentation Update:
env
parameter:"sum"
).sum
).Example Block Addition: