Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Documentation for env Parameter Usage* #6360

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

Nj221102
Copy link
Contributor

@Nj221102 Nj221102 commented Aug 6, 2024

closes #6323

Description:

This PR updates the documentation for the env parameter in the data.table package to include new functionality regarding function injection. Initially, the documentation update was focused solely on using function names as strings (e.g., "sum"), as requested in issue #6323. However, following the merge of PR #6026 which allows the direct use of function objects (e.g., sum). This PR updates the documentation to reflect both approaches.

Changes:

  1. Documentation Update:

    • The documentation now includes both methods for injecting functions using the env parameter:
      • Function Names as Strings: Using function names as strings (e.g., "sum").
      • Function Objects Directly: Passing function objects directly (e.g., sum).
  2. Example Block Addition:

    • Added examples demonstrating both approaches for function injection.

@TysonStanley
Copy link
Member

LGTM

man/data.table.Rd Outdated Show resolved Hide resolved
@tdhock
Copy link
Member

tdhock commented Aug 9, 2024

I removed the DT definition since there are already other DT definitions that we can use for this example.
I noticed this would add the first example use of env arg to ?data.table man page.
@jangorecki is there some reason why there were no env examples before?

man/data.table.Rd Outdated Show resolved Hide resolved
@MichaelChirico
Copy link
Member

MichaelChirico commented Aug 9, 2024

Thanks for the PR! Actually, for addressing #6323 I had in mind extending the programming vignette, rather than ?data.table.

Although as Toby points out, it looks like this is the first example using env= on that page, so it's good to keep the new example.

@Nj221102
Copy link
Contributor Author

Nj221102 commented Aug 9, 2024

Thanks for the PR! Actually, for addressing #6323 I had in mind extending the programming vignette, rather than ?data.table.

Although as Toby points out, it looks like this is the first example using env= on that page, so it's good to keep the new example.

sure, i m updating the programming vignette now.

@jangorecki
Copy link
Member

jangorecki commented Aug 9, 2024

Direct usage of sum rather than "sum" has to be avoided. Please investigate yourself. Take some big R function, and use it that way, verbose should display what's wrong. substitute2 may be even easier to show the problem.

I can imagine it could even potentially fail on trying to build big expression in j.

The only reason to use function name as a symbol rather than string is when we want to inject function body into expression. I see no reasonable use case for that.

Definitly worth to add some examples to manual as well. Now it was only redirecting to vignette and substitute2 manual.

@tdhock
Copy link
Member

tdhock commented Aug 9, 2024

by the way Jan it is really super convenient how you can substitute names via substitute2 / env, thanks for implementing that.

@tdhock
Copy link
Member

tdhock commented Aug 9, 2024

@Nj221102 please also add discussion/example to programming vignette, maybe under ### `get` ?

@jangorecki
Copy link
Member

jangorecki commented Aug 9, 2024

by the way Jan it is really super convenient how you can substitute names via substitute2 / env, thanks for implementing that.

Yes that was the "new" think vs. substitute or tidy way of meta programming. Afair that was the only part that required to be coded in C.
https://github.com/Rdatatable/data.table/blob/d00f292596d82fd21aac59edc34a9f8b388450f4/src/programming.c

@@ -403,6 +403,7 @@ print(DT["b", v2:=84L, on="x"]) # subassign to new column by reference (NA

DT[, m:=mean(v), by=x][] # add new column by reference by group
# NB: postfix [] is shortcut to print()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please undo addition of empty lines

print(result)
```

In this example, the string `"sum"` is mapped to the `f` function. The `env` parameter allows `data.table` to recognize this `f` as the `sum` function, so it computes the total of `Sepal.Length` accordingly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok for a first example, but I believe the second example below could be better, by showing a more typical use case, such as

my_summarization <- function(DT, f, variable, by){
  DT[, f(variable), by=by, env=list(f=f, variable=variable, by=by)]
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Toby's example without it's usage is not enough. Try with by of length 2+

@@ -373,6 +373,29 @@ print(j)
DT[, j, env = list(j = j)]
```

### Injecting functions using strings in `env`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the term "inject" come from?
I think the more typical R term would be "substitute" (like the function of that name)

@@ -373,6 +373,29 @@ print(j)
DT[, j, env = list(j = j)]
```

### Injecting functions using strings in `env`

In `data.table`, you can inject functions into your expressions by passing their names as strings in the `env` parameter. This method allows you to use function names directly as strings, and `data.table` will automatically interpret these strings as the corresponding functions. This approach simplifies the process and makes it easier to work with function names dynamically.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a sentence that says you should use env=list(f="mean") and not env=list(f=mean)

@@ -177,7 +177,7 @@ data.table(\dots, keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFac
See examples as well as \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{\code{vignette("datatable-secondary-indices-and-auto-indexing")}}.
}

\item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). Use \code{verbose} to preview constructed expressions. For more details see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}. }
\item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). For function names, you can use them as strings (e.g., \code{"sum"}) or pass function objects directly (e.g., \code{sum}). Use \code{verbose} to preview constructed expressions. For more details, see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}.}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change to:
For substitution of functions, you should use their names as strings (for example, \code{"sum"}) instead of passing function objects directly (for example, \code{sum})
(according to my reading of Jan's comments, we should discourage use of function objects, and instead encourage use of their names).

On second thought, instead of adding a sentence talking about function names specifically, we could just change the docs to say "List or environment, where names are symbols to find, and values are strings (names of objects), passed to substitute2..."

what do you think @jangorecki ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this explanation is oververbose. This is related not only to functions. Whatever symbol will be passed there, it will be substituted with what it refers to. I think just adding an example is enough. This is exactly how base substitute works, no? So if we refer to that we are covers.

Copy link
Member

@jangorecki jangorecki Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where names are symbols to find

Unless someone really want to substitute body of a function instead its name... Then that would not apply.

@@ -177,7 +177,7 @@ data.table(\dots, keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFac
See examples as well as \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{\code{vignette("datatable-secondary-indices-and-auto-indexing")}}.
}

\item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). Use \code{verbose} to preview constructed expressions. For more details see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}. }
\item{env}{ List or an environment, passed to \code{\link{substitute2}} for substitution of parameters in \code{i}, \code{j} and \code{by} (or \code{keyby}). For function names, you can use them as strings (e.g., \code{"sum"}) or pass function objects directly (e.g., \code{sum}). Use \code{verbose} to preview constructed expressions. For more details, see \href{../doc/datatable-programming.html}{\code{vignette("datatable-programming")}}.}
Copy link
Member

@jangorecki jangorecki Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where names are symbols to find

Unless someone really want to substitute body of a function instead its name... Then that would not apply.

Suppose you want to calculate the total of `Sepal.Length` in the `iris` dataset using the `sum` function. You can inject the `sum` function by passing its name as a string in the `env` parameter.

```{r sum_example}
result <- DT[, f(Sepal.Length), env = list(f = "sum")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to change vignette. Other examples in vignette already covers this use case. In the issue what was discussed was to extend manual, not vignette.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Michael commented #6360 (comment) that he wanted to add to vignette

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO other pieces of code in this vignette are sufficiently covering parametrizing function names

Copy link
Contributor Author

@Nj221102 Nj221102 Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so do we need any changes to vignette or not, tbf while adding i also felt that eveything is already covered so i added a section for specific case mentioned in the issue.

Inconsistency is when someone will try to actually inject function body via passing function to env argument. Don't have reasonable use case for that but I don't think we can rule out such situation. For env it is actually sufficient to provide "sum" because it is automatically turned into name. So not even need to quote() or as.name(). What seems very reasonable is to add this use case to examples so it is documented to use "sum" rather than sum.

hi @MichaelChirico can you please help me out here, like what type of section are we thinking about adding in programming vignette?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document well how to flexibly pass functions in env=
5 participants