inaccurate discussion of dplyr #9

Open
ljanda opened this issue Jul 12, 2019 · 5 comments


ljanda commented Jul 12, 2019

You write that dplyr "consists of 263 functions." Though you do state that "a user initially need not use more than a small fraction of them," you then say "the high complexity is clear." This is not an accurate or responsible discussion. dplyr has six core functions (mutate, select, filter, summarise, arrange, and group_by) that are by far the most commonly needed. You then state that "every time a user needs some variant of an operation, she must sift through those hundreds of functions for one suited to her current need," which is also inaccurate, since the majority of the added functions, e.g. mutate_if(), mutate_all(), and mutate_at(), are simply clear variants of a core verb, e.g. mutate(), that can be easily looked up via autofill or the help documentation.

I would suggest you at least add a discussion of the six core dplyr verbs, or rewrite this section as follows:
Tidyverse students are being asked to learn a [smaller] volume of material, which is [potentially good] pedagogy. See "The Tidyverse Curse" [a post that covers two concerns with Tidyverse that are not related to what is listed here], in which the author says inter alia that he uses "only" 60 Tidyverse functions -- 60! The "star" of the Tidyverse, dplyr, consists of 263 functions. While a user initially need not use more than a small fraction of them, [since there are six core verbs/functions - mutate, select, filter, summarise, arrange, and group_by] the high complexity is [limited]. Every time a user needs some variant of an operation, she [has no need to] sift through those [functions that can be easily referenced within autofill or the help documentation and are usefully named] for one suited to her current need. [Furthermore, many of the added functions, e.g. mutate_if(), mutate_all(), and mutate_at(), are simply clear variants of a core verb, e.g. mutate().]

Also, you cite a bare function count for purrr in the same way, even though it too has a small core of functions (most people use some variant of map()). It is not good practice to give raw numbers instead of the actual details.

Furthermore, in terms of pedagogy, there is a lot of evidence that humans learn things more easily through narrative devices, and it is reasonable to argue that the core dplyr verbs are narrative-driven and memorable, making them easier to learn than base R or data.table syntax (especially for the many R users who are researchers without a CS background or exposure to other programming languages, though arguably for most people as well).
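For readers who have not used dplyr, here is a minimal sketch of the six core verbs named above, chained on R's built-in `mtcars` data set, followed by one of the scoped variants (`mutate_at()`). It assumes dplyr is installed; the columns and the 100-horsepower threshold are arbitrary choices made only for illustration.

```r
library(dplyr)  # assumed to be installed; also re-exports the %>% pipe

# The six core verbs, chained left to right
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                  # keep rows with more than 100 horsepower
  select(cyl, mpg, wt) %>%              # keep only three columns
  mutate(wt_kg = wt * 453.592) %>%      # wt is in 1000 lbs; convert to kilograms
  group_by(cyl) %>%                     # group by number of cylinders
  summarise(mean_mpg = mean(mpg),       # one summary row per group
            mean_wt_kg = mean(wt_kg)) %>%
  arrange(desc(mean_mpg))               # sort groups by mean mpg, highest first

print(summary_by_cyl)

# A scoped variant of mutate(): apply one function to several named columns
rounded <- mtcars %>% mutate_at(vars(mpg, wt), round)
print(head(rounded))
```

Later dplyr releases mark the `_if`/`_at`/`_all` variants as superseded by `across()`, but they still run as written here.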

matloff (Owner) commented Jul 13, 2019

Sorry, I disagree, based on long experience teaching programming and even English.

ljanda (Author) commented Jul 14, 2019

You did not address the main issue, which is that you misrepresent dplyr and purrr. I don't want to veer into wild speculation, as your blog post does, but this description seems willfully off-base, especially since you cite a large number, presumably to shock or scare readers, rather than giving the actual details.

In reality, dplyr relies on six verbs and the teaching materials always start with those six verbs. This is far less complex than base R. People can move on to more complex variants of those verbs, which naturally provides a scaffolded learning experience.

Furthermore, you are making the "you're wrong because I think you're wrong and I have some supposed credentials" argument. I too have been an educator (high school ELA and undergraduate stats, with awards for both), and although I have teaching experience, I also know that pedagogy research is more reliable than my own sample of one.

matloff (Owner) commented Jul 14, 2019

Not sure what to say here. The "sifting through" a large number of functions describes what actually happened to me personally in a recent discussion about pipes. Whatever the function count is, in the end it's more than in base R, where one need only know how [,] works. Hence my point about "teach a person to fish."

An essay by definition is one's own opinion, informed by one's own experiences. I hope we can at least agree on that.
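For comparison, here is a minimal sketch of roughly comparable steps in base R, using the `[,]` indexing operator mentioned above for row filtering and column selection, and `tapply()` (which also comes up later in the thread) for the grouped mean. It uses R's built-in `mtcars` data; the columns and the 100-horsepower threshold are arbitrary choices for illustration, and this is only one of several possible base R formulations.

```r
# Row filtering and column selection with [row, column] indexing
fast_cars <- mtcars[mtcars$hp > 100, c("cyl", "mpg", "wt")]

# Add a derived column: wt is in 1000 lbs, so convert to kilograms
fast_cars$wt_kg <- fast_cars$wt * 453.592

# Grouped mean: tapply() applies mean() to mpg within each level of cyl
mean_mpg <- tapply(fast_cars$mpg, fast_cars$cyl, mean)

# Sort groups by mean mpg, highest first (names are kept)
mean_mpg <- sort(mean_mpg, decreasing = TRUE)
print(mean_mpg)
```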

ljanda (Author) commented Jul 15, 2019

How can you say that all you need to know with base R is how [,] works when you just told me I should be using tapply?

Of course an essay is opinion-based (this is why I have not opened any issues about your, IMO, overblown opinions on the impact of the tidyverse on the future of R), but that does not give one carte blanche to misrepresent facts. At the end of the day, dplyr is six core functions that are easy to learn. You're right: if you only teach people base R, they will be fishing more. Fishing for the right solution, that is.

matloff (Owner) commented Jul 15, 2019 via email
