Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additions and changes to initial part of Group-by section #115

Merged
merged 4 commits into from
Oct 10, 2023

Conversation

mars0i
Copy link
Member

@mars0i mars0i commented Oct 8, 2023

Added paragraph explaining that most operations apply to sub-datasets, and gave example to illustrate the idea.

Moved up two lines that relate to this point.

Added explanation of effect of grouping parameters on selection of rows for sub-datasets. I couldn't figure out how to describe the map from group names to indexes in the same way, so I made this item last, and gave it a different explanation.

Fixed one minor typo in first line of Group-by section (added "s" to "pack").

Fixed minor typos in line about grouped? meta tag.

Added paragraph explaining that most operations apply to
sub-datasets, and gave example to illustrate the idea.

Moved up two lines that relate to this point.

Added explanation of effect of grouping parameters on selection of rows
for sub-datasets.  I couldn't figure out how to describe the map from
group names to indexes in the same way, so I made this item last, and
gave it a different explanation.

Fixed one minor typo in first line of Group-by section (added "s"
to "pack").

Fixed minor typos in line about `grouped?` meta tag.

Additions, changes to beginning of Group-by section

Added paragraph explaining that most operations apply to
sub-datasets, and gave examples to illustrate the idea.

Moved up two lines that relate to this point.

Added explanation of effect of grouping parameters on selection
of rows for sub-datasets.  I couldn't figure out how to describe
the map from group names to indexes in the same way, so I made
this item last, and gave it a different explanation.

Fixed small typo in first line of Group-by section (added "s" to
"pack").

Fixed some typos in the line about the :grouped? meta tag.
@mars0i
Copy link
Member Author

mars0i commented Oct 8, 2023

The commit message contains the same text twice. Sorry about that. I'm sure there's a way to fix it, but I couldn't figure it out, and it didn't seem worth a lot of time.

@genmeblog
Copy link
Member

Cool! Thanks. I'll read it tomorrow (midnight here) and let you know if it's ok.

@mars0i
Copy link
Member Author

mars0i commented Oct 9, 2023

Great. Also, I was thinking of suggesting additions to the group-by docstring, analogous to what I did TMD's group-by (techascent/tech.ml.dataset#375), but the docstring for TC's group-by would be different, because its behavior is more complex. I thought I'd wait and see what you thought of the index.Rmd changes. I can submit a separate PR for the docstring if that seems worthwhile.

docs/index.Rmd Outdated
In the case of the first three of these methods, each sub-dataset contains all and only rows from the original data set that share the same grouping value:

* the value of the row in a specified single column
* the sequence of resulting values for a sequence of column names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the result is a map not a sequence.

(tc/group-by DS [:V1])
;; => _unnamed [2 3]:
;;    |   :name | :group-id |                 :data |
;;    |---------|----------:|-----------------------|
;;    | {:V1 1} |         0 | Group: {:V1 1} [5 4]: |
;;    | {:V1 2} |         1 | Group: {:V1 2} [4 4]: |

Copy link
Member Author

@mars0i mars0i Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't realize that. I'll fix it. (I mixed up that case with an example in which I used a function that returned a sequence, which is of course different.)

docs/index.Rmd Outdated
* the sequence of resulting values for a sequence of column names
* the value returned by the function taking row as map

In the case of the map from group names to sequences of indexes, each sub-dataset will contain all and only rows with the indexes in one value sequence of indexes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds incomprehensible a little bit to me:

[...] with the indexes in one value sequence of indexes

Maybe something like that?

[...] with the indexes listed in the sequence for a given group name (a key).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. :-) I like your revision and will make the change.

(I had trouble figuring out a way to spell out grouping by maps to values
in a way that was brief but didn't make my eyes glaze over unless I kept
in mind what I was talking about.  I don't want to do that to a reader.
I think the current version is OK.)
Copy link
Member

@genmeblog genmeblog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@genmeblog
Copy link
Member

Do you want to work more on this topic? Or should I merge and deploy?

@mars0i
Copy link
Member Author

mars0i commented Oct 9, 2023

Thanks @genmeblog. I'm fine with you merging it. (I saw some typos in a keyword name in another part of the doc, but they're completely unrelated, so I think I should submit a different PR for that.)

@mars0i
Copy link
Member Author

mars0i commented Oct 9, 2023

I decided it was silly to leave the typos unfixed. There were two instances of :ungroup in the text that should be :ungroup?. That's now corrected. It concerns aggregate, but applied to grouped datasets, so it's related to the other parts of this PR. I hope you don't mind--I can roll it back if you want.

@genmeblog
Copy link
Member

Great, thanks! I don't mind, of course. We can do doc updates in batches. It doesn't matter if they are related to one topic or many. The most important is to have it better and better.

docs/index.Rmd Outdated
@@ -1680,7 +1704,7 @@ Aggregator is a function or sequence or map of functions which accept dataset as

Where map is given as an input or result, keys are treated as column names.

Grouped dataset is ungrouped after aggreation. This can be turned off by setting `:ungroup` to false. In case you want to pass additional ungrouping parameters add them to the options.
Grouped dataset is ungrouped after aggreation. This can be turned off by setting `:ungroup?` to false. In case you want to pass additional ungrouping parameters add them to the options.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found one more typo. aggreation -> aggregation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well bundle that into this PR, so I made the change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I didn't read this text carefully! I tried to experiment with ungrouping with aggregate, and it didn't work with :ungroup; then I noticed :ungroup? in the code example.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably there are more such unfortunate things in the text... :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, can't be helped.

(Do you need me to squash the commits at this point?)

@genmeblog
Copy link
Member

No need to squash.
I will be merging today and releasing new version (there are some other changes waiting).

@genmeblog genmeblog merged commit 515da95 into scicloj:master Oct 10, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants