Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
first attempt to support awkward arrays #647
first attempt to support awkward arrays #647
Changes from 83 commits
5604eac
7dbe908
c0bbf5a
0281324
624a529
05c6c75
3d359de
9bf0cb9
974040c
13c4d59
74ae9e3
1d0e629
aeba549
88a5c83
15b3d1a
7e6beaa
77d5b6c
4aa3d26
3670d8b
72977c8
06032b2
1704aa7
ccc28c2
e524389
bd2f28d
a928198
fee56ee
0775e53
4b89a9b
9d56157
d14de3e
4d62e7e
c4c1b3f
339bce8
012de5e
7884598
e16ae35
0bced2f
588b6af
e687e19
41b1423
fa8a386
733937a
ed532a2
e5706c3
fceab1b
ef0637a
06608e9
3c46363
285e3b3
1dc93a6
7fa65dd
32c44cf
5e1c1da
08154f7
71f0471
2e66409
6cdcaa0
741af1c
c027669
7f7ebb6
771b2ab
4922603
c5c5335
8e7a725
c3ccf2f
a8e1648
c9a6417
d836999
ed95d8f
f7edc67
5a1d056
83effad
21a4b5f
4ff5851
2c59b19
988579e
504cae1
4dc0826
3ab5646
371f66e
d2eaf66
7b57167
b3678b6
d523c89
222998d
3fc9817
6a6657b
7ac4a0c
c016725
0340151
02365a6
c20cc31
8421ee6
2d024f1
ddefdcf
50a8dc3
fe27b74
2aed5b6
746ba7d
a589820
c26db5b
cd1a451
9b2ff61
7637fe3
52a804a
75e7526
d3d1d26
77e3953
536f729
cfe200e
d6d35bd
cf4ad03
46d553f
e8eeb54
5ab0708
5b39691
45a9958
94aa4ef
99853d5
cd2abdd
96bfe31
4a6d119
4243ccc
5ad915a
07246cc
fb137af
6e32637
3883bb0
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do these reindexers do during merge and what happens if we don't reindex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking what
Reindexer
s do generally, or specifically what the ones defined here do?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these ones specifically.
I think they are relevant for subsetting columns if
obsm
/varm
is a data frame? I that case I think it would indeed be the right thing to do nothing for awkward arrays, as the second axis is not aligned to anything and could be of variable length.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It's for taking the intersection of the available columns, so we are just concatenating those for an inner join.
For non-labelled arrays (ndarrays and sparse) we take minimum of the column axis size.
That's where I get a little confused, since it depends on the dimensionality of the awkward array.
I think what inner join would be, would be to take the intersection of all axes apart from the one we're concatenating along.
For example, lets say we have:
For an inner join I think we take the intersection of keys from the
Record
dimension, dropping the"b"
key from the record type.For an outer join, I think I would expect an array where
b
is an optional type.However: I see awkard array makes a union type here – so I'm not too sure what's right here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would only work for a
RecordType
, what would you do for aListType
?Also, even when you are using a RecordType, unlike for a dataframe, there's no need that every record has the same columns. So I would keep it simple and just concatenate them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A record type has a fixed set of keys, while a list type is inherently unsized. I'd consider
ListTypes
equivalent, whileRecordType
s are only aligned if the keys are the same.I sorta see it as we have already defined
inner
(intersection) andouter
(union) for Record and Regular dimensions. I think they don't really have a meaning forList
dimensions, so don't do anything.This is one option. That
join
has no meaning for awkward arrays.But, I think there is quite a lot of usefulness to the
inner
join. We are dropping fields which aren't represented for all samples. Do you actually want to always keep those around?I think it also becomes confusing in the case where one of the dimensions is Regular. Now we can have different
Regular
dimensions per row. What does this mean for concatenating values inX
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit afraid that this ends up being a rabbit hole, as the arrays can be arbitrarily nested. If you have a record type that's embedded in a variable-length list type in the 6th dimension, would you still want to attempt an inner/outer join? There might also be nasty
UnionType
combinations (technically it is possible to have both records and lists in the sameListType
).Could we get away with not performing a merge for now, and implement a merging strategy later (if requested) in a backwards-compatible manner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how we could do this in a backwards compatible way, since the current logic isn't a subset of the intersection. If you want to punt on this I think we can say awkward array support is experimental and the concatenation logic will change in the future.
I don't think the arbitrarily nested types would be too difficult to work with. We could just iterate through the levels applying the same logic at each level.
UnionType
s however, I'm not so sure how to deal with.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we go with
for now.
We can create an issue to track adding support for inner joins of awkward arrays (or at least various special cases thereof, such as RecordTypes in the second dimension. Since we decided on releasing this as experimental, I wouldn't worry to much about backwards compatibility.