
[FEATURE REQUEST] add indexing via array #607

Closed
Derfies opened this issue May 3, 2023 · 13 comments · May be fixed by #651
Labels
enhancement New feature or request

Comments

@Derfies
Contributor

Derfies commented May 3, 2023

I would like the ability to use one ndarray to index into another one, aka “Advanced indexing” (https://numpy.org/doc/stable/user/basics.indexing.html)

e.g.:

>>> x = np.arange(10, 1, -1)
>>> x
array([10, 9, 8, 7, 6, 5, 4, 3, 2])
>>> x[np.array([3, 3, 1, 8])]
array([7, 7, 9, 2])

Derfies added the enhancement label May 3, 2023
@v923z
Owner

v923z commented May 3, 2023

Would one-dimensional indexing suffice? You could still index higher-dimensional arrays; I am asking about the dimension of the indexing array.

@Derfies
Contributor Author

Derfies commented May 3, 2023

My specific use case is essentially as above: a simple one-to-one lookup table. So, short answer: yes, absolutely!

@hamza-712

Hi, can you please implement indexing for 1D arrays?

@s-ol
Contributor

s-ol commented Dec 8, 2023

I would also find it really useful to be able to index with arrays, in my case to reorder elements. To keep things simple, this could/should perhaps be limited to uint*-valued index arrays. 1D would be a good start, I guess.

@s-ol
Contributor

s-ol commented Dec 8, 2023

Here's a diff that implements indexing by u/int* valued arrays. The indexing array can be multi-dimensional but the indexed-array needs to be linear. Note that there is also no range-checking whatsoever.

Instead of "wasting" cycles by checking for errors in the tight loop, maybe the indices could be taken mod the length of the indexed array, with negative indices being counted from the back. That way it at least constitutes a "feature".

As a library user, I'd use an "unsafe" version if I could for more performance, but I'm not sure if that is aligned with the spirit of this library.

index.patch
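For illustration, here is a rough Python sketch of the "indices taken mod the length" idea from the previous paragraph; the function name is hypothetical, and this is not what index.patch implements (the patch does no range handling at all):

def take_wrapped(source, indices):
    # source: 1D sequence; indices: flat list of (possibly negative) integers
    n = len(source)
    # every index is reduced mod n, so out-of-range values wrap around and
    # negative values count from the back instead of raising an error
    return [source[i % n] for i in indices]

x = [10, 9, 8, 7, 6, 5, 4, 3, 2]
print(take_wrapped(x, [3, 3, 1, 8]))   # [7, 7, 9, 2]
print(take_wrapped(x, [-1, 12]))       # [2, 7] -- wrapped, no IndexError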

@v923z
Owner

v923z commented Dec 8, 2023

Thanks for taking a stab at it!

Here's a diff that implements indexing by u/int* valued arrays. The indexing array can be multi-dimensional but the indexed-array needs to be linear. Note that there is also no range-checking whatsoever.

I think there should be some sort of range checking, and it should happen at the very beginning. There is no point in reserving space for an array, if it turns out that the array cannot be filled.

Instead of "wasting" cycles by checking for errors in the tight loop, maybe the indices could be taken mod the length of the indexed array, with negative indices being counted from the back. That way it at least constitutes a "feature".

numpy is quite clear on what an index means, and we shouldn't break with that.

As a library user, I'd use an "unsafe" version if I could for more performance, but I'm not sure if that is aligned with the spirit of this library.

I would be OK with partial features, e.g., implementing something for 1D arrays only, and bailing out, if you pass a 2D array, but things that lead to unexpected results should be avoided.

As you have realised, indexing by arrays is significantly harder than slicing, because it is relatively easy to inspect slices, while making sure that the values of an indexing array are within limits requires the inspection of the full array.

1D indexing with reasonable error checking would not be hard, so if you are OK with that, I could relatively easily add that. If your main concern is the time that could be lost with checking the indexing array, what could be an alternative is to implement this feature as a function, where you could pass a keyword argument, so that you could skip the error checking step. I don't think that there is such a function in numpy, so we wouldn't have to care about compatibility.
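As a rough illustration of that idea (the function name and the check keyword below are purely hypothetical, not an existing numpy or ulab API), usage might look like:

from ulab import numpy as np

x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2], dtype=np.uint8)
idx = np.array([3, 3, 1, 8], dtype=np.uint8)

# hypothetical helper: validates the indices by default,
# but lets the caller skip the check to save cycles
y = np.index_array(x, idx)                # -> array([7, 7, 9, 2], dtype=uint8)
y = np.index_array(x, idx, check=False)   # same result, no bounds checking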

@s-ol
Contributor

s-ol commented Dec 9, 2023

numpy is quite clear on what an index means, and we shouldn't break with that.

I haven't used numpy extensively, so I had to look it up. For reference, here's how 'advanced indexing' works there:

  • negative indices i are interpreted as len(array) + i
  • indices out of bounds (after adjusting for negative indices) raise an exception

That seems to gel quite well with your proposal of checking bounds up-front: a first pass could do bounds-checking, conversion of negative indices, and maybe even account for non-dense source-array strides.
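A quick check with CPython numpy confirms this behaviour (output abbreviated):

>>> import numpy as np
>>> x = np.arange(10, 1, -1)
>>> x[np.array([3, -1])]     # -1 refers to the last element
array([7, 2])
>>> x[np.array([3, 20])]     # out of bounds for an array of length 9
IndexError: index 20 is out of bounds for axis 0 with size 9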

1D indexing with reasonable error checking would not be hard, so if you are OK with that, I could relatively easily add that.

I think 1d source arrays would be a great addition, and anything else can be accomplished anyway using .flatten() on the input. It seems to me from my attempt that supporting multiple dimensions for the indexing array (and therefore the output) comes more or less "for free" so that could be included IMO.
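For example, once 1D source arrays can be indexed, a 2D source could still be handled by flattening it first (standard numpy semantics shown here):

>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> idx = np.array([0, 4, 5])
>>> a.flatten()[idx]         # index the flattened copy of the 2D array
array([1, 5, 6])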

what could be an alternative is to implement this feature as a function, where you could pass a keyword argument, so that you could skip the error checking step

I think this is mostly a decision of project direction and scope. If the goal of ulab is to be numpy compatible and enable the reuse of near-verbatim numpy code on embedded platforms, features like this are not necessary.

I myself ended up using ulab from a different perspective though. I have a need to do some calculations involving "mid-size" buffers (50-150 elements) at a comparatively high rate in CircuitPython, on a platform that has no hardware floating point support. I also want this part of my codebase to be easily extensible by users, so if possible I would like to avoid moving my code into a custom C library.

It could definitely be argued that my needs would be met better by a dedicated library that focuses on high-performance fixed-point array primitives, but there is significant overlap with the features that ulab provides today. Features like safety opt-out or the out= arguments would be very welcome for this type of application.

@v923z
Owner

v923z commented Dec 10, 2023

That seems to gel quite well with your proposal of checking bounds up-front: a first pass could do bounds-checking, conversion of negative indices, and maybe even account for non-dense source-array strides.

And that brings up the first question: since negative indices have to be converted, you can't do this without either reserving space for the converted indices, or doing the conversion twice. Which one?
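To spell out the two alternatives in Python pseudo-code (purely illustrative; the real implementation would be in C):

# Option 1: allocate a temporary buffer for the converted indices (costs RAM)
def convert_once(indices, n):
    converted = []                     # extra allocation, len(indices) items
    for i in indices:
        j = i + n if i < 0 else i      # map negative indices to positive ones
        if j < 0 or j >= n:
            raise IndexError("index out of bounds")
        converted.append(j)
    return converted                   # the fill loop then just reads source[j]

# Option 2: no buffer, but the same negative-index conversion runs twice:
# once while validating up-front, and again while filling the output array.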

1D indexing with reasonable error checking would not be hard, so if you are OK with that, I could relatively easily add that.

I think 1d source arrays would be a great addition, and anything else can be accomplished anyway using .flatten() on the input. It seems to me from my attempt that supporting multiple dimensions for the indexing array (and therefore the output) comes more or less "for free" so that could be included IMO.

Can you create a PR with this, so that we can iron out the details? I'm a bit pessimistic about the higher-dimensional cases, but I might have overlooked something.

what could be an alternative is to implement this feature as a function, where you could pass a keyword argument, so that you could skip the error checking step

I think this is mostly a decision of project direction and scope. If the goal of ulab is to be numpy compatible and enable the reuse of near-verbatim numpy code on embedded platforms, features like this are not necessary.

I'm all for numpy compatibility wherever possible, but there are multiple issues that one has to deal with on a microcontroller. Pretty much all are related to the scarcity of resources, and there were a couple of instances where we had to break with compatibility, simply because there was no reasonable way of implementing a feature within the constraints of an MCU.

I myself ended up using ulab from a different perspective though. I have a need to do some calculations involving "mid-size" buffers (50-150 elements) at a comparatively high rate in CircuitPython, on a platform that has no hardware floating point support. I also want this part of my codebase to be easily extensible by users, so if possible I would like to avoid moving my code into a custom C library.

ulab was meant to be extendable (https://micropython-ulab.readthedocs.io/en/latest/ulab-programming.html), though I've got to admit that, to my knowledge, this idea never gained traction.

It could definitely be argued that my needs would be met better by a dedicated library that focuses on high-performance fixed-point array primitives, but there is significant overlap with the features that ulab provides today.

If that's indeed the case, then why don't you consider implementing your features on top of ulab, simply as a user module?

@Derfies
Contributor Author

Derfies commented Feb 15, 2024

Have there been any further developments on this? I'm in a situation where leveraging fancy indexing, or the equivalent of np.take, would be seriously advantageous. Or is there a way to simulate this behaviour with a combination of other commands?

@v923z
Owner

v923z commented Feb 16, 2024

I would actually be in favour of an implementation of np.take, because it would be easier to configure in the pre-processing step: if you don't need it, you just disable the function altogether. It would also be easier to start with a not-so-complete implementation, and keep track of the progress.

Based on your comment in #607 (comment), np.take would be enough in most cases. I will try to come up with an implementation over the weekend. It would be relatively straightforward, and I don't foresee any major hiccups.
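For reference, numpy's np.take on a 1D array gives exactly the behaviour from the original example:

>>> x = np.arange(10, 1, -1)
>>> np.take(x, np.array([3, 3, 1, 8]))
array([7, 7, 9, 2])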

I'd like to ask for at least a minimal test suite that also covers 2D arrays.
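A minimal test along those lines might look something like the sketch below; it assumes the eventual ulab implementation follows numpy's default axis=None behaviour of indexing the flattened array (the function did not exist in ulab yet at this point):

from ulab import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)
idx = np.array([0, 4, 5, 1], dtype=np.uint8)

# with no axis given, numpy's take indexes the flattened array
result = np.take(a, idx)
print(result)    # expected elements: 1, 5, 6, 2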

@v923z
Owner

v923z commented Feb 16, 2024

@Derfies Would it be OK if we closed the current issue once #661 is completed?

@Derfies
Contributor Author

Derfies commented Mar 21, 2024

@v923z those tickets look equivalent, so yes.

Do you still need tests written for this feature?

@v923z
Owner

v923z commented Mar 23, 2024

@v923z those tickets look equivalent, so yes.

OK

Do you still need tests written for this feature?

Yes, that would be useful. I'll try to complete the implementation in the near future.

v923z closed this as completed Mar 23, 2024