Batch k-NN #253
It looks good to me. For now it is a private module, so we don't have to sweat too much about the name. Does this function have any relationship with find_neighbour? If so, maybe we call one
I would rename scholar/lib/scholar/neighbors/utils.ex Line 13 in 0d1bcc1
There is a function for handling multiple distances that are in
distances = distance_fn.(data, leftover)
indices = Nx.argsort(distances, axis: 1, type: :u64) |> Nx.slice_along_axis(0, k, axis: 1)
distances = Nx.take_along_axis(distances, indices, axis: 1)
These lines are shared with the main part of the search in the while loop. Maybe it is worth moving them into a separate function, but it's up to you since it's just 3 lines.
I'm considering it. The code still needs some polishing.
I am also thinking of abstracting the whole thing as a kind of map over batches. It might be useful for Task 1 of #246 as well.
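For illustration, the three lines discussed in this thread could be pulled into a small helper along the following lines. This is only a sketch: the module name TopKSketch and the function top_k/2 are hypothetical and are not part of Scholar, and the code runs eagerly with plain def rather than inside defn.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule TopKSketch do
  # Hypothetical helper (not Scholar's API): for each row of `distances`,
  # return the k smallest values together with their column indices.
  def top_k(distances, k) do
    indices =
      distances
      |> Nx.argsort(axis: 1, type: :u64)
      |> Nx.slice_along_axis(0, k, axis: 1)

    # Gather the distances that correspond to the selected indices.
    {Nx.take_along_axis(distances, indices, axis: 1), indices}
  end
end

# Two query rows, three candidate points each.
distances = Nx.tensor([[3.0, 1.0, 2.0], [0.5, 4.0, 0.1]])
{nearest, indices} = TopKSketch.top_k(distances, 2)
```

Both the batched loop body and the leftover branch could then call the same helper, assuming they produce a distance matrix with one row per query point.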
{
data,
batches,
i = 0
- i = 0
+ i = Nx.u64(0)
{query_size, dim} = Nx.shape(query)
num_batches = div(query_size, batch_size)
leftover_size = rem(query_size, batch_size)
This part might also need to be moved inside Scholar.Shared as a function that takes a tensor and batch size and returns {batches, leftover}, where batches is a tensor of shape {num_batches, batch_size, dim} and leftover is a tensor of shape {leftover_size, dim}.
Might be relevant for #246, task 1.
Yes, definitely. In Python, there is even a function divmod that does that
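A minimal sketch of the splitting function proposed in this thread might look like the following. The names BatchSketch and to_batches/2 are hypothetical, and returning nil for an empty part is an assumption made here because Nx tensors cannot have a zero-sized dimension; the real helper may handle that case differently.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule BatchSketch do
  # Hypothetical helper (not part of Scholar.Shared): split a {size, dim}
  # tensor into full batches plus the rows that do not fill a batch.
  def to_batches(tensor, batch_size) do
    {size, dim} = Nx.shape(tensor)
    num_batches = div(size, batch_size)
    leftover_size = rem(size, batch_size)
    full = num_batches * batch_size

    # Assumption: nil stands in for an empty part, since Nx does not
    # allow zero-sized dimensions.
    batches =
      if num_batches > 0 do
        tensor
        |> Nx.slice_along_axis(0, full, axis: 0)
        |> Nx.reshape({num_batches, batch_size, dim})
      end

    leftover =
      if leftover_size > 0 do
        Nx.slice_along_axis(tensor, full, leftover_size, axis: 0)
      end

    {batches, leftover}
  end
end

# Seven query points of dimension 2, split into batches of 3.
query = Nx.iota({7, 2})
{batches, leftover} = BatchSketch.to_batches(query, 3)
```

With these inputs, batches has shape {2, 3, 2} and leftover has shape {1, 2}, matching the shapes described in the comment above.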
Closing this in favor of #257.
I am honestly not sure where this should be implemented. Right now I added linear_search (a better name might be needed, e.g. brute_force_search) inside Scholar.Neighbors.Utils. This way it can be used inside other modules such as Scholar.Neighbors.KNearestNeighbors or dimensionality reduction algorithms (t-SNE, Trimap, PacMAP).
The function itself should be documented and tested. More distances are needed. I am just submitting a draft to get some feedback.
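To make the idea concrete, an unbatched brute-force search could be sketched as below. This is not the code from the PR: the module BruteForceSketch, the function search/3, and the choice of squared Euclidean distance are all assumptions made for illustration, and the whole distance matrix is built at once, which is exactly the memory cost that batching is meant to avoid.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule BruteForceSketch do
  # Hypothetical unbatched search (not Scholar's linear_search).
  # data: {num_points, dim}, query: {num_queries, dim}.
  def search(data, query, k) do
    # Pairwise squared Euclidean distances via broadcasting,
    # shape {num_queries, num_points}.
    diff = Nx.subtract(Nx.new_axis(query, 1), Nx.new_axis(data, 0))
    distances = diff |> Nx.pow(2) |> Nx.sum(axes: [2])

    # Keep the k closest points for every query row.
    indices =
      distances
      |> Nx.argsort(axis: 1)
      |> Nx.slice_along_axis(0, k, axis: 1)

    {indices, Nx.take_along_axis(distances, indices, axis: 1)}
  end
end

data = Nx.tensor([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
query = Nx.tensor([[0.9, 0.1]])
{indices, distances} = BruteForceSketch.search(data, query, 2)
```

Here the query point is closest to the second data point and then to the first, so indices selects those two rows in that order.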
Closes #239.