Known limitations of KDTree? #44
Mhmm, I haven't seen this error before. Unfortunately it is a rather cryptic one and That said there are the Could you try the same thing you are doing in Swift 4? Maybe what you discovered is a general bug in Swift 5, since it's not stable yet?
Hi, thanks for your reply. Looks like the error is not in the crypto library, that was just incidental. The problem appears to be that the program runs out of memory when constructing the KDTree instance. I ran my example program in Xcode 10.0 + Swift 4.2 and after loading the data (about 14,000 items with 34 dimensions each) the program was using 60 MB. After entering KDTree.init, memory use increased greatly, and Xcode stopped the program when it reached about 1000 MB. At the moment it stopped, KDTree.init was in partitionLomuto, at the place where randomIndex is created -- the random number generator is apparently in libcrypto, so that's the connection with the crypto stuff. It appears that the real problem is that building the KDTree requires a very large amount of memory. I wonder, what is known about how memory use varies with the number of cases and the number of dimensions in each case? Is there a strategy to limit the amount of memory used? (E.g. limiting the depth of the tree or something like that.)
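For scale, the raw coordinates of a data set this size are small. A rough sketch of the arithmetic, assuming each coordinate is stored as an 8-byte Double (the thread does not say how the custom class stores its values, so this is only an assumption):

```swift
// Back-of-the-envelope size of the raw coordinates only.
// Assumption: one Double (8 bytes) per coordinate; object headers,
// array overhead and any tree nodes are NOT counted here.
let cases = 14_000
let dimensions = 34
let rawBytes = cases * dimensions * MemoryLayout<Double>.size
print(rawBytes)  // 3808000 bytes, i.e. roughly 3.8 MB
```

Under that assumption the input itself is only a few megabytes, which suggests that growth toward 1000 MB comes from work done during construction rather than from the data being stored.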
That is very interesting indeed @robert-dodier, I haven't tested it that much with Would you be able to distill your example into a small test case that we could add as a new test file to the test suite? Then we can play around with the memory usage. @hsnetzer implemented the Also, @robert-dodier, could you run a test with the algorithm the way it was before we merged #38? That PR by @hsnetzer improved performance. In my experience, CPU usage and memory usage optimizations are often tradeoffs between each other.
Certain data sets cause very deep recursion. Maybe you have lots of duplicate points. If that’s the problem then the earlier sorting algorithm should also fail. All I can think of at the moment. Sorry this is happening in my code.
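To illustrate how duplicate values can drive recursion depth, here is a minimal, self-contained sketch. It is not the library's code: `partitionLomuto` and `randomIndex` only borrow the names mentioned in this thread, and `quickselectDepth` is a hypothetical helper that assumes the median is found by a recursive quickselect over one coordinate.

```swift
// Hypothetical sketch, NOT the library's implementation.

/// Lomuto partition of a[lo...hi] around a randomly chosen pivot.
/// Values strictly less than the pivot are moved to its left.
/// Returns the final index of the pivot.
func partitionLomuto(_ a: inout [Double], _ lo: Int, _ hi: Int) -> Int {
    let randomIndex = Int.random(in: lo...hi)
    a.swapAt(randomIndex, hi)
    let pivot = a[hi]
    var store = lo
    for i in lo..<hi where a[i] < pivot {
        a.swapAt(i, store)
        store += 1
    }
    a.swapAt(store, hi)
    return store
}

/// Recursive quickselect for the k-th smallest value.
/// Returns the recursion depth reached instead of the value itself.
func quickselectDepth(_ a: inout [Double], k: Int, lo: Int, hi: Int, depth: Int = 1) -> Int {
    if lo >= hi { return depth }
    let p = partitionLomuto(&a, lo, hi)
    if k == p { return depth }
    return k < p
        ? quickselectDepth(&a, k: k, lo: lo, hi: p - 1, depth: depth + 1)
        : quickselectDepth(&a, k: k, lo: p + 1, hi: hi, depth: depth + 1)
}

let n = 1_000
var duplicates = [Double](repeating: 0.0, count: n)           // e.g. an all-zero column
var distinct = (0..<n).map { _ in Double.random(in: 0..<1) }  // random column

let dupDepth = quickselectDepth(&duplicates, k: n / 2, lo: 0, hi: n - 1)
let distinctDepth = quickselectDepth(&distinct, k: n / 2, lo: 0, hi: n - 1)
print("all-equal column depth:", dupDepth)   // n / 2 + 1
print("random column depth:", distinctDepth) // varies from run to run
```

When every value equals the pivot, nothing is strictly smaller, so the pivot always lands at the left edge and each call shrinks the range by only one element. The depth then grows linearly with the number of points, whereas distinct random values typically need far fewer levels.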
Hi, thanks for your replies. I will try to post a condensed version of the problem using some made-up data. The data are a little funny in the sense that for many vectors, there are a number of elements in the vector which are equal to 0 -- it would be interesting to know if that changes things. Or perhaps it's just the number of dimensions (i.e., 34). It might be a couple of days before I can return to this topic.
I've attached 3 files that might help with testing the KDTree code to see what is going on. It seems like there are some different possibilities: maybe the number of cases + dimensions is the problem? Maybe the large number of identical (zero) values? Maybe the small number of distinct nonzero values? I've created these data sets to embody these different characteristics. Unfortunately I can't devote more time to these questions right now, I might be able to come back to it next month. Perhaps this is useful in some way all the same.
generated_different.csv: same number of cases, same number of dimensions, same min and max values, random values between min and max
generated_similar.csv: same number of cases, same number of dimensions, same min and max values, same proportion of 0 values, random values between min and max
generated_very_similar.csv: same number of cases, same number of dimensions, same min and max values, same proportion of 0 values, random values chosen from the distinct values of each dimension
generated_different.csv.gz
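For anyone who wants to reproduce data of this general shape without the attachments, a rough sketch follows. The function names, the zero fraction of 0.5 and the value range are all made up for illustration; the real proportions and min/max values are not stated in this thread.

```swift
// Hypothetical generator, loosely modelled on the "generated_similar.csv" description.
// Assumptions: zeroFraction and range are invented, not taken from the real data.

/// Generates `count` vectors with `dimensions` values each. Every value is 0
/// with probability `zeroFraction`, otherwise uniform in `range`.
func makeSparseVectors(count: Int, dimensions: Int,
                       zeroFraction: Double, range: ClosedRange<Double>) -> [[Double]] {
    return (0..<count).map { _ in
        (0..<dimensions).map { _ in
            Double.random(in: 0..<1) < zeroFraction ? 0.0 : Double.random(in: range)
        }
    }
}

/// Fraction of vectors whose value in `dimension` is exactly zero.
func zeroShare(of vectors: [[Double]], dimension: Int) -> Double {
    let zeros = vectors.filter { $0[dimension] == 0.0 }.count
    return Double(zeros) / Double(vectors.count)
}

let data = makeSparseVectors(count: 14_000, dimensions: 34,
                             zeroFraction: 0.5, range: 1.0...100.0)
print(data.count, data[0].count)            // 14000 34
print(zeroShare(of: data, dimension: 0))    // close to 0.5, varies per run
```

Checking the zero share per dimension gives a quick sense of how many points tie on a given axis, which is the characteristic the second and third files are meant to isolate.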
Hi, I'm working with about 13,000 points in 34 dimensions. I'm loading the data from a file and then creating a KDTree via
KDTree(values: a)
where a is my array of 34-element vectors (instances of a class which I created).
Is there any reason to think that shouldn't work? I am getting the following after reading the data and then trying to construct the KDTree:
It's not clear to me what's going on -- I assume that it's an out of memory error? or something else? (How would I figure out what kind of error it is?)
What is the most expensive part of constructing a KDTree -- where is the bottleneck? Does KDTree create any auxiliary indices or other data structures which might be large?
Thanks in advance for any insights you can offer.
Robert Dodier
PS. I am working in the Swift repl with Swift 5.0 (current snapshot build). I disabled printing out the values of items declared via
:set set print-decls false
(so it isn't the case that the KDTree gets constructed and then the repl fails while trying to print a representation of it).