Dynamic datapoint dimensions (fixed per tree) #53

AlexDM0 · 2020-07-10T15:23:12Z

Hi there! We'd like to use your KDTree classifier for indoor localisation in a smart home. I liked your talk so I was looking into experimenting with this lib.

Our datavectors (per house for instance) have X dimension, but another house may have Y dimensions. The user is in control of adding houses, and he determines how many dimensions a vector has (by having more/less beacons).

We want to generate a tree for each house and the issue then becomes the static let dimensions. We can set the dimensions of all datapoints by Element.dimensions = X. We could keep a number of types beforehand, but keeping this artificial limit on the amount of houses seems a bit ugly. I can't see a way to have a "type-factory" that generates these structs with custom dimensions either.

I could clone your lib and change the 3 instances where you take it from Element and instead take it from the provided element itself. This makes it of course the responsibility of the user to ensure all datapoints have the same dimensions.

What's your take on this?

Bersaelor · 2020-07-10T15:29:14Z

First , thanks for the interest, it's so great to hear when people are actually using the code!

Mhmm, good question about the dimensions though.
So this would make
static var dimensions: Int { get }
->
var dimensions: Int { get }
an instance variable and shouldn't change too many things but opens the issue of potential size mismatches at runtime as you said.

I mean, you can fork the project, apply your proposed changes and then make a Pull Request.

Then we can see how many changes are necessary and what the impact on the codebase and on the potential performance are?

AlexDM0 · 2020-07-10T16:06:23Z

I'd be happy to make a pull request, but I'm not exactly sure how to properly do it while protecting the user from not putting different dimensions into it.

I'm still pretty into OO design, ( i know..), so from that point of view, hey a tree should just know it's own dimensions. But enums don't do that.

The init can take the first element and use the dimension of that one, we could throw error on init without values, we should loop over all values to check dimensions and throw error on difference. Or we provide a dimensions value in the init, but we can't store it nor check for it in future operations.

The insert and remove operations can also get the dimension from the provided value, or get it as an extra argument.

From what I see, that's it. I don't know how to make it so that we constantly guard the user against user error. I'd do it myself, for my purposes, I'll embrace the terrible crash if I mess up :). That's not friendly for most users though and shouldn't make it into the core.

It will also break everyones code, since the base datapoint protocol needs changing.

If you know of a good way around this I'm open to building it (except full rewrites :p)

EDIT: removed unfinished sentence.

Bersaelor · 2020-07-10T16:23:05Z

Oh well, I'd say try to open a PR, to see that tests succeed and all and we go from there :)

This way we can discuss the individiual changes in the code much better.

(each node in the tree has an element, so if you add a new node, we can check f the new node has the same dimension, so it shouldn't be to hard to make a good error message).

AlexDM0 · 2020-07-10T17:03:07Z

I have made a pull request without breaking anything, any thrown errors will make breaking changes. It is not as friendly as could be, but at least it is fully optional.

Bersaelor · 2020-07-10T19:35:05Z

Mhmm, it looks like the automatic Unit tests aren't running through.
I'll probable have to look into that. Unfortunately rather busy this weekend and afk, give me a few days and I'll look into this.

AlexDM0 · 2020-07-10T20:42:39Z

Sure, no worries. I’ll just continue building my integration after the weekend and we can see how (and if) you want to integrate the result.

Bersaelor · 2020-07-14T13:50:10Z

Just for understanding, @AlexDM0 , in your use-case, the number of dimensions is very diferrent per house, it's not just that all houses fall into 2D, 3D and 4D categories, or s similarly limited list?

AlexDM0 · 2020-07-14T13:56:34Z

A house contains a number of broadcasting beacons, which we sell (called Crownstones). Each user can have a different amount of beacons in his house. A user can be invited to join the houses of friends or an office, which can also contain a different number of beacons.

I have 35 Crownstones in my house, so that is 35 dimensions. Another user may have 4, or 10. The more you have, the more accurate the room level localisation is.

We have testing houses with 50+ Crownstones. We could limit the amount of houses to say 20, and keep 20 instances of datapoint definitions, where we can set the dimensions at runtime, but that's a little dirty :)

Bersaelor · 2020-07-14T14:07:03Z

Mhmm, yes, I see.

So each beacon can then have a Float-value, and a datapoint that you read has a float-distance for each beacon?
like

struct Beacon {
    // ... 
    func value() -> Float { .. }
}

let beacons: [Beacon] = [ ... ] // n beacons
var dataPoint = beacons.map { beacon in
    return beacon.value()
}
// dataPoint is now [Float] with size n

And now for each house you have a Set of Float-vectors where each vector has the same size for that house&app-install?

dataPoints: Set([Float])

Bersaelor · 2020-07-14T14:11:45Z

I'm wondering whether this is actually a rather frequent use case and whether we should build it more clearly into the type KDTreePoint by making dimensions an instance variable.. got to ponder the implications for a bit.

I know, the dimensionsoverride works, but it feels less clean.

AlexDM0 · 2020-07-14T14:14:13Z

Int value actually, they are signal strength indicators. I'll first construct a sorted list from all the collected datapoints. So index 0 is beacon X index 1, beacon Y etc. This array goes into each datapoint (with values for missing data) so all have the same dimensions. Each data point has a room label, and this sorted list. Every incoming dataset will be mapped to this sorted array, and queried in the KD tree.

I'll use the distance function with one we have here to get the distance estimate between points.

At least, that's what I had in mind so far. It's an experiment :). We have a number of difference classifiers and I wanted to add this method as a test.

It is indeed an ugly solution, but if we'd do a cleaner version, it would break everyones code since they use the current datapoint protocol.

I wouldnt blame you if you wouldnt merge this :) It is for a usecase that you not really designed for and thats perfectly fine.

Bersaelor · 2020-07-14T14:18:50Z

It might be that all people have to do is to delete the word static in front of static var dimensions = X, and the rest of the code should just work. Would be a breaking change, but it would increase the number of use-cases KDTree could be applied to.

Will look into it later today, I think there is also a way of annotating the code, so Xcode can make the right suggestion how people can fix their code when the load a new version of the repo.

AlexDM0 · 2020-07-14T14:20:30Z

Except you lose the guarantee that all datapoints in the tree have the same dimension. This is an important point for the tree.

AlexDM0 · 2020-07-14T14:23:57Z

You could have a class which gets the dimensions on initialisation, which can wrap your methods, do checks and error reporting and use the dimensionOverride internally. That way users won't see it, its safe, but its not functional anymore... pick your poison :)

Bersaelor · 2020-07-14T14:24:07Z

I think I can just create a warning in the inserting method. Also, I mean, if you use the overrideDimensions variable wrongly, you'll also get into trouble.

AlexDM0 · 2020-07-14T14:25:19Z

Yes, but you can document that with a warning. If you use this, its up to you to have your stuff in order. A sort of advanced usage. Most users likely wont need it anyway and they're safe due to the static dimensions.

AlexDM0 mentioned this issue Jul 10, 2020

added proposal for dimensionOverride #54

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dynamic datapoint dimensions (fixed per tree) #53

Dynamic datapoint dimensions (fixed per tree) #53

AlexDM0 commented Jul 10, 2020

Bersaelor commented Jul 10, 2020 •

edited

Loading

AlexDM0 commented Jul 10, 2020 •

edited

Loading

Bersaelor commented Jul 10, 2020 •

edited

Loading

AlexDM0 commented Jul 10, 2020

Bersaelor commented Jul 10, 2020

AlexDM0 commented Jul 10, 2020 via email •

edited

Loading

Bersaelor commented Jul 14, 2020

AlexDM0 commented Jul 14, 2020

Bersaelor commented Jul 14, 2020 •

edited

Loading

Bersaelor commented Jul 14, 2020

AlexDM0 commented Jul 14, 2020

Bersaelor commented Jul 14, 2020 •

edited

Loading

AlexDM0 commented Jul 14, 2020

AlexDM0 commented Jul 14, 2020

Bersaelor commented Jul 14, 2020

AlexDM0 commented Jul 14, 2020

Dynamic datapoint dimensions (fixed per tree) #53

Dynamic datapoint dimensions (fixed per tree) #53

Comments

AlexDM0 commented Jul 10, 2020

Bersaelor commented Jul 10, 2020 • edited Loading

AlexDM0 commented Jul 10, 2020 • edited Loading

Bersaelor commented Jul 10, 2020 • edited Loading

AlexDM0 commented Jul 10, 2020

Bersaelor commented Jul 10, 2020

AlexDM0 commented Jul 10, 2020 via email • edited Loading

Bersaelor commented Jul 14, 2020

AlexDM0 commented Jul 14, 2020

Bersaelor commented Jul 14, 2020 • edited Loading

Bersaelor commented Jul 14, 2020

AlexDM0 commented Jul 14, 2020

Bersaelor commented Jul 14, 2020 • edited Loading

AlexDM0 commented Jul 14, 2020

AlexDM0 commented Jul 14, 2020

Bersaelor commented Jul 14, 2020

AlexDM0 commented Jul 14, 2020

Bersaelor commented Jul 10, 2020 •

edited

Loading

AlexDM0 commented Jul 10, 2020 •

edited

Loading

Bersaelor commented Jul 10, 2020 •

edited

Loading

AlexDM0 commented Jul 10, 2020 via email •

edited

Loading

Bersaelor commented Jul 14, 2020 •

edited

Loading

Bersaelor commented Jul 14, 2020 •

edited

Loading