Reconsider id being optional #2054

Open
JKRhb opened this issue Nov 1, 2024 · 13 comments
Labels
Needs discussion (more discussion is needed before getting to a solution), Privacy

Comments

@JKRhb
Member

JKRhb commented Nov 1, 2024

While the id field in a TD has been mandatory in the early phases of the specification work (see #142), it has been changed to being optional for privacy reasons (see #794 and #820) before the publication of TD version 1.0.

While I understand the rationale for this decision, it is a bit annoying from a developer's point of view that you cannot be sure that there is going to be an id present in a TD, which would be very useful for state management. You could potentially use the title or the base as a fallback mechanism, but since these do not have to be unique, this is not really ideal for keeping track of a device. In an implementation I am working on at the moment, I currently just filter out TDs that do not have an ID, and it would be nice if that wasn't necessary.
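To make the current workaround concrete, here is a minimal sketch of that filtering step. It assumes TDs have already been parsed into plain Python dicts; the helper names and the sample TDs are made up for illustration and do not come from any WoT library.

```python
from typing import Any

TD = dict[str, Any]  # assumption: a parsed Thing Description as a JSON object


def has_usable_id(td: TD) -> bool:
    """Return True if the TD carries a non-empty string `id`."""
    td_id = td.get("id")
    return isinstance(td_id, str) and td_id != ""


def filter_identified(tds: list[TD]) -> list[TD]:
    """Drop TDs that cannot be tracked because they have no `id`."""
    return [td for td in tds if has_usable_id(td)]


# Hypothetical discovery results: all three share a title, one has no id.
discovered = [
    {"id": "urn:uuid:0804d572-cce8-422a-bb7c-4412fcd56f06", "title": "Lamp"},
    {"title": "Lamp"},
    {"id": "urn:uuid:9a6f4f0e-2c1b-4a55-8f0a-3f1f6d1c2b77", "title": "Lamp"},
]

tracked = filter_identified(discovered)
titles = {td["title"] for td in discovered}
```

Here `titles` collapses to a single value, which is why the title alone cannot serve as a fallback key, and the TD without an `id` is silently lost.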

While the privacy concerns that have been brought forward to motivate the id field being optional are valid, I think there are enough assertions in place by now (for example via #825) to be able to revert this decision in TD 2.0, as the id field is not meant to be a permanent identifier anymore.

Potential User Story

As a developer, I would like to be sure that every device has an identifier present in its TD so that I can use it for state management.

Potential Use Case

In a smart home domain, a user wants to control their devices via an app and performs a discovery mechanism. In order to prevent a device from appearing in the list of discovered devices twice, it should have an identifier (that should be regenerated on reset or even on restart).

(Currently, this also reads a bit like a user story, so there is probably some refinement needed.)
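As a rough sketch of the use case, a consumer could key its list of discovered devices on the TD `id`. The class below is hypothetical and assumes every TD has an `id`, which is exactly what cannot be relied on today.

```python
from typing import Any


class DiscoveredThings:
    """Keeps at most one entry per TD `id` (hypothetical helper)."""

    def __init__(self) -> None:
        self._by_id: dict[str, dict[str, Any]] = {}

    def add(self, td: dict[str, Any]) -> bool:
        """Store the TD; return True only if its id was not seen before."""
        td_id = td.get("id")
        if not isinstance(td_id, str):
            raise ValueError("TD has no id, cannot deduplicate")
        is_new = td_id not in self._by_id
        self._by_id[td_id] = td  # keep the most recently seen TD
        return is_new

    def __len__(self) -> int:
        return len(self._by_id)


things = DiscoveredThings()
lamp = {"id": "urn:uuid:0804d572-cce8-422a-bb7c-4412fcd56f06", "title": "Lamp"}
first = things.add(lamp)         # found via one discovery mechanism
second = things.add(dict(lamp))  # the same device found a second time
```

The second `add` replaces the stored TD instead of creating a new list entry, so the device shows up once.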

@github-actions github-actions bot added the needs-triage label (automatically added to new issues; TF should triage them with proper labels) Nov 1, 2024
@egekorkan egekorkan added the Needs discussion (more discussion is needed before getting to a solution) and Privacy labels and removed the needs-triage label Nov 1, 2024
@egekorkan
Contributor

From the Discovery and TD management perspective, I agree with you. We should talk with PING before making any changes. Regarding:

it has been changed to being optional for privacy reasons (see #794 and #820) before the publication of TD version 1.0.

This is a nicer way to put it, but we reopened the CR process, and TD 1.0 had two CRs (see https://www.w3.org/TR/2019/CR-wot-thing-description-20190516/ and https://www.w3.org/TR/2019/CR-wot-thing-description-20191106/), which added a nice 6 months of delay. We should avoid such a thing from happening again :)

@lu-zero
Contributor

lu-zero commented Nov 2, 2024

Once a device is part of a network it does have a local identifier that's unique within the network, one way or another. Probably this should be considered in light of possible onboarding mechanisms (and that's yet another topic...)

I guess we can copy/link the best practices suggested regarding mac address management e.g.:

  • Trusted network -> the MAC address stays the same, DHCP can then provide a persistent IP
  • Untrusted network -> the MAC address is randomized on join to prevent tracking
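Transferred to the TD `id`, the two practices above might look like the sketch below. This is only an illustration of the idea: the function names and the notion of a "trusted" flag are assumptions, and nothing like this is defined in the TD specification.

```python
import uuid
from typing import Optional


def new_random_id() -> str:
    """Generate a fresh random identifier as a UUID URN."""
    return uuid.uuid4().urn


def choose_td_id(network_trusted: bool, stored_id: Optional[str]) -> str:
    """Reuse the stored id on a trusted network, otherwise randomize on join."""
    if network_trusted and stored_id is not None:
        return stored_id
    return new_random_id()


stored = new_random_id()
at_home = choose_td_id(network_trusted=True, stored_id=stored)
in_public = choose_td_id(network_trusted=False, stored_id=stored)
```

On the trusted network the id stays stable, while every join to an untrusted network yields a new `urn:uuid:` value.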

@egekorkan
Contributor

The issue is more about being able to track it no matter the network etc. If the Thing does not change its id (we cannot mandate that), it would be possible to track the device and its user throughout its lifecycle. Adding something like "the Thing should manage its id" was not strong enough, thus we had to remove it.

@lu-zero
Contributor

lu-zero commented Nov 2, 2024

There are use cases in which you do want to have a persistent, unique id.

There are others that would prefer to have it quasi-randomized to make it harder to track, since it could be a wearable or such.

I dare to say we have more devices of the former kind than the latter. (e.g. all the industrial and agricultural fields)

@egekorkan
Contributor

I am not arguing about whether there is a use case or whether it makes sense. It is more about making it mandatory, which can result in poor management of the id by different implementations, and that raises privacy concerns according to the PING review.

I would actually vote for making it mandatory but somehow writing enough mechanisms around it to make the device and its user protected from privacy attacks etc.

@lu-zero
Contributor

lu-zero commented Nov 4, 2024

It is an out of box interoperability issue, and that means some binding to model the behavior and a profile to pin it if we had infinite resources :/

@JKRhb
Member Author

JKRhb commented Nov 4, 2024

Maybe for TD 2.0, the description of the id could be adjusted to better reflect that it is supposed to be a non-permanent/temporary identifier. Additionally, there could be an additional example of a TD with a permanent identifier that is only accessible via a (protected) property, to make it clearer what the best practice is supposed to be here.
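A possible shape for such an example is sketched below, built as a Python dict and printed as JSON. The property name `serialNumber`, the URL, and the choice of a bearer scheme are assumptions made for illustration; the point is only that the top-level `id` is temporary while the permanent identifier sits behind a protected form.

```python
import json
import uuid

# Hypothetical TD: "serialNumber" and the URL below are made up.
td = {
    "@context": "https://www.w3.org/2022/wot/td/v1.1",
    "id": uuid.uuid4().urn,  # temporary id, e.g. regenerated on reset
    "title": "Lamp",
    "securityDefinitions": {
        "nosec_sc": {"scheme": "nosec"},
        "bearer_sc": {"scheme": "bearer"},
    },
    "security": "nosec_sc",
    "properties": {
        "serialNumber": {
            "type": "string",
            "readOnly": True,
            "forms": [
                {
                    "href": "https://lamp.example.com/properties/serialNumber",
                    # Form-level override: reading requires a bearer token.
                    "security": "bearer_sc",
                }
            ],
        }
    },
}

print(json.dumps(td, indent=2))
```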

@benfrancis
Member

benfrancis commented Nov 4, 2024

I've always thought that the Thing Description URL should be the default identifier of a Thing.

But assuming a lack of consensus on that point, I have to reluctantly agree that if there is an id member it should be mandatory.

Exactly the same problem with an optional id member exists in the W3C Web App Manifest specification, where it's causing all kinds of issues.

FWIW I don't really see the privacy issues as being a big problem. In the rare cases that it's an issue, all that's really necessary is to reset the ID on a factory reset (or equivalent) of a device.
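A toy sketch of that reset behaviour, assuming a device model that is not tied to any WoT library (the `Thing` class and its methods are hypothetical):

```python
import uuid


class Thing:
    """Toy device model used only to illustrate id regeneration."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.id = uuid.uuid4().urn

    def factory_reset(self) -> None:
        """Issue a new id so it no longer matches the one used before."""
        self.id = uuid.uuid4().urn

    def thing_description(self) -> dict:
        """Return a minimal TD fragment carrying the current id."""
        return {"id": self.id, "title": self.title}


lamp = Thing("Lamp")
before = lamp.thing_description()["id"]
lamp.factory_reset()
after = lamp.thing_description()["id"]
```

Between resets the id is stable enough for state management; after a reset, consumers see what looks like a new device.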

@danielpeintner
Contributor

In order to prevent a device from appearing in the list of discovered devices twice, it should have an identifier

I am not arguing against your arguments but I always thought we should have a canonical TD form that allows us to compare TDs. Maybe this would solve your problem also... not sure.

@JKRhb
Member Author

JKRhb commented Nov 4, 2024

In order to prevent a device from appearing in the list of discovered devices twice, it should have an identifier

I am not arguing against your arguments but I always thought we should have a canonical TD form that allows us to compare TDs. Maybe this would solve your problem also... not sure.

Yeah, I also had to think about that :) However, there might also be the case where a Thing alters its TD for some reason (maybe a certain feature got activated, leading to the inclusion of an additional property). In that case, the Thing might keep the same ID, while the TD comparison would yield a different result, so this solution does not work as a fallback in all cases.
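A small sketch of that failure mode follows. It hashes a key-sorted JSON serialization as a stand-in for a canonical form, which is an assumption made here for brevity and not the canonicalization any specification defines. The sample TDs are made up.

```python
import hashlib
import json
from typing import Any


def td_fingerprint(td: dict[str, Any]) -> str:
    """Hash a key-sorted JSON serialization (stand-in for a canonical form)."""
    serialized = json.dumps(td, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


before = {
    "id": "urn:uuid:0804d572-cce8-422a-bb7c-4412fcd56f06",
    "title": "Lamp",
    "properties": {"status": {"type": "string"}},
}
# The same Thing after a feature was activated: one more property, same id.
after = {
    **before,
    "properties": {**before["properties"], "brightness": {"type": "integer"}},
}

same_id = before["id"] == after["id"]
same_fingerprint = td_fingerprint(before) == td_fingerprint(after)
```

The id still matches while the fingerprints differ, so a comparison of whole TDs would treat the updated Thing as a different device.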

@relu91
Member

relu91 commented Nov 5, 2024

I don't have an answer but I see why having an id makes life easier for quite a few use cases. Another one that has not been mentioned is Schema caching. As explained in this long node-wot issue (particularly this comment) it would be helpful to have an id defined so that we could cache all the processed JSON Schemas for future interactions. It is probably a corner case, but anyhow it exists.
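To illustrate the caching idea in general terms (this is not how node-wot does it; the class and the `fake_process` stand-in are hypothetical), processed schemas could be keyed on the TD `id` plus the property name:

```python
from typing import Any, Callable


class SchemaCache:
    """Caches processed schemas per (TD id, property name); hypothetical."""

    def __init__(self, process: Callable[[dict[str, Any]], Any]) -> None:
        self._process = process
        self._cache: dict[tuple[str, str], Any] = {}

    def get(self, td: dict[str, Any], prop: str) -> Any:
        schema = td["properties"][prop]
        td_id = td.get("id")
        if td_id is None:
            return self._process(schema)  # anonymous TD: nothing to key on
        key = (td_id, prop)
        if key not in self._cache:
            self._cache[key] = self._process(schema)
        return self._cache[key]


calls: list[dict[str, Any]] = []


def fake_process(schema: dict[str, Any]) -> str:
    """Stand-in for expensive schema processing; records each invocation."""
    calls.append(schema)
    return "processed:" + schema["type"]


cache = SchemaCache(fake_process)
named = {
    "id": "urn:uuid:0804d572-cce8-422a-bb7c-4412fcd56f06",
    "properties": {"status": {"type": "string"}},
}
anonymous = {"properties": {"status": {"type": "string"}}}

cache.get(named, "status")
cache.get(named, "status")      # served from the cache
calls_for_named = len(calls)
cache.get(anonymous, "status")
cache.get(anonymous, "status")  # processed again each time
calls_total = len(calls)
```

A real cache would also need invalidation when a Thing changes its TD while keeping its id; that is left out of this sketch.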

Also, as one of the implementers of the Discovery spec in Zion, I found the handling of anonymous TDs pretty cumbersome. For example, every time you need to return a TD (even when you list them) you have to postprocess them using this function. Again not a big deal, but still inconvenient.

@hspaay

hspaay commented Nov 16, 2024

Just adding to this sentiment. hiveot is a hub for digital twins. You can't have a digital twin without an identifier. Therefore hiveot cannot support TDs without an ID.

Maybe this is the wrong solution for the 'privacy' argument. Plenty of devices generate a unique ID on a device factory reset. This is a better solution IMHO.

@egekorkan
Contributor

I think there is an overwhelming number of requests to make the id mandatory. Given that this is a rather delicate point to bring back, I propose a discussion in the TD meeting after WoT Week. If we all agree, I would involve PING before making any changes. I even foresee some WG resolution to put this into the spec, as messing this up would again risk breaking the review period schedule when the REC publication process starts.


7 participants