Embedded references in SCOs #320

Open
SYNchroACK opened this issue Feb 26, 2024 · 4 comments

Comments

@SYNchroACK

SYNchroACK commented Feb 26, 2024

Handling SCOs with IP addresses raises an issue regarding the use of embedded references, specifically resolves_to_refs, and its implications for data integrity and reliability. The core issue arises when an SCO associated with an IP address, such as 8.8.8.8, maintains the same identifier regardless of whether it includes a resolves_to_refs attribute. This scenario suggests a fundamental discrepancy: an SCO featuring solely an IP address significantly differs from one that further delineates a relationship with, for example, a MAC address. The absence of detailed metadata, like the timestamp or the identity of the creator of this IP-to-MAC association, compounds the problem, leaving a gap in the traceability and accountability of such relationships.

Embedded references appear to fit seamlessly within SDOs and SROs, but it seems that the previous concern extends to all other SCOs with embedded references.

This distinction is crucial given the nature of SCOs as unversioned entities, a characteristic underscored in section 3.6 on Versioning of the STIX documentation. According to the guidelines, versioned STIX Objects must employ specific properties (created_by_ref, created, modified, and revoked) to facilitate proper version control. However, SCOs, by definition, do not use these versioning properties, which highlights a misalignment between the use of embedded references within SCOs and the fact that SCOs have deterministic IDs.

The following paragraph from the STIX 2.1 specification is correct, and it reflects the fact that SCOs by default (without versioning) must not have embedded references, because a producer other than the object creator would create a new observable that conflicts with the original one.

STIX Objects have a single object creator, the entity that generates the id for the object and creates the first version. The object creator MAY (but not necessarily will) be identified in the created_by_ref property of the object. Only the object creator is permitted to create new versions of a STIX Object. Producers other than the object creator MUST NOT create new versions of that object. If a producer other than the object creator wishes to create a new version, they MUST instead create a new object with a new id. They SHOULD additionally create a derived-from Relationship object to relate their new object to the original object that it was derived from.

Is my understanding accurate, or have I overlooked something?

@rpiazza
Contributor

rpiazza commented Feb 26, 2024

Hi @SYNchroACK,

SCOs are "facts". Therefore, ipv4-addr-1 "resolves to" mac-addr-2 is a fact, but like any fact it may only be true for a certain time interval. This is where the Observed Data SDO comes into play. It is used to indicate when something is observed to be true. When it is later observed that ipv4-addr-1 resolves to mac-addr-3, that is a completely different fact, represented by a different SCO instance, with different observed times for its related Observed Data SDO.
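As a rough illustration of the "fact plus Observed Data" pattern described above, here is a minimal Python sketch using plain dictionaries. The helper names `stix_timestamp` and `observe`, the MAC addresses, and the observation dates are all made up for the example; the SCO ids are random UUIDv4 values so that the two facts get distinct ids.

```python
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt):
    """Format a datetime as a STIX-style UTC timestamp with millisecond precision."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def observe(ip_value, mac_value, first_seen, last_seen):
    """Hypothetical helper: one 'fact' (IP resolves to MAC) plus its Observed Data SDO."""
    mac = {
        "type": "mac-addr",
        "spec_version": "2.1",
        "id": f"mac-addr--{uuid.uuid4()}",
        "value": mac_value,
    }
    ip = {
        "type": "ipv4-addr",
        "spec_version": "2.1",
        "id": f"ipv4-addr--{uuid.uuid4()}",  # UUIDv4, so each fact has its own id
        "value": ip_value,
        "resolves_to_refs": [mac["id"]],
    }
    now = stix_timestamp(datetime.now(timezone.utc))
    observed = {
        "type": "observed-data",
        "spec_version": "2.1",
        "id": f"observed-data--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "first_observed": stix_timestamp(first_seen),
        "last_observed": stix_timestamp(last_seen),
        "number_observed": 1,
        "object_refs": [ip["id"], mac["id"]],
    }
    return ip, mac, observed


# Two observations of the same address resolving to different MACs (illustrative values).
ip_a, mac_a, seen_a = observe(
    "8.8.8.8", "00:00:5e:00:53:01",
    datetime(2024, 2, 1, tzinfo=timezone.utc), datetime(2024, 2, 2, tzinfo=timezone.utc),
)
ip_b, mac_b, seen_b = observe(
    "8.8.8.8", "00:00:5e:00:53:02",
    datetime(2024, 2, 10, tzinfo=timezone.utc), datetime(2024, 2, 11, tzinfo=timezone.utc),
)
```

The time interval lives on the Observed Data object, not on the SCO itself, which is why the two ipv4-addr instances need different ids to coexist.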

However, I think you may have discovered a minor problem: a UUIDv5 deterministic id of the ipv4-addr SCO is based solely on the value of the IP address, so potentially there is an issue, since both SCOs described above could have the same STIX id, which is not allowed.

That is probably something we should revisit in the next release. Because the use of deterministic ids is not required, I suggest you use UUIDv4 STIX ids at this time.
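To make the collision concrete, here is a minimal Python sketch. It assumes the ipv4-addr id is a UUIDv5 computed over the canonical JSON of the `value` property alone, under the namespace the STIX 2.1 specification defines for SCO ids. The helper name `deterministic_ipv4_id` is hypothetical, and `json.dumps` with compact separators is only an approximation of JCS canonicalization that should hold for simple ASCII values like these.

```python
import json
import uuid

# Namespace given in the STIX 2.1 specification for deterministic SCO ids.
STIX_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")


def deterministic_ipv4_id(sco):
    """Hypothetical helper: derive the id from the `value` property only."""
    contributing = {"value": sco["value"]}
    # Approximates JCS canonical JSON; adequate for plain ASCII strings.
    name = json.dumps(contributing, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "ipv4-addr--" + str(uuid.uuid5(STIX_SCO_NAMESPACE, name))


plain = {"type": "ipv4-addr", "value": "8.8.8.8"}
with_mac = {
    "type": "ipv4-addr",
    "value": "8.8.8.8",
    # Placeholder reference; resolves_to_refs does not feed into the id.
    "resolves_to_refs": ["mac-addr--" + str(uuid.uuid4())],
}

plain_id = deterministic_ipv4_id(plain)
with_mac_id = deterministic_ipv4_id(with_mac)
ids_collide = plain_id == with_mac_id  # same id despite different content

# The suggested workaround: a random UUIDv4 id per SCO instance.
random_id = "ipv4-addr--" + str(uuid.uuid4())
```

Under these assumptions both objects receive the same id even though only one carries `resolves_to_refs`, while the UUIDv4 id avoids the clash at the cost of losing deduplication by value.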

@SYNchroACK
Author

SYNchroACK commented Mar 20, 2024

@rpiazza thank you for your response.

I agree that SCOs, in their simplest form, represent standalone facts; an IPv4 address is a fact that exists independently of other associations. Yet, when we introduce an embedded relationship like resolves_to_refs linking to another entity, such as a MACAddress, the IPv4 can no longer be perceived in isolation. This embedding inherently demands supplementary context, such as the timestamps encapsulating the period when the IPv4 and MACAddress connection was pertinent. This context is precisely what the ObservedData construct provides by delineating the temporal bounds of the observation, or what the SRO Relationship provides with start_time and stop_time.

What intrigues me here is the presence of dual mechanisms to establish relationships between objects like IPv4 and MACAddress. On one hand, we have the direct embedded relationship within an SCO, and on the other, the more structured approach using the SRO Relationship object, which allows for detailed temporal and identity annotations through properties like start_time, stop_time, and potentially the identity of the relationship's observer (created_by_ref). This redundancy seems to complicate rather than clarify the data model.

Given the foundational principle of SCOs as independently standing entities, I propose leaning towards a more consistent and singular method of relationship establishment, preferably through the SRO Relationship object. This would not only simplify the conceptual understanding but also enhance data handling, particularly in terms of observable deduplication, given their deterministic ID nature, which should ideally incorporate all relevant attributes to maintain uniqueness.
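A minimal sketch of what this proposal could look like, using plain Python dictionaries: the SCOs carry no embedded references and get value-based UUIDv5 ids, while the association is expressed as a `resolves-to` Relationship with its own time bounds and creator. The helper `sco_id`, the MAC address, the identity id, and all timestamps are made up for illustration, and the `json.dumps` call is only an approximation of JCS canonicalization.

```python
import json
import uuid

# Namespace given in the STIX 2.1 specification for deterministic SCO ids.
STIX_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")


def sco_id(sco_type, value):
    """Hypothetical helper: UUIDv5 id over the `value` property only."""
    name = json.dumps({"value": value}, separators=(",", ":"), ensure_ascii=False)
    return f"{sco_type}--{uuid.uuid5(STIX_SCO_NAMESPACE, name)}"


# Standalone SCOs: no resolves_to_refs, so the id fully describes the content.
ip = {
    "type": "ipv4-addr",
    "spec_version": "2.1",
    "id": sco_id("ipv4-addr", "8.8.8.8"),
    "value": "8.8.8.8",
}
mac = {
    "type": "mac-addr",
    "spec_version": "2.1",
    "id": sco_id("mac-addr", "00:00:5e:00:53:01"),
    "value": "00:00:5e:00:53:01",
}

observer_id = f"identity--{uuid.uuid4()}"  # placeholder for the observing organization

# The association, with who asserted it and when it held.
relationship = {
    "type": "relationship",
    "spec_version": "2.1",
    "id": f"relationship--{uuid.uuid4()}",
    "created_by_ref": observer_id,
    "created": "2024-03-20T00:00:00.000Z",
    "modified": "2024-03-20T00:00:00.000Z",
    "relationship_type": "resolves-to",
    "source_ref": ip["id"],
    "target_ref": mac["id"],
    "start_time": "2024-03-01T00:00:00.000Z",
    "stop_time": "2024-03-02T00:00:00.000Z",
}
```

With this shape, two organizations that both see 8.8.8.8 would produce the same ipv4-addr object, and each would attach its own Relationship to describe what it observed.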

Despite the current flexibility in ID generation for Observables, I would advocate for a continued use of UUIDv5 for Observables, ensuring they are treated as self-contained units that, by principle, should not encompass any characteristics undermining their standalone nature. This would bring advantages for the deduplication of observables when sharing across organizations. Deprecating embedded relationships in SCOs would therefore streamline this aspect, ensuring clearer and more consistent data representation.

@rpiazza
Contributor

rpiazza commented Mar 22, 2024

I agree that using the resolves_to_refs property should probably be discouraged. Something to deal with in the next release.

@TcM1911

TcM1911 commented Oct 29, 2024

You have a similar issue in the Directory object. Only the path is used for the id generation which means you can have multiple objects with the same ID but different contains_refs. I think it's mostly a problem when you support the archive file extension because paths in the archive are relative.
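The Directory case can be sketched the same way. This example assumes the directory id is a UUIDv5 over the canonical JSON of `path` alone; the helper `directory_id`, the relative path `docs`, and the file references are hypothetical, and `json.dumps` only approximates JCS canonicalization.

```python
import json
import uuid

# Namespace given in the STIX 2.1 specification for deterministic SCO ids.
STIX_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")


def directory_id(path):
    """Hypothetical helper: derive the directory id from `path` only."""
    name = json.dumps({"path": path}, separators=(",", ":"), ensure_ascii=False)
    return f"directory--{uuid.uuid5(STIX_SCO_NAMESPACE, name)}"


# The same relative path inside two unrelated archives, with different contents.
dir_in_archive_a = {
    "type": "directory",
    "id": directory_id("docs"),
    "path": "docs",
    "contains_refs": [f"file--{uuid.uuid4()}"],
}
dir_in_archive_b = {
    "type": "directory",
    "id": directory_id("docs"),
    "path": "docs",
    "contains_refs": [f"file--{uuid.uuid4()}", f"file--{uuid.uuid4()}"],
}

same_id = dir_in_archive_a["id"] == dir_in_archive_b["id"]
```

Because archive paths are relative, unrelated directories can easily share a `path`, so the id clash is more likely here than with absolute filesystem paths.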

3 participants