Replace ID with UUID and make UUID Required Everyplace it is Defined #1990
Comments
@brian-comply0 - If I understand your request correctly, such a change will NOT be backwards compatible and can only be done in a major version. |
A favorable read suggests:
Either of these might be considered separately on its merits.
One problem with permitting (and requiring) UUID everywhere is the various ambiguities regarding link targets. I.e. what does the URI syntax If developers really wanted to have either/both (by whatever name) I suppose the above question could be dealt with - but we are again playing whack-a-mole with the complexity.
100% agreed with these not being values in the data to show to end users, at least ordinarily. However their 'wetware processing quotient' is still a factor, under various kinds of scenarios (including debugging scenarios, not only ordinary workflow).
And many of your complaints could also be addressed by other means, such as a recommendation for a nice portable "how to make a sensible ID rule" that organizations could (re)use. It might well entail the notion of identifying the "canonical identifier" as you call it, at least to start with. (My own recommendation has two steps: 1. identify the Canonical ID you already use, then 2. cast it into NCName form, all lower-case if possible. If you have no canonical ID form, one can be created, but most do.)
The NCName requirement is alas an old, old legacy, just hard to give up (many system-level XML functions rely on the syntax in its current form) - but you haven't actually suggested giving up on that. It's not all that difficult to follow once we decide it's worth the small cost - and to make a feature out of it.
Do you feel current problems / hesitations would be adequately addressed by more detailed guidance on how to form or fashion IDs?
How about the caution that (as many people forget) since these documents may be passed between systems, in order to support robust addressing we are always going to need a document's ID along with the link target's ID? (Unless the use case says otherwise?) Since the same target in a different document is not the same - even though it must say it is, we can tell the difference (because same ID or UUID, but different document). We try to alleviate this notionally in OSCAL but we have barely started (with some idea of a "document/import space" for addressing, bigger than a document but smaller than the universe). Work in this area could help provide illustrations.
If the above proposals don't cut it, what is the minimum that might? |
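To make the two-step recommendation above concrete, here is a minimal sketch of step 2 (casting an existing canonical ID into lower-case NCName form). The function name `to_ncname`, the hyphen replacement character, and the underscore prefix are illustrative assumptions, not anything OSCAL prescribes, and the check covers only the ASCII subset of NCName.

```python
import re


def to_ncname(canonical_id: str) -> str:
    """Cast an existing identifier into a lower-case, NCName-safe token.

    Only the ASCII subset of NCName is handled; the replacement and
    prefix characters are illustrative choices, not an OSCAL rule.
    """
    token = canonical_id.strip().lower()
    # Replace every run of characters outside [a-z0-9._-] with one hyphen,
    # then drop any hyphens left dangling at either end.
    token = re.sub(r"[^a-z0-9._-]+", "-", token).strip("-")
    if not token:
        raise ValueError("identifier has no usable characters")
    # An NCName may not begin with a digit, '.' or '-'; prefix with '_'.
    if not re.match(r"[a-z_]", token):
        token = "_" + token
    return token


print(to_ncname("AC-2 (1)"))      # ac-2-1
print(to_ncname("800-53 Rev 5"))  # _800-53-rev-5
```

Two different canonical IDs can collapse to the same token under a rule like this (for example "AC-2 (1)" and "AC-2-1"), so an organization adopting one would still need to check for collisions.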
I agree with @iMichaela that the structure requirements need to be agnostic and support the various use cases that will crop up, including inconsistent presence of elements. While some use cases will want human-readable IDs to vet that their requirements are being met (role-id), there are instances where UUID is critical for linking resources back to their source elements (leveraged-authorization and by-components). Leveraging props and namespaces allows for these unique scenarios, which you can then reference back to. Instead, what I would love to see is a unified approach to generating the UUIDs to make it easier to validate the contents and reduce time to ATO. This will be vital as we move towards cross-version review, where UUIDs can be compared for changes, and changes validated. |
I agree with you, Lacy (@Telos-sa). Consistency is very important. Automating the process is important.
The output of a hash function will be different if an extra space is inserted, or if you use different operating systems to calculate it, or if a simple, insignificant punctuation mark is added. Maybe you could strip all spaces and punctuation before generating the hash? And even in this case, a corrected typo will result in a new hash, but it should not trigger a UUID change. A process that is not well thought through might allow a diabolical mind to trigger a lot of unnecessary assessment work and overwhelm the system so much that it can induce a denial of service. All that is needed is a script-kiddie script that adds and removes a space in a document that is a core document for the automation process... Can NLP do a better analysis and determine when a significant change happened? |
Yes, I believe so. Being able to strip out spaces, punctuation, and HTML, and looking at pure content, should be adequate. Brad, what do you think about incorporating NLP in a future iteration, to compare content before generating the UUID? The first iteration would be a standard strip, but for subsequent changes this could be a good methodology.
Stephanie Lacy | Senior Solutions Architect
Telos's method leverages bottom-up data analysis to generate a hash of the content that becomes the source of the UUID. If the data is the same, then the UUID is the same across versions. If the data changes, then the UUID changes.
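As an illustration only (this is not Telos's actual implementation, which is not described in this thread), a content-derived UUID with the "standard strip" discussed above could be sketched as follows. It assumes a name-based UUID (version 5, SHA-1) over normalized text; the namespace value and the names `normalize` and `content_uuid` are hypothetical.

```python
import html
import re
import string
import uuid

# Hypothetical namespace. A shared scheme would have to publish one fixed
# value so that every implementation derives the same UUIDs.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/content-uuid")


def normalize(text: str) -> str:
    """Strip HTML-like tags, ASCII punctuation, and all whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop anything that looks like a tag
    text = html.unescape(text)            # decode entities such as &amp;
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", "", text)       # remove every whitespace run


def content_uuid(text: str) -> uuid.UUID:
    """Derive a version 5 UUID from the normalized content."""
    return uuid.uuid5(EXAMPLE_NAMESPACE, normalize(text))


a = content_uuid("The system <b>enforces</b> access.")
b = content_uuid("The system enforces  access")
c = content_uuid("The system enforcs access.")
print(a == b)  # True: markup, spacing, and punctuation are ignored
print(a == c)  # False: a one-letter typo still yields a new UUID
```

The last line shows the concern raised above: normalization absorbs whitespace and punctuation churn, but any change to the remaining characters, however insignificant, produces a different UUID.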
|
If the production of a UUID from given spans of content is expected to be deterministic and reproducible, I'd recommend thinking early about formal definition, specification and conformance testing. In other words, it needs to be possible to say which of two implementations that give different UUIDs for 'the same' content (as defined) is the correct one, whenever they differ. How do we know which is right?
In passing, I note that if mandated, this mechanism effectively changes the semantics of UUIDs from 'identifiers' (that is, 'tags' - information added by some person or process) to 'comparands', that is, a basis for comparison, but not actually more information, because you can always (knowing the rules) derive a UUID again from its content. (This is assuming that everyone is fine with all elements whose text comes out 'N/A' having the same UUID, which seems backward to me.) In fact you had better know those rules and/or have a trusted implementation in hand, if you want to validate that UUIDs are aligned with the text as expected.
If that makes you squirm (it might, and not only for reasons @iMichaela suggests), reflect that what this means is not that certain validations and assurances do not have to be done: it just moves where they are done, by whom and how. (Introducing a 'trust vector' in doing so.)
This makes me think that specifying, testing and hardening such an algorithm in public could be a very good thing, while mandating the use of such an algorithm without the testing and hardening would be a very bad thing. (Fortunately that hasn't been suggested.) Meanwhile if organizations wish to add normalized hashes or other enhancements to their data to enable optimizations (even if only 'decorations' from an information-theoretic point of view), they certainly can do that.
For a more 'standard' approach, there has to be a reference in the form not only of formal definitions but also of test suites - or (I fear) the standard will be toothless - while nonetheless introducing friction, costs, complicating factors and security risk as @iMichaela describes (worst-case).
While pondering that, another question to ask is what problem this is solving: are there other approaches to consider as well? Not only standards but best practices and reusable solutions? Tools? And indeed, 'deepening' the semantics of UUIDs wasn't originally proposed here except by implication - only requiring them everywhere. (Which in itself is bike-shedding, outside a 2.0 working setting.) |
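A rough sketch of what conformance testing could look like, under stated assumptions: `reference_impl` stands in for a hypothetical published reference rule (remove all whitespace, then UUID version 5), `other_impl` is a second implementation that read the rule differently, and the vectors are generated here at run time, whereas a real test suite would freeze them as published literals. The namespace is a placeholder.

```python
import uuid
from typing import Callable, List, Tuple

NAMESPACE = uuid.NAMESPACE_OID  # placeholder; a real spec would fix its own

Vector = Tuple[str, uuid.UUID]


def reference_impl(text: str) -> uuid.UUID:
    """Hypothetical reference rule: remove all whitespace, then UUIDv5."""
    return uuid.uuid5(NAMESPACE, "".join(text.split()))


def other_impl(text: str) -> uuid.UUID:
    """A divergent implementation: trims only the ends of the text."""
    return uuid.uuid5(NAMESPACE, text.strip())


def failures(impl: Callable[[str], uuid.UUID], vectors: List[Vector]) -> List[str]:
    """Return the inputs for which impl disagrees with the expected UUID."""
    return [text for text, expected in vectors if impl(text) != expected]


SAMPLES = ["N/A", " N/A ", "Access is  enforced."]
VECTORS: List[Vector] = [(t, reference_impl(t)) for t in SAMPLES]

print(failures(reference_impl, VECTORS))  # []
print(failures(other_impl, VECTORS))      # ['Access is  enforced.']

# Two unrelated elements whose text is 'N/A' receive the same UUID.
elements = {"control-a remarks": "N/A", "control-b remarks": "N/A"}
print(len({reference_impl(text) for text in elements.values()}))  # 1
```

With frozen vectors, the question "which of two implementations is right" has a checkable answer. The final lines show the 'N/A' collision: content-derived values compare content, they do not distinguish elements.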
User Story
As an OSCAL tool developer I want a cleaner, more consistent and more predictable way to deal with unique identifiers within OSCAL content so that I can simplify software development requirements.
OSCAL is intended to be machine readable. The retention of id flags and values is for humans working with raw OSCAL, and adds unnecessary complication to machine processing of OSCAL. These id values are never (and should never be) exposed to end-users of OSCAL tools.
Background and Additional Details
Pre 1.0.0 release, a decision was made to use UUID in many places in the OSCAL syntax and to require it in most of those places. The use of UUID has proven to be a significant benefit to software development where it is implemented and required, especially when dealing with discrete portions of OSCAL content, as normally happens when a web application is designed with an n-tier architecture.
At that time a decision was also made to continue using id in places where a canonical reference may be desirable. This has proven to represent a challenge for developers, sometimes resulting in kludgy work-arounds. Similarly, when dealing with assemblies where UUID is optional, it has proven to be a challenge for n-tier architected software to handle the use cases where UUID is only intermittently present.
Software Development Challenges:
As there is no single standard for the actual values of @id, each OSCAL tool developer is forced to create their own ID value creation standard when working with catalog control content or metadata role content.
The standard guidance offered by NIST for the above situation is that tool developers can just generate UUID values for the ID flags. The problem with this is that not all UUID values are NCName compliant, as NCName requires the first character to be a letter or underscore and UUIDs can start with a letter or number. This forces the need to prefix the identifier, creating a kludgy ID value that is friendly to neither humans nor machines. A true UUID value in these cases would create a more straightforward approach, especially for control parts.
Most modern web applications avoid loading a whole document into browser memory and only transmit the portion of OSCAL content being viewed or edited between front and back-end servers. Indeed, with sensitive security content and "need to know" principles, tools are sometimes required to only load a portion of OSCAL content into browser memory.
If ID flags were replaced with UUID flags and all UUID flags were required, OSCAL content would be far more normalized in terms of a tool developer's ability to reference discrete content.
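The NCName problem described above can be seen in a few lines. This is a minimal sketch, assuming an ASCII-only approximation of the NCName grammar; the helper names and the "uuid-" prefix are hypothetical and only illustrate the kind of workaround tool developers end up writing.

```python
import re
import uuid

# ASCII-only approximation of NCName: a letter or underscore first, then
# letters, digits, '.', '-' or '_'. The full grammar admits more characters.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_ncname(value: str) -> bool:
    """Check a string against the ASCII approximation of NCName."""
    return NCNAME_ASCII.match(value) is not None


def uuid_as_id(value: uuid.UUID, prefix: str = "uuid-") -> str:
    """Always prefix, so every generated id value has the same shape."""
    return prefix + str(value)


starts_with_digit = uuid.UUID("1b4e28ba-2fa1-11d2-883f-0016d3cca427")
starts_with_letter = uuid.UUID("a1b2c3d4-0000-4000-8000-000000000000")

print(is_ncname(str(starts_with_digit)))         # False
print(is_ncname(str(starts_with_letter)))        # True
print(is_ncname(uuid_as_id(starts_with_digit)))  # True
```

Prefixing unconditionally keeps the values uniform, at the cost that the id is no longer a bare UUID and must be unwrapped before it can be compared with one.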
Goals
Replace @id flags with @uuid flags
canonical-identifier property which can be used by implementations (if any) that specifically care about using a canonical identifier
Require @uuid everyplace it is implemented
Dependencies
No response
Acceptance Criteria
(For reviewers: The wiki has guidance on code review and overall issue review for completeness.)
Revisions
No response