[question] assignment of stable identifier for a source (and not only its access points) #809
Comments
bump @hancush |
Hi, @tlongers, sharing a relevant email from late last year where we pondered this very question together:

Source import flow

The revised source import loops over each row in the sources sheet. First, it creates or retrieves and updates an existing access point, based on "source:access_point_id:admin". Then, it creates or retrieves and updates the implicated source based on the combination of fields listed in Sources, below, and associates it with the access point. If the source fields are not harmonized within records referring to the same source, then we'll see multiple versions of that source in our data. Referring back to the example in the current sheet, we have two sources for "By All Means Necessary", one with a publication date and one without, and the access points are split between those versions.

Access points

We use "source:access_point_id:admin" to create or retrieve an existing access point to relate to the source. Am I understanding you correctly that it, alone, does not uniquely identify an access point?

Sources

I agree a unique identifier for sources would be amazing! We actually have one in our data model already, so it'd be a matter of updating it (or, perhaps more easily, flushing and re-importing all sources) if/when it becomes available on your end. Barring that, the fields we use to resolve sources are:
|
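The import flow described in the comment above can be sketched roughly as follows. This is a minimal illustration and not the sfm-cms code: the `Source` class, the `import_rows` function, the `title` and `publication_date` keys, and the sample rows (including the date) are all hypothetical stand-ins, since the actual list of resolving fields is not reproduced in this thread. Only the "source:access_point_id:admin" column name comes from the comment.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Source:
    uuid: str
    title: str
    publication_date: str
    access_point_ids: set = field(default_factory=set)


def import_rows(rows):
    """Loop over sheet rows; key access points on their ID, sources on field values."""
    access_points = {}  # access point ID -> latest row data
    sources = {}        # (title, publication_date) -> Source

    for row in rows:
        ap_id = row["source:access_point_id:admin"]
        # Create, or retrieve and update, the access point keyed on its ID.
        access_points.setdefault(ap_id, {}).update(row)

        # Resolve the source on the combination of its descriptive fields
        # (hypothetical fields; the real list is not shown in the thread).
        key = (row["title"], row.get("publication_date", ""))
        source = sources.get(key)
        if source is None:
            source = Source(str(uuid.uuid4()), *key)
            sources[key] = source
        source.access_point_ids.add(ap_id)

    return access_points, sources


# Made-up rows: same title, but only one row carries a publication date.
rows = [
    {"source:access_point_id:admin": "ap-1",
     "title": "By All Means Necessary", "publication_date": "2014-01-01"},
    {"source:access_point_id:admin": "ap-2",
     "title": "By All Means Necessary", "publication_date": ""},
]
access_points, sources = import_rows(rows)
print(len(access_points), len(sources))  # 2 2
```

Because the two rows disagree on the publication date, they resolve to two separate sources, each holding one access point. That is the split described for "By All Means Necessary". Harmonizing the fields, or keying on a stable source identifier instead, would collapse them into one.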
Aha, the Tom and Hannah of the past were wise and solved this issue already. Thanks; we'll probably implement this on our side. |
Wise then, shudder to think what we are now, @tlongers 😂 |
Thanks, we'll sort this out and let you know when we're done with it. |
@hancush would you be able to do:
|
@tlongers This code creates the sources: https://github.com/security-force-monitor/sfm-cms/blob/master/sfm_pc/management/commands/import_country_data.py#L1389-L1433 I queried the production database and counted 11,968 unique sources. |
Thanks @smcalilly |
This is fixed now in the |
@hancush In our model we assign a UUID to each source access point but not to the source itself:
Access points are citations of specific parts of a source, and we assign each one a stable UUID. For example, page 54 of Source 1 has a different access point (and UUID) from page 68 of Source 1. We don't, however, assign a stable UUID to Source 1.
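As a rough illustration of the model just described, the sketch below uses hypothetical `AccessPoint` and `Source` classes with made-up `page` and `title` fields. It is not the actual spreadsheet or sfm-cms schema. It only shows that each access point carries its own identifier while the source has none.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class AccessPoint:
    page: int
    # In the real workflow this UUID is assigned once in the import sheet and
    # reused on every import; here it is generated for illustration only.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Source:
    title: str
    access_points: list = field(default_factory=list)
    # Note: no identifier field on the source itself.


source_1 = Source("Source 1")
source_1.access_points.append(AccessPoint(page=54))
source_1.access_points.append(AccessPoint(page=68))

for ap in source_1.access_points:
    print(ap.page, ap.id)
```

The two access points get different UUIDs even though they cite the same source, and nothing in this model identifies Source 1 apart from its descriptive fields.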
Although sfm-cms draws UUIDs for access points from our import sheets, it also assigns a UUID to the source. Check here, for example, using our long-neglected "sources" view in sfm-cms (login required): https://back.securityforcemonitor.org/en/source/view/079ddd1a-55c4-4694-902a-f6287a2ca09b/1da0094b-02fe-4b4f-a87c-84df1414bea8/#evidence
The URL displays access point 1da0094b-02fe-4b4f-a87c-84df1414bea8, the record for which contains the following data:
However, it also assigns 079ddd1a-55c4-4694-902a-f6287a2ca09b to the source, which in this case is the document called BAHRAIN – M270 MULTIPLE LAUNCH ROCKET SYSTEMS (MLRS) UPGRADE.
How is it doing this, and does it repeat the process each time data are imported? Is there a requirement for source uniqueness inside sfm-cms that is being unmet here, and that we should fill by assigning a stable UUID to each source (and not only its access points)?