store full original path as archival record #178

cccs-ip · 2014-10-11T00:47:22Z

The path is useful as an archival record. Otherwise, we would store directory names as attribute tags and perhaps eventually change the folder structure.

pwhipp · 2014-10-14T02:41:04Z

Any document can have an arbitrary number of filenames associated with it in its metadata.

cccs-ip · 2014-10-14T03:53:36Z

Cool, and thanks. After we run the SHA process, can we compile all the different names as manifestations of the same file?

pwhipp · 2014-10-14T08:10:48Z

We can easily run a process that deletes the duplicate files and collapses the documents down to one with the other filenames recorded within it. If we do that, the hard bit will be deciding which categorization we keep.
If the categorizations are all valid, these could be added to the document, making the one we choose to keep arbitrary.
For example, if /a/foo.pdf and /b/bar.pdf are identical files, we could end up with one of the following metadata blocks:

(1) {filenames: [/a/foo.pdf, /b/bar.pdf], categories: [a, b]}
(2) {filenames: [/a/foo.pdf, /b/bar.pdf], categories: [a]}
(3) {filenames: [/b/bar.pdf, /a/foo.pdf], categories: [b]}

(1) seems to be the logical choice.
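As a rough illustration of option (1), here is a minimal sketch of the collapse step. It assumes each document is a plain dict with a `sha` content digest alongside `filenames` and `categories`. The field name `sha` and the function name `collapse_duplicates` are hypothetical, not taken from the actual codebase.

```python
from collections import defaultdict


def collapse_duplicates(documents):
    """Merge documents that share a content hash into one record each.

    Filenames and categories are unioned, keeping first-seen order,
    so it no longer matters which duplicate is the one kept.
    """
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["sha"]].append(doc)  # "sha" is an assumed field name

    merged = []
    for sha, docs in grouped.items():
        filenames, categories = [], []
        for doc in docs:
            for name in doc["filenames"]:
                if name not in filenames:
                    filenames.append(name)
            for category in doc["categories"]:
                if category not in categories:
                    categories.append(category)
        merged.append(
            {"sha": sha, "filenames": filenames, "categories": categories}
        )
    return merged
```

Run against the /a/foo.pdf and /b/bar.pdf example, this would produce a single record holding both filenames and both categories.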

This does not consider the possibility that other metadata could differ (e.g. through a spreadsheet import). If that is the case, then both metadata blocks should probably remain (as an optimization, they could be modified to point at the same actual S3 file).
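The conflicting-metadata case could be handled along these lines. This is only a sketch under assumptions: documents are dicts, `sha` and `s3_key` are hypothetical field names, and any field other than `filenames` and `categories` counts as "other metadata" that must match before two blocks can be collapsed.

```python
# Fields that never count as a conflict (all names here are assumptions).
IGNORED_FIELDS = {"sha", "s3_key", "filenames", "categories"}


def other_metadata(doc):
    """Return the metadata that must agree before two blocks can be merged."""
    return {k: v for k, v in doc.items() if k not in IGNORED_FIELDS}


def has_conflict(docs):
    """True if any duplicate carries different 'other' metadata."""
    first = other_metadata(docs[0])
    return any(other_metadata(doc) != first for doc in docs[1:])


def point_at_shared_file(docs):
    """Keep every metadata block, but reference a single stored file.

    Returns copies, so the input documents are left unchanged.
    """
    shared_key = docs[0]["s3_key"]
    return [{**doc, "s3_key": shared_key} for doc in docs]
```

A caller would check `has_conflict` for each group of identical files, merge the group when it returns false, and otherwise fall back to `point_at_shared_file` so the redundant stored copies can be removed.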

cccs-ip · 2014-10-14T12:25:08Z

Thanks, Paul. Option 1, plus multiple metadata files where the information conflicts, sounds like the way forward.

cccs-ip added this to the Sprint 9: Document Management System milestone Oct 14, 2014

cccs-ip added the question label Oct 14, 2014

cccs-ip assigned pwhipp Oct 14, 2014

pwhipp added a commit that referenced this issue Oct 14, 2014

fixed typo (#178)

641ec52

pwhipp added a commit that referenced this issue Oct 14, 2014

fixed typo (#178)

dec3d69
