capture folder names as attribute tag #177

Open
cccs-ip opened this issue Oct 11, 2014 · 5 comments
Comments

cccs-ip (Member) commented Oct 11, 2014

Folder names would be useful as attribute tags.

pwhipp (Contributor) commented Oct 14, 2014

If you want to use the 'categories' (the old folder names) as tags, there are a couple of problems:

Categories are uniquely named only on the basis of their ancestry: there can be multiple subcategories with the same name as long as they have different parents.

A category is very similar to a tag, except that categories form an intentional hierarchy whereas tags are flat.

That said, I can tag all documents with a tag based on their category name(s), creating the new tag as necessary.
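Roughly, something like this minimal sketch is the idea, assuming Django-style Category, Tag and Document models (the model and field names here are illustrative, not necessarily the project's actual ones):

```python
# Illustrative sketch only: copy category names onto documents as tags.
# Assumes Django-style models with these (hypothetical) names and fields.
from myapp.models import Category, Tag


def tag_documents_from_categories():
    for category in Category.objects.all():
        # Reuse an existing tag with this name, or create it on first use.
        tag, _created = Tag.objects.get_or_create(name=category.name)
        for document in category.documents.all():
            document.tags.add(tag)
```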

cccs-ip (Member, Author) commented Oct 14, 2014

I think it's fine to leave the categories as is. If we can merge all the original file names so they are stored across documents with an identical shasum, then I can start the process of eliminating records. What might be helpful / interesting would be a function that checks whether a specific file object appears anywhere outside of /alt-import/, and if it does, deletes all occurrences within /alt-import/. If a file object appears multiple times within /alt-import/ but not in the other top-level directories, then we would just leave the duplicates alone.
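A minimal sketch of what that cleanup could look like, assuming each file record exposes its shasum, its path and a delete() method (all names here are hypothetical):

```python
# Illustrative sketch of the /alt-import/ cleanup described above.
# Assumes each record has .shasum, .path and .delete(); names are hypothetical.
from collections import defaultdict

ALT_IMPORT = '/alt-import/'


def prune_alt_import(records):
    # Group all file records by their shasum.
    by_shasum = defaultdict(list)
    for record in records:
        by_shasum[record.shasum].append(record)

    for copies in by_shasum.values():
        appears_outside = any(ALT_IMPORT not in r.path for r in copies)
        if appears_outside:
            # The file exists outside /alt-import/, so the copies
            # inside /alt-import/ are redundant and can be deleted.
            for record in copies:
                if ALT_IMPORT in record.path:
                    record.delete()
        # Otherwise every copy lives in /alt-import/; leave the duplicates alone.
```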

pwhipp (Contributor) commented Oct 14, 2014

I can script that quite easily. The time-consuming part will be running the script. I've noticed that the sha generation has slowed the server because the process is using 75% of the available memory. I've left it running as is, but have created what I hope is a less memory-slurping version for future use.
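For reference, a lower-memory pass can hash each file in fixed-size chunks rather than reading the whole file into memory at once; a minimal sketch (the choice of SHA-256 and the chunk size are assumptions, not necessarily what the script uses):

```python
# Minimal sketch: hash a file in 1 MB chunks so memory use stays flat
# regardless of file size. SHA-256 is assumed; swap in the digest used here.
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```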

cccs-ip (Member, Author) commented Oct 14, 2014

Thanks, Paul. Fortunately we're not (yet) a high-traffic site.

pwhipp (Contributor) commented Oct 23, 2014

The default user memory limit is set more for a desktop (4GB, I think; I need to check the ulimit defaults), which means that any user on the service can use all of the memory. This is probably fine for now, but it does mean we have to be a little careful when running long processes like the one I set up. On a large (high-traffic) server, the limit would have kept things under better control.
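One way to check (and, if needed, cap) the limit from within a long-running Python process is the standard resource module; a minimal sketch, where the 2 GiB cap is purely an example value:

```python
# Minimal sketch: inspect and optionally lower the address-space limit
# for the current process so a long job cannot take all of the memory.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print('current soft/hard address-space limits:', soft, hard)

# Example cap of 2 GiB (illustrative value, not a recommendation).
two_gib = 2 * 1024 ** 3
resource.setrlimit(resource.RLIMIT_AS, (two_gib, hard))
```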
