Fix tag indexing #144

Open. adevyish wants to merge 3 commits into master.

adevyish commented Dec 6, 2018

- Fix exception if tag has `/` in it
- Fix tag links not working

bbolli (Owner) commented Dec 6, 2018

That's great, thanks! Does slugify() also work on Windows?
Also, since you don't need the first tuple element, it should be removed completely.

adevyish (Author) commented Dec 6, 2018

I think it should, but I don’t have access to a testing environment. I can fix up the tuple thing tomorrow; I was trying to get it working quickly 😅
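
For the Windows question, here is a minimal sketch of the kind of check involved. It is not the `slugify()` from this pull request; `safe_tag_dirname` is a hypothetical helper that replaces the characters Windows rejects in file names (and `/`, which breaks paths on POSIX as well). Reserved device names such as `CON` are not handled here.

```python
import re

# Characters Windows does not allow in file names; "/" is also a path
# separator on POSIX, so a tag containing it cannot be used as-is.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_tag_dirname(tag):
    """Hypothetical helper: map a tag to a single safe path component."""
    # Replace each unsafe character, then drop leading/trailing spaces and
    # dots (Windows does not accept names that end in either).
    name = _UNSAFE.sub('-', tag).strip(' .')
    # Fall back to a placeholder if nothing usable is left.
    return name or '_'


print(safe_tag_dirname('art/wip'))
print(safe_tag_dirname('what?'))
```

Something along these lines could be run on any platform to see what a Windows-safe name would look like, without needing a Windows test environment.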

tumblr_backup.py (outdated)
@@ -414,7 +429,8 @@ def save_tag_index(self):
         mkdir(path_to(tag_index_dir))
         self.fixup_media_links()
         tag_index = [self.blog.header('Tag index', 'tag-index', self.blog.title, True), '<ul>']
-        for tag, index in sorted(self.tags.items(), key=lambda kv: kv[1].name):
+        for _, index in sorted(self.tags.items(), key=lambda kv: kv[1].name):

Use `tags.values()` to get just the values instead of tuples.

adevyish (Author) replied:

Yes, I’m aware of this.
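
To illustrate the `values()` suggestion, here is a small sketch showing that both forms yield the same ordering. `TagIndex` and the sample `tags` dict are hypothetical stand-ins for the objects stored in `self.tags`, which are only assumed to have a `name` attribute.

```python
from collections import namedtuple

# Hypothetical stand-in for the per-tag index objects kept in self.tags.
TagIndex = namedtuple('TagIndex', 'name')

tags = {
    'b-slug': TagIndex('beta'),
    'a-slug': TagIndex('Alpha'),
    'c-slug': TagIndex('gamma'),
}

# Tuple form: each (key, value) pair is unpacked, but the key is unused.
by_items = [index for _, index in sorted(tags.items(), key=lambda kv: kv[1].name)]

# values() form: iterate over the index objects directly.
by_values = sorted(tags.values(), key=lambda v: v.name)

print(by_items == by_values)
```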

adevyish (Author) commented Dec 7, 2018

As a side note, I didn't want to add an additional dependency, but unicode-slugify also handles slugs with non-ASCII characters (which is what I'm doing for my own archive, since I have plenty of CJK tags).
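
For comparison, a dependency-free approximation is possible with the standard library alone. The sketch below is not unicode-slugify and makes no claim to match its behaviour; `unicode_slug` is a hypothetical helper, written for Python 3, that keeps Unicode letters and digits and drops everything else.

```python
import re
import unicodedata


def unicode_slug(text):
    """Hypothetical stdlib-only slug that keeps non-ASCII letters and digits."""
    # Normalize compatibility forms (e.g. full-width characters) and lowercase.
    text = unicodedata.normalize('NFKC', text).lower()
    # For str patterns in Python 3, \w matches Unicode word characters,
    # so CJK text survives while punctuation such as "/" is removed.
    text = re.sub(r'[^\w\s-]', '', text)
    # Collapse runs of whitespace, underscores and hyphens into one hyphen.
    return re.sub(r'[\s_-]+', '-', text).strip('-')


print(unicode_slug('Hello World'))
print(unicode_slug('東京 写真'))
```

Note that a slug like this is lossy: `a/b` and `ab` both reduce to `ab`, and a tag made only of emoji reduces to an empty string.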

aspensmonster commented

It seems like tumblr tags are a wonderful example of diverse user input that's always trying to outsmart the slug code. CJK tags, tags with slashes, tags with all kinds of odd unicode, emoji, multiple tags condensing down to one slug (whose sets aren't identical, so the rendered HTML is incomplete)...

Assuming you don't mind too much what the folder name is (say, if you mostly use the backup through the rendered HTML), this approach seems to work for me for various weird tags. I haven't found a broken or empty link yet, though there are thousands of tags in my backup:

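import hashlib
...
tag_index = [self.blog.header('Tag index', 'tag-index', self.blog.title, True), '<ul>']
# Iterate over the index objects only; the dict keys are not needed.
for index in sorted(self.tags.values(), key=lambda v: v.name):
    # Use the hex SHA-256 digest of the tag name as the folder name.
    tag = hashlib.sha256(index.name.encode('utf-8')).hexdigest()
    # ... (rest of the loop body as before)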

I'm also pretty sure hashlib is part of the standard library, so no additional module install is needed.
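
As a standalone sketch of why the hash approach sidesteps the problems above: every digest is 64 hexadecimal characters, so it is safe as a folder name, and tags that differ only in punctuation still get distinct names. `tag_dirname` and the sample tags are hypothetical; only `hashlib.sha256(...).hexdigest()` comes from the snippet above.

```python
import hashlib
import string


def tag_dirname(tag_name):
    """Hypothetical helper: hex SHA-256 digest of the UTF-8 encoded tag name."""
    return hashlib.sha256(tag_name.encode('utf-8')).hexdigest()


# Tags that are awkward for slug-based schemes: a slash, a near-duplicate,
# CJK text and an emoji.
sample_tags = ['a/b', 'ab', '東京', '😅']
dirnames = {tag: tag_dirname(tag) for tag in sample_tags}

for tag, name in dirnames.items():
    # Each name is a fixed-length string of hex digits.
    assert len(name) == 64
    assert set(name) <= set(string.hexdigits)
    print(tag, '->', name[:12] + '...')
```

The trade-off is the one already mentioned: the folder names are no longer human-readable, so this suits backups that are browsed through the generated HTML.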

4 participants