Doesn't dl images in the reblog of an answer post #533
Comments
That's by design. Files that have already been downloaded are not downloaded again in the same blog (or globally, if that setting is enabled).
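To illustrate the skip rule described above, here is a minimal sketch in Python. It is not TumblThree's actual code: the class name `DownloadTracker`, the method `should_download`, and the `global_check` flag are all hypothetical names chosen for the example.

```python
class DownloadTracker:
    """Hypothetical sketch of per-blog (or global) duplicate skipping."""

    def __init__(self, global_check=False):
        # global_check is an assumed stand-in for the "globally" setting.
        self.global_check = global_check
        self.seen_by_blog = {}  # blog name -> set of URLs already downloaded

    def should_download(self, blog, url):
        if self.global_check:
            # Skip if any blog has already downloaded this URL.
            already = any(url in seen for seen in self.seen_by_blog.values())
        else:
            # Skip only if this blog has already downloaded this URL.
            already = url in self.seen_by_blog.get(blog, set())
        if already:
            return False
        self.seen_by_blog.setdefault(blog, set()).add(url)
        return True
```

With a rule like this, a file is skipped only after it has been recorded once, so a file that was never downloaded would not be filtered out by the duplicate check.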
Yes, both posts are processed, but it completely skips the content that's added in the reblog; this is what I showed in the second screenshot. These are the JSONs it gives me if I enable dumping crawler data. The content of both posts is exactly the same, that of the original post, whereas the reblog should contain the new text and images.

In comparison, if I take a reblog of another type of post, for example https://www.tumblr.com/fruitegg/685938465659060224/, there's new text and images in the JSON of the reblog, and all images are downloaded, as I expect.

I understand that it skips duplicates, but those images weren't downloaded even once. If I search for 659014555856437248, I only find the one image that's in the original answer post, and if I search for 614541567356698624, there are no images with that post id. There are also no occurrences of the original links (64.media.tumblr.com/*) of the images from the reblog in any text files.
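For anyone who wants to repeat this check on their own dumps, a small Python sketch follows. It makes no assumption about the layout of the dumped JSON: it simply walks every value and collects strings that contain a media host. The function name `collect_media_urls` and the sample field names are hypothetical.

```python
import json


def collect_media_urls(node, needle="media.tumblr.com"):
    """Recursively collect every string value that contains `needle`."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(collect_media_urls(value, needle))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_media_urls(item, needle))
    elif isinstance(node, str) and needle in node:
        found.append(node)
    return found


# Hypothetical sample; the real dump will have different fields.
sample = json.loads(
    '{"id": "1", "content": ['
    '{"type": "image", "url": "https://64.media.tumblr.com/example/a.png"},'
    '{"type": "text", "text": "hi"}]}'
)
print(collect_media_urls(sample))
```

Note that a string holding a whole HTML fragment is returned as-is if it contains the host name, so the result may need a second pass to isolate the individual links.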
Ok, now I see it. When I looked at the JSONs and at the post in the browser, I saw the problem.
JSON:
HTML:
In this particular case, the images cannot be downloaded because we parse the JSON data structure, not the HTML page, and the content added in the reblog is missing from the JSON. Maybe they'll fix this error one day.
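To make the difference between the two sources concrete, here is a rough Python sketch of what pulling image links out of an HTML fragment could look like. This is not how TumblThree works (it parses the data structure, as stated above); the class `ImageSrcCollector`, the function `extract_image_urls`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <img ... /> tags.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def extract_image_urls(html):
    collector = ImageSrcCollector()
    collector.feed(html)
    return collector.sources


# Hypothetical fragment standing in for the rendered reblog content.
fragment = (
    '<p>new text</p>'
    '<img src="https://64.media.tumblr.com/example/a.png">'
    '<img src="https://64.media.tumblr.com/example/b.png"/>'
)
print(extract_image_urls(fragment))
```

A crawler that only reads the JSON never sees markup like this, which is why images that exist only in the rendered page are not picked up.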
Here's an example post: https://www.tumblr.com/fruitegg/659014555856437248/
My settings:
I found the 659014555856437248 post only in answers.txt, but its content is the same as that of the original answer post (614541567356698624), and only the image in the 614541567356698624 post is downloaded.
It seems to work correctly with reblogs of other types of posts.
Expected behavior
I expect it to parse the content of the reblog of an answer post and download all images.
Desktop (please complete the following information):