was created explicitly for handling a use-case for
In short: Display millions of images from tweets in a collage, providing links back to the original tweets.
juxta itself takes care of the collage generation, but it needs to be fed a list of image paths and metadata. This document describes ways to make that happen.
juxta creates one big collage. At maximum zoom, the size of the images (and their aspect ratio) is defined by RAW_WxRAW_H blocks, where each block is 256x256 pixels, so setting RAW_W=3 RAW_H=2 means 768x512 pixels. With 1 million images, that means a final collage of ~400 gigapixels. An obvious precaution is to create a test collage of 1000 images or so, to get an idea of how it looks, before running a big job.
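The size estimate above is plain arithmetic and can be checked before starting a job. The following is a minimal sketch; the function name collage_gigapixels is made up for illustration and is not part of juxta.

```shell
# Hypothetical helper: collage size in whole gigapixels.
# Arguments: number of images, RAW_W, RAW_H. Each block is 256x256 pixels.
collage_gigapixels() (
  images=$1; raw_w=$2; raw_h=$3
  pixels=$(( images * raw_w * 256 * raw_h * 256 ))
  echo $(( pixels / 1000000000 ))
)

collage_gigapixels 1000000 3 2   # prints 393
```

For 1 million images at RAW_W=3 RAW_H=2 this gives 393, which matches the ~400 gigapixels stated above.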
Also note that the number of inodes used will be a little more than RAW_WxRAW_H per image. With the example above, be sure to check that there are at least 10 million free inodes on the file system (call df -i under Linux/OS-X).
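As a rough aid for the inode check, the sketch below estimates the tile count and compares it with a free-inode figure read manually from df -i. It assumes one inode per 256x256 block per image; both function names and the 50% safety margin are illustrative choices, not part of juxta.

```shell
# Hypothetical helper: tiles at maximum zoom, assuming one file per block per image.
estimate_tiles() (
  images=$1; raw_w=$2; raw_h=$3
  echo $(( images * raw_w * raw_h ))
)

# Hypothetical helper: succeeds if the free inode count covers the estimate
# plus a safety margin in percent (default 50, an arbitrary choice).
enough_inodes() (
  needed=$1; free=$2; margin_pct=${3:-50}
  [ "$free" -ge $(( needed + needed * margin_pct / 100 )) ]
)

needed=$(estimate_tiles 1000000 3 2)
echo "Estimated tiles: $needed"
# Compare with the free-inode column printed by: df -i .
```

With 6,000,000 estimated tiles and the default margin, 10 million free inodes pass the check while 7 million do not.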
Important: This requires twarc, a (free) API key from Twitter, and an understanding of Twitter's Developer Agreement & Policy.
The base usage of the script is to have a file with a list of tweet-IDs, such as
Feeding the list to
with the command
MAX_IMAGES=10 ./ mytweets.dat tweet_collage
will result in the following actions
- The tweets are resolved from their IDs using twarc hydrate
- A list of tuples with [timestamp, tweet-ID, image-URL] is extracted from the hydrated tweets
- The images from the tuples are downloaded and a new list of entries imagePath|tweet-ID timestamp is created
- juxta is called with the list of entries, using the template to provide custom snippets of JavaScript to resolve tweet-ID timestamp into tweet-links
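The step from tuples to entries can be sketched as follows. The file name normalisation shown here (strip the URL scheme, replace slashes with underscores) is inferred from the example entries in this document and is an assumption about what the script does; both function names are made up for illustration.

```shell
# Hypothetical helper: derive a local file name from an image URL.
# Assumed rule: drop the scheme, then replace every "/" with "_".
url_to_filename() {
  printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|/|_|g'
}

# Hypothetical helper: turn one tuple into one entry.
# Arguments: timestamp, tweet-ID, image-URL, image folder.
tuple_to_entry() (
  timestamp=$1; tweet_id=$2; image_url=$3; folder=$4
  printf '%s/%s|%s %s\n' "$folder" "$(url_to_filename "$image_url")" "$tweet_id" "$timestamp"
)

tuple_to_entry 2016-10-13T13:42:10 786532479343599620 \
  https://pbs.twimg.com/media/CupTGBlWcAA-yzz.jpg te-images
```

The call above prints an entry in the imagePath|tweet-ID timestamp format, ready to be appended to the list that is passed to juxta.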
If the script is stopped, restarting it will skip the parts that are already completed. Images are stored in destination_downloads
in sub-folders containing at most 20,000 images each, to avoid performance problems with too many files per folder.
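One simple way to cap the number of files per folder is to derive the sub-folder from a running image index. The sketch below illustrates that idea only: the numeric sub-folder naming, the reading of destination_downloads as the destination name followed by _downloads, and the function name are all assumptions, and the script may organise its folders differently.

```shell
# Hypothetical helper: sub-folder for the image with the given 0-based index,
# with at most 20,000 images per sub-folder.
# Arguments: destination name, image index.
download_folder() (
  dest=$1; index=$2
  echo "${dest}_downloads/$(( index / 20000 ))"
)

download_folder tweet_collage 0       # prints tweet_collage_downloads/0
download_folder tweet_collage 20000   # prints tweet_collage_downloads/1
```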
If the tweets are already available in the format used by twarc hydrate, step 1 in Tweet-IDs as source is not needed.
The script tries to guess whether the input is already hydrated; to be safe, this can also be stated explicitly:
ALREADY_HYDRATED=true ./ mytweets.json tweet_collage
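How the script guesses is not described here. A plausible heuristic, shown purely as an illustration, is to look at the first non-empty line: hydrated tweets are JSON objects, while a tweet-ID list contains only digits. The function name looks_hydrated is hypothetical.

```shell
# Hypothetical heuristic: succeed if the first non-empty line of the file
# starts with "{", which suggests JSON (hydrated tweets) rather than bare IDs.
looks_hydrated() {
  first=$(grep -m 1 -v '^[[:space:]]*$' "$1" | sed 's/^[[:space:]]*//')
  case $first in
    '{'*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Usage:
#   if looks_hydrated mytweets.json; then echo "hydrated"; else echo "tweet-IDs"; fi
```

Because a guess like this can be wrong for unusual input, setting ALREADY_HYDRATED explicitly remains the safer choice.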
It is highly doubtful that an existing collection of tweets and images will follow the folder layout and file name normalisation used by the script. In this case, the script should be skipped altogether and a list of entries in the format imagePath|tweet-ID timestamp
should be created by other means. An example list is
te-images/pbs.twimg.com_media_CupTGBlWcAA-yzz.jpg|786532479343599620 2016-10-13T13:42:10
te-images/pbs.twimg.com_media_CFC9E7bVEAAa62-.png|599186643854204928 2015-05-15T14:16:42
te-images/pbs.twimg.com_media_CJiP_t_XAAApWE6.jpg|619403274597363712 2015-07-10T09:10:23
With this list, juxta should be called with
TEMPLATE=demo_twitter.template.html RAW_W=2 RAW_H=2 THREADS=3 INCLUDE_ORIGIN=false ./ mylist.dat twitter_collage
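Since a hand-made list is easy to get wrong, it can be worth checking its format before starting a long run. The sketch below counts the lines that do not match the entry shape seen in the example list; the pattern (including the timestamp layout with seconds) and the function name are assumptions based on those three example lines.

```shell
# Assumed entry shape, taken from the example list:
#   imagePath|tweet-ID YYYY-MM-DDThh:mm:ss
ENTRY_PATTERN='^[^|]+\|[0-9]+ [0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}$'

# Hypothetical helper: print the number of lines that do not match the pattern.
count_invalid_entries() {
  # grep -c exits non-zero when the count is 0, hence the "|| true"
  grep -c -v -E "$ENTRY_PATTERN" "$1" || true
}

# Usage:
#   count_invalid_entries mylist.dat   # 0 means every line has the expected shape
```

This only checks the shape of each line, not whether the image files exist.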