In https://github.com/mattburns/exiftool.js-test/blob/master/test.js#L66 you invoke a new instance of exiftool for every image found. This is inefficient, since starting exiftool carries a large overhead (Perl interpreter startup, module loading, ...) and we pay that cost again for every sample image.
Better approaches would be
a) invoke exiftool once and let it do the batch processing (e.g. exiftool *.jpg -w .jpg.json) - might require some refactoring in the report generation
b) use the -stay_open option of exiftool together with an ARGFILE where we write the commands to run on each image. Here exiftool stays in memory and executes the commands written to the ARGFILE until we write a terminate command there.
Both approaches can bring speedups of up to 60 times compared to single-command invocation. Approach b) could perform even better, since we can prefork multiple "daemonized" instances of exiftool and share the work between them.
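To make approach b) concrete, here is a minimal Python sketch of the stay_open protocol. It assumes exiftool is on the PATH and reads its arguments from stdin (`-@ -`) rather than from an on-disk ARGFILE. The names `build_request`, `read_until_ready` and `ExifToolSession` are made up for illustration and are not part of exiftool or of test.js.

```python
import json
import subprocess

READY = b"{ready}"  # marker exiftool prints once an -execute has finished


def build_request(path):
    """Arguments for one image, one per line, terminated by -execute."""
    return "\n".join(["-json", path, "-execute"]) + "\n"


def read_until_ready(stream):
    """Collect output lines from a binary stream until the {ready} marker."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise RuntimeError("stream ended before the {ready} marker")
        if line.strip() == READY:
            return b"".join(lines)
        lines.append(line)


class ExifToolSession:
    """Keeps one exiftool process alive and reuses it for many images."""

    def __init__(self, executable="exiftool"):  # assumption: exiftool on PATH
        self._proc = subprocess.Popen(
            [executable, "-stay_open", "True", "-@", "-"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )

    def metadata(self, path):
        self._proc.stdin.write(build_request(path).encode("utf-8"))
        self._proc.stdin.flush()
        raw = read_until_ready(self._proc.stdout)
        # Errors go to stderr, so stdout can be empty for unreadable files.
        return json.loads(raw.decode("utf-8")) if raw.strip() else []

    def close(self):
        # This is the "terminate command": switch stay_open off again.
        self._proc.stdin.write(b"-stay_open\nFalse\n")
        self._proc.stdin.flush()
        self._proc.wait()
```

Usage would look like `session = ExifToolSession()`, then `session.metadata("some.jpg")` for each image, then `session.close()` at the end. The interpreter and modules are loaded only once, which is where the saving comes from.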
I evaluated the performance of the different exiftool invocation options using pyexiftool, since it already has built-in support for exiftool's faster stay_open mode.
I compared the following scenarios:
invoking one exiftool instance per image
exiftool's internal batch execution
"external" batch execution using stay_open mode
"external" batch execution with preforking multiple exiftool instances (multiprocessing.Pool in Python)
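For reference, the preforking scenario can be sketched roughly like this. This is not the exact benchmark code: to keep it short, each worker here runs one internal batch invocation (`exiftool -json file1 file2 ...`) on its share of the files instead of holding a stay_open instance. `chunked`, `run_batch` and `extract_parallel` are hypothetical helper names, and exiftool is assumed to be on the PATH.

```python
import json
import subprocess
from multiprocessing import Pool


def chunked(items, n):
    """Split items into at most n interleaved, non-empty chunks."""
    return [items[i::n] for i in range(n) if items[i::n]]


def run_batch(paths):
    """Run one exiftool process over several files and parse its JSON array."""
    result = subprocess.run(
        ["exiftool", "-json", *paths],  # assumption: exiftool on PATH
        capture_output=True,
        check=False,  # exiftool may exit non-zero if a single file fails
    )
    return json.loads(result.stdout) if result.stdout.strip() else []


def extract_parallel(paths, workers=4):
    """Share the file list between worker processes and merge the results."""
    with Pool(workers) as pool:
        batches = pool.map(run_batch, chunked(paths, workers))
    return [entry for batch in batches for entry in batch]
```

On platforms that spawn worker processes, `extract_parallel` has to be called from under an `if __name__ == "__main__":` guard.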
My results for the 20 sample images from the Acer directory:
Exiftool no batch took 6.37464756469 sec
Exiftool internal batch took 0.590772722123 sec
Exiftool Stay Open/External batch took 0.575033621959 sec
Exiftool multiprocessing batch took 0.64755278114 sec
For the more complex sample images (more tags to decode) from the Nikon directory:
Exiftool no batch took 80.8621684399 sec
Exiftool internal batch took 3.93503120808 sec
Exiftool Stay Open/External batch took 4.23961249768 sec
Exiftool multiprocessing batch took 4.3239334162 sec
It turns out that using one of the batch modes can bring a 10-20 times speedup, while multiprocessing is actually a bit slower (possibly because exiftool is mostly IO-bound). Note that these numbers might differ for node.js, since it is asynchronous by default, while Python is synchronous.
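The speedup factors can be recomputed from the timings listed above with a few lines of Python. The dictionary keys are just labels I chose for the four scenarios.

```python
# Timings in seconds, copied from the measurements above.
timings = {
    "Acer": {
        "no batch": 6.37464756469,
        "internal batch": 0.590772722123,
        "stay_open batch": 0.575033621959,
        "multiprocessing batch": 0.64755278114,
    },
    "Nikon": {
        "no batch": 80.8621684399,
        "internal batch": 3.93503120808,
        "stay_open batch": 4.23961249768,
        "multiprocessing batch": 4.3239334162,
    },
}


def speedups(results, baseline="no batch"):
    """Speedup of every scenario relative to the baseline scenario."""
    base = results[baseline]
    return {
        name: base / seconds
        for name, seconds in results.items()
        if name != baseline
    }


for directory, results in timings.items():
    for name, factor in speedups(results).items():
        print(f"{directory:5s} {name:22s} {factor:5.1f}x")
```

The factors come out at roughly 10 for the Acer directory and roughly 20 for the Nikon directory, with the multiprocessing variant the lowest in both cases.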
Conclusion: it definitely makes sense to use the exiftool stay_open mode in the node.js test scripts instead of firing up one instance per image.