Call me Ishmael. Dix is a utility for quantifying large amounts of plaintext data using a revolutionary metric: Moby-Dicks.
Have you ever found yourself analyzing text data and thinking, "Wow, this data is BIG. This is some BIG DATA"?
Of course you have. And if you're like us, you're frustrated with the current tools and metrics at your disposal. How do you quantify how big your data is? Bits and bytes and word counts just don't cut it in the fast-moving Data Age.
It's time for a new standard. One that's timeless, yet fully capable of expressing bigness. That's why we created dix.
dix is a command line utility that quantifies the size of plaintext data in relation to Herman Melville's classic novel Moby-Dick; or, The Whale, first published in 1851 and considered to be one of the Great American Novels.
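To make the unit concrete, here is a minimal sketch of how a size in Moby-Dicks could be computed. It is not dix's actual implementation: the function name, the choice of words as the unit, and the reference figure of roughly 210,000 words (which varies by edition) are all assumptions made for illustration.

```python
# Hypothetical reference size: Moby-Dick runs to roughly 210,000 words,
# depending on the edition. The figure dix actually uses may differ.
APPROX_MOBY_DICK_WORDS = 210_000


def moby_dicks(text, reference_words=APPROX_MOBY_DICK_WORDS):
    """Return the size of `text` as a fraction of the reference word count."""
    # Split on any whitespace, which is close to how `wc -w` counts words.
    word_count = len(text.split())
    return word_count / reference_words


if __name__ == "__main__":
    sample = "Call me Ishmael."
    print("{:.6f} Moby-Dicks".format(moby_dicks(sample)))
```

Passing a different `reference_words` value swaps in another unit of comparison without changing the rest of the calculation.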
Prerequisites: Python 2.6+ and wc (which is included on most *nix OSes)
Run sudo pip install dix to install dix from PyPI (dix needs sudo access to set permissions so you can run it from anywhere).
More installation options coming soon.
dix is run from the command line on a plaintext file, as follows.
$> dix text.txt
You can also pipe things into dix if desired:
$> echo "for there is no folly of the beast of the earth..." | dix
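For readers curious how a tool can accept either a file argument or piped input, the following is a rough sketch of that pattern using Python's standard argparse module. It does not reflect how dix itself is written; the `read_input` helper and the `dix-sketch` program name are hypothetical.

```python
import argparse
import io
import sys


def read_input(argv, stdin=sys.stdin):
    """Return text from the named file, or from stdin when no file is given."""
    parser = argparse.ArgumentParser(prog="dix-sketch")  # hypothetical name
    parser.add_argument("path", nargs="?",
                        help="plaintext file; omit to read from stdin")
    args = parser.parse_args(argv)
    if args.path is None:
        # No file argument: consume whatever was piped in.
        return stdin.read()
    with open(args.path, encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # Simulate piped input with an in-memory stream instead of a real pipe.
    piped = io.StringIO("for there is no folly of the beast of the earth...")
    print(len(read_input([], stdin=piped).split()), "words read")
```

Making the positional argument optional with `nargs="?"` keeps a single code path for both invocation styles.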
dix also supports a multitude of options. For example, if you feel bad about the size of your data, choose a smaller unit of comparison:
$> dix --tiny text.txt
You can see all the options and how to use them by calling dix -h.
You can also redirect the output of dix. For example, pipe dix to cowsay for a more pleasing visual experience:
$> curl -s 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvsection=0&titles=Moby-Dick&rvprop=content&format=json' | python -m json.tool | grep "*" | dix | cowsay
 ____________________________________
/ 0.0022 Moby-Dicks                  \
|                                    |
\ You call that BIG data?! Please... /
 ------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
We welcome issues and pull requests if you find problems with dix or want to enhance it! You can also reach its creators at [email protected].