This program watches a directory and returns the top N ranked files for a given query string.
Term Frequency - Inverse Document Frequency (TF-IDF) is an algorithm for computing the relevance of a word in a file, relative to that file and to the corpus of all the other files in the directory.
The time complexity in the worst case is:
And the space is as an array and a dict of files are stored.
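As an illustration of the ranking idea, here is a minimal sketch of TF-IDF scoring in Python. It is not this program's actual code: the names `tokenize` and `rank` are hypothetical, the files are passed in as an in-memory dict rather than read from a directory, and the weighting (raw term frequency, `log(N / df)` for the inverse document frequency) is only one common variant, assumed here for simplicity.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def rank(docs, query, n):
    """Return the n highest-scoring (name, score) pairs for the query.

    docs is a hypothetical stand-in for the watched directory:
    it maps a file name to that file's text content.
    """
    # Per-file term counts.
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    total_docs = len(docs)
    terms = tokenize(query)

    scores = {}
    for name, counter in counts.items():
        length = sum(counter.values())
        score = 0.0
        for term in terms:
            if length == 0:
                continue  # an empty file cannot match anything
            # Term frequency: share of this file's words that are the term.
            tf = counter[term] / length
            # Document frequency: number of files containing the term.
            df = sum(1 for c in counts.values() if term in c)
            # Inverse document frequency; 0 when no file contains the term.
            idf = math.log(total_docs / df) if df else 0.0
            score += tf * idf
        scores[name] = score

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


docs = {
    "a.txt": "apple banana apple",
    "b.txt": "banana cherry",
    "c.txt": "cherry cherry date",
}
print(rank(docs, "apple", 2))
```

In this sketch, a term that appears in every file gets an inverse document frequency of zero, so it contributes nothing to the ranking; terms concentrated in a few files dominate the score.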
To watch a directory, TFIDF uses the watchdog module.
$ python setup.py install
This will add the tfidf script to PATH. On OSX/UNIX it will be installed in /usr/local/bin.
$ python tfidf.py -d dir -n N -p P -t "terms"
$ python -m unittest discover -s test -t tfidf