My assumptions: Very large inputs will be served only from a file. Even a large text (for instance, 10 GB) may contain only unique words. We want to store and compare words written in lowercase and without punctuation marks.
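
As a rough illustration of these assumptions, here is a minimal Python sketch that streams a file line by line (so the whole input never has to sit in memory) and yields each word in lowercase with punctuation removed. The names `normalize`, `iter_words`, and `iter_words_from_file` are hypothetical, and treating "punctuation" as the ASCII set in `string.punctuation` and splitting on whitespace are assumptions of this sketch, not something stated above.

```python
import string
from typing import Iterable, Iterator

# Assumption: "punctuation" means the ASCII characters in string.punctuation.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(token: str) -> str:
    """Return the token in lowercase with ASCII punctuation removed."""
    return token.translate(_STRIP_PUNCT).lower()


def iter_words(lines: Iterable[str]) -> Iterator[str]:
    """Yield normalized words from an iterable of text lines."""
    for line in lines:
        # Assumption: words are separated by whitespace.
        for token in line.split():
            word = normalize(token)
            if word:  # skip tokens that were punctuation only
                yield word


def iter_words_from_file(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Stream normalized words from a file, reading one line at a time."""
    with open(path, encoding=encoding) as handle:
        yield from iter_words(handle)
```

Because the text may consist entirely of unique words, collecting the output into a set could grow to roughly the size of the input, so the generator form leaves the storage decision to the caller.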