
Implement more efficient method of breaking sentences into searchable chunks #2

Open
tbeddy opened this issue May 15, 2018 · 0 comments


tbeddy commented May 15, 2018

Each sentence the user inputs is converted into a list of every possible way to divide the sentence into contiguous groups of words. For example, "I love you" becomes

  • [["I", "love", "you"]]
  • [["I", "love"], ["you"]]
  • [["I"], ["love", "you"]]
  • [["I"], ["love"], ["you"]]

A sentence of n words has 2^(n-1) possible divisions, so once the word count enters the double digits the number of combinations grows at a tremendous rate and the process becomes extremely inefficient (especially in Python). That is why, currently, the user can only enter sentences of 10 words or fewer. The current implementation could probably be tweaked to be somewhat more efficient, but we should avoid brute-force methods anyway. And while the exhaustive list is certainly comprehensive, we shouldn't be searching phrases that are extremely unlikely to have a matching track.
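For reference, here is a minimal sketch of the brute-force enumeration described above (a plain Python generator written for this issue, not necessarily the code currently in the repo), which makes the 2^(n-1) blow-up explicit:

```python
from itertools import combinations

def all_divisions(words):
    """Yield every way to split a word list into contiguous phrases.

    A sentence of n words has 2**(n - 1) divisions, which is why
    this brute-force approach blows up so quickly.
    """
    n = len(words)
    for r in range(n):
        for cuts in combinations(range(1, n), r):
            bounds = [0, *cuts, n]
            yield [words[i:j] for i, j in zip(bounds, bounds[1:])]

# "I love you" -> the four divisions listed above
print(list(all_divisions("I love you".split())))
```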

A natural language processing library such as NLTK could be a useful tool for breaking sentences down into only recognizable phrases, thereby narrowing the list of phrases worth searching.
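One possible direction, sketched very roughly below: chunk the sentence into noun phrases with NLTK's regexp chunker and search only those. The grammar and the phrase types to keep are placeholders and would need tuning, and the tokenizer/tagger models have to be downloaded once before this runs.

```python
import nltk

# One-time downloads: nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger")

def candidate_phrases(sentence):
    """Return only the noun-phrase chunks of a sentence
    instead of every possible division."""
    grammar = "NP: {<DT>?<JJ>*<NN.*>+}"  # optional determiner + adjectives + nouns
    chunker = nltk.RegexpParser(grammar)
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    tree = chunker.parse(tagged)
    return [" ".join(word for word, _ in subtree.leaves())
            for subtree in tree.subtrees(lambda t: t.label() == "NP")]

print(candidate_phrases("I love the bright summer sun"))
# e.g. ['the bright summer sun']
```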
