
Implement more efficient method of breaking sentences into searchable chunks #2

Open
tbeddy opened this issue May 15, 2018 · 0 comments


tbeddy commented May 15, 2018

Each sentence the user inputs is converted into a list of every possible way to divide the sentence into contiguous groups of words. For example, "I love you" becomes

  • [["I", "love", "you"]]
  • [["I", "love"], ["you"]]
  • [["I"], ["love", "you"]]
  • [["I"], ["love"], ["you"]]

A sentence of n words has 2^(n-1) possible divisions, so once the word count enters the double digits the number of combinations grows at a tremendous rate and the process becomes extremely inefficient (especially in Python). That is why, currently, the user can only enter sentences of 10 words or fewer. The current implementation could probably be tweaked to be somewhat more efficient, but we should avoid brute-force methods anyway. And while the exhaustive list is certainly comprehensive, we shouldn't be searching phrases that are extremely unlikely to have a matching track.
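For reference, here is a minimal sketch of the brute-force enumeration described above (a plain Python generator written for this issue, not necessarily the code currently in the repo), which makes the 2^(n-1) blow-up explicit:

```python
from itertools import combinations

def all_divisions(words):
    """Yield every way to split a word list into contiguous phrases.

    A sentence of n words has 2**(n - 1) divisions, which is why
    this brute-force approach blows up so quickly.
    """
    n = len(words)
    for r in range(n):
        for cuts in combinations(range(1, n), r):
            bounds = [0, *cuts, n]
            yield [words[i:j] for i, j in zip(bounds, bounds[1:])]

# "I love you" -> the four divisions listed above
print(list(all_divisions("I love you".split())))
```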

A natural language processing library such as NLTK could be a useful tool for breaking sentences down into only recognizable phrases, thereby narrowing the list of phrases worth searching.
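One possible direction, sketched very roughly below: chunk the sentence into noun phrases with NLTK's regexp chunker and search only those. The grammar and the phrase types to keep are placeholders and would need tuning, and the tokenizer/tagger models have to be downloaded once before this runs.

```python
import nltk

# One-time downloads: nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger")

def candidate_phrases(sentence):
    """Return only the noun-phrase chunks of a sentence
    instead of every possible division."""
    grammar = "NP: {<DT>?<JJ>*<NN.*>+}"  # optional determiner + adjectives + nouns
    chunker = nltk.RegexpParser(grammar)
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    tree = chunker.parse(tagged)
    return [" ".join(word for word, _ in subtree.leaves())
            for subtree in tree.subtrees(lambda t: t.label() == "NP")]

print(candidate_phrases("I love the bright summer sun"))
# e.g. ['the bright summer sun']
```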
