Optimize varnam_learn #141

Open
navaneeth opened this issue Feb 11, 2017 · 0 comments

Today, when a new word is learned, Varnam does the following:

  • Identifies all possible patterns for the word
  • When there are too many patterns, stops at a limit and skips the rest
  • Stores the retained patterns and the word prefixes in the learnings file
  • Stores patterns and words in different schema
  • When transliterating, looks at the patterns table and performs the transliteration
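
To make the first two steps concrete, here is a minimal sketch of how the number of patterns multiplies with word length and how a limit cuts the list off. The token table, the `enumerate_patterns` helper, and the limit of 10 are all hypothetical and only for illustration; they are not Varnam's actual scheme data or API.

```python
from itertools import islice, product

# Hypothetical table: each token of a word maps to the Latin spellings
# a user might type for it. This is NOT Varnam's real scheme data.
TOKEN_PATTERNS = {
    "മ": ["ma", "mha"],
    "ല": ["la", "lla"],
    "യാ": ["ya", "yaa", "yA"],
    "ളം": ["lam", "Lam", "lham"],
}


def enumerate_patterns(tokens, limit):
    """Return at most `limit` patterns and the total number possible."""
    alternatives = [TOKEN_PATTERNS[t] for t in tokens]

    # The total is the product of the number of alternatives per token.
    total = 1
    for alts in alternatives:
        total *= len(alts)

    # Keep only the first `limit` combinations; the rest are skipped.
    kept = ["".join(parts) for parts in islice(product(*alternatives), limit)]
    return kept, total


tokens = ["മ", "ല", "യാ", "ളം"]  # tokens of the word "മലയാളം"
kept, total = enumerate_patterns(tokens, limit=10)
print(len(kept), "of", total, "patterns kept")
print("mallayaalam" in kept)  # a plausible spelling that falls past the limit
```

Even with two or three spellings per token, a four-token word already has 36 possible patterns, and which ones survive the limit depends only on enumeration order, not on how likely a user is to type them.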

This is inefficient for the following reasons:

  • More storage is used because all the patterns are persisted
  • Some patterns are skipped to restrict disk usage, and the skipped ones could be important
  • Learned data is not reusable across different schemes in the same language. For example, someone who uses both ml-phonetic and ml-inscript has to store the learned data separately for each scheme

The following points have to be considered when attempting to solve this:

  • Transliteration has to stay fast. With this change, transliterate will have to do more work: tokenizing the input and finding all possible paths. That could introduce performance problems, so consider an in-memory data store, constant-time lookups, and similar techniques
  • A new data structure has to be designed to persist the learned data. It has to be efficient in both space and computation
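
One possible direction, sketched below under stated assumptions: store each learned word once per language, keyed by the word itself, and let transliterate expand the typed input into candidate words and check them against that store. The `LearnedWords` class, the `transliterate` function, and both scheme tables are hypothetical names and data invented for this sketch; the hash-based dictionary merely stands in for whatever structure is eventually designed.

```python
from itertools import product


class LearnedWords:
    """Learned words stored once per language, independent of any scheme."""

    def __init__(self):
        self._freq = {}  # word -> times learned; dict lookup is O(1) on average

    def learn(self, word):
        self._freq[word] = self._freq.get(word, 0) + 1

    def frequency(self, word):
        return self._freq.get(word, 0)


# Hypothetical scheme tables: typed token -> possible target tokens.
SCHEME_A = {"ma": ["മ"], "la": ["ല", "ള"], "yaa": ["യാ"], "lam": ["ലം", "ളം"]}
SCHEME_B = {"m": ["മ"], "l": ["ല"], "yA": ["യാ"], "Lam": ["ളം"]}


def transliterate(typed_tokens, scheme, store):
    """Expand typed tokens into candidate words; return the learned ones,
    most frequently learned first."""
    ranked = []
    for parts in product(*(scheme[t] for t in typed_tokens)):
        word = "".join(parts)
        count = store.frequency(word)
        if count:
            ranked.append((count, word))
    ranked.sort(key=lambda cw: -cw[0])
    return [word for _, word in ranked]


store = LearnedWords()
store.learn("മലയാളം")  # learned once, no patterns persisted

print(transliterate(["ma", "la", "yaa", "lam"], SCHEME_A, store))
print(transliterate(["m", "l", "yA", "Lam"], SCHEME_B, store))
```

In this sketch both schemes resolve to the same learned word from a single stored entry, so nothing is duplicated per scheme. The cost moves to transliterate, which now enumerates candidates on every call, so the number of paths explored per input would still need to be bounded.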

navaneeth changed the title from "Optimized learn" to "Optimize varnam_learn" on Feb 11, 2017