Today when a new word is learned, Varnam does the following:
Identifies all possible patterns
Sometimes there are too many patterns, so it stops after a limit and skips the rest
All the patterns and word prefixes are stored in the learnings file
Varnam stores patterns and words in different schemas
When transliterating, Varnam looks up the patterns table and performs the transliteration
This is inefficient for the following reasons:
More storage is used because all the patterns are persisted
Some patterns are skipped to restrict disk usage. The skipped patterns could be important ones
Learned data is not reusable across different schemes in the same language. For example, if someone uses `ml-phonetic` and `ml-inscript`, the learned data has to be stored separately for each scheme
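To make the storage problem concrete, here is a minimal sketch of pattern enumeration with a cap. The spellings in `TOKEN_SPELLINGS` and the function `enumerate_patterns` are made up for illustration and are not Varnam's actual scheme data or API; the point is only that the number of patterns is the product of the alternatives per token, so a limit necessarily drops some of them.

```python
from itertools import islice, product

# Hypothetical romanizations for each token of one learned word.
# These are illustrative values, not Varnam's real symbol table.
TOKEN_SPELLINGS = [
    ["ma", "mo"],
    ["la", "le", "lla"],
    ["yaa", "ya"],
    ["lam", "lham", "llam"],
]


def enumerate_patterns(token_spellings, limit):
    """Return (patterns kept, total patterns that exist) for one word."""
    total = 1
    for alternatives in token_spellings:
        total *= len(alternatives)
    # Every combination of one spelling per token is a pattern;
    # only the first `limit` combinations are kept.
    combos = product(*token_spellings)
    kept = ["".join(combo) for combo in islice(combos, limit)]
    return kept, total


patterns, total = enumerate_patterns(TOKEN_SPELLINGS, limit=10)
skipped = total - len(patterns)
print(f"kept {len(patterns)} of {total} patterns, skipped {skipped}")
```

With four tokens and two or three spellings each, one word already yields 36 patterns, and every one that is kept is a row on disk. Which patterns survive the cap depends only on enumeration order, not on how likely a user is to type them.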
The following points have to be considered when attempting to solve this:
Performance of `transliterate` has to be really good. With this change, `transliterate` will have to do more work in terms of tokenizing and finding all possible paths, so there is a risk of introducing performance issues. Consider an in-memory data store, constant-time lookups, etc.
A new data structure has to be designed to persist the learned data. It has to be efficient in both space and computation
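One possible direction, sketched under stated assumptions: store each learned word once as a sequence of target-script tokens in an in-memory trie, and have `transliterate` tokenize the input with the active scheme while walking the trie. Everything below (`TrieNode`, `LearnedWords`, the `transliterate` signature, and the two toy scheme tables) is hypothetical and is not Varnam's existing API or real scheme data; it only illustrates how the same learned data could serve more than one scheme.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # target token -> TrieNode, dict lookup per step
        self.is_word = False


class LearnedWords:
    """Scheme-independent store: each word is kept once, as target tokens."""

    def __init__(self):
        self.root = TrieNode()

    def learn(self, tokens):
        node = self.root
        for token in tokens:
            node = node.children.setdefault(token, TrieNode())
        node.is_word = True


def transliterate(text, scheme, store):
    """Return learned words reachable by tokenizing `text` with `scheme`."""
    results = []

    def walk(pos, node, out):
        if pos == len(text):
            if node.is_word:
                results.append("".join(out))
            return
        for pattern, targets in scheme.items():
            if text.startswith(pattern, pos):
                for target in targets:
                    child = node.children.get(target)
                    if child is not None:  # prune paths no learned word follows
                        walk(pos + len(pattern), child, out + [target])

    walk(0, store.root, [])
    return results


# Toy scheme tables (assumed for illustration): input pattern -> target tokens.
PHONETIC = {
    "ma": ["മ"],
    "la": ["ല", "ള"],
    "ya": ["യ", "യാ"],
    "yaa": ["യാ"],
    "lam": ["ലം", "ളം"],
}
INSCRIPT = {"c": ["മ"], "n": ["ല"], "/e": ["യാ"], "Nx": ["ളം"]}

WORD_TOKENS = ["മ", "ല", "യാ", "ളം"]
store = LearnedWords()
store.learn(WORD_TOKENS)  # learned once, used by both schemes

from_phonetic = transliterate("malayalam", PHONETIC, store)
from_inscript = transliterate("cn/eNx", INSCRIPT, store)
print(from_phonetic, from_inscript)
```

In this sketch the trie prunes most tokenization paths early, because a path is abandoned as soon as no learned word continues with that token. The loop over every scheme pattern at each position is the naive part; a real implementation would likely index the scheme's patterns as well, and would still need a compact on-disk form for the trie.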