Fix `KeyError` encountered with some texts #181
@dimkart I've already marked this as ready for review, but we're still waiting for the updated simplebooks coverage test. @nikhilkhatri I've implemented an update to the reader, hence the review request. PTAL! 🙏🏼
reindexer = {}
j = 0
for i, otok in enumerate(parsed_toks):
@neiljdo in what situation are an enumeration and `parsed_tok_indxs` going to disagree? I thought in the tree they're guaranteed to be `[0 ... n-1]`?
@nikhilkhatri it happens when the tree is modified afterwards, e.g. when adding the `''` word for missing nouns. The `''` was never in the original tokens. This is what I saw when investigating the sentence Anna gave. You can use the following example:
text112 = "Gently and Bacchus delve into a world of army secrets when a young former soldier, Scott Tanner, commits a murder in a Turkish bath. Whilst investigating Tanner's history, Gently hears of horrific allegations of what some soldiers have to face from their own side. He is forced to question the uncomfortable truth of what it means to serve one's queen and country, as an event from the past presses on his conscience."
@neiljdo Let's merge after the end of coverage tests.
Fixes #180.
There are two causes of the `KeyError`:
- `break_cycles=True`
- the `''` word placeholder when reindexing. This is fixed by pairing the tokens with their token index, obtained via `tree.get_word_indices()`, instead of just enumerating `tree.get_words()`.
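
For reviewers, a toy sketch of the reindexing mismatch may help. It uses plain lists rather than the real tree API: the two helper functions, the sample words, and the index given to the `''` placeholder are all assumptions made up for illustration, not the code in this PR.

```python
def reindex_by_enumeration(words):
    # Naive approach: assumes the i-th word in the tree is the i-th original token.
    return {i: word for i, word in enumerate(words)}


def reindex_by_token_index(words, word_indices):
    # Paired approach: each word is keyed by its own token index.
    return dict(zip(word_indices, words))


# Hypothetical tree state after a '' placeholder was added for a missing noun.
# Assumption: the placeholder gets a fresh index past the original tokens (3),
# so the tree's word order no longer matches 0 .. n-1.
words = ["soldier", "commits", "", "murder"]
word_indices = [0, 1, 3, 2]

naive = reindex_by_enumeration(words)
paired = reindex_by_token_index(words, word_indices)

# Looking up original token 2 ("murder"):
print(repr(naive[2]))   # the placeholder, because positions shifted
print(repr(paired[2]))  # the intended word
```

In this toy case the naive mapping silently returns the placeholder for token 2. A disagreement of this kind between positions and token indices is what the description above identifies as a cause of the `KeyError`; the exact failure in the real reader depends on code not shown here.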