Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix KeyError encountered with some texts #181

Merged
merged 2 commits into from
Nov 14, 2024
Merged

Conversation

neiljdo
Copy link
Collaborator

@neiljdo neiljdo commented Nov 13, 2024

Fixes #180.

There are two causes to the KeyError:

  • Self-cycles causing entire branches to be pruned when break_cycles=True
  • Wrong index for the '' word placeholder when reindexing. This is fixed by pairing the tokens with their token index, obtained via tree.get_word_indices(), instead of just enumerating tree.get_words()

@neiljdo neiljdo changed the title Fix cycle-breaking when tree has self-cycle Fix KeyError encountered with some texts Nov 13, 2024
@neiljdo neiljdo marked this pull request as ready for review November 13, 2024 16:48
@neiljdo
Copy link
Collaborator Author

neiljdo commented Nov 13, 2024

@dimkart I've already marked this as ready for review but we're still waiting for the updated simplebooks coverage test.

@nikhilkhatri I've implemented an update to the reader, hence the review request. PTAL! 🙏🏼

reindexer = {}
j = 0
for i, otok in enumerate(parsed_toks):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neiljdo in what situation are an enumeration and parsed_tok_indxs going to disagree? I thought in the tree they're guaranteed to be [0 ... n-1]?

Copy link
Collaborator Author

@neiljdo neiljdo Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikhilkhatri it happens when the tree is modified after, e.g. when adding the '' word for missing nouns. The '' was never in the original tokens. These were what I saw when investigating the sentence Anna gave. You can use the following example:

text112 = "Gently and Bacchus delve into a world of army secrets when a young former soldier, Scott Tanner, commits a murder in a Turkish bath. Whilst investigating Tanner's history, Gently hears of horrific allegations of what some soldiers have to face from their own side. He is forced to question the uncomfortable truth of what it means to serve one's queen and country, as an event from the past presses on his conscience."

@dimkart
Copy link
Contributor

dimkart commented Nov 14, 2024

@neiljdo Let's merge after the end of coverage tests.

@neiljdo neiljdo merged commit 19e2312 into CQCL:main Nov 14, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pregroup tree: Bug when breaking cycles
3 participants