-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Stanza option to TTR, Entity Grid #13
Conversation
@@ -19,7 +19,7 @@ | |||
change this. | |||
""" | |||
|
|||
SPACY_UNIVERSAL_NOUN_TAGS = set([u'NOUN', u'PRON', u'PROPN']) | |||
UNIVERSAL_NOUN_TAGS = set([u'NOUN', u'PRON', u'PROPN']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to update unit tests as there are ones failing ( Let me know if I can help with this), the rest looks good! Thanks for the PR!
src/TRUNAJOD/entity_grid.py
Outdated
@@ -63,7 +63,7 @@ class EntityGrid(object): | |||
module. It only supports 2-transitions entity grid. | |||
""" | |||
|
|||
def __init__(self, doc): | |||
def __init__(self, doc, model="spacy"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to make easier further improvements to internals, we might create an extra module with the constants, something like:
from enum import Enum
class SupportedModels(str, Enum):
SPACY = "spacy"
STANZA = "stanza"
Then in all instances we could use the enum. For example, the constructor arguments could be:
def __init__(self, doc, model_name="spacy"):
# The following might do error checking on model name, will raise if not in Enum
model = SupportedModels(model_name)
# Checks could be done as
if model == SupportedModels.SPACY:
In that manner, all instances could be replaced by something like model=SupportedModels
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright, I'll look into it
stanza_test.py
Outdated
@@ -0,0 +1,43 @@ | |||
from src.TRUNAJOD.entity_grid import EntityGrid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I made a pull request and some summary:
|
Hello @brucewlee , this looks pretty good. Could I ask just one last thing I forgot to mention previously. Could you install the
Thanks in advance! |
Yup, I've tried this but I'm not sure what it does. Nothing seems to be failing nor changing. |
Okay, then we are good to go I guess. The pre-commit hooks are a set of checks that are run before committing, they just do some checks like running |
I merged! thanks! |
NP :) |
Hello. First of all, thanks for making the amazing repo available.
I made the necessary changes to TTR and Entity Grid files, adding the Stanza option.
The major issue to tackle was the two models' different architectures and how one could access the token class.
I handled the differences in such architectures.
Quite notably, there's now an option in TTR and Entity Grid methods to choose Stanza by passing in (model="Stanza").
The model default is set to spaCy.
The examples are available in stanza_test.py.
spaCy and Stanza results differ by a slight margin.