Pronoun/determinative/possessive lemmas #128
Comments
Possible solution that would minimize manual annotation effort:
I guess I'm not following. You write, "We should standardize these and enforce in the validator. As is, e.g. "its" is sometimes lemmatized as "it"." That seems fine. Is the issue that it is only sometimes lemmatized as "it"? Or is there some reason it shouldn't ever be lemmatized?
A lemma is only sometimes provided explicitly for "its"; the annotations are inconsistent across files. We have to decide: (1) For pronouns and determinatives, which are a closed set, do we want to ask annotators to specify the lemma explicitly in the .cgel file, or compute it automatically as part of the API? (2) If their lemmas are specified explicitly, do we want to be compatible with UD lemmas?
OK, I get it.
A reminder to myself that we DO want hand-specified lemmas not just for nouns and verbs, but also for adjectives/adverbs inflected for grade (comparative/superlative). Coordinators, subordinators, and prepositions are not normally expected to inflect or have lemmas, though it is conceivable in cases of spelling variation ("&" / "and", "@" / "at", etc.). Or the non-abbreviated form could be indicated as the
We should standardize these and enforce in the validator. As is, e.g. "its" is sometimes lemmatized as "it".
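To make the inconsistency concrete, here is a minimal sketch of the kind of check a validator could run. It assumes tokens are already available as (form, lemma) pairs, with None where no lemma was annotated; the function name and input shape are hypothetical and not the actual CGELBank API.

```python
from collections import defaultdict


def find_inconsistent_lemmas(tokens):
    """Return word forms that occur with more than one lemma value.

    `tokens` is an iterable of (form, lemma) pairs, where lemma is None
    when the annotation does not specify one. This input shape is an
    assumption for illustration, not the real tree format.
    """
    seen = defaultdict(set)
    for form, lemma in tokens:
        # Compare case-insensitively so "Its" and "its" count as one form.
        seen[form.lower()].add(lemma)
    return {form: lemmas for form, lemmas in seen.items() if len(lemmas) > 1}


# Hypothetical sample: "its" has an explicit lemma in some places only.
sample = [("its", "it"), ("its", None), ("Its", "it"), ("they", "they")]
inconsistent = find_inconsistent_lemmas(sample)
```

On the sample above, only "its" would be flagged, because it appears both with the lemma "it" and with no lemma at all.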
The UD lemmatization policies have evolved and are summarized here for pronouns. Basically,
(discussion at UniversalDependencies/docs#517)
We could simply adopt the UD policies; or, because they potentially diverge from CGEL at least with regard to possessives, and as pronouns and determinatives are closed classes, we could simply omit the lemmas from the CGELBank trees, and provide a lookup table for anyone who wants them.
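The lookup-table option could look roughly like the sketch below. The table contents are placeholders chosen for illustration only and do not reflect either the UD policy or any CGEL decision; `CLOSED_CLASS_LEMMAS` and `lemma_for` are hypothetical names.

```python
# Placeholder entries only: the actual form-to-lemma mapping is exactly
# what this issue still has to decide.
CLOSED_CLASS_LEMMAS = {
    "its": "it",
    "them": "they",
    "these": "this",
}


def lemma_for(form, explicit_lemma=None):
    """Return the lemma for a pronoun or determinative form.

    An explicitly annotated lemma always wins. Otherwise the form is
    looked up case-insensitively in the closed-class table, and forms
    missing from the table are returned unchanged.
    """
    if explicit_lemma is not None:
        return explicit_lemma
    return CLOSED_CLASS_LEMMAS.get(form.lower(), form)
```

With this design the trees themselves carry no lemma for closed-class items, and anyone who wants UD-compatible lemmas could swap in a different table without touching the annotations.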
Also, for full nouns with a possessive ending, whether that is lemmatized to the non-possessive form should be consistent. (The possessive ending is considered a separate syntactic word in UD, but not in CGEL; in UD-derived data this is made explicit with :subt features.)