Pronoun/determinative/possessive lemmas #128

nschneid · 2024-08-19T02:00:08Z

We should standardize these and enforce in the validator. As is, e.g. "its" is sometimes lemmatized as "it".

The UD lemmatization policies have evolved and are summarized here for pronouns. Basically,

- in the personal pronouns, accusative pronouns are mapped to nominative and independent possessives are mapped to dependent possessives
- "whom" should be mapped to "who" and "whomever" to "whoever"
- in demonstratives (which in CGEL are always determinatives), plurals are mapped to the singular lemma
- the article "an" is mapped to "a"

(discussion at UniversalDependencies/docs#517)

We could simply adopt the UD policies; or, because they potentially diverge from CGEL at least with regard to possessives, and as pronouns and determinatives are closed classes, we could simply omit the lemmas from the CGELBank trees, and provide a lookup table for anyone who wants them.
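A lookup table along these lines could be quite small. The sketch below is a hypothetical illustration in Python, not part of cgel.py: the table names and the `ud_style_lemma` function are assumptions, and the entries cover only the four rules listed above. A `possessive` flag is included because "her" is both an accusative form and a dependent possessive form, so the surface form alone does not determine the lemma.

```python
# Hypothetical tables; each one mirrors one of the rules listed above.
ACCUSATIVE_TO_NOMINATIVE = {
    "me": "I", "him": "he", "her": "she", "us": "we", "them": "they",
    "whom": "who", "whomever": "whoever",
}
INDEPENDENT_TO_DEPENDENT = {
    "mine": "my", "yours": "your", "hers": "her",
    "ours": "our", "theirs": "their",
}
PLURAL_TO_SINGULAR = {"these": "this", "those": "that"}
ARTICLES = {"an": "a"}


def ud_style_lemma(form: str, possessive: bool = False) -> str:
    """Return a UD-style lemma for a pronoun or determinative form.

    `possessive` says whether the token is a possessive form; it is an
    assumption of this sketch that the caller knows this from the tree.
    """
    key = form.lower()
    if possessive:
        # Dependent possessives ("my", "her", "its") are their own lemma.
        return INDEPENDENT_TO_DEPENDENT.get(key, key)
    for table in (ACCUSATIVE_TO_NOMINATIVE, PLURAL_TO_SINGULAR, ARTICLES):
        if key in table:
            return table[key]
    # Keep the conventional capitalization of "I"; lowercase everything else.
    return "I" if key == "i" else key
```

For example, `ud_style_lemma("her")` gives the nominative "she", while `ud_style_lemma("her", possessive=True)` leaves "her" unchanged.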

Also, for full nouns with a possessive ending, we should be consistent about whether the lemma is the non-possessive form. (The possessive ending is considered a separate syntactic word in UD, but not in CGEL; in UD-derived data this is made explicit with :subt features.)


nschneid · 2024-08-19T02:15:26Z

Possible solution that would minimize manual annotation effort:

- N_pro and D nodes never receive an :l feature in the .cgel file, whether the node has a standard form or a :correct feature indicating the standard form
- possessive Ns (e.g. "John's", "store's") do receive an :l with the genitive ending removed (and with plural converted to singular)
- the cgel.py API provides access to several attributes:
  - the raw :l annotation (if present)
  - the raw :correct annotation (if present)
  - the :l annotation if present, else the :correct annotation if present, else the surface form
  - the udlemma, which is a string or list of strings additionally incorporating PRON/DET/possessive lemmatization and tokenization per UD guidelines
  - the cgellemma, which is a string providing a lemma per CGEL guidelines (normalizing across all pronoun cases including genitive, and removing rather than splitting off s-genitive endings)
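To make the fallback order and the "removing rather than splitting off" distinction concrete, here is a minimal Python sketch. It is not the actual cgel.py code: the `Node` class, its field names, the `genitive` flag, and the `genitive_parts` helper are all hypothetical stand-ins, and how a tree actually marks a noun as genitive is left open.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a lexical node; not the real cgel.py class."""
    text: str                      # surface form
    lemma: Optional[str] = None    # raw :l annotation, if present
    correct: Optional[str] = None  # raw :correct annotation, if present
    genitive: bool = False         # assumed flag: the form is an s-genitive noun

    @property
    def base_form(self) -> str:
        """:l if present, else :correct if present, else the surface form."""
        if self.lemma is not None:
            return self.lemma
        if self.correct is not None:
            return self.correct
        return self.text

    def genitive_parts(self) -> Tuple[str, str]:
        """Split the standard form into (stem, ending).

        The ending is "" unless the node is flagged as an s-genitive.
        """
        form = self.correct if self.correct is not None else self.text
        if self.genitive:
            # Try the longer endings first; both straight and curly apostrophes.
            for ending in ("'s", "\u2019s", "'", "\u2019"):
                if form.endswith(ending) and len(form) > len(ending):
                    return form[: -len(ending)], ending
        return form, ""


john = Node("John's", lemma="John", genitive=True)
stem, ending = john.genitive_parts()
print(john.base_form, stem, ending)
```

Under these assumptions, a CGEL-style lemma would simply discard the ending returned by `genitive_parts`, whereas a UD-style result would keep it as a second item alongside the lemma.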

BrettRey · 2024-08-19T10:37:07Z

I guess I'm not following. You write, "We should standardize these and enforce in the validator. As is, e.g. "its" is sometimes lemmatized as "it"." That seems fine. Is the issue that it is only sometimes lemmatized as "it"? Or is there some reason it shouldn't ever be lemmatized?

nschneid · 2024-08-19T15:14:11Z

A lemma is only sometimes provided explicitly for "its"—the annotations are inconsistent across files.

We have to decide: (1) For pronouns and determinatives, which are a closed set, do we want to ask annotators to specify the lemma explicitly in the .cgel file, or compute it automatically as part of the API? (2) If their lemmas are specified explicitly, do we want to be compatible with UD lemmas?

BrettRey · 2024-08-19T15:53:11Z

OK, I get it.

I see no need for annotators to specify the lemmas, but it would be good if they were computed automatically.
I don't have a strong opinion on UD compatibility.

nschneid · 2024-09-08T19:35:17Z

A reminder to myself that we DO want hand-specified lemmas not just for nouns and verbs, but also adjectives/adverbs inflected for grade (comparative/superlative).

Coordinators, subordinators, and prepositions are not normally expected to inflect or have lemmas, though a lemma is conceivable in cases of spelling variation ("&" / "and", "@" / "at", etc.). Alternatively, the non-abbreviated form could be indicated as the :correct form.
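A validator check for where explicit :l annotations are unwanted might look roughly like the following. This is only a sketch under stated assumptions: the function name `lemma_warnings` is made up, the category label strings are guesses rather than the labels the validator actually uses, and it deliberately does not check for missing lemmas on nouns, verbs, or graded adjectives/adverbs.

```python
from typing import List, Optional

# Category labels below are assumptions made for this sketch.
COMPUTED_BY_API = {"N_pro", "D"}                   # lemma derived automatically
NOT_NORMALLY_LEMMATIZED = {"Coordinator", "Sdr", "P"}


def lemma_warnings(category: str, lemma: Optional[str]) -> List[str]:
    """Return warnings about an explicit :l annotation on a lexical node."""
    if lemma is None:
        # Nothing explicit to complain about.
        return []
    if category in COMPUTED_BY_API:
        return [f"{category}: explicit :l '{lemma}' found, "
                "but the lemma should be computed"]
    if category in NOT_NORMALLY_LEMMATIZED:
        return [f"{category}: unexpected :l '{lemma}'; "
                "consider :correct for spelling variants"]
    return []


for warning in lemma_warnings("N_pro", "it"):
    print(warning)
```

Returning a list of messages rather than raising keeps the check usable as a soft warning, which seems appropriate while the policy is still being decided.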

nschneid mentioned this issue Aug 19, 2024

Update 13-4.cgel nert-nlp/legal-cgel#26

Merged
