label(s) for underspecified roles #22

Open
jtauber opened this issue Jun 16, 2017 · 2 comments

Comments

@jtauber
Collaborator

jtauber commented Jun 16, 2017

As part of an initial incremental analysis or maybe even as a learning exercise, we may want to:

  • indicate the constituents of a clause without yet assigning them specific roles
  • indicate the complements/arguments (in contrast to adjuncts) of a verb without further refining the particular complement/argument roles (o vs io vs o2 vs pc vs ...)
  • (for those that might want to make a distinction between subjects and other complements/arguments) indicate the non-subject complements/arguments of a verb without further refining the particular roles (o vs io vs o2 vs pc vs ...)

Or put briefly: add a way to just say "this is a constituent" or "this is an argument".
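One way to picture the request is as a hierarchy in which every specific role has a less specific parent it can fall back to. The sketch below is only illustrative: the names `constituent`, `argument`, `non-subject-argument`, `adjunct` and `s` are hypothetical placeholders, and only `o`, `io`, `o2` and `pc` come from the list above.

```python
# Hypothetical label hierarchy: each label maps to its less specific parent.
# Only o, io, o2 and pc are taken from the discussion; every other name
# is a placeholder invented for this sketch.
PARENT = {
    "argument": "constituent",
    "adjunct": "constituent",
    "s": "argument",
    "non-subject-argument": "argument",
    "o": "non-subject-argument",
    "io": "non-subject-argument",
    "o2": "non-subject-argument",
    "pc": "non-subject-argument",
}


def generalizations(label):
    """Return the label followed by every less specific label above it."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_compatible(coarse, fine):
    """True if `fine` is `coarse` itself or a refinement of it."""
    return coarse in generalizations(fine)
```

With a structure like this, an annotation that only says "argument" could later be refined to `o` or `io` without contradicting the earlier pass.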

@rkjtan

rkjtan commented Jun 17, 2017

Below is just a general outline of how I think things might work (it is definitely not complete & may not work in the current form I propose). I lay it out for brainstorming purposes. I'm basically thinking aloud below about how best to have a system that can allow seamless integration between different levels of semi-automated analysis & complete manual analysis, depending on what data one has to work with (especially whether morphological analysis is available or not):

I. Semi-automated analysis followed by manual correction/supplementation
If one starts with an initial pass using automated parsing with morphology, the broad requirements are as listed below (more details would need to be specified to make the grammar for the parser work):

Step 1: Analyze into separate predications
Every verb is the core of a minimal predication
Proposed convention: automatically labeled, but hidden = P
Automatically labeled, but revealed = V
Verb type also automatically labeled (if morphological parsing available) = Indicative, Imperative, Subjunctive, Optative, Participle, Infinitive

Step 2: Determine components belonging to each core predication
A conjunction that separates two verbs is used to divide the surrounding components according to which of the two verbs each more likely belongs to (conjunctions between words or phrases typically conjoin words of the same word class & same case, most often nouns with nouns & adjectives with adjectives)
Any verb forms a new core predication, whether preceded by a conjunction or not
Components before the conjunction belong to the verb before the conjunction & components after the conjunction belong to the verb after the conjunction (if no conjunction, likewise by default put components before a second verb with the first of the two verbs & components after the second verb with the second of the two verbs)
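
As a rough sketch of the Step 2 rules (not a worked-out parser), the splitting could look like the following. It assumes tokens arrive as hypothetical `(word, pos)` pairs and that the tag `"conj"` is used only for conjunctions that separate verbs; phrase-level conjunctions (noun with noun, adjective with adjective) are assumed to carry a different tag. The tag names and the function name are placeholders.

```python
def split_predications(tokens):
    """Group (word, pos) pairs into predications, one per verb.

    A verb-separating conjunction closes the current predication and opens
    the next one. A second verb with no conjunction before it also opens a
    new predication, so everything before it stays with the first verb.
    """
    predications = []
    current = []
    has_verb = False
    for word, pos in tokens:
        if pos == "conj" and has_verb:
            # Components before the conjunction stay with the earlier verb.
            predications.append(current)
            current, has_verb = [word], False
        elif pos == "verb" and has_verb:
            # No conjunction: the second verb starts its own predication.
            predications.append(current)
            current, has_verb = [word], True
        else:
            current.append(word)
            if pos == "verb":
                has_verb = True
    if current:
        predications.append(current)
    return predications
```

For example, `split_predications([("a", "noun"), ("v1", "verb"), ("b", "noun"), ("v2", "verb"), ("c", "noun")])` groups `a` and `b` with `v1` and `c` with `v2`, which is the default described for the no-conjunction case.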

Step 3: Determine typical phrase level structure (many exceptions, but try to capture as many as possible automatically)
Adjectives adjectivally modify immediately adjacent nouns that match case, gender, number
Genitive nouns restrict non-genitive nouns that immediately precede
Articles modify immediately following nouns (or noun phrases, if adjectives & genitives already attached to noun) or adjectives that match case, gender, number
Nouns (or noun phrases, if adjectives & genitives already attached to noun) are in apposition to immediately following nouns (or noun phrases, if adjectives & genitives already attached to noun) that match case, gender, number
Non-nominative nouns (or noun phrases, if adjectives & genitives already attached to noun) are objects to prepositions that immediately precede
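
A few of these adjacency rules can be sketched in code, under heavy simplifying assumptions: it only looks at single adjacent words (not already-built noun phrases), it covers only the adjective, genitive and preposition rules, and the `Token` fields and relation names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    pos: str          # e.g. "noun", "adj", "prep" (placeholder tag names)
    case: str = ""    # e.g. "nom", "gen", "dat", "acc"
    gender: str = ""
    number: str = ""


def agrees(a, b):
    """True if two tokens match in case, gender and number."""
    return (a.case, a.gender, a.number) == (b.case, b.gender, b.number)


def phrase_links(tokens):
    """Return (dependent_index, head_index, relation) for adjacent pairs."""
    links = []
    for i in range(len(tokens) - 1):
        left, right = tokens[i], tokens[i + 1]
        if left.pos == "adj" and right.pos == "noun" and agrees(left, right):
            links.append((i, i + 1, "modifies"))
        elif left.pos == "noun" and right.pos == "adj" and agrees(left, right):
            links.append((i + 1, i, "modifies"))
        elif (left.pos == "noun" and right.pos == "noun"
              and left.case != "gen" and right.case == "gen"):
            # A genitive noun restricts the non-genitive noun before it.
            links.append((i + 1, i, "restricts"))
        elif left.pos == "prep" and right.pos == "noun" and right.case != "nom":
            # A non-nominative noun is the object of the preceding preposition.
            links.append((i + 1, i, "object-of"))
    return links
```

Because many exceptions are expected, output like this would presumably be treated as a first guess for manual correction rather than a final analysis.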

Step 4: Determine the presence of subjects (if any)
Nominative nouns or pronouns that belong to the same predication as a verb & match that verb in person & number are automatically labeled S
Special case 1: If two nominative nouns (or noun phrases, if adjectives & genitives already attached to noun) form the two arguments of the verb & the verb is a "to be" verb, one is S & one is the predicate complement
Special case 2: If two nominative nouns (or noun phrases, if adjectives & genitives already attached to noun) are apparently separated into a predication without a verb to form the core predication, one is S & one is the predicate complement
Special case 3: If an accusative noun is immediately adjacent to an infinitive, it may be the subject of the infinitival clause

Step 5: Analyze into arguments & adjuncts
Prepositional phrases automatically labeled P (hidden) adjuncts
Most accusative nouns (or noun phrases, if adjectives & genitives already attached to noun) automatically labeled P (hidden) complement (hidden) patient/direct object
Most dative nouns (or noun phrases, if adjectives & genitives already attached to noun) automatically labeled P (hidden) complement (hidden) recipient/indirect object
Ask whether animate or inanimate--if inanimate, dative noun automatically switches to P (hidden) adjuncts instead
Ask whether location or time--if location or time, dative or accusative noun automatically switches to P (hidden) adjuncts locative/temporal (according to whether location or time is indicated) instead
If a predication has either a complement (hidden) patient/direct object or a complement (hidden) recipient/indirect object, the verb in the predication gets the additional label transitive; all other verbs automatically get labeled intransitive
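
The default subject rule and the argument/adjunct rules of the last two steps could be sketched roughly as below. This is only an illustration: the dictionary keys (`case`, `person`, `number`, `is_pp`, `animate`, `kind`) are hypothetical, the "ask whether animate / location / time" questions are modeled as fields that have already been answered, animacy defaults to true when unanswered, and the special cases for subjects are not handled.

```python
def label_subjects(verb, components):
    """Label nominatives agreeing with the verb in person and number as S."""
    return [
        (c["form"], "S")
        for c in components
        if c.get("case") == "nom"
        and c.get("person") == verb["person"]
        and c.get("number") == verb["number"]
    ]


def classify(component):
    """Return the label path for a component, or None if no rule applies."""
    case = component.get("case")
    if component.get("is_pp"):
        return ("P", "adjunct")
    if case in ("acc", "dat") and component.get("kind") in ("location", "time"):
        sub = "locative" if component["kind"] == "location" else "temporal"
        return ("P", "adjunct", sub)
    if case == "dat":
        # Assumption: treat the dative as animate unless told otherwise.
        if not component.get("animate", True):
            return ("P", "adjunct")
        return ("P", "complement", "recipient/indirect object")
    if case == "acc":
        return ("P", "complement", "patient/direct object")
    return None


def transitivity(components):
    """A verb is transitive if its predication contains any complement."""
    for c in components:
        label = classify(c)
        if label is not None and label[1] == "complement":
            return "transitive"
    return "intransitive"
```

Since the rules say "most" accusatives and datives, anything this produces would still need to be open to manual override.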

II. Manual analysis with semi-automated assistance
If one is proceeding manually, then the higher labels like S, V, Patient/Direct Object, Recipient/Indirect Object would imply the lower hidden labels, which would be added in automatically. Maybe as a check on manual analysis, a more sophisticated editor tool could ask questions like "animate?" or "accusative?" when the user tries to label something as Patient/Direct Object. If the answer is no, the user can indicate it is an exception & maybe even add a notation on what/why. (Likewise with Recipient/Indirect Object on whether animate or dative.) If a verb has a Patient/Direct Object or Recipient/Indirect Object, it is automatically labeled transitive. Users can also choose to change verbs automatically labeled intransitive to transitive & to indicate an elided Patient/Direct Object &/or Recipient/Indirect Object.

Users can use any tool to build their files & the semi-automated assistance could come during the process of annotation (if using a more sophisticated editor tool) or post-processing (provided the core labels are consistently applied, the additional data can be automatically added in by a post-processing script).
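
A post-processing script of the kind mentioned here might look something like this minimal sketch. The input shape (a list of `(text, label)` pairs per predication) and the function name are hypothetical, and the assumption that S and V imply only the hidden P label is a guess rather than something settled above.

```python
# Hidden labels implied by each manually applied label (assumed mapping).
IMPLIED = {
    "S": ["P"],
    "V": ["P"],
    "Patient/Direct Object": ["P", "complement"],
    "Recipient/Indirect Object": ["P", "complement"],
}

OBJECT_LABELS = {"Patient/Direct Object", "Recipient/Indirect Object"}


def add_implied_labels(predication):
    """Expand (text, label) pairs into (text, [hidden labels..., label]).

    The verb additionally receives "transitive" if the predication contains
    a direct or indirect object, and "intransitive" otherwise.
    """
    transitive = any(label in OBJECT_LABELS for _, label in predication)
    result = []
    for text, label in predication:
        labels = IMPLIED.get(label, []) + [label]
        if label == "V":
            labels.append("transitive" if transitive else "intransitive")
        result.append((text, labels))
    return result
```

This only works if the core labels are applied consistently, as noted above; a label the script does not recognize is passed through unchanged.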

@jonathanrobie
Member

I opened a new issue (#23) for the editing environment Randall describes here.
