Treebank Use Cases: Syntactic Constructs

Jonathan Robie edited this page Jan 17, 2018 · 37 revisions

We are currently evaluating treebanks and treebank models for the Greek New Testament, looking for the best way to create and maintain a set of treebanks. We are using this page to catalog issues that we identify in displaying and querying syntactic constructs, comparing solutions taken by existing treebanks.

For a list of existing treebanks, see Nine Kinds of Ancient Greek Treebanks.

General requirements

We want our syntax trees to satisfy the following requirements:

  • Easy to read
  • Easy to compare multiple interpretations of the same text
  • Can be created and maintained with reasonable effort (either by using something that already exists or leveraging existing tools and datasets)
  • Can quickly be used to create treebanks for new editions
  • Adequately represent the relationships among sentence components
  • Easily queried using languages like XPath / XQuery
  • Easily processed using standard scripting languages and programming languages
  • Suitable for displays that are useful for Bible translation environments, such as interlinear display
  • Suitable for annotation tools

Word order

At least one representation must describe the syntax while presenting the text in sentence order for both queries and display. Note that some of the most popular treebanks do not have this property: preserving sentence order is in tension with the next requirement, constituent structure. See Nine Kinds of Ancient Greek Treebanks for one example of this tension.

Constituent structure

At least one representation must correctly describe the relationships among constituents in a way that is directly accessible to queries (without using recursive algorithms). If more than one representation is needed, all representations should be created from the same source.
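As a minimal sketch of what "directly accessible to queries" could look like, the Python below encodes John 11:35 as nested XML and retrieves the words of the subject with a single path expression. The element names (`sentence`, `clause`, `v`, `s`, `np`, `w`) and the `n` attribute are hypothetical, chosen only for illustration and not taken from any existing treebank. Note also that the standard library's `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical constituent encoding of John 11:35 (labels are illustrative only).
XML = """
<sentence ref="John 11:35">
  <clause>
    <v><w n="1">ἐδάκρυσεν</w></v>
    <s>
      <np>
        <w n="2">ὁ</w>
        <w n="3">Ἰησοῦς</w>
      </np>
    </s>
  </clause>
</sentence>
"""

root = ET.fromstring(XML)

# One path expression reaches every word under the subject, however deeply
# it is nested; the query itself needs no hand-written recursion.
subject_words = [w.text for w in root.findall("./clause/s//w")]

# Words appear in document order, so the text can be read straight off the tree.
sentence_text = " ".join(w.text for w in root.iter("w"))

print(subject_words)
print(sentence_text)
```

In this sketch the same tree serves both requirements for a simple sentence: the nesting carries the constituent structure and the document order carries the word order. Discontinuous constituents are where the two pull apart.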

Returning query results in sentence order

Even if a query ignores sentence order, results must be returnable in sentence order.
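One way to meet this requirement, sketched here under stated assumptions, is to give every word a position attribute and sort query results by it. The dependency-style encoding below, including the `node` element and the `n`, `role`, and `text` attributes, is hypothetical and is not the format of any particular treebank.

```python
import xml.etree.ElementTree as ET

# Hypothetical dependency-style encoding of John 11:35: each node nests under
# its head, so document order differs from sentence order.
XML = """
<sentence ref="John 11:35">
  <node n="1" role="v" text="ἐδάκρυσεν">
    <node n="3" role="s" text="Ἰησοῦς">
      <node n="2" role="det" text="ὁ"/>
    </node>
  </node>
</sentence>
"""


def in_sentence_order(nodes):
    """Return nodes sorted by their (assumed) word-position attribute `n`."""
    return sorted(nodes, key=lambda node: int(node.get("n")))


root = ET.fromstring(XML)

# The query itself ignores sentence order and returns nodes in document order.
hits = root.findall(".//node")
document_order = [int(node.get("n")) for node in hits]

# Sorting on the position attribute restores sentence order for display.
sentence_order = [int(node.get("n")) for node in in_sentence_order(hits)]
sentence_text = " ".join(node.get("text") for node in in_sentence_order(hits))

print(document_order)
print(sentence_order)
print(sentence_text)
```

For results that span several sentences, the sort key would also need a sentence identifier; that is left out of this sketch.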

Discontinuous Constituents

Queries should be able to find discontinuous constituents. They should also be able to query the constituent structure while ignoring discontinuities.

Matt 1:20 τὸ γὰρ ἐν αὐτῇ γεννηθὲν ἐκ πνεύματός ἐστιν ἁγίου·
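A rough sketch of how a discontinuity could be detected, assuming each constituent is represented by the set of word positions it covers: a constituent is discontinuous when those positions do not form an unbroken range. The function names and the choice of ἐκ πνεύματός ... ἁγίου as the constituent are assumptions made for illustration, not an analysis taken from an existing treebank.

```python
# Matt 1:20 as quoted above, split on whitespace with the final punctuation
# removed. Positions are 1-based.
WORDS = ["τὸ", "γὰρ", "ἐν", "αὐτῇ", "γεννηθὲν", "ἐκ", "πνεύματός", "ἐστιν", "ἁγίου"]


def gaps(positions):
    """Return the positions inside the constituent's span that it does not cover."""
    covered = set(positions)
    return [p for p in range(min(covered), max(covered) + 1) if p not in covered]


def is_discontinuous(positions):
    """A constituent is discontinuous if its span contains uncovered positions."""
    return bool(gaps(positions))


def surface(positions):
    """Join the constituent's own words in sentence order, ignoring any gaps."""
    return " ".join(WORDS[p - 1] for p in sorted(set(positions)))


# Hypothetical constituent: the prepositional phrase ἐκ πνεύματός ... ἁγίου.
pp = [6, 7, 9]
interrupting = [WORDS[p - 1] for p in gaps(pp)]

print(is_discontinuous(pp))
print(interrupting)
print(surface(pp))
```

`surface` corresponds to the second requirement above: it reads the constituent as if the interruption were not there, while `gaps` reports what interrupts it.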

Annotation

Words, sentences, references, and intermediate structures must be able to carry IDs that can be used for annotations. All of the treebanks we have looked at have this property.

We need a way to add notes to explain aspects of syntax that are not obvious, like Zerwick's parsing guide does when it points to his intermediate grammar. If the information found in the tree and the morphology is not sufficient to understand the construct, an annotation should provide the additional information. See https://github.com/biblicalhumanities/Nestle1904/issues/11 for an example where this kind of annotation would be helpful.
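One possible shape for such notes is standoff annotation: the notes live outside the tree and point at node IDs. The sketch below assumes that approach; the ID scheme (`Matt.1.20`, `Matt.1.20!w9`), the `Annotation` class, and the note texts are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    target_id: str  # ID of a word, sentence, reference, or intermediate node
    note: str       # free-text explanation, e.g. a pointer to a grammar section


def index_by_target(annotations):
    """Group annotations by the ID they point at, so a display can look them up."""
    index = defaultdict(list)
    for annotation in annotations:
        index[annotation.target_id].append(annotation)
    return dict(index)


# Hypothetical IDs and notes, for illustration only.
annotations = [
    Annotation("Matt.1.20!w9", "ἁγίου modifies πνεύματός although ἐστιν intervenes."),
    Annotation("Matt.1.20", "Placeholder for a sentence-level note."),
]

notes = index_by_target(annotations)
for annotation in notes.get("Matt.1.20!w9", []):
    print(annotation.target_id, annotation.note)
```

Keeping notes separate from the tree means the same annotations could, in principle, be attached to more than one treebank of the same text, provided the IDs line up.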

Complexity and Length

Both queries and displays should scale gracefully with complexity: simple constructs should be represented simply, and complex constructs should remain readable.

Simple sentences should be simple

A few simple sentences for comparison:

  • John 11:35 ἐδάκρυσεν ὁ Ἰησοῦς
  • John 1:1 Ἐν ἀρχῇ ἦν ὁ λόγος, καὶ ὁ λόγος ἦν πρὸς τὸν θεόν, καὶ θεὸς ἦν ὁ λόγος.

Complex sentences should still be readable

Complex and long sentences should still be readable without erupting into total chaos or scrolling off the page.

  • Luke 1:1 Ἐπειδήπερ πολλοὶ ἐπεχείρησαν ἀνατάξασθαι διήγησιν περὶ τῶν πεπληροφορημένων ἐν ἡμῖν πραγμάτων, καθὼς παρέδοσαν ἡμῖν οἱ ἀπ’ ἀρχῆς αὐτόπται καὶ ὑπηρέται γενόμενοι τοῦ λόγου, ἔδοξε κἀμοὶ παρηκολουθηκότι ἄνωθεν πᾶσιν ἀκριβῶς καθεξῆς σοι γράψαι, κράτιστε Θεόφιλε, ἵνα ἐπιγνῷς περὶ ὧν κατηχήθης λόγων τὴν ἀσφάλειαν.
  • Eph 1:3 Εὐλογητὸς ὁ θεὸς καὶ πατὴρ τοῦ κυρίου ἡμῶν Ἰησοῦ Χριστοῦ, ὁ εὐλογήσας ἡμᾶς ἐν πάσῃ εὐλογίᾳ πνευματικῇ ἐν τοῖς ἐπουρανίοις ἐν Χριστῷ, καθὼς ἐξελέξατο ἡμᾶς ἐν αὐτῷ πρὸ καταβολῆς κόσμου, εἶναι ἡμᾶς ἁγίους καὶ ἀμώμους κατενώπιον αὐτοῦ ἐν ἀγάπῃ, προορίσας ἡμᾶς εἰς υἱοθεσίαν διὰ Ἰησοῦ Χριστοῦ εἰς αὐτόν, κατὰ τὴν εὐδοκίαν τοῦ θελήματος αὐτοῦ, εἰς ἔπαινον δόξης τῆς χάριτος αὐτοῦ ἧς ἐχαρίτωσεν ἡμᾶς ἐν τῷ ἠγαπημένῳ, ἐν ᾧ ἔχομεν τὴν ἀπολύτρωσιν διὰ τοῦ αἵματος αὐτοῦ, τὴν ἄφεσιν τῶν παραπτωμάτων, κατὰ τὸ πλοῦτος τῆς χάριτος αὐτοῦ ἧς ἐπερίσσευσεν εἰς ἡμᾶς ἐν πάσῃ σοφίᾳ καὶ φρονήσει γνωρίσας ἡμῖν τὸ μυστήριον τοῦ θελήματος αὐτοῦ, κατὰ τὴν εὐδοκίαν αὐτοῦ ἣν προέθετο ἐν αὐτῷ εἰς οἰκονομίαν τοῦ πληρώματος τῶν καιρῶν, ἀνακεφαλαιώσασθαι τὰ πάντα ἐν τῷ Χριστῷ, τὰ ἐπὶ τοῖς οὐρανοῖς καὶ τὰ ἐπὶ τῆς γῆς· ἐν αὐτῷ, ἐν ᾧ καὶ ἐκληρώθημεν προορισθέντες κατὰ πρόθεσιν τοῦ τὰ πάντα ἐνεργοῦντος κατὰ τὴν βουλὴν τοῦ θελήματος αὐτοῦ, εἰς τὸ εἶναι ἡμᾶς εἰς ἔπαινον δόξης αὐτοῦ τοὺς προηλπικότας ἐν τῷ Χριστῷ· ἐν ᾧ καὶ ὑμεῖς ἀκούσαντες τὸν λόγον τῆς ἀληθείας, τὸ εὐαγγέλιον τῆς σωτηρίας ὑμῶν, ἐν ᾧ καὶ πιστεύσαντες ἐσφραγίσθητε τῷ πνεύματι τῆς ἐπαγγελίας τῷ ἁγίῳ, ὅ ἐστιν ἀρραβὼν τῆς κληρονομίας ἡμῶν, εἰς ἀπολύτρωσιν τῆς περιποιήσεως, εἰς ἔπαινον τῆς δόξης αὐτοῦ.

Verbs of Perception and Expression

Mark 8:5 καὶ ἠρώτα αὐτούς· πόσους ἔχετε ἄρτους; οἱ δὲ εἶπαν· ἑπτά.

Mark 11:3 καὶ ἐάν τις ὑμῖν εἴπῃ Τί ποιεῖτε τοῦτο; εἴπατε Ὁ Κύριος αὐτοῦ χρείαν ἔχει, καὶ εὐθὺς αὐτὸν ἀποστέλλει πάλιν ὧδε.

Articles that govern more than one nominal

Acts 4:18 καὶ καλέσαντες αὐτοὺς παρήγγειλαν τὸ καθόλου μὴ φθέγγεσθαι μηδὲ διδάσκειν ἐπὶ τῷ ὀνόματι τοῦ Ἰησοῦ.

Scope of Negation

Participles: Supplemental

Participles: Circumstantial

Participles: Absolute

Ellipsis

Coordination

See https://jktauber.com/2017/05/24/comparing-analyses-herodotus/.

Relative Clauses

Adjuncts (time, manner, purpose, place, etc.)