
incorrect breathing marks for some lemmas beginning with vowels, given both as a combining mark and as part of a composed character #37

Open
bcrowell opened this issue Apr 8, 2022 · 0 comments


bcrowell commented Apr 8, 2022

This is an error that is easy to miss by eye but that causes problems when processing the XML. The first example I came across was at Iliad 5.719, where the proper noun Ἀθήνη is lemmatized as the following Unicode string: 787 7936 952 ... Here 787 (U+0313) is a combining comma above, and 7936 (U+1F00) is an alpha with a smooth breathing mark. So the breathing mark is in there twice: once as a combining character and once built into the precomposed ἀ. How the string looks on screen depends somewhat on the software rendering it. For example, in the terminal program I use, the combining comma lands almost on top of the breathing mark, so it looks like a slightly fatter breathing mark.
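A rough way to detect the problem, sketched in Python with the standard `unicodedata` module. The function names `breathing_count` and `strip_leading_marks` are made up for illustration and are not part of this project. The test string uses only the three code points quoted above, since the rest of the lemma is elided. The check is a heuristic: it assumes a lemma should carry at most one breathing mark after decomposition, and the repair assumes the stray mark sits at the very start of the string, as it does in this example.

```python
import unicodedata

# Combining breathing marks: U+0313 (smooth) and U+0314 (rough).
BREATHINGS = {"\u0313", "\u0314"}

def breathing_count(lemma: str) -> int:
    """Count breathing marks after full canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", lemma)
    return sum(ch in BREATHINGS for ch in decomposed)

def strip_leading_marks(lemma: str) -> str:
    """Drop combining marks that precede the first base character, then recompose (NFC)."""
    i = 0
    while i < len(lemma) and unicodedata.combining(lemma[i]):
        i += 1
    return unicodedata.normalize("NFC", lemma[i:])

# Only the first three code points from the report; the remainder is not given.
bad_prefix = "".join(chr(cp) for cp in (787, 7936, 952))

if breathing_count(bad_prefix) > 1:
    fixed = strip_leading_marks(bad_prefix)
    print([ord(ch) for ch in fixed])
```

Running something like `breathing_count` over every lemma attribute would give a full list of affected entries, rather than the ones I happened to notice.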

This seems to occur repeatedly, but not 100% of the time, for the following lemmas representing proper names (which are in the xml as lowercase): ἀπόλλων, ἀλέξανδρος, ἀφροδίτη, ἀτρείδης, ἀθήνη, ἀχαιός, ἀνδρομάχη, ἰδομενεύς, ὠκεανός, ὀδυσσεύς, ἀσκληπιάδης, ἀντίλοχος, ἀχιλλεύς, ἀγχίσης, ἀλκίνοος, ἀρήτη, ὠγυγία.

Also: ἐνψύω at Iliad 8.382, and some other non-proper nouns: ἐννοσίγαιος, ἀμφίμαχος.
