Interlinear alignment off in .eaf stories #14
Comments
The story shown above is "12 de diciembre", viewable here. The problem seems consistent throughout that story. It might be a general problem with how LingView handles ELAN files. When I open the "12 de diciembre" file in ELAN, it looks correct, with the transcription spanning several shorter annotations. The translation is in a "symbolic association" (untimed, 1-to-1) relationship with the transcription, and the morphemes and their glosses are in a "symbolic subdivision" (untimed, many-to-1) relationship with the transcription. We actually have very few examples of ELAN files with morpheme breakdowns in LingView right now. At the moment, the Yucatec Maya site uses only ELAN files, and very few of them have morpheme breakdowns; most just have a transcription and free translation. The Cofan site uses only FLEx files.
Wait a minute... the ELAN screenshot (above) shows that there are actually two non-divided copies of "Tene' in k'aaba'e' Maricruz Kuyoc." and one non-divided Spanish translation. This is exactly what LingView displays. So I think there's no LingView bug here; the fix is just to reorder or delete tiers in the ELAN file so that it looks better on the LingView site.
Should we close this issue, or should we leave it open to perhaps add some code in preprocess_eaf.js to detect and fix issues in ELAN files that might cause formatting problems in LingView? For example, do you think we should make sure that aligned tiers are right next to each other?
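For the "detect" half of that question, one conservative option would be to report tiers whose contents are identical and leave the decision to the user, rather than changing anything automatically. The sketch below is only an illustration: the tier shape (`name`, `annotations` with `start`, `end`, `text`), the tier names, and the function names are all assumptions made for the example, not LingView's actual data structures or anything currently in preprocess_eaf.js.

```javascript
// Hypothetical tier shape (NOT LingView's real format):
//   { name: string, annotations: [{ start: ms, end: ms, text: string }] }

// Build a comparable fingerprint of a tier's timing and text.
function tierSignature(tier) {
  return JSON.stringify(tier.annotations.map((a) => [a.start, a.end, a.text]));
}

// Report tiers that exactly repeat an earlier tier. Nothing is hidden or deleted.
function findDuplicateTiers(tiers) {
  const seen = new Map(); // signature -> name of the first tier with that content
  const duplicates = [];
  for (const tier of tiers) {
    if (tier.annotations.length === 0) continue; // empty tiers are not "copies"
    const sig = tierSignature(tier);
    if (seen.has(sig)) {
      duplicates.push({ original: seen.get(sig), duplicate: tier.name });
    } else {
      seen.set(sig, tier.name);
    }
  }
  return duplicates;
}

// Example with made-up tier names and timings.
const exampleTiers = [
  { name: "Transcription", annotations: [{ start: 0, end: 1500, text: "Tene' in k'aaba'e' Maricruz Kuyoc." }] },
  { name: "Transcription copy", annotations: [{ start: 0, end: 1500, text: "Tene' in k'aaba'e' Maricruz Kuyoc." }] },
  { name: "Translation", annotations: [{ start: 0, end: 1500, text: "(free translation)" }] },
];

for (const d of findDuplicateTiers(exampleTiers)) {
  console.warn(`Tier "${d.duplicate}" has the same content as "${d.original}"`);
}
```

A warning like this would point whoever maintains the ELAN file at the tiers to reorder or delete, without LingView altering what is displayed.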
We should fix the ELAN file and then close this issue. You can reorder or hide tiers in ELAN by right-clicking on the tier name, if I remember correctly. When you save, those changes will be recorded in the .pfsx file that LingView reads to determine tier order.

We can also explore changing LingView's default tier ordering, but I'm not sure how much that would help in this case. Feel free to explore it if you think there's something here or if it seems like a fun way to get familiar with the codebase.

Duplicate tiers are a large part of why this story looks "wrong", but I wouldn't want LingView to overzealously or mysteriously hide duplicates.

Putting aligned tiers next to each other would be a very nice default, but unless there's a formal relationship between the tiers that says they're aligned, it could be hard for LingView to detect aligned tiers. Some kind of best-effort "check the first three timestamps and assume from there" might actually work well enough, especially since the user can override the ordering by including a .pfsx file. I'm sure exhaustive checking is possible, but it would likely make the preprocessing step slower, and we'll have to weigh the benefit against the cost.

We might also be able to prettify the default ordering just by looking at how many subdivisions there are (which LingView stores as the "num_slots" property), although I'm not sure what the right rule would be. Most slots to fewest? No, that puts the morphemes tier above the words tier. Fewest to most? No, that puts free glosses at the top, when they ideally should go at the bottom.
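To make the "check the first three timestamps and assume from there" idea concrete, here is a rough sketch. It assumes a hypothetical tier shape (`name`, plus `annotations` with `start` and `end` in milliseconds) and hypothetical function names; none of this is LingView's real internal format or existing code, and a .pfsx file would still be expected to override whatever order it produces.

```javascript
// Hypothetical tier shape (NOT LingView's real format):
//   { name: string, annotations: [{ start: ms, end: ms }] }

// Best-effort check: compare only the first few annotations' timestamps.
function looksAligned(tierA, tierB, sampleSize = 3) {
  const n = Math.min(sampleSize, tierA.annotations.length, tierB.annotations.length);
  if (n === 0) return false; // nothing to compare, so make no claim
  for (let i = 0; i < n; i++) {
    const a = tierA.annotations[i];
    const b = tierB.annotations[i];
    if (a.start !== b.start || a.end !== b.end) return false;
  }
  return true;
}

// Keep the original order, but pull each tier's apparent partners up next to it.
function groupAlignedTiers(tiers, sampleSize = 3) {
  const ordered = [];
  const placed = new Set();
  for (let i = 0; i < tiers.length; i++) {
    if (placed.has(i)) continue;
    ordered.push(tiers[i]);
    placed.add(i);
    for (let j = i + 1; j < tiers.length; j++) {
      if (!placed.has(j) && looksAligned(tiers[i], tiers[j], sampleSize)) {
        ordered.push(tiers[j]);
        placed.add(j);
      }
    }
  }
  return ordered;
}

// Example with made-up tiers: "Notes" sits between two tiers that share timestamps.
const sampleTiers = [
  { name: "Transcription", annotations: [{ start: 0, end: 1500 }, { start: 1500, end: 3200 }] },
  { name: "Notes", annotations: [{ start: 400, end: 900 }] },
  { name: "Translation", annotations: [{ start: 0, end: 1500 }, { start: 1500, end: 3200 }] },
];

console.log(groupAlignedTiers(sampleTiers).map((t) => t.name));
```

Because only `sampleSize` annotations per pair are compared, the cost stays small even for long stories, at the price of occasionally guessing wrong when two tiers diverge after their first few annotations.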