Interlinear alignment off in .eaf stories #14

Open
maksymilian-dabkowski opened this issue Jun 5, 2020 · 4 comments

Comments

@maksymilian-dabkowski
Collaborator

[Screenshot: the story as displayed in LingView, with the interlinear lines misaligned]

@sciepsilon
Member

The story shown above is "12 de diciembre", viewable here. The problem seems consistent throughout that story.

It might be a general problem with how LingView handles ELAN files in preprocessing/preprocess_eaf.js, but the demo ELAN file is just fine. Maybe those two files were processed by slightly different versions of LingView, although their most recent Build & Deploy ran at about the same time ("last month") for both of them.

When I open the "12 de diciembre" file in ELAN, it looks correct, with the transcription spanning several shorter annotations. The translation is in a "symbolic association" (untimed, 1-to-1) relationship with the transcription, and the morphemes and their glosses are in a "symbolic subdivision" (untimed, many-to-1) relationship with the transcription.

[Screenshot: the "12 de diciembre" file open in ELAN]
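For anyone less familiar with ELAN's untimed tier types, here is a rough sketch of how the two relationships could be resolved once the file is parsed. It is not taken from preprocess_eaf.js. It assumes the REF_ANNOTATION elements have already been converted to plain objects keyed by their EAF attribute names (ANNOTATION_ID, ANNOTATION_REF, PREVIOUS_ANNOTATION), and the function names `associate` and `subdivide` are made up for illustration.

```javascript
// Hypothetical sketch, not LingView's actual preprocess_eaf.js.
// Assumes the .eaf XML has already been parsed into plain objects whose keys
// mirror EAF REF_ANNOTATION attributes: ANNOTATION_ID, ANNOTATION_REF (the
// parent annotation) and, for subdivisions, PREVIOUS_ANNOTATION.

// Symbolic Association (untimed, 1-to-1): one child per parent annotation.
function associate(children) {
  const byParent = new Map();
  for (const child of children) {
    if (byParent.has(child.ANNOTATION_REF)) {
      throw new Error(`Annotation ${child.ANNOTATION_REF} has more than one associated child`);
    }
    byParent.set(child.ANNOTATION_REF, child);
  }
  return byParent;
}

// Symbolic Subdivision (untimed, many-to-1): an ordered list of children per
// parent. Order is recovered by following PREVIOUS_ANNOTATION links, assuming
// the first child in each group has no PREVIOUS_ANNOTATION.
function subdivide(children) {
  const groups = new Map();
  for (const child of children) {
    if (!groups.has(child.ANNOTATION_REF)) groups.set(child.ANNOTATION_REF, []);
    groups.get(child.ANNOTATION_REF).push(child);
  }
  const ordered = new Map();
  for (const [parentId, siblings] of groups) {
    // Key each sibling by the ID of the annotation it follows.
    const following = new Map(siblings.map((s) => [s.PREVIOUS_ANNOTATION, s]));
    const chain = [];
    let current = following.get(undefined);
    // The length guard stops a malformed (cyclic) chain from looping forever.
    while (current && chain.length < siblings.length) {
      chain.push(current);
      current = following.get(current.ANNOTATION_ID);
    }
    ordered.set(parentId, chain);
  }
  return ordered;
}
```

A Symbolic Association tier should yield exactly one child per parent, which is why the sketch throws on a second one; a Symbolic Subdivision tier yields an ordered list, which is where the morphemes and glosses end up.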

We actually have very few examples of ELAN files with morpheme breakdowns in LingView right now. At the moment, the Yucatec Maya site uses only ELAN files, and very few of them have morpheme breakdowns; most just have a transcription and a free translation. The Cofan site uses only FLEx files.

@sciepsilon
Member

Wait a minute... the ELAN screenshot (above) shows that there are actually two non-divided copies of "Tene' in k'aaba'e' Maricruz Kuyoc." and one non-divided Spanish translation. This is exactly what LingView displays. So I think there's no LingView bug here; the fix is just to reorder or delete tiers in the ELAN file so that it looks better on the LingView site.

elisharf self-assigned this Nov 16, 2020
@elisharf
Collaborator

Should we close this issue, or should we leave it open to perhaps add some code in preprocess_eaf.js to detect and fix issues in ELAN files that might cause formatting problems in LingView?

E.g. do you think we should make sure that aligned tiers are right next to each other?

@sciepsilon
Member

We should fix the ELAN file and then close this issue. You can reorder or hide tiers in ELAN by right-clicking on the tier name, if I remember correctly. When you save, those changes will be recorded in the .pfsx file that LingView reads to determine tier order.
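A minimal sketch of how a preferred order could be layered on top of a default one. It assumes the tier order (and, hypothetically, a set of hidden tiers) has already been read out of the .pfsx file, since the .pfsx structure isn't discussed here; `applyTierOrder` is an illustrative name, not a LingView function.

```javascript
// Hypothetical sketch: merge a user-preferred tier order (assumed to have been
// read from the .pfsx file already) with a default order.
// - Tiers named in `preferred` come first, in that order.
// - Tiers missing from `preferred` keep their default relative order after them.
// - Tiers in `hidden` (a hypothetical input) are dropped.
function applyTierOrder(defaultOrder, preferred, hidden = new Set()) {
  const known = new Set(defaultOrder);
  // Ignore preferred IDs that don't exist in this file, and drop repeats.
  const front = [...new Set(preferred)].filter((id) => known.has(id));
  const placed = new Set(front);
  const rest = defaultOrder.filter((id) => !placed.has(id));
  return [...front, ...rest].filter((id) => !hidden.has(id));
}
```

Keeping unlisted tiers at the end, rather than dropping them, means a stale .pfsx file can't silently hide a tier that was added later.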

We can also explore changing LingView's default tier ordering, but I'm not sure how much that would help in this case. Feel free to explore it if you think there's something here or if it seems like a fun way to get familiar with the codebase.

Duplicate tiers are a large part of why this story looks "wrong", but I wouldn't want LingView to overzealously or mysteriously hide duplicates.
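One way to respect that concern is to report duplicates rather than hide them. The sketch below uses a hypothetical `findDuplicateTiers` function, an assumed `{ id, annotations: [{ value }] }` tier shape, and made-up tier IDs. It only groups tiers with identical text and prints a warning, leaving the decision to whoever maintains the ELAN file.

```javascript
// Hypothetical sketch: find tiers whose annotation text is identical so they
// can be reported to the user. Nothing is hidden automatically.
// Assumes each tier looks like { id, annotations: [{ value }] }.
function findDuplicateTiers(tiers) {
  const idsByContent = new Map();
  for (const tier of tiers) {
    if (tier.annotations.length === 0) continue; // skip empty tiers
    // JSON-encode the sequence of values so ["a b"] and ["a", "b"] stay distinct.
    const key = JSON.stringify(tier.annotations.map((a) => a.value));
    if (!idsByContent.has(key)) idsByContent.set(key, []);
    idsByContent.get(key).push(tier.id);
  }
  return [...idsByContent.values()].filter((ids) => ids.length > 1);
}

// Example with made-up tier IDs and a placeholder translation.
const exampleTiers = [
  { id: "transcription", annotations: [{ value: "Tene' in k'aaba'e' Maricruz Kuyoc." }] },
  { id: "transcription-2", annotations: [{ value: "Tene' in k'aaba'e' Maricruz Kuyoc." }] },
  { id: "translation", annotations: [{ value: "(Spanish translation)" }] },
];
for (const ids of findDuplicateTiers(exampleTiers)) {
  console.warn(`Tiers ${ids.join(", ")} have identical content; consider hiding or deleting one in ELAN.`);
}
```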

Putting aligned tiers next to each other would be a very nice default, but unless there's a formal relationship between the tiers that says they're aligned, it could be hard for LingView to detect aligned tiers. Some kind of best-effort "check the first three timestamps and assume from there" might actually work well enough, especially since the user can override the ordering by including a .pfsx file. I'm sure exhaustive checking is possible, but it would likely make the preprocessing step slower, and we'll have to weigh the benefit against the cost.
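A rough sketch of the best-effort check described above, under stated assumptions: tiers are `{ id, annotations: [{ start, end }] }` with times in milliseconds sorted by start, and `looksAligned` / `groupAlignedTiers` are hypothetical names. It compares only the first `n` annotations (3 by default), so it stays cheap but can be fooled by tiers that happen to start identically. A .pfsx ordering, if present, would presumably still take precedence.

```javascript
// Hypothetical best-effort check: two tiers are treated as aligned if their
// first `n` annotations share start and end times (within `toleranceMs`).
// Assumes each tier is { id, annotations: [{ start, end }] }, sorted by start.
function looksAligned(tierA, tierB, n = 3, toleranceMs = 0) {
  const a = tierA.annotations;
  const b = tierB.annotations;
  const count = Math.min(n, a.length, b.length);
  if (count === 0) return false; // nothing to compare
  for (let i = 0; i < count; i++) {
    if (Math.abs(a[i].start - b[i].start) > toleranceMs ||
        Math.abs(a[i].end - b[i].end) > toleranceMs) {
      return false;
    }
  }
  return true;
}

// Keep the original order, but move each tier that looks aligned with an
// earlier tier directly after that tier.
function groupAlignedTiers(tiers, n = 3) {
  const result = [];
  const used = new Set();
  for (const tier of tiers) {
    if (used.has(tier.id)) continue;
    result.push(tier);
    used.add(tier.id);
    for (const other of tiers) {
      if (!used.has(other.id) && looksAligned(tier, other, n)) {
        result.push(other);
        used.add(other.id);
      }
    }
  }
  return result;
}
```

Comparing every pair of tiers is quadratic in the number of tiers, but each comparison does at most `n` steps, which keeps it far cheaper than checking every timestamp.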

We might also be able to prettify the default ordering just by looking at how many subdivisions there are (which LingView stores as the "num_slots" property), although I'm not sure what the right rule would be. Most slots to least slots? No, that puts the morphemes tier above the words tier. Least to most? No, that puts free glosses at the top, when they ideally should go at the bottom.
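To make the problem with both naive rules concrete, here is a tiny illustration. The tier objects and their `num_slots` values are invented for a single hypothetical sentence; only the `num_slots` property name comes from LingView.

```javascript
// Illustration only: ordering tiers purely by num_slots.
// The slot counts are made-up values for one hypothetical sentence.
const tiers = [
  { name: "words", num_slots: 4 },
  { name: "morphemes", num_slots: 7 },
  { name: "free gloss", num_slots: 1 },
];
const names = (ts) => ts.map((t) => t.name);

// Most slots first: morphemes end up above words.
const mostToLeast = names([...tiers].sort((a, b) => b.num_slots - a.num_slots));
// Fewest slots first: the free gloss ends up at the top.
const leastToMost = names([...tiers].sort((a, b) => a.num_slots - b.num_slots));

console.log(mostToLeast);
console.log(leastToMost);
```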


4 participants