Automatic text formatting from website to other documents #81

Open
hollyyuqizheng opened this issue Feb 26, 2021 · 7 comments

Comments

@hollyyuqizheng
Collaborator

It would be cool if there is a feature on each story's page that allows the user to take a certain line of text and have it automatically formatted for other documents.

For example, it'd be nice to have a button at each time stamp that exports the current time stamp's line formatted in LaTeX (e.g. 4-line glossing plus the citation for this example) -- this can save people time when they copy-paste examples from LingView into a LaTeX paper!

A related use is to export a time stamp's line to the error log that Scott created. This error log keeps track of instances in the texts where something is glossed or translated wrongly so that someone can update the errors in the FLEx files once in a while.

@hollyyuqizheng
Collaborator Author

hollyyuqizheng commented Mar 2, 2021

Some initial thoughts:

  • We could add a small button below each sentence's timestamp (this button could potentially only appear when the mouse hovers over the timestamp? just to make the UI not too cluttered). Probably should introduce this new button component to Timed.jsx and Untimed.jsx. The sentences are passed to these sentence display blocks in Story.jsx.
  • When this button is clicked, there could be a new window pop-up with different options of formatted texts. For the LaTeX formatting, could see if we can include both the LaTeX code and (a picture of?) the ultimate result text as well.
  • Some texts might be harder to grab than others. For example, we'd need to extract that line of text from the FLEx file (along with the morphemes, translations, etc.), and we need to extract the metadata (speakers, title of the story, etc.)

@hollyyuqizheng
Collaborator Author

hollyyuqizheng commented Mar 17, 2021

Working in the text-formatter branch.

Some key steps:

  • Add button UI and connect a popup window to it
  • Add functions to gather relevant data/text to convert -- potentially a map between words to morphemes and glossing, and then other metadata
  • Add functions for the actual text converting
    -- might need some hard-ish coding of certain formatting, e.g. the LaTeX markup for 4-line glossing. I think the main purpose is that the LaTeX code is generated for the user so that they don't have to type it out themselves.
    -- Not sure how to render LaTeX in JavaScript... Many libraries online are for rendering math equations in JS, but if we can render the 4-line glossing LaTeX code, we could also show the final result of the generated LaTeX code.
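
A minimal sketch of the "actual text converting" step, assuming the sentence has already been organized into words with parallel morpheme and gloss lists. The input shape, the function names (`escapeLatex`, `formatGloss`, `toGb4e`), and the rule that all-caps glosses are grammatical labels set in `\textsc{}` are assumptions for illustration, not what the text-formatter branch does. The example tokens are invented placeholders, not real language data. The output uses gb4e's `\glll` (three aligned lines) plus `\glt` for the free translation.

```javascript
// Escape characters that are special in LaTeX text mode (single pass,
// so braces introduced by a replacement are not escaped again).
function escapeLatex(text) {
  const replacements = {
    "\\": "\\textbackslash{}",
    "&": "\\&", "%": "\\%", "$": "\\$", "#": "\\#",
    "_": "\\_", "{": "\\{", "}": "\\}",
    "~": "\\textasciitilde{}", "^": "\\textasciicircum{}",
  };
  return text.replace(/[\\&%$#_{}~^]/g, (ch) => replacements[ch]);
}

// Assumption: a gloss written in all caps (e.g. "PL", "3SG") is a
// grammatical label and is set in small caps; anything else is left as is.
function formatGloss(gloss) {
  const isLabel = /^[A-Z0-9.]+$/.test(gloss) && /[A-Z]/.test(gloss);
  return isLabel ? `\\textsc{${gloss.toLowerCase()}}` : escapeLatex(gloss);
}

// Build a 4-line gb4e example: sentence, morphemes, glosses, translation.
// \glll aligns word by word, so the sentence must have one token per word.
function toGb4e({ sentence, words, translation }) {
  const morphemeLine = words
    .map((w) => w.morphemes.map(escapeLatex).join("-"))
    .join(" ");
  const glossLine = words
    .map((w) => w.glosses.map(formatGloss).join("-"))
    .join(" ");
  return [
    "\\begin{exe}",
    "\\ex",
    "\\glll " + escapeLatex(sentence) + " \\\\",
    morphemeLine + " \\\\",
    glossLine + " \\\\",
    "\\glt `" + escapeLatex(translation) + "'",
    "\\end{exe}",
  ].join("\n");
}

// Invented placeholder data, only to show the shape of the input.
const exampleSentence = {
  sentence: "abaka tiso",
  words: [
    { morphemes: ["aba", "ka"], glosses: ["dog", "PL"] },
    { morphemes: ["tiso"], glosses: ["run"] },
  ],
  translation: "The dogs run.",
};
const gb4eOutput = toGb4e(exampleSentence);
console.log(gb4eOutput);
```

Keeping the converter ignorant of where its input comes from means the tier-gathering step can change without touching the LaTeX markup.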

@hollyyuqizheng
Collaborator Author

Thoughts on making the LaTeX formatter more scalable:
I think most of the changes needed from the current version are in the parts where the formatter “decides” which tiers to grab data from. The functions that do the actual converting (e.g. adding LaTeX commands such as “\glt” and “\textsc{}”) can remain unchanged, since they assume that the appropriate information is passed into them.

For the LaTeX package, the 4 pieces of information we need are: the full sentence, each word as divided and marked into its morphemes, the morpheme translation into a certain language, and the translation of the entire sentence. I can think of two main solutions:

(a) Ask for user input whenever the format button is clicked: The button will first trigger a window asking the user to select which tier corresponds to which of these 4 sections that are needed for LaTeX. After the selection, the text conversion happens and the final result is displayed in the window.

  • Need to make sure that the UI for this selection process is clear and easy to understand what is happening
  • A drawback of this approach is that the user potentially has to do the selection every time they click on the button. We could have some mechanism that lets the user save their selection the first time they click a format button within a story?
  • Although, selecting every time a button is clicked also provides some flexibility. For example, a user could be writing both an English paper and a Spanish paper for the target language, and if they have this selection window after clicking the button, they can select to include the English vs. Spanish translation tiers accordingly.
  • I also imagine that this format button is not used very frequently. A user needs the LaTeX-formatted code only when they want to include the current sentence in a LaTeX document, like a paper or handout, so asking the user to select tiers after each button click may not be that bad.
  • Another potential drawback with this approach: any organization of the words into a list of all the morphemes in each word needs to happen on the client-side, after the tier selection process. This organization can’t happen during preprocessing.
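
Here is a rough sketch of how option (a) could work on the client side, assuming each sentence exposes its tiers as a plain object keyed by tier name. The section names, the in-memory `savedSelections` store, and all function names are hypothetical. The tier names and values are invented placeholders. The selection is remembered per story, but a caller can still pass a different selection for a single click (the English vs. Spanish case).

```javascript
// The four pieces the LaTeX output needs (names are assumptions).
const LATEX_SECTIONS = ["sentence", "morphemes", "morphemeGlosses", "translation"];

// Hypothetical in-memory store: story ID -> { section: tierName }.
const savedSelections = new Map();

// Remember a complete selection for a story; reject incomplete ones.
function saveSelection(storyId, selection) {
  const missing = LATEX_SECTIONS.filter((section) => !selection[section]);
  if (missing.length > 0) {
    throw new Error("No tier selected for: " + missing.join(", "));
  }
  savedSelections.set(storyId, { ...selection });
}

// Return the saved selection, or null if the user has not chosen yet.
function getSavedSelection(storyId) {
  return savedSelections.has(storyId) ? savedSelections.get(storyId) : null;
}

// Pull the four pieces out of one sentence's tiers using a selection.
function gatherSections(sentenceTiers, selection) {
  const result = {};
  for (const section of LATEX_SECTIONS) {
    const tierName = selection[section];
    if (!Object.prototype.hasOwnProperty.call(sentenceTiers, tierName)) {
      throw new Error(`Tier "${tierName}" not found in this sentence`);
    }
    result[section] = sentenceTiers[tierName];
  }
  return result;
}

// Invented placeholder tiers for one sentence.
const exampleTiers = {
  "Words": "abaka tiso",
  "Morphemes": "aba-ka tiso",
  "Gloss (English)": "dog-PL run",
  "Gloss (Spanish)": "perro-PL correr",
  "Free (English)": "The dogs run.",
  "Free (Spanish)": "Los perros corren.",
};
const englishSelection = {
  sentence: "Words",
  morphemes: "Morphemes",
  morphemeGlosses: "Gloss (English)",
  translation: "Free (English)",
};

saveSelection("story-1", englishSelection);
const englishPieces = gatherSections(exampleTiers, getSavedSelection("story-1"));

// One-off override for a Spanish paper, without changing the saved choice.
const spanishPieces = gatherSections(exampleTiers, {
  ...englishSelection,
  morphemeGlosses: "Gloss (Spanish)",
  translation: "Free (Spanish)",
});
```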

(b) During preprocessing and building the site, we can require some input annotations explaining which tier corresponds to which section from the LaTeX format. Then, when a sentence is passed into the format button component, this sentence object can hopefully contain information that describes which tiers should be used for which section of the LaTeX code. We could ask for a separate file that is needed to build the site, called “latex-map.txt” or something, and this file is where the site creator describes which tier matches which section of a LaTeX formatting.

  • This “latex-map” file can live in the “data” folder, and its content can be processed in “rebuild.js” before (and then potentially passed into) buildFlex and buildElan. The output of this processing could be incorporated into each story’s json file, where each tier has an additional annotation whose key could be “isLatexSection”, and whose value is either empty (meaning the tier is not used in the LaTeX conversion) or the name of the section the tier corresponds to. So, the tier holding the morphological translations that should be used in the LaTeX code could have an entry { “isLatexSection” : “morphologicalTranslation” }. When a button is clicked for any of these sentences, the four entries can be used directly by the button.
  • Although, I’m not sure we need any preprocessing beyond what is currently done in buildFlex and buildElan. The organizeWords function mostly groups each word’s morphemes together, instead of using the sentence as the unit for grouping, and I wonder whether it’s really necessary to move this function into preprocessing.
  • This file can be made optional, so if the user doesn’t want this feature, they don’t have to provide this file.
  • This approach seems to solve the two drawbacks of the first solution: the site creator only needs to provide the tier-matching file once, and this matching allows any morpheme organization to happen during preprocessing instead of on the client side. But it is less flexible, for example in the English vs. Spanish paper-writing scenario.
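
A sketch of how the preprocessing for option (b) might look, under a made-up file format: one `section: tier name` pair per line, with `#` starting a comment line. The format, the function names `parseLatexMap` and `annotateTiers`, and the tier names are all assumptions. Only the `isLatexSection` key and the `morphologicalTranslation` value come from the proposal above. A missing file yields an empty map, so the feature stays optional.

```javascript
// Parse the (hypothetical) latex-map text into Map<tierName, sectionName>.
// Passing null or undefined models "the site creator did not provide the file".
function parseLatexMap(text) {
  const tierToSection = new Map();
  if (text == null) return tierToSection;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks, comments
    const colon = line.indexOf(":");
    if (colon === -1) throw new Error("Expected 'section: tier' but got: " + line);
    const section = line.slice(0, colon).trim();
    const tierName = line.slice(colon + 1).trim();
    tierToSection.set(tierName, section);
  }
  return tierToSection;
}

// Attach an isLatexSection annotation to every tier of a story.
// An empty string means the tier is not used in the LaTeX conversion.
function annotateTiers(tierNames, tierToSection) {
  return tierNames.map((name) => ({
    name,
    isLatexSection: tierToSection.has(name) ? tierToSection.get(name) : "",
  }));
}

// Invented example content for the map file and a story's tier list.
const exampleMapText = [
  "# section: tier name",
  "sentence: Words",
  "morphologicalTranslation: Gloss (English)",
].join("\n");

const annotatedTiers = annotateTiers(
  ["Words", "Gloss (English)", "Notes"],
  parseLatexMap(exampleMapText)
);
console.log(annotatedTiers);
```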

@sciepsilon
Member

This issue was partially addressed in PR #84, which formats a LingView sentence into a gb4e or gb4e-modified LaTeX gloss. There may be other text formats worth adding in the future.

@hollyyuqizheng
Collaborator Author

Some initial feedback from Scott for the initial version:

  • Maybe having the button just say LaTeX would be more clear?
  • I think I would vote for having the button be a bit more out of the way when it pops up, maybe all the way to the right side? I feel like people who want this feature will know where to find it and so I feel like it should be unobtrusive for users that don't care about it.
  • I can see the logic behind having the code pop up like it does. I don't know if this is harder, but I wonder if it might be more straightforward if pushing the button just copies the code to the clipboard so you can paste it in your doc without having to copy it yourself. I don't think this is super important.
  • For the line at the bottom that just has the text name, I think the link should go in there too. For me, I'd want the format Kuke Chiste -- 3:18, with the whole thing as a hyperlink to the sentence URL (i.e. \href{https://brownclps.github.io/LingView/#/story/1ed3d641-acd9-4466-811d-17c8ed59844c?7954}{Kuke Chiste -- 3:18}). An alternative would be just to have the URL commented out after. More generally though, I think there's a question of how tailored the formatting should be to specific LaTeX choices and what ways we can easily offer customizability (either for the site admin or the end user), since there's no single way that everyone wants their LaTeX.
  • Relatedly, it seems to me that for each LingView site there might be more or less a single optimal LaTeX setup. I wonder if it might make sense to have, say, 3-4 different options. I don't know if this is too much work and/or just an inelegant software design, but I will sort of always want the LaTeX output a particular way for the ALDP site, and other users are likely to want a single standard format always. So this seems like a feature that site admins will want to customize, while end users perhaps won't need as much customizability. This might all be overthinking though if LingView just delivers one usable code chunk, since end users can obviously edit the resulting LaTeX code without too much hassle.
  • I like that it is by default 4-line format, that is great!
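
To make the two citation options concrete, here is a small sketch. The function names and the `style` parameter are assumptions, and the timestamp is taken in seconds (198 s for 3:18). The URL is the one quoted above. `\href` needs the hyperref package in the user's document, and the title is assumed to contain no LaTeX special characters.

```javascript
// Format a time in seconds as m:ss, e.g. 198 -> "3:18".
function formatTimestamp(totalSeconds) {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// Build the citation line in one of two styles:
//   "href"    -> the whole label is a hyperlink to the sentence
//   "comment" -> plain label, with the URL in a trailing LaTeX comment
function buildCitation(title, totalSeconds, url, style = "href") {
  const label = `${title} -- ${formatTimestamp(totalSeconds)}`;
  if (style === "href") {
    return `\\href{${url}}{${label}}`;
  }
  return `${label} % ${url}`;
}

const sentenceUrl =
  "https://brownclps.github.io/LingView/#/story/1ed3d641-acd9-4466-811d-17c8ed59844c?7954";

const hrefCitation = buildCitation("Kuke Chiste", 198, sentenceUrl);
const commentCitation = buildCitation("Kuke Chiste", 198, sentenceUrl, "comment");
console.log(hrefCitation);
console.log(commentCitation);
```

A site-level setting could pick the default `style`, which would cover the "single optimal setup per site" case without adding any end-user UI.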

@sciepsilon
Member

Also, the current tier selection UI is annoying for FLEx or ELAN files with long tier names. (It looks great on Kuke Chiste, but bad on Singo A'i.) We could improve it by using a grid, like this. The labels along the left side should be the 4 output LaTeX tiers, because users expect to select one button per row (not per column).

                                  palabra en                morfema en a'ingae
                                  a'ingae (Borman)          (Borman)
original sentence                      o                             o
morphemes                              o                             o
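
One way to model that grid in code, as a sketch only: each row is one LaTeX output tier and doubles as a radio group, so choosing one button per row falls out of the structure. The function name, the cell shape, and the last two row labels are assumptions. The column names are the two tiers from the mock-up above.

```javascript
// Build a rows-by-columns description of the selection grid.
// Each row is a radio group named after its LaTeX section, so a UI
// rendering it allows exactly one chosen tier per row.
function buildSelectionGrid(latexSections, tierNames) {
  return latexSections.map((section) => ({
    label: section,
    cells: tierNames.map((tierName) => ({
      radioGroup: section, // buttons in one row share this group
      value: tierName,     // the tier this button selects
    })),
  }));
}

const selectionGrid = buildSelectionGrid(
  ["original sentence", "morphemes", "morpheme glosses", "translation"],
  ["palabra en a'ingae (Borman)", "morfema en a'ingae (Borman)"]
);
console.log(JSON.stringify(selectionGrid, null, 2));
```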

@hollyyuqizheng
Collaborator Author

More updates in #89 and #90, including changing the tier selection button panel into a grid view.
