Import and display translations for model values #783

Open
hancush opened this issue Oct 12, 2021 · 1 comment

hancush commented Oct 12, 2021

Through conversation elsewhere, we've arrived at an approach to importing and displaying translated model values.

At a high level, this approach will expand the import to ingest a new sheet of translated model values, with the following fields (h/t @tlongers):

translation:model_type_admin # entity name ("unit")
translation:model_type_fieldname # fieldname from entity ("unit:name")
translation:value_type # whether for a specific instance or from a category ("category","instance")
translation:model_type_id # for instance values, this records the uuid; otherwise, value should be "NA_character_"
translation:raw # the original value, language agnostic
translation:ar # for Arabic translation string
translation:en # for English translation string
translation:es # for Spanish translation string
translation:fr # for French translation string

For fields containing more than one value, e.g., unit other name, values should be decomposed into one value per row in the translation sheet. Categorical values that appear many times in the source data, such as person posting title, need only appear once in the translation sheet.
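As a rough illustration of the decomposition and de-duplication described above, here is a minimal sketch in plain Python. The function name `build_translation_rows`, the input tuple shape, and the `;` separator for multi-value fields are assumptions made for the example, not part of the import as it exists today.

```python
from typing import Dict, Iterable, List, Tuple

NA = "NA_character_"  # placeholder for category rows, per the field list above
LANGUAGES = ("ar", "en", "es", "fr")

# Hypothetical input shape: (entity, fieldname, value_type, uuid, raw value)
SourceValue = Tuple[str, str, str, str, str]


def build_translation_rows(
    values: Iterable[SourceValue], separator: str = ";"
) -> List[Dict[str, str]]:
    """Emit one translation row per unique value.

    Multi-value fields are split on `separator` (an assumed delimiter), and
    category values are keyed without a uuid so they appear only once.
    """
    rows: List[Dict[str, str]] = []
    seen = set()
    for entity, fieldname, value_type, uuid, raw in values:
        record_id = uuid if value_type == "instance" else NA
        for part in raw.split(separator):
            part = part.strip()
            if not part:
                continue
            key = (fieldname, record_id, part)
            if key in seen:
                continue  # already have a row for this value
            seen.add(key)
            row = {
                "translation:model_type_admin": entity,
                "translation:model_type_fieldname": fieldname,
                "translation:value_type": value_type,
                "translation:model_type_id": record_id,
                "translation:raw": part,
            }
            # Translation columns start empty, to be filled in by translators.
            row.update({f"translation:{lang}": "" for lang in LANGUAGES})
            rows.append(row)
    return rows


rows = build_translation_rows([
    ("unit", "unit:other_names", "instance", "abc-123", "1 Bde; First Brigade"),
    ("person", "person:posting_title", "category", "p-1", "Commander"),
    ("person", "person:posting_title", "category", "p-2", "Commander"),
])
```

In this example the unit's two other names become two rows tied to the same uuid, while the repeated posting title collapses to a single category row.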

On the back end, the app will use django-modeltranslation to declare translated fields and display the correct value for the selected locale. We will also adjust our search implementation to use a distinct core for each configured language, so that both search results and search facets display in the selected locale.
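To make the one-core-per-language idea concrete, here is a small sketch of how a locale might be mapped to a core name. The `sfm_<lang>` naming scheme, the English fallback, and the helper name `search_core_for` are all hypothetical; the branch may well do this differently.

```python
# Languages taken from the translation columns listed above.
CONFIGURED_LANGUAGES = ("ar", "en", "es", "fr")
DEFAULT_LANGUAGE = "en"  # assumed fallback, not confirmed in this issue


def search_core_for(locale: str, prefix: str = "sfm") -> str:
    """Return the (hypothetical) search core name for a locale such as 'fr-FR'."""
    # Reduce 'fr-FR' or 'fr_FR' to the bare language code 'fr'.
    lang = locale.replace("_", "-").split("-")[0].lower()
    if lang not in CONFIGURED_LANGUAGES:
        lang = DEFAULT_LANGUAGE
    return f"{prefix}_{lang}"
```

Routing every query through a helper like this would keep results and facets in the same language, since both come from the same core.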

An evaluation of the multi-core approach and scaffolding for the import changes have been started in this branch.

@tlongers Please feel free to add anything I've missed!

@tlongers
Member

Thanks for the precis. I think that covers the essence of the work and the branch shows the direction of travel with respect to implementation.

Only some of the fields in the SFM models have values that need to be translated. As Hannah mentioned, there are also only two types of data: "instance" and "category". "Instance" includes unique values like unit names and person names that are tied to a specific record; "category" includes stuff like unit classifications, which appear frequently outside of the context of any specific record.

| entity | type | fieldname |
| --- | --- | --- |
| unit | instance | unit:name |
| unit | instance | unit:other_names |
| unit | category | unit:classification |
| person | instance | person:name |
| person | category | person:posting_role |
| person | category | person:posting_title |
| person | category | person:posting_rank |
| incident | category | incident:violation_type |
| incident | instance | incident:violation_description |
| location | instance | location:name |
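The table above could also serve as a lookup for checking translation sheet rows. The following is only a sketch: `validate_row` is a hypothetical helper, and the checks it runs are one guess at what "valid" might mean, using the column names from the field list in the first comment.

```python
NA = "NA_character_"

# fieldname -> (entity, value type), copied from the table above
TRANSLATABLE_FIELDS = {
    "unit:name": ("unit", "instance"),
    "unit:other_names": ("unit", "instance"),
    "unit:classification": ("unit", "category"),
    "person:name": ("person", "instance"),
    "person:posting_role": ("person", "category"),
    "person:posting_title": ("person", "category"),
    "person:posting_rank": ("person", "category"),
    "incident:violation_type": ("incident", "category"),
    "incident:violation_description": ("incident", "instance"),
    "location:name": ("location", "instance"),
}


def validate_row(row):
    """Return a list of problems found in one translation sheet row."""
    fieldname = row.get("translation:model_type_fieldname", "")
    expected = TRANSLATABLE_FIELDS.get(fieldname)
    if expected is None:
        return [f"{fieldname!r} is not a translatable field"]

    errors = []
    entity, value_type = expected
    if row.get("translation:model_type_admin") != entity:
        errors.append(f"entity should be {entity!r}")
    if row.get("translation:value_type") != value_type:
        errors.append(f"value type should be {value_type!r}")

    record_id = row.get("translation:model_type_id", "")
    if value_type == "instance" and record_id in ("", NA):
        errors.append("instance rows need a record uuid")
    if value_type == "category" and record_id != NA:
        errors.append(f"category rows should use {NA!r} as the id")
    if not row.get("translation:raw"):
        errors.append("raw value is missing")
    return errors
```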

The solution we've settled on is to wrestle the SFM data sheets into a big list of unique values, tied to the UUID of their record, which can then be worked on by translators and imported to WhoWasInCommand. Here's an example using our Mali data. The tab ml_translation contains the translatable values from the other three sheets.

For future readers of this issue (including our forgetful future selves), it's important to note that the rationale is to be able to supply localized data to end users, and not only a localized interface for viewing data that may be in a completely different language. Interface translation is a solved problem, using gettext and various frontends to manage the string translations. Translating the data itself, however, is a more involved and ongoing process, because the datasets change often. It's better to see the translation of data values as part of the research workflow rather than solely within the domain of the WWIC technical product. The work discussed here is part of a package of workflow, methodological, modelling and internal tooling changes that need to happen all at once. For example, we will need a non-WWIC tool to keep the evolving dataset and the translation files in sync, and to point out new and changed entries to the research team and translators. The new translation model is also Another Thing to Validate, so we will need to think of ways of doing that successfully.
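To give a feel for what the sync tool mentioned above might do, here is a minimal sketch that compares the current dataset against the translation sheet and reports new, changed, and removed entries. The names `SyncReport` and `compare` are hypothetical, and the sketch assumes each (uuid, fieldname) pair holds a single raw value, which is not true of multi-value fields like unit other names; a real tool would need to handle those.

```python
from typing import Dict, List, NamedTuple, Tuple

Key = Tuple[str, str]  # (record uuid, fieldname); a simplifying assumption


class SyncReport(NamedTuple):
    new: List[Key]      # in the dataset, missing from the translation sheet
    changed: List[Key]  # raw value differs between dataset and sheet
    removed: List[Key]  # in the sheet, no longer in the dataset


def compare(dataset: Dict[Key, str], sheet: Dict[Key, str]) -> SyncReport:
    """Compare raw values in the dataset with those in the translation sheet."""
    new = sorted(k for k in dataset if k not in sheet)
    changed = sorted(k for k in dataset if k in sheet and dataset[k] != sheet[k])
    removed = sorted(k for k in sheet if k not in dataset)
    return SyncReport(new, changed, removed)


report = compare(
    dataset={
        ("u-1", "unit:name"): "1 Brigade",
        ("u-2", "unit:name"): "2 Battalion",
    },
    sheet={
        ("u-1", "unit:name"): "1 Bde",
        ("u-3", "unit:name"): "3 Company",
    },
)
```

Here `report.changed` flags `u-1`, whose raw value no longer matches the sheet, so its existing translations would need a second look from translators.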
