Multilingual string data model for OARepo.
pip install oarepo-multilingual
The library provides a multilingual type for JSON Schema, with marshmallow validation and deserialization and an Elasticsearch mapping.
Multilingual is a type that lets you add multilingual strings to your JSON schema in the format "en": "something", "en-us": "something else", optionally with a default value in the form "_": "default value".
Add this package to your dependencies and use it via $ref in your JSON schema as "[server]/schemas/multilingual-v2.0.0.json#/definitions/multilingual".
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "title": {
            "$ref": "https://localhost:5000/schemas/multilingual-v2.0.0.json#/definitions/multilingual"
        }
    }
}
{
    "type": "object",
    "properties": {
        "title": {
            "en": "something",
            "en-us": "something else"
        }
    }
}
Use the marshmallow schema for data validation and deserialization.
If marshmallow validation is performed within an application context, languages are validated against the SUPPORTED_LANGUAGES config. If validation is performed outside an app context, the keys are not checked against a list of languages; instead, a generic validation is performed: keys must be in ISO 639-1 format or in the language-region format from RFC 5646.
import marshmallow

# MultilingualStringSchemaV2 is provided by this package.
class MD(marshmallow.Schema):
    title = MultilingualStringSchemaV2()

data = {
    "title": {
        "en": "something",
        "en-us": "something else",
    }
}
MD().load(data)
You can specify supported languages in your application configuration in SUPPORTED_LANGUAGES. Then only these languages are allowed as keys of a multilingual string.
You must specify your languages in the format "en" or "en-us".
app.config.update(SUPPORTED_LANGUAGES = ["cs", "en"])
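The two validation modes described above can be illustrated with a small standalone sketch. It is not the library's implementation: the function name `invalid_keys`, the simplified regular expression, the exact-match comparison against the supported list, and the assumption that the "_" default key is always accepted are all illustrative choices.

```python
import re

# Simplified pattern: a two-letter ISO 639-1 code, optionally followed by a
# two-letter region ("en", "en-us"). RFC 5646 allows more forms than this.
LANGUAGE_KEY = re.compile(r"[a-z]{2}(-[a-z]{2})?", re.IGNORECASE)

DEFAULT_KEY = "_"  # assumption: the default-value key is always accepted


def invalid_keys(value, supported_languages=None):
    """Return the keys of a multilingual dict that this sketch would reject."""
    bad = []
    for key in value:
        if key == DEFAULT_KEY:
            continue
        if supported_languages is not None:
            # Inside an app context: only configured languages are allowed.
            if key not in supported_languages:
                bad.append(key)
        elif not LANGUAGE_KEY.fullmatch(key):
            # Outside an app context: generic format check only.
            bad.append(key)
    return bad


title = {"en": "something", "en-us": "something else"}
print(invalid_keys(title))                # generic format check
print(invalid_keys(title, ["cs", "en"]))  # check against a configured list
```

Passing `supported_languages=None` stands in for validation outside an app context; passing a list stands in for the SUPPORTED_LANGUAGES lookup.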
In your Elasticsearch mapping, define the type of your multilingual string as multilingual.
Add a definition of ELASTICSEARCH_DEFAULT_LANGUAGE_TEMPLATE to your configuration; it will be used as the default mapping template for supported languages.
ELASTICSEARCH_DEFAULT_LANGUAGE_TEMPLATE = {
    "type": "text",
    "fields": {
        "keywords": {
            "type": "keyword"
        }
    }
}
You can also specify different templates for specific languages with ELASTICSEARCH_LANGUAGE_TEMPLATES. Use # and an id to add more templates for one specific language.
ELASTICSEARCH_LANGUAGE_TEMPLATES = {
    "cs": {
        "type": "text",
        "fields": {
            "keywords": {
                "type": "keyword"
            }
        }
    },
    "cs#plain": {
        "type": "text",
    },
    "en": {
        "type": "text",
        "fields": {
            "keywords": {
                "type": "keyword"
            }
        }
    }
}
{
    "mappings": {
        "properties": {
            "title": {"type": "multilingual"}
        }
    }
}
{
    "mappings": {
        "properties": {
            "title": {"type": "multilingual#plain"}
        }
    }
}
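To make the relationship between the "multilingual#plain" type and the "cs#plain" template more concrete, here is a rough sketch of how such a type could be expanded into one sub-field per language. Everything in it is an assumption for illustration: the function `expand_multilingual` is hypothetical, and both the resulting object shape and the lookup order ("lang#id", then "lang", then the default template) are guesses rather than the library's documented behavior.

```python
def expand_multilingual(field_type, languages, templates, default_template):
    """Expand "multilingual[#id]" into one sub-field per language (sketch)."""
    # "multilingual#plain" -> variant "plain"; "multilingual" -> variant ""
    _, _, variant = field_type.partition("#")
    properties = {}
    for lang in languages:
        key = f"{lang}#{variant}" if variant else lang
        # Assumed lookup order: "lang#id", then "lang", then the default.
        properties[lang] = templates.get(
            key, templates.get(lang, default_template)
        )
    return {"type": "object", "properties": properties}


default_template = {"type": "text", "fields": {"keywords": {"type": "keyword"}}}
templates = {
    "cs": {"type": "text", "fields": {"keywords": {"type": "keyword"}}},
    "cs#plain": {"type": "text"},
}

expanded = expand_multilingual(
    "multilingual#plain", ["cs", "en"], templates, default_template
)
print(expanded["properties"]["cs"])  # the "cs#plain" template
print(expanded["properties"]["en"])  # falls back to the default template
```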
You can specify analysis in the app configuration with ELASTICSEARCH_LANGUAGE_ANALYSIS. Use # and an id to add more analysis definitions for one specific language.
ELASTICSEARCH_LANGUAGE_ANALYSIS = {
    "cs#title": {
        "czech#title": {
            "type": "custom",
            "char_filter": [
                "html_strip"
            ],
            "tokenizer": "standard"
        }
    },
    "cs": {
        "czech": {
            "type": "custom",
            "char_filter": [
                "html_strip"
            ],
            "tokenizer": "standard",
            "filter": [
                "lowercase",
                "stop",
                "snowball"
            ]
        }
    }
}
{
    "settings": {
        "analysis": {
            "analyzer": {
                "oarepo:extends": "multilingual_analysis"
            }
        }
    },
    "mappings": {
        ...
    }
}
{
    "settings": {
        "analysis": {
            "analyzer": {
                "oarepo:extends": "multilingual_analysis#title"
            }
        }
    },
    "mappings": {
        ...
    }
}
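As a rough illustration of how the id after # might select analyzers, the sketch below gathers the analyzer definitions whose configuration key carries the same id as the "oarepo:extends" value. The helper `collect_analyzers` is hypothetical, and the selection rule (only entries with a matching id are included) is an assumption, not a description of the library's actual behavior.

```python
def collect_analyzers(extends, language_analysis):
    """Gather analyzers for "multilingual_analysis[#id]" (illustrative sketch)."""
    # "multilingual_analysis#title" -> variant "title"; no "#" -> variant ""
    _, _, variant = extends.partition("#")
    analyzers = {}
    for key, definitions in language_analysis.items():
        _, _, key_variant = key.partition("#")
        # Assumption: only config entries with the same id are selected.
        if key_variant == variant:
            analyzers.update(definitions)
    return analyzers


language_analysis = {
    "cs#title": {
        "czech#title": {
            "type": "custom",
            "char_filter": ["html_strip"],
            "tokenizer": "standard",
        }
    },
    "cs": {
        "czech": {
            "type": "custom",
            "char_filter": ["html_strip"],
            "tokenizer": "standard",
            "filter": ["lowercase", "stop", "snowball"],
        }
    },
}

print(sorted(collect_analyzers("multilingual_analysis", language_analysis)))
print(sorted(collect_analyzers("multilingual_analysis#title", language_analysis)))
```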