Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support setting language on posts #579

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

chdorner
Copy link
Contributor

@chdorner chdorner commented May 15, 2023

I think this is everything that's needed to implement Mastodon's post language feature.

  • New identity config setting for the preferred language, will be used when creating local posts without a language key in the request.
  • Support for the Mastodon API, either creating/updating a post, or wherever posts are rendered
  • Support for ActivityPub federation via contentMap
  • Helps screenreaders on the web frontend by setting the lang attribute on the post content

Note on client compatibility:

  • Ivory doesn't seem to support any of this so far
  • Elk
    • allows to set the language on each post
    • doesn't use the posting:default:language preference, but uses its interface language as the default value when creating a new post
    • viewing a post in a different language, it offers to translate it

activities/models/post.py Outdated Show resolved Hide resolved
@pauloxnet
Copy link
Contributor

pauloxnet commented May 15, 2023

LGTM

You can also update the content_vector_gin to use the new language field in the SearchVector's config attribute instead of the default english, as done in the djangoproject.com search document:
https://github.com/django/djangoproject.com/blob/main/docs/search.py#L42

@chdorner
Copy link
Contributor Author

@pauloxnet interesting. just to make sure that I'm getting this right since the Mastodon languages are stored as 2-char strings (ISO-639-1, en, de, etc.), we'd have to store an extra column with the search config translating these values to the known postgres search configs (looks like there are 28, plus simple), and then recreate the search index pointing the config to that column instead of hardcoding the english config

@pauloxnet
Copy link
Contributor

It seems right.

In the Django Project code I used a dictionary to map 2 characters long iso languages into language names for PostgreSQL config.

Maybe there's a similar way to map language code into config names without an additional fiepd?

@andrewgodwin
Copy link
Member

I think the problem with changing the SearchVector language is that it's embedded into the index, is it not? We can't have 20-odd indexes on the content of posts, one for each language, and doing a search query without an index for it sounds painful.

@pauloxnet
Copy link
Contributor

I think the problem with changing the SearchVector language is that it's embedded into the index, is it not?

Actually, I think you can have an index based on the language stored in a column in the same table, but I'd leave the change until after this PR is merged.

@chdorner
Copy link
Contributor Author

Did a bit more research on adding the tsvector config to the index for this. I can get it to work with just SQL, but having a hard time to get Django to do this for me. Any help would be appreciated!

-- adding a column of type `regconfig` to store each record's tsvector config
ALTER TABLE activities_post ADD COLUMN tsvector_config regconfig DEFAULT 'simple';

-- creating the GIN index
DROP INDEX content_vector_gin;
CREATE INDEX content_vector_gin ON activities_post USING GIN (to_tsvector(tsvector_config, content));

@pauloxnet
Copy link
Contributor

The SQL code is ok for me.
What's your issue in generating that same code with the Django ORM ?
I can guess the regconfig type of column?

Copy link
Member

@andrewgodwin andrewgodwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the migration - let's get this landed without it, and we'll figure that out separately. The new Django index objects should support this, but if it's complex or weird enough then we'll just do a raw SQL one, since we only support PostgreSQL anyway.

activities/models/post.py Outdated Show resolved Hide resolved
[
(lang.alpha_2, lang.name)
for lang in pycountry.languages
if hasattr(lang, "alpha_2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably good enough for now - I think it's how most of the other clients work - but we might need to add support for separating the various Spanish dialects in future, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the Mastodon docs are pretty explicit that it's the two-letter language code. Not sure how other clients would behave if the language code would be different. Already increasing the max length of the database field sounds safe enough to do already 👍

activities/models/post.py Outdated Show resolved Hide resolved
@chdorner chdorner force-pushed the feat/post-lang branch 2 times, most recently from df29a43 to b6f9df6 Compare June 6, 2023 18:15
@pauloxnet
Copy link
Contributor

pauloxnet commented Jun 14, 2023

@chdorner since the language field is no longer nullable, some type hints and checks on None need to be updated.

@alphatownsman
Copy link
Contributor

alphatownsman commented Feb 10, 2024

@chdorner @AstraLuma this is absolutely great feature. any chance get this updated / merged? happy to do anything I can to help.

@@ -1152,7 +1174,6 @@ def to_mastodon_json(self, interactions=None, bookmarks=None, identity=None):
if isinstance(self.type_data, QuestionData)
else None,
"card": None,
"language": None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be updated to forward the stored language?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants