Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store content object collections in search index [FC-0062] #680

Conversation

pomegranited
Copy link
Member

Description

This PR ensures that when a content object is added/removed from a Collection, its search index document is updated to reflect this change. The collection.key values for each collection are stored in a new tags.collections list.

To support this, we added a new event called CONTENT_OBJECT_ASSOCIATIONS_CHANGED, and trigger it when an object's collections change. This event is also now triggered when an object's tags change, and the previous CONTENT_OBJECT_TAGS_CHANGED event is deprecated.

Supporting information

Related Tickets:

Depends on / blocked by:

Testing instructions

  • Setup tutor with this PR and https://github.com/open-craft/tutor-contrib-meilisearch plugin, plus mounted directories for all the open PRs in the "depends on" section above.
  • Run migrations: tutor dev run cms python manage.py migrate
  • In the content-authoring MFE, create a library, e.g. lib:SampleTaxonomyOrg1:AL1
  • Add some blocks to the library, and note their block keys.
    Hint: you can retrieve the block keys using the REST API, e.g. http://studio.local.edly.io:8001/api/libraries/v2/lib:SampleTaxonomyOrg1:AL1/blocks/
  • Run tutor dev run cms ./manage.py cms reindex_studio --experimental to add the new tags.collections field.
  • Run below snippet to add some collections -- reindexing should happen automatically.
# tutor dev run cms python manage.py cms shell
from opaque_keys.edx.keys import UsageKey
from opaque_keys.edx.locator import LibraryLocatorV2
from openedx.core.djangoapps.content_libraries import api                                                                                                
lib_key_str = "lib:<your lib key>"
block_key_strs = [
    "lb:<your block key",
    "lb:<your block key",
    # ....
]

library_key = LibraryLocatorV2.from_string(lib_key_str)
block_keys = [
    UsageKey.from_string(key_str) for key_str in block_key_strs
]

api.create_library_collection(library_key, "FAL-3787", title="Collection FAL-3787")
api.update_library_collection_components(library_key, "FAL-3787", usage_keys=block_keys)
# <Collection> (lp:1 FAL-3787:Collection FAL-3787) 
  • Open http://meilisearch.local.edly.io:7700/
    Your api key can be found with tutor config printvalue MEILISEARCH_API_KEY
  • Search for FAL-3787 and verify that each of your blocks has FAL-3787 showing in its tags.collections list.

They expect a UsageKey object, not the string object_id.
and ensures this new field is searchable and filterable.

Serializes the object's collections to a list of collection.key values.
which adds CONTENT_OBJECT_ASSOCIATIONS_CHANGED
whenever a content object's tags or collections have changed,
and handle that event in content/search.

The deprecated CONTENT_OBJECT_TAGS_CHANGED event is still emitted when
tags change; to be removed after Sumac.
Copy link
Member

@rpenido rpenido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
Good work here, @pomegranited!
And thank you for improving the tests!!

I added a nit about the deprecated event and a comment about using the tags field.

  • I tested this using the instructions from the PR (thank you for the thoughtful instructions!)
  • I read through the code
  • I checked for accessibility issues
  • Includes documentation

{
"tags": {
"collections": [3, 4],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not thrilled with storing collections together with tags. If we store it in the same property, I think we couldn't use the reason field to do partial updates inside this doc property (although we can do partial updates at the doc level).

Despite being synchronous, I will not object because these operations are cheap. I'm just failing to see any advantage in this approach. 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpenido Noted..

I'm still not thrilled with storing collections together with tags. If we store it in the same property, I think we couldn't use the reason field to do partial updates inside this doc property (although we can do partial updates at the doc level).

@bradenmacdonald What do you think? Should we be storing collections in the tags index structure, or should it be its own field?

Despite being synchronous, I will not object because these operations are cheap. I'm just failing to see any advantage in this approach. 🤔

I'm pretty sure it's asynchronous outside of the devstack -- events are handled by the "worker" processes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make it it's own field, unless that's a difficult change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, it's not difficult at all.

@@ -399,9 +402,12 @@ def tag_object(
taxonomy=taxonomy,
tags=tags,
)
CONTENT_OBJECT_TAGS_CHANGED.send_event(
CONTENT_OBJECT_ASSOCIATIONS_CHANGED.send_event(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should emit the deprecated event here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh oops yes -- we need to emit both events in content_tagging while the deprecation process happens. I'll fix this!

@pomegranited pomegranited marked this pull request as draft September 10, 2024 03:55

e.g. for something in Collections "COL_A" and "COL_B", this would return:
{
"collections": ["COL_A", "COL_B"],
Copy link
Member

@rpenido rpenido Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pomegranited Sorry for the late request (I overlooked this before), but what about changing the structure here?

Suggested change
"collections": ["COL_A", "COL_B"],
"collections": [
{ "display_name": "Collection A", slug: "COL_A" },
{ "display_name": "Collection B", slug: "COL_B" },
],

That way, we can use the display_name as a searchable attribute and the slug/key to a dev action (like a redirect).

Edit: I also added this comment in the upstream PR, so please ignore it here

@pomegranited
Copy link
Member Author

Closed in favor of openedx#35469

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants