Store Kafka messages in user session #106

cc-a · 2024-09-24T21:49:03Z

Description

Moves from storing Kafka messages in a hacky global variable to using the Session data of active users. This works by:

Running the Kafka consumer as a Django admin command. This means it has access to the Django context and settings and can directly populate messages into any active user sessions (i.e. the database).
Having the index view pop all messages for the session of the user making the request.
Using appropriate database settings and atomic transactions to prevent race conditions between the consumer process and the web app.
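The flow above can be sketched without Django. This is a minimal model only: the dictionary `session_table` stands in for the session table, `table_lock` stands in for the atomic database transaction, and the names `deliver` and `pop_messages` are hypothetical, not taken from the PR.

```python
import threading

# Hypothetical in-memory stand-in for the session table:
# session key -> session data.
session_table: dict[str, dict[str, list[str]]] = {"alice": {}, "bob": {}}

# Stand-in for the database transaction that serialises access by the
# consumer process and the web app.
table_lock = threading.Lock()


def deliver(message_bodies: list[str]) -> None:
    """Consumer side: add new messages to every known session."""
    with table_lock:
        for data in session_table.values():
            data.setdefault("messages", []).extend(message_bodies)


def pop_messages(session_key: str) -> list[str]:
    """Web side: return and clear the messages of one session only."""
    with table_lock:
        return session_table[session_key].pop("messages", [])


deliver(["run started", "run stopped"])
alice_first = pop_messages("alice")   # alice sees both messages
alice_second = pop_messages("alice")  # nothing left for alice
bob_first = pop_messages("bob")       # bob's copy is unaffected
```

Because each session holds its own list, popping for one user leaves every other user's pending messages in place.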

This approach is fairly basic but gives some nice properties:

storing messages in the session of each individual user gives an easy mechanism for telling which messages have been displayed to which user.
the lifetime of messages stored in the sessions is bounded by the standard handling of Django session data (though see the caveat below).
having the consumer populate into the database directly avoids having to think about how to protect an API end point.
readily supports multiple threads/processes/replicas of the web application.

One caveat is the need to clear the session store in deployment - https://docs.djangoproject.com/en/5.1/topics/http/sessions/#clearing-the-session-store. We should also agree a session expiry period with the project team.
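As a rough illustration of the two knobs involved: Django does not purge expired sessions from the database on its own, so the `clearsessions` management command has to be run on a schedule, and the expiry period itself is controlled by the `SESSION_COOKIE_AGE` setting (in seconds). The 12 hour value below is an assumption for illustration only, since the real period is still to be agreed.

```python
from datetime import timedelta

# Settings fragment. The 12 hour period is a placeholder, not an agreed value.
SESSION_COOKIE_AGE = int(timedelta(hours=12).total_seconds())
```

Expired rows would then be removed by running `python manage.py clearsessions` periodically, for example from a scheduled job in the deployment.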

Fixes #76.

Type of change

Documentation (non-breaking change that adds or improves the documentation)
New feature (non-breaking change which adds functionality)
Optimization (non-breaking, back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)
Breaking change (whatever its nature)

Key checklist

All tests pass (eg. python -m pytest)
The documentation builds and looks OK (eg. python -m sphinx -b html docs docs/build)
Pre-commit hooks run successfully (eg. pre-commit run --all-files)

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added or an issue has been opened to tackle that in the future. (Indicate issue here: # (issue))

codecov-commenter · 2024-09-24T21:50:47Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (3864185) to head (76c440a).
Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #106   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines           43        44    +1     
=========================================
+ Hits            43        44    +1


alexdewar

Some v minor suggestions, but otherwise looks good 👍.

alexdewar · 2024-09-25T09:44:20Z

docker-compose.yml

@@ -2,11 +2,10 @@ services:
 app:
 build: .
 command:
- - sh
+ - bash


Did plain old sh not work?

Not a strictly necessary change in this PR. I found that bash seems to be better at forwarding interrupt signals to child processes, so services shut down more quickly and cleanly.


alexdewar · 2024-09-25T09:46:28Z

main/management/commands/kafka_consumer.py

+ """Add commandline options."""
+ parser.add_argument("--debug", action="store_true")
+
+ def handle(self, debug: bool = False, **kwargs: Any) -> None: # type: ignore[misc]


Presumably this is because we disallow explicit Any? Tbh I'm not sure that setting is super helpful -- there are cases (like this one) where an explicit Any makes sense. Maybe we should change that?

I actually like to disallow explicit Any, as I think Any can be a bit of a crutch. I prefer that we think carefully about where we use it and explicitly silence the check where necessary.

alexdewar · 2024-09-25T09:49:31Z

main/management/commands/kafka_consumer.py

+ self.stdout.flush()
+ bm = BroadcastMessage()
+ bm.ParseFromString(message.value)
+ message_bodies.append(bm.data.value.decode("utf-8"))


Encoding defaults to UTF-8:

Suggested change

message_bodies.append(bm.data.value.decode("utf-8"))

message_bodies.append(bm.data.value.decode())
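
A quick check of the claim behind this suggestion, using a made-up payload: `bytes.decode()` with no argument uses UTF-8, so both spellings give the same string.

```python
# Hypothetical message payload containing non-ASCII characters.
payload = "détecteur prêt".encode("utf-8")

explicit = payload.decode("utf-8")
implicit = payload.decode()  # encoding defaults to "utf-8"
```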

dalonsoa

Looks very neat!

dalonsoa · 2024-09-25T09:57:06Z

docker-compose.yml

+ kafka_consumer:
+ build: .
+ command:
+ - bash
+ - -c
+ - |
+ python manage.py kafka_consumer --debug


So, just to make sure I get this, we are using the very same code AND database in both apps, but in one we run the server and in the other we run the kafka_consumer. Is that to better use the compute resources or to better manage the different components in case one of them fails?

That's right. Docker Compose is built around the idea of having only one process running in each container. Whilst we could run this in the background under the app service, we would indeed struggle with handling error states. I also think it's cleaner to have the consumer logs in a separate feed.

Ideally we might also run the ssh server in the drunc service as a separate service, but there is a hard-coded restriction in the dummy_boot command that the process must be started on localhost.

It's also a nice proof of principle that these things can be run in separate places which will be important for deployment where we may have multiple copies of the app running but will only ever want one consumer.

cc-a · 2024-09-25T11:22:06Z

Thanks for the reviews so far. I'm planning to merge this towards the end of the day.

jamesturner246

Okay, I follow the logic. Apologies; pile of questions as usual.

Messages are polled from Kafka by a special meta/admin Django function non-stop.

Users who are logged in at a given moment have their sessions atomically updated with messages.

This means that if one user pops the messages, they are not popped for all users.

jamesturner246 · 2024-09-27T11:05:51Z

main/management/commands/kafka_consumer.py

+ sessions = Session.objects.all()
+ for session in sessions:
+ store = SessionStore(session_key=session.session_key)
+ store.setdefault("messages", []).extend(message_bodies)


So setdefault handles where messages hasn't been added to session, and extend is a funny way of saying append?

Right. setdefault is shorthand for the below pattern:

value = mydict.get("key")
if value is None:
    value = []
    mydict["key"] = value

Extend is basically append, but it takes an iterable of values to add rather than a single one. The key thing is that it mutates the existing list.
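
A small sketch of both points, using a plain dict in place of the session store (which is assumed here to behave like a dict for this purpose) and made-up message strings:

```python
session: dict[str, list[str]] = {}

# First call: "messages" is missing, so setdefault stores a new list,
# returns it, and extend adds each item to that stored list.
session.setdefault("messages", []).extend(["first", "second"])

# Second call: the key exists, so setdefault returns the stored list
# and the default [] is ignored.
session.setdefault("messages", []).extend(["third"])

# By contrast, append adds its argument as a single element,
# even when that argument is itself a list.
nested: list[object] = ["first"]
nested.append(["second", "third"])
```

Here `session` ends up with one flat list of three messages, while `nested` holds a list inside a list.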

jamesturner246 · 2024-09-27T11:06:29Z

main/management/commands/kafka_consumer.py

+ with transaction.atomic():
+ # atomic here to prevent race condition with messages being
+ # popped by the web application
+ sessions = Session.objects.all()


This gets all currently logged in users' sessions?

It gets all sessions in the database. This broadly corresponds to logged in users subject to the caveat I mention above.
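
To illustrate why "all sessions" and "logged in users" can differ: rows stay in the table after they expire until the store is cleared. The sketch below models that with a hypothetical `SessionRow` dataclass; in Django the equivalent information lives in the `expire_date` field of the `Session` model, and filtering on it is a possible refinement, not something this PR does.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SessionRow:
    """Hypothetical stand-in for a row in the session table."""

    session_key: str
    expire_date: datetime


def unexpired(rows: list[SessionRow], now: datetime) -> list[SessionRow]:
    """Keep only the rows whose expiry date is still in the future."""
    return [row for row in rows if row.expire_date > now]


now = datetime(2024, 9, 27, 12, 0, tzinfo=timezone.utc)
rows = [
    SessionRow("fresh", now + timedelta(hours=1)),
    SessionRow("stale", now - timedelta(days=3)),  # expired but not yet cleared
]
live_keys = [row.session_key for row in unexpired(rows, now)]
```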

jamesturner246 · 2024-09-27T11:07:27Z

main/management/commands/kafka_consumer.py

This command file is in some magic directory which is automatically loaded by Django?

Correct. Docs - https://docs.djangoproject.com/en/5.1/howto/custom-management-commands/#module-django.core.management.

cc-a added 2 commits September 24, 2024 22:48

Store Kafka messages in Django user sessions

ef926da

Add test for message processing by index view

e8f48b0

cc-a requested review from dalonsoa, jamesturner246, TinyMarsh, alexdewar and AdrianDAlessandro September 24, 2024 21:49

alexdewar approved these changes Sep 25, 2024


dalonsoa approved these changes Sep 25, 2024


Merge branch 'main' into messages-in-user-session

76c440a

cc-a enabled auto-merge September 25, 2024 17:08

cc-a merged commit a34e2c9 into main Sep 25, 2024
4 checks passed

cc-a deleted the messages-in-user-session branch September 25, 2024 17:09

jamesturner246 reviewed Sep 27, 2024

