Cleanup of old user accounts - Issue#218 #230
base: master
from piplmesh.account import models as account_models
from piplmesh.api import base, models as api_models
`base` is not used?
Fixed.
from piplmesh.api import models as api_models

@task.task
def clean_inactive_lazy_users():
OK. But this we will REALLY have to improve. Right now you read ALL users from the database into Python just to find out which users not to process. ;-)
This is why databases support queries: so that you can limit what is transferred between the database and Python to only what you are interested in at the end.
So please create a MongoEngine query which returns only users who have no content and who have been inactive for longer than the timeout. Then run `is_anonymous` over them and delete them.
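A minimal sketch of the inactivity half of such a query, written as a raw MongoDB filter so that it runs without a database connection. The field name `last_login` is an assumption and not taken from the piplmesh models. With MongoEngine, a filter like this could be passed through the `__raw__` keyword of a queryset. The "has no content" condition is not covered here.

```python
from datetime import datetime, timedelta

# Hypothetical field name; the real User document may use a different one.
LAST_ACTIVITY_FIELD = "last_login"


def inactive_user_filter(now, timeout):
    """Build a raw MongoDB filter matching users inactive for longer than `timeout`."""
    cutoff = now - timeout
    # $lt is evaluated by the database, so only matching users are transferred.
    return {LAST_ACTIVITY_FIELD: {"$lt": cutoff}}


query = inactive_user_filter(datetime(2012, 6, 1), timedelta(days=30))
# Assumed usage with MongoEngine: account_models.User.objects(__raw__=query)
```

The point of the sketch is only that the cutoff comparison happens in the database instead of in a Python loop over all users.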
I've been trying for a few days now to find a way to build such a query, but I've had no luck. I checked every page I could find on Google about this topic, but I'm still stuck on the query. Any help would be much appreciated. I assume I have to use something like this:
api_models.Post.objects(comments__author=user)
This returns the posts which have comments written by a given user. But it is just one part of the query, and I don't know how to combine posts, comments and users in a single query.
With this you can get all posts by the user at the same time as all posts with comments by the user: http://mongoengine-odm.readthedocs.org/en/latest/guide/querying.html#advanced-queries
(But it will not necessarily help you much here.)
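As a rough illustration of the combined query (posts by the user together with posts that have comments by the user), here is the raw MongoDB form of an "or" filter. The `author` field on posts is an assumption; `comments.author` is the dotted form of the `comments__author` lookup used earlier in this conversation. In MongoEngine the same idea would presumably be written with two `Q` objects joined by `|`.

```python
def posts_touched_by(user_id):
    """Raw filter: posts written by the user OR posts with a comment by the user."""
    return {
        "$or": [
            {"author": user_id},           # assumed field name on Post
            {"comments.author": user_id},  # embedded comments, as in comments__author
        ]
    }


example = posts_touched_by("user-1")
```

As noted above, this finds a given user's content, but it does not by itself find users who have no content at all.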
But you will probably have to do a server-side query. Have you tried asking the question on Stack Overflow?
OK. So I suggest you do this in two steps. First, define a periodic Celery task which computes statistics for each user, that is, for now, the number of posts and comments. You can define a new field in the User document. For the map-reduce, use an incremental map-reduce over the posts collection with the user collection as the output collection: in map, you compute the counts for each user for each post, then in reduce you add this count to a temporary field in the user document.
Then, you define another Celery task, which selects all inactive users with
This could work in theory. In practice, you will see. ;-)
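The two steps can be sketched in plain Python to show the data flow. This is not MongoDB's server-side map-reduce, only an in-memory imitation of it, and the document shapes (`author`, `comments`) and user names are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def map_post(post):
    """Map step: emit (user, 1) for the post author and for each comment author."""
    yield post["author"], 1
    for comment in post.get("comments", []):
        yield comment["author"], 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per user."""
    totals = defaultdict(int)
    for user, count in pairs:
        totals[user] += count
    return dict(totals)


def users_to_delete(users, counts, now, timeout):
    """Second step: users with no content whose last activity is older than the timeout."""
    cutoff = now - timeout
    return sorted(
        name for name, last_seen in users.items()
        if counts.get(name, 0) == 0 and last_seen < cutoff
    )


posts = [
    {"author": "alice", "comments": [{"author": "bob"}, {"author": "alice"}]},
    {"author": "bob", "comments": []},
]
counts = reduce_counts(pair for post in posts for pair in map_post(post))

users = {
    "alice": datetime(2012, 1, 1),
    "bob": datetime(2012, 5, 30),
    "carol": datetime(2012, 1, 1),  # no content, inactive for a long time
    "dave": datetime(2012, 5, 30),  # no content, but recently active
}
stale = users_to_delete(users, counts, datetime(2012, 6, 1), timedelta(days=30))
```

In the real task the counts would live in a field on the user document, so the second step becomes a single database query instead of a Python filter.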
Ping.
Please update from main repository.
Ok.
And of course, finish this. :-)
How do I even call map_reduce()? I tried this: Result:
I read that I have to import MongoDBManager, but I don't know where I have to import it from. It is done like this in the docs:
Also, I asked for help with this issue on Stack Overflow, but there was no accurate answer. I only managed to transform this Python code ...
... into a query. But there is still no nice way to iterate over the posts' comments.
You could provide a link to your question on Stack Overflow. Maybe then I would better understand what the question is.
So, we are using MongoDB as a database. There are multiple libraries for it. One is a low-level library for direct access, and on top of that there are some abstractions. One of those high-level libraries is MongoEngine, and we are using it.
Here is the documentation on the map-reduce interface exposed by MongoEngine. So it seems
Here is my question on SO.
You do understand why we want map-reduce? Because we do not want to iterate over all posts every time. Iterate over all posts and transform them into documents. One approach is to use
So the proposed approach with map-reduce is not easy. :-) I must admit that I didn't get to it myself, but @kostko did. :-) But now you just have to implement it. This should be easy. :-)
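To illustrate why an incremental run avoids iterating over all posts every time, here is a small in-memory sketch: counts from earlier runs are kept, and only posts that have not been processed yet are folded in. The document shape is an assumption, and the sketch deliberately ignores comments added later to posts that were already counted, which a real implementation would have to handle.

```python
def merge_counts(stored, new_posts):
    """Fold per-user counts from unprocessed posts into previously stored counts."""
    merged = dict(stored)  # copy, so the stored counts are not modified in place
    for post in new_posts:
        authors = [post["author"]] + [c["author"] for c in post.get("comments", [])]
        for author in authors:
            merged[author] = merged.get(author, 0) + 1
    return merged


stored = {"alice": 2}
new_posts = [{"author": "bob", "comments": [{"author": "alice"}]}]
updated = merge_counts(stored, new_posts)
```

Each run then costs time proportional to the new posts only, not to the whole collection.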
Yeah, I do understand why we want map/reduce. But at the time I asked the question on SO, I was wondering if there was any other way to do this nicely.
How is this going?
TODO.
Ping? Do you want to work more on this?
I moved the account code to a separate Python package, so you will have to continue there:
Added a Celery task that checks for inactive lazy users once a day and removes them from the DB.
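For reference, a once-a-day schedule for such a task could look roughly like the following settings fragment. `CELERYBEAT_SCHEDULE` is the setting name used by older Celery releases (an assumption about the version in use here), and the dotted task path is hypothetical.

```python
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    "clean-inactive-lazy-users": {
        # Hypothetical dotted path; use wherever the task is actually defined.
        "task": "piplmesh.account.tasks.clean_inactive_lazy_users",
        # Run once every 24 hours.
        "schedule": timedelta(days=1),
    },
}
```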