Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Outreachy: "MapKnitter.org Spam Management System" Project Q&A #1745

Open
PeculiarE opened this issue Jun 2, 2022 · 19 comments
Open

Outreachy: "MapKnitter.org Spam Management System" Project Q&A #1745

PeculiarE opened this issue Jun 2, 2022 · 19 comments

Comments

@PeculiarE
Copy link
Contributor

This is the discussion thread for questions on my Outreachy project (#1744).

@PeculiarE
Copy link
Contributor Author

PeculiarE commented Jun 2, 2022

Hi @jywarren! I plan on starting work on the first couple of tasks on the list at #1744 but to do that, I need clarifications on the following, please.

(1) So @grvsachdeva has already done some work on these tasks in the PR #1030. And the project goals did mention building on existing work. So I'd like to clarify if this existing PR will be merged or am I supposed to just simply incorporate some elements from it into my own?

(2) When adding the status column to the maps (as seen in #1030), all maps will get a default status of 1 (i.e. published) including maps which have been archived using the old spam/archive method. This means we would then have two conditions for fetching map(s): their status(published/spammed) and whether or not they've been archived.

Should we then:

a. Make every archived map automatically a spammed map when doing migrations, i.e. for rows whose archived value is 1, assign the status of 0 instead of the default 1 to the status column. This way, our fetching endpoints will only fetch based on a singular criteria: the status column

b. Leave as is. Give all maps a default of 1 and fetch based on status as well as check if the map is archived.

I believe these are the 2 questions I need answered first before I get started. Other qs will definitely crop up as I move from one task to the next on the list.

Lastly, even though the design implementation won't be happening now, I would like to clarify if the proposed design is okay or we need to make a few tweaks to it before implementation later down the line.

Thank you!

@jywarren
Copy link
Member

jywarren commented Jun 2, 2022

(1) So @grvsachdeva has already done some work on these tasks in the PR #1030. And the project goals did mention building on existing work. So I'd like to clarify if this existing PR will be merged or am I supposed to just simply incorporate some elements from it into my own?

I would be OK with either route. Do you have a feeling for which you think makes more sense, based on the work Gaurav did, or would you like me or another mentor to look through the PR and see how far along it is, and whether we think there's a best way forward?

(2) When adding the status column to the maps (as seen in #1030), all maps will get a default status of 1 (i.e. published) including maps which have been archived using the old spam/archive method. This means we would then have two conditions for fetching map(s): their status(published/spammed) and whether or not they've been archived.

Should we then:

a. Make every archived map automatically a spammed map when doing migrations, i.e. for rows whose archived value is 1, assign the status of 0 instead of the default 1 to the status column. This way, our fetching endpoints will only fetch based on a singular criteria: the status column

b. Leave as is. Give all maps a default of 1 and fetch based on status as well as check if the map is archived.

I believe these are the 2 questions I need answered first before I get started. Other qs will definitely crop up as I move from one task to the next on the list.

I think we should make spammed/archived maps status 0, as essentially all of them are spam. And in any case, we aren't deleting them -- since it's non-destructive, it's OK to just reclassify them under the new system.

Lastly, even though the design implementation won't be happening now, I would like to clarify if the proposed design is okay or we need to make a few tweaks to it before implementation later down the line.

I think the design is good as it is! Since spam moderators will be using both PublicLab.org and MapKnitter's spam dashboards, we want both the style/layout and functionality to be very similar, or they could get confused and make mistakes. So unless you see an opportunity for a major improvement, or elimination of a serious ambiguity or bug, we should stick as close as we can to the PublicLab.org design. That said, if you have extra time towards the end of your project, we could think about making improvements across both systems in synchrony!

Thank you for the excellent questions!!

@PeculiarE
Copy link
Contributor Author

I would be OK with either route. Do you have a feeling for which you think makes more sense, based on the work Gaurav did, or would you like me or another mentor to look through the PR and see how far along it is, and whether we think there's a best way forward?

I believe I'd prefer incorporate stuffs from the PR as I work down the planning list. While there are some functions in there that are pretty straightforward and I can understand what they do and where they are to be used, I'm not so sure of others. But they might begin to make sense as I progress through the project. And I believe in that case, it's better to come to those conclusions myself rather than merging them all at once without being sure what each function does and how it ties to the project goals and tasks.

I think we should make spammed/archived maps status 0, as essentially all of them are spam. And in any case, we aren't deleting them -- since it's non-destructive, it's OK to just reclassify them under the new system.

Alrighty! Thank you. This further buttresses my point about not merging the PR straight up as the migrations there do not take this new addition into consideration.

However, this leads to another very important question, I think. My thought process is still garbled on this one, been turning my head on it for a couple of hours now. So, I'm just gonna lay it all out and hopefully, we can decide on a clear-cut process:

The policy on PublicLab (which we fully intend to incorporate in this project) is that spamming content automatically bans the owner of the content. If we spam maps with the archived status, we should also ban the author of those maps as well during the migrations.

But does this make sense from a business/community perspective? Banning authors now whose maps were archived maybe 2 years ago.... I'm not sure if this is okay. What if the author is no longer a spammer and now uses MapKnitter as it should be used?

And in a sorta roundabout way, we have to consider that one of the project goals is to spam all maps of a banned user. So do we also spam all maps of that banned user?

In summary, if we give an archived map a status of 0 (i.e. spammed), should we ban the user, and then spam all maps belonging to that user? Is this overkill, or is there a better way of handling this?

I think the design is good as it is! Since spam moderators will be using both PublicLab.org and MapKnitter's spam dashboards, we want both the style/layout and functionality to be very similar, or they could get confused and make mistakes. So unless you see an opportunity for a major improvement, or elimination of a serious ambiguity or bug, we should stick as close as we can to the PublicLab.org design. That said, if you have extra time towards the end of your project, we could think about making improvements across both systems in synchrony!

Cool, thank you @jywarren. This makes a lot of sense.

@jywarren
Copy link
Member

jywarren commented Jun 4, 2022

I think you're right that although, as a matter of policy, we usually ban authors, we don't have to in this retrospective way. Although I want to note that banning is reversible and we usually tell people how to appeal in the message they see upon logging in, so it's not a permanent state. But I still agree with you, let's just spam the maps, not the users for now.

One additional advantage of taking pieces from the PR instead of merging the whole thing is that we could look for more opportunities to break it into pieces. Some parts could probably be done before, even if those parts don't actually get activated in the live site without their companion pieces. But if we write tests to protect them, they are still ok to merge in their own earlier PR, simplifying the review process and lowering the stakes of publishing to the live site. For example, we could decide to deal with old maps (and the migration) in it's own PR, without yet worrying about code that actually interacts with the migration. That's just one example.

It might be good to start thinking about how your tasks break into PRs and which if any might be dependent on others. But don't worry too much about knowing every step ahead of time. You can also just keep it in mind, while picking out smaller tasks that can be done in a PR, and it'll naturally filter out the big ones that'll take more coordination in a big PR later.

Great questions!

@PeculiarE
Copy link
Contributor Author

Thank you @jywarren! I do agree with you on the importance of writing tests to help protect against any breaking changes.

I'm used to solving a task in one huge PR, and breaking one task into several related PRs is going to be a major (but very doable) shift in my coding style. I'm excited (and a wee bit anxious) to delve into this and learn a new way of doing things 😅

@PeculiarE
Copy link
Contributor Author

Hi @jywarren, @cesswairimu, @TildaDares....I'm not so conversant with tests in the Rails framework so I'm not sure what the error in the image below means.

Screenshot 2022-06-08 at 11 11 15

I believe that all the tests pass (because of the 0 failures) but then codecov throws an error about not finding a repo or a owner and I'm not quite sure what this means and how to debug it. Is this error related to the codebase or just specific to codecov?

Thank you!

@cesswairimu
Copy link
Collaborator

Hi @PeculiarE you are right, for codecov, we normally use it on pull requests to check for test coverage and its weird that the reports are appearing locally I will look into that but you can ignore those for now, thanks

@PeculiarE
Copy link
Contributor Author

Alright! Thank you 😄 @cesswairimu

@PeculiarE
Copy link
Contributor Author

Hi @jywarren....so I've been working on the single spam and batch spam endpoints for the past few days. Usually, the policy at PublicLab.org is that spamming content bans the author, but since we have anonymous authors on MapKnitter, what I implemented checks if the owner of the map is anonymous, and if yes, the only action to be carried out will be the spamming of the map; banning of the author is skipped.

Does this pass muster? Or is there a better way of going about maps with anonymous authors?

@jywarren
Copy link
Member

what I implemented checks if the owner of the map is anonymous, and if yes, the only action to be carried out will be the spamming of the map; banning of the author is skipped.

I think that sounds great. Thanks!

@PeculiarE
Copy link
Contributor Author

PeculiarE commented Jun 19, 2022

Hi @jywarren, so I have a PR up at #1762 on spamming maps (either singly or in batches).

Code Climate is requesting for a bit of refactoring to reduce complexity. While I work on that, I'd love to get a bit of clarity on some questions:

1. In the project goals, we have:

  • Display a list of any maps a banned PublicLab.org user has (if any) when they are banned
  • Offer a button to spam all maps by that user in one click, like mapknitter.org/spam/user/

Now, when we spam a map or several maps, we ban the author(s) (as long as they aren't anonymous). Based on the project goals above, is the spamming cycle supposed to run like this:

spam a map -> automatically ban the author -> spam all maps belonging to that author

Or does this feature of spamming all maps belonging to a banned author only apply to when the moderator manually clicks on ban user(s)?

2. In the general project discussion thread at #10754, you mentioned that advertisement on a map or any of its images was a major spam flag. So, this got me thinking: when spamming a map, we ban the main author (the user who created the map), but shouldn't we also ban co-authors (users who uploaded images on the map) if any?

@jywarren
Copy link
Member

Hi @PeculiarE - sounds super, this is great to see.

to your questions, for 1) each of these takes some careful consideration. I think it's likely that an author posting spam will mean their previous maps are also spam. But it's not 100% the case. I think it's OK to spam all maps belonging to that author, as long as we a) clearly tell the admin that this happened, and b) link to an easy place to review what those maps were and let the admin undo the action if need be. This could just be that in the alert we show, we name the maps and link to them, and that we ensure that an "unspam" button is on each map page we link to. OR, maybe it's easier to just link to a sorted list of recently spammed maps? But if it's an old map, maybe it won't appear in a list of spammed maps, since we probably don't track the timestamp of when we spammed it? What do you think? I think maybe easier to link to each map and put a button on those pages (visible to mods only) which lets us undo the action? I believe there is a per-author page with a list of maps too. So that's a 3rd option perhaps - show map state there, and a button to republish the map or something? Isn't it supposed to be something like http://mapknitter.org/profile/warren ?

  1. Hmm, co-authors... well, let's see -- looking over some spam maps, do we see that multiple authors is a common scenario? I might guess that spammers don't work in synchrony, or that a single spammer doesn't make multiple accounts... that would be my guess. So I think it's unnecessary... what do you think?

@PeculiarE
Copy link
Contributor Author

Hi! @jywarren

1. I was thinking through your responses and had already started mapping a feasible solution based on these ideas when I realised a flaw(or at least something I think will be a major design/logic flaw) with an undo action:

If a user is banned and all their maps automatically spammed, providing the admin the option of 'undoing' this auto-spam (i.e. unspamming one or more of the spammed maps) poses a problem. Because unspamming a map automatically unbans the author, the admin's original action is also reversed/undone.

Based on this, I propose the automatic spamming of a banned user's maps be limited to only actions not induced/started by an admin/mod(e.g. scheduled calls from MapKnitter to PublicLab's API to check if an author has been banned.) In this case, it makes sense to automatically spam the maps of such users

But in situations where the admin is directly banning an author or spamming a map (and consequently banning the author), the admin should be the one to decide if they then want to spam some, all, or even none of the maps belonging to the banned author.

I wrote about how we could achieve this in my proposal in this particular section. A couple of adjustments will have to be made to my proposed ideas there, but in summary, it will work like this:

When a single author is banned either directly or via the spamming of their map: The admin will be taken to the profile page of the author(e.g. http://mapknitter.org/profile/warren) and they will have the option to spam a single map or batch-spam multiple maps of the author.

When multiple authors are banned at once either directly or via the spamming of their maps: While we can list the links to all the authors' profile pages on the alert, the alert message could become extremely long and cluttered in cases where we have a sizable number of banned authors.
I believe a better idea will be to link to a page (visible only to admins/mods) showing authors banned recently (within the past 30mins? 1hour? What do you think is an appropriate time range?). Clicking on each author on the page will then take us to their profile page where we can then spam a single map or batch-spam multiple maps.

To achieve this, we will need to create a timestamp column on the users table that is only updated when the user status changes (maybe something like status_updated_at). I dunno if we need to do same for the maps table. I don't see the need right now, but who knows? We just might need it in the nearest future. What do you think?

One last thing: In the alert message to the admin, we can provide the link to the profile page (for a single author) or in the case of multiple authors, the recently banned page. The admin will have to click on the provided link to visit these pages. An alternative will be to automatically redirect them to these pages. The alert message will inform them they will be redirected to where they can spam the maps of the banned author(s) if they so choose. I'm in favour of the redirection option as it ensures the admin visits the page just like we intended to review all the maps belonging to that author. Would you prefer us just providing the clickable link instead, and then it's up to the admin/mod to decide to visit the page or not?

2. It is very possible for a single spammer to make multiple accounts. For example, these maps look like one person created them but with different accounts. But you're right when you say the chances of both spam accounts authoring the same map is pretty low. So yeah, multiple-author banning is unnecessary. Thank you 😄

@jywarren
Copy link
Member

Hi, @PeculiarE i appreciate your careful thinking on this! I'm OK with this extra page showing recently spammed maps, but yes you are right it will require a new column. I think we can not for the maps table for the time being.

Actually, does changing the user status automatically change user.updated_at? Could we sort by that?

Could we also have a button on that page that just says "spam all maps by author" -- imagine if the user is named "freeCarWashesDotCom" and you are really quite sure. It could save some time!

@PeculiarE PeculiarE added the spam label Jun 29, 2022
@PeculiarE
Copy link
Contributor Author

Hiya @cesswairimu, @TildaDares, @jywarren

While working on the PR at #1772, I checked the plots2 repo to see how the publish action executes over there. And I noticed the 'Post spent so and so time in moderation' for first-time posts.

I'm not so sure the reason why this message is displayed to the moderator, and I'd appreciate some clarification.

Also, Is it something we want implemented on MapKnitter as well? Along with the extra directions within the alert?

Please see the lines of code being referred to here.

Secondly, the concept of a 'moderated' user (i.e. a user with the status of 5) is still vague to me. I'm not sure what preempts the decision to moderate the user and what are the defined limitations of a moderated user. I did read somewhere (can't remember the exact issue or pull request) that moderated users are 'almost' the same as banned users. And so I'm not sure....if a user is moderated already, can they still be banned? Or does banning only happen to users with status of 1?

@jywarren
Copy link
Member

jywarren commented Jul 5, 2022

Hi @PeculiarE - sorry for my slow reply, i was traveling back to the US and missed some things.

That message was supposed to give the moderators a sense of whether the moderation process is introducing too much delay for newcomers - like, do we feel bad if they had to wait 2 days, vs. 2 hours? I think if we have the data, we might as well show it for MK as well.

I would not worry about duplicating the moderated user state. We have used it only once ever. It's designed so that if someone breaks the code of conduct, we may ban them, but not necessarily delete their contributions (that's left to the discretion of the moderator). Because they may have been banned for conduct unrelated to their work (as opposed to being banned for posting a commercial advertisement unrelated to Public Lab themes). Does that make sense? It's a very rare occurrence. So I don't think its necessary to do this in MK.

@PeculiarE
Copy link
Contributor Author

No worries @jywarren 😄 .....glad you arrived back home safely.

We do have the data on MK, thank you. Would include it in my next PR or make an FTO for it. I think we can skip the 'Now reach out to the new community member; thank them, just say hello....' part of the alert message present in the Public Lab version because:

  1. map creators are not necessarily new members to the community (logged-in users are coming from Public Lab)
  2. Non-logged-in users are anonymous and we can't even reach them
    What do you think?

Thank you for your detailed explanation on the concept of a moderated user. Wasn't planning on implementing same on MK though. I asked because of a question constantly buzzing in my brain: "if a user is moderated, can he or she still be banned?" And based on your explanations, the answer to that will be yes. Because the moderated state might be due to violating our code of conduct policy and not necessarily for posting spam content.

Thank you!

@PeculiarE
Copy link
Contributor Author

That message was supposed to give the moderators a sense of whether the moderation process is introducing too much delay for newcomers - like, do we feel bad if they had to wait 2 days, vs. 2 hours? I think if we have the data, we might as well show it for MK as well.

Based on our recent conversation at #1786 and pending confirmation from the community reps, I do believe this is redundant, and there is no need to show this data anymore.

@jywarren
Copy link
Member

jywarren commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants