Content Blocker Bot #653

Open: wants to merge 16 commits into base: main

kelvinkipruto

Description

This PR introduces a new app to check which media houses in the MediaData database block AI crawlers.
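
For readers unfamiliar with how such a check can work, here is a minimal sketch, assuming the app inspects each site's robots.txt. The agent list, the `blocked_agents` helper, and the sample file are illustrative assumptions and are not taken from this PR's code.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list of AI crawler user agents (an assumption, not from the PR).
AI_AGENTS = ("GPTBot", "CCBot", "Google-Extended")


def blocked_agents(robots_txt: str, agents=AI_AGENTS) -> list[str]:
    """Return the agents that robots_txt forbids from fetching the site root."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, "/")]


# Hypothetical robots.txt: GPTBot is disallowed, everyone else is allowed.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE))  # ['GPTBot']
```

In a real run the robots.txt text would be fetched per site from the database of media houses; the parsing step stays the same.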

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation

Signed-off-by: Kipruto <[email protected]>
…idiadata-init

@kelvinkipruto marked this pull request as ready for review on May 7, 2024, 06:44
@kilemensi
Member

mediadata_ai_blocklist is a terrible name, Mr. @kelvinkipruto ... for one, we're 100% going to use this app for more than media data. My vote is to name it something related to checking whether website content is accessible to bots, e.g. accessbot.

Contributor

@koechkevin left a comment

👍

Contributor

@thepsalmist left a comment

Initial thoughts: LGTM. Two comments:

  1. Consistency in logging: one or more files set up logging, yet print() is still used for logging/debugging in multiple places.
  2. Interesting choice to use asyncio. From my earlier understanding, I thought a synchronous approach would have been more straightforward, so it could be helpful to add docs that explain the process flow and the motivation.
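
To make both points concrete, here is a minimal sketch of what consistent logging inside an asyncio flow could look like. The logger name, the `check_site` and `check_all` helpers, and the example domains are hypothetical and not taken from this PR; the `asyncio.sleep` call merely stands in for a network fetch. One plausible motivation for asyncio (not stated in the PR) is that many slow site checks can overlap instead of running one after another.

```python
import asyncio
import logging

# Hypothetical logger name; a module-level logger replaces ad hoc print() calls.
logger = logging.getLogger("content_access_bot")


async def check_site(url: str) -> tuple[str, bool]:
    """Pretend to check one site; the sleep stands in for a network fetch."""
    logger.info("checking %s", url)
    await asyncio.sleep(0.01)
    # Placeholder rule for the sketch only: flag hypothetical *.blocked.example hosts.
    blocked = url.endswith(".blocked.example")
    logger.debug("%s blocked=%s", url, blocked)
    return url, blocked


async def check_all(urls: list[str]) -> dict[str, bool]:
    """Run all checks concurrently; gather() preserves the input order."""
    results = await asyncio.gather(*(check_site(u) for u in urls))
    return dict(results)


if __name__ == "__main__":
    # Configure logging once, at the entry point, instead of per module.
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
    print(asyncio.run(check_all(["news.example", "paper.blocked.example"])))
```

With this shape, verbosity is controlled in one place through the log level, and a short docstring per coroutine documents the process flow.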

@kelvinkipruto changed the title from "MediaData AI Crawler Blocker Checker" to "Content Blocker Bot" on May 15, 2024
4 participants