feat: Telegram vectorize, added automated module creation! #310

Merged: 6 commits, Oct 22, 2024

Member

@amindadgar amindadgar commented Oct 22, 2024

Summary by CodeRabbit

  • New Features

    • Introduced the TelegramModules class for managing Telegram modules within a MongoDB database.
    • Enhanced TelegramPlatform functionality to return both community and platform IDs.
  • Bug Fixes

    • Updated assertions in tests to align with new return values from check_platform_existence and create_platform methods.
  • Documentation

    • Improved documentation for methods in TelegramPlatform to reflect new return types and clarify functionality.
  • Chores

    • Removed the TelegramUtils class to streamline platform management.

Contributor

coderabbitai bot commented Oct 22, 2024

Walkthrough

This pull request removes utility.py and its TelegramUtils class, replacing it with a new module.py file that defines the TelegramModules class. In platform.py, the check_platform_existence and create_platform methods of TelegramPlatform now return both the community ID and the platform ID instead of a single ID. hivemind_telegram_etl.py has been updated to use the new classes, and the TelegramPlatform test cases have been updated to match the new return values.

Changes

File Path | Change Summary
dags/hivemind_etl_helpers/src/db/telegram/utility.py | Deleted file; removed class TelegramUtils.
dags/hivemind_etl_helpers/src/db/telegram/utils/__init__.py | Added import: from .module import TelegramModules.
dags/hivemind_etl_helpers/src/db/telegram/utils/module.py | Added class TelegramModules with methods for managing Telegram modules in MongoDB.
dags/hivemind_etl_helpers/src/db/telegram/utils/platform.py | Updated check_platform_existence and create_platform methods to return tuples instead of single IDs.
dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py | Updated tests to reflect changes in return values and assertions related to check_platform_existence and create_platform.
dags/hivemind_telegram_etl.py | Replaced TelegramUtils with TelegramModules and updated logic in chat_existence function.

Possibly related PRs

  • fix: updated to use telegram config! #300: Changes in hivemind_telegram_etl.py regarding the chat_existence function's logic and return type are related as they involve the TelegramPlatform class impacted by the removal of TelegramUtils.
  • fix: telegram vectorize - runtime bugs! #305: Modifications to the chat_existence function to return a dictionary instead of a tuple are relevant since the main PR removed the TelegramUtils class previously used in this context.
  • feat: Telegram vectorize handling the banned users! #306: Changes to the ExtractMessages class in messages.py that involve filtering out banned users are related to the main PR as they affect the overall message extraction process previously involving TelegramUtils.

🐰 "In the code where rabbits hop,
The TelegramUtils took a drop.
New modules bloom, with logic bright,
Managing chats, oh what a sight!
With platforms linked, the data flows,
In this rabbit hole, progress grows!" 🐇


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (7)
dags/hivemind_etl_helpers/src/db/telegram/utils/platform.py (1)

Line range hint 24-47: LGTM with a minor typo correction.

The changes to check_platform_existence method look good. The return type, docstring, and logic have been updated consistently to return both the community and platform IDs.

There's a minor typo in the docstring:

-            the paltform id if available
+            the platform id if available
dags/hivemind_telegram_etl.py (1)

67-67: Add logging to confirm module creation.

Adding a logging statement after modules.create() can help track the successful creation of modules and assist in monitoring.
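
A minimal sketch of what such a log line could look like. `StubModules` is a hypothetical stand-in for TelegramModules (its real constructor and the internals of `create()` are not shown in this review), and the logger name and message wording are assumptions.

```python
import logging

logger = logging.getLogger("telegram_etl_sketch")  # hypothetical logger name


class StubModules:
    """Hypothetical stand-in for TelegramModules; create() does nothing here."""

    def __init__(self, community_id: str, platform_id: str) -> None:
        self.community_id = community_id
        self.platform_id = platform_id

    def create(self) -> None:
        pass


def create_modules_with_logging(modules: StubModules) -> None:
    modules.create()
    # Reached only if create() returned without raising.
    logger.info(
        "Module ensured for community_id=%s, platform_id=%s",
        modules.community_id,
        modules.platform_id,
    )
```

Passing the IDs as logging arguments rather than pre-formatting the string defers formatting until a handler actually emits the record.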

dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (5)

65-65: Add return type annotation to _check_platform_existence method

Adding a return type annotation enhances code readability and maintainability.

Apply this diff to add the return type annotation:

-def _check_platform_existence(self):
+def _check_platform_existence(self) -> bool:

66-78: Update docstring to include Returns section

Including a Returns section in the docstring provides clarity on what the method returns.

Apply this diff to update the docstring:

     """
     check if the platform exist in a module holding the community id
+
+    Returns
+    -------
+    bool
+        True if the platform exists in the module; False otherwise.
     """
🧰 Tools
🪛 Ruff

78-78: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


80-80: Add return type annotation to _add_platform_to_community method

Since the method returns a boolean, adding a return type annotation improves clarity.

Apply this diff to add the return type annotation:

-def _add_platform_to_community(self):
+def _add_platform_to_community(self) -> bool:

Also applies to: 98-98


81-99: Update docstring to include Returns section

Clarify the return value in the docstring for better understanding.

Apply this diff to update the docstring:

     """
     Having the community_id modules insert the platform into it
+
+    Returns
+    -------
+    bool
+        True if the platform was successfully added; False otherwise.
     """

26-29: Improve clarity of create method docstring

Rewriting the docstring can enhance readability and better explain the method's functionality.

Apply this diff to improve the docstring:

     """
-    create a module if not exists for community_id
-    else, add a platform into the module if not exist and else do nothing
+    Create a module for the community if it does not exist.
+    If the module exists but the platform is not associated, add the platform to the module.
+    If both exist, no action is taken.
     """
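
The three branches described in this docstring can be sketched as follows. This is an illustration only: `InMemoryModules` is a hypothetical class that keeps modules in a dict, whereas the real TelegramModules works against MongoDB, and the returned status strings are an assumption added to make the branches visible.

```python
from typing import Dict, List


class InMemoryModules:
    """Hypothetical in-memory stand-in; the real class talks to MongoDB."""

    def __init__(
        self, store: Dict[str, List[str]], community_id: str, platform_id: str
    ) -> None:
        self.store = store  # maps community_id -> list of platform ids
        self.community_id = community_id
        self.platform_id = platform_id

    def create(self) -> str:
        # Branch 1: no module for this community yet, so create one.
        if self.community_id not in self.store:
            self.store[self.community_id] = [self.platform_id]
            return "module_created"
        # Branch 2: module exists but does not list this platform.
        if self.platform_id not in self.store[self.community_id]:
            self.store[self.community_id].append(self.platform_id)
            return "platform_added"
        # Branch 3: both already exist, so nothing changes.
        return "no_action"
```

Calling `create()` repeatedly with the same IDs is idempotent in this sketch: after the first call, every later call falls through to the last branch.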
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 4e8fea8 and f477be6.

📒 Files selected for processing (6)
  • dags/hivemind_etl_helpers/src/db/telegram/utility.py (0 hunks)
  • dags/hivemind_etl_helpers/src/db/telegram/utils/__init__.py (1 hunks)
  • dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (1 hunks)
  • dags/hivemind_etl_helpers/src/db/telegram/utils/platform.py (3 hunks)
  • dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py (3 hunks)
  • dags/hivemind_telegram_etl.py (2 hunks)
💤 Files with no reviewable changes (1)
  • dags/hivemind_etl_helpers/src/db/telegram/utility.py
🧰 Additional context used
🪛 Ruff
dags/hivemind_etl_helpers/src/db/telegram/utils/__init__.py

1-1: .module.TelegramModules imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .platform.TelegramPlatform imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

dags/hivemind_etl_helpers/src/db/telegram/utils/module.py

63-63: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


78-78: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)

🔇 Additional comments (10)
dags/hivemind_etl_helpers/src/db/telegram/utils/platform.py (2)

Line range hint 49-75: LGTM! Changes are consistent and well-implemented.

The modifications to the create_platform method are well-executed:

  1. The return type has been correctly updated to a tuple of ObjectId.
  2. The docstring accurately reflects the new return values.
  3. The logic for creating a new platform and returning both community and platform IDs is correct and consistent with the method's purpose.

The use of ObjectId() to generate a new community ID and the unchanged platform creation logic are both appropriate.


Line range hint 1-75: Summary: Good refactoring, consider impact on dependent code.

The changes to TelegramPlatform class are well-implemented and align with the broader restructuring of Telegram-related utilities. The new return types for check_platform_existence and create_platform methods provide more comprehensive information about platform and community IDs.

These changes will impact other parts of the codebase that interact with these methods. Please ensure that all dependent code has been updated accordingly. You can use the following script to identify potential areas that might need updates:
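
The reviewer's script is not preserved in this copy of the conversation. As a rough stand-in (not the original), a small Python helper can list call sites of the two methods. The method names come from the review; the function name and the regular expression are assumptions made for this sketch.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches attribute-style calls such as obj.check_platform_existence(...)
PATTERN = re.compile(r"\.(check_platform_existence|create_platform)\s*\(")


def find_call_sites(root: str) -> List[Tuple[str, int, str]]:
    """Return (path, line number, stripped line) for each matching call under root."""
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Running `find_call_sites("dags")` from the repository root would list candidate lines to check for code that still expects a single return value.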

dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py (5)

26-26: Ensure Assertion Matches Updated Return Value

The assertion self.assertIsNone(result) correctly reflects that check_platform_existence() now returns None when no platform is available.


46-48: Verify Returned IDs Match Expected Values

The test appropriately checks that the created_community_id and created_platform_id returned by check_platform_existence() match the inserted community_id and platform_id.


99-101: Confirm Correct Return Values When No Matching Platforms Exist

The assertions ensure that both community_id and platform_id are None when check_platform_existence() finds no matching platforms, which is the expected behavior.


104-107: ⚠️ Potential issue

Add Assertion for community_id in Created Platform

Currently, the test only asserts that platform_id is not None. To fully verify the creation of the platform, consider adding an assertion that community_id is not None as well.

Apply this change to include the missing assertion:

 def test_create_platform(self):
     community_id, platform_id = self.telegram_platform.create_platform()

+    self.assertIsNotNone(community_id)
     self.assertIsNotNone(platform_id)

Likely invalid or redundant comment.


108-110: 🛠️ Refactor suggestion

Ensure Both IDs Are Verified After Fetching

After fetching the IDs using check_platform_existence(), the test currently verifies fetched_platform_id. To enhance the test, also assert that fetched_community_id equals the original community_id.

Apply this change to add the missing assertion:

 fetched_community_id, fetched_platform_id = self.telegram_platform.check_platform_existence()
+self.assertEqual(fetched_community_id, community_id)
 self.assertEqual(fetched_platform_id, platform_id)

Likely invalid or redundant comment.

dags/hivemind_telegram_etl.py (3)

10-10: Imports are updated correctly.

The new classes TelegramModules and TelegramPlatform are imported as expected.


56-57: Platform initialization and existence check look good.

The initialization of TelegramPlatform with chat_id and chat_name, and the retrieval of community_id and platform_id via check_platform_existence() are correctly implemented.


64-65: Platform creation logic is appropriate.

The platform is created when it does not exist, and the community_id and platform_id are correctly retrieved from create_platform().
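
A minimal sketch of this check-then-create flow, under stated assumptions: `StubPlatform` is a hypothetical in-memory stand-in for TelegramPlatform, the IDs are placeholder strings rather than ObjectId values, and a missing platform is represented as `(None, None)`, as in the test comments for lines 99-101.

```python
from typing import Optional, Tuple


class StubPlatform:
    """Hypothetical stand-in for TelegramPlatform backed by a single record."""

    def __init__(self) -> None:
        self._record: Optional[Tuple[str, str]] = None

    def check_platform_existence(self) -> Tuple[Optional[str], Optional[str]]:
        # (community_id, platform_id), or (None, None) when nothing is stored.
        return self._record if self._record else (None, None)

    def create_platform(self) -> Tuple[str, str]:
        self._record = ("community-1", "platform-1")  # placeholder ids
        return self._record


def ensure_platform(platform: StubPlatform) -> Tuple[Optional[str], Optional[str]]:
    community_id, platform_id = platform.check_platform_existence()
    if platform_id is None:
        # Nothing stored yet: create the platform and take both ids from it.
        community_id, platform_id = platform.create_platform()
    return community_id, platform_id
```

Because both methods return the same tuple shape, the caller can unpack either result into the same two names.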

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
dags/hivemind_etl_helpers/src/db/telegram/utils/__init__.py (1)

1-1: LGTM! Consider defining __all__ for explicit API exposure.

The addition of TelegramModules import is appropriate, as it aligns with the changes mentioned in the PR summary.

Consider defining an __all__ list in this file to explicitly specify which classes should be part of the public API. This would make the package's interface more clear and address the static analysis tool's suggestion. For example:

from .module import TelegramModules
from .platform import TelegramPlatform

__all__ = ['TelegramModules', 'TelegramPlatform']

Note that the static analysis hint about TelegramModules being unused is likely a false positive in this context. In __init__.py files, it's common to import classes to expose them as part of the package's API without using them directly within the file.

🧰 Tools
🪛 Ruff

1-1: .module.TelegramModules imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (1)

26-29: Clarify the docstring for the create method

The docstring for the create method can be improved for clarity. It currently reads a bit awkwardly and may confuse readers.

Consider rephrasing it as:

-    """
-    create a module if not exists for community_id
-    else, add a platform into the module if not exist and else do nothing
-    """
+    """
+    Create a module for the community if it does not exist.
+    If the module exists, add the platform to the module if it's not already present.
+    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f477be6 and 10556a3.

📒 Files selected for processing (3)
  • dags/hivemind_etl_helpers/src/db/telegram/utils/__init__.py (1 hunks)
  • dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (1 hunks)
  • dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py
🧰 Additional context used
🪛 Ruff
dags/hivemind_etl_helpers/src/db/telegram/utils/__init__.py

1-1: .module.TelegramModules imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

dags/hivemind_etl_helpers/src/db/telegram/utils/module.py

63-63: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


78-78: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)

🔇 Additional comments (4)
dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (4)

49-49: Correct the parameter reference in the docstring

The docstring refers to chat_id, but it should be community_id to match the parameter used.


78-78: Simplify the return statement using bool()

You can make the return statement more concise by using bool(document) instead of True if document else False.
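
For illustration, the two forms side by side. This sketch assumes `document` is the result of a lookup that yields None when nothing matches; the function names are hypothetical.

```python
from typing import Optional


def exists_verbose(document: Optional[dict]) -> bool:
    # The form flagged by Ruff rule SIM210.
    return True if document else False


def exists_concise(document: Optional[dict]) -> bool:
    # Same truthiness test, expressed directly.
    return bool(document)
```

Both forms rely on truthiness, so an empty dict counts as "not found" in either version, exactly as None does.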

🧰 Tools
🪛 Ruff

78-78: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


94-94: Simplify datetime assignments using datetime.now(timezone.utc)

You can simplify the datetime assignments by directly passing timezone.utc to datetime.now().

Also applies to: 117-118


41-41: Correct the typo in the logging message

There's a typo in the logging message at line 41. It should read "to the already existing community" for clarity.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 10556a3 and 76e98e7.

📒 Files selected for processing (2)
  • dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (1 hunks)
  • dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py
🧰 Additional context used
📓 Learnings (1)
dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (1)
Learnt from: amindadgar
PR: TogetherCrew/airflow-dags#310
File: dags/hivemind_telegram_etl.py:66-67
Timestamp: 2024-10-22T11:38:39.536Z
Learning: In the `TelegramModules` class defined in `dags/hivemind_etl_helpers/src/db/telegram/utils/module.py`, the `create()` method checks if a module already exists for the given `community_id` and `platform_id` before creating a new one.
🔇 Additional comments (4)
dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (4)

1-24: LGTM: Imports and class definition are well-structured.

The imports are appropriate for the functionality, and the class definition follows good practices with clear type hints and docstrings.


48-79: LGTM: Well-implemented existence check methods.

The _check_module_existence and _check_platform_existence methods are concise, efficient, and follow good practices. The use of bool(document) is a clean way to convert the query result to a boolean.


81-99: 🛠️ Refactor suggestion

Simplify datetime assignments using datetime.now(timezone.utc)

The method looks good overall. However, you can simplify the datetime assignment by directly passing timezone.utc to datetime.now().

Apply this diff to simplify the datetime assignment:

-                "$set": {"updatedAt": datetime.now().replace(tzinfo=timezone.utc)},
+                "$set": {"updatedAt": datetime.now(timezone.utc)},
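
The two expressions in this diff differ in more than length whenever the host's local time zone is not UTC: `datetime.now()` returns naive local time, and `.replace(tzinfo=timezone.utc)` relabels it without converting. The sketch below uses a fixed instant and a hypothetical zone five hours behind UTC to make the difference visible; both values are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone, five hours behind UTC, used only for illustration.
local_zone = timezone(timedelta(hours=-5))
local_now = datetime(2024, 10, 22, 7, 0, tzinfo=local_zone)

# Old pattern: take the naive local wall-clock time and label it as UTC.
relabelled = local_now.replace(tzinfo=None).replace(tzinfo=timezone.utc)

# New pattern: an aware UTC timestamp for the same instant.
converted = local_now.astimezone(timezone.utc)

# The relabelled value is off by the local UTC offset.
offset = relabelled - converted
```

On a host that already runs in UTC the two patterns agree, so the diff is a pure simplification there; elsewhere it also corrects the stored timestamp.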

101-121: 🛠️ Refactor suggestion

Enhance _create_module method with error handling and return value

The _create_module method looks good, but it could be improved in a few ways:

  1. Simplify datetime assignments as mentioned earlier.
  2. Add error handling.
  3. Return a value indicating success or failure.

Here's a suggested implementation:

def _create_module(self) -> bool:
    # Use a single timestamp so createdAt and updatedAt match exactly.
    now = datetime.now(timezone.utc)
    try:
        result = self._client[self.database][self.collection].insert_one(
            {
                "name": "hivemind",
                "community": ObjectId(self.community_id),
                "options": {
                    "platforms": [
                        {
                            "platform": ObjectId(self.platform_id),
                            "name": "telegram",
                            "_id": ObjectId(),
                        }
                    ]
                },
                "createdAt": now,
                "updatedAt": now,
            }
        )
        # acknowledged is True when the server confirmed the write.
        return result.acknowledged
    except Exception as e:
        logging.error(f"Error creating module: {e}")
        return False

This implementation simplifies the datetime assignments, adds error handling, and returns a boolean indicating whether the insertion was successful.

@amindadgar amindadgar merged commit 01aea01 into main Oct 22, 2024
14 checks passed