-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Telegram vectorize, added automated module creation! #310
Conversation
this was required for the hivemind-bot.
WalkthroughThe pull request involves the removal of the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (7)
dags/hivemind_etl_helpers/src/db/telegram/utils/platform.py (1)
Line range hint
24-47
: LGTM with a minor typo correction.The changes to
check_platform_existence
method look good. The return type, docstring, and logic have been updated consistently to return both the community and platform IDs.There's a minor typo in the docstring:
- the paltform id if available + the platform id if availabledags/hivemind_telegram_etl.py (1)
67-67
: Add logging to confirm module creation.Adding a logging statement after
modules.create()
can help track the successful creation of modules and assist in monitoring.dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (5)
65-65
: Add return type annotation to_check_platform_existence
methodAdding a return type annotation enhances code readability and maintainability.
Apply this diff to add the return type annotation:
-def _check_platform_existence(self): +def _check_platform_existence(self) -> bool:
66-78
: Update docstring to includeReturns
sectionIncluding a
Returns
section in the docstring provides clarity on what the method returns.Apply this diff to update the docstring:
""" check if the platform exist in a module holding the community id + + Returns + ------- + bool + True if the platform exists in the module; False otherwise. """🧰 Tools
🪛 Ruff
78-78: Use
bool(...)
instead ofTrue if ... else False
Replace with `bool(...)
(SIM210)
80-80
: Add return type annotation to_add_platform_to_community
methodSince the method returns a boolean, adding a return type annotation improves clarity.
Apply this diff to add the return type annotation:
-def _add_platform_to_community(self): +def _add_platform_to_community(self) -> bool:Also applies to: 98-98
81-99
: Update docstring to includeReturns
sectionClarify the return value in the docstring for better understanding.
Apply this diff to update the docstring:
""" Having the community_id modules insert the platform into it + + Returns + ------- + bool + True if the platform was successfully added; False otherwise. """
26-29
: Improve clarity ofcreate
method docstringRewriting the docstring can enhance readability and better explain the method's functionality.
Apply this diff to improve the docstring:
""" - create a module if not exists for community_id - else, add a platform into the module if not exist and else do nothing + Create a module for the community if it does not exist. + If the module exists but the platform is not associated, add the platform to the module. + If both exist, no action is taken. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- dags/hivemind_etl_helpers/src/db/telegram/utility.py (0 hunks)
- dags/hivemind_etl_helpers/src/db/telegram/utils/init.py (1 hunks)
- dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (1 hunks)
- dags/hivemind_etl_helpers/src/db/telegram/utils/platform.py (3 hunks)
- dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py (3 hunks)
- dags/hivemind_telegram_etl.py (2 hunks)
💤 Files with no reviewable changes (1)
- dags/hivemind_etl_helpers/src/db/telegram/utility.py
🧰 Additional context used
🪛 Ruff
dags/hivemind_etl_helpers/src/db/telegram/utils/__init__.py
1-1:
.module.TelegramModules
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
2-2:
.platform.TelegramPlatform
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
dags/hivemind_etl_helpers/src/db/telegram/utils/module.py
63-63: Use
bool(...)
instead ofTrue if ... else False
Replace with `bool(...)
(SIM210)
78-78: Use
bool(...)
instead ofTrue if ... else False
Replace with `bool(...)
(SIM210)
🔇 Additional comments (10)
dags/hivemind_etl_helpers/src/db/telegram/utils/platform.py (2)
Line range hint
49-75
: LGTM! Changes are consistent and well-implemented.The modifications to the
create_platform
method are well-executed:
- The return type has been correctly updated to a tuple of
ObjectId
.- The docstring accurately reflects the new return values.
- The logic for creating a new platform and returning both community and platform IDs is correct and consistent with the method's purpose.
The use of
ObjectId()
to generate a new community ID and the unchanged platform creation logic are both appropriate.
Line range hint
1-75
: Summary: Good refactoring, consider impact on dependent code.The changes to
TelegramPlatform
class are well-implemented and align with the broader restructuring of Telegram-related utilities. The new return types forcheck_platform_existence
andcreate_platform
methods provide more comprehensive information about platform and community IDs.These changes will impact other parts of the codebase that interact with these methods. Please ensure that all dependent code has been updated accordingly. You can use the following script to identify potential areas that might need updates:
dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py (5)
26-26
: Ensure Assertion Matches Updated Return ValueThe assertion
self.assertIsNone(result)
correctly reflects thatcheck_platform_existence()
now returnsNone
when no platform is available.
46-48
: Verify Returned IDs Match Expected ValuesThe test appropriately checks that the
created_community_id
andcreated_platform_id
returned bycheck_platform_existence()
match the insertedcommunity_id
andplatform_id
.
99-101
: Confirm Correct Return Values When No Matching Platforms ExistThe assertions ensure that both
community_id
andplatform_id
areNone
whencheck_platform_existence()
finds no matching platforms, which is the expected behavior.
104-107
:⚠️ Potential issueAdd Assertion for
community_id
in Created PlatformCurrently, only
platform_id
is checked forNone
. To fully verify the creation of the platform, consider adding an assertion to check thatcommunity_id
is notNone
.Apply this change to include the missing assertion:
def test_create_platform(): community_id, platform_id = self.telegram_platform.create_platform() + self.assertIsNotNone(community_id) self.assertIsNotNone(platform_id)
Likely invalid or redundant comment.
108-110
: 🛠️ Refactor suggestionEnsure Both IDs Are Verified After Fetching
After fetching the IDs using
check_platform_existence()
, the test currently verifiesfetched_platform_id
. To enhance the test, also assert thatfetched_community_id
equals the originalcommunity_id
.Apply this change to add the missing assertion:
fetched_community_id, fetched_platform_id = self.telegram_platform.check_platform_existence() +self.assertEqual(fetched_community_id, community_id) self.assertEqual(fetched_platform_id, platform_id)
Likely invalid or redundant comment.
dags/hivemind_telegram_etl.py (3)
10-10
: Imports are updated correctly.The new classes
TelegramModules
andTelegramPlatform
are imported as expected.
56-57
: Platform initialization and existence check look good.The initialization of
TelegramPlatform
withchat_id
andchat_name
, and the retrieval ofcommunity_id
andplatform_id
viacheck_platform_existence()
are correctly implemented.
64-65
: Platform creation logic is appropriate.The platform is created when it does not exist, and the
community_id
andplatform_id
are correctly retrieved fromcreate_platform()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
dags/hivemind_etl_helpers/src/db/telegram/utils/__init__.py (1)
1-1
: LGTM! Consider defining__all__
for explicit API exposure.The addition of
TelegramModules
import is appropriate, as it aligns with the changes mentioned in the PR summary.Consider defining an
__all__
list in this file to explicitly specify which classes should be part of the public API. This would make the package's interface more clear and address the static analysis tool's suggestion. For example:from .module import TelegramModules from .platform import TelegramPlatform __all__ = ['TelegramModules', 'TelegramPlatform']Note that the static analysis hint about
TelegramModules
being unused is likely a false positive in this context. In__init__.py
files, it's common to import classes to expose them as part of the package's API without using them directly within the file.🧰 Tools
🪛 Ruff
1-1:
.module.TelegramModules
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (1)
26-29
: Clarify the docstring for thecreate
methodThe docstring for the
create
method can be improved for clarity. It currently reads a bit awkwardly and may confuse readers.Consider rephrasing it as:
- """ - create a module if not exists for community_id - else, add a platform into the module if not exist and else do nothing - """ + """ + Create a module for the community if it does not exist. + If the module exists, add the platform to the module if it's not already present. + """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- dags/hivemind_etl_helpers/src/db/telegram/utils/init.py (1 hunks)
- dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (1 hunks)
- dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py
🧰 Additional context used
🪛 Ruff
dags/hivemind_etl_helpers/src/db/telegram/utils/__init__.py
1-1:
.module.TelegramModules
imported but unused; consider removing, adding to__all__
, or using a redundant alias(F401)
dags/hivemind_etl_helpers/src/db/telegram/utils/module.py
63-63: Use
bool(...)
instead ofTrue if ... else False
Replace with `bool(...)
(SIM210)
78-78: Use
bool(...)
instead ofTrue if ... else False
Replace with `bool(...)
(SIM210)
🔇 Additional comments (4)
dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (4)
49-49
: Correct the parameter reference in the docstringThe docstring refers to
chat_id
, but it should becommunity_id
to match the parameter used.
78-78
: Simplify the return statement usingbool()
You can make the return statement more concise by using
bool(document)
instead ofTrue if document else False
.🧰 Tools
🪛 Ruff
78-78: Use
bool(...)
instead ofTrue if ... else False
Replace with `bool(...)
(SIM210)
94-94
: Simplify datetime assignments usingdatetime.now(timezone.utc)
You can simplify the datetime assignments by directly passing
timezone.utc
todatetime.now()
.Also applies to: 117-118
41-41
: Correct the typo in the logging messageThere's a typo in the logging message at line 41. It should read "to the already existing community" for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (1 hunks)
- dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- dags/hivemind_etl_helpers/tests/integration/test_telegram_comminity.py
🧰 Additional context used
📓 Learnings (1)
dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (1)
Learnt from: amindadgar PR: TogetherCrew/airflow-dags#310 File: dags/hivemind_telegram_etl.py:66-67 Timestamp: 2024-10-22T11:38:39.536Z Learning: In the `TelegramModules` class defined in `dags/hivemind_etl_helpers/src/db/telegram/utils/module.py`, the `create()` method checks if a module already exists for the given `community_id` and `platform_id` before creating a new one.
🔇 Additional comments (4)
dags/hivemind_etl_helpers/src/db/telegram/utils/module.py (4)
1-24
: LGTM: Imports and class definition are well-structured.The imports are appropriate for the functionality, and the class definition follows good practices with clear type hints and docstrings.
48-79
: LGTM: Well-implemented existence check methods.The
_check_module_existence
and_check_platform_existence
methods are concise, efficient, and follow good practices. The use ofbool(document)
is a clean way to convert the query result to a boolean.
81-99
: 🛠️ Refactor suggestionSimplify datetime assignments using
datetime.now(timezone.utc)
The method looks good overall. However, you can simplify the datetime assignment by directly passing
timezone.utc
todatetime.now()
.Apply this diff to simplify the datetime assignment:
- "$set": {"updatedAt": datetime.now().replace(tzinfo=timezone.utc)}, + "$set": {"updatedAt": datetime.now(timezone.utc)},
101-121
: 🛠️ Refactor suggestionEnhance
_create_module
method with error handling and return valueThe
_create_module
method looks good, but it could be improved in a few ways:
- Simplify datetime assignments as mentioned earlier.
- Add error handling.
- Return a value indicating success or failure.
Here's a suggested implementation:
def _create_module(self) -> bool: try: result = self._client[self.database][self.collection].insert_one( { "name": "hivemind", "community": ObjectId(self.community_id), "options": { "platforms": [ { "platform": ObjectId(self.platform_id), "name": "telegram", "_id": ObjectId(), } ] }, "createdAt": datetime.now(timezone.utc), "updatedAt": datetime.now(timezone.utc), } ) return result.acknowledged except Exception as e: logging.error(f"Error creating module: {str(e)}") return FalseThis implementation simplifies the datetime assignments, adds error handling, and returns a boolean indicating whether the insertion was successful.
Summary by CodeRabbit
New Features
TelegramModules
class for managing Telegram modules within a MongoDB database.TelegramPlatform
functionality to return both community and platform IDs.Bug Fixes
check_platform_existence
andcreate_platform
methods.Documentation
TelegramPlatform
to reflect new return types and clarify functionality.Chores
TelegramUtils
class to streamline platform management.