-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: wrong collection name assignment in ingestion pipeline! #307
Conversation
The community_id was assigned to collection_name before but now it is updated to actual "`community_id`_`platform_name`" docstore is also updated to be as before which is "docstore_`community_id`"
WalkthroughThe changes in the pull request focus on modifications to the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
dags/hivemind_etl_helpers/ingestion_pipeline.py (2)
35-35
: LGTM! Consider a minor improvement for clarity.The change correctly implements the collection name format as described in the PR objectives. This ensures unique collection names for each community and platform combination.
For improved clarity, consider renaming the
collection_name
parameter toplatform_name
in the method signature, as it now represents the platform rather than the full collection name. This would make the code more self-explanatory:def __init__(self, community_id: str, platform_name: str, testing: bool = False): # ... self.collection_name = f"{community_id}_{platform_name}" self.platform_name = platform_name # ...
66-66
: LGTM! Consider a minor improvement for consistency.The changes correctly implement the new collection naming convention and revert the docstore naming as described in the PR objectives. The modifications ensure consistency across different parts of the ingestion pipeline.
For improved consistency, consider updating the
namespace
parameter in theMongoDocumentStore.from_uri
call to useself.platform_name
instead ofself.platform_name
:docstore=MongoDocumentStore.from_uri( uri=get_mongo_uri(), db_name=f"docstore_{self.community_id}", namespace=self.platform_name, ),This change would make the usage of
platform_name
consistent throughout the class.Also applies to: 76-76, 82-82
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- dags/hivemind_etl_helpers/ingestion_pipeline.py (3 hunks)
🧰 Additional context used
🔇 Additional comments (2)
dags/hivemind_etl_helpers/ingestion_pipeline.py (2)
66-66
: LGTM! Typo correction improves logging quality.The correction of "docuemnts" to "documents" in the logging message is a good catch. While it doesn't affect functionality, it improves the quality and professionalism of the logging output.
Line range hint
1-165
: Overall, the changes successfully address the PR objectives and improve code consistency.The modifications to the
CustomIngestionPipeline
class correctly implement the new collection naming convention and revert the docstore naming as described in the PR objectives. These changes ensure consistency across different parts of the ingestion pipeline and fix the previously incorrect collection name assignment.Key improvements:
- The
collection_name
is now correctly constructed using bothcommunity_id
andplatform_name
.- The docstore naming convention has been reverted to use
community_id
.- The ingestion cache collection name has been simplified and is now consistent with the new
collection_name
format.- A typo in a logging message has been corrected.
These changes enhance the overall quality and consistency of the code while addressing the specific issues mentioned in the PR objectives.
community_id
_platform_name
"community_id
"Summary by CodeRabbit
Bug Fixes
Improvements