-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Added logs to identify which platform is being processed! #338
Conversation
WalkthroughThe changes in the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
dags/discourse_analyzer_etl.py (2)
Line range hint
71-86
: Fix incorrect type annotation in docstringThe docstring incorrectly shows
platform_id
asdatetime
type, but it's actually used as astr
throughout the code.``` { - 'platform_id' : datetime, + 'platform_id' : str, 'id' : str, 'period' : datetime, 'recompute': bool,
Line range hint
109-152
: Consider enhancing logging strategyWhile adding platform identification logs is helpful, consider implementing a more comprehensive logging strategy:
- Add structured logging with consistent fields across all log messages
- Include timing information for performance monitoring
- Add log correlation IDs to track entire platform processing flow
Example structured logging pattern:
logging.info( "Processing platform data", extra={ "platform_id": platform_id, "operation": "etl_raw_data", "correlation_id": f"{platform_id}_{period.isoformat()}", } )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
dags/analyzer_helper/discourse/transform_raw_members.py (1)
27-27
: Consider adding initial log message for total count.While the progress logging is helpful, consider adding an initial log message before the loop to show the total number of members to be processed. This provides immediate visibility of the workload.
def transform(self, raw_members: list) -> list: transformed_members = [] + logging.info(f"Starting to process {len(raw_members)} raw members") for idx, member in enumerate(raw_members):
dags/analyzer_helper/discourse/transform_raw_data.py (1)
120-121
: Consider logging enhancements for better visibility.
- Add an initial log message showing total entries to be processed
- Consider logging subtasks (reactions, replies) for more detailed progress tracking
def transform(self, raw_data: list) -> list: transformed_data = [] + total_entries = len(raw_data) + logging.info(f"Starting to process {total_entries} raw data entries") for idx, entry in enumerate(raw_data): # Create main post entry transformed_data.append(self.create_data_entry(entry)) + logging.debug(f"Processed main post for entry {idx + 1}") # Create entries for reactions for reaction in entry["reactions"]: transformed_data.append( self.create_data_entry( entry, interaction_type="reaction", interaction_user=int(reaction), ) ) + if entry["reactions"]: + logging.debug(f"Processed {len(entry['reactions'])} reactions for entry {idx + 1}") # Create entry for reply if entry["replied_post_id"]: transformed_data.append( self.create_data_entry( entry, interaction_type="reply", interaction_user=int(entry["replied_post_user_id"]), ) ) + logging.debug(f"Processed reply for entry {idx + 1}") - logging.info(f"Preparing raw data: {idx + 1}/{len(raw_data)}") + logging.info(f"Preparing raw data: {idx + 1}/{total_entries}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
dags/analyzer_helper/discourse/transform_raw_data.py
(3 hunks)dags/analyzer_helper/discourse/transform_raw_members.py
(1 hunks)
🔇 Additional comments (3)
dags/analyzer_helper/discourse/transform_raw_members.py (1)
22-26
: LGTM! Good use of enumerate for progress tracking.
The change to use enumerate provides a clean way to track progress without affecting the existing functionality.
dags/analyzer_helper/discourse/transform_raw_data.py (2)
1-2
: LGTM! Clean import addition.
The logging import is correctly placed at the top of the file.
95-95
: LGTM! Good use of enumerate for progress tracking.
The change to use enumerate provides a clean way to track progress without affecting the existing functionality.
Summary by CodeRabbit
platform_info
parameter.