fix: Move NA values to end of table #124

fxwiegand · 2024-10-16T08:30:08Z

Summary by CodeRabbit

New Features
- Enhanced data processing and visualization capabilities for comparing datasets.
- Added interactive exploration features in the visual output.
- Improved handling of null values in sorting operations.
- Standardized logging format across multiple rules, replacing notebook logs with log files.
Bug Fixes
- Ensured clarity in output by renaming columns to reflect effects associated with specified labels.
- Enhanced filtering and joining logic for datasets.

coderabbitai · 2024-10-16T08:32:32Z

Walkthrough

The changes to the compare_diffexp.py script involve enhancements to data processing and visualization logic. The prepare function now selects specific columns, focusing on the first "b_" column. Datasets are joined on target_id and ext_gene, and minimum values for qval and pval are computed. The results are filtered based on a qval_min threshold, and columns are renamed accordingly. Additionally, null values in pi_value are handled during sorting, and an Altair chart is constructed for interactive data visualization.

Changes

File Path	Change Summary
workflow/scripts/compare_diffexp.py	Enhanced data processing by updating the `prepare` function to select specific columns and compute minimum values for `qval` and `pval`. Implemented null value handling in sorting and constructed an Altair chart for visualization.
workflow/rules/meta_comparisons.smk	Updated logging mechanism in rules `meta_compare_diffexp`, `meta_compare_enrichment`, and `meta_compare_pathways` to replace Jupyter notebook logs with standard log files.

Poem

In the meadow where data flows,
A script now dances, as knowledge grows.
With colors bright and points that gleam,
Nulls at the end, a coder's dream.
Join the datasets, let insights play,
In the world of numbers, hop away! 🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 5f8dc17 and 58d7cd5.

📒 Files selected for processing (1)

workflow/scripts/compare_diffexp.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

workflow/scripts/compare_diffexp.py

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)

workflow/scripts/compare_diffexp.py (1)

49-62: Improvements in data handling and sorting

The changes in this segment effectively address the PR objective of moving NA values to the end of the table. Here are the key points:

The added print statement at line 49 may be useful for debugging. Consider removing it or wrapping it in a debug condition for production use.

Handling of NaN values for 'pi_value' and 'difference' columns (lines 55-56) improves data quality and prevents potential issues in downstream operations.

The sorting modification with nulls_last=True (line 61) ensures that NA values are moved to the end of the table, aligning with the PR objective.

These changes enhance the robustness of the data processing pipeline.

Consider wrapping the print statement in a debug condition or removing it for production use.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f594b15 and dca9161.

📒 Files selected for processing (1)

workflow/scripts/compare_diffexp.py (2 hunks)

🧰 Additional context used

🪛 Ruff

workflow/scripts/compare_diffexp.py

76-76: SyntaxError: Expected a statement

76-77: SyntaxError: Expected a statement

🔇 Additional comments (1)

workflow/scripts/compare_diffexp.py (1)
Line range hint 1-150: Overall improvements and addition of data visualization

The changes in this file effectively address the PR objective of moving NA values to the end of the table. The modifications to the data processing pipeline, particularly the handling of NaN values and the sorting with nulls_last=True, achieve this goal.

Additionally, the implementation of an Altair chart significantly enhances the data visualization capabilities of the script. This is a valuable addition, although it wasn't explicitly mentioned in the PR objectives.

The changes meet the stated PR objective and provide additional functionality through data visualization.

To ensure that the Altair chart implementation doesn't introduce any new dependencies or conflicts, please run the following script:

This script will help verify that Altair is properly imported and listed as a dependency in the project.

✅ Verification successful

To ensure that Altair is properly imported and listed as a dependency, please run the following updated script:

This script corrects the previous command and expands the search to additional dependency files to verify that Altair is properly included in the project dependencies.

To further verify the Altair import and dependency management, please run the following updated script:

This script adjusts the rg command to search for Python files based on their .py extension and performs a broader search for any references to Altair across all files in the repository. Additionally, it lists all Python files to help understand the project's structure and dependency management approach.

Dependency Verification Successful

The verification confirms that Altair is correctly imported in multiple scripts and is listed in the project's environment YAML files (QC.yaml, pystats.yaml). This ensures that Altair dependencies are properly managed.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify Altair-related imports and check for any new dependencies

# Test: Check for Altair import in other files
rg --type python "import altair as alt" --files-without-match workflow/scripts/compare_diffexp.py

# Test: Check for any new dependencies in the project
if [ -f "requirements.txt" ]; then
    grep "altair" requirements.txt
elif [ -f "environment.yml" ]; then
    grep "altair" environment.yml
else
    echo "No requirements.txt or environment.yml found. Please ensure Altair is properly listed as a dependency."
fi
Length of output: 404

Script:
#!/bin/bash
# Description: Verify Altair-related imports and check for any new dependencies

# Corrected Test: Check for Altair import in other Python files
rg -t python "import altair as alt" --files-without-match workflow/scripts/compare_diffexp.py

# Test: Check for Altair in common dependency files
DEPENDENCY_FILES=("requirements.txt" "environment.yml" "setup.py" "pyproject.toml" "Pipfile")

found_dependency=0

for file in "${DEPENDENCY_FILES[@]}"; do
    if [ -f "$file" ]; then
        grep "altair" "$file"
        if [ $? -eq 0 ]; then
            found_dependency=1
            break
        fi
    fi
done

if [ $found_dependency -eq 0 ]; then
    echo "Altair is not listed in any recognized dependency files. Please ensure it is added appropriately."
fi
Length of output: 791

Script:
#!/bin/bash
# Description: Verify Altair-related imports and check for any new dependencies

# Corrected Test: Search for Altair import in all Python files by extension
rg "import altair as alt" --glob "*.py" --files-with-matches

# Comprehensive Test: Search for Altair in all files to identify dependency declarations
grep -r "altair" .

# Optionally, list all Python files to understand the repository structure
fd -e py
Length of output: 1478

coderabbitai · 2024-10-16T13:38:56Z

workflow/scripts/compare_diffexp.py

+)
+print(combined)


⚠️ Potential issue

Syntax error and additional print statement

There are two issues in this segment:

There's an extra closing parenthesis at line 76, which is causing a syntax error. This needs to be removed.

Another print statement is added at line 77, which might be useful for debugging but could clutter output in production.

To fix the syntax error and address the print statement, apply the following changes:

-) -print(combined) +print(combined) # Consider removing or wrapping in a debug condition

This will resolve the syntax error and maintain the print statement. However, consider removing the print statement or wrapping it in a debug condition for production use.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

)

print(combined)

print(combined) # Consider removing or wrapping in a debug condition

🧰 Tools

🪛 Ruff

76-76: SyntaxError: Expected a statement

76-77: SyntaxError: Expected a statement

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)

workflow/rules/meta_comparisons.smk (2)

70-70: LGTM! Completes consistent logging update.

This change finalizes the transition from .ipynb to .log format for all three rules in this file. The consistency in logging format across meta_compare_diffexp, meta_compare_enrichment, and meta_compare_pathways rules is commendable.

Consider updating any documentation or README files to reflect this change in logging format. This will help maintain clear and up-to-date information for other developers or users of the workflow.

Line range hint 14-70: Overall changes improve logging, but direct connection to PR objective is unclear.

The consistent change from .ipynb to .log format across all three rules (meta_compare_diffexp, meta_compare_enrichment, and meta_compare_pathways) improves the logging mechanism and potentially simplifies log parsing. While this could indirectly support the PR objective of moving NA values to the end of tables by making log analysis easier, the direct connection is not immediately apparent from these changes alone.

Consider adding a comment in the file or updating the PR description to clarify how these logging changes contribute to the overall objective of moving NA values to the end of the table. This would help reviewers and future maintainers understand the full context and impact of these modifications.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between dca9161 and 5f8dc17.

📒 Files selected for processing (2)

workflow/rules/meta_comparisons.smk (3 hunks)
workflow/scripts/compare_diffexp.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

workflow/scripts/compare_diffexp.py

🧰 Additional context used

🔇 Additional comments (2)

workflow/rules/meta_comparisons.smk (2)
45-45: LGTM! Consistent change across rules.

The modification from .ipynb to .log format for logging is consistent with the change in the meta_compare_diffexp rule. This uniformity in logging format across rules is a good practice.

14-14: LGTM! Consider verifying downstream log processing.

The change from .ipynb to .log format for logging is appropriate and aligns with standard practices. This modification should make log parsing easier, potentially supporting the PR's objective of reorganizing table data.

To ensure this change doesn't break any downstream processes, please run the following script:

✅ Verification successful

Verified: No remaining references to .ipynb log format found.

The change from .ipynb to .log format has been thoroughly verified. No references to the old .ipynb log format exist in the codebase, ensuring that downstream processes remain unaffected.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check for any references to the old .ipynb log format in the codebase

# Search for references to the old .ipynb log format
rg --type python --type yaml --type snakemake "logs/meta_compare_diffexp/.*\.ipynb"

# If the above command returns any results, those files may need to be updated to use the new .log format
Length of output: 116

Script:
#!/bin/bash
# Description: Check for any references to the old .ipynb log format in the codebase

# Search for references to the old .ipynb log format
rg --type py --type yaml --type snakemake 'logs/meta_compare_diffexp/.*\.ipynb'

# If the above command returns any results, those files may need to be updated to use the new .log format
Length of output: 115

Script:
#!/bin/bash
# Description: Check for any references to the old .ipynb log format in the codebase

# Define snakemake file type
rg --type-add 'snakemake:*.smk' --type snakemake 'logs/meta_compare_diffexp/.*\.ipynb'
Length of output: 88

dlaehnemann · 2024-10-18T08:25:35Z

workflow/scripts/compare_diffexp.py

+ .with_columns(
+ pl.col("pi_value").fill_nan(None),
+ pl.col("difference").fill_nan(None),
+ )


This seems convoluted, but at least it's well documented:
https://docs.pola.rs/user-guide/expressions/missing-data/#notanumber-or-nan-values

dlaehnemann

LGTM.

Not sure though, why the CodeRabbit summary is so verbose...

🤖 I have created a release *beep* *boop* --- ## [2.8.2](v2.8.1...v2.8.2) (2024-10-18) ### Bug Fixes * Convert pval and qval to heatmap ([#123](#123)) ([478c759](478c759)) * Move NA values to end of table ([#124](#124)) ([df807a5](df807a5)) * Update max in memory rows for diffexp report ([#122](#122)) ([830e678](830e678)) ### Performance Improvements * Update to latest datavzrd wrapper ([#125](#125)) ([3a4c020](3a4c020)) * Update to latest datavzrd wrapper version ([#120](#120)) ([7b2767c](7b2767c)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

fix: Move NA values to end of table

fb9239a

fxwiegand added 2 commits October 16, 2024 13:17

Update compare_diffexp.py

f594b15

Update compare_diffexp.py with dbg prints

dca9161

fxwiegand added the WIP label Oct 16, 2024

coderabbitai bot reviewed Oct 16, 2024

View reviewed changes

fxwiegand added 3 commits October 17, 2024 12:22

Update compare_diffexp.py

69ef4d2

Update compare_diffexp.py

4776153

refactor: Make meta comparison use proper logs

5f8dc17

coderabbitai bot reviewed Oct 17, 2024

View reviewed changes

Update compare_diffexp.py

58d7cd5

fxwiegand removed the WIP label Oct 18, 2024

fxwiegand requested a review from johanneskoester October 18, 2024 08:11

dlaehnemann reviewed Oct 18, 2024

View reviewed changes

dlaehnemann approved these changes Oct 18, 2024

View reviewed changes

dlaehnemann merged commit df807a5 into main Oct 18, 2024
5 of 7 checks passed

dlaehnemann deleted the fxwiegand-patch-3 branch October 18, 2024 08:56

github-actions bot mentioned this pull request Oct 18, 2024

chore(main): release 2.8.2 #121

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Move NA values to end of table #124

fix: Move NA values to end of table #124

fxwiegand commented Oct 16, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 16, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Oct 16, 2024

coderabbitai bot left a comment

dlaehnemann Oct 18, 2024

dlaehnemann left a comment

	)
	print(combined)
	print(combined) # Consider removing or wrapping in a debug condition

fix: Move NA values to end of table #124

fix: Move NA values to end of table #124

Conversation

fxwiegand commented Oct 16, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Oct 16, 2024 • edited Loading

Walkthrough

Changes

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 16, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

dlaehnemann Oct 18, 2024

Choose a reason for hiding this comment

dlaehnemann left a comment

Choose a reason for hiding this comment

fxwiegand commented Oct 16, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 16, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)