Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Di 1112 exomiser ranking fix #11

Merged
merged 10 commits into from
Sep 16, 2024
Merged

Di 1112 exomiser ranking fix #11

merged 10 commits into from
Sep 16, 2024

Conversation

kjwinfield
Copy link
Collaborator

@kjwinfield kjwinfield commented Sep 2, 2024

Correct issue where variants in GEL tiering page could be duplicated onto exomiser tiering page.
Prev version shows variants ranked 1, 2, and 3
image
although 2 and 3 are already in GEL tiering page

New versions shows 1, 5, and 8, which is correct:
image


This change is Reviewable

@pep8speaks
Copy link

pep8speaks commented Sep 2, 2024

Hello @kjwinfield! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 802:47: W605 invalid escape sequence '_'

Comment last updated at 2024-09-16 14:53:21 UTC

Copy link

@Addy81 Addy81 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r1.
Reviewable status: 1 of 6 files reviewed, 3 unresolved discussions (waiting on @kjwinfield)


resources/home/dnanexus/get_variant_info.py line 349 at r1 (raw file):

        top = []

        ordered_list = dict(sorted(ranked.items(), key=lambda item: item[1]))

this is not a list if you are casting it back to a dict?

Could you also add some comments in this function. I'm struggling to visualise what the input dict is. Could you add an example in your docstring and some comments to explain your logic?


resources/home/dnanexus/make_workbook.py line 801 at r1 (raw file):

            # columns ending _x)
            d = {}
            for col in self.var_df.columns:

I think you can skip some of this by setting suffices in your colum so it only renames one then you can drop the other if you don't want it:

image.png


resources/home/dnanexus/make_workbook.py line 816 at r1 (raw file):

            ranks = ex_df['Priority'].to_dict()
            for k,v in ranks.items():
                ranks[k] = int(v.split(' ')[-1].split('.')[0])

why is this needed? is this {index: int(exomiser rank)} the ranks dict when first created or the intended dict?

Copy link
Collaborator Author

@kjwinfield kjwinfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 6 files reviewed, 3 unresolved discussions (waiting on @Addy81)


resources/home/dnanexus/get_variant_info.py line 349 at r1 (raw file):

Previously, Addy81 (Adriana) wrote…

this is not a list if you are casting it back to a dict?

Could you also add some comments in this function. I'm struggling to visualise what the input dict is. Could you add an example in your docstring and some comments to explain your logic?

Done.


resources/home/dnanexus/make_workbook.py line 801 at r1 (raw file):

Previously, Addy81 (Adriana) wrote…

I think you can skip some of this by setting suffices in your colum so it only renames one then you can drop the other if you don't want it:

image.png

Done.


resources/home/dnanexus/make_workbook.py line 816 at r1 (raw file):

Previously, Addy81 (Adriana) wrote…

why is this needed? is this {index: int(exomiser rank)} the ranks dict when first created or the intended dict?

Done. hopefully this explains it better. the current df has "Exomiser Rank 1.0" when we just want the integer to pass to get_top_3_ranked which expects a dict of {index in df: integer denoting rank}

Copy link

@Addy81 Addy81 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r1.
Reviewable status: 2 of 6 files reviewed, 4 unresolved discussions (waiting on @kjwinfield)


resources/home/dnanexus/get_variant_info.py line 389 at r3 (raw file):

                else:
                    break
        return top

if not returning top, the docstring feels now to not match the function?

if you don't need gold silver and bronze anymore, could this be done with a df and a sort/select values out instead of all the if statements?

Copy link

@Addy81 Addy81 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 6 files reviewed, 4 unresolved discussions (waiting on @kjwinfield)


resources/home/dnanexus/get_variant_info.py line 389 at r3 (raw file):

Previously, Addy81 (Adriana) wrote…

if not returning top, the docstring feels now to not match the function?

if you don't need gold silver and bronze anymore, could this be done with a df and a sort/select values out instead of all the if statements?

Sorry that was meant to say if only returning top

Copy link
Collaborator Author

@kjwinfield kjwinfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 6 files reviewed, 4 unresolved discussions (waiting on @Addy81)


resources/home/dnanexus/get_variant_info.py line 389 at r3 (raw file):

Previously, Addy81 (Adriana) wrote…

Sorry that was meant to say if only returning top

Done

Copy link
Contributor

@jethror1 jethror1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 6 files at r1, all commit messages.
Reviewable status: 3 of 6 files reviewed, 5 unresolved discussions (waiting on @Addy81 and @kjwinfield)


resources/home/dnanexus/get_variant_info.py line 348 at r4 (raw file):

        df = df.copy()
        # First change "Exomiser Rank #" string to int
        df['Priority'] = df['Priority'].map(

I would generally avoid modifying your underlying data and then trying to return its state, and just create a new column that you can drop at the end, something like:

df['priority_as_int'] = df['Priority'].map(
       lambda x: int(x.split(' ')[-1].split('.')[0])
    )

...

df.drop(['priority_as_int'], axis=1, inplace=True)

Code quote:

        # First change "Exomiser Rank #" string to int
        df['Priority'] = df['Priority'].map(

Copy link
Contributor

@jethror1 jethror1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r5, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Addy81 and @kjwinfield)

@jethror1 jethror1 merged commit 2d23676 into main Sep 16, 2024
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants