Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Fix units configuration docs for bam files #104

Merged
merged 1 commit into from
Jul 17, 2024

Conversation

fxwiegand
Copy link
Contributor

Just a quick PR to update the docs for the recent changes from #94.

Copy link
Collaborator

@dlaehnemann dlaehnemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One thing we might think about, is making the units.tsv also work with just one of those options' columns present. That way, we wouldn't have to always drag around a bunch of empty columns. But this might entail changing quite a bit of code whenever these columns are called. What do you think?

@fxwiegand
Copy link
Contributor Author

LGTM. One thing we might think about, is making the units.tsv also work with just one of those options' columns present. That way, we wouldn't have to always drag around a bunch of empty columns. But this might entail changing quite a bit of code whenever these columns are called. What do you think?

I absolutely see the point you're making. We would need to make the pandas calls on these lines a more robust so they can handle when the bam columns are not present:

return fq2_not_present and bam_paired_not_present
def get_fastqs(wildcards):
"""Get raw FASTQ files from unit sheet."""
if not pd.isnull(units.loc[(wildcards.sample, wildcards.unit), "bam_single"]):
return f"results/fastq/{wildcards.sample}-{wildcards.unit}.fq.gz"
elif not pd.isnull(units.loc[(wildcards.sample, wildcards.unit), "bam_paired"]):
fqfrombam1 = f"results/fastq/{wildcards.sample}-{wildcards.unit}.1.fq.gz"
fqfrombam2 = f"results/fastq/{wildcards.sample}-{wildcards.unit}.2.fq.gz"
return [fqfrombam1, fqfrombam2]
elif is_single_end(wildcards.sample, wildcards.unit):
return units.loc[(wildcards.sample, wildcards.unit), "fq1"]
else:
u = units.loc[(wildcards.sample, wildcards.unit), ["fq1", "fq2"]].dropna()
return [f"{u.fq1}", f"{u.fq2}"]
def get_all_fastqs(wildcards):
for item in units[["sample", "unit"]].itertuples():

Besides that these lookup calls from snakemake would cause trouble. I am not sure how they handle missing columns.

@dlaehnemann
Copy link
Collaborator

You can nowadays offer a default="" argument in the lookup() function, to capture those: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#the-lookup-function. We just have to enforce the latest Snakemake version, then. Would be willing to create a new PR referencing this discussion? Then we can continue discussing there, as well.

@dlaehnemann dlaehnemann merged commit 9b193dc into main Jul 17, 2024
5 checks passed
@dlaehnemann dlaehnemann deleted the fxwiegand-patch-1 branch July 17, 2024 10:24
@fxwiegand
Copy link
Contributor Author

You can nowadays offer a default="" argument in the lookup() function, to capture those: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#the-lookup-function. We just have to enforce the latest Snakemake version, then. Would be willing to create a new PR referencing this discussion? Then we can continue discussing there, as well.

Yeah i will give it shot and we will discuss further in the new PR! This would also be quite nice for saving the versioning because now i think about it #94 was indeed a breaking change since the config needs the new bam columns for the workflow to run.

@dlaehnemann
Copy link
Collaborator

Yeah, I think you are right. But I have honestly not payed enough attention to what's a breaking change in these workflows in the past. But we should. This regularly trips up users (us included). 😅

@fxwiegand
Copy link
Contributor Author

Yeah, I think you are right. But I have honestly not payed enough attention to what's a breaking change in these workflows in the past. But we should. This regularly trips up users (us included). 😅

Yep indeed - @Addimator #105 will let you remove the bam columns from your config again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants