Add support for conda lock file #642

Open
wants to merge 32 commits into
base: master

Conversation

munishchouhan
Member

@munishchouhan munishchouhan commented Sep 12, 2024

Depends on seqeralabs/libseqera#25.
The above PR needs 'git revert d088604' before merging.

This PR will add the following:

  1. Upload the lock file to the bucket
  2. Download the lock file from the bucket
  3. Link to the lock file in the build page

Signed-off-by: munishchouhan <[email protected]>
@munishchouhan munishchouhan linked an issue Sep 12, 2024 that may be closed by this pull request
@munishchouhan munishchouhan self-assigned this Sep 12, 2024
@munishchouhan munishchouhan marked this pull request as draft September 12, 2024 14:46
@pditommaso
Contributor

Good start. To tell the truth, I'm still not sure whether we should go ahead with this approach or just store the lock file in SurrealDB, as we do for the conda env, even though we could hit the same problem as #559.

@munishchouhan
Member Author

> Good start. To tell the truth, I'm still not sure whether we should go ahead with this approach or just store the lock file in SurrealDB, as we do for the conda env, even though we could hit the same problem as #559.

#559 will be solved in surrealdb version 2.0.0

@pditommaso
Contributor

I'm a bit confused by this comment in the issue:

SURREAL_HTTP_MAX_ML_BODY_SIZE (defaults to 4 GiB)
SURREAL_HTTP_MAX_SQL_BODY_SIZE (defaults to 1 MiB)
SURREAL_HTTP_MAX_RPC_BODY_SIZE (defaults to 4 MiB)
SURREAL_HTTP_MAX_KEY_BODY_SIZE (defaults to 16 KiB)
SURREAL_HTTP_MAX_SIGNUP_BODY_SIZE (defaults to 1 KiB)
SURREAL_HTTP_MAX_SIGNIN_BODY_SIZE (defaults to 1 KiB)
SURREAL_HTTP_MAX_IMPORT_BODY_SIZE (defaults to 4 GiB)

It seems to suggest that the default SQL body size is 1 MiB, but the limit we are hitting is much smaller.

@munishchouhan
Member Author

> It seems to suggest that the default SQL body size is 1 MiB, but the limit we are hitting is much smaller.

We are using the /key routes, which are constrained to 16 KiB:
SURREAL_HTTP_MAX_KEY_BODY_SIZE (defaults to 16 KiB)

@munishchouhan
Member Author

If we store it through the SQL route instead, we can bypass this 16 KiB limit.
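
To make the difference between the two limits concrete, here is a minimal sketch that checks a payload against the default body sizes quoted above. Only the numeric defaults come from this thread; the route labels and the `fits_route` helper are hypothetical and exist purely for illustration.

```python
# Default HTTP body-size limits quoted in this thread, in bytes.
# The dictionary keys are illustrative labels, not SurrealDB identifiers.
KIB = 1024
MIB = 1024 * KIB

DEFAULT_LIMITS = {
    "key": 16 * KIB,  # SURREAL_HTTP_MAX_KEY_BODY_SIZE
    "sql": 1 * MIB,   # SURREAL_HTTP_MAX_SQL_BODY_SIZE
    "rpc": 4 * MIB,   # SURREAL_HTTP_MAX_RPC_BODY_SIZE
}


def fits_route(payload: bytes, route: str) -> bool:
    """Return True if the payload is within the default limit for the route."""
    return len(payload) <= DEFAULT_LIMITS[route]


# A hypothetical 20,000-byte lock file: too big for /key, fine for SQL.
lock = b"x" * 20_000
print(fits_route(lock, "key"))  # False
print(fits_route(lock, "sql"))  # True
```

The SQL route still has a ceiling of its own (1 MiB by default), so this moves the limit rather than removing it.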

@pditommaso
Contributor

This sounds like a plan. Please give it a try.

@munishchouhan
Member Author

> This sounds like a plan. Please give it a try.

ok sure

@munishchouhan
Member Author

munishchouhan commented Sep 13, 2024

There is an issue accessing the Conda lock file. The lock file exists in the generated image, not in the BuildKit container that we run in Wave.
So either we need to pull the image in another pod and extract the file, or we need to generate the Conda lock file from the conda.yml file.

cc @ewels @pditommaso

@munishchouhan
Member Author

In the latter case, generating the Conda lock file from the conda.yml file, we would still need another job to do it.

@ewels
Member

ewels commented Sep 13, 2024

Better to get the file from the container - I was trying to avoid generating the lock file separately because then there's no absolute guarantee that it'll end up the same as the actual environment. If it comes from the environment itself it's certain.

@ewels
Member

ewels commented Sep 13, 2024

Can we print the lock file to stdout and then capture that from the build?

@munishchouhan
Member Author

munishchouhan commented Sep 13, 2024

> Can we print the lock file to stdout and then capture that from the build?

We can do this:

FROM {{base_image}}
COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml
RUN micromamba install -y -n base -f /tmp/conda.yml \
    {{base_packages}}
    && micromamba env export --explicit > environment.lock \
    && cat environment.lock \
    && micromamba clean -a -y
USER root
ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"

I ran it for the conda package 'bwa' and got this on stdout:

#10 12.24 # This file may be used to create an environment using:
#10 12.24 # $ conda create --name <env> --file <this file>
#10 12.24 # platform: linux-aarch64
#10 12.24 @EXPLICIT

Signed-off-by: munishchouhan <[email protected]>
@ewels
Member

ewels commented Sep 15, 2024

Exactly - that works!

There were a load of lines after the @EXPLICIT, yeah?

We'd need to remove the line prefixes, but that's all, I think.
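
Removing the line prefixes could look something like the sketch below. It assumes the prefixes always have the `#<step> <seconds> ` shape seen in the output above; the `strip_build_prefixes` name is hypothetical and this is not code from the PR.

```python
import re

# Build log lines look like "#10 12.24 <text>":
# a step number, a timestamp in seconds, then the command output.
PREFIX = re.compile(r"^#\d+ \d+\.\d+ ?")


def strip_build_prefixes(log: str) -> str:
    """Drop the '#<step> <seconds> ' prefix from every line that has one."""
    lines = []
    for line in log.splitlines():
        match = PREFIX.match(line)
        if match:
            lines.append(line[match.end():])
    return "\n".join(lines)


log = """\
#10 12.24 # This file may be used to create an environment using:
#10 12.24 # $ conda create --name <env> --file <this file>
#10 12.24 # platform: linux-aarch64
#10 12.24 @EXPLICIT"""

print(strip_build_prefixes(log))
```

In a real build the same step would also print the `micromamba install` output, so the capture would additionally need to start at the `# This file may be used` header and stop before the output of the following command.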

@pditommaso
Contributor

A better approach (maybe) could be: 1) create the container "locally"; 2) copy the lock file from the built container via BuildKit; 3) upload it to the registry.

Something similar is done for singularity, here.

@munishchouhan
Member Author

> A better approach (maybe) could be: 1) create the container "locally"; 2) copy the lock file from the built container via BuildKit; 3) upload it to the registry.
>
> Something similar is done for singularity, here.

Ok sure, I will try this one

final query = """\
INSERT into wave_conda_lock {
buildId: '$buildId',
condaLock = '$condaLock'
Contributor

What if the lock file contains a ' character?

Member Author

Good point.
I have not tested it with SurrealDB yet.
I will use a bytes datatype to store it.

Member Author

Changed to byte[].
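
The quoting concern raised in this review thread can be illustrated with a small sketch. It is not the PR's implementation (which switched to byte[]). It only shows why direct interpolation is fragile and how encoding the content keeps quotes out of the query text, using Base64 as an assumed stand-in. The `build_query_*` helpers, the `bd-123` build id and the sample lock text are all hypothetical.

```python
import base64


def build_query_naive(build_id: str, conda_lock: str) -> str:
    # Direct interpolation: a single quote in the lock file ends the string early.
    return (
        f"INSERT INTO wave_conda_lock "
        f"{{ buildId: '{build_id}', condaLock: '{conda_lock}' }}"
    )


def build_query_encoded(build_id: str, conda_lock: str) -> str:
    # Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
    # so the encoded value can never contain a quote character.
    encoded = base64.b64encode(conda_lock.encode("utf-8")).decode("ascii")
    return (
        f"INSERT INTO wave_conda_lock "
        f"{{ buildId: '{build_id}', condaLock: '{encoded}' }}"
    )


lock = "# it's a lock file\n@EXPLICIT\n"
naive = build_query_naive("bd-123", lock)
safe = build_query_encoded("bd-123", lock)

print(naive.count("'"))  # 5: an odd number, so the quoting is unbalanced
print(safe.count("'"))   # 4: two balanced pairs
```

The build id is still interpolated directly here and is assumed to be trusted. Where the database client supports bound parameters, those are generally preferable to any string building.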

Signed-off-by: munishchouhan <[email protected]>
@munishchouhan
Member Author

munishchouhan commented Sep 18, 2024

Tested locally:

[Two screenshots of the local test, taken 2024-09-18]

Logs:
[Screenshot of the build logs, taken 2024-09-18]

Signed-off-by: munishchouhan <[email protected]>
@munishchouhan munishchouhan marked this pull request as ready for review September 18, 2024 17:57
@munishchouhan
Member Author

This PR can only be tested on dev once we release a new wave-utils version, after merging this PR:
seqeralabs/libseqera#25

@pditommaso
Contributor

Do we have an estimate of how big a lock file can be? If it's not too big, it could be stored in the DB like the conda env file.

@munishchouhan
Member Author

I was not able to find anything on the internet.

This is from ChatGPT:

1. Conda Lock Files (conda-lock.yml)
Size Range: A typical Conda lock file can range from a few kilobytes to several megabytes, depending on the number of dependencies and platforms included.
Small environment (few dependencies): Can be under 100 KB.
Larger environments (many dependencies, or multi-platform): Can reach several MB (1-5 MB or more).

Real-World Examples:
Conda lock files for a moderately complex data science project (with dependencies like numpy, pandas, tensorflow, etc.) can easily be between 500 KB to 2 MB.
A web application with many dependencies across platforms (npm, pipenv, etc.) might generate a lock file of 2-5 MB.

@pditommaso
Contributor

Umm, that seems too big. @pinin4fjords do you have any clue how big a conda lock file can be, on average?

@ewels
Member

ewels commented Sep 27, 2024

  • multiqc: 13K
  • bwa + samtools: 3.4K
  • fastqc: 8.2K
  • numpy + pandas: 4.8K
  • cuda: 20K

Struggled to get any huge lists of packages to build, so instead pulled out the largest environment.yml files in nf-core/modules:

They're still not that big, so I went back in time to some DSL1 pipelines, where we had a single conda environment for the entire pipeline:

@ewels
Member

ewels commented Sep 27, 2024

So even with the biggest real-life conda environment that I could find, we're nowhere near the MB range.

@pditommaso
Contributor

This is a good argument for saving the lock directly in the DB. It would make things much simpler.

@pinin4fjords
Member

What is the upper limit this would place on the size of the locks? I think we all know that even if locks from nf-core are small, there will be that customer that does something funky to get us into the MB range.

@munishchouhan
Member Author

> What is the upper limit this would place on the size of the locks? I think we all know that even if locks from nf-core are small, there will be that customer that does something funky to get us into the MB range.

1MB

@pinin4fjords
Member

> > What is the upper limit this would place on the size of the locks? I think we all know that even if locks from nf-core are small, there will be that customer that does something funky to get us into the MB range.
>
> 1MB

OK, then I'm going to go out on a limb and suggest that this will eventually come back to bite us.

Development

Successfully merging this pull request may close these issues.

Add support for Conda lock file
4 participants