Exporting an environment #45

Closed
limx0 opened this issue Feb 14, 2018 · 6 comments

limx0 commented Feb 14, 2018

Is it worth adding a couple of utility scripts that would export the user's current environment (conda or virtualenv) and build a Docker image on top of daskdev/dask:latest?

Being able to pass a list of conda/pip packages is fine for relatively simple environments and prototyping, but I can see value in something slightly more stable. Building a new image (which shouldn't need to be done too often) will increase the connection time to the KubeCluster, but will reduce worker startup time.

I have a basic POC of this, which I am currently using; it looks something like:

  • In JupyterLab, build my conda env and validate it locally in a notebook
  • Export the environment
  • Build a Docker image on top of daskdev/dask:latest, using something like:
import pathlib
import tempfile

import docker
from dask_kubernetes import KubeCluster

DOCKER_HUB_REPO = 'myrepo'  # Docker Hub user/org to publish images under
client = docker.from_env()

dockerfile_template = (
    'FROM daskdev/dask:latest\n'
    'ADD {environment_file} /opt/app/environment.yml\n'
    'RUN /opt/conda/bin/conda env update -n dask -f /opt/app/environment.yml && \\\n'
    '    conda clean -tipsy'
)


def build_publish_dockerfile(context_dir, dockerfile_txt, tag):
    # Write the Dockerfile into the build context, then build and push the image
    with pathlib.Path(context_dir).joinpath('dockerfile').open('w') as f:
        f.write(dockerfile_txt)
    image_tag = '%s/%s' % (DOCKER_HUB_REPO, tag)
    client.images.build(
        path=context_dir, dockerfile='dockerfile', tag=image_tag, nocache=True
    )
    client.images.push(image_tag)  # push so the k8s workers can pull it


def image_from_conda_env(env_name, tag, conda_bin='conda'):
    with tempfile.TemporaryDirectory() as tmp_dir:
        env_file = pathlib.Path(tmp_dir).joinpath('environment.yml')
        export_conda_env(env_name, env_file, conda_bin)  # sketched below
        # ADD paths are relative to the build context, so pass the bare filename
        dockerfile = dockerfile_template.format(environment_file=env_file.name)
        build_publish_dockerfile(tmp_dir, dockerfile_txt=dockerfile, tag=tag)


image_from_conda_env('myenv', 'dask-worker-myenv')

k = KubeCluster(image='%s/dask-worker-myenv' % DOCKER_HUB_REPO)
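
For completeness, export_conda_env above is just a thin wrapper around conda env export; a minimal sketch of one way to write it, assuming the conda CLI is on the path:

import subprocess

def export_conda_env(env_name, env_file, conda_bin='conda'):
    # Dump the named environment's spec to a YAML file for the image build
    with open(env_file, 'w') as f:
        subprocess.check_call(
            [conda_bin, 'env', 'export', '-n', env_name], stdout=f
        )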

Is this in the works? Or any thoughts on the above?

@mrocklin (Member)

I agree that something like this would be useful. Some random thoughts:

  1. To keep the image small you may want to base it on continuumio/miniconda3 rather than daskdev/dask (see the sketch below)
  2. The approach in https://github.com/jupyter/repo2docker might be interesting, either to use directly or as inspiration
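
For (1), a minimal sketch of what the template variant might look like, assuming the exported environment.yml pins dask and distributed itself (the miniconda3 base image doesn't ship them):

slim_dockerfile_template = (
    'FROM continuumio/miniconda3:latest\n'
    'ADD {environment_file} /opt/app/environment.yml\n'
    # the base env must end up containing dask/distributed for the workers
    'RUN conda env update -n base -f /opt/app/environment.yml && \\\n'
    '    conda clean -tipsy'
)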

limx0 (Author) commented Feb 14, 2018

That's very useful, thanks @mrocklin. Closing for now.

limx0 closed this as completed Feb 14, 2018
@mrocklin (Member)

Let's keep it open for now if that's alright. This is a valuable goal to keep in mind.

mrocklin reopened this Feb 14, 2018
@mrocklin (Member)

@jcrist's conda-pack might also be of interest.

Ideally for the workers we can even drop conda and just distribute a zip of an environment.
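
As a rough sketch of the conda-pack route (API names per the conda-pack docs; unpacking the archive on the worker side is left out):

import conda_pack

# Pack the environment into a relocatable tarball that workers can unpack
# and use without conda being installed at all
conda_pack.pack(name='myenv', output='myenv.tar.gz')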

mturok commented Apr 16, 2018

+1 - especially for an approach like the conda-pack method, which feels a bit lighter-weight, if possibly less generic.

@jacobtomlinson (Member)

The classic KubeCluster was removed in #890. All users will need to migrate to the Dask Operator. Closing.
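
For anyone migrating, a rough sketch of the operator equivalent (cluster name and image are placeholders):

from dask_kubernetes.operator import KubeCluster

# The operator-managed KubeCluster takes the custom worker image directly
cluster = KubeCluster(name='my-cluster', image='myrepo/dask-worker-myenv:latest')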

jacobtomlinson closed this as not planned Apr 30, 2024