Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to override repo using an HTML script #137

Open
wants to merge 5 commits into
base: staging
Choose a base branch
from

Conversation

Miniland1333
Copy link
Member

@Miniland1333 Miniland1333 commented Jan 8, 2021

Allows for pages to have a different repo instead of just default-env. This is useful for binder-plugin pages that need repositories with nonstandard plugins or that need to access data files in the repo.

Edit 1/10/21: BLOCKED by LibreTexts/metalc/issues/213

Allows for pages to have a different repo instead of just default-env.

Are there any security concerns raised by this change?
@Miniland1333 Miniland1333 added enhancement New feature or request question Further information is requested labels Jan 8, 2021
@Miniland1333 Miniland1333 self-assigned this Jan 8, 2021
@Miniland1333
Copy link
Member Author

I will likely need to also add something relating to ref as well.

binderRepoConfig contents will be JSON formatted to allow overriding of binderOptions.
@Miniland1333
Copy link
Member Author

image
Good, it looks like it is parsing the HTML/JSON into a usable configuration object.

@moorepants
Copy link
Member

I'm not sure we are ready for this. We are specifically limiting to default-env on our binder so that things are as snappy as they can be. My preffered solution to this is to support a way to have users add to default-env via a docker dependency and repo2docker features. I'm -1 to making this addition until we discuss and make a better plan for support.

@moorepants
Copy link
Member

These are two issues where this should be discuss and decided:

LibreTexts/metalc#85

LibreTexts/metalc#146

@Miniland1333
Copy link
Member Author

FYI, the package that raised this issue/PR is psi4 since it is only available through conda for some reason. I can definitely see data files being layered into Docker, though I don't have anywhere near enough experience with Repo2docker about how you would do layered dependency management and resolution. If you all can figure that out, then yes I agree that layering on top of default-env is quite promising.

I've done the file layering before with Docker and it is as easy as using the FROM and COPY commands to add onto an existing Docker image.

@moorepants
Copy link
Member

If someone wants to install a custom package they can do so in the first cells on their page.

@moorepants
Copy link
Member

The instructions for installing psi4numpy are here: https://github.com/psi4/psi4numpy#getting-started. If a user needs this package they should put those commands in the first cell of the page on libretexts or first cell in the notebook on their jupyterhub. That is a more than adequate solution for a package like that on this service. If you'd like us to add support for this, I recommend you submit a PR here: https://github.com/conda-forge/staged-recipes to add it to conda forge.

@moorepants
Copy link
Member

I opened this for you: psi4/psi4numpy#107

@Miniland1333
Copy link
Member Author

The instructions for installing psi4numpy are here: https://github.com/psi4/psi4numpy#getting-started. If a user needs this package they should put those commands in the first cell of the page on libretexts or first cell in the notebook on their jupyterhub. That is a more than adequate solution for a package like that on this service. If you'd like us to add support for this, I recommend you submit a PR here: https://github.com/conda-forge/staged-recipes to add it to conda forge.

Looking at the "how to incorporate data" guide you all made, I was able to deduce that adding an exclamation point to the beginning of a command to make it run at the terminal-level. I was not aware of this functionality in python but it now makes sense why you were telling me to put non-python commands into the code blocks.

Anyway, I whipped up a demo page here and it accepts the conda command but seems to be taking forever to resolve the packages. I will keep this thread updated if I make progress, but currently I'm not sure why the conda install is so slow.

@Miniland1333
Copy link
Member Author

image
It's been solving environment for a half hour now so maybe I am doing something wrong. I'll try again tomorrow.

@rkevin-arch
Copy link
Member

Replace conda with mamba and it should work

@rkevin-arch
Copy link
Member

Still can't get psi4 to install, because apparently it requires an older version of mpc (1.0.1) that's seemingly unavailable? mamba install -yc psi4 mpc=1.0.1 complains that Problem: nothing provides requested mpc 1.0.1**, while mamba install -yc psi4 mpc=1.1.0 works perfectly fine. This is a problem that only psi4 devs can fix, I think.

@Miniland1333
Copy link
Member Author

I was able to swap the psi4 branch to psi4/label/dev and we are now in business! Look at all that physical chemistry!
image

I will consider this issue partially fixed since my psi4-specific usage is handled. However, there is still significant room for improvement before you would want people using this for general use:

  1. The other languages aside from python may not have the ability to pass commands to the shell, so this trick wouldn't work there.
  2. Related issues Create a way for a user to specify a repo2docker config file for any given Jupyter Enabled Libretexts page metalc#85 and Users need to be able to install custom persistent packages metalc#146 need to be considered.
  3. Since this is done user-side in the code, it does not have the caching speedup that Jason's docker-layering approach would have. It took about 3 minutes for mamba to resolve for this package, and without caching a user would have to do this every time they restart the kernel.

@rkevin-arch thanks for your help debugging!

@moorepants
Copy link
Member

moorepants commented Jan 10, 2021

I was not aware of this functionality in python

This is not python functionally, but jupyter functionality. If users want to use the jupyter interface they will have to learn about jupyter from the jupyter documentation.

The other languages aside from python may not have the ability to pass commands to the shell, so this trick wouldn't work there.

As far as I know, you can call system commands from all languages using the language specific approach or the jupyter functionality.

Since this is done user-side in the code, it does not have the caching speedup that Jason's docker-layering approach would have. It took about 3 minutes for mamba to resolve for this package, and without caching a user would have to do this every time they restart the kernel.

This is an issue being addressed at the conda/mamba level. We can't do much about it. A users can install a package however they want though (compiling from source, whatever).

As a last note, I don't think we should spend a great deal of time trying to install difficult-to-install packages for users, nor teaching users how to install these things. It is really out of scope and users that want advanced things will need figure it out themselves. Our approach, if it installs with apt on ubuntu 18.04 or with conda from conda forge, we will install it for you. Otherwise, you are on your own. But we can offer a bit nicer experience for users to include repo2docker configuration files in the future if they want to do advanced installs, but then again, we shouldn't be helping them do the installs (it is a dark hole to go into).

@moorepants
Copy link
Member

However, there is still significant room for improvement before you would want people using this for general use:

One last thing. I don't think we want people using this for general use. It just happens to be that a user can self install things and if they can figure out how to do it, then they can use it. I'd call custom software installs an "unsupported feature" of our system.

@moorepants
Copy link
Member

I opened this issue: LibreTexts/metalc#215

@Miniland1333
Copy link
Member Author

I'm not sure we are ready for this. We are specifically limiting to default-env on our binder so that things are as snappy as they can be. My preffered solution to this is to support a way to have users add to default-env via a docker dependency and repo2docker features. I'm -1 to making this addition until we discuss and make a better plan for support.

By the way, is what you are saying here is that rather than allowing any arbitrary github repo is that you instead want for the repo to always extend default-env?

@Miniland1333
Copy link
Member Author

I'm not sure we are ready for this. We are specifically limiting to default-env on our binder so that things are as snappy as they can be. My preffered solution to this is to support a way to have users add to default-env via a docker dependency and repo2docker features. I'm -1 to making this addition until we discuss and make a better plan for support.

By the way, is what you are saying here is that rather than allowing any arbitrary github repo is that you instead want for the repo to always extend default-env?

If so, then Docker layering is very nice: https://github.com/Miniland1333/pibsolver. Note how the dockerfile is able to add additional dependencies and files on top of default-env with just a few lines. And of course since default-env is cached because everyone uses it the operation is very quick.

@moorepants
Copy link
Member

By the way, is what you are saying here is that rather than allowing any arbitrary github repo is that you instead want for the repo to always extend default-env?

If we want the load times to stay low, then we need to ensure default-env is the core docker image that is loaded always. Otherwise we are back to slow loads because we allow any arbitrary docker image. Otherwise we have to figure out how to consistently cache any docker images associated with libretexts pages.

@Miniland1333
Copy link
Member Author

By the way, is what you are saying here is that rather than allowing any arbitrary github repo is that you instead want for the repo to always extend default-env?

If we want the load times to stay low, then we need to ensure default-env is the core docker image that is loaded always. Otherwise we are back to slow loads because we allow any arbitrary docker image. Otherwise we have to figure out how to consistently cache any docker images associated with libretexts pages.

This theoretically makes a lot of sense, though it looks like practical usage is currently blocked by LibreTexts/metalc/issues/213. Based on my testing of https://github.com/Miniland1333/pibsolver, this is because while layering on top of default-env is quick, the pushing and pulling to dockerhub is very bandwidth-intensive due to the size of default-env. It had to download 4.8GB inbound and then upload 6.26GB up. A local registry will allow for the default-env pull to be cached and will make the push of the finished repo2docker very fast since it never leaves the internal network.

Let's put this issue on hold for now since when I first opened this issue a few days ago I wasn't expecting such a rabbit hole of problems. Still, it has already been very educational on my end and with your help I was able to get psi4 working as a proof-of-concept for adding new dependencies.

BLOCKED by LibreTexts/metalc/issues/213 as of 1/10/21.

@Miniland1333 Miniland1333 added the blocked Item is waiting on another issue or pull request to be completed first label Jan 11, 2021
@rkevin-arch
Copy link
Member

Two things:

  1. If "layering on top of default-env" means writing a Dockerfile that contains FROM libretexts/default-env:something, then caching ensures we don't have to download/upload as much data. Dockerhub will automatically recognize some layers are the same when uploading, and our machines will automatically recognize some layers are the same when downloading. However, this means they essentially have to write their own Dockerfiles, and cannot use any other repo2docker features like just specifying an environment.yml (sihce that causes repo2docker to rebuild, creating a brand new image where none of the layers are the same)
  2. I think getting a local registry working would definitely help quite a bit, and also getting [Feature request] Prelaunching specified repositories jupyterhub/binderhub#1167 working would also help us specify a list of textbook-author-maintained repos that we should automatically build and prepull on all our machines. If we get these two working, I don't see why not let textbook authors supply their own image, and we can add that to an approved list of images to use. Just my two cents.

@moorepants
Copy link
Member

Thanks Kevin. Your two points help show that this isn't a trivial problem to solve correctly. If we do solve this (which I'm not sure we will because it isn't top priority), I want to do in correctly and it a way that we can actually help end users with their needs in a sane way. We have to keep in mind that every feature we add, we have to support. There are already features in ckeditor plguin that exist but don't really work. They need to be fixed before adding new features. We have to realize that adding support for complex custom installs is really only for a very small percentage of people using libretexts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Item is waiting on another issue or pull request to be completed first enhancement New feature or request question Further information is requested
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants