This repository has been archived by the owner on Nov 30, 2023. It is now read-only.

Add R - Jupyter - R Markdown - Data Science - Machine Learning DevContainer #1314

Open: 7 commits to be merged into base branch main

R-icntay

An environment to perform Data Science and Machine Learning in R with support for .R scripts, Jupyter Notebooks, and R Markdown Notebooks.

@Chuxel
Member

Chuxel commented Feb 21, 2022

There are existing R, Jupyter data science, and Anaconda definitions in this repository: https://github.com/microsoft/vscode-dev-containers/tree/main/containers/r, https://github.com/microsoft/vscode-dev-containers/tree/main/containers/python-3-anaconda, and https://github.com/microsoft/vscode-dev-containers/tree/main/containers/jupyter-datascience-notebooks. Is there a reason we need to create a new one versus adapting what is there?

//cc @dynamicwebpaige as well for feedback along with @kmehant and @eitsupi on the existing definition.

@eitsupi
Contributor

eitsupi commented Feb 22, 2022

Since this is a pure extension of the R definition, I don't think it is appropriate to include it in this repository.

@R-icntay
Author

Since this is a pure extension of the R definition, I don't think it is appropriate to include it in this repository.

@eitsupi yes yes. It is an extension of the R definition to include Jupyter Notebooks support. I had a really hard time trying to run Jupyter Notebooks on the existing R container.

Perhaps I could do a PR directly into the R definition instead of an entirely new definition?

@R-icntay
Author

There are existing R, Jupyter data science, and Anaconda definitions in this repository: https://github.com/microsoft/vscode-dev-containers/tree/main/containers/r, https://github.com/microsoft/vscode-dev-containers/tree/main/containers/python-3-anaconda, and https://github.com/microsoft/vscode-dev-containers/tree/main/containers/jupyter-datascience-notebooks. Is there a reason we need to create a new one versus adapting what is there?

//cc @dynamicwebpaige as well for feedback along with @kmehant and @eitsupi on the existing definition.

@Chuxel, the reason for this definition is that we were in a situation where we wanted an out of the box container to run .R, .Rmd and Jupyter Notebooks for students without much fuss/tweaking of the existing definitions. Hence the reason for the PR. cc @leestott

@eitsupi
Contributor

eitsupi commented Feb 22, 2022

Perhaps I could do a PR directly into the R definition instead of an entirely new definition?

Like the definitions of other languages, the definition of R is intended to include only the bare essentials.

For example, my personal preference is the tidyverse packages, so I use edited container definitions like the following for my own use.
https://github.com/eitsupi/r-ver/blob/be6add710fc31f77d70ddb6547c2d3ed6f10d574/.devcontainer/Dockerfile

I think it is a good idea to create a template repository or maintain documentation on VSCode Remote-Containers.
If you are willing to help, you could post such a document on the Rocker website.
https://github.com/rocker-org/website

If my understanding is correct, the VSCode Remote-Containers team are currently working on a mechanism to easily download third party container definitions in https://github.com/microsoft/dev-container-spec.
Once that is in place, it is possible that you could include such definitions in another repository that is not here, and make it easier to download them.

@leestott

@Chuxel As @R-icntay mentioned, the reason for this definition is that we want an out-of-the-box container to run .R, .Rmd, and Jupyter Notebooks for students and educator workshops. We are in the process of launching some R modules on MS Learn, along with a Learn R Jupyter Sandbox.

In the longer term, we want an R + Jupyter + Data Science + Machine Learning DevContainer image that will NOT require any tweaking of the existing definitions, as the Edu community has requested a preconfigured R + Data Science + ML image for tidyverse, tidymodels, etc.

Hence the reason for the PR. cc @dynamicwebpaige @eitsupi

@Chuxel
Member

Chuxel commented Feb 23, 2022

Yeah, reading through more, this has quite a bit of opinion in it that makes it a bit more than a "definition" in this repository. For example, things like gitlens are quality of life extensions rather than something to enable a scenario. VS Code has settings sync to allow you to pull across these types of extensions from your personal preferences, so for a general scenario, including them in a definition is a bit counterproductive - and can actually irritate developers. e.g. ... this is a lot:

"ionutvmi.path-autocomplete",
"usernamehw.errorlens",
"mhutchie.git-graph",
"tomoki1207.pdf",
"DavidAnson.vscode-markdownlint",
"Rubymaniac.vscode-paste-and-indent",
"GrapeCity.gc-excelviewer",
"IBM.output-colorizer",
"Mohamed-El-Fodil-Ihaddaden.shinysnip",
"hediet.vscode-drawio",
"MS-vsliveshare.vsliveshare-pack",
"eamodio.gitlens",
"GitHub.vscode-pull-request-github"

I understand that this could make sense if a particular curriculum drove the desire to have these present, but part of me wonders whether what you are describing is more along the lines of the vscode-remote-try-* repositories (as I think eitsupi was also mentioning). Different curriculums are going to push someone towards or away from adding these extensions - they add more UX that can actually cause confusion for people new to VS Code (our experience has actually been that educators often want to remove default VS Code features more than add more).

A GitHub template repository can have much more opinion in it based on the assumptions about the curriculum than something designed to drop into an arbitrary project like these are intended to do. You can then click to create a repo with the opinion in it.

No objections to R + Jupyter... more trying to dig into intent given this extensions list. Is there perhaps a happy medium here? Otherwise using a template repository could be as or more effective if the desire is to be very opinionated.

@dynamicwebpaige

Thank you for tagging me into this issue, @Chuxel! And thank you to @R-icntay for proposing an R-centric devcontainer definition. I agree that the current R devcontainer is a bit spare; and that the tidyverse data analysis tools might not mesh well with an environment for pure R language development.

To @Chuxel's point above: adding too many extensions might result in performance degradation, or in conflicting shortcuts / hotkeys when using VS Code. Would it be possible to include just the R-centric and data analysis extensions in this devcontainer (for example, the extension for Shiny snippets) and remove the git-centric, Live Share, and formatting extensions?
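
To make that suggestion concrete, here is a rough sketch of what a trimmed extension list could look like. It assumes the top-level `extensions` property that devcontainer.json accepted at the time of this thread (newer versions of the spec nest it under `customizations.vscode`). The Shiny snippets ID is taken from the list quoted earlier; `ms-toolsai.jupyter` is the Jupyter extension and is added here only because notebook support is the stated goal. The exact set to keep is a judgment call, not something settled in this discussion.

```jsonc
// devcontainer.json (fragment) -- illustrative only
{
    "extensions": [
        // Notebook support, needed for the .ipynb scenario
        "ms-toolsai.jupyter",
        // R-centric helper kept from the PR's original list
        "Mohamed-El-Fodil-Ihaddaden.shinysnip"
        // (plus the R language extension the existing R definition already lists)
    ]
}
```

Git tooling, Live Share, and formatting helpers are left out on purpose, on the assumption that individual users bring those in through their own settings.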

A couple of additional questions:

  • Are you using VS Code as your primary IDE for R development?
    • If yes: have you been experiencing issues when attempting to use R kernels with VS Code notebooks?
    • If no: what other IDE(s) have you been using? Am assuming RStudio, and Jupyter / JupyterLab.
  • Will this devcontainer be in support of a data science course?

cc: @tanmayeekamath, as this devcontainer might be useful for the genomics team to review, once it has been created.

@eitsupi
Contributor

eitsupi commented Feb 23, 2022

so for a general scenario, including them in a definition is a bit counterproductive

My understanding is that extensions like GitLens (which I also use all the time!) should be defined in remote.containers.defaultExtensions in each person's settings.json, and only extensions used by the whole project should be included in devcontainer.json.
https://code.visualstudio.com/docs/remote/containers#_always-installed-extensions

However, I think many users are unaware of this feature and try to include everything in devcontainer.json...
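
As a sketch of that split, personal quality-of-life extensions could live in a user's own settings.json rather than in the shared devcontainer.json. The setting name is the one mentioned above; the two extension IDs are simply picked from the list quoted earlier in this thread as examples of personal-preference tools.

```jsonc
// User-level settings.json (not committed to the project)
{
    // Installed into every dev container this user opens,
    // regardless of what the project's devcontainer.json lists
    "remote.containers.defaultExtensions": [
        "eamodio.gitlens",
        "mhutchie.git-graph"
    ]
}
```

With this in place, the project's devcontainer.json only needs the extensions that every contributor requires for the scenario.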

Jupyter

I prefer RMarkdown to ipynb and haven't written R in Jupyter on VSCode for a long time, although I do use VSCode Jupyter when writing Python.
I agree that this is a great extension, but note that even the devcontainer definitions for Python do not have jupyter packages installed.

@eitsupi
Contributor

eitsupi commented Feb 23, 2022

I tried installing Jupyter. It is enough to add the following to the existing Dockerfile.

RUN apt-get update && apt-get -y install \
    libzmq3-dev \
    && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* \
    && install2.r --error --skipinstalled --ncpus -1 IRkernel \
    && rm -rf /tmp/downloaded_packages \
    && python3 -m pip --no-cache-dir install jupyter \
    && R --vanilla -s -e 'IRkernel::installspec(user = FALSE)'

How about adding this to the documentation?

This is the first time I've touched R with Jupyter since the Jupyter extension moved to VS Code native notebooks, and it was a pretty good experience.

However, there seems to be a problem: R variables do not show up in either the Jupyter variables view or vscode-R's R workspace. (microsoft/vscode-jupyter#5264)


@renkun-ken As the primary developer of vscode-R, do you have any thoughts?

@R-icntay
Author

R-icntay commented Feb 23, 2022

@Chuxel , @dynamicwebpaige. Thank you for getting back with great feedback on this.

Yes, I must admit, there was a bit of an overkill with the extensions. It would be possible to remove the extensions suggested by @Chuxel and everything would work just fine. In the devcontainer.json, they had been commented as // Other extensions that make life a little bit easier right off the bat

@dynamicwebpaige, we wanted to use the devcontainer for workshops relating to an upcoming R course on Microsoft Learn. Ideally we want to use VS Code and VS Code Notebooks (since .ipynb is supported on Microsoft Learn). From what we have gathered from the learner community, part of making it easier for learners to ramp up on R, and by extension R + VS Code, is an out-of-the-box environment where students can start running R code in no time. I only had a bit of a hiccup setting up the R kernel for VS Code Notebooks in a devcontainer; on a local machine it's really easy to do. I use RStudio but am really enjoying VS Code. Tagging @leestott in case I missed anything.

@R-icntay
Author

R-icntay commented Feb 23, 2022

RUN apt-get update && apt-get -y install \
    libzmq3-dev \
    && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* \
    && install2.r --error --skipinstalled --ncpus -1 IRkernel \
    && rm -rf /tmp/downloaded_packages \
    && python3 -m pip --no-cache-dir install jupyter \
    && R --vanilla -s -e 'IRkernel::installspec(user = FALSE)'

How about adding this to the documentation?

Thank you @eitsupi. This is a much neater implementation than the one I did. It would be great to have it documented somewhere. As a new user of Docker and everything around it, it took me a while to figure out how to make the R kernel visible to Jupyter.

@eitsupi
Contributor

eitsupi commented Feb 23, 2022

we wanted to use the devcontainer for workshops relating to an upcoming R course on Microsoft Learn.

Note that the Rocker project has an image called rocker/binder with a large number of packages installed on rocker/r-ver, including IRkernel and tidyverse.
List of installed packages: https://github.com/rocker-org/rocker-versioned2/wiki/binder_6cc03d713ae1

This image is so huge that it may contain packages that are unnecessary for many users, but may be a good option for learning purposes.
RStudio Server and JupyterLab are already installed, and users can use either IDE.

If you want to use this with VSCode Remote-Containers, I think you just need to change rocker/r-ver to rocker/binder and change the user name to rstudio in the R definition's Dockerfile and devcontainer.json.
(We may need to adjust the part that installs radian, since that image uses a venv.)
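
For anyone who just wants to try rocker/binder quickly, a minimal sketch is shown below. Note that this is a simpler variant than editing the R definition's Dockerfile as described above: it points devcontainer.json straight at the image, so none of the extras from the R definition (such as radian) are installed. The `name` value is made up for illustration; `rstudio` is the user name mentioned above.

```jsonc
// devcontainer.json -- minimal sketch, bypasses the R definition's Dockerfile
{
    // Hypothetical display name
    "name": "R (rocker/binder)",
    // Use the prebuilt image directly instead of building from a Dockerfile
    "image": "rocker/binder",
    // Run as the non-root user the image provides
    "remoteUser": "rstudio"
}
```

If the definition's additional tooling is needed, the Dockerfile route described above is still the way to go.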

cc @cboettig

@Chuxel
Member

Chuxel commented Feb 24, 2022

I merged in the updated comments in #1320 given the discussion here.

Note that the Rocker project has an image called rocker/binder ... I think you just need to rewrite rocker/r-ver to rocker/binder

@R-icntay @eitsupi There is also a way where you can wire up an option for Notebook support that shows up in VS Code "Add Dev Container Config UX" based on a comment in the Dockerfile. We want to formalize this a bit more as we move forward on some of the repository proposals mentioned above, but it's in heavy use in the repo. What you can do is the following:

# [Option] Enable Notebook support
ARG ENABLE_JUPYTER=false
# Install the R kernel and Jupyter only when the build arg is set to "true"
RUN if [ "${ENABLE_JUPYTER}" = "true" ]; then \
      apt-get update && apt-get -y install libzmq3-dev \
      && apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* \
      && install2.r --error --skipinstalled --ncpus -1 IRkernel \
      && rm -rf /tmp/downloaded_packages \
      && python3 -m pip --no-cache-dir install jupyter \
      && R --vanilla -s -e 'IRkernel::installspec(user = FALSE)'; \
    fi

devcontainer.json then lists this as a build arg, and the UX will present the option and update it as appropriate.

{
    "build": {
        "dockerfile": "Dockerfile",
        "args": {
            "ENABLE_JUPYTER": "false"
        }
    }
}

You can also do this with the image - the Dockerfile could include the following:

# [Choice] Start with a minimal image (r-ver) or full image (binder): r-ver, binder
ARG VARIANT=r-ver
FROM rocker/${VARIANT}

The UX will update the "VARIANT" in devcontainer.json automatically based on what you pick. This is how all the version and image variants work for things like the Python definition: https://github.com/microsoft/vscode-dev-containers/blob/main/containers/python-3/.devcontainer/Dockerfile
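
Putting the two pieces together, a devcontainer.json for someone who picked the full image and turned notebook support on might end up looking roughly like this. This is only a sketch: it assumes the Dockerfile contains both the `# [Choice]` and `# [Option]` snippets from this comment, which is a proposal here rather than something the R definition is confirmed to ship.

```jsonc
// devcontainer.json -- sketch, assumes the Dockerfile declares both ARGs
{
    "build": {
        "dockerfile": "Dockerfile",
        "args": {
            // One of the values listed in the "# [Choice]" comment
            "VARIANT": "binder",
            // Enables the optional Jupyter / IRkernel install step
            "ENABLE_JUPYTER": "true"
        }
    }
}
```

Both values are passed as strings, which matches the string comparison in the `if [ "${ENABLE_JUPYTER}" = "true" ]` check.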

@eitsupi
Contributor

eitsupi commented Feb 24, 2022

Thank you for your suggestion. That sounds good!
I would like to work on this when I have time.

@eitsupi
Contributor

eitsupi commented Mar 4, 2022

@R-icntay @leestott @dynamicwebpaige
By switching the Remote-Containers extension to the pre-release version, we can use the latest R definition, updated via #1327.
I think this will give you the container you are looking for; you only need to add some R packages to the rocker/binder-based Dockerfile and some extensions to devcontainer.json.

@bamurtaugh
Member

Thanks again for opening the PR and for the discussion so far.

As a heads up, our team has been actively focused on an updated plan for community contributions and this repo moving forward, which we've now outlined in this issue: #1589. This includes moving to a couple new repos for images (https://github.com/devcontainers/images) and Features (https://github.com/devcontainers/features).

We anticipate having a similar repo and distribution process for templates/definitions. We'll keep everyone updated (likely via another issue in this repo or a comment on #1589) when our new templates repo is available and the process is defined.

Please let me know if you have any questions, thank you!
