Evaluate adding R/Py to $PATH in Connect Content images #821

Open
npelikan opened this issue Aug 6, 2024 · 5 comments

Comments

@npelikan

npelikan commented Aug 6, 2024

Our Connect content images currently do not put either R or Python on $PATH. In certain cases this causes unexpected errors. I discovered this while using Databricks with Connect: the R Databricks packages require reticulate, Python, and rpy2, and do not build unless R is on $PATH. Using a custom image where I added ENV PATH=$PATH:/opt/R/${R_VERSION}/bin, I confirmed that adding R and Python to $PATH fixes the issue.
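As a rough illustration of the lookup involved, the following Python sketch checks whether an R executable can be found on a given PATH value, before and after appending the R bin directory. The helper names find_r and with_r_on_path and the version 4.4.1 are placeholders for illustration, not anything taken from the images; only the /opt/R/<version>/bin layout comes from the ENV line above.

```python
import os
import shutil


def find_r(path_value):
    """Return the full path of an `R` executable found on path_value, or None."""
    return shutil.which("R", path=path_value)


def with_r_on_path(path_value, r_version):
    """Append the R bin directory, mirroring ENV PATH=$PATH:/opt/R/${R_VERSION}/bin."""
    return os.pathsep.join([path_value, f"/opt/R/{r_version}/bin"])


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    print("R on current PATH:", find_r(current))
    # "4.4.1" is a placeholder version used only for this illustration.
    extended = with_r_on_path(current, "4.4.1")
    print("R on extended PATH:", find_r(extended))
```

Run inside an image, the first line would be expected to print None when R is not on PATH, and the second would print a path only if that R version is actually installed under /opt/R.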

@aronatkins
Contributor

This probably should be a Connect feature request, as we will want to ensure that only the target version of R is on the path when executing Python content.
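One way to picture "only the target version of R is on the path" is the sketch below. It is a hypothetical helper, not an existing Connect feature; the function name path_with_target_r and the version numbers are assumptions, and the /opt/R root is taken from the ENV line in the original report.

```python
import os


def path_with_target_r(base_path, r_version, r_root="/opt/R"):
    """Return PATH with the target R's bin directory first and other R versions dropped."""
    kept = [
        entry
        for entry in base_path.split(os.pathsep)
        # Drop empty entries and any directory under the R install root.
        if entry and not entry.startswith(r_root + "/")
    ]
    return os.pathsep.join([f"{r_root}/{r_version}/bin", *kept])


if __name__ == "__main__":
    # Placeholder versions, for illustration only.
    base = os.pathsep.join(["/opt/R/4.3.3/bin", "/usr/local/bin", "/usr/bin"])
    print(path_with_target_r(base, "4.4.1"))
```

Computing PATH per content item like this, rather than baking a fixed value into the image, is what would let the target R version vary between content items.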

@npelikan
Author

npelikan commented Aug 6, 2024

@aronatkins fair point -- my interpretation of the Connect build logs is that Connect currently does not add the target version of R/Python to PATH until the end of the content build process. Do I have that right? And are you suggesting that the content build process should instead add them to PATH at the start?

@aronatkins
Contributor

Building an R environment or a Python environment for some content item happens independently -- R packages are installed without knowing the target Python interpreter and the Python virtual environment is created without knowing the target R interpreter. The constructed packages/environment are not bound to some other language interpreter version.

Are you implying that the target R interpreter is needed while installing Python packages (or vice versa)?

@npelikan
Author

npelikan commented Aug 6, 2024

That's right -- this is specifically the case for rpy2, a dependency for using Databricks R UDFs. Trying to build a Python environment containing rpy2 in Connect results in the following error (if PATH isn't set in the image as above):

2024/08/05 19:06:40.127284548   Using cached rpy2-3.5.16.tar.gz (220 kB)
2024/08/05 19:06:40.216007648   Installing build dependencies: started
2024/08/05 19:06:44.778537047   Installing build dependencies: finished with status 'done'
2024/08/05 19:06:44.782697293   Getting requirements to build wheel: started
2024/08/05 19:06:45.221318301   Getting requirements to build wheel: finished with status 'error'
2024/08/05 19:06:45.255397641   error: subprocess-exited-with-error
2024/08/05 19:06:45.255412600   
2024/08/05 19:06:45.255457779   × Getting requirements to build wheel did not run successfully.
2024/08/05 19:06:45.255459407   │ exit code: 1
2024/08/05 19:06:45.255469493   ╰─> [6 lines of output]
2024/08/05 19:06:45.255470250       Unable to determine R home: [Errno 2] No such file or directory: 'R'
2024/08/05 19:06:45.255478691       cffi mode is CFFI_MODE.ANY
2024/08/05 19:06:45.255479468       Looking for R home with: R RHOME
2024/08/05 19:06:45.255487283       Unable to determine R home: [Errno 2] No such file or directory: 'R'
2024/08/05 19:06:45.255488443       R home found: None
2024/08/05 19:06:45.255496124       Error: rpy2 in API mode cannot be built without R in the PATH or R_HOME defined. Correct this or force ABI mode-only by defining the environment variable RPY2_CFFI_MODE=ABI
2024/08/05 19:06:45.255517436       [end of output]
2024/08/05 19:06:45.255532828   
2024/08/05 19:06:45.255533698   note: This error originates from a subprocess, and is likely not a problem with pip.
2024/08/05 19:06:45.258786322 error: subprocess-exited-with-error
2024/08/05 19:06:45.258805102 
2024/08/05 19:06:45.258838477 × Getting requirements to build wheel did not run successfully.
2024/08/05 19:06:45.258839850 │ exit code: 1
2024/08/05 19:06:45.258849370 ╰─> See above for output.
2024/08/05 19:06:45.258850890 
2024/08/05 19:06:45.258869224 note: This error originates from a subprocess, and is likely not a problem with pip.
2024/08/05 19:06:51.616153216 pip install failed with exit code 1
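The failure in the log can be approximated with a short Python sketch. This is not rpy2's actual code; it assumes, based only on the messages above, that the build uses R_HOME when it is defined and otherwise runs R RHOME, which fails with [Errno 2] when no R executable is on PATH. The function name find_r_home is hypothetical.

```python
import os
import subprocess


def find_r_home(env=None):
    """Approximate the lookup the log describes: R_HOME first, then `R RHOME`."""
    env = dict(os.environ) if env is None else env
    r_home = env.get("R_HOME")
    if r_home:
        return r_home
    try:
        # Raises FileNotFoundError ([Errno 2]) when no `R` executable is on PATH.
        result = subprocess.run(
            ["R", "RHOME"], env=env, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"Unable to determine R home: {exc}")
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    print("R home found:", find_r_home())
```

Under this reading, either putting R on PATH or defining R_HOME in the build environment would let the lookup succeed, which matches the wording of the final rpy2 error message.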

@aronatkins
Contributor

Thanks for that output.

Unfortunately, this implies that the resulting rpy2 installation (and the containing Python virtual environment) would be restricted to a specific R interpreter. Because Connect does not know about this restriction, I believe Connect could incorrectly try to use that same virtual environment for content that wants to use a different version of R.

Connect can share virtual environments and uses only the Python interpreter and package requirements to determine if an existing virtual environment can be reused.

In the very narrow example where an image has a single R and Python installation and those interpreter versions never change, the approach you outline appears safe, but outside that situation, reuse does not feel appropriate.
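The reuse concern can be sketched with a toy cache key. This is purely illustrative and is not how Connect computes anything internally; the function env_key, the hashing scheme, and the version numbers are all assumptions made for the example.

```python
import hashlib


def env_key(python_version, requirements, r_version=None):
    """Build a hypothetical cache key for a shared Python virtual environment."""
    parts = [python_version, *sorted(requirements)]
    if r_version is not None:
        # Only an R-aware key would record which R the environment was built against.
        parts.append(f"R={r_version}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


reqs = ["rpy2==3.5.16"]

# Keyed on Python and requirements only: two content items that target
# different R versions get the same key and would share one environment.
assert env_key("3.11.9", reqs) == env_key("3.11.9", reqs)

# With the R version in the key, the two environments stay separate.
assert env_key("3.11.9", reqs, "4.3.3") != env_key("3.11.9", reqs, "4.4.1")
```

With a single R and Python installation per image the two keys never diverge, which is the narrow case described above as appearing safe.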

CC @mmarchetti - in case there are other alternatives.
