Feature request: Add something like pip_requirements to get_source() #406

Closed
aaronsteers opened this issue Oct 1, 2024 · 2 comments

aaronsteers commented Oct 1, 2024

When installing Python connectors with get_source(), a new virtualenv is created, and then we run pip install {pip_url} where pip_url is either a user-provided value or airbyte-{connector_name} otherwise. Two problems with this:

  1. There's no way to inject additional dependencies into the virtual env.
  2. There's no way to restrict or pin dependencies beyond what is pinned in the pyproject.toml file.

The second of these can lead to issues like this one:

One option would be to accept a string or a text file path that conforms to the style of requirements.txt, and then PyAirbyte would pass those restrictions along to pip install.

Workaround available without this feature (NEW):

This workaround was identified on Nov 1, 2024, a month after this issue was originally written.

Since pip_url is literally just the string we pass to the venv's pip install, we can hack it by passing two requirements instead of one: the connector itself, followed by an explicit CDK version pin.

Example Notebook here:

https://colab.research.google.com/drive/1D_HqkMt_Vw3nnWJMHOye93X9x9gQ0cKb#scrollTo=hxgMbBNqrvE6

Other Workarounds:

  1. If Docker is available, the user can call get_source() with docker_image=True to use the connector's Docker image instead of installing via pip. Because Docker images are built once and frozen at release time, they are not subject to version drift in dependent libraries.
  2. Otherwise, the user can pre-build the connector's virtualenv with pipx, uv, or the standard Python tooling, and then pass the executable to get_source() using the local_executable arg. This gives the user full control over how the virtual environment is created, including injecting additional dependencies and/or constraints.
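
For the second workaround, here is a minimal sketch of pre-building the virtualenv with the standard library's venv module. The helper names (venv_bin_dir, build_install_commands, prebuild_connector_venv) are hypothetical and not part of PyAirbyte. The sketch also assumes the connector package installs a console script named after the connector (for example source-google-sheets) into the venv's bin directory, which may not hold for every connector or platform.

```python
import subprocess
import sys
from pathlib import Path


def venv_bin_dir(venv_dir: Path) -> Path:
    """Return the directory that holds a venv's executables."""
    return venv_dir / ("Scripts" if sys.platform == "win32" else "bin")


def build_install_commands(venv_dir: Path, packages: list[str]) -> list[list[str]]:
    """Build the commands that create the venv and install the packages into it."""
    venv_python = venv_bin_dir(venv_dir) / "python"
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],
        [str(venv_python), "-m", "pip", "install", *packages],
    ]


def prebuild_connector_venv(
    venv_dir: Path, packages: list[str], connector_name: str
) -> Path:
    """Create the venv, install the packages, and return the assumed executable path."""
    for cmd in build_install_commands(venv_dir, packages):
        subprocess.run(cmd, check=True)
    # Assumption: the console script is named after the connector.
    return venv_bin_dir(venv_dir) / connector_name


# Usage (requires network access and PyAirbyte, so left commented out):
# exe = prebuild_connector_venv(
#     Path(".venv-source-google-sheets"),
#     ["airbyte-source-google-sheets", "airbyte-cdk==4.5.0"],
#     "source-google-sheets",
# )
# import airbyte as ab
# source = ab.get_source("source-google-sheets", local_executable=exe)
```

Because the install step is an ordinary pip invocation, the package list could be swapped for a requirements or constraints file if finer control is needed.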

aaronsteers commented Nov 1, 2024

To anyone watching this issue or finding this later, we have identified another workaround, which I've added to the body above.

import airbyte as ab

# Install the connector plus an explicit CDK pin into the connector's venv.
source = ab.get_source(
    "source-google-sheets",
    pip_url="airbyte-source-google-sheets airbyte-cdk==4.5.0"
)

This workaround takes advantage of the fact that pip_url can be any string that is valid as arguments to pip install, which can include multiple libraries. By adding more libraries (or constraints) to the pip_url string, delimited by spaces, the user can install any number of additional libraries and/or add version constraints beyond what the connector itself pins. (The example above forces a downgrade to a specific CDK version, even though the connector allows any version prior to v5.0.)
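
If the extra pins live in a list or in requirements.txt-style text, the pip_url string can be assembled programmatically. The sketch below is only illustrative: build_pip_url and requirements_text_to_specs are hypothetical helpers, not PyAirbyte functions, and the parsing handles only simple name==version style lines (no environment markers, URLs, or pip options).

```python
def build_pip_url(connector_package: str, *extra_specs: str) -> str:
    """Join the connector package and extra requirement specifiers with spaces."""
    parts = [connector_package, *extra_specs]
    for part in parts:
        # Spaces delimit requirements in pip_url, so each part must be one token.
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(
                f"Specifier must be non-empty and contain no whitespace: {part!r}"
            )
    return " ".join(parts)


def requirements_text_to_specs(text: str) -> list[str]:
    """Extract simple specifiers from requirements.txt-style text."""
    specs = []
    for line in text.splitlines():
        # Drop comments, then remove inner whitespace ("pkg == 1.0" -> "pkg==1.0").
        line = "".join(line.split("#", 1)[0].split())
        if line:
            specs.append(line)
    return specs


pip_url = build_pip_url("airbyte-source-google-sheets", "airbyte-cdk==4.5.0")
print(pip_url)  # airbyte-source-google-sheets airbyte-cdk==4.5.0

# The result can then be passed to PyAirbyte (commented out: requires installation):
# import airbyte as ab
# source = ab.get_source("source-google-sheets", pip_url=pip_url)
```

Rejecting whitespace inside a specifier keeps each requirement a single space-delimited token, which is the only form the workaround above relies on.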

@aaronsteers

I think I am going to close this issue for now, since the workaround described above appears to be working well.
