Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use resource_path() to access datatypes_conf.xml.sample as a package resource #19331

Merged
merged 2 commits into from
Dec 17, 2024

Conversation

nsoranzo
Copy link
Member

The bugfix in the title is needed when using the galaxy-tool-util package as a zip file, see https://importlib-resources.readthedocs.io/en/latest/using.html for the long explanation.

Unfortunately the simple fix (originally opened by @bernt-matthias in #19188 ) uncovered a few other issues described in the first commit message below, in particular changing how we resolve an empty value for a path configuration option.

Also in this PR:

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

…age resource

This required other changes included here as well:

1) Extend `parse_xml()` to accept a `Traversable` (i.e. the output of
   `resource_path()`.

Open the file before passing it to `etree.parse()`, since lxml's
implementation probably doesn't support a `Traversable`. This also
simplifies the exception handling, since the workaround for the missing
`errno` in the `OSError` exception is not needed any more.

------

2) Fix Galaxy startup when the ``data_manager_config_file`` config option
   is set to an empty value

When does this happen?
----------------------

planemo (which we use in the converters tests) set this option to an
empty value when not testing data managers:

https://github.com/galaxyproject/planemo/blob/master/planemo/galaxy/config.py#L452

This empty config value was resolved to the `path_resolves_to` directory
for this option.
With the change in 1) above, this results in the following traceback at
Galaxy startup:

```
Traceback (most recent call last):
  File "/usr/users/ga002/soranzon/software/nsoranzo_galaxy/lib/galaxy/webapps/galaxy/buildapp.py", line 68, in app_pair
    app = galaxy.app.UniverseApplication(global_conf=global_conf, is_webapp=True, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/ga002/soranzon/software/nsoranzo_galaxy/lib/galaxy/app.py", line 777, in __init__
    self.data_managers = self._register_singleton(DataManagers)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/ga002/soranzon/software/nsoranzo_galaxy/lib/galaxy/di.py", line 27, in _register_singleton
    instance = self[dep_type]
               ~~~~^^^^^^^^^^
  File "/usr/users/ga002/soranzon/software/nsoranzo_galaxy/.venv/lib/python3.12/site-packages/lagom/container.py", line 381, in __getitem__
    return self.resolve(dep)
           ^^^^^^^^^^^^^^^^^
  File "/usr/users/ga002/soranzon/software/nsoranzo_galaxy/.venv/lib/python3.12/site-packages/lagom/container.py", line 265, in resolve
    return self._reflection_build_with_err_handling(dep_type, suppress_error)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/ga002/soranzon/software/nsoranzo_galaxy/.venv/lib/python3.12/site-packages/lagom/container.py", line 392, in _reflection_build_with_err_handling
    return self._reflection_build(dep_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/ga002/soranzon/software/nsoranzo_galaxy/.venv/lib/python3.12/site-packages/lagom/container.py", line 408, in _reflection_build
    return dep_type(**sub_deps)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/ga002/soranzon/software/nsoranzo_galaxy/lib/galaxy/tools/data_manager/manager.py", line 44, in __init__
    self.load_from_xml(filename)
  File "/usr/users/ga002/soranzon/software/nsoranzo_galaxy/lib/galaxy/tools/data_manager/manager.py", line 58, in load_from_xml
    tree = util.parse_xml(xml_filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/users/ga002/soranzon/software/nsoranzo_galaxy/lib/galaxy/util/__init__.py", line 360, in parse_xml
    with source.open("rb") as f:
         ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/pathlib.py", line 1013, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IsADirectoryError: [Errno 21] Is a directory: '/tmp/tmps2qq5htq'
```

Why is this a problem now?
--------------------------

Before this commit, `util.parse_xml()` was raising a `XMLSyntaxError`
exception if passed a directory, but now it raises an `IsADirectoryError`,
which is a subclass of `OSError` with errno 21.

As a consequence, when `load_from_xml()` in
lib/galaxy/tools/data_manager/manager.py
calls `util.parse_xml()` passing a directory, the resulting exception
is now re-raised in the `except OSError as e:` branch instead of
just being logged, which is actually good since this is a
misconfiguration.

Solutions
--------

- Do not resolve an empty/None path to the `path_resolves_to` directory.
- Do not check empty/None paths against root
- Adjst unit tests accordingly
- Raise an OSError if the `job_config_file` config option is set to null
  (previously this would be resolved to a directory, also raising an
  OSError when trying to open that as a file).

N.B.: This will also need to be fixed in Planemo by not setting the
`data_manager_config_file` option when not needed.
`__package__` is deprecated since Python 3.13, see
https://docs.python.org/3/reference/datamodel.html#module.__package__

`importlib.resources.files()` can be passed a non-package module
since Python 3.12 (or importlib-resources 5.10.0, see
https://importlib-resources.readthedocs.io/en/latest/history.html#v5-10-0 ),
so we can use `__name__` instead of `__package__`.

Also, I noticed that in lib/galaxy/config/__init__.py we are passing
a directory to `importlib.resources.as_file()`, which is supported
only since Python 3.12 (or importlib-resources 5.9.0).

For the reasons above:
- Update conditional imports in `lib/galaxy/util/resources.py` to
  import from `importlib.resources` only for Python 3.12,
  otherwise from the importlib-resources package.
- Update importlib-resources requirement in galaxy-util package.
- Drop importlib-resources requirement from galaxy-web-apps as it's
  not imported directly from its code any more.
@nsoranzo
Copy link
Member Author

Test failures unrelated.

@mvdbeek mvdbeek merged commit b1217af into galaxyproject:dev Dec 17, 2024
54 of 57 checks passed
@mvdbeek
Copy link
Member

mvdbeek commented Dec 17, 2024

Thanks! Does this also need to go to 24.2 ?

@nsoranzo nsoranzo deleted the resource_path_fixes branch December 17, 2024 13:46
@nsoranzo
Copy link
Member Author

The first commit probably should (I originally worked on that branch), but with the config handling that I also had to change I opted for dev. Maybe backport later if someone encounters the issue with accessing datatypes_conf.xml.sample ?

@mvdbeek
Copy link
Member

mvdbeek commented Dec 17, 2024

Won't we need this for planemo if we don't want to exclude the 24.2 package ?

@nsoranzo
Copy link
Member Author

Won't we need this for planemo if we don't want to exclude the 24.2 package ?

Maybe? I'm happy to backport the first commit if the changes look safe enough for you.

@mvdbeek
Copy link
Member

mvdbeek commented Dec 17, 2024

That looks good to me, yes.

mvdbeek added a commit that referenced this pull request Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants