Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sanity_checks: accept dash in packages' version strings #1706

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

rherilier
Copy link
Contributor

Projects may use dashs ('-') in their version strings (for example, taocpp-json use "-beta.N" for years) but it prevents accepting such projecs in the wrapdb database.

As the string is splitted with rsplit(1), such version strings should not cause a problem.

@rherilier rherilier force-pushed the accept-dash-in-packages-version-strings branch from 0efcfbf to b07d1cf Compare September 19, 2024 15:36
@rherilier
Copy link
Contributor Author

typo in commit message...

@eli-schwartz
Copy link
Member

Projects may use dashs ('-') in their version strings (for example, taocpp-json use "-beta.N" for years) but it prevents accepting such projecs in the wrapdb database.

Many software packaging systems handle this by fully rejecting such version strings and mandating that adding those packages to the database should first involve mangling the version string to a consistent format, for example replacing dashes with underscores.

@bgilbert
Copy link
Contributor

With this change, I believe the script would decide that the package version was beta.14, which doesn't seem to be what you'd want.

@rherilier
Copy link
Contributor Author

with version equal to 1.0.0-beta.14-1 (which is the correct version scheme in wrapdb), rsplit(1) should make ver contains 1.0.0-beta.14 and rev 1, which is what is wanted by sanity_checks to work.

@rherilier
Copy link
Contributor Author

@eli-schwartz

I'm not sure to understand...

The documentation of wrapDB does have any constraint for the entries' version string. The only one is for the wrap filename (the given regexp is [a-z0-9._]+) but this is neither respected in practice (73 packages use a dash in their wrap name) nor by sanity_checks.py (which does not do any test).

And if you look at the content of releases.json, all wrap versions match the scheme package-version-wrapdb-revision and the use of rsplit(1) in sanity_checks.py to get those 2 information reflect this implicit rule.

So, I don't see any reason for not supporting the dash in packages' version.

@bgilbert
Copy link
Contributor

It's not true that there's no restriction on dashes in the version number. The restriction is explicitly enforced by the code you're modifying.

You're right that rsplit would do the right thing here. Dashes in versions can still cause problems, though, even if the WrapDB tooling were okay with it1. If you have a tarball named taocpp-json-1.0.0-beta.14.tar.gz, how is a general-purpose tool supposed to parse out the package name and version number from the filename? Some package managers care about that.

1 versions.py would also need an update, and possibly other tools as well?

@rherilier
Copy link
Contributor Author

It's not true that there's no restriction on dashes in the version number. The restriction is explicitly enforced by the code you're modifying.

I said the documentation does not give any restriction on the version, not there is no restriction at all.

I said too that in the page Adding new projects to WrapDB, section Choosing the wrap name, the restriction on wrap names is non-sense. Here is the text:

NOTE: Wrap names must fully match this regexp: [a-z0-9._]+

The contributor is also invited to read the WrapDB's README.md for the next step:

Add your release information in releases.json. It is a dictionary where the key is the wrap name and the value is a dictionary containing with the following keys

The first project in releases.json, abseil-cpp, fails the requirement of the HTML documentation.

So, please, update the documentation.

Dashes in versions can still cause problems, though, even if the WrapDB tooling were okay with it.

Sure, it can, but where?

In wrapDB? This is the purpose of this PR?

In Meson? the function dependency() does not give any restriction for the version parameter. So, from a meson user point of view, there is no restriction to have a dependency which has dashes in its version.

So, WrapDB should do the same.

If you have a tarball named taocpp-json-1.0.0-beta.14.tar.gz, how is a general-purpose tool supposed to parse out the package name and version number from the filename? Some package managers care about that.

Why are we debating that? WrapDB use wrap files to avoid such headache.

versions.py would also need an update, and possibly other tools as well?

Thanks, I was too hasty...

I reworked my PR (and I accept any complain about the function name).

versions.py has been updated accordingly.

I fixed the test name too.

scripts tools/create_release.py and tools/update-packagefiles.py do not need any change.

But, the regexp I intend to change accepts underscores. So, how can sanity_checks.py and create_release.py be sure to correctly extract name and version from the git tags using rsplit('_', 1)?

There is no project with such version, so the error never occurs.

And tools/create_release.py fails with error:

$ ./tools/create_release.py
...
requests.exceptions.InvalidSchema: No connection adapters were found for 'ftp://ftp.gnu.org/gnu/gdbm/gdbm-1.23.tar.gz'
...

Is this script still used?

The script tools/import-wraps.py crashes too with the message:

...
$ ./tools/import-wraps.py whatever
Traceback (most recent call last):
  File "/home/..././tools/import-wraps.py", line 162, in <module>
    all_wraps.remove('sqlite')
ValueError: list.remove(x): x not in list
...

Is this script still used?

Do I open new issues for those 2 crashes or do I let you investigate on your side?

@rherilier rherilier force-pushed the accept-dash-in-packages-version-strings branch from b07d1cf to 53edee2 Compare September 22, 2024 18:09
@eli-schwartz
Copy link
Member

And tools/create_release.py fails with error:

$ ./tools/create_release.py
...
requests.exceptions.InvalidSchema: No connection adapters were found for 'ftp://ftp.gnu.org/gnu/gdbm/gdbm-1.23.tar.gz'
...

Is this script still used?

It is used on every single merge to master.

It also uses requests instead of urllib, which is why such a wrap works fine with meson but not with the create_release.py script... something that matters on the irregular occasion that the single wrap using an ftp url (gdbm) gets updated. It hasn't happened in years, since the code in create_release.py that tries to use it was added. ;)

The script tools/import-wraps.py crashes too with the message:
[...]
Is this script still used?

That script was implemented for a one-time event, and removing wraps that have been deleted will no longer work now that they are deleted and no longer in the list! :)

It is not intended to be still used, it is there for historicity.

@rherilier
Copy link
Contributor Author

@eli-schwartz: thanks for the full explanation.

Copy link
Contributor

@bgilbert bgilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dashes in versions can still cause problems, though, even if the WrapDB tooling were okay with it.

Sure, it can, but where?

Downstream. These sorts of changes can break third-party tooling. Maybe there aren't any tools that care, but it's worth at least considering whether reneging on our prior guarantees is worth doing.

Why should we not instead require wraps to convert dashes in the version string to underscores?

tools/utils.py Outdated
@@ -128,3 +128,6 @@ def is_msys() -> bool:

def is_macos():
return any(platform.mac_ver()[0])

def split_release_version(version: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: split_version_release would be clearer because that's the order of the returned list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I follow your logic, split_version_revision should be a little bit better, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes.

@eli-schwartz
Copy link
Member

The elephant in the room is that the only reason dashes in the version are an issue here, is because of a proposed wrap of a beta release of a project.

Does taocpp-json not consider itself to be broadly usable, that it should be 14 attempts deep into the beta release cycle and still no stable version in sight?

@rherilier
Copy link
Contributor Author

@eli-schwartz:

The real concern is that meson accepts such version strings while sanity_checks.py does not because it add a pointless precondition on the upstream version.

With the next project:

meson.build:

project('foo', 'cpp')
d = dependency('bar', version: '>=1.0.0-beta.14')

bar.pc

includedir="/usr/include"

Name: bar
Description: Bar
Version: 1.0.0-beta.13
Cflags: -I${includedir}

and in a shell:

$ PKG_CONFIG_PATH=`pwd` meson setup . b
...
Dependency bar found: NO. Found 1.0.0-beta.13 but need: '>1.0.0-beta.14'
...

It fails as expected.

Using the comparison >=1.0.0-beta.13 will success as expected.

But using the comparison >=1.0.0 will success while it should fail because the used algorithm does not support that kind of notation. But it is not our concern.

And the second important point is that the actual regexp accepts underscores in upstream version while it should reject them:

  1. sanity_checks.py uses it to retrieve the wrap's name+version when parsing git tags.
  2. meson wrap does the same thing with patch filenames.

Because they both use rsplit('_', 1) to do the extraction. With tag (or patch filename) like my-project_1.0.0_12-1 the split will return my-project_1.0.0 as project name and 12-1 as wrap release version.

upstream versions with underscores break the way wrap files are handled: this
character is the delimiter used to extract the wrap's name and version from
different sources:

* `meson wrap` when retrieving patch files;
* sanity_checks.py when retrieving git tags.
@rherilier rherilier force-pushed the accept-dash-in-packages-version-strings branch from 53edee2 to 56fbd77 Compare September 24, 2024 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants