Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated OctopusVersionParser regex to ensure delimiters between version numbers #73

Closed
wants to merge 3 commits into from

Conversation

tleed5
Copy link
Contributor

@tleed5 tleed5 commented Jan 4, 2024

Background

[SC-67249]

OctopusVersionParser incorrectly parsed some version strings. Example:

Input: 123ABC
Result: 
    Major: 123
    Minor: 0
    Patch: 0
    Revision: 0
    Release: ABC

While the above example wouldn't cause issues there is cases where if an input version string started with a number that was larger than int32.MaxValue it would cause an overflow exception to be thrown.

Solution

The regex has been updated to requires a word boundary directly after the version numbers in the version string to ensure that version numbers are correctly delimited

Examples

Old Regex:

Screenshot 2024-01-04 at 3 41 37 pm

New Regex:

Screenshot 2024-01-04 at 3 41 58 pm

Warning

This is a breaking change, some versions that previously parsed will no longer parse, this is limited to versions that didn't have a separator character between the first version number and the rest of the version string e.g. 1_0Alpha will now result in 0.0.0-1_0Alpha rather than 1.0.0-0Alpha. The effect of this is that packages that have incorrect version strings will no longer sort correctly

tleed5 added 2 commits January 4, 2024 15:28
- Now requires a word boundary directly after the version numbers in the version string, prevents input strings like 123ABC from parsing as 123.0.0.0-ABC
Copy link

This pull request has been linked to Shortcut Story #67249: Git sourcing fails if repo version is specific style.

Comment on lines -686 to -711
[TestCase("1_0_alpha",
1,
0,
0,
0,
0,
0,
0,
"",
"0_alpha",
"0",
"alpha",
"")]
[TestCase("1_0alpha",
1,
0,
0,
0,
0,
0,
0,
"",
"0alpha",
"0alpha",
"",
"")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these tests are checking if the maven version parser and octopus version parser returns the same result for certain version strings, these test cases that I removed no longer return the same result between the two version parsers.

That said I think the maven parser suffers the same issue where it doesn't require a delimiter between the major version and the rest of the version string

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth running this one by @mcasperson who was involved in adding the maven support

… homemade word boundary that handles _ characters
@mcasperson
Copy link
Contributor

I'll add some context to the original design decisions around the Octopus version scheme.

At the time the need was to allow any Docker tag within reason to be parsed semantically. Docker tags have no semantics (beyond maybe latest symolising, but not enforcing, the latest images), and the original approach was to only allow docker images with semver tags to be consumed by Octopus. This was a problem because a lot of Docker images used tags based on the semantic versioning of their underlying languages, and these semantic versions were not necessarily semver.

The Octopus version was designed to be as much as superset of semver as possible while allowing almost any string to be parsed into a major.minor.patch.build-prerelease+meta format. It borrowed heavily from the Python and Maven versioning schemes which were much more relaxed than semver and allowed us to parse almost any string and give it semantic meaning.

In particular, leading v characters are ignore (this is from Python), versions with no integers were zero-ver versions with the string as a prerelease (this is from Maven), and there is no need for a separator between numeric and non-numeric characters (this is from both Python and Maven).

We had an idea at some point for the Octopus version to be the universal version used by every feed, but the benefits of a single versioning scheme didn't seem to outweigh the potential for annoying and unintuitive edge cases. So it was initially limited to Docker feeds.

My understanding is that this PR is attempting to work around the overflow issues that may arise when parsing git commit hashes as semantic versions. The solution here is to use the fact that hashes will never have a dash or period, and to essentially treat any string that isn't a number as a prerelease string.

I suspect the fix in that PR will fail though if the commit hash is just numbers.

For example, a git commit hash of 2147483647 (int max) would work and assign that value to the equivalent of the semver major field (i.e. 2147483647.0.0). But a hash of 3147483647 (above int max) would fail with an overrun.

I imagine these errors would be rare (hashes with no characters would have to be pretty uncommon), but predictable enough that you would have to assume someone would eventually run into it.

It also introduces a method of parsing versions that is inconsistent with established formats like Maven and Python. TBH we were never going to get a universal version parser - there are just too many differences between all the standards - but I definitely took a lot of comfort knowing that our implementation was based on decisions made in existing standards.

I would suggest possible alternatives to this PR:

  1. Parse numbers as longs instead of ints. Beware though that this may impact package range rules used in channels, as those use standard semver libraries, and I would assume the semver libraries only support ints.
  2. Use golang psuedo versions. Golang has already worked out how to create semver version numbers from git commit hashes, with the added benefit of timestamps to give these versions meaningful semantics.
  3. Just treat hashes as zero-ver semver versions with the hash as a prerelease string.

@zentron
Copy link
Contributor

zentron commented Jan 7, 2024

Thanks for the background @mcasperson.

Effectively option 3 you listed above is what this PR is attempting to do.

Just treat hashes as zero-ver semver versions with the hash as a prerelease string.

The catch is that when the version string is encoded in, say a file name, we need a way of actually being able to parse the unknown string in the first place.

The continued problem of

I suspect the fix in that PR will fail though if the commit hash is just numbers.
Is a tricky one though since although it would realistically be quite rare, would still result in the same problem.

Although it feels like this constitutes fundamentally a bug in OctoVersion, im leaning back towards a fourth option that we have discussed internally,

Creating a new versioning scheme which is essentially "Unordered" (although technically since it needs to impliment ICompare it will be some sort of lexographical sorting)

@tleed5 Im a little concerned about some of these weird Maven edge cases (its always maven :D ) that @mcasperson has mentioned so thinking we might just need to bite the bullet and accept that we need a new version type after all.

@tleed5
Copy link
Contributor Author

tleed5 commented Jan 8, 2024

@zentron @mcasperson Just to tie this one up, plan is to add a new version type which is done over in #72 so ill close this PR as we aren't going to merge it since it would potentially be breaking valid versioning schemes for some customers

@tleed5 tleed5 closed this Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants