Updated OctopusVersionParser regex to ensure delimiters between version numbers #73

tleed5 · 2024-01-04T06:07:07Z

Background

[SC-67249]

OctopusVersionParser incorrectly parsed some version strings. Example:

Input: 123ABC
Result: 
    Major: 123
    Minor: 0
    Patch: 0
    Revision: 0
    Release: ABC

While the above example wouldn't cause issues there is cases where if an input version string started with a number that was larger than int32.MaxValue it would cause an overflow exception to be thrown.

Solution

The regex has been updated to requires a word boundary directly after the version numbers in the version string to ensure that version numbers are correctly delimited

Examples

Old Regex:

New Regex:

Warning

This is a breaking change, some versions that previously parsed will no longer parse, this is limited to versions that didn't have a separator character between the first version number and the rest of the version string e.g. 1_0Alpha will now result in 0.0.0-1_0Alpha rather than 1.0.0-0Alpha. The effect of this is that packages that have incorrect version strings will no longer sort correctly

- Now requires a word boundary directly after the version numbers in the version string, prevents input strings like 123ABC from parsing as 123.0.0.0-ABC

shortcut-integration · 2024-01-04T06:07:11Z

This pull request has been linked to Shortcut Story #67249: Git sourcing fails if repo version is specific style.

tleed5 · 2024-01-04T06:14:53Z

source/Octopus.Versioning.Tests/Octopus/OctopusVersionTests.cs

-        [TestCase("1_0_alpha",
-            1,
-            0,
-            0,
-            0,
-            0,
-            0,
-            0,
-            "",
-            "0_alpha",
-            "0",
-            "alpha",
-            "")]
-        [TestCase("1_0alpha",
-            1,
-            0,
-            0,
-            0,
-            0,
-            0,
-            0,
-            "",
-            "0alpha",
-            "0alpha",
-            "",
-            "")]


It looks like these tests are checking if the maven version parser and octopus version parser returns the same result for certain version strings, these test cases that I removed no longer return the same result between the two version parsers.

That said I think the maven parser suffers the same issue where it doesn't require a delimiter between the major version and the rest of the version string

It might be worth running this one by @mcasperson who was involved in adding the maven support

… homemade word boundary that handles _ characters

mcasperson · 2024-01-05T09:21:18Z

I'll add some context to the original design decisions around the Octopus version scheme.

At the time the need was to allow any Docker tag within reason to be parsed semantically. Docker tags have no semantics (beyond maybe latest symolising, but not enforcing, the latest images), and the original approach was to only allow docker images with semver tags to be consumed by Octopus. This was a problem because a lot of Docker images used tags based on the semantic versioning of their underlying languages, and these semantic versions were not necessarily semver.

The Octopus version was designed to be as much as superset of semver as possible while allowing almost any string to be parsed into a major.minor.patch.build-prerelease+meta format. It borrowed heavily from the Python and Maven versioning schemes which were much more relaxed than semver and allowed us to parse almost any string and give it semantic meaning.

In particular, leading v characters are ignore (this is from Python), versions with no integers were zero-ver versions with the string as a prerelease (this is from Maven), and there is no need for a separator between numeric and non-numeric characters (this is from both Python and Maven).

We had an idea at some point for the Octopus version to be the universal version used by every feed, but the benefits of a single versioning scheme didn't seem to outweigh the potential for annoying and unintuitive edge cases. So it was initially limited to Docker feeds.

My understanding is that this PR is attempting to work around the overflow issues that may arise when parsing git commit hashes as semantic versions. The solution here is to use the fact that hashes will never have a dash or period, and to essentially treat any string that isn't a number as a prerelease string.

I suspect the fix in that PR will fail though if the commit hash is just numbers.

For example, a git commit hash of 2147483647 (int max) would work and assign that value to the equivalent of the semver major field (i.e. 2147483647.0.0). But a hash of 3147483647 (above int max) would fail with an overrun.

I imagine these errors would be rare (hashes with no characters would have to be pretty uncommon), but predictable enough that you would have to assume someone would eventually run into it.

It also introduces a method of parsing versions that is inconsistent with established formats like Maven and Python. TBH we were never going to get a universal version parser - there are just too many differences between all the standards - but I definitely took a lot of comfort knowing that our implementation was based on decisions made in existing standards.

I would suggest possible alternatives to this PR:

Parse numbers as longs instead of ints. Beware though that this may impact package range rules used in channels, as those use standard semver libraries, and I would assume the semver libraries only support ints.
Use golang psuedo versions. Golang has already worked out how to create semver version numbers from git commit hashes, with the added benefit of timestamps to give these versions meaningful semantics.
Just treat hashes as zero-ver semver versions with the hash as a prerelease string.

zentron · 2024-01-07T22:54:32Z

Thanks for the background @mcasperson.

Effectively option 3 you listed above is what this PR is attempting to do.

Just treat hashes as zero-ver semver versions with the hash as a prerelease string.

The catch is that when the version string is encoded in, say a file name, we need a way of actually being able to parse the unknown string in the first place.

The continued problem of

I suspect the fix in that PR will fail though if the commit hash is just numbers.
Is a tricky one though since although it would realistically be quite rare, would still result in the same problem.

Although it feels like this constitutes fundamentally a bug in OctoVersion, im leaning back towards a fourth option that we have discussed internally,

Creating a new versioning scheme which is essentially "Unordered" (although technically since it needs to impliment ICompare it will be some sort of lexographical sorting)

@tleed5 Im a little concerned about some of these weird Maven edge cases (its always maven :D ) that @mcasperson has mentioned so thinking we might just need to bite the bullet and accept that we need a new version type after all.

tleed5 · 2024-01-08T01:04:35Z

@zentron @mcasperson Just to tie this one up, plan is to add a new version type which is done over in #72 so ill close this PR as we aren't going to merge it since it would potentially be breaking valid versioning schemes for some customers

tleed5 added 2 commits January 4, 2024 15:28

Updated OctopusVersionParser regex

1324ce2

- Now requires a word boundary directly after the version numbers in the version string, prevents input strings like 123ABC from parsing as 123.0.0.0-ABC

Update comment

79b1c99

tleed5 commented Jan 4, 2024

View reviewed changes

Regex word boundary accepts _ as a word character so changing it to a…

f6f7c2e

… homemade word boundary that handles _ characters

tleed5 mentioned this pull request Jan 7, 2024

Add LexicographicSortedVersion #72

Merged

tleed5 closed this Jan 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updated OctopusVersionParser regex to ensure delimiters between version numbers #73

Updated OctopusVersionParser regex to ensure delimiters between version numbers #73

tleed5 commented Jan 4, 2024 •

edited

Loading

shortcut-integration bot commented Jan 4, 2024

tleed5 Jan 4, 2024

zentron Jan 4, 2024

mcasperson commented Jan 5, 2024

zentron commented Jan 7, 2024

tleed5 commented Jan 8, 2024

Updated OctopusVersionParser regex to ensure delimiters between version numbers #73

Updated OctopusVersionParser regex to ensure delimiters between version numbers #73

Conversation

tleed5 commented Jan 4, 2024 • edited Loading

Background

Solution

Examples

Old Regex:

New Regex:

Warning

shortcut-integration bot commented Jan 4, 2024

tleed5 Jan 4, 2024

Choose a reason for hiding this comment

zentron Jan 4, 2024

Choose a reason for hiding this comment

mcasperson commented Jan 5, 2024

zentron commented Jan 7, 2024

tleed5 commented Jan 8, 2024

tleed5 commented Jan 4, 2024 •

edited

Loading