fix versionsort chunk split on non-ASCII numerics #6407

Merged
merged 2 commits into from
Dec 1, 2024

Conversation

jessicarod7
Contributor

Description

Fixes a bug in the 2024 Edition versionsort, where a string chunk is split on any numeric character rather than only on ASCII digits. For example, this causes imports containing non-ASCII numeric characters to be sorted incorrectly.

Unformatted (Playground link):

use std::cmp::Ordering;
use print๙msg::print as first_print;
use print0msg::print as second_print;
use printémsg::print as third_print;

fn main() {
    first_print();
    second_print();
    third_print();

    assert_eq!("print๙msg".cmp("printémsg"), Ordering::Greater);
}

/// '๙' = 0E59;THAI DIGIT NINE;Nd;
mod print๙msg {
    pub fn print() {
        println!("Non-ASCII Decimal_Number")
    }
}

/// '0' = 0030;DIGIT ZERO;Nd;
mod print0msg {
    pub fn print() {
        println!("ASCII Decimal_Number")
    }
}

/// 'é' = 00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;
mod printémsg {
    pub fn print() {
        println!("Lowercase_Letter")
    }
}

Formatted imports (nightly 2024-11-29, Playground link):

use print0msg::print as second_print;
use print๙msg::print as first_print;
use printémsg::print as third_print;
use std::cmp::Ordering;

Here, printémsg should be sorted before print๙msg, but because ๙ is a non-ASCII numeric, the print๙msg import is split into two short string chunks, which sort before the single longer chunk of printémsg.
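To make the chunk split concrete, here is a minimal sketch of the behavior described above. It is not rustfmt's actual implementation: `split_str_chunks`, `chunks_before_fix`, and `chunks_after_fix` are hypothetical names, the model ignores numeric and underscore chunks entirely, and it is only meaningful for names without ASCII digits or underscores. It also assumes chunk lists are compared element by element.

```rust
/// Hypothetical, simplified model of string-chunk splitting.
/// A new chunk starts at the first character and at every character
/// for which `ends_chunk` returns true.
fn split_str_chunks(s: &str, ends_chunk: fn(char) -> bool) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    for c in s.chars() {
        if chunks.is_empty() || ends_chunk(c) {
            chunks.push(String::new());
        }
        // A chunk always exists here because one was pushed above if needed.
        chunks.last_mut().unwrap().push(c);
    }
    chunks
}

/// Assumed pre-fix behavior: any Unicode numeric character ends a string chunk.
fn chunks_before_fix(s: &str) -> Vec<String> {
    split_str_chunks(s, |c: char| c.is_numeric())
}

/// Assumed post-fix behavior: only ASCII digits end a string chunk.
fn chunks_after_fix(s: &str) -> Vec<String> {
    split_str_chunks(s, |c: char| c.is_ascii_digit())
}
```

Under this model, `chunks_before_fix("print๙msg")` yields `["print", "๙msg"]`, whose first chunk is a prefix of `"printémsg"` and therefore compares as smaller. `chunks_after_fix("print๙msg")` yields the single chunk `["print๙msg"]`, which compares as greater than `"printémsg"`.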

Changes & Notes

Contributor

@ytmimi ytmimi left a comment

The Style Guide defines numeric chunks in terms of ASCII digits, so is_ascii_digit is definitely the right function to use.

I've left a few comments inline. I think we should rework the current test case and also add a style_edition=2015 case so that we can compare the sort order when dealing with non-ASCII numeric characters.
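For reference, the difference between the two standard library predicates can be checked directly. The helper below is a hypothetical illustration (the name `classify` is not from the PR); it only calls `char::is_numeric` and `char::is_ascii_digit`.

```rust
/// Hypothetical helper: returns `(is_numeric, is_ascii_digit)` for `c`,
/// using the standard library's `char::is_numeric` and `char::is_ascii_digit`.
fn classify(c: char) -> (bool, bool) {
    (c.is_numeric(), c.is_ascii_digit())
}
```

For the three characters in the test case, `classify('0')` is `(true, true)`, `classify('๙')` is `(true, false)`, and `classify('é')` is `(false, false)`. Only the Thai digit is treated differently by the two predicates.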

Comment on lines 6 to 33
fn main() {
    first_print();
    second_print();
    third_print();

    assert_eq!("print๙msg".cmp("printémsg"), Ordering::Greater);
}

/// '๙' = 0E59;THAI DIGIT NINE;Nd;
mod print๙msg {
    pub fn print() {
        println!("Non-ASCII Decimal_Number")
    }
}

/// '0' = 0030;DIGIT ZERO;Nd;
mod print0msg {
    pub fn print() {
        println!("ASCII Decimal_Number")
    }
}

/// 'é' = 00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;
mod printémsg {
    pub fn print() {
        println!("Lowercase_Letter")
    }
}
Contributor

@ytmimi ytmimi Nov 30, 2024

Let's remove any code that isn't an import since it's not needed for the test case. That said, it would be great to keep the explanatory comments you added about '๙', '0', and 'é' to help document the test case.

Contributor Author

Updated and expanded on the comments to explain why they sort in that order.

Comment on lines 1 to 4
use std::cmp::Ordering;
use print๙msg::print as first_print;
use print0msg::print as second_print;
use printémsg::print as third_print;
Contributor

@ytmimi ytmimi Nov 30, 2024

Version sort is only applied when using style_edition=2024. You'll need to configure that for this test as follows:

Suggested change
- use std::cmp::Ordering;
- use print๙msg::print as first_print;
- use print0msg::print as second_print;
- use printémsg::print as third_print;
+ // rustfmt-style_edition: 2024
+ use std::cmp::Ordering;
+ use print๙msg::print as first_print;
+ use print0msg::print as second_print;
+ use printémsg::print as third_print;

You can read more about the comment configuration in test cases here.

Can you also add a style_edition=2015 import sorting test case so that we can compare against the pre-version-sort ordering?

Contributor Author

Added. The 2015 edition actually sorts the same way since U+0030 < U+00E9 < U+0E59, so the bug only affected the 2024 edition (and can't be reproduced on earlier versions because ASCII digits are the earliest Unicode numerics).
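The code point ordering mentioned above can be illustrated with a small sketch. It assumes the pre-version-sort ordering behaves like a plain string comparison for these names, which is a simplification; `sort_by_code_point` is a hypothetical helper, not a rustfmt function.

```rust
/// Hypothetical helper: sorts names with plain `str` ordering, which
/// compares UTF-8 bytes and therefore matches Unicode code point order.
fn sort_by_code_point<'a>(mut names: Vec<&'a str>) -> Vec<&'a str> {
    names.sort();
    names
}
```

Sorting `["print๙msg", "print0msg", "printémsg"]` this way gives `["print0msg", "printémsg", "print๙msg"]`, matching U+0030 < U+00E9 < U+0E59.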

Contributor

Thanks for adding that additional test case!

Contributor

@ytmimi ytmimi left a comment

Thanks for your help on this one!

Contributor

ytmimi commented Nov 30, 2024

I think we're good to go here, but I just want to do a quick sanity check by running the updated rustfmt on some larger Rust codebases. Here's a link to the diff check job.

Edit: Job ran successfully ✅

@ytmimi ytmimi merged commit 78aa72f into rust-lang:master Dec 1, 2024
26 checks passed
@ytmimi ytmimi added the release-notes label (needs an associated changelog entry) and removed the pr-ready-to-merge label Dec 1, 2024
@jessicarod7 jessicarod7 deleted the versionsort_non_ascii_numerics branch December 1, 2024 12:01
@ytmimi ytmimi removed the release-notes label (needs an associated changelog entry) Jan 2, 2025
3 participants