Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle newlines during string parsing while lexing #2684

Merged
merged 1 commit into from
Jan 4, 2024

Conversation

nirmal-j-patel
Copy link
Contributor

If newline strings are encountered while lexing, the lexer now handles newline characters by incrementing current line number. This provides correct line number when displaying errors. If the lexer encounters end of file before string end, then it will use the start of the string as the location to an report error.

Fixes #2187

Copy link
Contributor Author

@nirmal-j-patel nirmal-j-patel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/Rust-GCC/gccrs/pull/2684/files#diff-bf317668e350e4e1bb830fab7e19d5e0b96e95471067d2d6b1bd6dabb95ff0aeR2061

I am not sure if we still this. When we encounter a string spanning multiple lines, do we still want to add length of string to current_column?

@P-E-P
Copy link
Member

P-E-P commented Oct 16, 2023

https://github.com/Rust-GCC/gccrs/pull/2684/files#diff-bf317668e350e4e1bb830fab7e19d5e0b96e95471067d2d6b1bd6dabb95ff0aeR2061

I am not sure if we still this. When we encounter a string spanning multiple lines, do we still want to add length of string to current_column?

I'm not well versed in this part of the code but it seems like current_column belongs to the public internal state of the lexer, someone might depend on it after calling this function am I right ?

@nirmal-j-patel
Copy link
Contributor Author

It appears that if we continue to add string length to its beginning locus, then it will give wrong location when showing errors involving multi-line strings. This is because the length of the multiple line string will be longer than the length of any particular line. I am incrementing current_column per character and resetting it to 1 on linebreak. I am not sure how it would work with unicode characters. However, removing this part works better from what I can tell.

Without removing current_column += length;

/home/user/projects/gccrs_test/location_error_str.rs:3:20: error: unicode escape should start with ‘{’
    3 | " + "\uu";
      |                    ^
/home/user/projects/gccrs_test/location_error_str.rs:6:4: error: unrecognised token ‘identifier’ for start of item
    6 |    ERROR_TIME
      |    ^~~~~~~~~~
/home/user/projects/gccrs_test/location_error_str.rs:6:4: error: failed to parse item in crate

After removing current_column += length;

/home/user/projects/gccrs_test/location_error_str.rs:3:5: error: unicode escape should start with ‘{’
    3 | " + "\uu";
      |     ^
/home/user/projects/gccrs_test/location_error_str.rs:6:4: error: unrecognised token ‘identifier’ for start of item
    6 |    ERROR_TIME
      |    ^~~~~~~~~~
/home/user/projects/gccrs_test/location_error_str.rs:6:4: error: failed to parse item in crate

@nirmal-j-patel nirmal-j-patel force-pushed the string_linebreak_locus_fix branch from 42cc5e7 to 7328844 Compare October 17, 2023 22:13
@nirmal-j-patel nirmal-j-patel force-pushed the string_linebreak_locus_fix branch from 7328844 to 5f7c65f Compare November 1, 2023 22:08
@nirmal-j-patel nirmal-j-patel marked this pull request as ready for review November 2, 2023 01:08
@P-E-P P-E-P requested review from CohenArthur and P-E-P November 6, 2023 11:50
@P-E-P P-E-P added this to the GCC 14.1 release milestone Nov 6, 2023
Copy link
Member

@CohenArthur CohenArthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good. I'd like to see maybe one more test to make sure we parse the newlines properly, so maybe something that calls printf and checks the output with dg-output?

If newline strings are encountered while lexing, the lexer now handles
newline characters by incrementing current line number. This provides
correct line number when displaying errors. If the lexer encounters end
of file before string end, then it will use the start of the string as
the location to an report error.

gcc/rust/ChangeLog:
	* lex/rust-lex.cc (Lexer::parse_byte_string): Handle newline
	while parsing byte strings
	(Lexer::parse_string): Handle newline while parsing strings

Signed-off-by: Nirmal Patel <[email protected]>
@nirmal-j-patel nirmal-j-patel force-pushed the string_linebreak_locus_fix branch from 5f7c65f to c1c43f6 Compare December 27, 2023 22:50
@nirmal-j-patel
Copy link
Contributor Author

Apologies for the delay. How does it look now?

Copy link
Member

@P-E-P P-E-P left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good 🙂

@P-E-P P-E-P added this pull request to the merge queue Jan 4, 2024
Merged via the queue into Rust-GCC:master with commit b864866 Jan 4, 2024
9 checks passed
@nirmal-j-patel nirmal-j-patel deleted the string_linebreak_locus_fix branch January 4, 2024 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Location tracking error while lexing strings
3 participants