Handle multiline strings #1399

Open
CohenArthur opened this issue Jul 19, 2022 · 6 comments · May be fixed by #3325

Comments

@CohenArthur
Member

CohenArthur commented Jul 19, 2022

Multiline strings are allowed in Rust (playground link). However, we currently do not handle them correctly:

test.rs:2:26: error: unended string literal
    2 |     let a = "whaaaaaat up
      |                          ^
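
For reference, here is a minimal sketch of what rustc accepts. The function names `multiline` and `continued` are made up for illustration and are not part of any test in the repository. A line break inside a plain string literal becomes part of the string, while a trailing backslash skips the line break and the leading whitespace of the next line.

```rust
/// A string literal spanning two source lines.
/// The line break is kept in the resulting string.
fn multiline() -> &'static str {
    "whaaaaaat up
doc"
}

/// A trailing backslash skips the line break and any
/// leading whitespace on the following line.
fn continued() -> &'static str {
    "whaaaaaat \
     up"
}
```

Calling `multiline()` yields a string containing a `\n`, whereas `continued()` yields a single-line string. Both forms need to get through the lexer without an "unended string literal" error.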

This is the beginning of a patch to fix that error. It simply comments out the checks for a \n character:

diff --git a/gcc/rust/lex/rust-lex.cc b/gcc/rust/lex/rust-lex.cc
index ecf151dc778..c51b00fb5fe 100644
--- a/gcc/rust/lex/rust-lex.cc
+++ b/gcc/rust/lex/rust-lex.cc
@@ -1917,7 +1917,7 @@ Lexer::parse_string (Location loc)
   int length = 1;
   current_char32 = peek_codepoint_input ();
 
-  while (current_char32.value != '\n' && current_char32.value != '"')
+  while (/* current_char32.value != '\n' && */ current_char32.value != '"')
     {
       if (current_char32.value == '\\')
 	{
@@ -1949,14 +1949,15 @@ Lexer::parse_string (Location loc)
 
   current_column += length;
 
-  if (current_char32.value == '\n')
-    {
-      rust_error_at (get_current_location (), "unended string literal");
-      // by this point, the parser will stuck at this position due to
-      // undetermined string termination. we now need to unstuck the parser
-      skip_broken_string_input (current_char32.value);
-    }
-  else if (current_char32.value == '"')
+  // if (current_char32.value == '\n')
+  //   {
+  //     rust_error_at (get_current_location (), "unended string literal");
+  //     // by this point, the parser will stuck at this position due to
+  //     // undetermined string termination. we now need to unstuck the parser
+  //     skip_broken_string_input (current_char32.value);
+  //   }
+  if (current_char32.value == '"')
+    // else if (current_char32.value == '"')
     {
       current_column++;
 

However, that code is necessary for properly handling some documentation attributes, as pointed out by various test cases in our testsuite.

rustc does this check in a separate pass rather than in the lexer, which is what I think we should do as well. We could, for example, add that check after parsing a doc_attr.
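
To make the lexer side of that split concrete, here is a toy sketch in Rust. It is not the gccrs lexer (which is C++), and `Scan` and `scan_string` are hypothetical names. It treats a newline as ordinary string content and only reports an error when the input ends before the closing quote. Escapes are kept verbatim rather than decoded, to keep the example short.

```rust
/// Outcome of scanning one string literal in this toy lexer.
#[derive(Debug, PartialEq)]
enum Scan {
    /// Literal contents plus the number of line breaks crossed.
    Str { value: String, newlines: usize },
    /// The input ended before a closing quote was found.
    Unterminated,
}

/// Scans `input`, which starts just after an opening `"`.
/// A newline is ordinary content; only end of input is an error.
fn scan_string(input: &str) -> Scan {
    let mut value = String::new();
    let mut newlines = 0;
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        match c {
            '"' => return Scan::Str { value, newlines },
            '\\' => {
                // Keep the escape verbatim; decoding is out of scope here.
                value.push(c);
                match chars.next() {
                    Some(next) => {
                        if next == '\n' {
                            newlines += 1;
                        }
                        value.push(next);
                    }
                    None => return Scan::Unterminated,
                }
            }
            '\n' => {
                newlines += 1;
                value.push(c);
            }
            _ => value.push(c),
        }
    }
    Scan::Unterminated
}
```

The `newlines` count is there because a real lexer would need it to keep line and column tracking correct once a literal can cross lines.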

Here is the relevant rustc code which checks for certain characters:

                        if let Some(c) = doc_alias
                            .chars()
                            .find(|&c| c == '"' || c == '\'' || (c.is_whitespace() && c != ' '))
                        {
                            self.tcx
                                .sess
                                .struct_span_err(
                                    meta.span(),
                                    &format!(
                                        "{:?} character isn't allowed in `#[doc(alias = \"...\")]`",
                                        c,
                                    ),
                                )
                                .emit();
                            return false;
                        }
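
Pulled out of the compiler, the same predicate can be sketched as a standalone check that runs on the attribute value after lexing. The function names `first_bad_alias_char` and `check_doc_alias` are hypothetical, and returning a `String` error is only a stand-in for a real diagnostic.

```rust
/// Returns the first character that the rustc snippet above rejects in a
/// doc alias: a quote, or any whitespace other than a plain space.
fn first_bad_alias_char(doc_alias: &str) -> Option<char> {
    doc_alias
        .chars()
        .find(|&c| c == '"' || c == '\'' || (c.is_whitespace() && c != ' '))
}

/// Validates a doc alias value, producing the same message text as the
/// rustc snippet. A real implementation would emit a located diagnostic.
fn check_doc_alias(doc_alias: &str) -> Result<(), String> {
    match first_bad_alias_char(doc_alias) {
        Some(c) => Err(format!(
            "{:?} character isn't allowed in `#[doc(alias = \"...\")]`",
            c
        )),
        None => Ok(()),
    }
}
```

With a check like this living outside the lexer, a newline in an ordinary string literal is accepted, while a newline in a doc alias is still rejected.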

Fixing this issue is necessary for properly compiling certain versions of libcore, which contain multiline strings.

@CohenArthur
Member Author

As a side note, I haven't been able to understand the new system, which can emit errors based on locale. I'll have to ask on the Rust Zulip for an explanation or a PR link, as I couldn't figure out where that error was emitted without checking out the 1.49 release.

@bjorn3

bjorn3 commented Jul 19, 2022

The error is emitted as tcx.sess.emit_err(errors::DocAliasBadChar { span, attr_str, char_: c }); where DocAliasBadChar is defined in compiler/rustc_passes/src/errors.rs as

#[derive(SessionDiagnostic)]
#[error(passes::doc_alias_bad_char)]
pub struct DocAliasBadChar<'a> {
    #[primary_span]
    pub span: Span,
    pub attr_str: &'a str,
    pub char_: char,
}

The actual error message is declared in compiler/rustc_error_messages/locales/en-US/passes.ftl as passes-doc-alias-bad-char = {$char_} character isn't allowed in {$attr_str}. The PR implementing this is rust-lang/rust#95512.
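
As a rough illustration of why the message text lives apart from the code that emits it, here is a toy sketch. It is not rustc's diagnostic API and not Fluent: `en_us_catalogue` and `render` are hypothetical, and the placeholder handling is a plain string replacement. The message id and template are the ones quoted above.

```rust
use std::collections::HashMap;

/// Toy message catalogue for one locale: message id -> template.
fn en_us_catalogue() -> HashMap<&'static str, &'static str> {
    HashMap::from([(
        "passes-doc-alias-bad-char",
        "{$char_} character isn't allowed in {$attr_str}",
    )])
}

/// Looks up `id` and replaces each `{$name}` placeholder with its value.
/// Returns `None` when the catalogue has no such message.
fn render(
    catalogue: &HashMap<&str, &str>,
    id: &str,
    args: &[(&str, String)],
) -> Option<String> {
    let mut message = catalogue.get(id)?.to_string();
    for (name, value) in args {
        let placeholder = ["{$", *name, "}"].concat();
        message = message.replace(&placeholder, value);
    }
    Some(message)
}
```

The emitting code only names the message id and supplies the arguments, so a different locale could provide a different template for the same id.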

@CohenArthur
Member Author

I found the error message but couldn't figure out the Diagnostic or how it was emitted. Thanks a lot @bjorn3 :DD

@liamnaddell
Contributor

Seems like a fixed issue as of b5c354d

@CohenArthur
Member Author

Thanks @liamnaddell! Good catch

CohenArthur added a commit to CohenArthur/gccrs that referenced this issue Dec 25, 2024
Regression checks for Rust-GCC#1399

gcc/testsuite/ChangeLog:

	* rust/compile/multiline-string.rs: New test.
	* rust/execute/torture/multiline-string.rs: New test.
@CohenArthur CohenArthur linked a pull request Dec 25, 2024 that will close this issue