Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break long string literal arguments #2448

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

Julow
Copy link
Collaborator

@Julow Julow commented Sep 20, 2023

#2445 uncovered a mechanism for breaking after a multi-line string literal in a list of argument. This also remove the simulated formatting of expressions to check that they break introduced in #1360

This is hard to work with and easy to break so I suggest using the argument grouping concept based on is_simple instead.
This requires comming up with a rule and will cause regression.

I've made up arbitrary rules that should be improved:

test/passing/tests/break_infix.ml.ref Outdated Show resolved Hide resolved
lib/Translation_unit.ml Show resolved Hide resolved
lib/Source.ml Outdated Show resolved Hide resolved
test/passing/tests/infix_arg_grouping.ml Outdated Show resolved Hide resolved
test/unit/test_translation_unit.ml Outdated Show resolved Hide resolved
@Julow
Copy link
Collaborator Author

Julow commented Sep 20, 2023

The regressions are huge! I can identify two main regressions: printf/fprintf style functions and operator chains.

Printf-style function not only look better with the format string on the same line as the function but it's also how it was before:

diff --git a/utils/ccomp.ml b/utils/ccomp.ml
index e21260dcbe..c517712b3d 100644
--- a/utils/ccomp.ml
+++ b/utils/ccomp.ml
@@ -92,7 +92,8 @@ let compile_file ?output ?(opt = "") ?stable_name name =
   in
   let exit =
     command
-      (Printf.sprintf "%s%s %s %s -c %s %s %s %s %s%s"
+      (Printf.sprintf
+         "%s%s %s %s -c %s %s %s %s %s%s"
          (match !Clflags.c_compiler with
          | Some cc -> cc
          | None ->
@@ -135,7 +136,9 @@ let create_archive archive file_list =
     match Config.ccomp_type with
     | "msvc" ->
         command
-          (Printf.sprintf "link /lib /nologo /out:%s %s" quoted_archive
+          (Printf.sprintf
+             "link /lib /nologo /out:%s %s"
+             quoted_archive
              (quote_files ~response_files:true file_list))
     | _ ->
         assert (String.length Config.ar > 0);

There's also a large amount of fprintf ppf "...":

diff --git a/typing/typetexp.ml b/typing/typetexp.ml
index 521777ec95..f0230ed557 100644
--- a/typing/typetexp.ml
+++ b/typing/typetexp.ml
@@ -922,10 +922,12 @@ let report_error env ppf = function
         Style.inline_code name did_you_mean (fun () ->
           Misc.spellcheck in_scope_names name)
   | No_type_wildcards ->
-      fprintf ppf "A type wildcard %a is not allowed in this type declaration."
+      fprintf ppf
+        "A type wildcard %a is not allowed in this type declaration."
         Style.inline_code "_"
   | Undefined_type_constructor p ->
-      fprintf ppf "The type constructor@ %a@ is not yet completely defined"
+      fprintf ppf
+        "The type constructor@ %a@ is not yet completely defined"
         (Style.as_inline_code path)
         p
   | Type_arity_mismatch (lid, expected, provided) ->

I can try a different grouping strategy to support function with a string on the first line.

The is_trivial change had a huge impact on operator chains:

diff --git a/utils/warnings.ml b/utils/warnings.ml
index 812f82a8ed..b5c6cd27f7 100644
--- a/utils/warnings.ml
+++ b/utils/warnings.ml
@@ -1015,7 +1015,8 @@ let message = function
   | Fragile_match "" -> "this pattern-matching is fragile."
   | Fragile_match s ->
       "this pattern-matching is fragile.\n\
-       It will remain exhaustive when constructors are added to type " ^ s ^ "."
+       It will remain exhaustive when constructors are added to type "
+      ^ s ^ "."
   | Ignored_partial_application ->
       "this function application is partial,\nmaybe some arguments are missing."
   | Labels_omitted [] -> assert false
@@ -1027,15 +1028,17 @@ let message = function
   | Method_override [ lab ] -> "the method " ^ lab ^ " is overridden."
   | Method_override (cname :: slist) ->
       String.concat " "
-        ("the following methods are overridden by the class" :: cname :: ":\n "
-       :: slist)
+        ("the following methods are overridden by the class"
+        :: cname :: ":\n " :: slist)
   | Method_override [] -> assert false

This has some advantages and at least fixes an indentation bug but I might have to revert this change.

@gpetiot
Copy link
Collaborator

gpetiot commented Sep 25, 2023

I agree the diff is a clear improvement to me. Should we ask the opinion of the community? @samoht ?

@Julow
Copy link
Collaborator Author

Julow commented Sep 25, 2023

My comment above is not true anymore as I made some changes.
Printf-style functions are now formatted like this:

(happens a lot)

diff --git a/lib/Conf_decl.ml b/lib/Conf_decl.ml
index 031f46ef..fb779cd1 100644
--- a/lib/Conf_decl.ml
+++ b/lib/Conf_decl.ml
@@ -156,12 +156,12 @@ let in_attributes cond = function
 let maybe_empty = function "" -> "" | x -> " " ^ x
 
 let pp_deprecated ppf { dmsg; dversion = v } =
-  Format.fprintf ppf "This option is deprecated since version %a.%s" Version.pp
-    v (maybe_empty dmsg)
+  Format.fprintf ppf "This option is deprecated since version %a.%s"
+    Version.pp v (maybe_empty dmsg)

The formatting of operator chains is mostly unchanged, with a few added breaks when the string is 30 in length or more:

(happens rarely)

diff --git a/facebook-clang-plugins/clang-ocaml/yojson_utils.ml b/facebook-clang-plugins/clang-ocaml/yojson_utils.ml
index cda08a2686..91f8f68579 100644
--- a/facebook-clang-plugins/clang-ocaml/yojson_utils.ml
+++ b/facebook-clang-plugins/clang-ocaml/yojson_utils.ml
@@ -92,9 +92,11 @@ let run_converter_tool reader writer =
   and files = ref [] in
   let add_files x = files := x :: !files
   and usage_msg =
-    "Usage: " ^ Sys.argv.(0) ^ "[OPTIONS] INPUT_FILE [OUTPUT_FILE]\n"
+    "Usage: " ^ Sys.argv.(0)
+    ^ "[OPTIONS] INPUT_FILE [OUTPUT_FILE]\n"
     ^ "Parse yojson values and convert them to another format based on the \
-       extension" ^ " of the output file (default: ${INPUT_FILE}.value.gz).\n"
+       extension"
+    ^ " of the output file (default: ${INPUT_FILE}.value.gz).\n"
   in
   let spec =
     Utils.fix_arg_spec

@@ -55,7 +55,8 @@ module Error = struct
Out_channel.write_all n ~data:next ;
ignore
(Stdlib.Sys.command
(Printf.sprintf "git diff --no-index -u %S %S | sed '1,4d' 1>&2" p n) ) ;
(Printf.sprintf "git diff --no-index -u %S %S | sed '1,4d' 1>&2"
p n ) ) ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this change - this was a nice oneliner but anymore after this patch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticing that in this case, we are exceding the margin on this line (81 characters), and it seems it's the best way to split the line (in general the closing ) are treated in a way to have them get attached to the expression they close).

@samoht
Copy link
Contributor

samoht commented Sep 26, 2023

I have mixed feelings about this patch.

What I don't like:

  • it turns nice one-liners into multi-lines
  • the rules seem very arbitrary (and hard to remember/understand for users)

What I kind-of like:

  • on some examples (like the printf ppf <string>) it leads to a clearer distinction between the format string and the parameter

So, I'm not sold on that one. What was the rational for the changes at the first place?

@Julow
Copy link
Collaborator Author

Julow commented Sep 26, 2023

Thanks for the review! The intent of this is to make explicit a rule that existed before but was implicit.

I've added more constraint at the same time in case it makes things better as regressions were expected but I'm reverting my changes.
I agree that breaking what was a one-liner is bad.

This is obviously not finished.

@Julow Julow marked this pull request as draft October 5, 2023 15:26
@Julow
Copy link
Collaborator Author

Julow commented Oct 5, 2023

This change is no longer needed to fix a bug since #2445 and #2453 but the new formatting is interesting.
I'll keep this open as a draft with the idea to ask for feedback later.

@Julow
Copy link
Collaborator Author

Julow commented Nov 22, 2023

A different formatting for delimited strings has been implemented in #2480, guarded behind an internal option, using a new str_as function.

Julow and others added 9 commits January 22, 2024 13:46
Strings using the heavy syntax `{| ... |}` are always broken, string
literals fitting on one line are not.
Short strings are common and the trivial property is used to format code
like this in a compact way:

      Cmd.(
        v "git" % "clone" % "--depth" % "1" % "--branch" % branch % remote
        % p output_dir )

The rule is arbitrary and have not been confronted to the real world
yet:

- Length of 4 or less: always trivial
- Length of 19 or less: trivial if made of only ASCII graphic characters
Allow non-simple string to be at the end of a group. This formats
printf-style functions in a nicer way.
String of a length less than 30 are trivial. This is still more
restrictive than on main.
@gpetiot gpetiot force-pushed the string-const-trivial branch from f37470f to 9fa3287 Compare January 22, 2024 12:57
@Julow
Copy link
Collaborator Author

Julow commented Feb 9, 2024

@gpetiot Thanks, that looks better now. The diff this causes looks good to me but is gigantic. We need to ask for feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants