Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge upstream, 2024-03-09 #2944

Merged
merged 2,358 commits into from
Apr 10, 2024
Merged

Merge upstream, 2024-03-09 #2944

merged 2,358 commits into from
Apr 10, 2024

Conversation

tschwinge
Copy link
Member

Merging in several stages, towards #2802 and further.

This must of course not be rebased by GitHub merge queue, but has to become a proper Git merge. (I'll handle that, once ready.)

rguenth and others added 30 commits February 26, 2024 15:20
In some cases exits can lack LC PHI nodes for the virtual operand.
We have to create them when the epilog loop requires them which also
allows us to remove some only halfway correct fixups.  This is the
variant triggering for alternate exits.

	PR tree-optimization/114099
	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
	Create and fill in a needed virtual LC PHI for the alternate
	exits.  Remove code dealing with that missing.

	* gcc.dg/vect/vect-early-break_120-pr114099.c: New testcase.
	* sv.po, zh_CN.po: Update.
The PR complains that for the __builtin_stdc_bit_* "builtins" the
diagnostics doesn't mention the name of the builtin the user used, but
instead __builtin_{clz,ctz,popcount}g instead (which is what the FE
immediately lowers it to).

The following patch repeats the checks from check_builtin_function_arguments
which are there done on BUILT_IN_{CLZ,CTZ,POPCOUNT}G, such that they
diagnose it with the name of the "builtin" user actually used before it
is gone.

2024-02-26  Jakub Jelinek  <[email protected]>

	PR c/114042
	* c-parser.cc (c_parser_postfix_expression): Diagnose
	__builtin_stdc_bit_* argument with ENUMERAL_TYPE or BOOLEAN_TYPE
	type or if signed here rather than on the replacement builtins
	in check_builtin_function_arguments.

	* gcc.dg/builtin-stdc-bit-2.c: Adjust testcase for actual builtin
	names rather than names of builtin replacements.
…ata section [PR113617]

If default_elf_select_rtx_section is called to put a reference to some
local symbol defined in a comdat section into memory, which happens more often
since the r14-4944 RA change, linking might fail.
default_elf_select_rtx_section puts such constants into .data.rel.ro.local
etc. sections and if linker chooses comdat sections from some other TU
and discards the one to which a relocation in .data.rel.ro.local remains,
linker diagnoses error.  References to private comdat symbols can only appear
from functions or data objects in the same comdat group, so the following
patch arranges using .data.rel.ro.local.pool.<comdat_name> and similar sections.

2024-02-26  Jakub Jelinek  <[email protected]>
	    H.J. Lu  <[email protected]>

	PR rtl-optimization/113617
	* varasm.cc (default_elf_select_rtx_section): For
	references to private symbols in comdat sections
	use .data.relro.local.pool.<comdat>, .data.relro.pool.<comdat>
	or .rodata.<comdat> comdat sections.

	* g++.dg/other/pr113617.C: New test.
	* g++.dg/other/pr113617.h: New test.
	* g++.dg/other/pr113617-aux.cc: New test.
…R114012]

	PR fortran/114012

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_conv_procedure_call): Evaluate non-trivial
	arguments just once before assigning to an unlimited polymorphic
	dummy variable.

gcc/testsuite/ChangeLog:

	* gfortran.dg/pr114012.f90: New test.
gcc/
	* config/avr/avr.cc (avr_out_compare) [AVR_TINY]: Remove code in
	an "if avr_adiw_reg_p()" block that's dead for AVR_TINY.
Some options that are pure optimizations where not tagged as such.

gcc/
	* config/avr/avr.opt (mcall-prologues, mrelax, maccumulate-args)
	(mstrict-X): Tag as "Optimization".
gcc.dg/attr-weakref-1.c FAILs on 32 and 64-bit Solaris/x86 with the
native assembler:

FAIL: gcc.dg/attr-weakref-1.c (test for excess errors)
UNRESOLVED: gcc.dg/attr-weakref-1.c compilation failed to produce executable

Excess errors:
Assembler: attr-weakref-1.c
        "/var/tmp//ccUSaysF.s", line 171 : Multiply defined symbol: "Wv3a"

This is a bug in the native as, which isn't seeing fixes recently.

Since only a single subtest is affected, this patch omits that one.

Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.

2024-02-24  Rainer Orth  <[email protected]>

	gcc/testsuite:
	PR ipa/70582
	* gcc.dg/attr-weakref-1.c (dg-additional-options): Define
	SOLARIS_X86_AS as appropriate.
	(lv3, Wv3a, pv3a): Wrap in !SOLARIS_X86_AS.
	(main): Likewise for chk (pv3a).
The following implements manual update for multi-exit loop prologue
peeling during vectorization.

	PR tree-optimization/114081
	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
	Perform manual dominator update for prologue peeling.
	(vect_do_peeling): Properly update dominators after adding the
	prologue-around guard.

	* gcc.dg/vect/vect-early-break_121-pr114081.c: New testcase.
…[PR114044]

While it seems a lot of places in various optimization passes fold
bit query internal functions with INTEGER_CST arguments to INTEGER_CST
when there is a lhs, when lhs is missing, all the removals of such dead
stmts are guarded with -ftree-dce, so with -fno-tree-dce those unfolded
ifn calls remain in the IL until expansion.  If they have large/huge
BITINT_TYPE arguments, there is no BLKmode optab and so expansion ICEs,
and bitint lowering doesn't touch such calls because it doesn't know they
need touching, functions only containing those will not even be further
processed by the pass because there are no non-small BITINT_TYPE SSA_NAMEs
+ the 2 exceptions (stores of BITINT_TYPE INTEGER_CSTs and conversions
from BITINT_TYPE INTEGER_CSTs to floating point SSA_NAMEs) and when walking
there is no special case for calls with BITINT_TYPE INTEGER_CSTs either,
those are for normal calls normally handled at expansion time.

So, the following patch adjust the expansion of these 6 ifns, by doing
nothing if there is no lhs, and also just in case and user disabled all
possible passes that would fold this handles the case of setting lhs
to ifn call with INTEGER_CST argument.

2024-02-27  Jakub Jelinek  <[email protected]>

	PR rtl-optimization/114044
	* internal-fn.def (CLRSB, CLZ, CTZ, FFS, PARITY): Use
	DEF_INTERNAL_INT_EXT_FN macro rather than DEF_INTERNAL_INT_FN.
	* internal-fn.h (expand_CLRSB, expand_CLZ, expand_CTZ, expand_FFS,
	expand_PARITY): Declare.
	* internal-fn.cc (expand_bitquery, expand_CLRSB, expand_CLZ,
	expand_CTZ, expand_FFS, expand_PARITY): New functions.
	(expand_POPCOUNT): Use expand_bitquery.

	* gcc.dg/bitint-95.c: New test.
When folding a multiply CHRECs are handled like {a, +, b} * c
is {a*c, +, b*c} but that isn't generally correct when overflow
invokes undefined behavior.  The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.

I've used simple early outs for INTEGER_CSTs and otherwise use
a range-query since we lack a tree_expr_nonpositive_p and
get_range_pos_neg isn't a good fit.

	PR tree-optimization/114074
	* tree-chrec.h (chrec_convert_rhs): Default at_stmt arg to NULL.
	* tree-chrec.cc (chrec_fold_multiply): Canonicalize inputs.
	Handle poly vs. non-poly multiplication correctly with respect
	to undefined behavior on overflow.

	* gcc.dg/torture/pr114074.c: New testcase.
	* gcc.dg/pr68317.c: Adjust expected location of diagnostic.
	* gcc.dg/vect/vect-early-break_119-pr114068.c: Do not expect
	loop to be vectorized.
GCC 13's changes file documents that iwmmx is deprecated.  Raise the bar
by warning when the mmintrin.h header is included by users, but provide
a way to suppress the warning.

gcc:
	* config/arm/mmintrin.h: Warn if this header is included without
	defining __ENABLE_DEPRECATED_IWMMXT.
gcc/analyzer/ChangeLog:
	PR analyzer/111881
	* constraint-manager.cc (bound::ensure_closed): Assert that
	m_constant has integral type.
	(range::add_bound): Bail out on floating point constants.

gcc/testsuite/ChangeLog:
	PR analyzer/111881
	* c-c++-common/analyzer/conditionals-pr111881.c: New test.

Signed-off-by: David Malcolm <[email protected]>
…y_{to,from}_device*}

These routines map simply to the C counterpart and are meanwhile
defined in OpenACC 3.3. (There are additional routine changes,
including the Fortran addition of acc_attach/acc_detach, that
require more work than a simple addition of an interface and
are therefore excluded.)

libgomp/ChangeLog:

	* libgomp.texi (OpenACC Runtime Library Routines): Document new 3.3
	routines that simply map to their C counterpart.
	* openacc.f90 (openacc): Add them.
	* openacc_lib.h: Likewise.
	* testsuite/libgomp.oacc-fortran/acc_host_device_ptr.f90: New test.
	* testsuite/libgomp.oacc-fortran/acc-memcpy.f90: New test.
	* testsuite/libgomp.oacc-fortran/acc-memcpy-2.f90: New test.
	* testsuite/libgomp.oacc-c-c++-common/lib-59.c: Crossref to f90 test.
	* testsuite/libgomp.oacc-c-c++-common/lib-60.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-95.c: Likewise.
This is a regression present on the mainline, 13 and 12 branches.  For the
attached Ada case, it's a tree checking failure on the mainline at -O:

+===========================GNAT BUG DETECTED==============================+
| 14.0.1 20240226 (experimental) [master r14-9171-g4972f97a265]  GCC error:|
| tree check: expected tree that contains 'decl common' structure,         |
| have 'component_ref' in tree_could_trap_p, at tree-eh.cc:2733        |
| Error detected around /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt104.adb:

Time is a 10-byte record and Packed_Rec.T is placed at bit-offset 65 because
of the packing. so tree-ssa-dse.cc:setup_live_bytes_from_ref has computed a
const_size of 88 from ref->offset of 65 and ref->max_size of 80.

Then in tree-ssa-dse.cc:compute_trims:

411       int last_live = bitmap_last_set_bit (live);
(gdb) next
412       if (ref->size.is_constant (&const_size))
(gdb)
414           int last_orig = (const_size / BITS_PER_UNIT) - 1;
(gdb)
418           *trim_tail = last_orig - last_live;

(gdb) call debug_bitmap (live)
n_bits = 256, set = {0 1 2 3 4 5 6 7 8 9 10 }
(gdb) p last_live
$33 = 10
(gdb) p const_size
$34 = 80
(gdb) p last_orig
$35 = 9
(gdb) p *trim_tail
$36 = -1

In other words, compute_trims is overlooking the alignment adjustments that
setup_live_bytes_from_ref applied earlier.  Moveover it reads:

  /* We use sbitmaps biased such that ref->offset is bit zero and the bitmap
     extends through ref->size.  So we know that in the original bitmap
     bits 0..ref->size were true.  We don't actually need the bitmap, just
     the REF to compute the trims.  */

but setup_live_bytes_from_ref used ref->max_size instead of ref->size.

It appears that all the callers of compute_trims assume that ref->offset is
byte aligned and that the trimmed bytes are relative to ref->size, so the
patch simply adds an early return if either condition is not fulfilled.

gcc/
	* tree-ssa-dse.cc (compute_trims): Fix description.  Return early
	if either ref->offset is not byte aligned or ref->size is not known
	to be equal to ref->max_size.
	(maybe_trim_complex_store): Fix description.
	(maybe_trim_constructor_store): Likewise.
	(maybe_trim_partially_dead_store): Likewise.

gcc/testsuite/
	* gnat.dg/opt104.ads, gnat.dg/opt104.adb: New test.
Also handle V2BF mode.

	PR target/113871

gcc/ChangeLog:

	* config/i386/mmx.md (V248FI): Add V2BF mode.
	(V24FI_32): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr113871-5a.c: New test.
	* gcc.target/i386/pr113871-5b.c: New test.
…3,PR111802]

On e.g. gcc211 the use of "%li" with unsigned HOST_WIDE_INT led to this warning:
../../src/gcc/analyzer/access-diagram.cc: In member function ‘void ana::string_literal_spatial_item::add_column_for_byte(text_art::table&, const ana::bit_to_table_map&, text_art::style_manager&, ana::byte_offset_t, ana::byte_offset_t, int, int) const’:
../../src/gcc/analyzer/access-diagram.cc:1909:40: warning: format ‘%li’ expects argument of type ‘long int’, but argument 3 has type ‘long long unsigned int’ [-Wformat=]
          byte_idx_within_string.ulow ()));
                                        ^
and to all values being erroneously printed as "0".

Fixed thusly.

gcc/analyzer/ChangeLog:
	PR analyzer/110483
	PR analyzer/111802
	* access-diagram.cc
	(string_literal_spatial_item::add_column_for_byte): Use %wu for
	printing unsigned HOST_WIDE_INT.

Signed-off-by: David Malcolm <[email protected]>
This is a (partial) reversion of r14-8987-gdd9d14f7d53 to return to
eagerly emitting inline variables to the middle-end when they are
declared. 'import_export_decl' will still continue to accept them, as
allowing this is a pure extension and doesn't seem to cause issues with
modules, but otherwise deferring the emission of inline variables
appears to cause issues on some targets and prevents some code using
inline variable templates from correctly linking.

There might be a more targetted way to support this, but due to the
complexity of handling linkage and emission I'd prefer to wait till
GCC 15 to explore our options.

	PR c++/113970
	PR c++/114013

gcc/cp/ChangeLog:

	* decl.cc (make_rtl_for_nonlocal_decl): Don't defer inline
	variables.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/inline-var10.C: New test.

Signed-off-by: Nathaniel Shead <[email protected]>
… large struct

I think we have no coverage for the case where structure_value_addr_parm and
TYPE_NO_NAMED_ARGS_STDARG_P are both true.  The
  if (type_arg_types != 0)
    n_named_args
      = (list_length (type_arg_types)
         /* Count the struct value address, if it is passed as a parm.  */
         + structure_value_addr_parm);
  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
    n_named_args = 0;
  else
    /* If we know nothing, treat all args as named.  */
    n_named_args = num_actuals;
code should probably have n_named_args = structure_value_addr_parm;
instead of n_named_args = 0;, this testcase is an attempt to see if
it is broken on any target.

2024-02-28  Jakub Jelinek  <[email protected]>

	* gcc.dg/c23-stdarg-6.c: New test.
…ge integral types in memcpy etc. folding [PR113988]

The following patch changes the memcpy etc. folding to use bitwise vector
types rather  than huge INTEGER_TYPEs for copying of > MAX_FIXED_MODE_SIZE
lengths.  The problem with the huge INTEGER_TYPEs is that they aren't
supported very much, usually there are just optabs to handle moves of them,
perhaps misaligned moves and that is it, so they pose problems e.g. to
BITINT_TYPE lowering.

2024-02-28  Jakub Jelinek  <[email protected]>

	PR tree-optimization/113988
	* stor-layout.h (bitwise_mode_for_size): Declare.
	* stor-layout.cc (bitwise_mode_for_size): New function.
	* gimple-fold.cc (gimple_fold_builtin_memory_op): Use it.
	Use bitwise_type_for_mode instead of build_nonstandard_integer_type.
	Use BITS_PER_UNIT instead of 8.

	* gcc.dg/bitint-91.c: New test.
The following testcases are miscompiled, because graphite ignores boolean,
enumerated or _BitInt comparisons, rewrites the code as if the comparisons
were always true or always false.

The INTEGER_TYPE checks were initially added in r6-2239 but at that point
it was both in add_conditions_to_domain and in parameter_index_in_region.
Later on the check was also added to stmt_simple_for_scop_p, and finally
r8-3931 changed the stmt_simple_for_scop_p check to INTEGRAL_TYPE_P
and turned the parameter_index_in_region -> assign_parameter_index_in_region
into INTEGRAL_TYPE_P assertion, but the add_conditions_to_domain check
for INTEGER_TYPE remained.

The following patch uses INTEGRAL_TYPE_P to complete the change.

2024-02-28  Jakub Jelinek  <[email protected]>

	PR tree-optimization/114041
	* graphite-sese-to-poly.cc (add_conditions_to_domain): Check for
	INTEGRAL_TYPE_P check rather than INTEGER_TYPE.

	* gcc.dg/graphite/run-id-pr114041-1.c: New test.
	* gcc.dg/graphite/run-id-pr114041-2.c: New test.
The emulation via word mode tries to perform integer arithmetic on floating
point values instead of floating point arithmetic.  This leads to
mis-compilations.

Failure occured on s390x on these existing test cases:
gcc.dg/vect/tsvc/vect-tsvc-s112.c
gcc.dg/vect/tsvc/vect-tsvc-s113.c
gcc.dg/vect/tsvc/vect-tsvc-s119.c
gcc.dg/vect/tsvc/vect-tsvc-s121.c
gcc.dg/vect/tsvc/vect-tsvc-s131.c
gcc.dg/vect/tsvc/vect-tsvc-s132.c
gcc.dg/vect/tsvc/vect-tsvc-s2233.c
gcc.dg/vect/tsvc/vect-tsvc-s421.c
gcc.dg/vect/vect-alias-check-14.c
gcc.target/s390/vector/partial/s390-vec-length-epil-run-1.c
gcc.target/s390/vector/partial/s390-vec-length-epil-run-3.c
gcc.target/s390/vector/partial/s390-vec-length-full-run-3.c

gcc/ChangeLog:

	PR tree-optimization/114075

	* tree-vect-stmts.cc (vectorizable_operation): Don't emulate floating
	point vectors

Signed-off-by: Juergen Christ <[email protected]>
This adds testcase from PR114075 which has been fixed by the r14-9205
change on s390x-linux with -march=z13.

2024-02-28  Jakub Jelinek  <[email protected]>

	PR tree-optimization/114075
	* gcc.dg/gomp/pr114075.c: New test.
…4 [PR91567]

gcc.dg/tree-ssa/builtin-snprintf-6.c currently XPASSes on i?86-*-*
configurations with -m64:

XPASS: gcc.dg/tree-ssa/builtin-snprintf-6.c scan-tree-dump-times optimized "Function test_assign_aggregate" 1

(seen e.g. on i386-pc-solaris2.11, i686-pc-linux-gnu, or i386-apple-darwin*).

The problem is that the xfail only handles x86_64, ignoring that i?86
configurations can also be multilibbed.

This patch fixes the by handling both forms alike.

Tested on i386-pc-solaris2.11, amd64-pc-solaris2.11,
sparc-sun-solaris2.11, and sparcv9-sun-solaris2.11.

2024-02-28  Rainer Orth  <[email protected]>

	gcc/testsuite:
	PR tree-optimization/91567
	* gcc.dg/tree-ssa/builtin-snprintf-6.c (scan-tree-dump-times):
	Treat i?86-*-* like x86_64-*-*.
powerpc64-linux apparently (not very surprisingly) behaves the same
way as powerpc64le-linux and has 4 sunk statements rather than 5,
so we should xfail it on powerpc64*-*-* rather than just powerpc64le-*-*.
powerpc-linux has 3 sunk statements, but the scan pattern is done for
lp64 only as the comment explains.

2024-02-28  Jakub Jelinek  <[email protected]>

	PR testsuite/111462
	* gcc.dg/tree-ssa/ssa-sink-18.c: XFAIL also on powerpc64.
libstdc++-v3/ChangeLog:

	* include/std/stacktrace: Add nodiscard attribute to all
	functions without side effects.
libstdc++-v3/ChangeLog:

	* include/bits/alloc_traits.h: Include <bits/stl_iterator.h> for
	__make_move_if_noexcept_iterator.
Cygwin should use std::fwrite, not WriteConsoleW. And the -lstdc++exp
library is only needed when running the tests on *-*-mingw*.

libstdc++-v3/ChangeLog:

	* include/std/ostream (vprint_unicode) [__CYGWIN__]: Use POSIX
	code path for Cygwin instead of Windows.
	* include/std/print (vprint_unicode) [__CYGWIN__]: Likewise.
	* testsuite/27_io/basic_ostream/print/1.cc: Only add -lstdc++exp
	for *-*-mingw* targets.
	* testsuite/27_io/print/1.cc: Likewise.
jwakely and others added 26 commits March 9, 2024 00:21
When parsing a std::chrono::sys_days (or a sys_time with an even longer
period) we should not require a time-of-day to be present in the input,
because we can't represent that in the result type anyway.

Rather than trying to decide which specializations should require a
time-of-date and which should not, follow the direction of Howard
Hinnant's date library, which allows extracting a sys_time of any period
from input that only contains a date, defaulting the time-of-day part to
00:00:00. This seems consistent with the intent of the standard, which
says it's an error "If the parse fails to decode a valid date" (i.e., it
doesn't care about decoding a valid time, only a date).

libstdc++-v3/ChangeLog:

	PR libstdc++/114240
	* include/bits/chrono_io.h (_Parser::operator()): Assume
	hours(0) for a time_point, so that a time is not required
	to be present.
	* testsuite/std/time/parse/114240.cc: New test.
…mic_compare_and_swapsi.

If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be
implemented through "ll.w+sc.w". In the implementation of the instruction sequence,
it is necessary to determine whether the two registers are equal.
Since LoongArch's comparison instructions do not distinguish between 32-bit
and 64-bit, the two operand registers that need to be compared are symbolically
extended, and one of the operand registers is obtained from memory through the
"ll.w" instruction, which can ensure that the symbolic expansion is carried out.
However, the value of the other operand register is not guaranteed to be the
value of the sign extension.

gcc/ChangeLog:

	* config/loongarch/sync.md (atomic_cas_value_strong<mode>):
	In loongarch64, a sign extension operation is added when
	operands[2] is a register operand and the mode is SImode.

gcc/testsuite/ChangeLog:

	* g++.target/loongarch/atomic-cas-int.C: New test.
When the value of the macro DEFAULT_CFLAGS is set to '-ansi -pedantic-errors',
regname-s9-fp.c will test to fail. To solve this problem, add the compilation
option '-Wno-pedantic -std=gnu90' to this test case.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/regname-fp-s9.c: Add compilation option
	'-Wno-pedantic -std=gnu90'.
When I've added the -mnoreturn-no-callee-saved-registers option
to i386.opt, I forgot to regenerate i386.opt.urls and Mark's
CI kindly reminded me of that.

Fixed thusly.

2024-03-09  Jakub Jelinek  <[email protected]>

	* config/i386/i386.opt.urls: Regenerate.
gcc/
	* config/avr/avr.cc (avr_rtx_costs_1) [PLUS]: Determine cost for
	usum_widenqihi and add_zero_extend1.
	[MINUS]: Determine costs for udiff_widenqihi, sub+zero_extend,
	sub+sign_extend.
	* config/avr/avr.md (*addhi3.sign_extend1, *subhi3.sign_extend2):
	Compute exact insn lengths.
	(*usum_widenqihi3): Allow input operands to commute.
…to allow the IE to LE linker relaxation

In Binutils we need to make IE to LE relaxation only allowed when there
is an R_LARCH_RELAX after R_LARCH_TLE_IE_PC_{HI20,LO12} so an invalid
"partial" relaxation won't happen with the extreme code model.  So if we
are emitting %ie_pc_{hi20,lo12} in a non-extreme code model, emit an
R_LARCH_RELAX to allow the relaxation.  The IE to LE relaxation does not
require the pcalau12i and the ld instruction to be adjacent, so we don't
need to limit ourselves to use the macro.

For the distro maintainers backporting changes: this change depends on
r14-8721, without r14-8721 R_LARCH_RELAX can be emitted mistakenly in
the extreme code model.

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
	Support 'Q' for R_LARCH_RELAX for TLS IE.
	(loongarch_output_move): Use 'Q' to print R_LARCH_RELAX for TLS
	IE.
	* config/loongarch/loongarch.md (ld_from_got<mode>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/tls-ie-relax.c: New test.
	* gcc.target/loongarch/tls-ie-norelax.c: New test.
	* gcc.target/loongarch/tls-ie-extreme.c: New test.
… MEMs [PR114284]

Before the recent PR111267 r14-8319 fwprop changes, fwprop would never try
to propagate what was not considered PROFITABLE, where the profitable part
actually was partly about profitability, partly about very good reasons
not to actually propagate and partly for cases where propagation is
completely incorrect.
In particular, classify_result has:
  /* Allow (subreg (mem)) -> (mem) simplifications with the following
     exceptions:
     1) Propagating (mem)s into multiple uses is not profitable.
     2) Propagating (mem)s across EBBs may not be profitable if the source EBB
        runs less frequently.
     3) Propagating (mem)s into paradoxical (subreg)s is not profitable.
     4) Creating new (mem/v)s is not correct, since DCE will not remove the old
        ones.  */
  if (single_use_p
      && single_ebb_p
      && SUBREG_P (old_rtx)
      && !paradoxical_subreg_p (old_rtx)
      && MEM_P (new_rtx)
      && !MEM_VOLATILE_P (new_rtx))
    return PROFITABLE;
and didn't mark any other MEM_P (new_rtx) or rtxes which contain
a MEM in its subrtxes as PROFITABLE.  Now, since r14-8319 profitable_p
method has been renamed to likely_profitable_p and has just a minor role.
Now, rule 4) above is something that isn't about profitability, but about
correct behavior, if you propagate mem/v, the code is miscompiled.
This particular case has been fixed elsewhere by Haochen in r14-9379.
But I think even the 1) and 2) and maybe 3) are a strong don't do it,
don't rely solely on rtx costs, increasing the number of loads of the same
memory, even when cached, is undesirable, canceling load hoisting can
be undesirable as well.

So, the following patch restores previous behavior of src contains any MEMs,
in that case likely_profitable_p () is taken as the old profitable_p ()
as a requirement rather than just a hint.  For propagation of something
which doesn't load from memory this keeps the r14-8319 behavior.

2024-03-09  Jakub Jelinek  <[email protected]>

	PR target/114284
	* fwprop.cc (try_fwprop_subst_pattern): Don't propagate
	src containing MEMs unless prop.likely_profitable_p ().
gcc/
	* config/avr/avr.md: Fix typos in comment, indentation glitches
	and some other nits.
@tschwinge tschwinge force-pushed the tschwinge/merge-upstream branch from 90bb4d8 to 013b520 Compare April 10, 2024 13:06
@tschwinge tschwinge merged commit 0201fa1 into master Apr 10, 2024
6 of 9 checks passed
@dkm dkm deleted the tschwinge/merge-upstream branch November 8, 2024 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.