[Issue]: archlinux linux kernel variant cachyos compile with -march=znver4 fails #1135

Open
Cfouria opened this issue Oct 30, 2024 · 14 comments

Comments

@Cfouria

Cfouria commented Oct 30, 2024

Problem Description

Building the Arch Linux kernel variant "CachyOS" with the aomp LLVM fails with this error (it builds with the Arch-packaged LLVM):

ld.lld: /home/user/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:235: llvm::SDNode* {anonymous}::DAGCombiner::getNextWorklistEntry(): Assertion `N->getCombinerWorklistIndex() >= 0 && "Found a worklist entry without a corresponding map entry!"' failed.
PLEASE submit a bug report to https://github.com/ROCm-Developer-Tools/aomp and include the crash backtrace.
Stack dump:
0.	Program arguments: /opt/aomp/aomp/llvm/bin/ld.lld -m elf_x86_64 -mllvm -import-instr-limit=5 -z noexecstack -r -o vmlinux.o -T .tmp_initcalls.lds --whole-archive vmlinux.a --no-whole-archive --start-group --end-group
1.	Running pass 'Function Pass Manager' on module 'ld-temp.o'.
2.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@walk_pud_range'
 #0 0x00007f0633ded5fc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x9ed5fc)
 #1 0x00007f0633dea5fb SignalHandler(int) Signals.cpp:0:0
 #2 0x00007f0632e4c1d0 (/usr/lib/libc.so.6+0x3d1d0)
 #3 0x00007f0632ea53f4 (/usr/lib/libc.so.6+0x963f4)
 #4 0x00007f0632e4c120 raise (/usr/lib/libc.so.6+0x3d120)
 #5 0x00007f0632e334c3 abort (/usr/lib/libc.so.6+0x244c3)
 #6 0x00007f0632e333df (/usr/lib/libc.so.6+0x243df)
 #7 0x00007f0632e44177 (/usr/lib/libc.so.6+0x35177)
 #8 0x00007f063499a4df llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOptLevel) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x159a4df)
 #9 0x00007f0634c0f87f llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x180f87f)
#10 0x00007f0634c14009 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x1814009)
#11 0x00007f0634c15b20 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x1815b20)
#12 0x00007f0634bfeb33 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x17feb33)
#13 0x00007f063443c490 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#14 0x00007f0634010124 llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0xc10124)
#15 0x00007f0634010571 llvm::FPPassManager::runOnModule(llvm::Module&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0xc10571)
#16 0x00007f0634010f74 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0xc10f74)
#17 0x00007f06363cd0af codegen(CodegenConfig const&, llvm::TargetMachine*, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex const&) LTOBackend.cpp:0:0
#18 0x00007f06363cf810 llvm::lto::backend(llvm::lto::Config const&, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x2fcf810)
#19 0x00007f06363bd32a llvm::lto::LTO::runRegularLTO(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x2fbd32a)
#20 0x00007f06363c1f03 llvm::lto::LTO::run(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::FileCache) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x2fc1f03)
#21 0x00005f6733542a1d lld::elf::BitcodeCompiler::compile() (/opt/aomp/aomp/llvm/bin/ld.lld+0x27ca1d)
#22 0x00005f6733488fe7 void lld::elf::LinkerDriver::compileBitcodeFiles<llvm::object::ELFType<(llvm::endianness)1, true>>(bool) (/opt/aomp/aomp/llvm/bin/ld.lld+0x1c2fe7)
#23 0x00005f67334a82c7 void lld::elf::LinkerDriver::link<llvm::object::ELFType<(llvm::endianness)1, true>>(llvm::opt::InputArgList&) (/opt/aomp/aomp/llvm/bin/ld.lld+0x1e22c7)
#24 0x00005f67334ae89b lld::elf::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) (/opt/aomp/aomp/llvm/bin/ld.lld+0x1e889b)
#25 0x00005f67334af323 lld::elf::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) (/opt/aomp/aomp/llvm/bin/ld.lld+0x1e9323)
#26 0x00005f67333726fb lld::unsafeLldMain(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, llvm::ArrayRef<lld::DriverDef>, bool) (/opt/aomp/aomp/llvm/bin/ld.lld+0xac6fb)
#27 0x00005f6733371db9 lld_main(int, char**, llvm::ToolContext const&) (/opt/aomp/aomp/llvm/bin/ld.lld+0xabdb9)
#28 0x00005f6733370b34 main (/opt/aomp/aomp/llvm/bin/ld.lld+0xaab34)
#29 0x00007f0632e34e08 (/usr/lib/libc.so.6+0x25e08)
#30 0x00007f0632e34ecc __libc_start_main (/usr/lib/libc.so.6+0x25ecc)
#31 0x00005f6733371665 _start (/opt/aomp/aomp/llvm/bin/ld.lld+0xab665)

It turns out this error happens when using the option to compile with -march=znver4 from this patch: https://github.com/CachyOS/kernel-patches/blob/master/6.11/0005-cachy.patch
It is also rather nasty: while trying to build with -march=znver4, it sat on LD vmlinux.o for hours, climbing up to 48 GB of RAM and maxing out the cores, before failing.

After switching off this option it builds, but the error said to please report it.

Operating System

Archlinux

CPU

7940hs

GPU

780m

ROCm Version

ROCm 6.2.0

ROCm Component

aomp

Steps to Reproduce

On Arch Linux:
https://aur.archlinux.org/packages/linux-cachyos
The CachyOS PKGBUILD uses a consolidated patch file in which 3 (?) parts are already in the kernel tarball, so I deleted those 3 from the large single patch file.
Edited the PKGBUILD to use znver4, lto=full, and make LLVM=/opt/aomp/aomp/llvm/bin/ all
makepkg
Wait hours.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@ppanchad-amd

Hi @Cfouria. Internal ticket has been created to investigate your issue. Thanks!

@Cfouria
Author

Cfouria commented Nov 1, 2024

I've tried separating out the parts that cause this error with no luck, so I'm giving up.

I did find that .config must contain CONFIG_LRU_GEN=y, or the build fails in a different way.
When not using the CachyOS-provided config, adding CONFIG_LRU_GEN makes the build with znver4 succeed.

@Cfouria Cfouria closed this as completed Nov 1, 2024
@schung-amd
Contributor

Hi @Cfouria, just want to clarify: is -march=znver4 something that's being detected and added automatically, or were you adding this yourself? I can see you've given up for now, but I can take a look if you'd like. A caveat is that we don't officially support Arch and don't control the packages provided there, so the support we can give here is limited.

@Cfouria
Author

Cfouria commented Nov 6, 2024

The thing I was using (CachyOS) selects the march by default, along with a lot of other things. The main point of despair was finding one thing that caused this error, removing it, and still getting the error. Anyway, I managed to reduce the magic (and the Arch Linux tooling):

https://github.com/torvalds/linux/archive/refs/tags/v6.12-rc6.tar.gz
gzip -d v6.12-rc6.tar.gz
tar -xf v6.12-rc6.tar
cd linux-6.12-rc6/
curl https://raw.githubusercontent.com/CachyOS/kernel-patches/refs/heads/master/6.12/0005-cachy.patch -o 0005-cachy.patch
patch -Np1 < 0005-cachy.patch
make LLVM=/opt/aomp/aomp/llvm/bin/ olddefconfig

echo '
CONFIG_IRQ_REMAP=y
CONFIG_X86_X2APIC=y
CONFIG_INTEL_TDX_GUEST=y

CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3=y
CONFIG_MZEN4=y
CONFIG_X86_USE_PPRO_CHECKSUM=y

CONFIG_LRU_GEN=y
CONFIG_LRU_GEN_ENABLED=y
CONFIG_LRU_GEN_WALKS_MMU=y

CONFIG_CACHY=y

CONFIG_LTO=y
CONFIG_LTO_CLANG=y
CONFIG_LTO_CLANG_FULL=y
' >> .config

make LLVM=/opt/aomp/aomp/llvm/bin/ all

CONFIG_INTEL_TDX_GUEST: a culprit.
CONFIG_CACHY: I don't know what this does.
CONFIG_X86_USE_PPRO_CHECKSUM: or it could be this one; I'm getting lazy, so it stays.
CONFIG_LRU_GEN: was needed for my previous attempts, so I just included it.
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3: I didn't think it was needed, so I tried without it, and wow: 5 threads using 11 GB of RAM (I hit Ctrl+C), and it wasn't even an LD step, it was AR built-in.a. So is that a different problem?

@schung-amd
Contributor

What version of aomp do you have installed? Did you build it from source or is it provided in Arch?

@Cfouria
Author

Cfouria commented Nov 7, 2024

Built aomp from source on Oct 21. Edit: rebuilt aomp, still the same.

Also, I've managed to find the 2 problem options:
CONFIG_AMD_MEM_ENCRYPT=y & CONFIG_INTEL_TDX_GUEST=y
Having either one causes this same error; removing them builds successfully.
I also managed to get rid of the patch:

tar -xf v6.12-rc6.tar
cd linux-6.12-rc6/
make LLVM=/opt/aomp/aomp/llvm/bin/ olddefconfig
echo '
CONFIG_AMD_MEM_ENCRYPT=y
CONFIG_LTO=y
CONFIG_LTO_CLANG=y
CONFIG_LTO_CLANG_FULL=y' >> .config
make LLVM=/opt/aomp/aomp/llvm/bin/ KCFLAGS=' -march=znver4 -mtune=znver4' -j1 all

Why -j1? Using -j8 or so gets stuck on AR built-in.a and runs out of memory.
With -j1 I also get more details on the error:

  CC      mm/gup.o
clang: /home/user/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:235: llvm::SDNode* {anonymous}::DAGCombiner::getNextWorklistEntry(): Assertion `N->getCombinerWorklistIndex() >= 0 && "Found a worklist entry without a corresponding map entry!"' failed.
PLEASE submit a bug report to https://github.com/ROCm-Developer-Tools/aomp and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /opt/aomp/aomp/llvm/bin/clang -Wp,-MMD,mm/.gup.o.d -nostdinc -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ --target=x86_64-linux-gnu -fintegrated-as -Werror=unknown-warning-option -Werror=ignored-optimization-argument -Werror=option-ignored -Werror=unused-command-line-argument -fmacro-prefix-map=./= -Werror -std=gnu11 -fshort-wchar -funsigned-char -fno-common -fno-PIE -fno-strict-aliasing -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=branch -fno-jump-tables -m64 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mstack-alignment=8 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -Wno-sign-compare -fno-asynchronous-unwind-tables -mretpoline-external-thunk -mindirect-branch-cs-prefix -mfunction-return=thunk-extern -fpatchable-function-entry=16,16 -fno-delete-null-pointer-checks -O2 -fstack-protector-strong -fomit-frame-pointer -ftrivial-auto-var-init=zero -fno-stack-clash-protection -falign-functions=16 -fstrict-flex-arrays=3 -fno-strict-overflow -fno-stack-check -Wall -Wundef -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Werror=strict-prototypes -Wno-format-security -Wno-trigraphs -Wno-frame-address -Wno-address-of-packed-member -Wmissing-declarations -Wmissing-prototypes -Wframe-larger-than=2048 -Wno-gnu -Wvla -Wno-pointer-sign -Wcast-function-type -Wimplicit-fallthrough -Werror=date-time -Werror=incompatible-pointer-types -Wenum-conversion -Wextra -Wunused -Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-format-overflow -Wno-format-overflow-non-kprintf -Wno-format-truncation-non-kprintf -Wno-override-init -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare -Wno-unaligned-access 
-Wno-enum-compare-conditional -Wno-enum-enum-conversion -Wno-missing-field-initializers -Wno-type-limits -Wno-shift-negative-value -Wno-sign-compare -Wno-unused-parameter -march=znver4 -mtune=znver4 -DKBUILD_MODFILE=\"mm/gup\" -DKBUILD_BASENAME=\"gup\" -DKBUILD_MODNAME=\"gup\" -D__KBUILD_MODNAME=kmod_gup -c -o mm/gup.o mm/gup.c
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 'mm/gup.c'.
4.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@follow_page_pte'
 #0 0x0000747436ded5fc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x9ed5fc)
 #1 0x0000747436deaad4 llvm::sys::CleanupOnSignal(unsigned long) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x9eaad4)
 #2 0x0000747436cd7c10 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x0000747435e4c1d0 (/usr/lib/libc.so.6+0x3d1d0)
 #4 0x0000747435ea53f4 (/usr/lib/libc.so.6+0x963f4)
 #5 0x0000747435e4c120 raise (/usr/lib/libc.so.6+0x3d120)
 #6 0x0000747435e334c3 abort (/usr/lib/libc.so.6+0x244c3)
 #7 0x0000747435e333df (/usr/lib/libc.so.6+0x243df)
 #8 0x0000747435e44177 (/usr/lib/libc.so.6+0x35177)
 #9 0x000074743799a4df llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOptLevel) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x159a4df)
#10 0x0000747437c0f87f llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x180f87f)
#11 0x0000747437c14009 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x1814009)
#12 0x0000747437c15b20 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x1815b20)
#13 0x0000747437bfeb33 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x17feb33)
#14 0x000074743743c490 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#15 0x0000747437010124 llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0xc10124)
#16 0x0000747437010571 llvm::FPPassManager::runOnModule(llvm::Module&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0xc10571)
#17 0x0000747437010f74 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0xc10f74)
#18 0x0000747440294a7d clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libclang-cpp.so.20.0_AOMP_STANDALONE_20.0-1+0x2694a7d)
#19 0x000074744082250d clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libclang-cpp.so.20.0_AOMP_STANDALONE_20.0-1+0x2c2250d)
#20 0x000074743e82174c clang::ParseAST(clang::Sema&, bool, bool) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libclang-cpp.so.20.0_AOMP_STANDALONE_20.0-1+0xc2174c)
#21 0x00007474415142b9 clang::FrontendAction::Execute() (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libclang-cpp.so.20.0_AOMP_STANDALONE_20.0-1+0x39142b9)
#22 0x000074744148b0ee clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libclang-cpp.so.20.0_AOMP_STANDALONE_20.0-1+0x388b0ee)
#23 0x00007474415ba0b7 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libclang-cpp.so.20.0_AOMP_STANDALONE_20.0-1+0x39ba0b7)
#24 0x0000644659275004 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/aomp/aomp/llvm/bin/clang+0x15004)
#25 0x000064465926b9c0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#26 0x0000747440fc23f9 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) Job.cpp:0:0
#27 0x0000747436cd80a2 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libLLVM.so.20.0_AOMP_STANDALONE_20.0-1+0x8d80a2)
#28 0x0000747440fc48a6 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (.part.0) Job.cpp:0:0
#29 0x0000747440f78d8c clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libclang-cpp.so.20.0_AOMP_STANDALONE_20.0-1+0x3378d8c)
#30 0x0000747440f79de1 std::_Function_handler<void (), clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const::'lambda'()>::_M_invoke(std::_Any_data const&) Compilation.cpp:0:0
#31 0x0000747440f7fc59 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libclang-cpp.so.20.0_AOMP_STANDALONE_20.0-1+0x337fc59)
#32 0x0000747440f8cc94 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/opt/aomp/aomp_20.0-1/lib/llvm/lib/libclang-cpp.so.20.0_AOMP_STANDALONE_20.0-1+0x338cc94)
#33 0x000064465927112e clang_main(int, char**, llvm::ToolContext const&) (/opt/aomp/aomp/llvm/bin/clang+0x1112e)
#34 0x000064465926af54 main (/opt/aomp/aomp/llvm/bin/clang+0xaf54)
#35 0x0000747435e34e08 (/usr/lib/libc.so.6+0x25e08)
#36 0x0000747435e34ecc __libc_start_main (/usr/lib/libc.so.6+0x25ecc)
#37 0x000064465926afb5 _start (/opt/aomp/aomp/llvm/bin/clang+0xafb5)
clang: error: clang frontend command failed with exit code 134 (use -v to see invocation)
AOMP_STANDALONE_20.0-1 clang version 20.0.0_AOMP_STANDALONE_20.0-1 (https://github.com/ROCm/llvm-project ec10765630e87e113dbe4936b58138d876700d5b)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/aomp/aomp_20.0-1/lib/llvm/bin
Build config: +assertions
Configuration file: /opt/aomp/aomp_20.0-1/lib/llvm/bin/clang.cfg
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/gup-a671a3.c
clang: note: diagnostic msg: /tmp/gup-a671a3.sh
clang: note: diagnostic msg: 

********************
make[3]: *** [scripts/Makefile.build:229: mm/gup.o] Error 1
make[2]: *** [scripts/Makefile.build:478: mm] Error 2
make[1]: *** [/home/user/build/pkgbuilds/linux-cachyos-rc/4th/1st/linux-6.12-rc6/Makefile:1936: .] Error 2
make: *** [Makefile:224: __sub-make] Error 2

gup-a671a3.sh.txt
gup-a671a3.c.txt
(added .txt so the files could be attached)

@schung-amd
Contributor

Oddly this seems to me to be the same as an issue reported a while back: see CachyOS/kernel-patches#48, llvm/llvm-project#82896, llvm/llvm-project#72026. The fix landed in our fork a while ago, and as far as I understand aomp should be pulling an up-to-date llvm, so I'm not sure why this is showing up again. Do you see this issue without aomp (i.e. is this issue isolated to aomp-provided llvm)?

@Cfouria
Author

Cfouria commented Nov 7, 2024

replacing LLVM=/opt/aomp/aomp/llvm/bin/ with LLVM=1
CONFIG_CC_VERSION_TEXT="clang version 18.1.8"
no error

after aomp rebuild, error
CONFIG_CC_VERSION_TEXT="AOMP_STANDALONE_20.0-1 clang version 20.0.0_AOMP_STANDALONE_20.0-1 (https://github.com/ROCm/llvm-project ec10765630e87e113dbe4936b58138d876700d5b)"

before aomp rebuild, error
CONFIG_CC_VERSION_TEXT="AOMP_STANDALONE_20.0-1 clang version 20.0.0_AOMP_STANDALONE_20.0-1 (https://github.com/ROCm/llvm-project ec10765630e87e113dbe4936b58138d876700d5b)"

@schung-amd
Contributor

Thanks for the quick response! I'll take a look at this.

@schung-amd schung-amd reopened this Nov 7, 2024
@Cfouria
Author

Cfouria commented Nov 7, 2024

Also, reading llvm/llvm-project#72026, I remember trying x86-64-v3 and still getting the error.

CONFIG_AMD_MEM_ENCRYPT=y & CONFIG_LTO_NONE=y
KCFLAGS=' -march=znver4'
Oh, with LTO off it still gets stuck on gup.o; it takes a few hours and 20 GB of RAM before giving the error.

KCFLAGS=' -march=x86-64-v3' error

KCFLAGS=' -march=x86-64-v2' builds

So the minimal reproducer is:

cd linux-6.12-rc6/
make LLVM=/opt/aomp/aomp/llvm/bin/ olddefconfig
echo 'CONFIG_AMD_MEM_ENCRYPT=y' >> .config
make LLVM=/opt/aomp/aomp/llvm/bin/  KCFLAGS=' -march=x86-64-v3' -j11 all

With anything greater than -j1, gup.o won't appear in the terminal until you hit Ctrl+C and rerun the make all command.

KCFLAGS=' -march=bdver3' error
KCFLAGS=' -march=bdver1' builds
KCFLAGS=' -march=bdver2' error
KCFLAGS=' -march=bdver2 -mno-tbm -mno-bmi -mno-fma -mno-f16c' builds
KCFLAGS=' -march=znver4 -mno-bmi2 -mno-bmi' builds
KCFLAGS=' -march=znver4 -mno-f16c -mno-fma' error

so bmi is a problem?
KCFLAGS=' -march=znver4 -mno-bmi2' error
KCFLAGS=' -march=znver4 -mno-bmi' builds
KCFLAGS=' -mbmi' error
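
The flag-by-flag testing above could be scripted along these lines. This is only a sketch under stated assumptions: the `classify` helper, the `RUN_BISECT` guard, and the 600-second limit are made up for illustration, and the loop assumes it is run from the top of an already configured kernel tree. It relies on coreutils `timeout` exiting with status 124 when the limit is hit, and on kbuild accepting a single object file such as mm/gup.o as a make target.

```shell
#!/bin/sh
# classify TIMEOUT CMD...: run CMD with a time limit and print one word:
#   builds (exit 0), stuck (timed out), error (any other exit status).
# Relies on coreutils `timeout`, which exits with 124 when the limit is hit.
classify() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    case $status in
        0)   echo builds ;;
        124) echo stuck ;;
        *)   echo error ;;
    esac
}

# Hypothetical driver loop, guarded so that nothing is built unless
# RUN_BISECT=1 is set and the script runs inside a configured kernel tree.
if [ "${RUN_BISECT:-0}" = 1 ]; then
    for flags in '-march=znver4' '-march=znver4 -mno-bmi' '-mbmi'; do
        rm -f mm/gup.o
        printf '%s: ' "$flags"
        classify 600 make LLVM=/opt/aomp/aomp/llvm/bin/ KCFLAGS="$flags" -j1 mm/gup.o
    done
fi
```

Building only mm/gup.o keeps each attempt short, and the time limit turns the "stuck for hours" case into its own result instead of waiting for the eventual stack dump.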

@schung-amd
Contributor

schung-amd commented Nov 11, 2024

Interesting, does lscpu | grep bmi report bmi1 support? Also, for your reproducer:

cd linux-6.12-rc6/
make LLVM=/opt/aomp/aomp/llvm/bin/ olddefconfig
echo 'CONFIG_AMD_MEM_ENCRYPT=y' >> .config
make LLVM=/opt/aomp/aomp/llvm/bin/  KCFLAGS=' -march=x86-64-v3' -j11 all

is this with the cachyos patches applied to the kernel? Are you using the defaults for all of the kernel config options unless otherwise stated?

@Cfouria
Author

Cfouria commented Nov 12, 2024

lscpu | grep bmi
...bmi1 avx2 smep bmi2...
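
For reference, a small sketch of the same check done against /proc/cpuinfo instead of lscpu. The helper name `has_cpu_flag` is made up for illustration. On Linux, the BMI1 feature that clang enables with -mbmi is listed as `bmi1` in the flags line. This only reports what the build host supports; it says nothing about the compiler assertion itself.

```shell
#!/bin/sh
# has_cpu_flag FLAG [FILE]: succeed if FLAG appears as a whole word in the
# first "flags" line of FILE (default: /proc/cpuinfo).
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Split the flags line into one word per line, then match exactly.
    grep -m1 '^flags' "$file" | tr ' \t' '\n\n' | grep -qxF "$flag"
}
```

Usage would be something like `has_cpu_flag bmi1 && echo "bmi1 supported"`.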

No patches, and eventually I started doing rm & tar -xf after every attempt.
Yes, fresh olddefconfig & echo.
Those 4 lines, run after extracting the tar file, reproduced it. (It takes a long time to finally produce the stack dump; in practice you just see it stuck on gup.o and know it will error.)

So KCFLAGS only needs -mbmi, or any -march that implies it.
The -j# can be any number of threads.
And CONFIG_AMD_MEM_ENCRYPT=y can be replaced with the following (without x2apic and irq, tdx seems to get ignored?):

CONFIG_IRQ_REMAP=y
CONFIG_X86_X2APIC=y
CONFIG_INTEL_TDX_GUEST=y
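
Since Kconfig can drop an appended option whose dependencies are not met, one way to confirm that the options survived is to check the resolved .config. A minimal sketch, assuming the config has already been re-resolved (for example by running olddefconfig again after the echo); the helper name `missing_options` is hypothetical.

```shell
#!/bin/sh
# missing_options CONFIG OPT...: print each OPT that is not set to =y in
# CONFIG, and return non-zero if any of them is missing.
missing_options() {
    config=$1
    shift
    rc=0
    for opt in "$@"; do
        # Match the whole line so CONFIG_FOO does not match CONFIG_FOO_BAR.
        if ! grep -qxF "${opt}=y" "$config"; then
            echo "$opt"
            rc=1
        fi
    done
    return $rc
}
```

For the three options above this would be called as `missing_options .config CONFIG_IRQ_REMAP CONFIG_X86_X2APIC CONFIG_INTEL_TDX_GUEST`; any name it prints was not enabled in the final config.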

AMD_MEM_ENCRYPT and INTEL_TDX_GUEST both produce the same differences from a fresh olddefconfig (plus some others they don't share):

CONFIG_DYNAMIC_PHYSICAL_MASK=y
CONFIG_X86_MEM_ENCRYPT=y
CONFIG_ARCH_HAS_CC_PLATFORM=y
CONFIG_UNACCEPTED_MEMORY=y
CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED=y

Setting CONFIG_ARCH_HAS_CC_PLATFORM=y by itself builds.
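
A comparison like the one above, listing what two configs add in common over a baseline, can be sketched with sort and comm. The function name `shared_new_options` and the file names in the usage note are hypothetical; the inputs are assumed to be saved copies of the resolved .config for each variant.

```shell
#!/bin/sh
# shared_new_options BASE A B: print the CONFIG_ lines that appear in both
# A and B but not in BASE, i.e. what the two variants add in common.
shared_new_options() {
    base=$1
    a=$2
    b=$3
    tmp=$(mktemp -d) || return 1
    grep '^CONFIG_' "$base" | sort -u > "$tmp/base"
    grep '^CONFIG_' "$a"    | sort -u > "$tmp/a"
    grep '^CONFIG_' "$b"    | sort -u > "$tmp/b"
    # comm -12 keeps lines common to A and B; comm -23 then drops the
    # ones that are also present in BASE.
    comm -12 "$tmp/a" "$tmp/b" | comm -23 - "$tmp/base"
    rm -rf "$tmp"
}
```

For example: `shared_new_options config.base config.amd_mem_encrypt config.tdx_guest`. Each shared option can then be tried on its own to see which one is enough to trigger the hang.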

@schung-amd
Contributor

Haven't been able to reproduce this so far; granted, I'm on different hardware, so this could be a hardware-specific issue. It takes a while, but I've been able to build the kernel with all the combinations of flags you've listed as failing. I'll have to test this on other hardware and report back.

@Cfouria
Author

Cfouria commented Nov 19, 2024

I do have an old computer (bdver3), so I tried with it.

Copied /opt/aomp/aomp/llvm from the 1st to the 2nd computer.
Tried AMD_MEM_ENCRYPT with -mbmi.

Checked .config: it is using aomp.
Stalls on gup.o.

So, rebuild aomp it is.

Well, I'm out of ideas.
The old computer can't build aomp: something about a missing nvgpu file despite CUDA=0.
Changed aomp_gpu & gfxlist to gfx700, ran build_aomp.sh, copied the llvm folder to the old computer, and tried again: still stuck on gup.o on both computers.
Tried changing all the env vars: nope.

Tried playing with 'gup-a671a3.sh':
Removing "-mrelocation-model" "static" produces another 'PLEASE submit a bug report'...
Changing -O2 to -O0, it is no longer stuck and gives an error immediately.
Reduced gup-a671a3.sh to:

"/opt/aomp/aomp_20.0-1/lib/llvm/bin/clang-20" "-cc1" "-emit-obj" "-mrelocation-model" "static" "-target-cpu" "k8" "-D" "__KERNEL__" "-O2" "gup-a671a3.c"

It builds with k8.
With znver4 it gets stuck (for hours; I'm not waiting).
With znver4 and -O0 it errors immediately: 10 warnings and 60 errors generated.

mm/gup.c:144:6: error: invalid operand for inline asm constraint 'i'
./arch/x86/include/asm/jump_label.h:27:11: error: invalid operand for inline asm constraint 'i'
mm/gup.c:228:3: error: invalid operand for inline asm constraint 'i'
./include/linux/rwsem.h:80:2: error: invalid operand for inline asm constraint 'i'
./arch/x86/include/asm/cpufeature.h:178:11: error: invalid operand for inline asm constraint 'i'

I do believe my build is haunted, and it has scared away my motivation.
