Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compilation Database: Support for entries with multiple .c files in one command #624

Closed
michael-schwarz opened this issue Mar 6, 2022 · 7 comments · Fixed by #632
Closed
Assignees
Labels
Milestone

Comments

@michael-schwarz
Copy link
Member

@stilscher discovered this issue with older versions of zstd (e.g. 5c5c47633826426a3fcc717000a352d52bd0ac22).
The complication database in this case contains entries where the command specifies several .c files at the same time.

 {
  "directory": "/home/michael/Documents/bench-repos/zstd/programs",
  "arguments": [
   "cc",
   "-I../lib",
   "-I../lib/common",
   "-I../lib/compress",
   "-I../lib/dictBuilder",
   "-DXXH_NAMESPACE=ZSTD_",
   "-I../lib/legacy",
   "-DZSTD_MULTITHREAD",
      // more defines
   "-DZSTD_NO_INTRINSICS",
   "-O3",
   "-Wall",
   "-Wextra",
       // more warnings
   "-pthread",
   "-lz",
   "-llzma",
   "../lib/common/debug.c",
   "../lib/common/entropy_common.c",
       // more files
   "../lib/dictBuilder/zdict.c",
   "zstdcli.o",
   "util.o",
   "fileio.o",
   "bench.o",
   "datagen.o",
   "dibio.o",
   "-o",
   "zstd",
   "-pthread",
   "-lz",
   "-llzma"
  ],
  "file": "../lib/dictBuilder/zdict.c"
 }

Goblint in this case adds an -E flag and -o to get preprocessor output. The preprocessor does not like this, and fails with

cc: fatal error: cannot specify -o with -c, -S or -E with multiple files
compilation terminated.
@sim642
Copy link
Member

sim642 commented Mar 6, 2022

Is this the last command in the database or something? It looks like some kind of linking command to produce the final zstd binary/library out of previous object files (and for some reason some C files as well).

The compilation database format doesn't allow multiple files per entry, so I'm not sure if there's anything to really do about it. Other than maybe ignore the command failing. It's not like we can do anything about preprocessing object files anyway, those should already be contained in previously listed entries.

I don't think we want to reimplement GCC command line parsing to do something more clever here.

Maybe this is a bear/compiledb/whatever bug that such linking command is in the database to begin with.

@michael-schwarz
Copy link
Member Author

Is this the last command in the database or something?

No it just generates this zdict component that is then used by the other components.

Maybe this is a bear/compiledb/whatever bug that such linking command is in the database to begin with.

I don't really think so, the compilation database is supposed to record the commands used for building. If one of these commands involves invoking the compiler with multiple input files, then this should also be recorded in the compilation database.

Other than maybe ignore the command failing.

In this case this may work (i.e. it is not obviously not working). However a make file could simply be something like:

default: cc 1.c 2.c 3.c

In this case we would completely ignore these two files. I don't think it is realistic to assume such things do not happen in larger projects.

What about extracting all the .c files appearing in the command and invoking the entire command for each of them? That seems like a reasonable approach to me...

@sim642
Copy link
Member

sim642 commented Mar 6, 2022

No it just generates this zdict component that is then used by the other components.

The arguments above say -o zstd though. Although the file is zdict.c.

I don't really think so, the compilation database is supposed to record the commands used for building. If one of these commands involves invoking the compiler with multiple input files, then this should also be recorded in the compilation database.

The strict assumption made by the compilation database format is that a single entry corresponds to a single file. This is true for all normal compilation where a single source file is compiled into a single object file.
The commands that involve multiple files would be linking, not compilation, and compilation databases (sadly) don't and cannot contain such steps. That's why additional link databases have been proposed, but not really gone anywhere.

This is why the problematic entry from above confuses me: it involves a lot of linker arguments (-lz, -llzma, .o files, binary output -o zstd), but at the same time a lot of compiler arguments as well (-I arguments, -D arguments, -W arguments, .c files). I'm sure GCC has no problem compiling some files and linking those with already compiled compilation units in a single step, but that's very unusual. Even if link databases were a thing, such a mixture couldn't be properly captured by either database.

In this case this may work (i.e. it is not obviously not working). However a make file could simply be something like:

default: cc 1.c 2.c 3.c

Right, but it seems to me that in order to produce a valid compilation database where commands correspond to single files the database generators are supposed to handle the split themselves. Or if they cannot, then at least exit with an error instead of picking an arbitrary file out of those (I guess it's the last one in this command).

I'm guessing you've been using the direct Makefile build of zstd all this time? Because I also see it has a CMake build. Since CMake has native compilation database support, I believe it always produces databases where compilation is always per-file independent from any linking, or am I mistaken?. It might be more reliable to go with CMake whereever possible instead of relying on bear/compiledb, which have to use extra trickery and heuristics.

In this case we would completely ignore these two files. I don't think it is realistic to assume such things do not happen in larger projects.

What about extracting all the .c files appearing in the command and invoking the entire command for each of them? That seems like a reasonable approach to me...

I do agree that it's quite bad if some files are ignored. I guess at least for the time being, this is never silent but fails preprocessing with the initially reported error. I'm not sure how far back we need to go with zstd, but maybe there's enough history afterwards where such mixed commands aren't used.

I suppose something like that might work, although it would also significantly complicate the compilation database code even further, especially since the same logic should really be implemented both for command and arguments style databases.
What I'm most worried about is the reliability of that approach: by blindly extracting all arguments ending in .c, we're making many assumptions, like that those arguments are always standalone arguments instead of following some special arguments that could take an associated filename argument.
I'll try it tomorrow.

@sim642 sim642 self-assigned this Mar 6, 2022
@sim642
Copy link
Member

sim642 commented Mar 7, 2022

I just tried with a manually written Makefile:

all: main.c lib/lib.h lib/lib.c
	gcc -DANSWER=42 -Ilib/ main.c lib/lib.c

bear

[
  {
    "arguments": [
      "/usr/bin/x86_64-linux-gnu-gcc-11",
      "-c",
      "-DANSWER=42",
      "-Ilib/",
      "main.c"
    ],
    "directory": "/home/simmo/Desktop/goblint-action-test/makefile-project",
    "file": "/home/simmo/Desktop/goblint-action-test/makefile-project/main.c",
    "output": "/home/simmo/Desktop/goblint-action-test/makefile-project/main.o"
  },
  {
    "arguments": [
      "/usr/bin/x86_64-linux-gnu-gcc-11",
      "-c",
      "-DANSWER=42",
      "-Ilib/",
      "lib/lib.c"
    ],
    "directory": "/home/simmo/Desktop/goblint-action-test/makefile-project",
    "file": "/home/simmo/Desktop/goblint-action-test/makefile-project/lib/lib.c",
    "output": "/home/simmo/Desktop/goblint-action-test/makefile-project/lib/lib.o"
  }
]

So it does exactly what I expected above: the compilation of every file is properly decomposed into entries. Goblint analyzes this perfectly.

compiledb

[
 {
  "arguments": [
   "/usr/bin/x86_64-linux-gnu-gcc-11",
   "-c",
   "-DANSWER=42",
   "-Ilib/",
   "main.c"
  ],
  "directory": "/home/simmo/Desktop/goblint-action-test/makefile-project",
  "file": "/home/simmo/Desktop/goblint-action-test/makefile-project/main.c",
  "output": "/home/simmo/Desktop/goblint-action-test/makefile-project/main.o"
 },
 {
  "directory": "/home/simmo/Desktop/goblint-action-test/makefile-project",
  "arguments": [
   "gcc",
   "-DANSWER=42",
   "-Ilib/",
   "main.c",
   "lib/lib.c"
  ],
  "file": "lib/lib.c"
 }
]

It appears that it manages to decompose main.c into its own entry, but then fails to do the same for lib.c and just dumps the literal command and just uses the last input file in the file field.
I'm guessing your zstd compilation database is from compiledb and similarly has some decomposed entries before the bad one.

@sim642
Copy link
Member

sim642 commented Mar 7, 2022

I just tried that zstd commit myself. Both CMake and bear seem to generate a compilation database that doesn't have such mixed entries. I've forgotten by now what your reason for preferring compiledb was, but if they all seem to have issues in certain cases, we should maybe have an issue to document the problems with all the generators in a single place.

EDIT: Done here: #628.

@stilscher
Copy link
Member

It might be more reliable to go with CMake whereever possible instead of relying on bear/compiledb, which have to use extra trickery and heuristics

That sounded like a good idea, and I wanted to try this for the zstd benchmarking to avoid some of the problems. However, I think the problem with generating the compilation database through CMake is, that one cannot specify the target and instead it is generated for all. It seems that with with the target property EXPORT_COMPILE_COMMANDS there has been added something to restrict the generation of the compilation commands to some targets only, but its only included with CMake 3.20.

@sim642
Copy link
Member

sim642 commented Mar 18, 2022

That's a good point, although couldn't you just reconfigure the zstd CMake build to disable ZSTD_BUILD_SHARED and ZSTD_BUILD_STATIC to disable the compilation of the libraries and just keep ZSTD_BUILD_PROGRAMS enabled to build the executable with a main function? Or does that still end up containing some duplicate things?

@sim642 sim642 added this to the v2.0.0 milestone Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants