Handling of preprocessor macros is not general enough #108

Open
bbannier opened this issue Jun 24, 2022 · 3 comments

@bbannier

While looking into how one could tackle zeek/tree-sitter-zeek#6, I looked at this grammar for inspiration and noticed that it has similar issues. In C or C++, preprocessor directives can appear around pretty much any token of the language, while this grammar allows them in only a couple of places. I wonder what the best approach to this would be.

As an example, the following source file

int
#if 0
foo
#else
main
#endif
(void) {}

produces this AST

(translation_unit
  (ERROR
    (primitive_type))
  (preproc_if
    (number_literal)
    (ERROR
      (identifier))
    (preproc_else)
    (ERROR
      (identifier)))
  (expression_statement
    (compound_literal_expression
      (type_descriptor

One could come up with nastier examples where, e.g., an opening parenthesis is inside a preprocessor block. I am not even sure what the resulting AST should look like, but I feel like I might want something that can support preprocessor directives anywhere, but with more structure than what extras is typically used for. Would there be a way to support this with an external scanner?

There is also already #13, but it seems to be more focussed on improving the handling of currently supported special cases.
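
For reference, a directive is recognized by its position on the line rather than by the surrounding syntax: `#` must be the first non-whitespace character of the line. The following is a minimal standalone sketch of that check, the kind of test an external scanner would presumably need to make before emitting a directive token. The names `is_directive_line` and `directive_name` are hypothetical and not part of tree-sitter or this grammar, and the sketch ignores comments before the `#`, digraphs, and backslash line continuations.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of tree-sitter or this grammar.
 * Returns true if the first non-blank character of `line` is '#'. */
bool is_directive_line(const char *line) {
    while (*line == ' ' || *line == '\t') line++;
    return *line == '#';
}

/* Hypothetical helper. Copies the directive name (e.g. "ifdef") into `out`.
 * Returns false if `line` is not a directive or has no name after '#'. */
bool directive_name(const char *line, char *out, size_t out_size) {
    if (out_size == 0 || !is_directive_line(line)) return false;
    while (*line == ' ' || *line == '\t') line++;
    line++; /* skip '#' */
    while (*line == ' ' || *line == '\t') line++; /* "#  ifdef" is allowed */
    size_t n = 0;
    while (isalpha((unsigned char)*line) && n + 1 < out_size) {
        out[n++] = *line++;
    }
    out[n] = '\0';
    return n > 0;
}
```

Recognizing the directive is the easy part; the open question in this issue is what tree structure to give the tokens around it.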

bbannier changed the title from "Handling of preprocessor is not general enough" to "Handling of preprocessor macros is not general enough" on Jun 24, 2022
@tr-intel

Here's a typical scenario where we come across this problem.

#ifdef __cplusplus
extern "C" {
#endif

#ifdef __cplusplus
}
#endif

AST: https://tree-sitter.github.io/tree-sitter/playground#

translation_unit [0, 0] - [8, 0]
  preproc_ifdef [0, 0] - [6, 6]
    name: identifier [0, 7] - [0, 18]
    linkage_specification [1, 0] - [5, 1]
      value: string_literal [1, 7] - [1, 10]
        string_content [1, 8] - [1, 9]
      body: declaration_list [1, 11] - [5, 1]
        preproc_call [2, 0] - [3, 0] <<<<<<<<< 🧐
          directive: preproc_directive [2, 0] - [2, 6]
        preproc_ifdef [4, 0] - [4, 18]
          name: identifier [4, 7] - [4, 18]
          MISSING #endif [4, 18] - [4, 18]  <<<<<<<<< 🧐
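
One way to see why this case cannot nest cleanly: each conditional branch here opens or closes a brace whose partner lies outside that branch. A rough sketch of a checker for that pattern follows. `count_unbalanced_branches` and its helpers are hypothetical names, and the code naively counts braces without skipping string literals, character literals, or comments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NESTING 64

/* Hypothetical helper: returns a pointer to the directive name if `line`
 * is a preprocessor directive, otherwise NULL. */
const char *directive_start(const char *line) {
    while (*line == ' ' || *line == '\t') line++;
    if (*line != '#') return NULL;
    line++;
    while (*line == ' ' || *line == '\t') line++;
    return line;
}

bool has_prefix(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Counts conditional branches whose body changes the brace depth, i.e.
 * branches containing a '{' or '}' that is not matched inside the branch.
 * Returns -1 if conditionals nest deeper than MAX_NESTING. */
int count_unbalanced_branches(const char *const *lines, size_t n_lines) {
    int saved[MAX_NESTING]; /* brace depth at each enclosing #if */
    int top = 0;
    int depth = 0;
    int unbalanced = 0;

    for (size_t i = 0; i < n_lines; i++) {
        const char *d = directive_start(lines[i]);
        if (d == NULL) {
            /* Ordinary code line: track brace depth. */
            for (const char *p = lines[i]; *p != '\0'; p++) {
                if (*p == '{') depth++;
                else if (*p == '}') depth--;
            }
        } else if (has_prefix(d, "if")) { /* if, ifdef, ifndef */
            if (top == MAX_NESTING) return -1;
            saved[top++] = depth;
        } else if (has_prefix(d, "el")) { /* else, elif and variants */
            if (top > 0) {
                if (depth != saved[top - 1]) unbalanced++;
                depth = saved[top - 1]; /* next branch starts fresh */
            }
        } else if (has_prefix(d, "endif")) {
            if (top > 0 && depth != saved[--top]) unbalanced++;
        }
    }
    return unbalanced;
}
```

On the seven-line snippet above this reports 2 unbalanced branches, one for the `extern "C" {` and one for the closing `}`. The `#if 0` example from the first comment reports 0, so its parse error has a different cause: the directive sits in the middle of a declaration rather than splitting a brace pair.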

@lawmurray

Another example from the CMakeCXXCompilerId.cpp file that CMake generates during a build (yes, C++ source file, but also valid C):

char const info_version[] = {
  'I', 'N', 'F', 'O', ':',
  'c','o','m','p','i','l','e','r','_','v','e','r','s','i','o','n','[',
  COMPILER_VERSION_MAJOR,
# ifdef COMPILER_VERSION_MINOR
  '.', COMPILER_VERSION_MINOR,
#  ifdef COMPILER_VERSION_PATCH
   '.', COMPILER_VERSION_PATCH,
#   ifdef COMPILER_VERSION_TWEAK
    '.', COMPILER_VERSION_TWEAK,
#   endif
#  endif
# endif
  ']','\0'};

which gives error nodes:

translation_unit [0, 0] - [15, 0]
  declaration [0, 0] - [13, 12]
    type: primitive_type [0, 0] - [0, 4]
    type_qualifier [0, 5] - [0, 10]
    declarator: init_declarator [0, 11] - [13, 11]
      declarator: array_declarator [0, 11] - [0, 25]
        declarator: identifier [0, 11] - [0, 23]
      value: initializer_list [0, 28] - [13, 11]
        char_literal [1, 2] - [1, 5]
          character [1, 3] - [1, 4]
        char_literal [1, 7] - [1, 10]
          character [1, 8] - [1, 9]
        char_literal [1, 12] - [1, 15]
          character [1, 13] - [1, 14]
        char_literal [1, 17] - [1, 20]
          character [1, 18] - [1, 19]
        char_literal [1, 22] - [1, 25]
          character [1, 23] - [1, 24]
        char_literal [2, 2] - [2, 5]
          character [2, 3] - [2, 4]
        char_literal [2, 6] - [2, 9]
          character [2, 7] - [2, 8]
        char_literal [2, 10] - [2, 13]
          character [2, 11] - [2, 12]
        char_literal [2, 14] - [2, 17]
          character [2, 15] - [2, 16]
        char_literal [2, 18] - [2, 21]
          character [2, 19] - [2, 20]
        char_literal [2, 22] - [2, 25]
          character [2, 23] - [2, 24]
        char_literal [2, 26] - [2, 29]
          character [2, 27] - [2, 28]
        char_literal [2, 30] - [2, 33]
          character [2, 31] - [2, 32]
        char_literal [2, 34] - [2, 37]
          character [2, 35] - [2, 36]
        char_literal [2, 38] - [2, 41]
          character [2, 39] - [2, 40]
        char_literal [2, 42] - [2, 45]
          character [2, 43] - [2, 44]
        char_literal [2, 46] - [2, 49]
          character [2, 47] - [2, 48]
        char_literal [2, 50] - [2, 53]
          character [2, 51] - [2, 52]
        char_literal [2, 54] - [2, 57]
          character [2, 55] - [2, 56]
        char_literal [2, 58] - [2, 61]
          character [2, 59] - [2, 60]
        char_literal [2, 62] - [2, 65]
          character [2, 63] - [2, 64]
        char_literal [2, 66] - [2, 69]
          character [2, 67] - [2, 68]
        identifier [3, 2] - [3, 24]
        ERROR [4, 0] - [4, 30]
          identifier [4, 8] - [4, 30]
        char_literal [5, 2] - [5, 5]
          character [5, 3] - [5, 4]
        identifier [5, 7] - [5, 29]
        ERROR [6, 0] - [6, 31]
          identifier [6, 9] - [6, 31]
        char_literal [7, 3] - [7, 6]
          character [7, 4] - [7, 5]
        identifier [7, 8] - [7, 30]
        ERROR [8, 0] - [8, 32]
          identifier [8, 10] - [8, 32]
        char_literal [9, 4] - [9, 7]
          character [9, 5] - [9, 6]
        identifier [9, 9] - [9, 31]
        ERROR [10, 0] - [12, 7]
          preproc_directive [10, 0] - [10, 9]
        char_literal [13, 2] - [13, 5]
          character [13, 3] - [13, 4]
        char_literal [13, 6] - [13, 10]
          escape_sequence [13, 7] - [13, 9]

@bjourne

bjourne commented Nov 30, 2024

Here is another example in the same vein. This code

if (true)
    #define BLAH
    return;

produces

(translation_unit [0, 0] - [3, 0]
  (if_statement [0, 0] - [0, 9]
    condition: (parenthesized_expression [0, 3] - [0, 9]
      (true [0, 4] - [0, 8]))
    consequence: (expression_statement [0, 9] - [0, 9]))
  (preproc_def [1, 4] - [2, 0]
    name: (identifier [1, 12] - [1, 16]))
  (return_statement [2, 4] - [2, 11]))

But both the preproc_def and the return_statement should be children of the if_statement.
