Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][Discussion] Standalone preprocessor tool design #1358

Open
karimtera opened this issue Jul 3, 2022 · 13 comments
Open

[WIP][Discussion] Standalone preprocessor tool design #1358

karimtera opened this issue Jul 3, 2022 · 13 comments

Comments

@karimtera
Copy link
Contributor

karimtera commented Jul 3, 2022

Standalone Preprocessor Tool Design

Introduction

The goal of this thread is to discuss and put the outlines of the standalone preprocessor tool.

The preprocessor tool should fully support the directives described in 2018 SV-LRM, and provide APIs that can be easily used by other Verible tools.

This proposal is a continuation on the already built pseudo-preprocessing tool.

Flags description

Using the Abseil flags library allows an easy access to flags and their arguments.

Flags Type Default Description State
--strip_comments bool false This replaces comments with equal number of whitespaces, to preserve the same byte offsets. DONE
--generate_variants bool false This generates all the possible variants for conditionals. DONE
--multiple_cu bool true In case of true multiple files are passed to the tool, each one of them will be compiled in a separate compilation unit, thus declarations scopes will end by the end of each file (this is the default behavior), otherwise all files will be compiled in the same compilation unit. DONE

Positional arguments

Those mainly are the SV files to preprocess, defines and search-directory paths for includes.
Any of the following positional argument can be passed one or more times.

Argument Description State
<file.sv> Specifies a SV file to be preprocessed. DONE
+define+<foo>[=<text>] Defines one or more macros same as `define foo text DONE
+incdir+<dir> Specifies the directory to search for files included with `include "file_name" DONE

(IN-PROGRESS)

@tgorochowik
Copy link
Member

These parameters look good to me. @hzeller @fangism please comment if you have some additional ideas.

One thing that we should consider early on is how to handle conditionals in the code, here are some notes from my discussion with @mglb

Consider following code:

module mod_name (input bus_a, output bus_b
`ifdef USE_PG_PIN
    , input pg_pin
`endif
    );
endmodule

To parse this into a tree, Verible could preprocess the code and generate two sources: one for condition being true, one for condition being false. Both sources would be parsed into alternate trees. The trees would be then merged to share common branches. Alternate branches would be wrapped in a dedicated node.
The preprocessor lines (and inactive in a given variant conditional code blocks) could be treated like comments for parsing and formatting purposes.

What is needed

  • Separate preprocessor parser. Can be derived from Verilog parser. If possible, it should share common parts with Verilog parser. It should output location of all preprocessor instructions and macros, together with their semantic information (i.e. what they are, what is their expansion, possibly alternate expansions for macros defined in ifdef blocks, etc).
  • Using the preprocessor parser to generate all possible source variants for Verilog parsing purposes. Or just single variant when defines are given.
  • Implementation of token list and CST structures for storing alternate parts of preprocessed sources.
  • Support for processing (formatting, linting) of multiple source variants. Preprocessor instructions (ifdef, define, …) could be ignored just like comments, same with code blocks that are inactive in a given variant.
  • TBD: Macro calls which expand to partial structures (e.g. those with “begin” but without “end”)

Note that the integration with linter and formatter can be done at a later step, but we have to keep in mind that this is the ultimate goal so we have to think about how to make that as easy as possible with the implementation we're doing

@tgorochowik
Copy link
Member

And in general, the standalone tool should take raw systemverilog sources as input and ouput preprocessed code (only one branch of the conditional code based on the defines, macros should be resolved etc).

I suggest we start working on this one by one, e.g: pick the first feature you want to work on - I think conditionals are a great candidate - scan the tree for all conditional statements, create the trees as suggested above (or suggest an alternative solution), as the last step, look up all the defines (you can start with defines provided from the commandline and store info from `define XXX as the next step) and output only the selected branch in the final code.

@karimtera
Copy link
Contributor Author

karimtera commented Jul 10, 2022

Hello All,

I am currently working on conditionals feature in the standalone tool PR #1360, I am facing a problem, and would like to know what you think about my solution before implementing it.

Problem:
The verilog::VerilogPreprocess class has its only interface (public) method ScanStream which operates on verible::TokenStreamView that doesn't contain any white-space characters.

Using this method allows the tool to correctly choose the matching branch if a conditional is encountered, however it makes it hard to preserve the original white-space characters to output the SV source code as it was before preprocessing.

This problem will be even harder when ExpandMacro method is merged into Verible.

Solution:

  1. Adding a verilog::VerilogPreprocess::Config field called bypass_whitespaces that allows verilog::VerilogPreprocess to operate on token stream-views containing white-space characters by adding this behavior to methods that handle each token (not so smart? but works).
  2. Having the original SV code token stream-view (with white-spaces), and the preprocessed token steram-view, a two pointers algorithm can be implemented to solve this situation (smarter, but would be a problem in case of expanded macros with arguments).

Do you have any suggestions?
Also, is there any existing utility that can help in this situation?

Thanks

@fangism
Copy link
Collaborator

fangism commented Jul 13, 2022

Hello All,

I am currently working on conditionals feature in the standalone tool PR #1360, I am facing a problem, and would like to know what you think about my solution before implementing it.

Problem: The verilog::VerilogPreprocess class has its only interface (public) method ScanStream which operates on verible::TokenStreamView that doesn't contain any white-space characters.

Using this method allows the tool to correctly choose the matching branch if a conditional is encountered, however it makes it hard to preserve the original white-space characters to output the SV source code as it was before preprocessing.

This problem will be even harder when ExpandMacro method is merged into Verible.

Solution:

  1. Adding a verilog::VerilogPreprocess::Config field called bypass_whitespaces that allows verilog::VerilogPreprocess to operate on token stream-views containing white-space characters by adding this behavior to methods that handle each token (not so smart? but works).
  2. Having the original SV code token stream-view (with white-spaces), and the preprocessed token steram-view, a two pointers algorithm can be implemented to solve this situation (smarter, but would be a problem in case of expanded macros with arguments).

Do you have any suggestions? Also, is there any existing utility that can help in this situation?

Thanks

Hello!
If you need to preserve whitespaces between tokens, but are only given the non-whitespaces tokens, you can always "restore" the gaps by constructing a string_view that spans the gaps between tokens with ranges like prev.end(), next.begin().

@karimtera
Copy link
Contributor Author

karimtera commented Jul 13, 2022

And in general, the standalone tool should take raw systemverilog sources as input and ouput preprocessed code (only one branch of the conditional code based on the defines, macros should be resolved etc).

I suggest we start working on this one by one, e.g: pick the first feature you want to work on - I think conditionals are a great candidate - scan the tree for all conditional statements, create the trees as suggested above (or suggest an alternative solution), as the last step, look up all the defines (you can start with defines provided from the commandline and store info from `define XXX as the next step) and output only the selected branch in the final code.

I added a new command for the tool generate-variants which builds the control flow tree, and traverse it simply by dfs to generate and output all possible variants.

Example:

`define BAZ 1
`define MAZ 1
`ifdef BAZ
  module foo(); endmodule
  `ifdef MAZ
    module moo(); endmodule
  `else
    module not_moo(); endmodule
  `endif
  module foo2(); endmodule 
`else
  module not_foo(); endmodule
`endif
module m(); endmodule
Output:
Variant number 1:
(#359: "module")
(#293: "not_foo")
(#40: "(")
(#41: ")")
(#59: ";")
(#336: "endmodule")
(#359: "module")
(#293: "m")
(#40: "(")
(#41: ")")
(#59: ";")
(#336: "endmodule")

Variant number 2:
(#359: "module")
(#293: "foo")
(#40: "(")
(#41: ")")
(#59: ";")
(#336: "endmodule")
(#359: "module")
(#293: "not_moo")
(#40: "(")
(#41: ")")
(#59: ";")
(#336: "endmodule")
(#359: "module")
(#293: "foo2")
(#40: "(")
(#41: ")")
(#59: ";")
(#336: "endmodule")
(#359: "module")
(#293: "m")
(#40: "(")
(#41: ")")
(#59: ";")
(#336: "endmodule")

Variant number 3:
(#359: "module")
(#293: "foo")
(#40: "(")
(#41: ")")
(#59: ";")
(#336: "endmodule")
(#359: "module")
(#293: "moo")
(#40: "(")
(#41: ")")
(#59: ";")
(#336: "endmodule")
(#359: "module")
(#293: "foo2")
(#40: "(")
(#41: ")")
(#59: ";")
(#336: "endmodule")
(#359: "module")
(#293: "m")
(#40: "(")
(#41: ")")
(#59: ";")
(#336: "endmodule")

These variants are ready to be passed to the parser without any needed modification.

Thanks,

@karimtera
Copy link
Contributor Author

Hello All,

I am currently working on conditionals feature in the standalone tool PR #1360, I am facing a problem, and would like to know what you think about my solution before implementing it.

Problem: The verilog::VerilogPreprocess class has its only interface (public) method ScanStream which operates on verible::TokenStreamView that doesn't contain any white-space characters.

Using this method allows the tool to correctly choose the matching branch if a conditional is encountered, however it makes it hard to preserve the original white-space characters to output the SV source code as it was before preprocessing.

This problem will be even harder when ExpandMacro method is merged into Verible.

Solution:

  1. Adding a verilog::VerilogPreprocess::Config field called bypass_whitespaces that allows verilog::VerilogPreprocess to operate on token stream-views containing white-space characters by adding this behavior to methods that handle each token (not so smart? but works).
  1. Having the original SV code token stream-view (with white-spaces), and the preprocessed token steram-view, a two pointers algorithm can be implemented to solve this situation (smarter, but would be a problem in case of expanded macros with arguments).

Do you have any suggestions? Also, is there any existing utility that can help in this situation?

Thanks

Hello!

If you need to preserve whitespaces between tokens, but are only given the non-whitespaces tokens, you can always "restore" the gaps by constructing a string_view that spans the gaps between tokens with ranges like prev.end(), next.begin().

Sorry, I didn't understand your idea, can you elaborate more?

Thanks

@fangism
Copy link
Collaborator

fangism commented Jul 14, 2022

Hello All,

I am currently working on conditionals feature in the standalone tool PR #1360, I am facing a problem, and would like to know what you think about my solution before implementing it.

Problem: The verilog::VerilogPreprocess class has its only interface (public) method ScanStream which operates on verible::TokenStreamView that doesn't contain any white-space characters.

Using this method allows the tool to correctly choose the matching branch if a conditional is encountered, however it makes it hard to preserve the original white-space characters to output the SV source code as it was before preprocessing.

Hello!
If you need to preserve whitespaces between tokens, but are only given the non-whitespaces tokens, you can always "restore" the gaps by constructing a string_view that spans the gaps between tokens with ranges like prev.end(), next.begin().

Sorry, I didn't understand your idea, can you elaborate more?

Between any two tokens A, B that belong to the same memory buffer (which is true in Verible), you can construct a string_view of the characters between them, using one of these constructors:

https://en.cppreference.com/w/cpp/string/basic_string_view/basic_string_view
https://github.com/abseil/abseil-cpp/blob/master/absl/strings/string_view.h

It appears the two-iterator range constructor isn't available until C++20, so something like this will work:

  // assuming A is before B in the same memory buffer
  absl::string_view text_between_A_and_B(A.end(), std::distance(A.end(), B.begin()));

@mglb
Copy link
Contributor

mglb commented Jul 19, 2022

The range between tokens can contain preprocessor directives and inactive code, so it can't be used in all cases.
Example:

····module foo(); endmodule/*1*/

`ifdef MAZ
····module moo(); endmodule
`endif
····/*2*/module foo2(); endmodule 

For a variant where MAZ is not defined, /*1*/ and /*2*/ are consecutive tokens separated by 2 line breaks and 4 spaces, but the text between them is not just a whitespace. To make things worse, part of the whitespace is attached (in the text) to /*1*/, and the other part to /2/.


  1. Adding a verilog::VerilogPreprocess::Config field called bypass_whitespaces that allows verilog::VerilogPreprocess to operate on token stream-views containing white-space characters by adding this behavior to methods that handle each token (not so smart? but works).

Do you mean you want to add a new token type (token_enum value in TokenInfo) that represents whitespace? Or just include whitespace as a part of existing tokens?
If the former, why does it need a config? It sounds good, whitespace tokens could probably be handled similarly to comments.
If the latter, it could work if handled in other places (might be a lot of work though).

  1. Having the original SV code token stream-view (with white-spaces), and the preprocessed token steram-view, a two pointers algorithm can be implemented to solve this situation (smarter, but would be a problem in case of expanded macros with arguments).

I'm not sure I understand how would it work. Could you elaborate?

@karimtera
Copy link
Contributor Author

Hello @mglb,

Thanks for your feedback!

Do you mean you want to add a new token type (token_enum value in TokenInfo) that represents whitespace? Or just include whitespace as a part of existing tokens? If the former, why does it need a config? It sounds good, whitespace tokens could probably be handled similarly to comments. If the latter, it could work if handled in other places (might be a lot of work though).

I meant the latter, and yes It's a lot of work.
But, I am still figuring out how to do your first suggestion.

  1. Having the original SV code token stream-view (with white-spaces), and the preprocessed token steram-view, a two pointers algorithm can be implemented to solve this situation (smarter, but would be a problem in case of expanded macros with arguments).

I'm not sure I understand how would it work. Could you elaborate?

I meant to have a loop with 2 pointers (initially pointing to the beginning of both streams), let's assume:
ptr_ws is pointing to the stream with white-spaces, while ptr_pp is pointing to the preprocessed stream.

The idea was to:

  • whenever ptr_ws is pointing to a white-space then we output this white-space.
  • then increment both pointers.
  • repeat the previous steps, until both stream reaches end.

This is okay just when the non white-space tokens remain unchanged in both streams, but in case of an expanded macros, It might be a challenge.

@karimtera
Copy link
Contributor Author

Hi @fangism, @mglb,

I added a comment in PR #1360 that describes my approach to the white-space problem we discussed.

I would appreciate if you can take a look, Thanks!

@mglb
Copy link
Contributor

mglb commented Aug 1, 2022

I added a comment in PR #1360 that describes my approach to the white-space problem we discussed.

I would appreciate if you can take a look, Thanks!

I'll look at it closely tomorrow

@hzeller
Copy link
Collaborator

hzeller commented Sep 5, 2022

BTW, for finding some interesting examples and 'good to evil' uses of the verilog preprocessor, always good to have a look at Wilson's paper https://veripool.org/papers/Preproc_Good_Evil_SNUGBos10_paper.pdf
It might give some ideas for test-cases.
(Wilson is the person behind Verilator).

@karimtera
Copy link
Contributor Author

I read it when I was getting to know the project, didn't benefit from it a lot then.
I will take another look now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants