Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add abseil string helpers #4971

Merged
merged 7 commits into from
Nov 6, 2024
Merged

Add abseil string helpers #4971

merged 7 commits into from
Nov 6, 2024

Conversation

asl
Copy link
Contributor

@asl asl commented Oct 21, 2024

Add abseil string manipulator helpers for

  • cstring
  • IR::Node
  • IR::ID

This improves the code itself as we no longer need to do explicit .toString() / .string_view() calls. Also, this prints directly to sink, providing some performance improvements.

@asl asl added core Topics concerning the core segments of the compiler (frontend, midend, parser) breaking-change This change may break assumptions of compiler back ends. labels Oct 21, 2024
@asl asl requested review from vlstill and fruffy October 21, 2024 19:52
@asl
Copy link
Contributor Author

asl commented Oct 21, 2024

This required few breaking changes though:

  • As there is custom formatter, things cannot be formatted via %s depending on implicit conversion to char*
  • There is an implicit conversion to std::string_view now producing some breakage with Protobuf code as now we are having ambiguous implicit conversions. Unfortunately, we cannot (yet) make conversion to char* explicit, as lots of code relies on this, and this will be a topic for another PR

@fruffy
Copy link
Collaborator

fruffy commented Oct 22, 2024

Nice, maybe we should wait for #4964 to be merged to avoid compatibility issues?

@asl
Copy link
Contributor Author

asl commented Oct 22, 2024

Nice, maybe we should wait for #4964 to be merged to avoid compatibility issues?

I not quite sure. Even more, given the size of the backend, doing project-wide refactoring and changes could be prohibitely expensive as soon as it will be merged in.

@asl asl force-pushed the absl-stringify branch 2 times, most recently from ba1e77b to 40d0a66 Compare October 28, 2024 01:14
@asl
Copy link
Contributor Author

asl commented Oct 28, 2024

Tofino failures are due to mentioned protobuf ambiguities. They will likely will be easier to fix after the subsequent PR that removes implicit conversion to char*.

@fruffy
Copy link
Collaborator

fruffy commented Oct 28, 2024

Tofino failures are due to mentioned protobuf ambiguities. They will likely will be easier to fix after the subsequent PR that removes implicit conversion to char*.

What are the planned changes for the next PR? Maybe you can stack it on this one so we can see?

@@ -147,7 +147,7 @@ class BFRuntimeArchHandler : public P4RuntimeArchHandlerCommon<arch> {
}
auto *externType = p4info->add_externs();
externType->set_extern_type_id(static_cast<p4rt_id_t>(typeId));
externType->set_extern_type_name(typeName);
externType->set_extern_type_name(typeName.string_view());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this required here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned, Protobuf strings have two constructors: from const char* and string_view. In order to use cstring directly we need to make this conversion explicit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I assume the reason is that we also have an implicit conversion to const char*? Is the follow-up PR working on that? Maybe it should be first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. And this is misused in many places, e.g. passing cstring's to C functions like strlen, improper checks for nulls, etc. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added commits from subsequent PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes paired with the subsequent Pr look much better to me. Definitely an improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might missed few cases here and there. However, I believe that we do not need lambdas in the example like in #4951 (comment) anymore.

@@ -191,7 +191,7 @@ class cstring {
std::string_view string_view() const {
return str ? std::string_view(str) : std::string_view("");
}
explicit operator std::string_view() const { return string_view(); }
operator std::string_view() const { return string_view(); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would add a comment here why this shouldn't be explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already discussed in #4688. But essentially: we are going to make interface similar to std::string essentially allowing string_view into corresponding cstring-backed storage. This would fix lots of cstring misuse.

@asl
Copy link
Contributor Author

asl commented Nov 3, 2024

@fruffy @vlstill @ChrisDodd Will you please take a look when you will have a chance?

@fruffy
Copy link
Collaborator

fruffy commented Nov 3, 2024

@fruffy @vlstill @ChrisDodd Will you please take a look when you will have a chance?

Changes generally look good to me, I am waiting for comments by others since this is a breaking change. The Tofino breakage looks easy enough to fix. I will also push some more Tofino fixes later which could help this PR.

@fruffy fruffy added run-ubuntu18 Use this tag to trigger a Ubuntu-18 CI run. run-sanitizer Use this tag to run a Clang+Sanitzers CI run. run-static Use this tag to trigger static build CI run. labels Nov 3, 2024
@ChrisDodd
Copy link
Contributor

We really need multiple formatters for things like IDs and Nodes:

  • one for use in error messages, where we output things in a form that the P4 programmer can understand
  • one for use in logging messages, where we output things that the compiler developer needs to know in order to debug the compiler.

In the past, we've used toString() for the former and operator << for the latter. That means that the error and warning functions (and their helpers) will use toString() when converting things for % formats in the message, while LOG macros all use <<.

Where does abseil formatting fit in this? Is it for user messages or logging? Whatever the choice, it needs to be clearly documented.

@asl
Copy link
Contributor Author

asl commented Nov 4, 2024

Where does abseil formatting fit in this? Is it for user messages or logging? Whatever the choice, it needs to be clearly documented.

This simplifies toString calls. operator<< are untouched (though many of those delegate to some toString calls, yes). The main intention here is to simplify explicit string_view calls when used in different APIs that are being converted to string_view's these days.

The converters are to simplify absl::StrCat / absl::StrFormat / absl::StrSubstitute / absl::StrAppend / absl::StrJoin calls, e.g.

-        std::string tcActionParam = absl::StrCat("\n\tparam ", paramName.string_view(), " type ");
+        std::string tcActionParam = absl::StrCat("\n\tparam ", paramName, " type ");

Or, instead of

name = absl::StrCat(absl::StrJoin(stack, ".", [](std::string *out, cstring s) {
                                 absl::StrAppend(out, s.string_view());
                             }), name.string_view())

we end with

name = absl::StrCat(absl::StrJoin(stack, "."), name);

There are all different ways to "convert" cstring or ID for printing in the codebase: c_str() calls, toString() calls, string_view() calls. I tried to clean the things as much as I can :)

I will add comment describing the intended usage.

@ChrisDodd
Copy link
Contributor

There are all different ways to "convert" cstring or ID for printing in the codebase: c_str() calls, toString() calls, string_view() calls. I tried to clean the things as much as I can :)

For an ID at least, toString() and operator<< do very different things -- toString() give you the ID as it appears in the P4 source code, while operrator<< gives you the uniquified/mangled name (so different versions of the same ID in different scopes will look different).

@asl
Copy link
Contributor Author

asl commented Nov 4, 2024

For an ID at least, toString() and operator<< do very different things -- toString() give you the ID as it appears in the P4 source code, while operrator<< gives you the uniquified/mangled name (so different versions of the same ID in different scopes will look different).

That's correct. But neither user-side error reporting, nor debug printing are affected / changed / or even touched. These are just helpers to simplify the usage of abseil string manipulation functions. So, I might just put wrong title of the PR. To put things simpler: here we are entirely in the toString domain.

@asl asl changed the title Add abseil formatters Add abseil helpers Nov 4, 2024
@asl asl changed the title Add abseil helpers Add abseil string helpers Nov 4, 2024
ir/id.h Outdated
@@ -58,13 +59,20 @@ struct ID : Util::IHasSourceInfo, public IHasDbPrint {
bool operator!=(const char *a) const { return name != a; }
/// Defer to cstring's notion of less, which is a lexicographical and not a pointer comparison.
bool operator<(const char *a) const { return name < a; }
explicit operator bool() const { return name; }
explicit operator bool() const { return name.c_str(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe should be return name != nullptr; for clarity?

  - cstring
  - IR::Node
  - IR::ID

This improves the code itself as we no longer need to do explicit
.toString() / .string_view() calls. Also, this prints directly to
sink, providing some performance improvements.

Signed-off-by: Anton Korobeynikov <[email protected]>
Signed-off-by: Anton Korobeynikov <[email protected]>
Signed-off-by: Anton Korobeynikov <[email protected]>
Signed-off-by: Anton Korobeynikov <[email protected]>
@asl asl enabled auto-merge November 6, 2024 07:15
@asl asl added this pull request to the merge queue Nov 6, 2024
Merged via the queue into p4lang:main with commit 446911e Nov 6, 2024
18 of 20 checks passed
@asl asl deleted the absl-stringify branch November 6, 2024 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change This change may break assumptions of compiler back ends. core Topics concerning the core segments of the compiler (frontend, midend, parser) run-sanitizer Use this tag to run a Clang+Sanitzers CI run. run-static Use this tag to trigger static build CI run. run-ubuntu18 Use this tag to trigger a Ubuntu-18 CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants