Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the Parser's API and Storage of Metadata #3003

Conversation

InsertCreativityHere
Copy link
Member

@InsertCreativityHere InsertCreativityHere commented Oct 29, 2024

This PR improves the infrastructure and API around metadata.

The main change is that metadata is no longer stored directly as a string.
Now it has its own Metadata grammar type, which stores a pre-parsed version of the metadata.
Having the metadata pre-parsed into directive[:arguments] lets us offer a better API over it.


The first file in this PR (Grammar.cpp) is the generated parser. You don't need to review it.

Comment on lines +165 to +173
// ----------------------------------------------------------------------
// GrammarBase
// ----------------------------------------------------------------------

class GrammarBase
{
public:
virtual ~GrammarBase() = default;
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just moved up from lower in the file. This definition has to come before Metadata, and Metadata has to come before some other things which also use it.

@@ -1142,7 +1127,7 @@ Slice::CsGenerator::writeSequenceMarshalUnmarshalCode(
assert(cont);
if (useHelper)
{
string helperName = getUnqualified(getNamespace(seq) + "." + seq->name() + "Helper", scope);
string helperName = getUnqualified(seq, scope, "", "Helper");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have an overload of getUnqualified which handles applying suffixes to types.

@@ -1099,7 +1094,6 @@ SwiftGenerator::typeToString(
//
string currentModule = getSwiftModule(getTopLevelModule(toplevel));
BuiltinPtr builtin = dynamic_pointer_cast<Builtin>(type);
bool nonnull = find(metadata.begin(), metadata.end(), "swift:nonnull") != metadata.end();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (s.find("swift:attribute:") == 0 || s.find("swift:type:") == 0 || s == "swift:noexcept" ||
s == "swift:nonnull")
{
dc->warning(InvalidMetadata, p->file(), p->line(), "ignoring metadata `" + s + "' for non local operation");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More local slice leftovers.

// Metadata
// ----------------------------------------------------------------------

class Metadata final : public virtual GrammarBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can most likely remove virtual

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, leftover from a copy-paste.
Removed.


private:
/// Parses a metadata string into a pair of (directive, arguments) strings.
static std::pair<std::string, std::string> parseRawMetadata(const std::string& rawMetadata);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string_view?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string.

Now the caller just forwards a string by moving it.
This lets us avoid a copy in the case where we don't want to split up the metadata at all,
and we can just return the string as-is, instead of needing to construct a new one from the string_view.

Copy link
Member

@bernardnormier bernardnormier Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lets us avoid a copy in the case where we don't want to split up the metadata at all,
and we can just return the string as-is, instead of needing to construct a new one from the string_view.

The API signature is about signaling intent to the caller, not about optimizing the implementation.

For example:

void addToMap(string, string); // implementation adopts args
int parseInt(string_view); // implementation reads arg

For parsing, including this parseRawMetadata, the parameter should be a string_view. Implementation details like the implementation always making a working copy (as a string) or occasionally returning the arg unchanged (as a string) are just implementation details that should not change the API signature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with a 3rd compromise option to save time debating here: I inlined the function.

I agree with you, but it didn't seem like we'd be able to agree on the intent of the parameter here.
I thought that by-value string (that it's free to modify however) was correct, but it's clear no one else did.

And when I switched it to string_view, it just hurt me too much to look at.
It was a private static helper function, called from only one place, that will be passed a string 100% of the time.
If switched to string_view, the first line copied it into a string anyways string x = string{my_string_view}.

}

void
Slice::DefinitionContext::setMetadata(const StringList& metadata)
Slice::DefinitionContext::setMetadata(MetadataList metadata)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing std::move below. Check the other setMetadata too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked through all the usage of setMetadata and added moves to the call-sites to avoid copies.

{
baseNames.push_back(q->substr(prefix.size()));
baseNames.push_back(string(metadata->arguments()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use {} for constructors

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed all the ones I introduced in this PR.
Ripgripping around shows that's still many other places this should be fixed of course.

if (secondColonPos == string::npos)
{
// If the metadata contains only 1 colon, we need to check if it's for a language prefix, or for arguments.
// NOTE: It is important that this list is kept in alphabetical order!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And up to date! :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I get back I'll add a sticky note to update this list when we add any new languages to ice. : vP

@@ -1458,9 +1525,9 @@ DictionaryPtr
Slice::Container::createDictionary(
const string& name,
const TypePtr& keyType,
const StringList& keyMetadata,
MetadataList keyMetadata,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are no longer by reference and should be std::moved into the dictionary constructor now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@@ -1542,7 +1609,7 @@ ConstPtr
Slice::Container::createConst(
const string name,
const TypePtr& type,
const StringList& metadata,
MetadataList metadata,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixed.

@@ -4687,7 +4760,7 @@ Slice::Unit::currentIncludeLevel() const
}

void
Slice::Unit::addFileMetadata(const StringList& metadata)
Slice::Unit::addFileMetadata(const MetadataList& metadata)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixed.

@@ -4960,7 +5033,7 @@ Slice::Unit::getTopLevelModules(const string& file) const
}
}

Slice::Unit::Unit(bool all, const StringList& defaultFileMetadata)
Slice::Unit::Unit(bool all, MetadataList defaultFileMetadata)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixed.


private:
/// Parses a metadata string into a pair of (directive, arguments) strings.
static std::pair<std::string, std::string> parseRawMetadata(const std::string& rawMetadata);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use string_view

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also ended up switching it to string by value.
That will save us a copy in the case where there is no ':', and we want to return the provided string as-is.

class Metadata final : public virtual GrammarBase
{
public:
Metadata(const std::string& rawMetadata, const std::string& file, int line);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stings should be non-const ref (or string_view) depending on what we do with them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched them both to use string, since that lets us avoid making a copy from string_view -> string.

std::string_view directive() const;
std::string_view arguments() const;

std::string file() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this one not string_view?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I did use a string_view, but everywhere where we called it was expecting a string.
Eventually wrapping it in a string constructor just got cluttered and annoying, so I switched it back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some future refactoring of all the callers to this function can improve the situation and then it might be more ergonomic to switch it to string_view.

Copy link
Member

@bernardnormier bernardnormier Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For accessors, returning a string (= copy) or a const string& (= const reference to private field) is fine, and can be preferable to a string_view. We prefer string_view over const string& for 'in' parameters.

auto contained = dynamic_pointer_cast<Contained>($3);
if (contained && !metadata->v.empty())
{
contained->setMetadata(metadata->v);
contained->setMetadata(std::move(metadata->v));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this std::move correct, moving a data-meber out of the object seems uncommon. Is the object not used anymore after this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we never use the object again after this.

The only reason why we're using an object is because all the special $N types used by the parser must have a common base type (GrammarBase). So this object is just a type wrapper around the value that we're moving.

DefinitionContextPtr dc = unt->findDefinitionContext(file);
assert(dc);
if (auto meta = dc->findMetadata(prefix))

// TODO: why do we ignore all instances of this metadata except the first?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess because this is to document the required include for the type, and it should never be more than one header

Slice::Gen::MetadataVisitor::validate(
const SyntaxTreeBasePtr& cont,
const StringList& metadata,
const MetadataList& metadata,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass by value and validate in place?

This would avoid a copy when we call this like:

p->setMetadata(validate(p, p->getMetadata(), p->file(), p->line()));

I guess it would complicate the loop, as we have to use and iterator to remove items from the vector we are iterating... So maybe not worth the added complexity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented as suggested.
It doesn't make the loop much worse at all, except that we have to use iterators. Which is fine.
This is still C++ after all!

static const string prefix = "swift:inherits:";
for (StringList::const_iterator q = metadata.begin(); q != metadata.end(); ++q)
// Check for 'swift:inherits' metadata.
for (const auto& metadata : p->getMetadata())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to be able to get a list of metadata for a directive, maybe we don't do this in many places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could definitely add it in the future, but indeed we don't need that functionality in many places, although there are a couple spots.

@@ -126,7 +126,7 @@ Slice::Metadata::line() const
}

pair<string, string>
Slice::Metadata::parseRawMetadata(const string& rawMetadata)
Slice::Metadata::parseRawMetadata(string rawMetadata)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one should be string_view :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with a 3rd compromise option and just inlined the function in the one place it was used.
That way there's no signature type to argue about :)

@InsertCreativityHere InsertCreativityHere merged commit 465dc1b into zeroc-ice:main Oct 30, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants