New compiler: support function overloading #2533

Open
ivan-mogilko opened this issue Sep 21, 2024 · 6 comments
Labels
ags 4 (related to the ags4 development)
context: script compiler
type: enhancement (a suggestion or necessity to have something improved)

Comments

@ivan-mogilko
Contributor

ivan-mogilko commented Sep 21, 2024

The proposal is to support script function overloading.

CC @fernewelten

Function overloading means that you may have multiple functions with the same name but different prototypes (return value and parameter list). For example:

int DoAction(Character *c);
int DoAction(Character *c, int param);
int DoAction(Character *c, float param);
int DoAction(Character *c, int param1, int param2);
int DoAction(Character *c, float param_f, const string s_param);

NOTE: overloads must have different argument lists. Variants that differ only in return type cannot be supported, because there would be no way to tell which of those variants is being called.
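
To illustrate the note above, here is a minimal Python sketch (not AGS compiler code) of a declaration table keyed by function name and parameter types only. The `OverloadTable` class and the type spellings are hypothetical, and the sketch assumes exact type matching with no implicit conversions.

```python
class OverloadTable:
    """Hypothetical table of function variants, keyed by name and parameter types."""

    def __init__(self):
        self._variants = {}  # (name, param types) -> return type

    def declare(self, name, param_types, return_type):
        key = (name, tuple(param_types))
        if key in self._variants:
            # The return type is not part of the key, so a variant that
            # differs only in return type collides with the existing one.
            raise ValueError(f"{name}({', '.join(param_types)}) is already declared")
        self._variants[key] = return_type

    def resolve(self, name, arg_types):
        # Exact match only; returns the return type of the selected variant.
        return self._variants[(name, tuple(arg_types))]


table = OverloadTable()
table.declare("DoAction", ["Character*"], "int")
table.declare("DoAction", ["Character*", "int"], "int")
table.declare("DoAction", ["Character*", "float"], "int")
```

A call site only supplies argument types, which is why the return type cannot take part in the lookup key.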

In order to support this, function variants must be distinguished at both the compilation and linking stages. In other words, each function variant must be registered under a unique internal name. Right now AGS uses a "FUNC^N" notation for distinguishing imports with different numbers of parameters (and, afaik, "FUNC$N" as the corresponding export name). This was done primarily to let the engine link deprecated API functions (I think). But the number of parameters is not enough for overloading, as we would also need to differentiate variants with different return and argument types.

The first idea that comes to mind is to generate a second suffix that contains the encoded parameter types. Note that the types do not strictly have to be uniquely identified throughout the script or game: for the purpose of overloading itself, having different suffixes is enough. But it may still be beneficial to have a strict rule for these, i.e. not random garbage, at least because this may be useful for debugging. Additional uses may also be found later, so it would be best not to block this opportunity.

Now, this is where it becomes a bit complicated. I can imagine that primitive types such as int, float, etc. could be identified by a single letter, like i, f, etc., but what about the others? A single letter will not be enough, while a full type name may make the internal name quite long.

As a random idea, there may be a "compressed" name generated from the minimal number of characters needed to distinguish the type, maybe starting with 3 letters (unless the type name is shorter). This type name "shortcut" would then also be saved somewhere, like in the RTTI table, as a way to reference a type, in case we need to quickly find that type's entry.

Are there any other visible options here?

ivan-mogilko added the "type: enhancement", "ags 4" and "context: script compiler" labels Sep 21, 2024

@fernewelten
Contributor

> there may be a "compressed" name generated as min number of characters enough to distinguish the type, maybe starting with 3 letters (unless the type is shorter). And then this type name "shortcut" is also saved somewhere, like in RTTI table, as a way to reference a type, in case we may need to quickly find that type's entry.

My first idea was using symbol table entry numbers, but just using symbol table entry numbers to identify the types won't be enough. The issue is, a struct foo could be symbol table entry 271 when a certain program is compiled that exports some function, and the same struct foo might be a completely different symbol table entry, e.g., 721, when a different program is compiled that imports that function.

Instead, we'd probably need to stringify the types in a unique way and use some sort of concatenation of this for the mangled function name. One could imagine mangled function names such as

fname^param_count^stringified_type_of_first_param^stringified_type_of_second_param^…

That might make for very long mangled function names, but as far as I know, other ecosystems such as GNU's gcc also have very long mangled function names.
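
The naming scheme above could be sketched as follows. This is a minimal Python illustration, not compiler code; the `mangle` helper is hypothetical and assumes the parameter types have already been turned into strings that do not themselves contain `^`.

```python
def mangle(fname, param_type_strings):
    """Build fname^param_count^type1^type2... from already stringified types."""
    parts = [fname, str(len(param_type_strings)), *param_type_strings]
    return "^".join(parts)


print(mangle("DoAction", ["i", "f"]))
```

With no parameters the result is just the name and a count of zero, which matches the shape of the existing "FUNC^N" notation.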

To stringify a type in a unique way, as a first approach we might linearize its composition of data components (and ignore the attributes and function components)

  • At the base level, there would be single letters to represent the atomic types, e.g., i for integer.
  • A dynpointer to a type t could be stringified as *T, where T is the stringification of t
  • A dynarray to a type t could be stringified as T[]
  • A struct consisting of data fields of types t, u, v could be stringified as (TUV)
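
The four rules above can be sketched as a small recursive function. This is a Python illustration under assumptions: the `Atomic`, `DynPointer`, `DynArray` and `Struct` classes are hypothetical stand-ins for whatever the compiler's symbol table actually holds, and `f` for float is an assumed letter.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atomic:
    letter: str  # e.g. "i" for integer


@dataclass(frozen=True)
class DynPointer:
    target: "Type"


@dataclass(frozen=True)
class DynArray:
    element: "Type"


@dataclass(frozen=True)
class Struct:
    fields: Tuple["Type", ...]  # data fields only; attributes and functions ignored


Type = Union[Atomic, DynPointer, DynArray, Struct]


def stringify(t):
    """Linearize a type into a string following the four rules above."""
    if isinstance(t, Atomic):
        return t.letter
    if isinstance(t, DynPointer):
        return "*" + stringify(t.target)
    if isinstance(t, DynArray):
        return stringify(t.element) + "[]"
    if isinstance(t, Struct):
        return "(" + "".join(stringify(f) for f in t.fields) + ")"
    raise TypeError(f"unknown type node: {t!r}")


INT, FLOAT = Atomic("i"), Atomic("f")
print(stringify(DynPointer(Struct((INT, FLOAT)))))
```

Two caveats apply to this naive sketch: structs with different names but identical field layouts get the same string, and a struct that contains a pointer to its own type would recurse without end, so a real scheme would need to handle both cases.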

If you've already GOT a stringification for types as part of your RTTI scheme, let's take that.

If the mangled function names become too long, we might hash them so that we can look them up first by hash, then by exact name. The hashing only needs to occur once, at the linking stage, so this will probably not slow down program execution.
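
A lookup "first by hash, then by exact name" might look like the sketch below. It is a Python illustration only; the `ExportIndex` class is hypothetical, and CRC-32 is an arbitrary choice of hash made for this example, not something the engine is known to use.

```python
import zlib
from collections import defaultdict


class ExportIndex:
    """Hypothetical export index: bucket by hash, then confirm by exact name."""

    def __init__(self):
        self._buckets = defaultdict(list)  # hash -> [(mangled name, address)]

    @staticmethod
    def _hash(mangled):
        return zlib.crc32(mangled.encode("utf-8"))

    def add(self, mangled, address):
        self._buckets[self._hash(mangled)].append((mangled, address))

    def find(self, mangled):
        # Only names in the same bucket are compared in full, so long
        # mangled names cost one hash plus (usually) one string comparison.
        for name, address in self._buckets.get(self._hash(mangled), ()):
            if name == mangled:
                return address
        return None


index = ExportIndex()
index.add("DoAction^2^i^f", 100)
index.add("DoAction^1^i", 200)
```

The exact-name comparison is what keeps hash collisions from linking the wrong function.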

@ivan-mogilko
Contributor Author

ivan-mogilko commented Sep 22, 2024

> The issue is, a struct foo could be symbol table entry 271 when a certain program is compiled that exports some function, and the same struct foo might be a completely different symbol table entry, e.g., 721, when a different program is compiled that imports that function.
> <...>
> If you've already GOT a stringification for types as part of your RTTI scheme, let's take that.

So, in RTTI there are local tables with local typeids, and a joint table created at runtime, which assigns global numeric ids to the types. These tables are bound together using a fully qualified type name (string). This is similar to how function fixups are done across scripts.

I suppose that if you use numeric local type ids in this mangled name, then it will be possible to resolve them into global type ids (whether numeric or string) at the script linking stage in the engine.
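
The idea of binding local type ids to global ones through the type name can be sketched like this. It is a simplified Python illustration, not the engine's RTTI code; `build_joint_table` and the table layout are assumptions made for the example.

```python
def build_joint_table(local_tables):
    """Assign global ids by fully qualified type name.

    local_tables: {script name: {local type id: fully qualified type name}}
    Returns (global ids by type name, per-script local-to-global remap).
    """
    global_ids = {}  # type name -> global id
    remap = {}       # script name -> {local id -> global id}
    for script, table in local_tables.items():
        remap[script] = {}
        for local_id, type_name in table.items():
            if type_name not in global_ids:
                global_ids[type_name] = len(global_ids)
            remap[script][local_id] = global_ids[type_name]
    return global_ids, remap


# The same struct has different local ids in two scripts.
local_tables = {
    "exporter": {5: "int", 271: "foo"},
    "importer": {3: "int", 721: "foo"},
}
global_ids, remap = build_joint_table(local_tables)
```

Here local id 271 in one script and 721 in the other end up with the same global id, because both are bound through the name "foo".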

As for string names, RTTI currently does not have any "shortcut" names, only full names. It's possible to add "shortcuts" there though, generated using your proposed "stringification" rules, if using numeric typeids does seem inconvenient.

EDIT:
But I'd like to clarify: using RTTI here is presumably not strictly required to make these functions link, is it?
Function linking is still going to be done with the use of import/export tables and fixups.
RTTI will only be required if some operation would need to know exact types of function args at runtime.

@fernewelten
Contributor

fernewelten commented Sep 22, 2024

Hm. We might keep things simple. We don't need RTTI at link time, but when RTTI already has a way to name types uniquely in a string (long names), then we could use just this naming mechanism and name the functions with these long names, too. That is,

fname^parameter_count^long_type_name1^long_type_name2^long_type_name3…

That makes for long function names, but AFAIK they are only used for linking and aren't usually shown to the programmer.
The engine would

  1. try to link a full mangled function name first,
  2. fall back to linking fname^parameter_count when there isn't a full mangled function name,
  3. and link to just fname as a last resort.

It seems that the engine does 2. and 3. already, so this might be a comparatively simple modification of pre-existing code.
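
The three-step fallback above could be sketched as follows. This is a Python illustration only; `candidate_names`, `resolve_import` and the `exports` dictionary are hypothetical and do not reflect the engine's actual data structures.

```python
def candidate_names(fname, param_types):
    """Names to try, from most specific to least specific."""
    count = len(param_types)
    full = "^".join([fname, str(count), *param_types])
    return [full, f"{fname}^{count}", fname]


def resolve_import(exports, fname, param_types):
    """Return (matched export name, address), or None if nothing links."""
    for name in candidate_names(fname, param_types):
        if name in exports:
            return name, exports[name]
    return None


exports = {
    "DoAction^2^Character^int": 1,  # fully mangled overload
    "DoAction^2": 2,                # parameter-count-only export
    "Legacy": 3,                    # plain name
}
print(resolve_import(exports, "DoAction", ["Character", "float"]))
```

In this example the `(Character, float)` call has no fully mangled export, so it falls back to `DoAction^2`. Whether such a fallback is desirable when real overloads exist is a design question the sketch does not settle.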

In communication with the programmer, IMO they need to be told what they need to be told; it can't be helped. So when there are several functions of the same name that have different parameter lists and an error message needs to refer to one specific function, it would call the function fname(int, float) or some such. In simple cases that can't be misunderstood, the error messages could still refer to fname() if that is simpler to understand.

@ericoporto
Member

Are bool and enums assumed to be int in this overload idea? Or would they be their own stuff?

I am curious whether this would add new overhead to script function calls at runtime, or whether these would be figured out at compile time.

Just to expand a bit: assuming we do have it and, let's say, we use it in Maths in the engine API to also support int, I imagine the AGS manual would list all the overloaded methods in the same entry.

@fernewelten
Contributor

fernewelten commented Sep 22, 2024

> Are bool and enums assumed to be int in this overload idea?

Currently, the compiler doesn't know bool, it doesn't even see it. There are #defines in the autoheader that equate bool to int, true to 1 and false to 0.

As concerns enum, I think the language follows the old C conventions that essentially treat enums as int that have some compile-time constants. I'm not sure of the ramifications. This might confuse a user when they have defined a function as, e.g., fname(Enum i) (where Enum is an enum) and the compiler tells them about fname(int) in an error message.

We don't have casts in the AGS language, neither the C++ casts nor the C casts. So the language kind of relies on being able to assign ints to enums and vice versa.

@ivan-mogilko
Contributor Author

ivan-mogilko commented Sep 22, 2024

In other words, in order to distinguish overloads with bools and different enums, the compiler would have to register bool and enums as distinct types. This sounds like a separate issue of its own though.
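
A small Python sketch shows why this matters. The `ALIASES` map and the `Direction` enum are hypothetical; the map only models the current behaviour described above, where bool and enums are treated as int.

```python
# Hypothetical alias map: how the compiler currently sees these types.
ALIASES = {"bool": "int", "Direction": "int"}


def signature_key(name, param_types):
    """Key used to tell overloads apart, after aliases are collapsed."""
    canonical = tuple(ALIASES.get(t, t) for t in param_types)
    return (name, canonical)


# All three declarations collapse to the same key, so they cannot coexist.
print(signature_key("fname", ["Direction"]))
print(signature_key("fname", ["bool"]))
print(signature_key("fname", ["int"]))
```

Only if bool and each enum were registered as distinct types would these keys differ.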

> I am curious if this put new overhead in script function calls at runtime or if these would be figured out at compile time.

Function overloads are resolved at compile and linking time.


3 participants